## Inserting data in MySQL using Python

First, let's start with a basic piece of code that fetches the data we want to insert into the database. For our example, we will get data about the Citibike stations, using the corresponding API call provided by the Citibike website:

In [19]:
import requests

In [20]:
# Let's get the data from the Citibike API
url = 'http://www.citibikenyc.com/stations/json'
results = requests.get(url).json() 

In [21]:
# We only need a subset of the data in the JSON returned by the Citibike API, so we keep only what we need
data = results["stationBeanList"]

In [22]:
len(data)

813
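
To make the shape of `data` concrete without calling the live API, here is a minimal sketch built on a hand-written sample payload. The payload is an illustrative assumption: it contains only the keys used in this tutorial, and the timestamp strings follow the 12-hour format described in the insertion code further down.

```python
# Hypothetical sample mimicking the structure of the Citibike JSON response.
# Only the keys used in this tutorial are included; the real feed
# contains more fields per station.
sample_results = {
    "stationBeanList": [
        {
            "id": 72,
            "stationName": "W 52 St & 11 Ave",
            "totalDocks": 39,
            "availableDocks": 17,
            "lastCommunicationTime": "2018-12-04 04:37:50 PM",
        },
        {
            "id": 79,
            "stationName": "Franklin St & W Broadway",
            "totalDocks": 33,
            "availableDocks": 3,
            "lastCommunicationTime": "2018-12-04 04:40:41 PM",
        },
    ]
}

# Same extraction step as above: keep only the list of station records
sample_data = sample_results["stationBeanList"]

# Each element is a dictionary describing one station
for station in sample_data:
    print(station["id"], station["stationName"], station["totalDocks"])
```

With the real response, `len(data)` simply counts the dictionaries in this list, one per station.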

Now we will connect to our MySQL server. We will use the MySQLdb library of Python.

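If you do not have the library, you need to install it from the shell. On Python 3, the MySQLdb module is provided by the mysqlclient package, so a command such as pip install mysqlclient should work on most setups.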


In [5]:
import MySQLdb as mdb

con = mdb.connect(host='localhost',
                  user='root',
                  passwd='dwdstudent2015',
                  charset='utf8', use_unicode=True)

Once we have connected successfully, we need to create our database:

In [6]:
# Query to create a database
db_name = 'citibike_mysql_test'
create_db_query = "CREATE DATABASE IF NOT EXISTS {db} DEFAULT CHARACTER SET 'utf8'".format(db=db_name)

# Create a database
cursor = con.cursor()
cursor.execute(create_db_query)
cursor.close()
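
Database and table names cannot be passed as `%s` query parameters, which is why the query above splices `db_name` into the SQL text with `format`. That is safe here because the name is a constant in our own code. If the name ever came from outside the program, one option is to validate it first. The helper below, `check_identifier`, is a hypothetical name introduced for this sketch and is not part of MySQLdb; it accepts only letters, digits, and underscores, which is stricter than what MySQL itself allows.

```python
import re

# Conservative whitelist: a letter or underscore, followed by letters, digits, or underscores
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_identifier(name):
    """Return name unchanged if it looks like a plain SQL identifier, else raise ValueError."""
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError("Unsafe SQL identifier: {!r}".format(name))
    return name


# The name is validated before it is formatted into the SQL text
db_name = check_identifier("citibike_mysql_test")
create_db_query = "CREATE DATABASE IF NOT EXISTS {db} DEFAULT CHARACTER SET 'utf8'".format(db=db_name)
print(create_db_query)
```

A name such as `"docks; DROP DATABASE x"` would raise `ValueError` before any SQL is built.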

Then we create the table where we will store our data. For our example, we will import just five fields into the database: station_id, station_name, number_of_docks, available_docks, and date.

In [7]:
cursor = con.cursor()
table_name = 'Docks'
# Create a table
# The {db} and {table} are placeholders for the parameters in the format(....) statement
create_table_query = '''CREATE TABLE IF NOT EXISTS {db}.{table} 
                                (station_id int, 
                                station_name varchar(250), 
                                number_of_docks int,
                                available_docks int,
                                date datetime,
                                PRIMARY KEY(station_id, date)
                                )'''.format(db=db_name, table=table_name)
cursor.execute(create_table_query)
cursor.close()
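
The composite key `PRIMARY KEY(station_id, date)` means that a station can appear many times in the table, once per timestamp, but the same station with the same timestamp cannot be stored twice. The sketch below illustrates that behavior using Python's built-in `sqlite3` module with an in-memory database as a stand-in, so that it runs without a MySQL server. SQLite's types, placeholders, and error classes differ from MySQL's (with MySQLdb the duplicate would surface as `MySQLdb.IntegrityError`), so treat this as an illustration of the constraint rather than of MySQL specifics.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL server
con_demo = sqlite3.connect(":memory:")
con_demo.execute('''CREATE TABLE IF NOT EXISTS Docks
                    (station_id int,
                     station_name varchar(250),
                     number_of_docks int,
                     available_docks int,
                     date datetime,
                     PRIMARY KEY(station_id, date))''')

# sqlite3 uses ? placeholders where MySQLdb uses %s
insert = "INSERT INTO Docks VALUES (?, ?, ?, ?, ?)"

con_demo.execute(insert, (72, "W 52 St & 11 Ave", 39, 17, "2018-12-04 16:37:50"))
# Same station, later timestamp (hypothetical reading): accepted
con_demo.execute(insert, (72, "W 52 St & 11 Ave", 39, 15, "2018-12-04 16:47:50"))

# Same station and same timestamp: rejected by the primary key
try:
    con_demo.execute(insert, (72, "W 52 St & 11 Ave", 39, 17, "2018-12-04 16:37:50"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = con_demo.execute("SELECT COUNT(*) FROM Docks").fetchone()[0]
print("Rows stored:", row_count, "| duplicate rejected:", duplicate_rejected)
con_demo.close()
```

In practice this means that re-running the import while a station's timestamp is unchanged will fail on the duplicate row unless the error is handled.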

Next, we import the data into our table, using the INSERT command.

In [8]:
from datetime import datetime

query_template = '''INSERT INTO {db}.{table}(station_id, 
                                            station_name, 
                                            number_of_docks, 
                                            available_docks, 
                                            date) 
                    VALUES (%s, %s, %s, %s, %s)'''.format(db=db_name, table=table_name)
cursor = con.cursor()

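# NEVER build the query by concatenating values into the SQL string, as in the
# commented-out line below: it breaks on values containing quotes and is open to SQL injection.
# query = "INSERT INTO citibike.Docks(station_id, station_name, number_of_docks) VALUES ("+entry["id"]+", "+entry["stationName"]+", "+entry["totalDocks"]+")"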
for entry in data:
    dockid = entry["id"]
    addr = entry["stationName"]
    docks = entry["totalDocks"]
    available = entry["availableDocks"]
    # date =  datetime.now()
    # lastCommunicationTime is a string of
    # the form "2016-02-09 10:16:49 AM"
    # See https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
    # for the documentation of the format codes used to parse it
    date = datetime.strptime(entry["lastCommunicationTime"], 
                             '%Y-%m-%d %I:%M:%S %p')
    print("Inserting station", dockid, "at", addr)
    query_parameters = (dockid, addr, docks, available, date)
    cursor.execute(query_template, query_parameters)

con.commit()
cursor.close()

Inserting station 304 at Broadway & Battery Pl
Inserting station 359 at E 47 St & Park Ave
Inserting station 402 at Broadway & E 22 St
Inserting station 3255 at 8 Ave & W 31 St
Inserting station 3443 at W 52 St & 6 Ave
Inserting station 72 at W 52 St & 11 Ave
Inserting station 79 at Franklin St & W Broadway
Inserting station 82 at St James Pl & Pearl St
Inserting station 83 at Atlantic Ave & Fort Greene Pl
Inserting station 119 at Park Ave & St Edwards St
Inserting station 120 at Lexington Ave & Classon Ave
Inserting station 127 at Barrow St & Hudson St
Inserting station 128 at MacDougal St & Prince St
Inserting station 143 at Clinton St & Joralemon St
Inserting station 144 at Nassau St & Navy St
Inserting station 146 at Hudson St & Reade St
Inserting station 150 at E 2 St & Avenue C
Inserting station 151 at Cleveland Pl & Spring St
Inserting station 157 at Henry St & Atlantic Ave
Inserting station 161 at LaGuardia Pl & W 3 St
Inserting station 164 at E 47 St & 2 Ave
...

Inserting station 3203 at Hamilton Park
Inserting station 3205 at JC Medical Center
Inserting station 3206 at Hilltop
Inserting station 3207 at Oakland Ave
Inserting station 3209 at Brunswick St
Inserting station 3210 at Pershing Field
Inserting station 3211 at Newark Ave
Inserting station 3212 at Christ Hospital
Inserting station 3213 at Van Vorst Park
Inserting station 3214 at Essex Light Rail
Inserting station 3220 at 5 Corners Library
Inserting station 3221 at 47 Ave & 31 St
Inserting station 3223 at E 55 St & 3 Ave
Inserting station 3225 at Baldwin at Montgomery
Inserting station 3226 at W 82 St & Central Park West
Inserting station 3231 at E 67 St & Park Ave
Inserting station 3232 at Bond St & Fulton St
Inserting station 3233 at E 48 St & 5 Ave
Inserting station 3235 at E 41 St & Madison Ave
Inserting station 3236 at W 42 St & Dyer Ave
Inserting station 3241 at Monroe St & Tompkins Ave
Inserting station 3242 at Schermerhorn St & Court St
Inserting station 3243 at E 58 St & 1 Ave


Inserting station 3691 at 28 Ave & 44 St
Inserting station 3692 at 5 St & Market St
Inserting station 3693 at N 11 St & Kent Ave
Inserting station 3694 at Jackson Square
Inserting station 3697 at W 64 St & Thelonious Monk Circle
Inserting station 3699 at W 50 St & 9 Ave
Inserting station 3701 at Cliff St & Fulton St
Inserting station 3704 at 47 Ave & Skillman Ave
Inserting station 3707 at Lexington Ave & E 26 St
Inserting station 3708 at W 13 St & 5 Ave
Inserting station 3709 at W 15 St & 6 Ave
Inserting station 3711 at E 13 St & Avenue A
Inserting station 3712 at W 35 St & Dyer Ave
Inserting station 3714 at Division Av & Hooper St
Inserting station 3715 at Driggs Ave & N 9 St
Inserting station 3716 at 40 Ave & Crescent St
Inserting station 3718 at E 11 St & Avenue B
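
The loop above does two things worth isolating: it parses `lastCommunicationTime` with `strptime`, and it passes the values to `execute` as a separate tuple instead of concatenating them into the SQL string. The sketch below shows both on hand-written sample records, again using the built-in `sqlite3` module as a stand-in for MySQL (note that `sqlite3` uses `?` placeholders where MySQLdb uses `%s`). The second record is made up, with a name containing a quote character, to show that parameter binding stores it verbatim.

```python
import sqlite3
from datetime import datetime

# Hypothetical sample records in the shape used by the loop above
sample_data = [
    {"id": 72, "stationName": "W 52 St & 11 Ave", "totalDocks": 39,
     "availableDocks": 17, "lastCommunicationTime": "2018-12-04 04:37:50 PM"},
    {"id": 9999, "stationName": "O'Brien St & 1 Ave", "totalDocks": 10,
     "availableDocks": 4, "lastCommunicationTime": "2018-12-04 09:05:09 AM"},
]


def to_row(entry):
    """Convert one API record into the tuple of query parameters."""
    # %I is the hour on a 12-hour clock and %p is the AM/PM marker
    parsed = datetime.strptime(entry["lastCommunicationTime"], "%Y-%m-%d %I:%M:%S %p")
    # Store the timestamp as text, since SQLite has no native datetime type
    return (entry["id"], entry["stationName"], entry["totalDocks"],
            entry["availableDocks"], parsed.strftime("%Y-%m-%d %H:%M:%S"))


con_demo = sqlite3.connect(":memory:")
con_demo.execute("CREATE TABLE Docks (station_id int, station_name text, "
                 "number_of_docks int, available_docks int, date text)")

# executemany binds each tuple to the placeholders;
# the values are never spliced into the SQL text
con_demo.executemany("INSERT INTO Docks VALUES (?, ?, ?, ?, ?)",
                     [to_row(entry) for entry in sample_data])
con_demo.commit()

stored = con_demo.execute(
    "SELECT station_name, date FROM Docks ORDER BY station_id").fetchall()
print(stored)
con_demo.close()
```

MySQLdb cursors also provide an `executemany` method, so the per-row `execute` calls in the loop could be batched in a similar way.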


In [9]:
cur = con.cursor(mdb.cursors.DictCursor)
cur.execute("SELECT * FROM {db}.{table}".format(db=db_name, table=table_name))
rows = cur.fetchall()
cur.close()

In [10]:
for row in rows:
    print("Station ID:", row["station_id"])
    print("Station Name:", row["station_name"])
    print("Number of Docks:", row["number_of_docks"])
    print("Available Docks:", row["available_docks"])
    print("Last Communication:", row["date"])
    print("=============================================")
    


Station ID: 72
Station Name: W 52 St & 11 Ave
Number of Docks: 39
Available Docks: 17
Last Communication: 2018-12-04 16:37:50
Station ID: 79
Station Name: Franklin St & W Broadway
Number of Docks: 33
Available Docks: 3
Last Communication: 2018-12-04 16:40:41
Station ID: 82
Station Name: St James Pl & Pearl St
Number of Docks: 27
Available Docks: 1
Last Communication: 2018-12-04 16:40:38
Station ID: 83
Station Name: Atlantic Ave & Fort Greene Pl
Number of Docks: 0
Available Docks: 0
Last Communication: 2018-10-09 09:05:09
Station ID: 119
Station Name: Park Ave & St Edwards St
Number of Docks: 19
Available Docks: 7
Last Communication: 2018-12-04 16:38:17
Station ID: 120
Station Name: Lexington Ave & Classon Ave
Number of Docks: 19
Available Docks: 12
Last Communication: 2018-12-04 16:37:12
Station ID: 127
Station Name: Barrow St & Hudson St
Number of Docks: 31
Available Docks: 1
Last Communication: 2018-12-04 16:38:52
Station ID: 128
Station Name: MacDougal St & Prince St
...

Available Docks: 15
Last Communication: 2018-12-04 16:40:07
Station ID: 459
Station Name: W 20 St & 11 Ave
Number of Docks: 49
Available Docks: 6
Last Communication: 2018-12-04 16:38:46
Station ID: 460
Station Name: S 4 St & Wythe Ave
Number of Docks: 23
Available Docks: 15
Last Communication: 2018-12-04 16:39:44
Station ID: 461
Station Name: E 20 St & 2 Ave
Number of Docks: 39
Available Docks: 28
Last Communication: 2018-12-04 16:40:24
Station ID: 462
Station Name: W 22 St & 10 Ave
Number of Docks: 47
Available Docks: 4
Last Communication: 2018-12-04 16:40:40
Station ID: 465
Station Name: Broadway & W 41 St
Number of Docks: 39
Available Docks: 18
Last Communication: 2018-12-04 16:40:19
Station ID: 466
Station Name: W 25 St & 6 Ave
Number of Docks: 35
Available Docks: 1
Last Communication: 2018-12-04 16:38:03
Station ID: 467
Station Name: Dean St & 4 Ave
Number of Docks: 34
Available Docks: 0
Last Communication: 2018-12-04 16:40:40
Station ID: 468
Station Name: Broadway & W 56 St
...

Station Name: 5 Ave & E 93 St
Number of Docks: 43
Available Docks: 33
Last Communication: 2018-12-04 16:36:49
Station ID: 3293
Station Name: W 92 St & Broadway
Number of Docks: 24
Available Docks: 23
Last Communication: 2018-12-04 16:39:04
Station ID: 3294
Station Name: E 91 St & Park Ave
Number of Docks: 38
Available Docks: 34
Last Communication: 2018-12-04 16:38:50
Station ID: 3295
Station Name: Central Park W & W 96 St
Number of Docks: 59
Available Docks: 56
Last Communication: 2018-12-04 16:38:15
Station ID: 3296
Station Name: E 93 St & 2 Ave
Number of Docks: 42
Available Docks: 9
Last Communication: 2018-12-04 16:39:55
Station ID: 3297
Station Name: 6 St & 7 Ave
Number of Docks: 21
Available Docks: 16
Last Communication: 2018-12-04 16:38:07
Station ID: 3298
Station Name: Warren St & Court St
Number of Docks: 23
Available Docks: 16
Last Communication: 2018-12-04 16:38:31
Station ID: 3299
Station Name: E 98 St & Park Ave
Number of Docks: 25
Available Docks: 15
...

Last Communication: 2018-12-04 16:39:15
Station ID: 3612
Station Name: 30 Ave & 21 St
Number of Docks: 15
Available Docks: 14
Last Communication: 2018-12-04 16:37:30
Station ID: 3613
Station Name: Center Blvd & 48 Ave
Number of Docks: 27
Available Docks: 11
Last Communication: 2018-12-04 16:38:17
Station ID: 3614
Station Name: Crescent St & 30 Ave
Number of Docks: 23
Available Docks: 22
Last Communication: 2018-12-04 16:38:57
Station ID: 3615
Station Name: 44 Dr & 21 St
Number of Docks: 21
Available Docks: 16
Last Communication: 2018-12-04 16:38:45
Station ID: 3616
Station Name: Steinway St & 28 Ave
Number of Docks: 25
Available Docks: 23
Last Communication: 2018-12-04 16:38:12
Station ID: 3617
Station Name: 28 Ave & 35 St
Number of Docks: 27
Available Docks: 26
Last Communication: 2018-12-04 16:36:52
Station ID: 3618
Station Name: 27 St & Hunter St
Number of Docks: 25
Available Docks: 20
Last Communication: 2018-12-04 16:37:59
Station ID: 3619
Station Name: Newtown Ave & 23 St
...

We can, of course, transform the results back into a DataFrame (see below), or we can use the data directly from the rows object (which is a tuple containing one dictionary for each row of the results).
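
As a small illustration of the second option, the sketch below works directly on a tuple of dictionaries shaped like `rows`. The three records are copied from the output above; `sample_rows` is a stand-in name so that the snippet runs without a database connection.

```python
# Stand-in for the result of cur.fetchall() with a DictCursor:
# a tuple containing one dictionary per row
sample_rows = (
    {"station_id": 72, "station_name": "W 52 St & 11 Ave",
     "number_of_docks": 39, "available_docks": 17},
    {"station_id": 79, "station_name": "Franklin St & W Broadway",
     "number_of_docks": 33, "available_docks": 3},
    {"station_id": 82, "station_name": "St James Pl & Pearl St",
     "number_of_docks": 27, "available_docks": 1},
)

# Aggregate without pandas
total_docks = sum(row["number_of_docks"] for row in sample_rows)

# Stations with at most 3 free docks (nearly full)
nearly_full = [row["station_name"] for row in sample_rows
               if row["available_docks"] <= 3]

print("Total docks:", total_docks)
print("Nearly full:", nearly_full)
```

For quick aggregates like these, plain Python is enough; a DataFrame becomes more convenient once you need derived columns or grouping.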

In [11]:
import pandas as pd
cur = con.cursor(mdb.cursors.DictCursor)
cur.execute("SELECT * FROM {db}.{table}".format(db=db_name, table=table_name))
rows = cur.fetchall()
cur.close()

In [12]:
df_from_sql = pd.DataFrame(list(rows))
df_from_sql

|  | available_docks | date | number_of_docks | station_id | station_name |
|---|---|---|---|---|---|
| 0 | 17 | 2018-12-04 16:37:50 | 39 | 72 | W 52 St & 11 Ave |
| 1 | 3 | 2018-12-04 16:40:41 | 33 | 79 | Franklin St & W Broadway |
| 2 | 1 | 2018-12-04 16:40:38 | 27 | 82 | St James Pl & Pearl St |
| 3 | 0 | 2018-10-09 09:05:09 | 0 | 83 | Atlantic Ave & Fort Greene Pl |
| 4 | 7 | 2018-12-04 16:38:17 | 19 | 119 | Park Ave & St Edwards St |
| 5 | 12 | 2018-12-04 16:37:12 | 19 | 120 | Lexington Ave & Classon Ave |
| 6 | 1 | 2018-12-04 16:38:52 | 31 | 127 | Barrow St & Hudson St |
| 7 | 1 | 2018-12-04 16:39:36 | 30 | 128 | MacDougal St & Prince St |
| 8 | 1 | 2018-12-04 16:38:23 | 24 | 143 | Clinton St & Joralemon St |
| 9 | 2 | 2018-12-04 16:39:20 | 19 | 144 | Nassau St & Navy St |


In [13]:
# We can then compute functions directly on the dataframe
sum(df_from_sql["number_of_docks"])

24531

In [14]:
# The same works for any other numeric column
sum(df_from_sql["available_docks"])

13551

In [15]:
# And we can also create new columns derived from the existing ones
df_from_sql["bikes_docked"] = df_from_sql["number_of_docks"] - df_from_sql["available_docks"]

In [16]:
sum(df_from_sql['bikes_docked'])

10980

Finally, let's clean up: we drop the test database and close our database connection.

In [17]:
drop_db_query = "DROP DATABASE IF EXISTS {db}".format(db=db_name)

# Drop the database
cursor = con.cursor()
cursor.execute(drop_db_query)
cursor.close()

In [18]:
con.close()
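
If any of the earlier cells raises an exception, the `cursor.close()` and `con.close()` calls may never run. One common way to guard against that is `contextlib.closing`, which calls `close()` on an object when the `with` block exits, even on error. The sketch below demonstrates the pattern with the built-in `sqlite3` module as a stand-in; the same wrapper should work with any object that has a `close()` method, which includes MySQLdb connections and cursors.

```python
import sqlite3
from contextlib import closing

# closing() guarantees that close() is called when the with block exits
with closing(sqlite3.connect(":memory:")) as con_demo:
    with closing(con_demo.cursor()) as cursor:
        cursor.execute("CREATE TABLE Docks (station_id int)")
        cursor.execute("INSERT INTO Docks VALUES (?)", (72,))
        cursor.execute("SELECT COUNT(*) FROM Docks")
        n_rows = cursor.fetchone()[0]
    con_demo.commit()

# Both the cursor and the connection are closed at this point,
# so any further use of the connection fails
try:
    con_demo.execute("SELECT 1")
    connection_closed = False
except sqlite3.ProgrammingError:
    connection_closed = True

print("Rows:", n_rows, "| connection closed:", connection_closed)
```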