# Introduction

In this notebook, we will show you how to work with a relational database using Python and a SQL connector. Connectors are typically provided by the database developer to help programmers write their own programs that use that database. They may also be created by third-party developers to support specific uses of the database.

First, let's open our CSV (`Calgary_Public_Library_Locations_and_Hours.csv`) in pandas. 

This CSV is provided courtesy of the City of Calgary, under the Open Government License, and can be found here: https://data.calgary.ca/Recreation-and-Culture/Calgary-Public-Library-Locations-and-Hours/m9y7-ui7j

In [3]:
import pandas as pd

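# read the library CSV described above (it must be in the notebook's working directory)
cpl_locations = pd.read_csv("Calgary_Public_Library_Locations_and_Hours.csv")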
cpl_locations.head(5)

Unnamed: 0,COUNTRY,COUNTRY_GRP,SEX,YEAR,VALUE
0,ALB,,ALL,2001.0,7.43
1,ALB,,ALL,2008.0,9.82
2,ALB,,ALL,2011.0,12.0
3,AND,,ALL,2003.0,27.81
4,AND,,ALL,2004.0,30.51


# Using Python to create a table

First, we're going to create your table using the [MySQL Connector](https://dev.mysql.com/doc/connector-python/en/).

Always have the documentation for your connector available, as different implementations of database connectors may vary in terms of what methods are available and how they are used.

Although we are running MariaDB on the server, we are using the connector from MySQL on the _client side_ as it is compatible with MariaDB.

In [4]:
import mysql.connector
from mysql.connector import errorcode

# Read the password from a local file so that it is never stored in the notebook.
# Change the user, host, or database below if you set up different details.
with open("password.txt") as f:
    passw = f.read().strip()

# attempt a connection
myconnection = mysql.connector.connect(user='sean_anselmo',
                                       password=passw,
                                       host='datasciencedb.ucalgary.ca',
                                       database='sean_anselmo',
                                       allow_local_infile=True)
myconnection

<mysql.connector.connection_cext.CMySQLConnection at 0x7f4c4cf6cad0>

In [4]:
# CREATE TABLE STATEMENT
create_statement = '''create table sean_anselmo.library_locations (
    Library varchar(40) NOT NULL,
    Postal_Code varchar(7),
    Square_Feet int, 
    Phone_Number varchar(12),
    Monday_Open time,
    Monday_Close time,
    Tuesday_Open time,
    Tuesday_Close time,
    Wednesday_Open time,
    Wednesday_Close time,
    Thursday_Open time,
    Thursday_Close time,
    Friday_Open time,
    Friday_Close time,
    Saturday_Open time,
    Saturday_Close time,
    Sunday_Open time,
    Sunday_Close time,
    Address  varchar(100)
    );'''

# now we'll create a cursor and run our create statement
create_cursor = myconnection.cursor()
try:
    create_cursor.execute(create_statement)
except mysql.connector.Error as err:
    if err.errno == errorcode.ER_TABLE_EXISTS_ERROR:
        print("Ooops! We already have that table")
    else:
        print(err.msg)
else:
    print("Table created successfully!")

create_cursor.close()

Table created successfully!


True

# Using Python to insert data

Because we have already built a DataFrame with the data from our CSV, we are going to use this DataFrame by reading each row and writing each row into its own INSERT command.

How would you re-write the block of code below to make it more efficient?

In [8]:
insertCursor = myconnection.cursor()

columnString = "`,`".join([str(currentColumn) for currentColumn in cpl_locations.columns.tolist()])
print (columnString)

# inserting rows one by one from the DataFrame is sufficient for now
for i, currentRow in cpl_locations.iterrows():
    print (tuple(currentRow))
    insertCommand = "INSERT INTO `library_locations` (`" + columnString + "`) VALUES (" + "%s,"*(len(currentRow)-1) + "%s)"
    print (insertCommand)
    print(tuple(currentRow))
    insertCursor.execute(insertCommand, tuple(currentRow))
    
myconnection.commit()

insertCursor.close()

Library`,`Postal_Code`,`Square_Feet`,`Phone_Number`,`Monday_Open`,`Monday_Close`,`Tuesday_Open`,`Tuesday_Close`,`Wednesday_Open`,`Wednesday_Close`,`Thursday_Open`,`Thursday_Close`,`Friday_Open`,`Friday_Close`,`Saturday_Open`,`Saturday_Close`,`Sunday_Open`,`Sunday_Close`,`Address
('W.R. Castell Central Library', 'T2G 2M2', 177532, '403-260-2600', '9:00', '20:00', '9:00', '20:00', '9:00', '20:00', '9:00', '20:00', '9:00', '17:00', '10:00', '17:00', '12:00', '17:00', '616 Macleod Tr SE\n(51.0470276, -114.0578995)')
INSERT INTO `library_locations` (`Library`,`Postal_Code`,`Square_Feet`,`Phone_Number`,`Monday_Open`,`Monday_Close`,`Tuesday_Open`,`Tuesday_Close`,`Wednesday_Open`,`Wednesday_Close`,`Thursday_Open`,`Thursday_Close`,`Friday_Open`,`Friday_Close`,`Saturday_Open`,`Saturday_Close`,`Sunday_Open`,`Sunday_Close`,`Address`) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)
('W.R. Castell Central Library', 'T2G 2M2', 177532, '403-260-2600', '9:00', '20:00', '9:00', '20:00'

True

In [9]:
# what if we don't want to loop through every single line and insert?
loadCursor = myconnection.cursor()
loadCursor.execute(
    "LOAD DATA LOCAL INFILE '/Users/sean_anselmo/Calgary_Public_Library_Locations_and_Hours.csv' "
    "INTO TABLE library_locations "
    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES"
)
myconnection.commit()
loadCursor.close()


DatabaseError: 2 (HY000): File '/Users/sean_anselmo/Calgary_Public_Library_Locations_and_Hours.csv' not found (OS errno 2 - No such file or directory)
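With `LOCAL`, the file is read on the client, that is, on the machine where this Python code runs, so the path has to exist there. A minimal sketch of checking the path before building the statement is shown here. The helper name `load_statement` is hypothetical, and the temporary file only stands in for the real CSV.

```python
import tempfile
from pathlib import Path


def load_statement(csv_path, table):
    """Build a LOAD DATA LOCAL INFILE statement after checking the file exists.

    Hypothetical helper for illustration; it is not part of the connector.
    """
    path = Path(csv_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"CSV not found on the client machine: {path}")
    return (
        f"LOAD DATA LOCAL INFILE '{path.as_posix()}' "
        f"INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\r\\n' IGNORE 1 LINES"
    )


# A temporary file stands in for the real CSV so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    csv_file = Path(folder) / "example.csv"
    csv_file.write_text("Library,Postal_Code\r\n")
    statement = load_statement(csv_file, "library_locations")

    # A path that does not exist is reported before the database is contacted.
    try:
        load_statement(Path(folder) / "missing.csv", "library_locations")
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True

print(statement)
print(missing_detected)
```

The returned string could then be passed to `loadCursor.execute(...)`; whether the load succeeds still depends on the server and connection allowing local infile.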

In [10]:
# build one tuple per row, then send them all with a single executemany() call
tpls = [tuple(x) for i, x in cpl_locations.iterrows()]
# one %s placeholder per column, so the statement stays in step with the DataFrame
placeholders = ",".join(["%s"] * len(cpl_locations.columns))
insertCommand = "INSERT INTO library_locations VALUES (" + placeholders + ")"

cursor = myconnection.cursor()
try:
    cursor.executemany(insertCommand, tpls)
    myconnection.commit()
    print("Data inserted using execute_many() successfully...")
except mysql.connector.Error as err:
    print("Error while inserting to MySQL", err)
finally:
    # close the cursor whether or not the insert succeeded
    cursor.close()

Data inserted using execute_many() successfully...
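The INSERT statement itself can be generated from a list of column names, which avoids counting placeholders by hand. The sketch below is one possible way to do that; `build_insert` is a hypothetical helper, and the three column names are just a subset of the table's columns chosen for brevity.

```python
def build_insert(table, columns):
    """Return a parameterized INSERT with one %s placeholder per column.

    Hypothetical helper for illustration; table and column names are
    assumed to be trusted identifiers, not user input.
    """
    column_list = ", ".join(f"`{name}`" for name in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders})"


example_columns = ["Library", "Postal_Code", "Square_Feet"]
statement = build_insert("library_locations", example_columns)
print(statement)
```

With the notebook's DataFrame, the column list would be `cpl_locations.columns.tolist()`, and the resulting string could be passed to `executemany()` together with the list of row tuples.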


## Deleting tables

In [65]:
# Use the DELETE statement without a WHERE clause to remove every row.
# After it runs, the table still exists (you can still insert rows into it), but it is empty.
# All existing views and authorizations on the table remain intact when using DELETE.

deletecursor = myconnection.cursor()
# Warning - if you DROP the table it will be removed completely and will need to be created from scratch.
# Uncomment the DROP line (and comment out the DELETE line) if you would like to remove the table itself.
#sql = "DROP TABLE IF EXISTS library_locations;"
sql = "DELETE FROM library_locations;"
deletecursor.execute(sql)
# commit so that the deletion is saved; the connector does not autocommit by default
myconnection.commit()
deletecursor.close()

True

# Using Python to retrieve data

You can also use cursors to read data from a database table. It is helpful to specify what kind of result set you would like the cursor to return. Try each of the following for the second argument:
* `raw=True`
* `dictionary=True`
* `named_tuple=True`
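To get a feel for how the row shapes differ before touching the database, here is a plain-Python imitation of them. It does not call the connector at all: the column names and the single row are taken from the library table, and the three variables only mimic what a default, a `dictionary=True`, and a `named_tuple=True` cursor would hand back. (`raw=True` is not imitated here; as far as the connector documentation describes it, that option skips the conversion to Python types.)

```python
from collections import namedtuple

# Stand-in for one row of a "SELECT library, square_feet ..." result.
columns = ("library", "square_feet")
row = ("Alexander Calhoun Library", 9256)

# Default cursor: each row is a plain tuple, accessed by position.
as_tuple = row

# dictionary=True: each row is a dict keyed by column name.
as_dict = dict(zip(columns, row))

# named_tuple=True: each row is a named tuple, accessed by attribute.
Row = namedtuple("Row", columns)
as_named = Row(*row)

print(as_tuple[0])
print(as_dict["library"])
print(as_named.library)
```

All three lines print the same library name; only the way the value is looked up changes.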

In [78]:
# try changing the second argument in this method call.
read_cursor = myconnection.cursor(buffered=True, dictionary=True)

query_string = ("SELECT library FROM library_locations;")

read_cursor.execute(query_string)

for (library_value) in read_cursor:
    print(library_value)
    
read_cursor.close()

{'library': 'W.R. Castell Central Library'}
{'library': 'Alexander Calhoun Library'}
{'library': 'Bowness Library'}
{'library': 'Fish Creek Library'}
{'library': 'Forest Lawn Library'}
{'library': 'Glenmore Square Library'}
{'library': 'Louise Riley Library'}
{'library': 'Memorial Park Library'}
{'library': 'Nose Hill Library'}
{'library': 'Shawnessy Library'}
{'library': 'Signal Hill Library'}
{'library': 'Southwood Library'}
{'library': 'Judith Umbach Library'}
{'library': 'Village Square Library'}
{'library': 'Crowfoot Library'}
{'library': 'Country Hills Library'}
{'library': 'Saddletowne Library'}
{'library': 'Westbrook Library'}


True

# Parameters in SQL

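_Parameterization_ is an important feature of cursors that lets you create (or *prepare*) a general statement which you can "fill in" with specific values later. This is safer than building query strings by hand, because the connector escapes the values for you, and it can be more efficient when the same statement is run many times with different values.

For the MySQL Connector, the `%s` symbol is used to mark where a parameter should be substituted. The values are passed separately, as a tuple, when the statement is executed.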

_Can you see where in this notebook we have already used this feature in a statement?_
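The idea can be tried without a database server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, which is an assumption made only so the example is self-contained. Note that `sqlite3` marks parameters with `?` rather than `%s`. The two rows are taken from the library data.

```python
import sqlite3

# In-memory SQLite database standing in for the MariaDB server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE library_locations (library TEXT, square_feet INTEGER)")
conn.executemany(
    "INSERT INTO library_locations VALUES (?, ?)",
    [
        ("W.R. Castell Central Library", 177532),
        ("Alexander Calhoun Library", 9256),
    ],
)

# One general statement, filled in with a specific value at execution time.
query = "SELECT library FROM library_locations WHERE square_feet < ?"
small = [name for (name,) in conn.execute(query, (10000,))]
print(small)

# A hostile value is treated as data, not as SQL, so it matches nothing.
hostile = "x' OR '1'='1"
matches = conn.execute(
    "SELECT library FROM library_locations WHERE library = ?", (hostile,)
).fetchall()
print(matches)

conn.close()
```

Had the hostile string been pasted directly into the query text, the `OR '1'='1'` part would have become part of the SQL and matched every row.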

In [11]:
size_cursor = myconnection.cursor(buffered=True, dictionary=True)
library_sizes = [0, 10000, 25000, 100000]

query_string = ("SELECT library FROM library_locations WHERE square_feet < %s ;")
for current_size in library_sizes:
    print ("Number of libraries with square feet less than", current_size)
    size_cursor.execute(query_string, (current_size,))
    print (size_cursor.rowcount)

size_cursor.close()

Number of libraries with square feet less than 0
0
Number of libraries with square feet less than 10000
18
Number of libraries with square feet less than 25000
45
Number of libraries with square feet less than 100000
51


True

To do: Retrieve any library name, postal code, square feet, and address where the postal code is equal to T2T 3V8

In [12]:
postal_cursor = myconnection.cursor()

select_query = """
SELECT Library, Postal_Code, Square_Feet, Address
FROM library_locations
WHERE Postal_Code = %s
"""

postal_code = 'T2T 3V8'
postal_cursor.execute(select_query, (postal_code,))

for (Library, Postal_Code, Square_Feet, Address) in postal_cursor:
    print("Library: {}, Postal Code: {}, Square Feet: {}, Address: {}".format(Library, Postal_Code, Square_Feet, Address))

postal_cursor.close()


Library: Alexander Calhoun Library, Postal Code: T2T 3V8, Square Feet: 9256, Address: 3223 14 St SW
(51.0255318, -114.0947876)
Library: Alexander Calhoun Library, Postal Code: T2T 3V8, Square Feet: 9256, Address: 3223 14 St SW
(51.0255318, -114.0947876)
Library: Alexander Calhoun Library, Postal Code: T2T 3V8, Square Feet: 9256, Address: 3223 14 St SW
(51.0255318, -114.0947876)


True

To do: Retrieve the names of libraries that are open at 9:00 AM on Fridays

In [13]:
open_cursor = myconnection.cursor()

select_query = ("SELECT Library FROM library_locations WHERE Friday_Open <= '09:00:00' AND Friday_Close >= '09:00:00'")

open_cursor.execute(select_query)

for (library,) in open_cursor:
    print("The library {} is open at 9:00 AM on Fridays.".format(library))

open_cursor.close()

The library W.R. Castell Central Library is open at 9:00 AM on Fridays.
The library Fish Creek Library is open at 9:00 AM on Fridays.
The library Shawnessy Library is open at 9:00 AM on Fridays.
The library Village Square Library is open at 9:00 AM on Fridays.
The library Crowfoot Library is open at 9:00 AM on Fridays.
The library Westbrook Library is open at 9:00 AM on Fridays.
The library W.R. Castell Central Library is open at 9:00 AM on Fridays.
The library Fish Creek Library is open at 9:00 AM on Fridays.
The library Shawnessy Library is open at 9:00 AM on Fridays.
The library Village Square Library is open at 9:00 AM on Fridays.
The library Crowfoot Library is open at 9:00 AM on Fridays.
The library Westbrook Library is open at 9:00 AM on Fridays.
The library W.R. Castell Central Library is open at 9:00 AM on Fridays.
The library Fish Creek Library is open at 9:00 AM on Fridays.
The library Shawnessy Library is open at 9:00 AM on Fridays.
The library Village Square Library is ope

True
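The output above lists each library several times, which suggests the table holds duplicate rows (perhaps because the insert cells were run more than once). One way to report each library once is `SELECT DISTINCT`, and the time can be passed as a parameter instead of being written into the query. The sketch uses `sqlite3` as a stand-in (so the placeholder is `?`, and times are compared as zero-padded text, both of which are assumptions of this example rather than of the MySQL Connector).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE library_locations "
    "(Library TEXT, Friday_Open TEXT, Friday_Close TEXT)"
)

# The same row inserted three times to imitate the duplicated table.
row = ("W.R. Castell Central Library", "09:00:00", "17:00:00")
conn.executemany("INSERT INTO library_locations VALUES (?, ?, ?)", [row] * 3)

query = """
SELECT DISTINCT Library
FROM library_locations
WHERE Friday_Open <= ? AND Friday_Close >= ?
"""
check_time = "09:00:00"
open_libraries = [name for (name,) in conn.execute(query, (check_time, check_time))]
print(open_libraries)

conn.close()
```

With the MySQL Connector the same query would use `%s` in place of `?`. `DISTINCT` only hides the duplicates in the result; the repeated rows remain in the table.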

The address column isn't very clean: it holds both a street address and a latitude and longitude pair. Can we extract this information?

In [82]:
address_cursor = myconnection.cursor()

select_query = "SELECT Address FROM library_locations"
address_cursor.execute(select_query)

for (address,) in address_cursor:
    if address and '(' in address and ')' in address:
        address_parts = address.split('(')
        address = address_parts[0].strip()
        lat_lon = address_parts[1].replace(')', '').strip()
        latitude, longitude = map(float, lat_lon.split(','))
        print("Address: {}, Latitude: {}, Longitude: {}".format(address, latitude, longitude))

address_cursor.close()

Address: 616 Macleod Tr SE, Latitude: 51.0470276, Longitude: -114.0578995
Address: 3223 14 St SW, Latitude: 51.0255318, Longitude: -114.0947876
Address: 7930 Bowness Rd NW, Latitude: 51.0872841, Longitude: -114.1830978
Address: 11161 Bonaventure Dr SE, Latitude: 50.9516296, Longitude: -114.0603409
Address: 4807 8 Av SE, Latitude: 51.045105, Longitude: -113.9652023
Address: 7740 18 St SE, Latitude: 50.9835968, Longitude: -114.0141449
Address: 1904 14 Av NW, Latitude: 51.0652428, Longitude: -114.1038132
Address: 1221 2 St SW, Latitude: 51.0412674, Longitude: -114.0683823
Address: 1530 Northmount Dr NW, Latitude: 51.0960922, Longitude: -114.1391296
Address: 333 Shawville Bv SE, Latitude: 50.8986015, Longitude: -114.062851
Address: 5994 Signal Hill Ce SW, Latitude: 51.0181122, Longitude: -114.1756439
Address: 924 Southland Dr SW, Latitude: 50.963623, Longitude: -114.0860367
Address: 6617 Centre St N, Latitude: 51.1121979, Longitude: -114.0633087
Address: 2623 56 St NE, Latitude: 51.0753708

True
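The same split can also be written as a small function built on a regular expression, which makes it easy to test on a single string before running it over the whole table. This is only a sketch: `split_address` and `ADDRESS_PATTERN` are hypothetical names, and the pattern assumes the `street (latitude, longitude)` layout seen in this CSV.

```python
import re

# Street text, then "(latitude, longitude)" at the end of the string.
ADDRESS_PATTERN = re.compile(
    r"^(?P<street>.*?)\s*"
    r"\(\s*(?P<lat>-?\d+(?:\.\d+)?)\s*,\s*(?P<lon>-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.DOTALL,
)


def split_address(raw):
    """Return (street, latitude, longitude), or None if the layout differs."""
    if not raw:
        return None
    match = ADDRESS_PATTERN.match(raw)
    if match is None:
        return None
    return match["street"].strip(), float(match["lat"]), float(match["lon"])


example = "616 Macleod Tr SE\n(51.0470276, -114.0578995)"
print(split_address(example))
print(split_address("no coordinates here"))
```

Returning `None` for rows that do not fit the pattern lets the caller decide whether to skip them or report them.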

In [14]:
# CLEANUP: always remember to release the resources you have used on the server. Always run this cell last!
myconnection.close()
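If you would rather not depend on remembering to run a final cell, `contextlib.closing` from the standard library calls `close()` for you when a `with` block ends, even if an error occurs inside it. The sketch below demonstrates the pattern with `sqlite3` as a stand-in connection; applying it to `mysql.connector` objects is an assumption based on their having a `close()` method.

```python
import sqlite3
from contextlib import closing

# Both the connection and the cursor are closed when their blocks end.
with closing(sqlite3.connect(":memory:")) as conn:
    with closing(conn.cursor()) as cursor:
        cursor.execute("SELECT 1")
        value = cursor.fetchone()[0]

# Using the connection after the block fails, which shows it was closed.
try:
    conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(value, still_open)
```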