## MySQL
In this notebook, we connected to a local MySQL server, created our database (ZomatoDB), and created and populated its tables.
The database stores information about restaurants: location, cuisine types, ratings, and service attributes (the "evaluation" table).
First, we created "main_table" to hold all the data that we cleaned in the Exploratory_Data_Analysis notebook.
Second, we created the "restaurants", "evaluation", and "ratings" tables; the "cuisines" and "location" tables are still placeholders below.
Third, we migrated the cleaned dataset "cleaned_zomato.csv" into "main_table".
Finally, we populated the other tables from the data in "main_table" and added primary-key and foreign-key constraints.

In [20]:
import pymysql

In [2]:
import pymysql.cursors
# Connect to the MySQL server running on localhost (no database selected yet)
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='34006503'
                            )

# From our connection we need a cursor, which acts as our interface to the database
cur = connection.cursor()
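
The connection cell above embeds the root password directly in the notebook. As a minimal sketch, the same keyword arguments could instead be read from environment variables. The variable names `ZOMATO_DB_HOST`, `ZOMATO_DB_USER`, and `ZOMATO_DB_PASSWORD` and the helper `load_db_config` are hypothetical, chosen only for this example.

```python
import os


def load_db_config(env=os.environ):
    """Build keyword arguments for pymysql.connect() from environment variables.

    The ZOMATO_DB_* names are assumptions of this sketch, not part of pymysql.
    """
    return {
        "host": env.get("ZOMATO_DB_HOST", "localhost"),
        "user": env.get("ZOMATO_DB_USER", "root"),
        "password": env["ZOMATO_DB_PASSWORD"],  # raises KeyError if unset
    }


# Usage (needs a running MySQL server, so it is left commented out):
# import pymysql
# connection = pymysql.connect(**load_db_config())
```

Failing loudly when the password variable is missing is a deliberate choice here: it avoids silently connecting with an empty password.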

In [3]:
# We can verify we are connected:
print(connection)

<pymysql.connections.Connection object at 0x7fad6553bb20>


In [29]:
# Create our database, called ZomatoDB:

cur.execute("CREATE DATABASE ZomatoDB")

1
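
`CREATE DATABASE` fails if the database already exists, so re-running the cell above raises an error. One possible variant uses MySQL's `IF NOT EXISTS` clause to make the cell safe to re-run. The helper name `create_database_sql` is an assumption of this sketch.

```python
import re


def create_database_sql(name):
    """Return a CREATE DATABASE statement that can be executed repeatedly."""
    # Identifiers cannot be sent as query parameters, so validate the name
    # before interpolating it into the statement.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError(f"unexpected database name: {name!r}")
    return f"CREATE DATABASE IF NOT EXISTS `{name}`"


statement = create_database_sql("ZomatoDB")
print(statement)

# Usage with the cursor from the cells above (needs a running server):
# cur.execute(statement)
```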

In [4]:
# Verify that the database was created
cur.execute('SHOW DATABASES')
for db in cur:
    print(db)

('information_schema',)
('mysql',)
('performance_schema',)
('sys',)
('testdb',)
('ZomatoDB',)


In [22]:
# Then we open a new connection that selects the ZomatoDB database
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='34006503',
                             database='ZomatoDB')

# From our connection we need a cursor, which acts as our interface to the database
cur = connection.cursor()

Table creation section:

In [23]:
# Create main table that contains all dataset
# Coordinates will be added separately for the moment
make_main_table = """CREATE TABLE main_table(
                        Restaurant_ID INT NOT NULL,
                        Restaurant_Name VARCHAR(255),
                        Country_Code INT,
                        City VARCHAR(255),
                        Address VARCHAR(255),
                        Locality VARCHAR(255),
                        Cuisine VARCHAR(255),
                        Average_Cost_for_two INT,
                        Currency VARCHAR(255),
                        Table_booking BOOL,
                        Online_delivery BOOL,
                        Now_delivering BOOL,
                        Switch_menu BOOL,
                        Price_range INT,
                        Rating FLOAT,
                        Rating_color VARCHAR(255),
                        Rating_text VARCHAR(255),
                        Votes VARCHAR(255),
                        Average_cost_USD INT);"""
                            
cur.execute(make_main_table)
connection.commit()

In [24]:
make_restaurant_table = """CREATE TABLE restaurants(
                            Restaurant_ID INT NOT NULL,
                            Restaurant_Name VARCHAR(255),
                            Country_Code INT,
                            City VARCHAR(255),
                            Address VARCHAR(255));"""
cur.execute(make_restaurant_table)
connection.commit()

In [35]:
## Space for cuisine table

In [None]:
## Space for Location table

In [46]:
make_evaluation_table = """CREATE TABLE evaluation(
                                Restaurant_ID INT,
                                Table_booking BOOL,
                                Online_delivery BOOL,
                                Now_delivering BOOL,
                                Switch_menu BOOL,
                                Price_range INT
                                );"""
cur.execute(make_evaluation_table)
connection.commit()

In [26]:
make_rating_table = """CREATE TABLE ratings(
                            Restaurant_ID INT,
                            Rating FLOAT, 
                            Rating_color VARCHAR(255),
                            Rating_text VARCHAR(255),
                            Votes VARCHAR(255)
                            );"""

cur.execute(make_rating_table)
connection.commit()

In [27]:
# Verify that tables were created:
cur.execute('SHOW TABLES')
for tb in cur:
    print(tb)

('evaluation',)
('main_table',)
('ratings',)
('restaurants',)


Data migration section:

In [29]:
import pandas as pd
import numpy as np

zomato_dataframe = pd.read_csv('cleaned_zomato.csv')

In [30]:
zomato_dataframe.drop('Unnamed: 0', axis=1, inplace=True)
zomato_dataframe.head(3)


,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Cuisines,Average Cost for two,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes,Average cost USD,geometry
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City",French,1100,Botswana Pula(P),Yes,No,No,No,3.0,4.8,Dark Green,Excellent,314.0,100.1,POINT (121.027535 14.565443)
1,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City",Japanese,1100,Botswana Pula(P),Yes,No,No,No,3.0,4.8,Dark Green,Excellent,314.0,100.1,POINT (121.027535 14.565443)
2,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City",Desserts,1100,Botswana Pula(P),Yes,No,No,No,3.0,4.8,Dark Green,Excellent,314.0,100.1,POINT (121.027535 14.565443)


In [31]:
# The four Yes/No columns (table booking, online delivery, delivering now,
# switch to order menu) still need to be converted to booleans:
d = {'Yes': True, 'No': False}
zomato_dataframe['Has Table booking'] = zomato_dataframe['Has Table booking'].map(d)
zomato_dataframe['Has Online delivery'] = zomato_dataframe['Has Online delivery'].map(d)
zomato_dataframe['Is delivering now'] = zomato_dataframe['Is delivering now'].map(d)
zomato_dataframe['Switch to order menu'] = zomato_dataframe['Switch to order menu'].map(d)

In [32]:
# We also apply dropna to make sure that no null values remain:
print('Dimensions of our dataframe before drop:', zomato_dataframe.shape)
zomato_dataframe.dropna(inplace=True)
print('Dimensions of our dataframe after drop:', zomato_dataframe.shape)

Dimensions of our dataframe before drop: (19712, 20)
Dimensions of our dataframe after drop: (19708, 20)


In [33]:
# We also need to transform the column Restaurant ID to integer type:
zomato_dataframe['Restaurant ID'] = zomato_dataframe['Restaurant ID'].astype(int)

In [34]:
# Quick check that the dataframe dtypes match the column types in our DB:
zomato_dataframe.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 19708 entries, 0 to 19711
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Restaurant ID         19708 non-null  int64  
 1   Restaurant Name       19708 non-null  object 
 2   Country Code          19708 non-null  int64  
 3   City                  19708 non-null  object 
 4   Address               19708 non-null  object 
 5   Locality              19708 non-null  object 
 6   Cuisines              19708 non-null  object 
 7   Average Cost for two  19708 non-null  object 
 8   Currency              19708 non-null  object 
 9   Has Table booking     19708 non-null  object 
 10  Has Online delivery   19708 non-null  object 
 11  Is delivering now     19708 non-null  object 
 12  Switch to order menu  19708 non-null  object 
 13  Price range           19708 non-null  float64
 14  Aggregate rating      19708 non-null  object 
 15  Rating color       

In [36]:
# To populate "main_table", we first convert each row (all columns except geometry) to a tuple:
subset = zomato_dataframe.drop('geometry', axis=1)
tuples = [tuple(x) for x in subset.to_numpy()]

# Then we wrote this parameterized INSERT statement:
sqlFormula = """INSERT INTO main_table 
                    (
                    Restaurant_ID,
                    Restaurant_Name,
                    Country_Code,
                    City,
                    Address,
                    Locality,
                    Cuisine,
                    Average_Cost_for_two,
                    Currency,
                    Table_booking,
                    Online_delivery,
                    Now_delivering,
                    Switch_menu,
                    Price_range,
                    Rating,
                    Rating_color,
                    Rating_text,
                    Votes,
                    Average_cost_USD
                    ) 
                    VALUES (
                    %s, %s, %s, %s, %s, %s, 
                    %s, %s, %s, %s, %s, %s, 
                    %s, %s, %s, %s, %s, %s, %s
                    );""" # each %s is a placeholder filled from one tuple element

# And executed:
cur.executemany(sqlFormula, tuples)
connection.commit()
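
The 19 `%s` placeholders above must match the column list exactly, which is easy to get wrong by hand. One way to keep the two in sync, sketched here with a hypothetical helper `build_insert`, is to generate both from a single list of column names. The column names are the ones used in `main_table`.

```python
MAIN_TABLE_COLUMNS = [
    "Restaurant_ID", "Restaurant_Name", "Country_Code", "City", "Address",
    "Locality", "Cuisine", "Average_Cost_for_two", "Currency",
    "Table_booking", "Online_delivery", "Now_delivering", "Switch_menu",
    "Price_range", "Rating", "Rating_color", "Rating_text", "Votes",
    "Average_cost_USD",
]


def build_insert(table, columns):
    """Return an INSERT statement with one %s placeholder per column."""
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders});"


sql = build_insert("main_table", MAIN_TABLE_COLUMNS)
print(sql)

# Usage with the objects from the cell above (needs a running server):
# cur.executemany(sql, tuples)
```

The table and column names are interpolated as plain text, so this helper should only be given trusted, hard-coded names; the row values still go through the driver's placeholders.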

In [37]:
# Migrate data to the restaurants table so we can create a primary key on the Restaurant_ID column
insert_into_restaurants = """INSERT INTO restaurants
                                SELECT DISTINCT 
                                    Restaurant_ID,
                                    Restaurant_Name,
                                    Country_Code,
                                    City,
                                    Address
                                FROM main_table;"""
cur.execute(insert_into_restaurants)
connection.commit()
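
`SELECT DISTINCT` removes only rows that are identical in every selected column, so a restaurant whose name or address was recorded in two slightly different ways would still appear twice, and adding the primary key in the next cell would then fail. The sketch below shows a duplicate-ID check that could be run first. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL so that it runs without a server, and the three rows are made up for illustration.

```python
import sqlite3

# Query listing every Restaurant_ID that occurs more than once
DUPLICATE_IDS_SQL = """
    SELECT Restaurant_ID, COUNT(*) AS n
    FROM restaurants
    GROUP BY Restaurant_ID
    HAVING COUNT(*) > 1;
"""

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
conn.execute(
    "CREATE TABLE restaurants (Restaurant_ID INT, Restaurant_Name TEXT, Address TEXT)"
)
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?)",
    [
        (1, "Cafe A", "1 Main St"),
        (2, "Cafe B", "2 Main St"),
        (2, "Cafe B", "2 Main Street"),  # same ID, different address text
    ],
)

duplicates = conn.execute(DUPLICATE_IDS_SQL).fetchall()
print(duplicates)  # an empty list means the primary key can be added safely
```

With pymysql, the same query text could be passed to `cur.execute` and the result read with `cur.fetchall()`.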

In [38]:
# Alter the restaurants table to make Restaurant_ID its primary key
cur.execute(""" ALTER TABLE restaurants
                    ADD CONSTRAINT restaurnt_pk PRIMARY KEY(Restaurant_ID);""")
connection.commit()

In [49]:
# Migrate data to evaluation table from main_table:
insert_into_evaluation = """INSERT INTO evaluation
                                SELECT DISTINCT
                                    Restaurant_ID,
                                    Table_booking,
                                    Online_delivery,
                                    Now_delivering,
                                    Switch_menu,
                                    Price_range
                                FROM main_table;"""
cur.execute(insert_into_evaluation)
connection.commit()



In [50]:
# Add a foreign key so that every evaluation row must reference an existing restaurant
cur.execute("""ALTER TABLE evaluation
                ADD CONSTRAINT restaurant_fk FOREIGN KEY(Restaurant_ID)
                    REFERENCES restaurants (Restaurant_ID);""")
connection.commit()
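
Once the foreign key is in place, the database rejects evaluation rows whose `Restaurant_ID` does not exist in `restaurants`. The following sketch illustrates that behavior with Python's built-in `sqlite3` as a stand-in for MySQL, so it runs without a server. SQLite needs `PRAGMA foreign_keys = ON` for this, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.execute("CREATE TABLE restaurants (Restaurant_ID INTEGER PRIMARY KEY)")
conn.execute(
    """CREATE TABLE evaluation (
           Restaurant_ID INTEGER,
           Price_range INTEGER,
           FOREIGN KEY (Restaurant_ID) REFERENCES restaurants (Restaurant_ID)
       )"""
)

conn.execute("INSERT INTO restaurants VALUES (1)")
conn.execute("INSERT INTO evaluation VALUES (1, 3)")  # parent row exists: accepted

try:
    conn.execute("INSERT INTO evaluation VALUES (99, 2)")  # no restaurant 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```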

In [43]:
# Migrate data to ratings table from main_table:
insert_into_ratings = """INSERT INTO ratings
                                SELECT DISTINCT 
                                    Restaurant_ID,
                                    Rating,
                                    Rating_color,
                                    Rating_text,
                                    Votes
                                FROM main_table;"""
cur.execute(insert_into_ratings)
connection.commit()

In [44]:
# Add a foreign key so that every ratings row must reference an existing restaurant
cur.execute("""ALTER TABLE ratings
                ADD CONSTRAINT rest_fk FOREIGN KEY(Restaurant_ID)
                    REFERENCES restaurants (Restaurant_ID);""")
connection.commit()