## Food Delivery App Data Analysis
Millions of food enthusiasts use the Zomato platform to order food. Our main goal is to clean and prepare the Zomato dataset so that we can draw insights from it. Through this analysis we aim to understand what makes a restaurant stand out, identify trends in dining preferences, and uncover what drives customer choices.

This work is intended to benefit Zomato users and restaurant owners alike: the insights can help restaurants refine their offerings and help diners discover places that suit their tastes.

## Module 1
### Task 1: Unlocking Zomato's Flavorful Universe
In this step, we import the libraries used to clean the data, read the CSV dataset, and drop the address and phone columns, which are irrelevant to the analysis.

In [1]:
#--- Import Pandas ---
import pandas as pd
# Suppress warnings (note: this silences all warnings, not only FutureWarning)
import warnings
warnings.filterwarnings("ignore")
#--- Read in dataset ----
df = pd.read_csv("zomato.csv")

#--- Drop columns that are not needed for the analysis ---
df.drop(["address","phone"],inplace=True,axis=1)
#--- Inspect data ---

df 

Unnamed: 0,name,online_order,book_table,rate,votes,location,rest_type,dish_liked,cuisines,approx_cost(for two people),listed_in(type)
0,Jalsa,Yes,Yes,4.1/5,775,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800,Buffet
1,Spice Elephant,Yes,No,4.1/5,787,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800,Buffet
2,San Churro Cafe,Yes,No,3.8/5,918,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800,Buffet
3,Addhuri Udupi Bhojana,No,No,3.7/5,88,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300,Buffet
4,Grand Village,No,No,3.8/5,166,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600,Buffet
...,...,...,...,...,...,...,...,...,...,...,...
56247,Best Brews - Four Points by Sheraton Bengaluru...,No,No,3.6 /5,27,Whitefield,Bar,,Continental,1500,Pubs and bars
56248,Vinod Bar And Restaurant,No,No,,0,Whitefield,Bar,,Finger Food,600,Pubs and bars
56249,Plunge - Sheraton Grand Bengaluru Whitefield H...,No,No,,0,Whitefield,Bar,,Finger Food,2000,Pubs and bars
56250,Chime - Sheraton Grand Bengaluru Whitefield Ho...,No,Yes,4.3 /5,236,"ITPL Main Road, Whitefield",Bar,"Cocktails, Pizza, Buttermilk",Finger Food,2500,Pubs and bars
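
The load-and-drop step can be tried without the real file. The following is a minimal sketch that assumes a tiny in-memory CSV in place of zomato.csv; the sample rows and the names `sample_csv` and `sample` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for zomato.csv
sample_csv = io.StringIO(
    "name,address,phone,rate,votes\n"
    "Jalsa,Some Street,000,4.1/5,775\n"
    "Spice Elephant,Other Street,111,4.1/5,787\n"
)
sample = pd.read_csv(sample_csv)

# drop() returns a new frame; errors="ignore" skips columns that are absent
sample = sample.drop(columns=["address", "phone"], errors="ignore")
print(sample.columns.tolist())
```

Assigning the result back instead of using inplace=True keeps the original frame available for comparison while experimenting.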


### Task 2: Renaming Columns in a Dataframe
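This step renames three columns to shorter names to make the analysis easier: rate becomes rating, approx_cost(for two people) becomes approx_cost, and listed_in(type) becomes type.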


In [2]:
rename_columns = {"rate":"rating","approx_cost(for two people)":"approx_cost","listed_in(type)":"type"}
df.rename(columns=rename_columns,inplace=True)

#--- Inspect data ---
df.head()

Unnamed: 0,name,online_order,book_table,rating,votes,location,rest_type,dish_liked,cuisines,approx_cost,type
0,Jalsa,Yes,Yes,4.1/5,775,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800,Buffet
1,Spice Elephant,Yes,No,4.1/5,787,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800,Buffet
2,San Churro Cafe,Yes,No,3.8/5,918,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800,Buffet
3,Addhuri Udupi Bhojana,No,No,3.7/5,88,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300,Buffet
4,Grand Village,No,No,3.8/5,166,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600,Buffet
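
One thing to watch for when renaming: by default, rename silently ignores keys that do not match any column, so a typo in the mapping goes unnoticed. A minimal sketch of a guard against that, using a hypothetical one-row frame named `demo`:

```python
import pandas as pd

# Hypothetical one-row frame with the original column names
demo = pd.DataFrame({
    "rate": ["4.1/5"],
    "approx_cost(for two people)": ["800"],
    "listed_in(type)": ["Buffet"],
})

mapping = {
    "rate": "rating",
    "approx_cost(for two people)": "approx_cost",
    "listed_in(type)": "type",
}

# Check that every key in the mapping is an existing column before renaming
missing = [col for col in mapping if col not in demo.columns]
if missing:
    raise KeyError(f"columns not found: {missing}")

demo = demo.rename(columns=mapping)
print(list(demo.columns))
```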


### Task 3: Data Crafting
This step removes rows with a null value in the name column and fills null values in the online_order, book_table, location, rest_type, dish_liked, cuisines and type columns with the placeholder string "NA". In addition, null values in rating, votes and approx_cost are converted to 0.

In [3]:
# drop rows with a null value in the name column
df.dropna(subset=["name"],inplace=True)

# fill null values in the text columns with the placeholder string "NA"
# (assigning the result back avoids chained in-place fills, which pandas
# warns may not update df under copy-on-write)
text_columns = ["online_order","book_table","location","rest_type","dish_liked","cuisines","type"]
for col in text_columns:
    df[col] = df[col].fillna("NA")

# fill null values in rating, votes and approx_cost with 0
for col in ["rating","votes","approx_cost"]:
    df[col] = df[col].fillna(0)

#--- Inspect data ---
df

Unnamed: 0,name,online_order,book_table,rating,votes,location,rest_type,dish_liked,cuisines,approx_cost,type
0,Jalsa,Yes,Yes,4.1/5,775,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800,Buffet
1,Spice Elephant,Yes,No,4.1/5,787,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800,Buffet
2,San Churro Cafe,Yes,No,3.8/5,918,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800,Buffet
3,Addhuri Udupi Bhojana,No,No,3.7/5,88,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300,Buffet
4,Grand Village,No,No,3.8/5,166,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600,Buffet
...,...,...,...,...,...,...,...,...,...,...,...
56247,Best Brews - Four Points by Sheraton Bengaluru...,No,No,3.6 /5,27,Whitefield,Bar,,Continental,1500,Pubs and bars
56248,Vinod Bar And Restaurant,No,No,0,0,Whitefield,Bar,,Finger Food,600,Pubs and bars
56249,Plunge - Sheraton Grand Bengaluru Whitefield H...,No,No,0,0,Whitefield,Bar,,Finger Food,2000,Pubs and bars
56250,Chime - Sheraton Grand Bengaluru Whitefield Ho...,No,Yes,4.3 /5,236,"ITPL Main Road, Whitefield",Bar,"Cocktails, Pizza, Buttermilk",Finger Food,2500,Pubs and bars
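
The same kind of null handling can also be written with a single fillna call that takes a column-to-value dictionary. This is a sketch on a hypothetical three-row frame, not the notebook's own code; the frame `demo` and the `fill_values` mapping are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with nulls in several columns
demo = pd.DataFrame({
    "name": ["Jalsa", None, "Grand Village"],
    "online_order": ["Yes", "No", None],
    "location": ["Banashankari", None, None],
    "votes": [775.0, 10.0, np.nan],
})

# Rows without a name are dropped entirely
demo = demo.dropna(subset=["name"])

# One fill value per column: "NA" for text columns, 0 for numeric ones
fill_values = {"online_order": "NA", "location": "NA", "votes": 0}
demo = demo.fillna(value=fill_values)

print(demo)
```

Keeping the fill values in one dictionary makes it easy to see at a glance which placeholder each column receives.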


### Task 4: Removing Duplicate Rows
Next, we remove duplicate rows from the Zomato dataset, keeping the first occurrence of each.

In [4]:
# drop duplicate rows, keeping the first occurrence
df.drop_duplicates(inplace=True,keep="first")
#--- Inspect data ---
df

Unnamed: 0,name,online_order,book_table,rating,votes,location,rest_type,dish_liked,cuisines,approx_cost,type
0,Jalsa,Yes,Yes,4.1/5,775,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800,Buffet
1,Spice Elephant,Yes,No,4.1/5,787,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800,Buffet
2,San Churro Cafe,Yes,No,3.8/5,918,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800,Buffet
3,Addhuri Udupi Bhojana,No,No,3.7/5,88,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300,Buffet
4,Grand Village,No,No,3.8/5,166,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600,Buffet
...,...,...,...,...,...,...,...,...,...,...,...
56247,Best Brews - Four Points by Sheraton Bengaluru...,No,No,3.6 /5,27,Whitefield,Bar,,Continental,1500,Pubs and bars
56248,Vinod Bar And Restaurant,No,No,0,0,Whitefield,Bar,,Finger Food,600,Pubs and bars
56249,Plunge - Sheraton Grand Bengaluru Whitefield H...,No,No,0,0,Whitefield,Bar,,Finger Food,2000,Pubs and bars
56250,Chime - Sheraton Grand Bengaluru Whitefield Ho...,No,Yes,4.3 /5,236,"ITPL Main Road, Whitefield",Bar,"Cocktails, Pizza, Buttermilk",Finger Food,2500,Pubs and bars
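
Before dropping duplicates it can be useful to count them. A minimal sketch on a hypothetical three-row frame, where a row counts as a duplicate only if every column matches an earlier row:

```python
import pandas as pd

# Hypothetical sample: rows 0 and 1 are identical, row 2 differs in location
demo = pd.DataFrame({
    "name": ["Jalsa", "Jalsa", "Jalsa"],
    "location": ["Banashankari", "Banashankari", "Basavanagudi"],
    "votes": [775, 775, 775],
})

# duplicated() marks every repeat after the first occurrence
n_dupes = int(demo.duplicated(keep="first").sum())

# drop_duplicates() keeps the original index labels of the surviving rows
deduped = demo.drop_duplicates(keep="first")

print(n_dupes, deduped.index.tolist())
```

Because the surviving rows keep their original index labels, gaps in the index after this step are expected.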


### Task 5: Refining Zomato's Culinary Palette
Next, we refine the data further by removing every row in which any of several columns contains the text 'RATED' or 'Rated', so that these columns hold only the values they are meant to hold.

In [5]:
columns_to_filter = ["name","type","approx_cost","cuisines","dish_liked","rest_type","location","votes","rating","book_table"]
# Remove rows in 'df' where any of the specific columns contain 'RATED' or 'Rated'
df = df[~df[columns_to_filter].apply(lambda x: x.str.contains('RATED', case=False)).any(axis=1)]


#--- Inspect data ---
df.shape

(34321, 11)
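
The .str accessor raises an AttributeError on a purely numeric column, so whether a filter of this kind runs depends on the column dtypes. The sketch below shows a variant that casts to string first; the frame `demo` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample: rows 1 and 2 carry stray review text
demo = pd.DataFrame({
    "name": ["Jalsa", "RATED\n  Great food", "Grand Village"],
    "votes": [775, 12, 166],  # numeric column
    "rating": ["4.1/5", "3.9/5", " Rated 4.0"],
})

cols = ["name", "votes", "rating"]

# Cast to string so that .str.contains also works on numeric columns;
# case=False matches 'RATED', 'Rated' and any other capitalisation
mask = (
    demo[cols]
    .astype(str)
    .apply(lambda s: s.str.contains("rated", case=False))
    .any(axis=1)
)

clean = demo[~mask]
print(clean)
```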

### Task 6: Clarifying Zomato's Culinary Data
This step further refines the Zomato data by keeping only the rows where the online_order column is either 'Yes' or 'No'. In addition, we replace several strings in the rating column ('NEW' and '-' become 0, and the '/5' suffix is removed) to make the analysis easier later on.

In [12]:
df["online_order"].unique()
# the online_order column should contain only Yes and No; remove rows with other values
df=df.query("online_order == 'Yes'|online_order == 'No'")
# in the rating column, replace NEW and - with 0 and remove the /5 suffix
replacements = {"NEW":0,"-":0,"/5":""}
df["rating"] = df["rating"].replace(replacements,regex=True)
#--- Inspect data ---
df[["rating","online_order"]]

Unnamed: 0,rating,online_order
0,4.1,Yes
1,4.1,Yes
2,3.8,Yes
3,3.7,No
4,3.8,No
...,...,...
56247,3.6,No
56248,0,No
56249,0,No
56250,4.3,No
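
If the rating column is needed as a number later on, one option is to finish the cleanup with pd.to_numeric. The following is a sketch of that idea on hypothetical values, not the notebook's own method; it uses exact string matches rather than regex replacement.

```python
import pandas as pd

# Hypothetical raw ratings, including a filled-in 0 from the null handling
demo = pd.DataFrame({"rating": ["4.1/5", "3.6 /5", "NEW", "-", 0]})

cleaned = (
    demo["rating"]
    .astype(str)                         # make every value a string
    .str.replace("/5", "", regex=False)  # drop the "/5" suffix
    .str.strip()                         # remove leftover spaces ("3.6 ")
    .replace({"NEW": "0", "-": "0"})     # exact-match placeholders become "0"
)

# Convert the cleaned strings to a float column
demo["rating"] = pd.to_numeric(cleaned)
print(demo["rating"].tolist())
```

Matching 'NEW' and '-' exactly, rather than as regular expressions, avoids accidentally replacing values that merely contain a hyphen.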


### Task 7: Data Cleaning and Exporting to Csv
Using a regular expression, we strip every character other than letters, digits and whitespace from the restaurant names. We also drop rows whose approx_cost is not numeric, remove the thousands separators and convert the column to integers. Finally, we save the result to 'zomatocleaned.csv', which completes the data cleaning and refinement process.

In [13]:
df["approx_cost"].count()

34272

In [14]:
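# remove every character that is not a letter, digit or whitespace from the names
df["name"] = df["name"].replace(r'[^A-Za-z0-9\s]',"",regex=True)

# treat approx_cost as text so that the thousands separators can be stripped
df["approx_cost"] = df["approx_cost"].astype(str)

# True where the value is purely numeric once the commas are removed
count_numeric = df["approx_cost"].str.replace(",","").str.isnumeric()

# drop the rows whose approx_cost is not numeric
df.drop(index=df[~count_numeric].index,inplace=True)

# remove the commas and convert the column to integer
df["approx_cost"] = df["approx_cost"].str.replace(",","")
df["approx_cost"] = df["approx_cost"].astype(int)

# Export the cleaned dataset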
df.to_csv('zomatocleaned.csv', index = False)
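
An alternative way to sketch the cost cleanup is pd.to_numeric with errors="coerce", which turns non-numeric values into NaN so they can be filtered out. This is a swapped-in technique shown on hypothetical data, and it differs slightly from an isnumeric check: a value such as "12.5" would pass here and be truncated to 12.

```python
import pandas as pd

# Hypothetical sample: a name with extra characters and a non-numeric cost
demo = pd.DataFrame({
    "name": ["Caf\u00e9 #1!", "Jalsa", "Odd Row"],
    "approx_cost": ["1,200", 800, "Buffet"],
})

# Keep only ASCII letters, digits and whitespace (accented letters are removed too)
demo["name"] = demo["name"].str.replace(r"[^A-Za-z0-9\s]", "", regex=True)

# Strip thousands separators, then coerce anything non-numeric to NaN
cost = pd.to_numeric(
    demo["approx_cost"].astype(str).str.replace(",", "", regex=False),
    errors="coerce",
)

# Keep the rows with a valid cost and store it as an integer
demo = demo[cost.notna()].assign(approx_cost=cost.dropna().astype(int))
print(demo)
```

Note that the character class removes accented letters as well, so a name such as "Café" loses its last letter.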

### Task 8: Data Download, Import, and Database Connection

In [1]:
# --- Load the sql extension ---
%load_ext sql

# --- Load your mysql db using credentials from the "DB" area ---
%sql mysql+pymysql://root:password@localhost/food_delivery_data
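
To experiment with querying the cleaned data without a MySQL server, an in-memory SQLite database can serve as a stand-in. This is only a sketch under that assumption: the table name `zomato` and the sample rows are hypothetical, and SQLite replaces the MySQL connection used above.

```python
import sqlite3

import pandas as pd

# Hypothetical cleaned rows standing in for zomatocleaned.csv
cleaned = pd.DataFrame({
    "name": ["Jalsa", "Grand Village"],
    "rating": [4.1, 3.8],
    "approx_cost": [800, 600],
})

# In-memory SQLite database used as a stand-in for the MySQL instance
conn = sqlite3.connect(":memory:")

# Write the frame to a table named "zomato" (illustrative name)
cleaned.to_sql("zomato", conn, index=False, if_exists="replace")

# Query it back with plain SQL
result = pd.read_sql_query(
    "SELECT name, approx_cost FROM zomato WHERE rating > 4", conn
)
conn.close()
print(result)
```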