## ZOMATO STORE SPATIAL ANALYSIS
- In this spatial data analysis problem, we analyse data from the [Zomato Food Delivery Store](https://www.zomato.com/ncr) to find ways to increase its sales by covering a larger market.


- The basic idea of analyzing the Zomato dataset is to get a fair idea of the factors affecting the aggregate rating of each restaurant and of how different types of restaurants are established in different places.
Bengaluru is one such city, with more than 12,000 restaurants serving dishes from all over the world.
With new restaurants opening every day, the industry has not been saturated yet, and demand keeps increasing.
Despite this growing demand, it has become difficult for new restaurants to compete with established ones, as most of them serve the same food.
Bengaluru is the IT capital of India, and most people here depend mainly on restaurant food because they do not have time to cook for themselves.
With such overwhelming demand for restaurants, it has become important to study the demography of a location: what kind of food is more popular in a locality?
Does the entire locality love vegetarian food? If so, is that locality populated by a particular community, e.g. Jains, Marwaris, or Gujaratis, who are mostly vegetarian?
This kind of analysis can be done with the data by studying different factors.

In [50]:
from geopy.geocoders import Nominatim
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np


# Suppress warnings so they do not clutter the output
from warnings import filterwarnings
filterwarnings('ignore')

In [51]:
df = pd.read_csv('data/zomato.csv')
df.head(2)

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,approx_cost(for two people),reviews_list,menu_item,listed_in(type),listed_in(city)
0,https://www.zomato.com/bangalore/jalsa-banasha...,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,Yes,Yes,4.1/5,775,080 42297555\r\n+91 9743772233,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari
1,https://www.zomato.com/bangalore/spice-elephan...,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,Yes,No,4.1/5,787,080 41714161,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari


In [52]:
df.shape

(51717, 17)

### Data Preprocessing: Check for Missing Values

In [53]:
df.isna().sum()

url                                0
address                            0
name                               0
online_order                       0
book_table                         0
rate                            7775
votes                              0
phone                           1208
location                          21
rest_type                        227
dish_liked                     28078
cuisines                          45
approx_cost(for two people)      346
reviews_list                       0
menu_item                          0
listed_in(type)                    0
listed_in(city)                    0
dtype: int64
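
Raw counts are easier to judge as a share of all rows. A minimal sketch on a small made-up frame (the `toy` data and the name `missing_pct` are purely illustrative and not taken from the Zomato file):

```python
import pandas as pd

# Toy frame standing in for df; the values are invented for illustration.
toy = pd.DataFrame({
    "location": ["BTM", None, "HSR", "BTM"],
    "rate": ["4.1/5", None, None, "3.9/5"],
    "votes": [10, 0, 5, 7],
})

# isna().mean() gives the fraction of missing entries per column.
missing_pct = (toy.isna().mean() * 100).round(1).sort_values(ascending=False)
print(missing_pct)
```

Applied to the real frame, the same expression would rank columns such as `dish_liked` and `rate` by how incomplete they are.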

### Drop All Rows with Missing Location Values for Spatial Analysis

In [54]:
df.dropna(axis='index',subset=['location'],inplace=True)

In [55]:
df.isna().sum()

url                                0
address                            0
name                               0
online_order                       0
book_table                         0
rate                            7754
votes                              0
phone                           1187
location                           0
rest_type                        206
dish_liked                     28057
cuisines                          24
approx_cost(for two people)      325
reviews_list                       0
menu_item                          0
listed_in(type)                    0
listed_in(city)                    0
dtype: int64
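
Every column that had missing values lost exactly 21 of them (for example `rate` went from 7775 to 7754), so the 21 rows without a location were also missing those other fields. A small sketch of how such a before/after comparison could be made, using invented data and the non-inplace form of `dropna`:

```python
import pandas as pd

# Invented rows: two have no location, and those two also lack a rate.
toy = pd.DataFrame({
    "location": ["BTM", None, "HSR", None],
    "rate": ["4.1/5", None, None, None],
    "votes": [10, 0, 5, 0],
})

before = toy.isna().sum()
# Non-inplace form: returns a new frame and leaves `toy` untouched.
cleaned = toy.dropna(subset=["location"])
after = cleaned.isna().sum()

# Positive entries show how many missing values left with the dropped rows.
print(before - after)
```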

#### After removing the rows without a location, we have 93 unique locations to geocode

In [56]:
len(df['location'].unique())

93

In [57]:
locations = pd.DataFrame({"Name":df['location'].unique()})

In [58]:
locations.head(7)

Unnamed: 0,Name
0,Banashankari
1,Basavanagudi
2,Mysore Road
3,Jayanagar
4,Kumaraswamy Layout
5,Rajarajeshwari Nagar
6,Vijay Nagar


### Extracting Longitude and Latitude with Geopy from the Data

In [None]:
!pip3 install geopy

- [Nominatim](https://nominatim.org/) will help extract the latitude and longitude for each location name in your data. You have to pass a `user_agent` string that identifies your application; here it is set to `app`.

In [None]:
geolocator = Nominatim(user_agent="app")

In [None]:
"""lat_lon=[]
for location in locations['Name']:
    location = geolocator.geocode(location)
    if location is None:
        lat_lon.append(np.nan)
    else:    
        geo=(location.latitude,location.longitude)
        lat_lon.append(geo)"""


- Create a new empty list, `lat_lon`, to which we append the (latitude, longitude) pairs extracted with `geocode` from [geopy](https://geopy.readthedocs.io/en/stable/#:~:text=geopy%20is%20a%20Python%20client,geocoders%20and%20other%20data%20sources.)

### Alternative solution to [OSM](link)

- When initializing Nominatim, set the `timeout` parameter to `None` to disable the request timeout. Note that Nominatim only performs well on small datasets; on large ones, requests are likely to fail partway through the run.

In [None]:
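from geopy.geocoders import Nominatim

# Take at most the first 100 location names as a small batch
sample = locations[0:100]
# timeout=None disables the request timeout
geolocator = Nominatim(user_agent="app", timeout=None)

lat = []
lon = []
for name in sample['Name']:
    # geocode() returns None when the name cannot be resolved
    location = geolocator.geocode(name)
    if location is None:
        lat.append(np.nan)
        lon.append(np.nan)
    else:
        lat.append(location.latitude)
        lon.append(location.longitude)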
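
The geocoding loop can be made a little more robust by skipping duplicate names, tolerating failed requests, and spacing requests out. The sketch below is one possible shape for that, not the notebook's own method: `geocode_names` and `make_nominatim_geocoder` are hypothetical helper names, and the 10-second timeout is an arbitrary choice. It relies on geopy's `RateLimiter` with a one-second delay, in line with the public Nominatim server's limit of one request per second.

```python
import math


def geocode_names(names, geocode):
    """Return {name: (lat, lon)}; failed lookups map to (nan, nan)."""
    coords = {}
    for name in names:
        if name in coords:  # skip duplicates, saving a request
            continue
        try:
            hit = geocode(name)
        except Exception:  # e.g. a timeout raised by the service
            hit = None
        if hit is None:
            coords[name] = (math.nan, math.nan)
        else:
            coords[name] = (hit.latitude, hit.longitude)
    return coords


def make_nominatim_geocoder(user_agent="app"):
    """Build a rate-limited geocode callable (requires geopy to be installed)."""
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent, timeout=10)
    # Wait at least one second between consecutive requests.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)
```

With geopy installed, `geocode_names(locations['Name'], make_nominatim_geocoder())` would return a dictionary that can be unpacked into the `lat` and `lon` columns.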

In [None]:
lat = []
lon = []
for location in locations['Name']:
    location = geolocator.geocode(location)    
    if location is None:
        lat.append(np.nan)
        lon.append(np.nan)
    else:
        lat.append(location.latitude)
        lon.append(location.longitude)

In [None]:
locations['lat'] = lat
locations['lon'] = lon

In [None]:
locations.head(7)

In [None]:
locations.to_csv('data/zomato_locations.csv',index=False)
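
Saving the coordinates means later runs need not query the geocoder again. A minimal sketch of that caching pattern follows; `load_or_build` is a hypothetical helper name, and the `build` argument stands for whatever function produces the geocoded frame.

```python
from pathlib import Path

import pandas as pd


def load_or_build(path, build):
    """Read a cached CSV if present; otherwise call build() and save its result."""
    path = Path(path)
    if path.exists():
        return pd.read_csv(path)
    frame = build()
    # Create the target folder if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(path, index=False)
    return frame
```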

In [None]:
##locations=pd.read_csv('E:\Spatial Analysis\Zomato/zomato_locations.csv')

In [None]:
##locations.head()

#### We have now found the latitude and longitude of each location listed in the dataset using geopy
#### These coordinates are used to plot the maps.

In [None]:
Rest_locations=pd.DataFrame(df['location'].value_counts().reset_index())

In [None]:
Rest_locations.columns=['Name','count']
Rest_locations.head()

#### Now combine both DataFrames

In [None]:
locations.shape

In [None]:
Rest_locations.shape

In [None]:
Restaurant_locations=Rest_locations.merge(locations,on='Name',how="left").dropna()
Restaurant_locations.head()
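
The trailing `.dropna()` silently removes every location the geocoder could not resolve. Before dropping, it may be worth listing what would be lost. The sketch below uses invented names and coordinates and pandas' `indicator=True` option, which adds a `_merge` column recording where each row came from:

```python
import numpy as np
import pandas as pd

# Invented stand-ins for Rest_locations and locations.
counts = pd.DataFrame({"Name": ["BTM", "HSR", "Mystery Road"],
                       "count": [50, 30, 5]})
coords = pd.DataFrame({"Name": ["BTM", "HSR", "Mystery Road"],
                       "lat": [12.91, 12.91, np.nan],
                       "lon": [77.61, 77.64, np.nan]})

merged = counts.merge(coords, on="Name", how="left", indicator=True)
# Rows without coordinates: either unmatched or not found by the geocoder.
lost = merged[merged["lat"].isna()]
print(lost[["Name", "count", "_merge"]])

# Keep only rows that can be plotted, and discard the helper column.
kept = merged.dropna(subset=["lat", "lon"]).drop(columns="_merge")
```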

In [None]:
Restaurant_locations['count'].max()

In [None]:
!pip install folium

In [59]:
def generateBaseMap(default_location=[12.97, 77.59], default_zoom_start=12):
    base_map = folium.Map(location=default_location, zoom_start=default_zoom_start)
    return base_map

In [None]:
import folium
from folium.plugins import HeatMap
basemap=generateBaseMap()

In [None]:
basemap

In [None]:
Restaurant_locations[['lat','lon','count']]

#### Heatmap of Bengaluru Restaurants

In [None]:
HeatMap(Restaurant_locations[['lat','lon','count']],zoom=20,radius=15).add_to(basemap)

In [None]:
basemap
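
`HeatMap` reads the third column as a weight for each point. Raw restaurant counts span a wide range, and depending on the folium version large weights may saturate the colour scale. One option, which is an assumption about what reads well rather than something the notebook does, is to scale the counts so the busiest location has weight 1.0. The coordinates and counts below are invented:

```python
import pandas as pd

# Invented points standing in for Restaurant_locations.
points = pd.DataFrame({"lat": [12.91, 12.93, 12.97],
                       "lon": [77.61, 77.62, 77.59],
                       "count": [5000, 2500, 50]})

# Scale weights into (0, 1] relative to the largest count.
points["weight"] = points["count"] / points["count"].max()
heat_data = points[["lat", "lon", "weight"]].values.tolist()
```

The resulting `heat_data` list has the same `[lat, lon, weight]` layout that is passed to `HeatMap` above.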

##### Geo Analysis: where are the restaurants located in Bengaluru? (using Marker Cluster)

In [None]:
from folium.plugins import FastMarkerCluster

In [None]:
# Plugin: FastMarkerCluster
FastMarkerCluster(data=Restaurant_locations[['lat','lon','count']].values.tolist()).add_to(basemap)

basemap

#### Heat Map: where are the restaurants with a high average rating?

In [None]:
df.head()

In [None]:
len(df['location'].unique())

In [None]:
df['rate'].unique()

In [None]:
df.dropna(axis=0,subset=['rate'],inplace=True)

In [None]:
df['rate'].unique()

In [None]:
def split(x):
    return x.split('/')[0]

In [None]:
df['rating']=df['rate'].apply(split)

In [None]:
df['rating'].unique()

In [None]:
df.replace('NEW',0,inplace=True)

In [None]:
df.replace('-',0,inplace=True)

In [None]:
df.head()

In [None]:
df.groupby(['location'])['rating'].sum()

In [None]:
df.dtypes

In [None]:
df['rating']=pd.to_numeric(df['rating'])

In [None]:
df['rating'].dtype
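
Mapping `NEW` and `-` to 0 treats unrated restaurants as if they had been rated zero, which pulls location averages down. An alternative, sketched here on invented values, is to treat them as missing so that `mean()` skips them. The sample strings and the name `rating_clean` are illustrative only:

```python
import pandas as pd

# Invented examples of the formats seen in the 'rate' column.
raw = pd.Series(["4.1/5", "3.8 /5", "NEW", "-", None])

# Take the part before '/', trim spaces, and turn non-numeric text into NaN.
rating_clean = pd.to_numeric(raw.str.split("/").str[0].str.strip(),
                             errors="coerce")
print(rating_clean.mean())
```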

In [None]:
df.groupby(['location'])['rating'].mean().sort_values(ascending=False)

In [None]:
df.groupby(['location'])['rating'].mean()

In [None]:
avg_rating=df.groupby(['location'])['rating'].mean().values

In [None]:
avg_rating

In [None]:
loc=df.groupby(['location'])['rating'].mean().index
loc

In [None]:
geolocator=Nominatim(user_agent="app")

In [None]:
lat=[]
lon=[]
for location in loc:
    location = geolocator.geocode(location)    
    if location is None:
        lat.append(np.nan)
        lon.append(np.nan)
    else:
        lat.append(location.latitude)
        lon.append(location.longitude)

In [None]:
rating=pd.DataFrame()

In [None]:
rating['location']=loc
rating['lat']=lat
rating['lon']=lon
rating['avg_rating']=avg_rating
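
Since the coordinates of every location were already stored in `locations`, the averages could also be joined onto that table instead of geocoding a second time. A minimal sketch with invented data (the names `ratings`, `coords`, and `rating_map` are placeholders):

```python
import pandas as pd

# Invented stand-ins for df[['location', 'rating']] and locations.
ratings = pd.DataFrame({"location": ["BTM", "BTM", "HSR"],
                        "rating": [4.0, 3.0, 4.5]})
coords = pd.DataFrame({"Name": ["BTM", "HSR"],
                       "lat": [12.91, 12.91],
                       "lon": [77.61, 77.64]})

# Average rating per location, renamed to match the coordinate table's key.
avg = (ratings.groupby("location", as_index=False)["rating"].mean()
              .rename(columns={"location": "Name", "rating": "avg_rating"}))
rating_map = avg.merge(coords, on="Name", how="left").dropna()
```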

In [None]:
rating.head()

In [None]:
rating.isna().sum()

In [None]:
rating=rating.dropna()

In [None]:
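# Start from a fresh map so the earlier heatmap and marker layers are not drawn underneath
basemap=generateBaseMap()
HeatMap(rating[['lat','lon','avg_rating']],zoom=20,radius=15).add_to(basemap)
basemap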

### The heatmap above shows where restaurants have a high average rating

#### Heatmap of North Indian restaurants

In [None]:
df.head()

In [None]:
df2= df[df['cuisines']=='North Indian']
df2.head()

In [None]:
north_india=df2.groupby('location')['url'].count().reset_index()
north_india.columns=['Name','count']
north_india.head()

In [None]:
north_india=north_india.merge(locations,on="Name",how='left').dropna()

In [None]:
north_india.head()

In [None]:
basemap=generateBaseMap()
HeatMap(north_india[['lat','lon','count']].values.tolist(),zoom=20,radius=15).add_to(basemap)
basemap

#### Automate the steps above to create heatmaps for South Indian and other cuisines

In [None]:
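def Heatmap_Zone(zone):
    """Return a heatmap of restaurants whose 'cuisines' value equals `zone`."""
    # Keep only rows whose cuisines string matches the zone exactly
    df3=df[df['cuisines']==zone]
    # Count restaurants per location
    df_zone=df3.groupby(['location'],as_index=False)['url'].agg('count')
    df_zone.columns=['Name','count']
    # Attach coordinates and drop locations that could not be geocoded
    df_zone=df_zone.merge(locations,on="Name",how='left').dropna()
    basemap=generateBaseMap()
    HeatMap(df_zone[['lat','lon','count']].values.tolist(),zoom=20,radius=15).add_to(basemap)
    return basemap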

In [None]:
df['cuisines'].unique()

In [None]:
Heatmap_Zone('South Indian')
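
The equality test `df['cuisines']==zone` only matches restaurants whose cuisine list is exactly that one entry, so a place listed as "North Indian, Chinese" is not counted as North Indian. If the intent is to include every restaurant that serves the cuisine, a substring match is one alternative. The sketch below compares the two on invented rows:

```python
import pandas as pd

# Invented rows mimicking the comma-separated 'cuisines' column.
toy = pd.DataFrame({
    "cuisines": ["South Indian", "North Indian, Chinese",
                 "South Indian, Biryani", None],
    "location": ["BTM", "BTM", "HSR", "HSR"],
})

exact = toy[toy["cuisines"] == "South Indian"]
# regex=False matches the text literally; na=False excludes missing cuisines.
partial = toy[toy["cuisines"].str.contains("South Indian", regex=False, na=False)]
print(len(exact), len(partial))
```

Swapping the filter inside `Heatmap_Zone` for the `str.contains` form would change the counts on the map, so which one to use depends on the question being asked.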