# IBM Data Science Capstone Project 
### Opening a New Restaurant in the Historic City of Aurangabad, India
- Build a dataframe of neighborhoods in Aurangabad by gathering data from government sites and private real-estate sites
- Get the geographical coordinates of the neighborhoods
- Obtain the venue data for the neighborhoods from Foursquare API
- Explore and cluster the neighborhoods
- Select the best cluster to open a new restaurant

**First, let's import the libraries for processing datasets**

In [33]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!pip install BeautifulSoup4
from bs4 import BeautifulSoup
import requests # library to handle HTTP requests

import json # library to handle JSON files

!pip install geopy 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
#!pip install -U scikit-learn scipy matplotlib
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('\nAll libraries imported successfully..!')


All libraries imported successfully..!


*The data required for this project was not readily available online; it exists only in fragments. It was collected from various sites, consolidated in a spreadsheet tool (Google Sheets), and then cleaned and wrangled in a Python notebook.*
 

*Here's the **[link]** to the Python notebook; the repository also contains all the acquired datasets.*

[link]: https://github.com/ChinmayGaikwad12/Coursera_Capstone/blob/master/Capstone%20DB%20.ipynb

**Let's load and explore the dataset of Aurangabad's neighborhoods**

In [34]:
df = pd.read_csv('Aurangabad Ward-Wise Data.csv')
df.head()

Unnamed: 0,Neighborhood,Ward,Latitude,Longitude,Borough,Total Population,Population Male,Population Female,Sex Ratio,No.of Houses,Area Hectare,Population Density,Avg. Price,SqftPrice Min,SqftPrice Max,Properties
0,Harsul,1,19.917838,75.340586,N-A,11304,6548,4756,726,1257,2894.8,3,4455,3076,5833,3
1,Bhagatsingh Nagar,2,19.919366,75.355821,N-A,16339,8015,8324,1039,2786,232.38,70,4854,3120,6587,5
2,Mhasoba Nagar,2,19.913759,75.350888,N-A,15245,8658,6587,761,4275,365.5,41,2932,2351,3512,3
3,Radhaswami Colony,3,19.918989,75.340617,N-A,10226,5374,4852,903,2238,295.12,34,2250,1258,3241,2
4,Ambar Hill,4,19.930736,75.324129,N-A,21406,11586,9820,848,4002,458.7,46,1299,854,1744,1


**Get the geographical coordinates of Aurangabad city**

In [35]:
# get the coordinates of Aurangabad
address = 'Aurangabad, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Aurangabad, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Aurangabad, India are 19.877263, 75.3390241.
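
The cell above assumes the geocoder always finds the address. geopy's `geocode` returns `None` when nothing matches, in which case `location.latitude` raises an `AttributeError`. A minimal sketch of a guard, assuming a hypothetical helper named `resolve_coordinates` (not part of the original notebook) that accepts any callable behaving like `geolocator.geocode`:

```python
def resolve_coordinates(geocode, address):
    """Return (latitude, longitude) for an address, or raise if nothing is found.

    `geocode` is any callable with the same contract as geopy's
    `Nominatim(...).geocode`: it returns an object exposing `.latitude`
    and `.longitude`, or None when the address cannot be resolved.
    """
    location = geocode(address)
    if location is None:
        # Fail early with a readable message instead of an AttributeError later
        raise ValueError("No geocoding result for {!r}".format(address))
    return location.latitude, location.longitude
```

In the notebook this could be called as `latitude, longitude = resolve_coordinates(geolocator.geocode, address)`.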


**Now, let's visualize each `Neighborhood` on the map using `Folium`**

In [36]:
# create map of Aurangabad using latitude and longitude values
map_abd = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_abd)  
    
map_abd

**Let's set `Client ID` and `Client Secret`**

In [37]:
# define Foursquare Credentials and Version
CLIENT_ID = '##' # your Foursquare ID
CLIENT_SECRET = '##' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ##
CLIENT_SECRET: ##
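
Printing the secret in a notebook cell makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the credentials are stored in environment variables called `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` (hypothetical names chosen for illustration, not taken from the original notebook):

```python
import os


def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables.

    The variable names are assumptions made for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret


def mask(secret, visible=4):
    """Hide all but the last few characters so the value can be checked safely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

With this in place, the confirmation print could show `mask(CLIENT_SECRET)` rather than the raw value.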

**Now, let's get the top `100 venues` that are within a radius of `5 km`**

In [38]:
radius = 5000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
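
The loop above indexes straight into the JSON, so a response without the expected shape (for example an error payload) or a venue without a category would raise a `KeyError` or `IndexError` and stop the whole collection run. A more defensive parsing step might look like the sketch below. It assumes the same response layout the notebook already relies on (`response` → `groups[0]` → `items`); the helper name `extract_venues` is hypothetical.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one 'explore' response into row tuples.

    Returns an empty list when the payload lacks the expected structure,
    and uses None for a venue's category when it has none.
    """
    rows = []
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        category = categories[0].get("name") if categories else None
        rows.append((
            neighborhood,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            category,
        ))
    return rows
```

Inside the loop this would replace the direct indexing, e.g. `venues.extend(extract_venues(requests.get(url).json(), neighborhood, lat, long))`.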

**Convert the obtained results into a pandas `DataFrame`**

In [39]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2854, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Harsul,19.917838,75.340586,Vivanta By Taj,19.907636,75.346034,Hotel
1,Harsul,19.917838,75.340586,Naturals,19.87633,75.3454,Ice Cream Shop
2,Harsul,19.917838,75.340586,Ellora Caves,19.910304,75.365656,Cave
3,Harsul,19.917838,75.340586,Kream & Krunch,19.876641,75.346088,Restaurant
4,Harsul,19.917838,75.340586,Domino's Pizza,19.876,75.337,Pizza Place


**Let's check how many venues were returned for each `Neighborhood` (*showing the first 5 results*)**

In [40]:
venues_df.groupby(["Neighborhood"]).count().head()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Aarati Nagar,20,20,20,20,20,20
Ajabnagar,24,24,24,24,24,24
Altamash Colony,27,27,27,27,27,27
Ambar Hill,6,6,6,6,6,6
Ambedkarnagar,27,27,27,27,27,27


**Let's find out how many unique `Categories` can be curated from all the returned venues**

In [41]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 30 unique categories.


In [42]:
# print out the list of categories
venues_df['VenueCategory'].unique()

array(['Hotel', 'Ice Cream Shop', 'Cave', 'Restaurant', 'Pizza Place',
       'Café', 'Indian Restaurant', 'Historic Site', 'Bed & Breakfast',
       'Shopping Mall', 'Multiplex', 'Bus Station', 'Coffee Shop',
       'Fast Food Restaurant', 'City', 'Smoke Shop', 'Mobile Phone Shop',
       'Department Store', 'Stadium', 'Athletics & Sports', 'Food Court',
       'Snack Place', 'Airport', 'Airport Terminal', 'ATM', 'Bank',
       'Motorcycle Shop', 'Italian Restaurant', 'Motel', 'Platform'],
      dtype=object)

**Print out dataframe of the categories (*Showing first 5 results*)**

In [43]:
col=["Category"]
category_df = pd.DataFrame(data = venues_df['VenueCategory'].unique(),columns=col)
category_df.head()

Unnamed: 0,Category
0,Hotel
1,Ice Cream Shop
2,Cave
3,Restaurant
4,Pizza Place


**Let's encode the results**

In [44]:
# one hot encoding
abd_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
abd_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [abd_onehot.columns[-1]] + list(abd_onehot.columns[:-1])
abd_onehot = abd_onehot[fixed_columns]

print(abd_onehot.shape)
abd_onehot.head()

(2854, 31)


Unnamed: 0,Neighborhoods,ATM,Airport,Airport Terminal,Athletics & Sports,Bank,Bed & Breakfast,Bus Station,Café,Cave,City,Coffee Shop,Department Store,Fast Food Restaurant,Food Court,Historic Site,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Mobile Phone Shop,Motel,Motorcycle Shop,Multiplex,Pizza Place,Platform,Restaurant,Shopping Mall,Smoke Shop,Snack Place,Stadium
0,Harsul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Harsul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Harsul,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Harsul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,Harsul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


**Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category**

In [45]:
abd_grouped = abd_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(abd_grouped.shape)
abd_grouped.head()

(141, 31)


Unnamed: 0,Neighborhoods,ATM,Airport,Airport Terminal,Athletics & Sports,Bank,Bed & Breakfast,Bus Station,Café,Cave,City,Coffee Shop,Department Store,Fast Food Restaurant,Food Court,Historic Site,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Mobile Phone Shop,Motel,Motorcycle Shop,Multiplex,Pizza Place,Platform,Restaurant,Shopping Mall,Smoke Shop,Snack Place,Stadium
0,Aarati Nagar,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.1,0.0,0.05,0.0,0.05,0.0,0.05,0.15,0.05,0.1,0.0,0.0,0.0,0.0,0.1,0.05,0.0,0.05,0.1,0.0,0.05,0.0
1,Ajabnagar,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.083333,0.0,0.041667,0.041667,0.0,0.041667,0.0,0.083333,0.125,0.041667,0.125,0.0,0.0,0.0,0.0,0.041667,0.083333,0.0,0.083333,0.041667,0.041667,0.0,0.041667
2,Altamash Colony,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.074074,0.037037,0.037037,0.037037,0.0,0.037037,0.0,0.074074,0.111111,0.037037,0.111111,0.0,0.0,0.0,0.0,0.074074,0.074074,0.0,0.074074,0.037037,0.037037,0.037037,0.037037
3,Ambar Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.333333,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0
4,Ambedkarnagar,0.0,0.037037,0.037037,0.037037,0.0,0.037037,0.0,0.074074,0.074074,0.037037,0.037037,0.0,0.037037,0.0,0.037037,0.111111,0.037037,0.074074,0.0,0.0,0.0,0.0,0.074074,0.074074,0.0,0.037037,0.074074,0.0,0.037037,0.037037
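
Taking the mean of one-hot columns per neighborhood yields the share of that neighborhood's venues that fall in each category, which is why every row above sums to 1. A small illustration on made-up data (the neighborhoods `A` and `B` and their venues are invented for this sketch):

```python
import pandas as pd

# Toy venue list: neighborhood A has 4 venues, B has 2
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Hotel", "Restaurant", "Restaurant", "Cave", "Hotel", "Cave"],
})

# One indicator column per category (floats so the mean is unambiguous)
onehot = pd.get_dummies(toy["VenueCategory"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# Mean of the indicators = fraction of venues in each category
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Here 2 of A's 4 venues are restaurants, so its `Restaurant` value is 0.5, while B has none and gets 0.0. Because the values are shares rather than counts, a neighborhood with few returned venues can show a high frequency from a single restaurant.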


In [46]:
len(abd_grouped[abd_grouped["Restaurant"] > 0])

123

**Let's create a new DataFrame for `Restaurant` data only**

In [47]:
abd_restaurant = abd_grouped[["Neighborhoods","Restaurant"]]
abd_restaurant.head()

Unnamed: 0,Neighborhoods,Restaurant
0,Aarati Nagar,0.05
1,Ajabnagar,0.083333
2,Altamash Colony,0.074074
3,Ambar Hill,0.0
4,Ambedkarnagar,0.037037


### **Clustering Neighborhoods**

**Run `k-means` to cluster the neighborhoods in `Aurangabad` into 5 clusters.**

In [48]:
# set number of clusters
kclusters = 5

abd_clustering = abd_restaurant.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(abd_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 3, 3, 1, 4, 3, 0, 3, 0, 4], dtype=int32)

In [49]:
# create a new dataframe that holds the restaurant frequency and the cluster label for each neighborhood
abd_merged = abd_restaurant.copy()

# add clustering labels
abd_merged["Cluster Labels"] = kmeans.labels_

abd_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
abd_merged.head()

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels
0,Aarati Nagar,0.05,4
1,Ajabnagar,0.083333,3
2,Altamash Colony,0.074074,3
3,Ambar Hill,0.0,1
4,Ambedkarnagar,0.037037,4


**Let's join `abd_merged` with the Aurangabad ward-wise data to add latitude/longitude and the other ward attributes for each neighborhood**

In [50]:
abd_merged = abd_merged.join(df.set_index("Neighborhood"), on="Neighborhood")

print(abd_merged.shape)
abd_merged.head() 

(141, 18)


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Ward,Latitude,Longitude,Borough,Total Population,Population Male,Population Female,Sex Ratio,No.of Houses,Area Hectare,Population Density,Avg. Price,SqftPrice Min,SqftPrice Max,Properties
0,Aarati Nagar,0.05,4,19,19.904502,75.366124,NE-A,9096,4817,4279,888,2122,34.39,264,1260,720,1800,2
1,Ajabnagar,0.083333,3,66,19.875788,75.328989,C-A,9975,5384,4591,853,2245,30.16,330,2275,1204,3345,2
2,Altamash Colony,0.074074,3,61,19.885573,75.350363,E-A,9142,4592,4550,991,1905,86.22,106,3068,3114,3021,3
3,Ambar Hill,0.0,1,4,19.930736,75.324129,N-A,21406,11586,9820,848,4002,458.7,46,1299,854,1744,1
4,Ambedkarnagar,0.037037,4,24,19.896226,75.36477,NE-A,9709,5005,4704,940,2142,61.33,158,1255,721,1789,1
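
A join on neighborhood names silently produces `NaN` coordinates if a name is missing from the ward data, and duplicates rows if a name appears there twice. One way to check both, sketched on invented data with `DataFrame.merge` (the notebook itself uses `join`; this is an alternative, not the original method):

```python
import pandas as pd

clusters = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Cluster Labels": [0, 1, 0],
})
wards = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Latitude": [19.90, 19.88],
    "Longitude": [75.34, 75.32],
})

# validate raises MergeError if a neighborhood appears more than once in `wards`;
# indicator adds a `_merge` column recording where each row found a match
merged = clusters.merge(wards, on="Neighborhood", how="left",
                        validate="many_to_one", indicator=True)

# Neighborhoods that found no coordinates in the ward data
unmatched = merged.loc[merged["_merge"] == "left_only", "Neighborhood"].tolist()
print(unmatched)
```

An empty `unmatched` list means every neighborhood received coordinates and can be plotted.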


**Sort the results by `Cluster Labels`**

In [51]:
print(abd_merged.shape)
abd_merged.sort_values(["Cluster Labels"], inplace=True)
abd_merged.head()

(141, 18)


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Ward,Latitude,Longitude,Borough,Total Population,Population Male,Population Female,Sex Ratio,No.of Houses,Area Hectare,Population Density,Avg. Price,SqftPrice Min,SqftPrice Max,Properties
132,Vedant Nagar,0.0625,0,99,19.861578,75.314988,S-A,16272,8551,7721,903,3774,5.92,2748,4099,2316,5882,10
133,Vidya Nagar,0.068966,0,75,19.872183,75.351476,S-A,9848,5118,4730,924,1974,22.14,444,3319,3230,3408,3
101,Radhaswami Colony,0.0625,0,3,19.918989,75.340617,N-A,10226,5374,4852,903,2238,295.12,34,2250,1258,3241,2
28,Buddi lane,0.052632,0,32,19.89081,75.321308,C-A,11523,5915,5608,948,1943,110.82,103,5229,4578,5879,5
65,Khadkeshwar,0.055556,0,34,19.886429,75.317573,C-A,10109,5135,4974,969,1946,33.78,299,10723,8987,12458,32


### **Finally, let's visualize the resulting clusters on the map**

In [52]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters (one color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(abd_merged['Latitude'], abd_merged['Longitude'], abd_merged['Neighborhood'], abd_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### **Let's examine the clusters**

`Cluster 0`

In [53]:
cluster0 = abd_merged.loc[abd_merged['Cluster Labels'] == 0].copy()
print('Number of Neighbourhoods:',cluster0.shape[0])
cluster0.head()

Number of Neighbourhoods: 27


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Ward,Latitude,Longitude,Borough,Total Population,Population Male,Population Female,Sex Ratio,No.of Houses,Area Hectare,Population Density,Avg. Price,SqftPrice Min,SqftPrice Max,Properties
132,Vedant Nagar,0.0625,0,99,19.861578,75.314988,S-A,16272,8551,7721,903,3774,5.92,2748,4099,2316,5882,10
133,Vidya Nagar,0.068966,0,75,19.872183,75.351476,S-A,9848,5118,4730,924,1974,22.14,444,3319,3230,3408,3
101,Radhaswami Colony,0.0625,0,3,19.918989,75.340617,N-A,10226,5374,4852,903,2238,295.12,34,2250,1258,3241,2
28,Buddi lane,0.052632,0,32,19.89081,75.321308,C-A,11523,5915,5608,948,1943,110.82,103,5229,4578,5879,5
65,Khadkeshwar,0.055556,0,34,19.886429,75.317573,C-A,10109,5135,4974,969,1946,33.78,299,10723,8987,12458,32


`Cluster 1`

In [54]:
cluster1 = abd_merged.loc[abd_merged['Cluster Labels'] == 1].copy()
print('Number of Neighbourhoods:',cluster1.shape[0])
cluster1.head()

Number of Neighbourhoods: 18


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Ward,Latitude,Longitude,Borough,Total Population,Population Male,Population Female,Sex Ratio,No.of Houses,Area Hectare,Population Density,Avg. Price,SqftPrice Min,SqftPrice Max,Properties
63,Kanchanwadi,0.0,1,104,19.836215,75.279771,S-A,14999,7510,7489,997,1541,30.1,498,4949,2250,7647,15
137,Waluj,0.0,1,116,19.83267,75.19974,WAL,31000,15550,15450,994,2340,32.0,969,5147,2462,7831,21
93,Padegaon,0.0,1,7,19.886441,75.274969,NW-A,12411,6200,6211,1002,1410,89.57,138,4202,1866,6538,26
3,Ambar Hill,0.0,1,4,19.930736,75.324129,N-A,21406,11586,9820,848,4002,458.7,46,1299,854,1744,1
37,Datta Nagar,0.0,1,45,19.848795,75.210749,C-A,11397,5915,5482,927,2019,79.92,142,2393,1245,3541,1


`Cluster 2`

In [55]:
cluster2 = abd_merged.loc[abd_merged['Cluster Labels'] == 2].copy()
print('Number of Neighbourhoods:',cluster2.shape[0])
cluster2.head()

Number of Neighbourhoods: 11


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Ward,Latitude,Longitude,Borough,Total Population,Population Male,Population Female,Sex Ratio,No.of Houses,Area Hectare,Population Density,Avg. Price,SqftPrice Min,SqftPrice Max,Properties
113,Sangram Nagar,0.142857,2,114,19.840713,75.324624,S-A,20276,10154,10122,997,980,21.01,965,2796,2062,3529,25
114,Sanjay Nagar,0.117647,2,57,19.870068,75.37731,E-A,9973,5112,4861,951,1817,15.37,648,2455,1255,3654,2
112,Sangharsh Nagar,0.105263,2,81,19.866236,75.378514,E-A,19930,10559,9371,887,4473,7.37,2704,1695,1248,2142,1
82,Mukundnagar,0.1,2,86,19.859842,75.377845,E-A,13639,6987,6652,952,2677,6.38,2137,10035,9856,10214,21
72,MIDC Chikalthana,0.105263,2,23,19.877772,75.384753,E-A,14523,7534,6989,928,2814,36.94,393,32500,32500,32500,13


`Cluster 3`

In [56]:
cluster3 = abd_merged.loc[abd_merged['Cluster Labels'] == 3].copy()
print('Number of Neighbourhoods:',cluster3.shape[0])
cluster3.head()

Number of Neighbourhoods: 58


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Ward,Latitude,Longitude,Borough,Total Population,Population Male,Population Female,Sex Ratio,No.of Houses,Area Hectare,Population Density,Avg. Price,SqftPrice Min,SqftPrice Max,Properties
122,Shivaji Nagar,0.086957,3,110,19.85959,75.341022,S-A,17086,8574,8512,993,1401,18.24,936,3854,3264,4444,13
77,Mayurban Colony,0.083333,3,107,19.855268,75.333102,S-A,8368,4254,4114,967,840,12.1,691,1969,1248,2689,2
120,Shantipura,0.083333,3,10,19.883667,75.301438,NW-A,3279,1868,1411,755,968,398.2,8,2110,1010,3210,2
121,Sharif Colony,0.076923,3,47,19.890083,75.342011,NE-A,10697,5652,5045,893,2323,92.1,116,2343,1235,3451,2
70,Kranti Chowk,0.083333,3,70,19.873326,75.326358,C-A,14816,7702,7114,924,3488,54.02,274,10043,9827,10259,2


`Cluster 4`

In [57]:
cluster4 = abd_merged.loc[abd_merged['Cluster Labels'] == 4].copy()
print('Number of Neighbourhoods:',cluster4.shape[0])
cluster4.head()

Number of Neighbourhoods: 27


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Ward,Latitude,Longitude,Borough,Total Population,Population Male,Population Female,Sex Ratio,No.of Houses,Area Hectare,Population Density,Avg. Price,SqftPrice Min,SqftPrice Max,Properties
118,Shahganj,0.04,4,36,19.891792,75.334796,C-A,14400,7840,6560,837,1321,85.1,169,5568,3230,7905,6
126,Sudarshan Nagar,0.047619,4,26,19.905027,75.352195,N-A,10581,5563,5018,902,2360,15.36,688,4362,2140,6584,4
123,Shivneri Colony,0.038462,4,41,19.898045,75.359005,NE-A,9506,4860,4646,956,1620,55.39,171,1167,980,1354,1
0,Aarati Nagar,0.05,4,19,19.904502,75.366124,NE-A,9096,4817,4279,888,2122,34.39,264,1260,720,1800,2
108,Rauza Bagh,0.047619,4,13,19.905089,75.340982,N-A,9208,4521,4687,1037,1201,13.2,697,10034,9857,10210,2


### **Conclusion:**

In [58]:
clusts = [cluster0, cluster1, cluster2, cluster3, cluster4]

# mean restaurant frequency per cluster, rounded to 4 decimals
mean_res = [[i, np.round(c['Restaurant'].mean(), 4)] for i, c in enumerate(clusts)]

col = ['Cluster', 'Mean Result']
res_mean_df = pd.DataFrame(data=mean_res, columns=col).set_index('Cluster')
res_mean_df

Cluster,Mean Result
0,0.0604
1,0.0
2,0.114
3,0.0792
4,0.0437
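
The k-means label numbers are arbitrary, so the clusters only become comparable once they are ranked by a statistic such as the mean above. The same comparison can be obtained in one step with a grouped aggregation, which also shows each cluster's size and range. A sketch on invented values (not the project's data):

```python
import pandas as pd

toy = pd.DataFrame({
    "Restaurant": [0.0, 0.0, 0.05, 0.04, 0.11, 0.12],
    "Cluster Labels": [1, 1, 4, 4, 2, 2],
})

# Size, mean and range of the restaurant frequency per cluster,
# ordered from the most to the least restaurant-dense cluster
summary = (
    toy.groupby("Cluster Labels")["Restaurant"]
       .agg(["count", "mean", "min", "max"])
       .sort_values("mean", ascending=False)
)
print(summary)
```

Applied to `abd_merged`, this would avoid building one DataFrame per cluster by hand. The `min` and `max` columns show where one cluster ends and the next begins, because clustering on a single feature splits it into value ranges.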


Most of the restaurants are concentrated in the central area of **Aurangabad** city. Measured by the mean share of restaurants among the returned venues, **`cluster 2` has the highest concentration (0.114)**, **`cluster 3` (0.0792), `cluster 0` (0.0604)** and **`cluster 4` (0.0437)** have moderate concentrations, and **`cluster 1` has very few to no restaurants (0.0)** in its neighborhoods.

Neighborhoods with moderate competition from existing restaurants represent a great opportunity and are high-potential areas for opening a new restaurant. Meanwhile, restaurants in **`cluster 2`** are likely suffering from intense competition due to oversupply and a high concentration of restaurants. From another perspective, the results also show that the oversupply of restaurants is mostly in the central area of the city, while the suburbs still have very few restaurants.

Therefore, this project recommends capitalizing on these findings by opening a new restaurant in the neighborhoods of **`cluster 3`**, where competition is moderate. New franchises with unique service propositions that can stand out from the competition can also open restaurants in these neighborhoods.

Lastly, newcomers to this industry are advised to avoid neighborhoods in **`cluster 2`**, which already have a high concentration of restaurants and suffer from intense competition.

In [77]:
df_result = abd_merged.loc[abd_merged['Cluster Labels'] == 3].copy().reset_index(drop=True)
df_result.head()

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Ward,Latitude,Longitude,Borough,Total Population,Population Male,Population Female,Sex Ratio,No.of Houses,Area Hectare,Population Density,Avg. Price,SqftPrice Min,SqftPrice Max,Properties
0,Shivaji Nagar,0.086957,3,110,19.85959,75.341022,S-A,17086,8574,8512,993,1401,18.24,936,3854,3264,4444,13
1,Mayurban Colony,0.083333,3,107,19.855268,75.333102,S-A,8368,4254,4114,967,840,12.1,691,1969,1248,2689,2
2,Shantipura,0.083333,3,10,19.883667,75.301438,NW-A,3279,1868,1411,755,968,398.2,8,2110,1010,3210,2
3,Sharif Colony,0.076923,3,47,19.890083,75.342011,NE-A,10697,5652,5045,893,2323,92.1,116,2343,1235,3451,2
4,Kranti Chowk,0.083333,3,70,19.873326,75.326358,C-A,14816,7702,7114,924,3488,54.02,274,10043,9827,10259,2


In [78]:
df_result2 = df_result.sort_values(['Restaurant', 'Avg. Price'], ascending=[False, True])
df_final = df_result2

In [79]:
df_final.shape

(58, 18)

In [80]:
df_final.head()

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Ward,Latitude,Longitude,Borough,Total Population,Population Male,Population Female,Sex Ratio,No.of Houses,Area Hectare,Population Density,Avg. Price,SqftPrice Min,SqftPrice Max,Properties
41,Kotla Colony,0.095238,3,67,19.872954,75.32284,C-A,10660,5634,5026,892,2246,73.68,144,2251,2145,2356,2
5,Nandanvan Colony,0.090909,3,10,19.888834,75.300768,NW-A,6129,2843,3286,1156,796,661.06,9,2767,2547,2986,3
8,Pratap nagar,0.090909,3,106,19.85784,75.324355,S-A,7031,3542,3489,985,498,10.2,689,4476,2368,6584,1
53,Garkheda,0.090909,3,93,19.860812,75.346055,S-A,7307,3969,3338,841,1416,7.65,955,4750,500,9000,21
45,Kabra Nagar,0.090909,3,109,19.85556,75.343598,S-A,15509,7854,7655,975,970,21.25,729,9050,7854,10245,5
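
The ranking above sorts on two keys: restaurant frequency in descending order, then average property price in ascending order, so that among neighborhoods with the same frequency the cheaper one comes first. A small illustration with invented rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighborhood": ["P", "Q", "R"],
    "Restaurant": [0.09, 0.09, 0.08],
    "Avg. Price": [4476, 2767, 1969],
})

# Highest restaurant frequency first; ties broken by the lower average price
ranked = toy.sort_values(["Restaurant", "Avg. Price"], ascending=[False, True])
print(ranked["Neighborhood"].tolist())
```

`P` and `Q` tie on frequency, so `Q` ranks first because of its lower price. `R` comes last despite being the cheapest, since price only breaks ties.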


### Let's visualize the cluster that is most suitable to open a new restaurant in the city

In [81]:
# create map
map_result = folium.Map(location=[latitude, longitude], zoom_start=12.5)

# add markers to the map
for lat, lon, poi, ward in zip(df_final['Latitude'], df_final['Longitude'], df_final['Neighborhood'], df_final['Ward']):
    label = folium.Popup(str(poi) +' '+ str(ward), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        #color=blue,
        fill=True,
        #fill_color=yellow,
        fill_opacity=0.7).add_to(map_result)
       
map_result

### Out of all places in the cluster, let's see the top 10 prospective places to open a new restaurant

In [82]:
df_final_10 = df_final.head(10)

# create map
map_result = folium.Map(location=[latitude, longitude], zoom_start=12.5)

# add markers to the map
for lat, lon, poi, ward in zip(df_final_10['Latitude'], df_final_10['Longitude'], df_final_10['Neighborhood'], df_final_10['Ward']):
    label = folium.Popup(str(poi) +' '+ str(ward), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        #color=blue,
        fill=True,
        #fill_color=yellow,
        fill_opacity=0.7).add_to(map_result)
       
map_result

# Thank You !