# Capstone Project - The Battle of the Neighborhoods
### Find the best places to build a cafe in Delhi with the help of Machine Learning

# Note: If the maps do not render on GitHub, view the notebook at this link:
https://nbviewer.jupyter.org/github/daivikbhatia/Data-Science-Projects/blob/master/Delhi-Cafe-Capstone/Final_Capstone.ipynb

# Introduction: The Problem <a name="introduction"></a>
Most of the popular cafes in Delhi are concentrated in one or two parts of the city, so people often have to travel from one corner of the city to another just to hang out at a good cafe. Opening a new cafe in an already crowded area only intensifies the competition and reduces profit. This project identifies the characteristics of the areas where cafes currently cluster, and then looks for areas with similar characteristics but fewer cafes, and therefore less competition, to serve the needs of both customers and owners.

## The Project
#### In this project I used a Machine Learning clustering algorithm, k-means, to find the most suitable places to build a cafe in Delhi.
#### The clusters are formed on the basis of five parameters: neighboring cafes, population of the area, nearby metro stations, nearby markets, and nearby universities.
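
As a rough illustration of the clustering step, the sketch below runs scikit-learn's `KMeans` on a tiny made-up feature table. The area names, the numbers, the column names, the use of `StandardScaler`, and the choice of `n_clusters=2` are all assumptions made for this example; they are not the project's data or settings.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature table: one row per area, one column per parameter.
features = pd.DataFrame(
    {
        "cafes": [12, 11, 1, 0],
        "population": [600_000, 650_000, 120_000, 100_000],
        "metro_stations": [3, 4, 0, 1],
        "markets": [5, 6, 1, 0],
        "universities": [2, 2, 0, 0],
    },
    index=["Area A", "Area B", "Area C", "Area D"],
)

# Scale the columns so that population does not dominate the distances.
scaled = StandardScaler().fit_transform(features)

# n_clusters=2 is an arbitrary choice for this toy example.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
features["cluster"] = kmeans.labels_
print(features["cluster"])
```

Areas that land in the same cluster as the cafe-dense areas but have few cafes themselves would be the candidates of interest.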

## References 
#### https://censusindia.gov.in (for places and population data)
#### Foursquare (for finding nearby venues and cafes)

### Importing required Libraries

In [1]:
import pandas as pd  # for data cleaning
import numpy as np  # for arrays and math operations
pd.set_option("display.max_columns", None)  # show all columns when displaying data frames
pd.set_option("display.max_rows", None)  # show all rows when displaying data frames
from geopy.geocoders import Nominatim  # for geocoding locations
import geocoder  # for geocoding locations
import folium  # for drawing maps
import requests  # for handling HTTP requests
from sklearn.cluster import KMeans  # for k-means clustering
import matplotlib.colors as colors  # for coloring the map markers
import matplotlib.cm as cm  # for building color maps

## 1) Data Cleaning <a name="data_cleaning"></a>

#### Importing the data from an Excel file

In [2]:
df = pd.read_excel("Delhi_data.xls")

In [3]:
df.dropna(inplace=True)

In [4]:
df.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41,Unnamed: 42,Unnamed: 43,Unnamed: 44,Unnamed: 45,Unnamed: 46,Unnamed: 47,Unnamed: 48,Unnamed: 49,Unnamed: 50,Unnamed: 51
4,1,2,Number of Households,Persons,Males,Females,7.0,8.0,9.0,10,11,12,13.0,14.0,15.0,16,17,18,19.0,20,21,22,23.0,24.0,25.0,26,27,28,29,30,31,32.0,33,34,35,36.0,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52.0
5,NCT of Delhi,Total,3435999,16787941,8987326,7800615,21.2081,820.702,867.957,2012454,1075440,937014,11.9875,868.114,871.284,2812309,1488800,1323509,16.752,12737767,7194856,5542911,86.2088,90.9373,80.7581,5587049,4762026,825023,5307329,4562710,744619,31.6139,279720,199316,80404,1.6662,33398,27458,5940,39475,31352,8123,181852,152758,29094,5332324,4550458,781866,11200892,4225300,6975592,66.7199
8,North West District,Total,736253,3656539,1960922,1695617,27.8122,820.251,864.704,449894,241169,208725,12.3038,856.59,865.472,697237,371546,325691,19.0682,2707855,1541952,1165903,84.4451,89.6612,78.4121,1188545,1022419,166126,1135126,983392,151734,31.0437,53419,39027,14392,1.46092,11433,9650,1783,13289,10495,2794,35896,30655,5241,1127927,971619,156308,2467994,938503,1529491,67.4954
11,Narela,Total,160132,809913,439576,370337,61.4946,788.358,842.487,109475,59122,50353,13.5169,828.079,851.68,155299,83528,71771,19.1748,569830,333642,236188,81.3534,87.6958,73.8124,255167,220922,34245,240311,209991,30320,29.6712,14856,10931,3925,1.83427,7205,5987,1218,8494,6515,1979,6440,5256,1184,233028,203164,29864,554746,218654,336092,68.4945
14,Saraswati Vihar,Total,455011,2250816,1202149,1048667,24.8905,832.276,872.327,270717,144679,126038,12.0275,858.166,871.156,434589,230582,204007,19.3081,1696468,959242,737226,85.6759,90.711,79.9049,732316,629154,103162,700787,605930,94857,31.1348,31529,23224,8305,1.40078,3922,3441,481,4054,3397,657,23829,20595,3234,700511,601721,98790,1518500,572995,945505,67.4644


#### Cleaning data frame

In [5]:
df.drop(4, inplace = True)

In [6]:
df.reset_index(inplace = True)

In [7]:
df.drop(["index"],axis = 1, inplace = True)

In [8]:
df.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41,Unnamed: 42,Unnamed: 43,Unnamed: 44,Unnamed: 45,Unnamed: 46,Unnamed: 47,Unnamed: 48,Unnamed: 49,Unnamed: 50,Unnamed: 51
0,NCT of Delhi,Total,3435999,16787941,8987326,7800615,21.2081,820.702,867.957,2012454,1075440,937014,11.9875,868.114,871.284,2812309,1488800,1323509,16.752,12737767,7194856,5542911,86.2088,90.9373,80.7581,5587049,4762026,825023,5307329,4562710,744619,31.6139,279720,199316,80404,1.6662,33398,27458,5940,39475,31352,8123,181852,152758,29094,5332324,4550458,781866,11200892,4225300,6975592,66.7199
1,North West District,Total,736253,3656539,1960922,1695617,27.8122,820.251,864.704,449894,241169,208725,12.3038,856.59,865.472,697237,371546,325691,19.0682,2707855,1541952,1165903,84.4451,89.6612,78.4121,1188545,1022419,166126,1135126,983392,151734,31.0437,53419,39027,14392,1.46092,11433,9650,1783,13289,10495,2794,35896,30655,5241,1127927,971619,156308,2467994,938503,1529491,67.4954
2,Narela,Total,160132,809913,439576,370337,61.4946,788.358,842.487,109475,59122,50353,13.5169,828.079,851.68,155299,83528,71771,19.1748,569830,333642,236188,81.3534,87.6958,73.8124,255167,220922,34245,240311,209991,30320,29.6712,14856,10931,3925,1.83427,7205,5987,1218,8494,6515,1979,6440,5256,1184,233028,203164,29864,554746,218654,336092,68.4945
3,Saraswati Vihar,Total,455011,2250816,1202149,1048667,24.8905,832.276,872.327,270717,144679,126038,12.0275,858.166,871.156,434589,230582,204007,19.3081,1696468,959242,737226,85.6759,90.711,79.9049,732316,629154,103162,700787,605930,94857,31.1348,31529,23224,8305,1.40078,3922,3441,481,4054,3397,657,23829,20595,3234,700511,601721,98790,1518500,572995,945505,67.4644
4,Model Town,Total,121110,595810,319197,276613,6.94349,810.879,866.59,69702,37368,32334,11.6987,880.445,865.286,107349,57436,49913,18.0173,441557,249068,192489,83.929,88.3756,78.7988,201062,172343,28719,194028,167471,26557,32.5654,7034,4872,2162,1.18058,306,222,84,741,583,158,5627,4804,823,194388,166734,27654,394748,146854,247894,66.254


#### Renaming required columns

In [9]:
df.rename(columns={"Unnamed: 2":"Total households",
                  "Unnamed: 3":"Total population",
                   "Unnamed: 4":"Total males",
                  "Unnamed: 5":"Total females",
                  "Unnamed: 0":"Area"}, inplace = True)


In [10]:
df.drop(["Unnamed: 1"],axis = 1,inplace =True)

In [11]:
df.shape

(37, 51)
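
The rename and drop steps above can also be written as a single chain that returns a new frame instead of modifying the original in place. The sketch below uses a small toy frame with only four of the unnamed columns; the values are the Narela and Model Town figures shown in the outputs above, and the variable names `raw` and `tidy` are made up for this example.

```python
import pandas as pd

# Toy frame that mimics the unnamed columns produced by the spreadsheet import.
raw = pd.DataFrame(
    {
        "Unnamed: 0": ["Narela", "Model Town"],
        "Unnamed: 1": ["Total", "Total"],
        "Unnamed: 2": [160132, 121110],
        "Unnamed: 3": [809913, 595810],
    }
)

# rename() and drop() each return a new frame, so they can be chained.
tidy = (
    raw.rename(
        columns={
            "Unnamed: 0": "Area",
            "Unnamed: 2": "Total households",
            "Unnamed: 3": "Total population",
        }
    )
    .drop(columns=["Unnamed: 1"])
)
print(tidy.columns.tolist())
```

Because nothing is modified in place, `raw` stays available if a later step needs the original columns.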

In [12]:
df.head()

Unnamed: 0,Area,Total households,Total population,Total males,Total females,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41,Unnamed: 42,Unnamed: 43,Unnamed: 44,Unnamed: 45,Unnamed: 46,Unnamed: 47,Unnamed: 48,Unnamed: 49,Unnamed: 50,Unnamed: 51
0,NCT of Delhi,3435999,16787941,8987326,7800615,21.2081,820.702,867.957,2012454,1075440,937014,11.9875,868.114,871.284,2812309,1488800,1323509,16.752,12737767,7194856,5542911,86.2088,90.9373,80.7581,5587049,4762026,825023,5307329,4562710,744619,31.6139,279720,199316,80404,1.6662,33398,27458,5940,39475,31352,8123,181852,152758,29094,5332324,4550458,781866,11200892,4225300,6975592,66.7199
1,North West District,736253,3656539,1960922,1695617,27.8122,820.251,864.704,449894,241169,208725,12.3038,856.59,865.472,697237,371546,325691,19.0682,2707855,1541952,1165903,84.4451,89.6612,78.4121,1188545,1022419,166126,1135126,983392,151734,31.0437,53419,39027,14392,1.46092,11433,9650,1783,13289,10495,2794,35896,30655,5241,1127927,971619,156308,2467994,938503,1529491,67.4954
2,Narela,160132,809913,439576,370337,61.4946,788.358,842.487,109475,59122,50353,13.5169,828.079,851.68,155299,83528,71771,19.1748,569830,333642,236188,81.3534,87.6958,73.8124,255167,220922,34245,240311,209991,30320,29.6712,14856,10931,3925,1.83427,7205,5987,1218,8494,6515,1979,6440,5256,1184,233028,203164,29864,554746,218654,336092,68.4945
3,Saraswati Vihar,455011,2250816,1202149,1048667,24.8905,832.276,872.327,270717,144679,126038,12.0275,858.166,871.156,434589,230582,204007,19.3081,1696468,959242,737226,85.6759,90.711,79.9049,732316,629154,103162,700787,605930,94857,31.1348,31529,23224,8305,1.40078,3922,3441,481,4054,3397,657,23829,20595,3234,700511,601721,98790,1518500,572995,945505,67.4644
4,Model Town,121110,595810,319197,276613,6.94349,810.879,866.59,69702,37368,32334,11.6987,880.445,865.286,107349,57436,49913,18.0173,441557,249068,192489,83.929,88.3756,78.7988,201062,172343,28719,194028,167471,26557,32.5654,7034,4872,2162,1.18058,306,222,84,741,583,158,5627,4804,823,194388,166734,27654,394748,146854,247894,66.254


#### Extracting sub divisions from the Area column

In [13]:
delhi_data = []
for i in df["Area"]:
    # District-level rows are marked NaN so they can be dropped later
    if "District" in i:
        delhi_data.append(np.nan)
    else:
        delhi_data.append(i)

In [14]:
delhi_data = pd.DataFrame(delhi_data, columns=["sub division"])

In [15]:
delhi_data.head()

Unnamed: 0,sub division
0,NCT of Delhi
1,
2,Narela
3,Saraswati Vihar
4,Model Town


In [16]:
delhi_data = pd.concat([delhi_data,df], axis = 1)

In [17]:
delhi_data.head()

Unnamed: 0,sub division,Area,Total households,Total population,Total males,Total females,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41,Unnamed: 42,Unnamed: 43,Unnamed: 44,Unnamed: 45,Unnamed: 46,Unnamed: 47,Unnamed: 48,Unnamed: 49,Unnamed: 50,Unnamed: 51
0,NCT of Delhi,NCT of Delhi,3435999,16787941,8987326,7800615,21.2081,820.702,867.957,2012454,1075440,937014,11.9875,868.114,871.284,2812309,1488800,1323509,16.752,12737767,7194856,5542911,86.2088,90.9373,80.7581,5587049,4762026,825023,5307329,4562710,744619,31.6139,279720,199316,80404,1.6662,33398,27458,5940,39475,31352,8123,181852,152758,29094,5332324,4550458,781866,11200892,4225300,6975592,66.7199
1,,North West District,736253,3656539,1960922,1695617,27.8122,820.251,864.704,449894,241169,208725,12.3038,856.59,865.472,697237,371546,325691,19.0682,2707855,1541952,1165903,84.4451,89.6612,78.4121,1188545,1022419,166126,1135126,983392,151734,31.0437,53419,39027,14392,1.46092,11433,9650,1783,13289,10495,2794,35896,30655,5241,1127927,971619,156308,2467994,938503,1529491,67.4954
2,Narela,Narela,160132,809913,439576,370337,61.4946,788.358,842.487,109475,59122,50353,13.5169,828.079,851.68,155299,83528,71771,19.1748,569830,333642,236188,81.3534,87.6958,73.8124,255167,220922,34245,240311,209991,30320,29.6712,14856,10931,3925,1.83427,7205,5987,1218,8494,6515,1979,6440,5256,1184,233028,203164,29864,554746,218654,336092,68.4945
3,Saraswati Vihar,Saraswati Vihar,455011,2250816,1202149,1048667,24.8905,832.276,872.327,270717,144679,126038,12.0275,858.166,871.156,434589,230582,204007,19.3081,1696468,959242,737226,85.6759,90.711,79.9049,732316,629154,103162,700787,605930,94857,31.1348,31529,23224,8305,1.40078,3922,3441,481,4054,3397,657,23829,20595,3234,700511,601721,98790,1518500,572995,945505,67.4644
4,Model Town,Model Town,121110,595810,319197,276613,6.94349,810.879,866.59,69702,37368,32334,11.6987,880.445,865.286,107349,57436,49913,18.0173,441557,249068,192489,83.929,88.3756,78.7988,201062,172343,28719,194028,167471,26557,32.5654,7034,4872,2162,1.18058,306,222,84,741,583,158,5627,4804,823,194388,166734,27654,394748,146854,247894,66.254


In [18]:
delhi_data.dropna(inplace=True)

In [19]:
delhi_data.reset_index(inplace=True)

In [20]:
delhi_data.drop(["index"],axis = 1,inplace=True)
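
The loop, `dropna`, and index reset above amount to filtering out the rows whose name contains "District". A vectorized version of the same idea might look like the sketch below; the toy `areas` frame and the variable names are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the "Area" column.
areas = pd.DataFrame(
    {"Area": ["NCT of Delhi", "North West District", "Narela", "Saraswati Vihar"]}
)

# Keep only rows whose name does not contain "District",
# then renumber the rows from 0 without keeping the old index.
is_district = areas["Area"].str.contains("District", regex=False)
sub_divisions = areas.loc[~is_district].reset_index(drop=True)
print(sub_divisions["Area"].tolist())
```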

#### Forming a data frame of the required data only

In [21]:
pop_data = delhi_data[["sub division","Total households","Total population"]]

In [22]:
pop_data.head()

Unnamed: 0,sub division,Total households,Total population
0,NCT of Delhi,3435999,16787941
1,Narela,160132,809913
2,Saraswati Vihar,455011,2250816
3,Model Town,121110,595810
4,Civil Lines,139116,688616


#### Getting coordinates for sub divisions

In [23]:
def location_(location):
    # Query the ArcGIS geocoder repeatedly until it returns coordinates
    lat_lng_coords = None
    while lat_lng_coords is None:
        g = geocoder.arcgis("{}, New Delhi, India".format(location))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [24]:
coordinates =  [location_(location) for location in pop_data["sub division"].tolist()]
coordinates[0:5]

[[28.63095000000004, 77.21721000000008],
 [28.83979000000005, 77.07696000000004],
 [28.692070000000058, 77.10385000000008],
 [28.705010000000073, 77.18950000000007],
 [28.67671000000007, 77.21767000000006]]
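
The geocoding helper keeps retrying until it gets an answer, which can hang if the service never responds for a given name. One possible variation is to cap the number of attempts. The sketch below is a minimal version of that idea; `geocode_with_retry`, its parameters, and the `fake_lookup` stand-in are hypothetical names introduced here so the example runs without network access.

```python
import time
from typing import Callable, List, Optional


def geocode_with_retry(
    lookup: Callable[[str], Optional[List[float]]],
    query: str,
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Optional[List[float]]:
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)
    return None


# Hypothetical stand-in for a geocoder: fails once, then succeeds.
calls = {"count": 0}


def fake_lookup(query: str) -> Optional[List[float]]:
    calls["count"] += 1
    return None if calls["count"] < 2 else [28.63095, 77.21721]


result = geocode_with_retry(fake_lookup, "Delhi, India")
print(result)
```

With the library used in this notebook, the `lookup` argument could be something like `lambda q: geocoder.arcgis(q).latlng`. The caller then has to decide what to do with a `None` result.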

In [25]:
# making a data frame of the above coordinates
coords = pd.DataFrame(coordinates,columns = ["Latitude","Longitude"])

In [26]:
coords.head()


Unnamed: 0,Latitude,Longitude
0,28.63095,77.21721
1,28.83979,77.07696
2,28.69207,77.10385
3,28.70501,77.1895
4,28.67671,77.21767


#### Adding coordinates to Data Frame

In [27]:
delhi_df = pd.concat([pop_data,coords],axis = 1)

In [28]:
delhi_df.drop(0,inplace=True)

In [29]:
delhi_df.reset_index(inplace=True)

In [30]:
delhi_df.head()

Unnamed: 0,index,sub division,Total households,Total population,Latitude,Longitude
0,1,Narela,160132,809913,28.83979,77.07696
1,2,Saraswati Vihar,455011,2250816,28.69207,77.10385
2,3,Model Town,121110,595810,28.70501,77.1895
3,4,Civil Lines,139116,688616,28.67671,77.21767
4,5,Sadar Bazar,25576,130188,28.59028,77.12014
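
The extra `index` column in the output above is the old row label, which `reset_index()` keeps by default. Passing `drop=True` discards it instead. The toy frame below is an assumption for illustration only.

```python
import pandas as pd

frame = pd.DataFrame({"sub division": ["NCT of Delhi", "Narela", "Model Town"]})

# Drop the first row, then renumber. Without drop=True the old labels
# are kept as a new "index" column; with drop=True they are discarded.
kept = frame.drop(0).reset_index()
clean = frame.drop(0).reset_index(drop=True)

print(kept.columns.tolist())
print(clean.columns.tolist())
```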


In [31]:
# Getting the location of New Delhi
del_loc = location_("delhi")
del_lat = del_loc[0]
del_lng = del_loc[1]

In [32]:
print(del_lat,del_lng)

28.63095000000004 77.21721000000008


#### Forming a map of Delhi with sub divisions as markers

In [33]:
delhi_map = folium.Map(location=[del_lat,del_lng],zoom_start=10)
for lat, lng, name, popu in zip(delhi_df["Latitude"],delhi_df["Longitude"],delhi_df["sub division"],delhi_df["Total population"]):
    label_ = ("{},{} people".format(name, popu))
    label_ = folium.Popup(label_,parse_html=True)
    folium.CircleMarker([lat,lng],
                       radius = 7,
                       popup=label_,
                       color = "Yellow",
                       fill = True,
                       fill_color = "#3186cc",
                       fill_opacity = 0.7,
                       parse_html=False).add_to(delhi_map)
delhi_map


#### Setting up Foursquare API

In [37]:
client_id = "JMTGCDC0QSE4QZP32C1JHCRHTQDIZZYCVZC0DLTPK0PMURZX"
client_secret = "IVLDBLT0VZUVM5SVYZSKSW4RV0MDTHXZX31E3CLSLYOMKK5E"
version =  "20180605"
print(client_id)
print(client_secret)
print(version)


JMTGCDC0QSE4QZP32C1JHCRHTQDIZZYCVZC0DLTPK0PMURZX
IVLDBLT0VZUVM5SVYZSKSW4RV0MDTHXZX31E3CLSLYOMKK5E
20180605


#### Making a function to get up to 100 nearby venues for each sub division within a 4 km radius

In [38]:
def get_nearby_venues(latitude, longitude, names, radius = 4000, limit = 100):
    venue_list = []
    for name, lat, lng in zip(names, latitude, longitude):
        # Build the explore request for this sub division
        url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(client_id,client_secret,version,lat,lng,radius,limit)
        results = requests.get(url).json()["response"]["groups"][0]["items"]
        # One tuple per venue returned for this sub division
        venue_list.append([(
        name,
        lat,
        lng,
        v["venue"]["name"],
        v["venue"]["location"]["lat"],
        v["venue"]["location"]["lng"],
        v["venue"]["categories"][0]["name"]) for v in results])

    # Flatten the per-division lists into one data frame, once, after the loop
    nearby_venues = pd.DataFrame([item for venues in venue_list for item in venues])
    nearby_venues.columns = ["sub division",
                            "division latitude",
                            "division longitude",
                            "venue name",
                            "venue latitude",
                            "venue longitude",
                            "venue category"]
    return nearby_venues
    

#### Fetching nearby venues

In [40]:
delhi_nearby = get_nearby_venues(delhi_df["Latitude"],delhi_df["Longitude"],delhi_df["sub division"])

In [41]:
print(delhi_nearby.shape)
delhi_nearby.head()

(1760, 7)


Unnamed: 0,sub division,division latitude,division longitude,venue name,venue latitude,venue longitude,venue category
0,Narela,28.83979,77.07696,vicky traders,28.84618,77.083427,Furniture / Home Store
1,Narela,28.83979,77.07696,Batra matching center,28.845399,77.089483,Women's Store
2,Narela,28.83979,77.07696,Axis Bank ATM,28.82587,77.08643,ATM
3,Narela,28.83979,77.07696,Axis Bank ATM,28.83872,77.10288,ATM
4,Saraswati Vihar,28.69207,77.10385,Starbucks Punjabi Bagh,28.690264,77.109809,Coffee Shop
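
The venue function reads `response -> groups[0] -> items` and takes the first category of each venue, which would fail on a venue with no category or on a response with no groups. The sketch below shows one way to parse the same fields more defensively. The `sample` payload is hand-written to match the fields the function accesses, not a recorded API response, and `extract_venues` and the `"Unknown"` fallback are hypothetical choices.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical payload shaped like the fields the function reads;
# it is not a recorded API response.
sample = {
    "response": {
        "groups": [
            {
                "items": [
                    {
                        "venue": {
                            "name": "Cafe One",
                            "location": {"lat": 28.69, "lng": 77.11},
                            "categories": [{"name": "Coffee Shop"}],
                        }
                    },
                    {
                        "venue": {
                            "name": "Unlabelled Place",
                            "location": {"lat": 28.70, "lng": 77.10},
                            "categories": [],
                        }
                    },
                ]
            }
        ]
    }
}


def extract_venues(payload: Dict[str, Any]) -> List[Tuple[str, float, float, str]]:
    """Return (name, lat, lng, category) rows, tolerating missing categories."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else "Unknown"
        rows.append(
            (venue["name"], venue["location"]["lat"], venue["location"]["lng"], category)
        )
    return rows


rows = extract_venues(sample)
print(rows)
```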


#### Displaying all the venue categories in Delhi

In [42]:
dtgs = delhi_nearby["venue category"].unique()

In [43]:
venue_type = pd.DataFrame(dtgs)

In [44]:
venue_type.transpose()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148
0,Furniture / Home Store,Women's Store,ATM,Coffee Shop,Indian Restaurant,Donut Shop,Food Truck,Gym / Fitness Center,Pizza Place,Ice Cream Shop,Sandwich Place,Fast Food Restaurant,Department Store,Hotel,Theme Park,Café,Multiplex,Park,Garden Center,Bar,Shopping Mall,Snack Place,Clothing Store,Light Rail Station,Hookah Bar,Chinese Restaurant,Playground,Bakery,Asian Restaurant,Breakfast Spot,Athletics & Sports,Miscellaneous Shop,Tibetan Restaurant,Diner,Dumpling Restaurant,Grocery Store,South Indian Restaurant,Convenience Store,Bus Station,Boutique,Monument / Landmark,Mosque,BBQ Joint,Restaurant,Airport,Multicuisine Indian Restaurant,Arcade,Airport Service,Pub,History Museum,Cafeteria,Australian Restaurant,Historic Site,Airport Terminal,University,Bistro,Big Box Store,Museum,Dessert Shop,Gym,Bank,Lake,Pool,Italian Restaurant,Shop & Service,Track,Food & Drink Shop,Movie Theater,Train Station,Market,Cricket Ground,Hardware Store,Stadium,Paper / Office Supplies Store,Road,Flea Market,Yoga Studio,Palace,Food Court,Smoke Shop,Hindu Temple,North Indian Restaurant,Garden,Electronics Store,Plaza,Lounge,Molecular Gastronomy Restaurant,Deli / Bodega,Spa,Theater,Spiritual Center,Art Gallery,Sculpture Garden,Art Museum,Music Venue,Irani Cafe,Concert Hall,Mediterranean Restaurant,French Restaurant,Japanese Restaurant,Bookstore,Hotel Bar,Performing Arts Venue,Karnataka Restaurant,Cocktail Bar,Northeast Indian Restaurant,Tea Room,Vietnamese Restaurant,Comfort Food Restaurant,Nightclub,Planetarium,Modern European Restaurant,Racetrack,Golf Course,Gastropub,Hockey Arena,Beer Garden,Portuguese Restaurant,Vegetarian / Vegan Restaurant,Sports Bar,Fried Chicken Joint,High School,American Restaurant,Indian Sweet Shop,Metro Station,Liquor Store,Speakeasy,Gourmet Shop,Tex-Mex Restaurant,Mexican Restaurant,Burger Joint,Sushi Restaurant,Toy / Game Store,Jazz Club,Korean Restaurant,Other Nightlife,Burmese Restaurant,Thai Restaurant,English Restaurant,Indie Movie Theater,Mughlai Restaurant,Neighborhood,Temple,Bengali Restaurant,Food,Trail,Middle Eastern Restaurant,Soccer Stadium,Salad Place


In [45]:
#total number of different venue categories
dtgs.shape

(149,)
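
Besides listing the distinct categories, it can help to see how often each one occurs. The sketch below uses pandas' `nunique` and `value_counts` on a small toy list; the six rows are made up, with category names borrowed from the table above.

```python
import pandas as pd

# Toy venue list; the category names are taken from the table above.
venues = pd.DataFrame(
    {"venue category": ["ATM", "ATM", "Coffee Shop", "Café", "Coffee Shop", "ATM"]}
)

# Number of distinct categories, and how many venues fall in each.
n_categories = venues["venue category"].nunique()
counts = venues["venue category"].value_counts()
print(n_categories)
print(counts)
```

Note that "Café" and "Coffee Shop" are separate categories in the data, so a cafe count may need to combine them.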

## 2) Data Analysis <a name="analysis"></a>

#### One-hot encoding the venue categories: one column per category, one row per venue

In [46]:
delhi_onehot = pd.get_dummies(delhi_nearby["venue category"])

In [47]:
delhi_onehot.head()

Unnamed: 0,ATM,Airport,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bakery,Bank,Bar,Beer Garden,Bengali Restaurant,Big Box Store,Bistro,Bookstore,Boutique,Breakfast Spot,Burger Joint,Burmese Restaurant,Bus Station,Cafeteria,Café,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,English Restaurant,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gastropub,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,High School,Hindu Temple,Historic Site,History Museum,Hockey Arena,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jazz Club,Karnataka Restaurant,Korean Restaurant,Lake,Light Rail Station,Liquor Store,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Mosque,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Neighborhood,Nightclub,North Indian Restaurant,Northeast Indian Restaurant,Other Nightlife,Palace,Paper / Office Supplies Store,Park,Performing Arts Venue,Pizza Place,Planetarium,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Racetrack,Restaurant,Road,Salad Place,Sandwich Place,Sculpture Garden,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Stadium,South Indian Restaurant,Spa,Speakeasy,Spiritual Center,Sports Bar,Stadium,Sushi Restaurant,Tea Room,Temple,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [48]:
delhi_onehot.shape


(1760, 149)
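
To make the one-hot step concrete, the sketch below applies `pd.get_dummies` to three toy venues. The rows are made up (the category names come from the output above), and `dtype=int` is an explicit choice so the result is 0/1 on any pandas version, since pandas 2.x defaults to booleans.

```python
import pandas as pd

venues = pd.DataFrame(
    {
        "sub division": ["Narela", "Narela", "Saraswati Vihar"],
        "venue category": ["ATM", "Women's Store", "Coffee Shop"],
    }
)

# One column per category, one row per venue; exactly one 1 in each row.
onehot = pd.get_dummies(venues["venue category"], dtype=int)
print(onehot)
```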

#### Adding the sub division column to the one-hot data frame

In [49]:
sub_division = delhi_nearby["sub division"].values

In [50]:
sub_division.shape

(1760,)

In [52]:
delhi_onehot.insert(0,"sub division",sub_division)

In [53]:
delhi_onehot.head()

Unnamed: 0,sub division,ATM,Airport,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bakery,Bank,Bar,Beer Garden,Bengali Restaurant,Big Box Store,Bistro,Bookstore,Boutique,Breakfast Spot,Burger Joint,Burmese Restaurant,Bus Station,Cafeteria,Café,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,English Restaurant,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gastropub,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,High School,Hindu Temple,Historic Site,History Museum,Hockey Arena,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jazz Club,Karnataka Restaurant,Korean Restaurant,Lake,Light Rail Station,Liquor Store,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Mosque,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Neighborhood,Nightclub,North Indian Restaurant,Northeast Indian Restaurant,Other Nightlife,Palace,Paper / Office Supplies Store,Park,Performing Arts Venue,Pizza Place,Planetarium,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Racetrack,Restaurant,Road,Salad Place,Sandwich Place,Sculpture Garden,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Stadium,South Indian Restaurant,Spa,Speakeasy,Spiritual Center,Sports Bar,Stadium,Sushi Restaurant,Tea Room,Temple,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Women's Store,Yoga Studio
0,Narela,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Narela,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
2,Narela,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Narela,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Saraswati Vihar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Forming a new data frame with the average occurrence of each venue category in each sub division

In [54]:
delhi_grouped = delhi_onehot.groupby("sub division").mean().reset_index()

In [55]:
delhi_grouped.shape

(27, 150)
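
Because the one-hot columns contain only 0 and 1, the mean of a column within a sub division is the fraction of that sub division's venues falling in that category. A minimal sketch of the idea on a toy table (the sub division names `A` and `B` and all values are illustrative, not the notebook's data):

```python
import pandas as pd

# Toy one-hot table: one row per venue, one column per venue category.
# Names and values are made up for illustration only.
toy_onehot = pd.DataFrame({
    "sub division": ["A", "A", "A", "A", "B", "B"],
    "Café":         [1, 0, 0, 1, 0, 0],
    "Market":       [0, 1, 0, 0, 0, 1],
    "ATM":          [0, 0, 1, 0, 1, 0],
})

# The mean of a 0/1 column is the share of venues in that category.
toy_grouped = toy_onehot.groupby("sub division").mean().reset_index()
print(toy_grouped)
```

Each row of the grouped frame sums to 1 across the category columns, so the values are best read as shares of venues rather than counts.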

In [56]:
delhi_grouped.head()

Unnamed: 0,sub division,ATM,Airport,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bakery,Bank,Bar,Beer Garden,Bengali Restaurant,Big Box Store,Bistro,Bookstore,Boutique,Breakfast Spot,Burger Joint,Burmese Restaurant,Bus Station,Cafeteria,Café,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,English Restaurant,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gastropub,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,High School,Hindu Temple,Historic Site,History Museum,Hockey Arena,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jazz Club,Karnataka Restaurant,Korean Restaurant,Lake,Light Rail Station,Liquor Store,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Mosque,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Neighborhood,Nightclub,North Indian Restaurant,Northeast Indian Restaurant,Other Nightlife,Palace,Paper / Office Supplies Store,Park,Performing Arts Venue,Pizza Place,Planetarium,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Racetrack,Restaurant,Road,Salad Place,Sandwich Place,Sculpture Garden,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Stadium,South Indian Restaurant,Spa,Speakeasy,Spiritual Center,Sports Bar,Stadium,Sushi Restaurant,Tea Room,Temple,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Women's Store,Yoga Studio
0,Chanakya Puri,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.07,0.05,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.05,0.02,0.0,0.14,0.0,0.0,0.01,0.02,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.01,0.0,0.01,0.03,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0
1,Civil Lines,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.0,0.075472,0.075472,0.0,0.0,0.075472,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.037736,0.018868,0.0,0.0,0.075472,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075472,0.0,0.0,0.075472,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09434,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.075472,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Connaught Place,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.01,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.08,0.02,0.01,0.0,0.02,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.06,0.01,0.01,0.19,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.03,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Darya Ganj,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.02,0.0,0.0,0.01,0.03,0.0,0.01,0.01,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.09,0.02,0.01,0.0,0.02,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.02,0.01,0.0,0.06,0.0,0.01,0.16,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Defence Colony,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.03,0.0,0.03,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.08,0.03,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.01,0.04,0.01,0.01,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.05,0.01,0.01,0.11,0.0,0.01,0.01,0.04,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.05,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Forming a data frame with Venues related to Cafe only

In [57]:
delhi_cafe = delhi_grouped[["sub division","Café","Metro Station","University","Market"]]

In [58]:
delhi_cafe.head()


Unnamed: 0,sub division,Café,Metro Station,University,Market
0,Chanakya Puri,0.07,0.0,0.01,0.02
1,Civil Lines,0.075472,0.0,0.0,0.0
2,Connaught Place,0.08,0.0,0.0,0.01
3,Darya Ganj,0.09,0.0,0.0,0.01
4,Defence Colony,0.08,0.0,0.0,0.05


#### Adding Normalized population data to the data frame

In [59]:
pop_data = pop_data.drop(0)

In [60]:
total_pop = pop_data.sort_values("sub division")

In [61]:
total_pop.reset_index(inplace=True)

In [62]:
total_pop.head()

Unnamed: 0,index,sub division,Total households,Total population
0,14,Chanakya Puri,15074,61382
1,4,Civil Lines,139116,688616
2,13,Connaught Place,6814,28228
3,16,Darya Ganj,53125,271108
4,26,Defence Colony,137677,637775


In [63]:
total_pop = total_pop.drop(["Total households","sub division","index"],axis = 1)

In [64]:
total_pop.head()

Unnamed: 0,Total population
0,61382
1,688616
2,28228
3,271108
4,637775


In [65]:
total_pop = total_pop.astype(float)

In [66]:
#divide by 100 times the maximum so the most populous sub division maps to 0.01
total_pop = total_pop/(100*total_pop.max())

In [67]:
total_pop.head()

Unnamed: 0,Total population
0,0.000273
1,0.003059
2,0.000125
3,0.001204
4,0.002834
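
The scaling step can be checked by hand using population figures that appear in the tables of this notebook. The sketch below assumes, as the normalized value of 0.01 for Saraswati Vihar suggests, that Saraswati Vihar (2,250,816) is the most populous sub division; the variable names are illustrative.

```python
import pandas as pd

# Population figures copied from the tables shown in this notebook.
pop = pd.DataFrame(
    {"Total population": [61382.0, 688616.0, 28228.0, 2250816.0]},
    index=["Chanakya Puri", "Civil Lines", "Connaught Place", "Saraswati Vihar"],
)

# Same operation as in the notebook: divide by 100 times the column maximum.
scaled = pop / (100 * pop.max())
print(scaled.round(6))
```

For Chanakya Puri this gives 61382 / (100 × 2250816) ≈ 0.000273, matching the value displayed above.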


In [68]:
delhi_cafe = pd.concat([delhi_cafe,total_pop],axis = 1)

In [69]:
delhi_cafe.head()

Unnamed: 0,sub division,Café,Metro Station,University,Market,Total population
0,Chanakya Puri,0.07,0.0,0.01,0.02,0.000273
1,Civil Lines,0.075472,0.0,0.0,0.0,0.003059
2,Connaught Place,0.08,0.0,0.0,0.01,0.000125
3,Darya Ganj,0.09,0.0,0.0,0.01,0.001204
4,Defence Colony,0.08,0.0,0.0,0.05,0.002834
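
The `pd.concat(..., axis=1)` call pairs rows by index position, so it relies on both frames being sorted by sub division in the same order. A hedged alternative, not the notebook's method, is to keep the `sub division` column and merge on it; the sketch below uses a few values from the tables above and illustrative variable names.

```python
import pandas as pd

cafe = pd.DataFrame({
    "sub division": ["Chanakya Puri", "Civil Lines", "Connaught Place"],
    "Café": [0.07, 0.075472, 0.08],
})

# Deliberately listed in a different row order to show that order no longer matters.
pop = pd.DataFrame({
    "sub division": ["Connaught Place", "Chanakya Puri", "Civil Lines"],
    "Total population": [0.000125, 0.000273, 0.003059],
})

# Merge on the key column; validate raises if a sub division appears twice on either side.
merged = cafe.merge(pop, on="sub division", how="left", validate="one_to_one")
print(merged)
```

With a key-based merge, a missing or misspelled sub division shows up as a `NaN` instead of silently shifting values to the wrong row.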


## 3) Clustering<a name="clustering"></a>

#### Making a data frame to fit with K-Means

In [70]:
cluster_data = delhi_cafe.drop(["sub division"],axis = 1)

In [71]:
cluster_data.head()

Unnamed: 0,Café,Metro Station,University,Market,Total population
0,0.07,0.0,0.01,0.02,0.000273
1,0.075472,0.0,0.0,0.0,0.003059
2,0.08,0.0,0.0,0.01,0.000125
3,0.09,0.0,0.0,0.01,0.001204
4,0.08,0.0,0.0,0.05,0.002834


#### Building the K-Means model and fitting it to the data set

In [72]:
k = 5
k_means = KMeans(n_clusters=k,random_state=0)
k_means.fit(cluster_data)

KMeans(n_clusters=5, random_state=0)

In [73]:
#the cluster array
k_means.labels_

array([0, 0, 0, 0, 4, 3, 2, 2, 4, 0, 0, 0, 1, 2, 0, 0, 2, 0, 2, 2, 0, 2,
       0, 2, 0, 0, 2])
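
The value `k = 5` is set directly in the cell above. One common way to sanity-check such a choice is to compare the within-cluster sum of squares (`inertia_`) over a range of cluster counts and look for the point where it stops dropping sharply. The sketch below is only an illustration on synthetic data: `X` is a made-up stand-in for `cluster_data`, with 27 rows and 5 features to mirror its shape.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for cluster_data: three tight blobs of 9 points each in 5 dimensions.
centers = np.array([[0.0] * 5, [1.0] * 5, [2.0] * 5])
X = np.vstack([c + 0.05 * rng.standard_normal((9, 5)) for c in centers])

# Fit K-Means for several cluster counts and record the inertia of each fit.
inertias = {}
for n_clusters in range(1, 7):
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    inertias[n_clusters] = model.inertia_

for n_clusters, inertia in inertias.items():
    print(n_clusters, round(inertia, 4))
```

On the real data the same loop would run over `cluster_data`; with only 27 sub divisions the curve may not show a clear elbow, so the result is a guide, not a rule.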

#### Data cleaning to form Final Data Frame

In [74]:
delhi_df.head()

Unnamed: 0,index,sub division,Total households,Total population,Latitude,Longitude
0,1,Narela,160132,809913,28.83979,77.07696
1,2,Saraswati Vihar,455011,2250816,28.69207,77.10385
2,3,Model Town,121110,595810,28.70501,77.1895
3,4,Civil Lines,139116,688616,28.67671,77.21767
4,5,Sadar Bazar,25576,130188,28.59028,77.12014


In [75]:
delhi_df = delhi_df.sort_values("sub division")

In [76]:
delhi_df.head()

Unnamed: 0,index,sub division,Total households,Total population,Latitude,Longitude
13,14,Chanakya Puri,15074,61382,28.59506,77.18573
3,4,Civil Lines,139116,688616,28.67671,77.21767
12,13,Connaught Place,6814,28228,28.63394,77.21968
15,16,Darya Ganj,53125,271108,28.62832,77.24727
25,26,Defence Colony,137677,637775,28.57298,77.23357


In [77]:
coords = delhi_df[["Latitude","Longitude"]]

In [78]:
coords.reset_index(drop=True,inplace=True)

#### Forming Final Data Frame with Clusters

In [79]:
delhi_cafe.insert(0,"Clusters",k_means.labels_)

In [80]:
delhi_cafe = pd.concat([delhi_cafe,coords],axis=1)

In [81]:
delhi_cafe

Unnamed: 0,Clusters,sub division,Café,Metro Station,University,Market,Total population,Latitude,Longitude
0,0,Chanakya Puri,0.07,0.0,0.01,0.02,0.000273,28.59506,77.18573
1,0,Civil Lines,0.075472,0.0,0.0,0.0,0.003059,28.67671,77.21767
2,0,Connaught Place,0.08,0.0,0.0,0.01,0.000125,28.63394,77.21968
3,0,Darya Ganj,0.09,0.0,0.0,0.01,0.001204,28.62832,77.24727
4,4,Defence Colony,0.08,0.0,0.0,0.05,0.002834,28.57298,77.23357
5,3,Delhi Cantonment,0.142857,0.0,0.035714,0.0,0.001271,28.59151,77.12945
6,2,Gandhi Nagar,0.046875,0.0,0.0,0.03125,0.001756,28.66091,77.26432
7,2,Hauz Khas,0.06,0.0,0.0,0.02,0.00547,28.55109,77.20399
8,4,Kalkaji,0.07,0.0,0.0,0.06,0.003834,28.53662,77.26094
9,0,Karol Bagh,0.082474,0.0,0.0,0.010309,0.000607,28.65045,77.18873


#### Displaying Clusters on map

In [82]:
import matplotlib.cm as cm #for colormaps used to colour the clusters

#one distinct hex colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

#del_lat and del_lng are assumed to hold the Delhi coordinates defined earlier in the notebook
delhi_cluster_map = folium.Map([del_lat,del_lng],zoom_start=10.5)
for lat,lng,name,clus in zip(delhi_cafe["Latitude"],delhi_cafe["Longitude"],delhi_cafe["sub division"],delhi_cafe["Clusters"]):
    label = "name: {},cluster: {}".format(name,clus)
    label = folium.Popup(label,parse_html=True)
    #index the colour list directly by cluster label (0 to k-1)
    folium.CircleMarker([lat,lng],
                       radius=5,
                       popup=label,
                       fill=True,
                       fill_color=rainbow[clus],
                       color=rainbow[clus],
                       fill_opacity=0.7
                       ).add_to(delhi_cluster_map)


In [83]:
delhi_cluster_map
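
The marker colours on the map come from sampling a colormap at `k` evenly spaced points and converting each sample to a hex string. The sketch below shows that lookup in isolation; it uses `matplotlib.colormaps`, which assumes matplotlib 3.5 or newer, and the name `cluster_colour` is illustrative.

```python
import numpy as np
import matplotlib
from matplotlib import colors

k = 5  # number of clusters, as in the notebook

# Sample the rainbow colormap at k evenly spaced points and convert to hex strings.
cmap = matplotlib.colormaps["rainbow"]
palette = [colors.rgb2hex(c) for c in cmap(np.linspace(0, 1, k))]

# Map each cluster label (0 to k-1) to its own colour.
cluster_colour = {label: palette[label] for label in range(k)}
print(cluster_colour)
```

Printing the mapping next to the map makes it easier to tell which marker colour belongs to which cluster.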

## 4) Exploring Clusters <a name="exploring_clusters"></a>

### Cluster 0

In [84]:
delhi_cafe.loc[delhi_cafe["Clusters"]==0,delhi_cafe.columns[list(range(0,7))]]

Unnamed: 0,Clusters,sub division,Café,Metro Station,University,Market,Total population
0,0,Chanakya Puri,0.07,0.0,0.01,0.02,0.000273
1,0,Civil Lines,0.075472,0.0,0.0,0.0,0.003059
2,0,Connaught Place,0.08,0.0,0.0,0.01,0.000125
3,0,Darya Ganj,0.09,0.0,0.0,0.01,0.001204
9,0,Karol Bagh,0.082474,0.0,0.0,0.010309,0.000607
10,0,Kotwali,0.08,0.0,0.04,0.0,0.000307
11,0,Model Town,0.069444,0.0,0.0,0.0,0.002647
14,0,Pahar Ganj,0.06,0.0,0.0,0.0,0.000776
15,0,Parliament Street,0.1,0.0,0.0,0.01,0.000233
17,0,Preet Vihar,0.090909,0.0,0.0,0.015152,0.004736


### Cluster 1

In [85]:
delhi_cafe.loc[delhi_cafe["Clusters"]==1,delhi_cafe.columns[list(range(0,7))]]

Unnamed: 0,Clusters,sub division,Café,Metro Station,University,Market,Total population
12,1,Najafgarh,0.0,0.25,0.0,0.0,0.006065


### Cluster 2

In [86]:
delhi_cafe.loc[delhi_cafe["Clusters"]==2,delhi_cafe.columns[list(range(0,7))]]

Unnamed: 0,Clusters,sub division,Café,Metro Station,University,Market,Total population
6,2,Gandhi Nagar,0.046875,0.0,0.0,0.03125,0.001756
7,2,Hauz Khas,0.06,0.0,0.0,0.02,0.00547
13,2,Narela,0.0,0.0,0.0,0.0,0.003598
16,2,Patel Nagar,0.052632,0.0,0.0,0.013158,0.005608
18,2,Punjabi Bagh,0.032967,0.010989,0.0,0.010989,0.003552
19,2,Rajouri Garden,0.056818,0.022727,0.0,0.022727,0.00214
21,2,Saraswati Vihar,0.022222,0.0,0.0,0.0,0.01
23,2,Seema Puri,0.057143,0.0,0.0,0.0,0.002399
26,2,Vivek Vihar,0.040816,0.0,0.0,0.0,0.001101


### Cluster 3

In [87]:
delhi_cafe.loc[delhi_cafe["Clusters"]==3,delhi_cafe.columns[list(range(0,7))]]

Unnamed: 0,Clusters,sub division,Café,Metro Station,University,Market,Total population
5,3,Delhi Cantonment,0.142857,0.0,0.035714,0.0,0.001271


### Cluster 4

In [88]:
delhi_cafe.loc[delhi_cafe["Clusters"]==4,delhi_cafe.columns[list(range(0,7))]]

Unnamed: 0,Clusters,sub division,Café,Metro Station,University,Market,Total population
4,4,Defence Colony,0.08,0.0,0.0,0.05,0.002834
8,4,Kalkaji,0.07,0.0,0.0,0.06,0.003834
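
Comparing clusters row by row is easier with a per-cluster average. The sketch below computes one for Clusters 1, 3 and 4 only, using the rows displayed above for those clusters; on the full data the same `groupby` would run on `delhi_cafe`. The names `rows` and `summary` are illustrative.

```python
import pandas as pd

# Rows copied from the Cluster 1, 3 and 4 tables shown above.
rows = pd.DataFrame(
    [
        (1, "Najafgarh",        0.0,      0.25, 0.0,      0.0,  0.006065),
        (3, "Delhi Cantonment", 0.142857, 0.0,  0.035714, 0.0,  0.001271),
        (4, "Defence Colony",   0.08,     0.0,  0.0,      0.05, 0.002834),
        (4, "Kalkaji",          0.07,     0.0,  0.0,      0.06, 0.003834),
    ],
    columns=["Clusters", "sub division", "Café", "Metro Station",
             "University", "Market", "Total population"],
)

# Average each feature within a cluster and record how many sub divisions it holds.
summary = rows.drop(columns="sub division").groupby("Clusters").mean()
summary["members"] = rows.groupby("Clusters").size()
print(summary.round(4))
```

Among these three clusters, Cluster 3 has the highest cafe share and Cluster 4 the highest market share (0.055 on average).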


## 5) Observations<a name="observations"></a>

We observed that Cluster 2 has the lowest share of existing cafes among the multi-member clusters and includes some of the most populous sub divisions, along with a reasonable share of nearby markets. This makes the sub divisions in Cluster 2 highly recommended for opening a new cafe.

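Cluster 0 has a high share of existing cafes (about 0.06 to 0.10 of venues in the rows shown), which we attribute to nearby markets, though the market share in the rows shown is modest (at most about 0.02). Opening a cafe in the sub divisions of this cluster would mean facing fierce competition.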

Cluster 4 has the highest share of nearby markets and, correspondingly, a high share of existing cafes. Opening a cafe here would also mean tough competition.

Cluster 1 is an exception: it has no nearby cafes, a large population and a high share of nearby metro stations.

Cluster 3 is an exception too. It has the highest share of cafes but, surprisingly, no nearby markets. This is likely because Cluster 3 also has the highest share of nearby universities, making it a hub for students to hang out. We observed a similar trend in the Kotwali sub division of Cluster 0.


## 6) Result<a name="result"></a>

Our analysis shows the share of cafes, nearby markets, nearby metro stations and nearby universities, together with the population, of each sub division, and then groups similar sub divisions into clusters whose locations are shown on the map. It also makes it easier to choose a sub division in which to open a cafe with the best location advantage, and highlights the sub divisions to avoid.

## 7) Conclusion<a name="conclusion"></a>

From the above observations we conclude that Cluster 2 and Cluster 1 are best suited for opening a new cafe. Opening a cafe in a suitable sub division of these clusters would mean less competition while still having the capability to attract a good number of customers.

Cluster 0 has high competition, but if you are willing to accept competition then Cluster 4 is a better choice than Cluster 0. Cluster 4 has a similar average level of competition to Cluster 0 but a much larger share of nearby markets, which should attract more customers.



# Thank you!