# Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods

## Introduction

Mumbai (formerly Bombay) is a major city in India and the country's most populous, with an estimated population of 12.4 million as of 2011. It is the financial, commercial and entertainment capital of India. Source: https://en.wikipedia.org/wiki/Mumbai

Objective:<br> We intend to open a new pizza place in Mumbai. The city has many areas, which we classify by Pin Code. We want to find the area most suitable for opening a pizza place: one with relatively little competition and a good number of potential customers.

Target Audience:<br> Business owners who are interested in starting a pizza shop in Mumbai, and analysts who want to understand the distribution of pizza shops in Mumbai.

## Data

The following data will be required for the research:
* List of Pin Codes in Mumbai: This data can be obtained from the website 'Maps of India'. url: https://www.mapsofindia.com/pincode/india/maharashtra/mumbai (the code below scrapes https://finkode.com/mh/mumbai.html).
* The geo coordinates (Latitude and Longitude) for these areas can be obtained from a geocoding API (the code below uses Nominatim through geopy). This data will help in plotting the areas on the map.
* From the Foursquare API we will need information regarding
    * Pizza places in Mumbai based on area
    * Schools/Colleges/Universities in Mumbai based on area
    * Offices in Mumbai based on area
* From the Foursquare data we will have to extract
    * Venue
    * Venue Latitude
    * Venue Longitude
    * Venue Category

## Methodology 

Initially we will require a list of Pincodes for Mumbai. We will extract them from a website. We will have to remove any duplicates.<br>
*-> We have 89 Pincodes*



We will design a metric called 'Pizza shop metric'. Each locality will have a value corresponding to this metric. 

Steps:
* Count the venues of each type in a particular area.
* Multiply each count by the multiplier for that venue type.
* Add the weighted counts of all venue types to get the metric for the area.

Multipliers:<br>
* -1 : Pizza places in Mumbai based on area <br>
* +1 : Schools/Colleges/Universities in Mumbai based on area<br>
* +2 : Offices in Mumbai based on area<br>
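
The steps and multipliers above can be sketched in a few lines. This is a minimal illustration, not the notebook's implementation: the function name `pizza_shop_metric` and the example counts are hypothetical, chosen only to show the arithmetic.

```python
# Multipliers from the list above; schools and colleges share the +1 weight
MULTIPLIERS = {'PizzaPlace': -1, 'School': 1, 'College': 1, 'Office': 2}

def pizza_shop_metric(counts):
    """Weighted sum of venue counts for one area."""
    return sum(MULTIPLIERS[kind] * n for kind, n in counts.items())

# Hypothetical venue counts for a single area (illustration only)
example_counts = {'PizzaPlace': 2, 'School': 3, 'College': 1, 'Office': 5}
example_metric = pizza_shop_metric(example_counts)
print(example_metric)  # -2 + 3 + 1 + 10 = 12
```

A higher value suggests more potential customers relative to existing pizza places, under the weights chosen here.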



#### Import Libraries

In [1]:
#Import required libraries
import pandas as pd
import numpy as np
import requests
from geopy import geocoders
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library
from sklearn.cluster import KMeans
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
print('Libraries imported')

Libraries imported


#### Obtain data of Pincodes

In [2]:
#This website contains Pin codes of various areas in Mumbai
website_url = requests.get("https://finkode.com/mh/mumbai.html").text
soup = BeautifulSoup(website_url,'lxml')
My_table_mumbai = soup.find('table',{'class':'plist'})
A=[]
B=[]
C=[]
D=[]
#Load the data
for row in My_table_mumbai.findAll("tr"):
    cells = row.findAll("td")
    if cells !=[]:
        A.append(cells[0].find(text=True))
        B.append(cells[2].find(text=True))
df=pd.DataFrame(A,columns=['Area'])
df['Pin Code']=B
#df = df.iloc[1:]
#df = df.ix[df['Pin Code'] != 'Pincode ']
df.drop_duplicates(subset ='Pin Code', keep = 'first', inplace = True)
#df = df.groupby('Pin Code').Area.agg([('Area', ', '.join)])
print('Dataframe containing areas and Pin codes')
print(df.shape)
df.head()

Dataframe containing areas and Pin codes
(89, 2)


Unnamed: 0,Area,Pin Code
0,A I Staff Colony S.O,400029
1,Aareymilk Colony S.O,400065
2,Agripada S.O,400011
3,Ambewadi S.O (Mumbai),400004
4,Andheri East S.O,400069


#### Obtain coordinates of these Pin Codes

In [3]:
C=[]
D=[]
geolocator = Nominatim(user_agent="specify_your_app_name_here")
for index, row in df.iterrows():
    
    try:
        location = geolocator.geocode(row['Pin Code'])
        C.append(location.latitude)
        D.append(location.longitude)
    except AttributeError:
        C.append('No data found')
        D.append('No data found') 
    
df['Latitude']=C
df['Longitude']=D

#Remove 'No data found' fields
#df = df.ix[df['Latitude'] != 'No data found']

print('Dataframe containing areas, Pin codes and coordinates')
print(df.shape)
df.head()

Dataframe containing areas, Pin codes and coordinates
(89, 4)


Unnamed: 0,Area,Pin Code,Latitude,Longitude
0,A I Staff Colony S.O,400029,48.52507,44.586426
1,Aareymilk Colony S.O,400065,19.16689,72.854364
2,Agripada S.O,400011,18.983293,72.826819
3,Ambewadi S.O (Mumbai),400004,18.954329,72.82173
4,Andheri East S.O,400069,19.119432,72.851426
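
The first row above deserves a check: latitude 48.52507 and longitude 44.586426 are far from the other rows, which all fall near 19° N, 72.8° E, and the venues fetched for this row later in the notebook have Russian names. A bare Pin Code is probably ambiguous to the geocoder. One possible guard, sketched below under assumptions, is to qualify the query with the city and country and to reject coordinates outside a rough bounding box. The helper names and the box limits are hypothetical approximations, not values from the original analysis.

```python
# Rough bounding box around Mumbai (assumed, approximate values)
MUMBAI_BOUNDS = {'lat_min': 18.85, 'lat_max': 19.35,
                 'lng_min': 72.70, 'lng_max': 73.10}

def qualified_query(pin_code):
    """Build a geocoder query that names the city and country explicitly."""
    return '{}, Mumbai, India'.format(str(pin_code).strip())

def in_mumbai(lat, lng, bounds=MUMBAI_BOUNDS):
    """Return True when the coordinates fall inside the rough box."""
    return (bounds['lat_min'] <= lat <= bounds['lat_max'] and
            bounds['lng_min'] <= lng <= bounds['lng_max'])

print(qualified_query('400029'))        # 400029, Mumbai, India
print(in_mumbai(48.52507, 44.586426))   # False
print(in_mumbai(19.16689, 72.854364))   # True
```

The string from `qualified_query` could be passed to `geolocator.geocode` in place of the bare Pin Code, and rows that fail `in_mumbai` could be dropped or reviewed before querying Foursquare.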


#### Plot the map of Mumbai

##### Plotting map of Mumbai centered around its coordinates

In [7]:
#Obtain the coordinates of Mumbai
address = 'Mumbai, India'
geolocator = Nominatim(user_agent="specify_your_app_name_here")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Mumbai using latitude and longitude values
map_mumbai = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, area in zip(df['Latitude'], df['Longitude'], df['Area']):
    label = '{}'.format(area)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_mumbai)
    
#Display the map of Mumbai
print ('Map of Mumbai with areas mapped against Pin Codes')
map_mumbai

Map of Mumbai with areas mapped against Pin Codes


##### Function to add markers 

In [8]:
# function to add markers for given venues to map
def addToMap(df, color, existingMap):
    for lat, lng, area, venue, venueCat in zip(df['Venue Latitude'], df['Venue Longitude'], df['Area'], df['Venue'], df['Venue Category']):
        label = '{} ({}) - {}'.format(venue, venueCat, area)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7).add_to(existingMap)

#### Get nearby venues using Foursquare API

##### Function to get nearby venues

In [9]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # Foursquare ID (placeholder; keep real credentials out of the notebook)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version

def getNearbyVenues(names, latitudes, longitudes, categoryID='', radius=1000):
    LIMIT = 100 # limit of number of venues returned by Foursquare API
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # build the query parameters for the API request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT}

        # restrict the search to one category when an ID is given
        if categoryID != '':
            params['categoryId'] = categoryID

        # make the GET request
        url = 'https://api.foursquare.com/v2/venues/explore'
        results = requests.get(url, params=params).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])


    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Area', 
                  'Area Latitude', 
                  'Area Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
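
For reference, the list comprehension in `getNearbyVenues` flattens a nested JSON payload. The sketch below runs the same extraction on a hand-made dictionary that mimics the shape the function indexes (`response` → `groups[0]` → `items` → `venue`). The payload values are invented for illustration, and the `categories` guard is an added assumption for venues that might come back without a category.

```python
# Hand-made payload mimicking the structure indexed by getNearbyVenues
sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Example Pizza',
                   'location': {'lat': 19.10, 'lng': 72.85},
                   'categories': [{'name': 'Pizza Place'}]}},
        {'venue': {'name': 'Unlabelled Venue',
                   'location': {'lat': 19.11, 'lng': 72.86},
                   'categories': []}},
    ]}]}
}

def flatten_items(area, area_lat, area_lng, payload):
    """Turn one payload into rows of (area, area coords, venue, venue coords, category)."""
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a label instead of raising IndexError on an empty list
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((area, area_lat, area_lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

rows = flatten_items('Example Area', 19.1, 72.85, sample_payload)
for row in rows:
    print(row)
```

Each tuple lines up with the seven column names assigned to `nearby_venues`.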

##### Let's obtain pizza places in the area

In [10]:
# 4bf58dd8d48988d1ca941735 - Category ID for Pizza Place
pizza_venues = getNearbyVenues(names=df['Area'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'],
                                   categoryID='4bf58dd8d48988d1ca941735'
                                  )
print('Pizza venues containing areas, Pin codes and coordinates')
print(pizza_venues.shape)
print (pizza_venues['Venue Category'].unique())
pizza_venues.head()

Pizza venues containing areas, Pin codes and coordinates
(437, 7)
['Italian Restaurant' 'Fast Food Restaurant' 'Pizza Place' 'Ice Cream Shop'
 'Snack Place' 'Sandwich Place' 'Bakery' 'Mediterranean Restaurant'
 'Juice Bar' 'Pub' 'Diner' 'Shopping Mall' 'Café' 'Restaurant' 'Lounge'
 'Bagel Shop' 'Bar' 'Residential Building (Apartment / Condo)'
 'Indian Restaurant' 'Burger Joint' 'Gastropub' 'American Restaurant'
 'Breakfast Spot' 'Deli / Bodega' 'Hotel Bar' 'Mexican Restaurant'
 'Dessert Shop' 'Brewery' 'Falafel Restaurant' 'Coffee Shop'
 'Tex-Mex Restaurant' 'Performing Arts Venue']


Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Aareymilk Colony S.O,19.16689,72.854364,Prego,19.17265,72.860339,Italian Restaurant
1,Aareymilk Colony S.O,19.16689,72.854364,Taco Bell,19.173897,72.860184,Fast Food Restaurant
2,Aareymilk Colony S.O,19.16689,72.854364,Domino's Pizza,19.163121,72.845844,Pizza Place
3,Agripada S.O,18.983293,72.826819,The Tote,18.980391,72.820367,Italian Restaurant
4,Ambewadi S.O (Mumbai),18.954329,72.82173,Bachelorr's Ice Creams,18.954113,72.815396,Ice Cream Shop


In [11]:
# Plot pizza places on the map
map_mumbai_pizza_place = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(pizza_venues, 'red', map_mumbai_pizza_place)
print ('Map of Mumbai with Pizza Venues')
map_mumbai_pizza_place

Map of Mumbai with Pizza Venues


##### Let's obtain schools in the area

In [12]:
# 4bf58dd8d48988d13b941735 - Category ID for Schools
academic_venues_school = getNearbyVenues(names=df['Area'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'],
                                   categoryID='4bf58dd8d48988d13b941735'
                                  )
print('Academic venues - School containing areas, Pin codes and coordinates')
print(academic_venues_school.shape)
print (academic_venues_school['Venue Category'].unique())
academic_venues_school.head()

Academic venues - School containing areas, Pin codes and coordinates
(342, 7)
['High School' 'Nursery School' 'Religious School' 'Adult Education Center'
 'School' 'Preschool' 'Music School' 'Elementary School' 'Tattoo Parlor'
 'College Academic Building' 'Student Center' 'Private School' 'University'
 'Driving School' 'Church' 'Office' 'College Science Building'
 'Language School' 'General College & University' 'Flight School'
 'IT Services' 'Temple' 'Middle School' 'Monument / Landmark']


Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,A I Staff Colony S.O,48.52507,44.586426,Школа #113,48.522261,44.584923,High School
1,A I Staff Colony S.O,48.52507,44.586426,МДОУ Детский сад #223 (территория садика),48.523435,44.585895,Nursery School
2,Aareymilk Colony S.O,19.16689,72.854364,St Pius School,19.166435,72.855682,High School
3,Aareymilk Colony S.O,19.16689,72.854364,Pahadi Municipal school,19.16648,72.855106,High School
4,Aareymilk Colony S.O,19.16689,72.854364,ICIT College,19.163291,72.847749,High School


In [13]:
# Plot schools on the map
map_mumbai_school = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(academic_venues_school, 'red', map_mumbai_school)
print ('Map of Mumbai with schools')
map_mumbai_school

Map of Mumbai with schools


##### Let's obtain colleges/universities in the area

In [14]:
# 4d4b7105d754a06372d81259 - Category ID for College and University
academic_venues_college = getNearbyVenues(names=df['Area'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'],
                                   categoryID='4d4b7105d754a06372d81259'
                                  )
print('Academic venues - College/University containing areas, Pin codes and coordinates')
print(academic_venues_college.shape)
print (academic_venues_college['Venue Category'].unique())
academic_venues_college.head()

Academic venues - College/University containing areas, Pin codes and coordinates
(607, 7)
['University' 'Trade School' 'College Academic Building' 'High School'
 'General College & University' 'College Administrative Building'
 'College & University' 'College Classroom' 'Medical School'
 'College Cafeteria' 'College Soccer Field' 'Student Center' 'Courthouse'
 'College Technology Building' 'College Engineering Building'
 'College Science Building' 'College Library' 'Fraternity House'
 'College Auditorium' 'College Quad' 'College Gym' 'Library' 'College Lab'
 'Law School' 'College Math Building' 'College Arts Building' 'Theater'
 'Community College' 'College Residence Hall' 'School' 'College Stadium'
 'College Rec Center' 'College Football Field' 'College Track'
 'Advertising Agency' 'Bookstore' 'College Communications Building'
 'College Bookstore' 'Sorority House' 'College Tennis Court'
 'Medical Center']


Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,A I Staff Colony S.O,48.52507,44.586426,ГУК МФЮА,48.523869,44.57349,University
1,A I Staff Colony S.O,48.52507,44.586426,ПУ 36,48.521264,44.584839,Trade School
2,Aareymilk Colony S.O,19.16689,72.854364,Femina Believe,19.167201,72.853217,College Academic Building
3,Aareymilk Colony S.O,19.16689,72.854364,St Pius School,19.166435,72.855682,High School
4,Aareymilk Colony S.O,19.16689,72.854364,St. Pius College,19.166498,72.858134,College Academic Building


In [15]:
# Plot colleges/universities on the map
map_mumbai_college = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(academic_venues_college, 'red', map_mumbai_college)
print ('Map of Mumbai with colleges/universities')
map_mumbai_college

Map of Mumbai with colleges/universities


##### Let's obtain offices in the area

In [16]:
# 4d4b7105d754a06375d81259 - Category ID for Offices
office_venues = getNearbyVenues(names=df['Area'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'],
                                   categoryID='4d4b7105d754a06375d81259'
                                  )
print('Office venues containing areas, Pin codes and coordinates')
print(office_venues.shape)
print (office_venues['Venue Category'].unique())
office_venues.head()

Office venues containing areas, Pin codes and coordinates
(2182, 7)
['High School' 'Hospital' 'Factory' 'Advertising Agency' 'Post Office'
 'Government Building' 'Office' 'Building' 'Tech Startup' 'Church'
 'Coworking Space' 'Other Great Outdoors' 'Bank' 'Design Studio'
 'Housing Development' 'Club House' 'Police Station' 'Medical Center'
 'Temple' 'Event Space' 'Hindu Temple' 'Department Store' 'Neighborhood'
 'Social Club' 'Mosque' 'Convention Center' 'Parking' 'Business Center'
 'Monument / Landmark' 'Miscellaneous Shop' 'Medical Lab' 'Resort'
 "Dentist's Office" 'Campaign Office' 'Gym / Fitness Center' 'School'
 'Capitol Building' 'Non-Profit' 'Cultural Center'
 'Professional & Other Places' 'Salon / Barbershop' 'Hotel' 'Tattoo Parlor'
 'Library' 'Music School' 'Bar' 'Fire Station' 'Travel Agency'
 'Radio Station' 'Town Hall' 'Courthouse' 'Military Base' 'Trade School'
 "Doctor's Office" 'City Hall' 'Park' 'Field' 'Community Center'
 'College Academic Building' 'Spiritual Center' 'Au

Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,A I Staff Colony S.O,48.52507,44.586426,Школа #113,48.522261,44.584923,High School
1,A I Staff Colony S.O,48.52507,44.586426,Наркологический диспансер,48.520186,44.580013,Hospital
2,A I Staff Colony S.O,48.52507,44.586426,Редаелли ССМ,48.523579,44.576721,Factory
3,A I Staff Colony S.O,48.52507,44.586426,Рекламное агентство ALT,48.519709,44.577581,Advertising Agency
4,A I Staff Colony S.O,48.52507,44.586426,Почта России 400029,48.519733,44.577539,Post Office


In [17]:
# Plot offices on the map
map_mumbai_office = folium.Map(location=[latitude, longitude], zoom_start=12)
addToMap(office_venues, 'red', map_mumbai_office)
print ('Map of Mumbai with offices')
map_mumbai_office

Map of Mumbai with offices


#### Preparing the metric for area selection

##### Function to find count of venues

In [18]:
def addColumn(startDf, columnTitle, dataDf):
    # number of venues found for each area
    counts = dataDf.groupby('Area')['Venue'].count()

    # look up the count for every area; areas with no venues of this type get 0
    startDf[columnTitle] = startDf['Area'].map(counts).fillna(0).astype(float)
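
The counting logic in `addColumn` can be illustrated without pandas. The sketch below uses `collections.Counter` on invented (area, venue) pairs. The area and venue names are placeholders, and areas with no venues come out as 0, in line with the fallback in the function above.

```python
from collections import Counter

# Invented (area, venue) pairs standing in for rows of a venues dataframe
venue_rows = [('Area A', 'Venue 1'), ('Area A', 'Venue 2'), ('Area B', 'Venue 3')]
areas = ['Area A', 'Area B', 'Area C']

def count_per_area(areas, venue_rows):
    """Count venues per area; areas without any venue get 0."""
    counts = Counter(area for area, _ in venue_rows)
    return {area: counts[area] for area in areas}

venue_counts = count_per_area(areas, venue_rows)
print(venue_counts)  # {'Area A': 2, 'Area B': 1, 'Area C': 0}
```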

##### Adding venue count to data frame

In [19]:
df_venue_count = df.copy()
#df_metric.rename(columns={'Area':'Area'}, inplace=True)
addColumn(df_venue_count, 'PizzaPlace', pizza_venues)
addColumn(df_venue_count, 'School', academic_venues_school)
addColumn(df_venue_count, 'College', academic_venues_college)
addColumn(df_venue_count, 'Office', office_venues)

print('Dataframe containing count of venues')
print(df_venue_count.shape)
df_venue_count.head()

Dataframe containing count of venues
(89, 8)


Unnamed: 0,Area,Pin Code,Latitude,Longitude,PizzaPlace,School,College,Office
0,A I Staff Colony S.O,400029,48.52507,44.586426,0.0,2.0,2.0,6.0
1,Aareymilk Colony S.O,400065,19.16689,72.854364,3.0,6.0,7.0,27.0
2,Agripada S.O,400011,18.983293,72.826819,1.0,1.0,4.0,28.0
3,Ambewadi S.O (Mumbai),400004,18.954329,72.82173,9.0,5.0,4.0,17.0
4,Andheri East S.O,400069,19.119432,72.851426,4.0,6.0,8.0,42.0


##### Adding weighted sum metric to dataframe

In [20]:
#defining weights
weight_pizza = -1
weight_school = 1
weight_college = 1
weight_office = 2

df_metric = df_venue_count[['Area']].copy()
df_metric['Metric'] = df_venue_count['PizzaPlace'] * weight_pizza + df_venue_count['School'] * weight_school + df_venue_count['College'] * weight_college + df_venue_count['Office'] * weight_office
df_metric = df_metric.sort_values(by=['Metric'], ascending=False)

print('Dataframe containing selection metric of venues')
print(df_metric.shape)
df_metric.head()

Dataframe containing selection metric of venues
(89, 2)


Unnamed: 0,Area,Metric
17,BARC S.O,200.0
59,Delisle Road S.O,147.0
153,New Prabhadevi Road S.O,131.0
34,Century Mill S.O,128.0
81,High Court Building S.O (Mumbai),126.0


In [21]:
map_mum_metric = folium.Map(location=[latitude, longitude], zoom_start=15)
# area chosen for the detailed map; it ranks second by the metric above, after BARC S.O
top_area = 'Delisle Road S.O'

mumbai_rec = df[df['Area'] == top_area]

for lat, lng, local in zip(mumbai_rec['Latitude'], mumbai_rec['Longitude'], mumbai_rec['Area']):
    label = '{}'.format(local)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(map_mum_metric) 

addToMap(pizza_venues[pizza_venues['Area'] == top_area], 'red', map_mum_metric)
addToMap(academic_venues_school[academic_venues_school['Area'] == top_area], 'green', map_mum_metric)
addToMap(academic_venues_college[academic_venues_college['Area'] == top_area], 'gold', map_mum_metric)
addToMap(office_venues[office_venues['Area'] == top_area], 'fuchsia', map_mum_metric)

map_mum_metric

# Thanks for reviewing

# :)