# The Battle of Neighbourhood

<h2>Introduction</h2>

Mumbai is the financial capital of India and one of the most densely populated 
cities in the world. It lies on the west coast of India and attracts tourists 
from all over the globe every year. I was brought up in Mumbai and love the city 
from the bottom of my heart. It is one of the world's major hubs and is extremely 
diverse, with people of many ethnicities residing here. This multicultural 
character has brought with it cuisines from all over the world. People in India 
generally love food, and I personally love to try different cuisines and 
experience different flavors. The aim of this project is therefore to study the 
neighborhoods of Mumbai and determine possible locations for starting a 
restaurant. It can be useful for business owners and entrepreneurs who are looking 
to invest in and open a restaurant in Mumbai. The main objective is to carefully 
analyze appropriate data and derive recommendations for these stakeholders.

# Data Collection


The data required for this project was collected from multiple sources, which are summarized below.

## Neighborhoods Data

The list of neighborhoods in Mumbai was scraped from https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai. This Wikipedia page provides a comprehensive and detailed table of the data, which can be read directly into a pandas data frame with the read_html() function.

## Geographical Coordinates

The geographical coordinates of Mumbai were obtained with the GeoPy library in Python and are used to plot the map of Mumbai with the Folium library. The geocoder library was used to obtain latitude and longitude data for the individual neighborhoods in Mumbai. These coordinates are used to check the accuracy of the coordinates given on Wikipedia, which are replaced in our data frame if the absolute difference is more than 0.001. The resulting coordinates are then used for plotting with Folium.
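
The coordinate check described above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the column names `Latitude_geo` and `Longitude_geo`, the helper `reconcile_coordinates`, and the geocoded sample values are assumptions made for the example, and latitude and longitude are assumed to be compared independently. Only the 0.001 threshold comes from the text.

```python
import pandas as pd

THRESHOLD = 0.001  # maximum tolerated absolute difference, in degrees (from the text)


def reconcile_coordinates(df, threshold=THRESHOLD):
    """Return a copy of df whose Wikipedia coordinates are replaced by the
    geocoded ones wherever the two differ by more than `threshold`."""
    out = df.copy()
    for col in ("Latitude", "Longitude"):
        geo = out[col + "_geo"]
        # Replace only where a geocoded value exists and disagrees noticeably.
        mismatch = geo.notna() & ((out[col] - geo).abs() > threshold)
        out.loc[mismatch, col] = geo[mismatch]
    return out.drop(columns=["Latitude_geo", "Longitude_geo"])


# Hypothetical sample: the first row agrees, the second does not.
sample = pd.DataFrame({
    "Neighborhood": ["Amboli", "Versova"],
    "Latitude": [19.1293, 19.12],
    "Longitude": [72.8434, 72.82],
    "Latitude_geo": [19.1293, 19.1351],    # made-up geocoder results
    "Longitude_geo": [72.8434, 72.8146],
})
fixed = reconcile_coordinates(sample)
print(fixed)
```

Keeping the Wikipedia value when the geocoder returns nothing (`NaN`) is a design choice of this sketch, so that a failed lookup never erases an existing coordinate.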

## Venue Data

The venue data has been extracted using the Foursquare API. This data contains venue recommendations for all neighborhoods in Mumbai and is used to study the popular venues of different neighborhoods.
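
For illustration, a request to the legacy Foursquare v2 `venues/explore` endpoint might be assembled, and its response flattened, roughly as below. The credentials, version date, radius, limit, helper names, and the sample payload are placeholders invented for this sketch, and the response layout is assumed to follow the v2 `explore` format. The notebook's actual calls may differ.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build the query URL for venues around one neighborhood (no request is sent)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, YYYYMMDD (placeholder value)
        "ll": f"{lat},{lng}",  # latitude,longitude of the neighborhood
        "radius": radius,      # search radius in metres (assumed value)
        "limit": limit,        # maximum number of venues returned (assumed value)
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def extract_venues(payload):
    """Flatten an explore-style response into (name, category, lat, lng) tuples."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], category,
                     venue["location"]["lat"], venue["location"]["lng"]))
    return rows


# Made-up payload mimicking the assumed response layout.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "categories": [{"name": "Cafe"}],
               "location": {"lat": 19.1301, "lng": 72.8430}}},
]}]}}

url = build_explore_url(19.1293, 72.8434, "YOUR_ID", "YOUR_SECRET")
venues = extract_venues(sample_payload)
print(url)
print(venues)
```

Separating URL construction from response parsing keeps both steps testable without network access or real credentials.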

## Importing required libraries

In [4]:
!pip install geopy
!pip install geocoder
!pip install folium

import numpy as np
import pandas as pd
import json
from geopy.geocoders import Nominatim
import geocoder
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
import matplotlib.pyplot as plt
import seaborn as sns
from pandas import json_normalize  # top-level import; pandas.io.json.json_normalize is deprecated since pandas 1.0
from sklearn.metrics import silhouette_score

%matplotlib notebook

print('All libraries imported.')

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6
Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1
All libraries imported.


## Data Retrieval


Scraping data from https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai and reading it into a dataframe.

In [5]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai')[-1]
df.rename(columns={'Area': 'Neighborhood'}, inplace=True)
df.head(10)

Unnamed: 0,Neighborhood,Location,Latitude,Longitude
0,Amboli,"Andheri,Western Suburbs",19.1293,72.8434
1,"Chakala, Andheri",Western Suburbs,19.111388,72.860833
2,D.N. Nagar,"Andheri,Western Suburbs",19.124085,72.831373
3,Four Bungalows,"Andheri,Western Suburbs",19.124714,72.82721
4,Lokhandwala,"Andheri,Western Suburbs",19.130815,72.82927
5,Marol,"Andheri,Western Suburbs",19.119219,72.882743
6,Sahar,"Andheri,Western Suburbs",19.098889,72.867222
7,Seven Bungalows,"Andheri,Western Suburbs",19.129052,72.817018
8,Versova,"Andheri,Western Suburbs",19.12,72.82
9,Mira Road,"Mira-Bhayandar,Western Suburbs",19.284167,72.871111


## Data Wrangling

Let's look at the distinct values in the Location column.

In [6]:
df['Location'].value_counts()

South Mumbai                       30
Andheri,Western Suburbs             8
Western Suburbs                     6
Eastern Suburbs                     4
Bandra,Western Suburbs              3
Powai,Eastern Suburbs               3
Ghatkopar,Eastern Suburbs           3
Mira-Bhayandar,Western Suburbs      3
Kandivali West,Western Suburbs      3
Malad,Western Suburbs               2
Mumbai                              2
Borivali (West),Western Suburbs     2
Goregaon,Western Suburbs            2
Kalbadevi,South Mumbai              2
Harbour Suburbs                     2
Vasai,Western Suburbs               2
Khar,Western Suburbs                2
Sanctacruz,Western Suburbs          1
Fort,South Mumbai                   1
Kamathipura,South Mumbai            1
Byculla,South Mumbai                1
Mulund,Eastern Suburbs              1
Kurla,Eastern Suburbs               1
Kandivali East,Western Suburbs      1
Antop Hill,South Mumbai             1
Dadar,South Mumbai                  1
Tardeo,South


We can see that many locations appear only once or twice. This is because the main locations, such as "Western Suburbs" or "South Mumbai", are further subdivided by the area within them. Let's clean the Location column to make it easier to understand.