# Expansion Strategy in Mumbai City for Beriyan Biryani
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

Biryani is one of the most consumed foods in India. The biryani market in India is estimated at approximately Rs 1,500 crore (USD 200M) in the organised sector and Rs 15,000 crore (USD 2B) in the unorganised sector.

**Beriyan Biryani**, a biryani restaurant chain that started in Hyderabad, India, has quickly grown in popularity and revenue. As it looks to scale up, it sees Mumbai as the logical next city for expansion.

This project attempts to find optimal locations for restaurants in a city. Specifically, this report is aimed at Beriyan Biryani stakeholders interested in opening a **biryani restaurant** in **Mumbai**, India.

We will use data visualization and machine learning methods to identify a few of the most promising neighborhoods based on the criteria described in the next section. Each selection will be described in detail so that the stakeholders can make the final choice.

## Data <a name="data"></a>

Based on the definition of our problem, the key factors that will influence the decision are:
* Density of Restaurants in a neighborhood
* Biryani Popularity in a neighborhood
* Commercial Space Rates in a neighborhood

A map covering all the major neighborhoods is a good starting point.

Pertinent information about each neighborhood will also be necessary.

The following data sources will be needed to extract or generate the required information:
* Mumbai neighborhoods will be collected from the **Mumbai Wikipedia** page
* Neighborhood coordinates (latitude and longitude) will be retrieved using the **Nominatim** geocoder from the **GeoPy** library
* Neighborhood data related to restaurants, venues, etc. will be retrieved using the **Foursquare API**
* Commercial space rates will be obtained by web-scraping the **MagicBricks** database (MagicBricks is a major online real estate platform in India)

### Mumbai Neighborhoods Dataframe

The Mumbai Wikipedia page (https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai) lists the major neighborhoods in Mumbai, grouped by suburb: Western, Eastern, Harbor and South Mumbai.

We prepared a basic CSV file by copying this data. We start by importing it, along with the libraries needed for data analysis and visualization.

In [1]:
# Importing Necessary Libraries
import numpy as np
import pandas as pd
import json
from geopy.geocoders import Nominatim
import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium 
from sklearn.cluster import KMeans

In [2]:
# Create Mumbai Neighborhoods dataframe from csv data
mum = pd.read_csv("Mumbai_Neighborhoods.csv")
mum.head()

| | Suburb | Neighborhood |
|---|---|---|
| 0 | Western | Andheri |
| 1 | Western | Mira Bhayandar |
| 2 | Western | Bandra |
| 3 | Western | Borivali |
| 4 | Western | Dahisar |
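
The output above suggests that `Mumbai_Neighborhoods.csv` holds one row per neighborhood with the columns `Suburb` and `Neighborhood`. As a rough illustration of that layout, here is a minimal sketch that writes such a file from a suburb-to-neighborhoods mapping. The mapping contains only the five names visible above, and the helper name `to_csv_text` is hypothetical, not something from the original notebook.

```python
import csv
import io

# Hypothetical excerpt of the data copied from Wikipedia: suburb -> neighborhoods.
# Only the five names shown in the output above are included here.
neighborhoods_by_suburb = {
    "Western": ["Andheri", "Mira Bhayandar", "Bandra", "Borivali", "Dahisar"],
}


def to_csv_text(mapping):
    """Flatten a {suburb: [neighborhood, ...]} mapping into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Suburb", "Neighborhood"])  # header expected by the notebook
    for suburb, names in mapping.items():
        for name in names:
            writer.writerow([suburb, name])
    return buffer.getvalue()


csv_text = to_csv_text(neighborhoods_by_suburb)
print(csv_text)
```

Saving `csv_text` to a file and loading it with `pd.read_csv` should give a dataframe with the same two columns as the one shown above.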


In [3]:
# Get coordinates for each neighborhood and add them to the neighborhoods dataframe
geolocator = Nominatim(user_agent="foursquare_agent")
latitude = []
longitude = []
for name in mum['Neighborhood']:
    location = geolocator.geocode(name + ', Mumbai')
    if location is None:
        # Nominatim returns None when it finds no match; record NaN for those neighborhoods
        latitude.append(np.nan)
        longitude.append(np.nan)
    else:
        latitude.append(location.latitude)
        longitude.append(location.longitude)
mum['Latitude'] = latitude
mum['Longitude'] = longitude
mum.head()

| | Suburb | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Western | Andheri | 19.119698 | 72.84642 |
| 1 | Western | Mira Bhayandar | NaN | NaN |
| 2 | Western | Bandra | 19.054979 | 72.84022 |
| 3 | Western | Borivali | 19.229068 | 72.857363 |
| 4 | Western | Dahisar | 19.24945 | 72.859621 |
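
The geocoding step above can also be sketched as a small helper that accepts any geocoding function, skips repeated names, and records NaN for lookups that fail. This is an illustrative variant under stated assumptions, not the notebook's own code: `geocode_neighborhoods`, `FakeLocation` and `fake_geocode` are hypothetical names, and the fake geocoder exists only so that the example runs offline.

```python
import math


def geocode_neighborhoods(names, geocode, city="Mumbai"):
    """Map each name to (latitude, longitude), or (nan, nan) if the lookup fails."""
    coords = {}
    for name in names:
        if name in coords:  # already looked up, avoid a second request
            continue
        try:
            location = geocode(f"{name}, {city}")
        except Exception:  # e.g. a timeout; treated like "no match" in this sketch
            location = None
        if location is None:
            coords[name] = (math.nan, math.nan)
        else:
            coords[name] = (location.latitude, location.longitude)
    return coords


class FakeLocation:
    """Offline stand-in for the object a geocoder returns (hypothetical)."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def fake_geocode(query):
    """Knows a single address and returns None otherwise, like a failed lookup."""
    known = {"Andheri, Mumbai": FakeLocation(19.119698, 72.84642)}
    return known.get(query)


coords = geocode_neighborhoods(["Andheri", "Mira Bhayandar", "Andheri"], fake_geocode)
print(coords)
```

With GeoPy, the `geocode` argument could be `RateLimiter(geolocator.geocode, min_delay_seconds=1)` from `geopy.extra.rate_limiter`, which spaces out requests so that a long list of neighborhoods stays within Nominatim's limit of one request per second.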


In [6]:
# Remove the neighborhoods with NaN co-ordinates
mum.dropna(inplace=True)
mum.reset_index(inplace=True)
mum.head()

| | index | Suburb | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | 0 | Western | Andheri | 19.119698 | 72.84642 |
| 1 | 2 | Western | Bandra | 19.054979 | 72.84022 |
| 2 | 3 | Western | Borivali | 19.229068 | 72.857363 |
| 3 | 4 | Western | Dahisar | 19.24945 | 72.859621 |
| 4 | 5 | Western | Goregaon | 19.164803 | 72.850045 |


In [7]:
# Drop the leftover 'index' column that reset_index() created from the old row labels
del mum['index']
mum.head()

| | Suburb | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Western | Andheri | 19.119698 | 72.84642 |
| 1 | Western | Bandra | 19.054979 | 72.84022 |
| 2 | Western | Borivali | 19.229068 | 72.857363 |
| 3 | Western | Dahisar | 19.24945 | 72.859621 |
| 4 | Western | Goregaon | 19.164803 | 72.850045 |
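
The two cleanup cells above (drop the NaN rows, reset the index, then delete the extra `index` column) can be condensed into a single chained call. The sketch below uses a small hand-built dataframe with values taken from the outputs above rather than the real `mum` dataframe. Restricting `dropna` to the coordinate columns via `subset` is an added assumption, so that a missing value in some unrelated column would not remove a row.

```python
import numpy as np
import pandas as pd

# Small stand-in for the neighborhoods dataframe (values from the outputs above)
sample = pd.DataFrame({
    "Suburb": ["Western", "Western", "Western"],
    "Neighborhood": ["Andheri", "Mira Bhayandar", "Bandra"],
    "Latitude": [19.119698, np.nan, 19.054979],
    "Longitude": [72.84642, np.nan, 72.84022],
})

# Drop rows without coordinates and renumber the rows from 0;
# drop=True discards the old row labels instead of keeping them as an 'index' column
cleaned = sample.dropna(subset=["Latitude", "Longitude"]).reset_index(drop=True)
print(cleaned)
```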


Let us create a map of the Mumbai neighborhoods to get a geographical overview of the city.

In [8]:
# Geocode Goregaon, which serves as the approximate center of the Mumbai map

address = 'Goregaon, Mumbai'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print("Mumbai co-ordinates are:",latitude, longitude)

Mumbai co-ordinates are: 19.1648029 72.8500454
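
Because a free-text geocoder can occasionally match a place of the same name in another city, it may be worth checking that each geocoded point lies reasonably close to the map center found above. The sketch below uses the haversine great-circle distance for that. The 40 km cutoff and the names `haversine_km` and `is_plausible` are assumptions made for illustration, not part of the original analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

CENTER = (19.1648029, 72.8500454)  # Goregaon coordinates printed above
MAX_KM = 40.0                      # assumed cutoff for "inside greater Mumbai"


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_plausible(lat, lon, center=CENTER, max_km=MAX_KM):
    """True if the point lies within max_km of the chosen center."""
    return haversine_km(lat, lon, center[0], center[1]) <= max_km


# Andheri, using the coordinates from the dataframe above
print(round(haversine_km(19.119698, 72.84642, *CENTER), 1), "km from the center")
print(is_plausible(19.119698, 72.84642))
```

Rows that fail such a check could be inspected by hand rather than dropped automatically, since the cutoff is only a rough guess.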


In [10]:
# Creating Mumbai map and marking the neighborhoods from dataframe on it

map_mumbai = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, neighborhood in zip(mum['Latitude'], mum['Longitude'], mum['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#fbb8a9',
        fill_opacity=0.7).add_to(map_mumbai)  # parse_html belongs to folium.Popup, not CircleMarker

map_mumbai
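
Since the dataframe also records the suburb of each neighborhood, one optional refinement is to color the markers by suburb instead of drawing them all in red. The helper below is a minimal sketch of that idea. The palette, the function name `suburb_colors`, and every suburb label other than `Western` are assumptions: the exact strings stored in the `Suburb` column are not all visible in the outputs above.

```python
from itertools import cycle

# Assumed palette of CSS color names; any valid color strings would do
PALETTE = ["red", "blue", "green", "purple", "orange"]


def suburb_colors(suburbs, palette=PALETTE):
    """Assign one color per distinct suburb, in order of first appearance."""
    colors = {}
    palette_cycle = cycle(palette)  # wraps around if suburbs outnumber colors
    for suburb in suburbs:
        if suburb not in colors:
            colors[suburb] = next(palette_cycle)
    return colors


# Hypothetical suburb labels; only "Western" appears in the outputs above
colors = suburb_colors(["Western", "Western", "Eastern", "Harbor", "South Mumbai"])
print(colors)
```

In the marker loop, `mum['Suburb']` could then be added to the `zip(...)` call and `color=colors[suburb]` passed to `folium.CircleMarker` in place of the fixed `'red'`.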