## Battle of Neighborhood - Final Project
### IBM - Applied Data Science Capstone Course

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders interested in opening an **Indian restaurant** in the **San Francisco Bay Area**, California.

We have identified **location** as one of the most important factors to consider when opening a restaurant, so we will focus on **locations that are less crowded with restaurants**, particularly **Indian** restaurants.

We will use data science techniques to shortlist the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly described so that entrepreneurs can choose the best possible final location.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* existing restaurants and other public facilities in the neighborhood (any type)
* number of Indian restaurants in the neighborhood, if any

To solve this problem, we need the following data:
* A list of neighborhoods (cities and towns) in the San Francisco Bay Area
* Latitude and longitude coordinates of those neighborhoods
* Venue data for each neighborhood, including Indian restaurants, to find a suitable neighborhood for opening an Indian restaurant

#### Let's get started by creating a DataFrame of the neighborhoods with latitude and longitude information

Import all the required libraries

In [1]:
import numpy as np # To handle data in a vectorized manner

import pandas as pd # To get the data in DataFrame

import json # To get and read the JSON file

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # To convert an address into latitude and longitude values
!pip install geocoder
import geocoder

import requests # To handle JSON requests

from pandas.io.json import json_normalize # To convert JSON data into a DataFrame

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans # To create cluster by K-Means 

!conda install -c conda-forge folium=0.5.0 --yes
import folium # Map rendering library

from bs4 import BeautifulSoup # Install and import BeautifulSoup4 for scraping the Wikipedia page

import lxml.html as lh
import re

print('Libraries imported')


### Read the Wikipedia page and load the neighborhood data into a pandas DataFrame

In [2]:
# Read URL
input_url = 'https://en.wikipedia.org/wiki/List_of_cities_and_towns_in_the_San_Francisco_Bay_Area'

# Fetch URL contents
wiki_data = requests.get(input_url)

# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(wiki_data.content, "lxml")

# Create DataFrame with list of all Neighborhood
table_data = soup.find("table", class_="wikitable plainrowheaders sortable")

df_columns = ['City / Town','County']
city = []
county = []

# Fetch rows
table_rows = table_data.find_all('tr')

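for tr in table_rows:
    # Header cells (<th>) hold the city / town name
    th = tr.find_all('th')
    row1 = [re.split("\n", cell.text)[0] for cell in th]
    city.extend(row1)
    
    # Data cells (<td>) hold the remaining columns; the second one is used as the county
    td = tr.find_all('td')
    row2 = [re.split("\n", cell.text)[0] for cell in td]
    if len(row2) != 0:
        county.append(row2[1])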

city_list = city[8:]

city_df = pd.DataFrame(zip(city_list,county),columns=df_columns)
print('There are {} cities / towns in total in the San Francisco Bay Area'.format(city_df.shape[0]))
city_df.head(12)

There are 101 cities / towns in total in the San Francisco Bay Area


Unnamed: 0,City / Town,County
0,Alameda,Alameda
1,Albany,Alameda
2,American Canyon,Napa
3,Antioch,Contra Costa
4,Atherton,San Mateo
5,Belmont,San Mateo
6,Belvedere,Marin
7,Benicia,Solano
8,Berkeley,Alameda
9,Brentwood,Contra Costa


Now that we have all the city names in the San Francisco Bay Area, let's get their coordinates

In [3]:
Neighborhood_lat = []
Neighborhood_lng = []

for j in zip(city_list,county):
    g = geocoder.arcgis(j)
    Neighborhood_lat.append(g.latlng[0])
    Neighborhood_lng.append(g.latlng[1])

city_df['Latitude'] = Neighborhood_lat
city_df['Longitude'] = Neighborhood_lng

city_df.head()

Unnamed: 0,City / Town,County,Latitude,Longitude
0,Alameda,Alameda,37.76683,-122.2453
1,Albany,Alameda,42.65155,-73.75521
2,American Canyon,Napa,38.16805,-122.25277
3,Antioch,Contra Costa,38.01583,-121.81974
4,Atherton,San Mateo,53.52324,-2.48942
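
Some of the coordinates above fall far outside California: Albany is placed at (42.65, -73.76) and Atherton at (53.52, -2.49). A quick sanity check can catch such rows before they are used for venue searches. The sketch below is a minimal illustration; the bounding box values and the helper name `in_bay_area` are assumptions chosen for this example and are not part of the original notebook.

```python
# Rough bounding box around the nine-county Bay Area.
# These limits are approximate assumptions for illustration only.
LAT_MIN, LAT_MAX = 36.8, 38.9
LNG_MIN, LNG_MAX = -123.6, -121.2

def in_bay_area(lat, lng):
    """Return True if the point lies inside the assumed bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX

# Coordinates copied from the table above
geocoded = {
    'Alameda': (37.76683, -122.2453),
    'Albany': (42.65155, -73.75521),
    'American Canyon': (38.16805, -122.25277),
    'Antioch': (38.01583, -121.81974),
    'Atherton': (53.52324, -2.48942),
}

# Names whose coordinates fall outside the box
suspicious = [name for name, (lat, lng) in geocoded.items()
              if not in_bay_area(lat, lng)]
print(suspicious)  # ['Albany', 'Atherton']
```

Rows flagged this way could be geocoded again with a more specific query. The loop above passes a `(city, county)` tuple to `geocoder.arcgis`; a single string that also names the county and state, such as `'Albany, Alameda County, California'`, may disambiguate better.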


### Let's visualize the city locations on a map

In [4]:
# Get latitude and longitude for Bay Area
address = 'San Francisco Bay Area, CA'

geolocator = Nominatim(user_agent="explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the San Francisco Bay Area are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the San Francisco Bay Area are 37.7884969, -122.3558473.


In [5]:
# create map of Bay Area using latitude and longitude values
map_bayarea = folium.Map(location=[latitude, longitude], zoom_start=8)

# add markers to map
for lat, lng, city, county in zip(city_df['Latitude'], city_df['Longitude'], city_df['City / Town'], city_df['County']):
    label = '{}, {}'.format(city, county)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bayarea)  
    
map_bayarea

### Now, let's gather venue details in the neighborhood of each city
#### To do so, we need Foursquare credentials and an API version

In [6]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # Foursquare ID (replace with your own)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # Foursquare Secret (keep out of shared notebooks)
VERSION = '20191018' # Foursquare API version

print('Foursquare credentials ready!')

Foursquare credentials ready!


In [7]:
# Let's create a function to search Venues
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City / Town', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

bayarea_venues = getNearbyVenues(names=city_df['City / Town'],
                                   latitudes=city_df['Latitude'],
                                   longitudes=city_df['Longitude']
                                  )
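
The list comprehension in `getNearbyVenues` indexes straight into the JSON response, so a reply without `groups`, or a venue without categories, raises an exception. The sketch below shows one possible defensive way to pull the same fields out of a response shaped like the one the function expects. `extract_venues` and the sample dictionary are hypothetical and only mirror the keys used above.

```python
def extract_venues(payload):
    """Collect (name, lat, lng, category) tuples from an explore-style payload.

    Missing keys yield an empty list or a None value instead of an error.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        category = categories[0].get('name') if categories else None
        rows.append((venue.get('name'), location.get('lat'),
                     location.get('lng'), category))
    return rows

# Hypothetical payload mirroring the keys accessed in getNearbyVenues
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Troy',
               'location': {'lat': 37.764388, 'lng': -122.243365},
               'categories': [{'name': 'Middle Eastern Restaurant'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 37.76, 'lng': -122.24},
               'categories': []}},
]}]}}

rows = extract_venues(sample)
print(rows)
print(extract_venues({'response': {}}))  # no groups, so an empty list
```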

View the Venues DataFrame and start exploring the venues

In [8]:
print(bayarea_venues.shape)
bayarea_venues.head()

(3269, 7)


Unnamed: 0,City / Town,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Alameda,37.76683,-122.2453,Alameda Theatre & Cineplex,37.76468,-122.243946,Multiplex
1,Alameda,37.76683,-122.2453,American Oak,37.765739,-122.242426,Bar
2,Alameda,37.76683,-122.2453,Troy,37.764388,-122.243365,Middle Eastern Restaurant
3,Alameda,37.76683,-122.2453,Dan's Fresh Produce,37.764702,-122.244164,Farmers Market
4,Alameda,37.76683,-122.2453,Tucker's Ice Cream,37.763843,-122.243297,Ice Cream Shop
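
Each search is limited to `radius=500` metres around a single geocoded point, so only venues close to that point are returned for a city. The sketch below uses the haversine formula to estimate the distance between a city's search centre and one of its venues, with coordinates copied from the first row of the table above. The function name `haversine_m` and the Earth radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (common approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Alameda search centre versus Alameda Theatre & Cineplex
distance = haversine_m(37.76683, -122.2453, 37.76468, -122.243946)
print(round(distance), 'metres; within radius:', distance <= 500)
```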


In [9]:
bayarea_venues.groupby('City / Town').count()

City / Town,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Alameda,88,88,88,88,88,88
Albany,43,43,43,43,43,43
American Canyon,35,35,35,35,35,35
Antioch,6,6,6,6,6,6
Atherton,7,7,7,7,7,7
Belmont,56,56,56,56,56,56
Belvedere,39,39,39,39,39,39
Benicia,32,32,32,32,32,32
Berkeley,78,78,78,78,78,78
Brentwood,31,31,31,31,31,31


In [10]:
print('There are {} unique categories.'.format(len(bayarea_venues['Venue Category'].unique())))
print(bayarea_venues['Venue Category'].unique())

There are 324 unique categories.
['Multiplex' 'Bar' 'Middle Eastern Restaurant' 'Farmers Market'
 'Ice Cream Shop' 'Pizza Place' 'Diner' 'Arcade' 'Burmese Restaurant'
 'Afghan Restaurant' 'Sushi Restaurant' 'Mexican Restaurant' 'Kids Store'
 'Vietnamese Restaurant' 'Sandwich Place' 'Coffee Shop' 'Bookstore'
 'Toy / Game Store' 'German Restaurant' 'American Restaurant' 'Dive Bar'
 'Chinese Restaurant' 'Dessert Shop' 'Wine Bar' 'Thai Restaurant'
 'Burrito Place' 'Poke Place' 'New American Restaurant' 'Bubble Tea Shop'
 'Hot Dog Joint' 'Asian Restaurant' 'Cuban Restaurant' 'Taco Place' 'Café'
 'Clothing Store' 'Bridal Shop' 'Fried Chicken Joint' 'Spa'
 'Used Bookstore' 'Ethiopian Restaurant' 'Video Store' 'Breakfast Spot'
 'ATM' 'Nail Salon' 'Italian Restaurant' 'Japanese Restaurant' 'Pharmacy'
 'Auto Garage' 'Bagel Shop' 'Fast Food Restaurant' 'Performing Arts Venue'
 'Bus Line' 'Burger Joint' 'Food' 'Boutique' 'Dance Studio' 'Park'
 'Gift Shop' 'Market' 'Bus Station' 'Frame Store' 'Hot

#### We have a total of 324 unique categories to explore. Before doing so, let's check the kinds of venues we have around each city / town

In [11]:
# one hot encoding
bayarea_onehot = pd.get_dummies(bayarea_venues[['Venue Category']], prefix="", prefix_sep="")

# add 'City / Town' column back to dataframe
bayarea_onehot['City / Town'] = bayarea_venues['City / Town'] 

# move 'City / Town' column to the first column
fixed_columns = [bayarea_onehot.columns[-1]] + list(bayarea_onehot.columns[:-1])

bayarea_onehot = bayarea_onehot[fixed_columns]

print(bayarea_onehot.shape)
bayarea_onehot.head()

(3269, 325)


Unnamed: 0,City / Town,ATM,Accessories Store,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,...,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio
0,Alameda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Alameda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Alameda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Alameda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Alameda,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Take the mean of each category column per city to get the frequency of occurrence of each category

In [12]:
bayarea_grouped = bayarea_onehot.groupby('City / Town').mean().reset_index()
bayarea_grouped

Unnamed: 0,City / Town,ATM,Accessories Store,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,...,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio
0,Alameda,0.011364,0.000000,0.011364,0.000000,0.022727,0.0,0.011364,0.0,0.000000,...,0.0,0.000000,0.000000,0.022727,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,Albany,0.000000,0.000000,0.000000,0.000000,0.046512,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,American Canyon,0.028571,0.000000,0.000000,0.000000,0.028571,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,Antioch,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,Atherton,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
5,Belmont,0.017857,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
6,Belvedere,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.025641,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
7,Benicia,0.000000,0.000000,0.000000,0.000000,0.031250,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8,Berkeley,0.000000,0.000000,0.000000,0.012821,0.025641,0.0,0.000000,0.0,0.012821,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.012821,0.000000,0.000000,0.025641
9,Brentwood,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.000000,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.032258,0.000000,0.000000
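
Because each venue row has exactly one 1 in its one-hot encoding, the mean of a category column within a city equals that category's share of the city's venues. For Alameda, which has 88 venues, the value 0.011364 corresponds to 1/88 and 0.022727 to 2/88. The sketch below reproduces the same calculation in plain Python on a small made-up list; `category_shares` is a hypothetical helper and not part of the notebook.

```python
from collections import Counter

def category_shares(categories):
    """Map each category to its fraction of the venues in the list."""
    counts = Counter(categories)
    total = len(categories)
    return {category: n / total for category, n in counts.items()}

# Made-up venue categories for one city
venues = ['Bar', 'Bar', 'Bakery', 'Park']
shares = category_shares(venues)
print(shares)  # {'Bar': 0.5, 'Bakery': 0.25, 'Park': 0.25}

# The Alameda figures quoted above
print(round(1 / 88, 6), round(2 / 88, 6))  # 0.011364 0.022727
```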


#### Check the top 10 venue categories for each location using the mean frequencies calculated above

In [13]:
# Create function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [15]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City / Town']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
bayarea_venues_sorted = pd.DataFrame(columns=columns)
bayarea_venues_sorted['City / Town'] = bayarea_grouped['City / Town']

for ind in np.arange(bayarea_grouped.shape[0]):
    bayarea_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bayarea_grouped.iloc[ind, :], num_top_venues)

bayarea_venues_sorted.head()

Unnamed: 0,City / Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alameda,Chinese Restaurant,Bubble Tea Shop,Mexican Restaurant,Bar,Sandwich Place,Italian Restaurant,Pharmacy,Spa,New American Restaurant,Middle Eastern Restaurant
1,Albany,Café,Coffee Shop,Park,Pub,Hotel,Donut Shop,Bank,New American Restaurant,American Restaurant,Pizza Place
2,American Canyon,Chinese Restaurant,Indian Restaurant,Spa,Shopping Mall,Mexican Restaurant,Pharmacy,Pizza Place,Convenience Store,Sandwich Place,Tea Room
3,Antioch,Train Station,Construction & Landscaping,Bakery,Park,Café,French Restaurant,Flower Shop,Furniture / Home Store,Frozen Yogurt Shop,Fish & Chips Shop
4,Atherton,Supermarket,Pub,Roller Rink,Soccer Field,Sandwich Place,Bar,Fountain,Flea Market,Fast Food Restaurant,Filipino Restaurant
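
A city with few venues cannot have ten distinct categories. Antioch has only 6 venues in the counts shown earlier, yet ten categories are listed for it, so at least four of them have a frequency of zero and appear only because of how ties are ordered. One possible way to avoid this padding is to keep only categories with a positive frequency. The helper below is a hypothetical sketch of that idea on a plain dictionary; it is not the notebook's `return_most_common_venues`.

```python
def top_categories(frequencies, n=10):
    """Return up to n categories with positive frequency, most frequent first."""
    present = [(category, freq) for category, freq in frequencies.items()
               if freq > 0]
    # Sort by descending frequency, then alphabetically so ties are deterministic
    present.sort(key=lambda pair: (-pair[1], pair[0]))
    return [category for category, _ in present[:n]]

# Made-up frequencies for a city where only three categories are present
frequencies = {'Park': 0.5, 'Bakery': 0.25, 'Train Station': 0.25,
               'Fish Market': 0.0, 'Flower Shop': 0.0}
top = top_categories(frequencies)
print(top)  # ['Park', 'Bakery', 'Train Station']
```

The returned list can be shorter than `n`, which makes it visible when a city simply has too few venues to fill a top-10 list.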


From the above table, we can see that cafes, restaurants, bars, pubs and pizza places are among the most common venues in the Bay Area. This means tough competition for any entrepreneur opening any kind of restaurant.

The best location for opening a restaurant will be one with fewer restaurants overall and, specifically, fewer Indian cafes and restaurants.

It will also be beneficial to be near multiplexes, shopping malls, supermarkets, business centres, museums, etc., where we can expect many visitors from that city or town and from neighboring areas.

#### Now that we know the most common venues in the cities, let's check the frequency of restaurants in each city or town.
#### We can also count the Indian restaurants in the cities

In [16]:
# Create a DataFrame containing only venues whose category includes 'Restaurant'
bayarea_restaurants = bayarea_venues[bayarea_venues['Venue Category'].str.contains('Restaurant')]

print('There are {} restaurants in total in the San Francisco Bay Area'.format(bayarea_restaurants.shape[0]))
bayarea_restaurants.head()

There are 864 restaurants in total in the San Francisco Bay Area


Unnamed: 0,City / Town,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
2,Alameda,37.76683,-122.2453,Troy,37.764388,-122.243365,Middle Eastern Restaurant
8,Alameda,37.76683,-122.2453,Burma Superstar,37.763652,-122.243411,Burmese Restaurant
9,Alameda,37.76683,-122.2453,Q's Halal Chicken,37.764583,-122.243813,Afghan Restaurant
10,Alameda,37.76683,-122.2453,Utzutzu,37.764925,-122.242121,Sushi Restaurant
11,Alameda,37.76683,-122.2453,La Penca Azul,37.76523,-122.241819,Mexican Restaurant


#### Let's get the 10 most common restaurant categories in each city

In [17]:
bayarea_res_onehot = pd.get_dummies(bayarea_restaurants[['Venue Category']], prefix="", prefix_sep="")

# add 'City / Town' column back to dataframe
bayarea_res_onehot['City / Town'] = bayarea_restaurants['City / Town'] 

# move 'City / Town' column to the first column
fixed_columns = [bayarea_res_onehot.columns[-1]] + list(bayarea_res_onehot.columns[:-1])

bayarea_res_onehot = bayarea_res_onehot[fixed_columns]

print(bayarea_res_onehot.shape)

# Take mean 
bayarea_res_grouped = bayarea_res_onehot.groupby('City / Town').mean().reset_index()
bayarea_res_grouped.head()

# List the 10 most common restaurant categories
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City / Town']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
bayarea_res_sorted = pd.DataFrame(columns=columns)
bayarea_res_sorted['City / Town'] = bayarea_res_grouped['City / Town']

for ind in np.arange(bayarea_res_grouped.shape[0]):
    bayarea_res_sorted.iloc[ind, 1:] = return_most_common_venues(bayarea_res_grouped.iloc[ind, :], num_top_venues)

bayarea_res_sorted.head()

(864, 67)


Unnamed: 0,City / Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alameda,Chinese Restaurant,Mexican Restaurant,New American Restaurant,American Restaurant,Italian Restaurant,Thai Restaurant,Asian Restaurant,Middle Eastern Restaurant,Sushi Restaurant,Ethiopian Restaurant
1,Albany,American Restaurant,New American Restaurant,Mexican Restaurant,Restaurant,English Restaurant,German Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant
2,American Canyon,Chinese Restaurant,Indian Restaurant,Mexican Restaurant,American Restaurant,Fast Food Restaurant,Thai Restaurant,Xinjiang Restaurant,Gluten-free Restaurant,Filipino Restaurant,French Restaurant
3,Belmont,Sushi Restaurant,New American Restaurant,Chinese Restaurant,Italian Restaurant,Vietnamese Restaurant,Peruvian Restaurant,Thai Restaurant,Asian Restaurant,Falafel Restaurant,Dim Sum Restaurant
4,Belvedere,Restaurant,Austrian Restaurant,Greek Restaurant,Indian Restaurant,Hunan Restaurant,Himalayan Restaurant,Hawaiian Restaurant,Halal Restaurant,Dumpling Restaurant,Gluten-free Restaurant


From the above table, we can see that Chinese, American, Mexican, Indian and Italian restaurants are among the most common restaurant categories. This suggests there is demand for Indian food, so opening a good Indian restaurant in the San Francisco Bay Area looks like a reasonable idea.
Let's check out some Indian restaurants in this area.

In [18]:
bayarea_restaurants_indian = bayarea_venues[bayarea_venues['Venue Category'].str.contains('Indian Restaurant')]

print('There are {} Indian restaurants in total in the San Francisco Bay Area'.format(bayarea_restaurants_indian.shape[0]))
bayarea_restaurants_indian.head()

There are 39 Indian restaurants in total in the San Francisco Bay Area


Unnamed: 0,City / Town,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
143,American Canyon,38.16805,-122.25277,All Spice Indian Restaurant,38.166265,-122.25411,Indian Restaurant
151,American Canyon,38.16805,-122.25277,All Spice,38.166186,-122.254104,Indian Restaurant
246,Belvedere,48.19143,16.38051,Demi Tass,48.194001,16.37745,Indian Restaurant
281,Benicia,38.05285,-122.15351,Aroma Indian Cuisine,38.050871,-122.15772,Indian Restaurant
358,Berkeley,37.86988,-122.27054,East Bay Spice Company,37.870317,-122.265996,Indian Restaurant


#### Out of 864 restaurants, only 39 (about 4.5%) are Indian restaurants in the San Francisco Bay Area. This is a small share of all the restaurants found.
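
To compare cities, it helps to count restaurants and Indian restaurants per city rather than only in total. The sketch below does this in plain Python on a handful of (city, category) pairs taken from the tables above. The function name `restaurant_counts` is an assumption for illustration; with the full data the same idea could be expressed with a pandas `groupby`.

```python
from collections import defaultdict

def restaurant_counts(rows):
    """Return {city: [total_restaurants, indian_restaurants]}."""
    counts = defaultdict(lambda: [0, 0])
    for city, category in rows:
        if 'Restaurant' in category:          # same filter as in the notebook
            counts[city][0] += 1
            if category == 'Indian Restaurant':
                counts[city][1] += 1
    return dict(counts)

# A few (city, category) pairs copied from the tables above
rows = [
    ('Alameda', 'Bar'),
    ('Alameda', 'Middle Eastern Restaurant'),
    ('Alameda', 'Burmese Restaurant'),
    ('Alameda', 'Afghan Restaurant'),
    ('American Canyon', 'Indian Restaurant'),
    ('American Canyon', 'Indian Restaurant'),
    ('Benicia', 'Indian Restaurant'),
]

per_city = restaurant_counts(rows)
for city, (total, indian) in sorted(per_city.items()):
    print(city, total, indian)
```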

## Methodology <a name="methodology"></a>

In this project we direct our efforts toward detecting Bay Area cities and towns that have low restaurant density, particularly those with a low number of Indian restaurants.

* In the first step we collected the required **data: the location and type (category) of every venue in the neighborhood of each city / town in the Bay Area**. We also **identified Indian restaurants** (according to Foursquare categorization).

* The second step of our analysis focuses on finding the most promising areas: we create **clusters of cities based on their venue category frequencies**, inspect the **top 10 common venues** of each cluster, and figure out the most suitable location for an Indian restaurant. To do the clustering, we use the **K-Means clustering algorithm**.

**NOTE:** Here we consider all common venues, not only restaurants, because nearby places such as multiplexes, shopping malls, supermarkets, business centres, museums, etc. also matter when identifying a suitable location.
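
As a rough illustration of what K-Means does, the sketch below implements the basic assign-and-update loop (Lloyd's algorithm) in plain Python on a few made-up 2-D points. It is not the scikit-learn implementation used in this notebook, which adds refinements such as smarter initialization. The function name `simple_kmeans` and the sample points are assumptions for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def simple_kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids

# Two well-separated made-up groups of points
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, centroids = simple_kmeans(points, k=2)
print(labels)
```

In the notebook, each "point" is a city's vector of venue category frequencies rather than a 2-D coordinate, but the assign-and-update idea is the same.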

In [19]:
# set number of clusters
kclusters = 5

bayarea_grouped_clustering = bayarea_grouped.drop('City / Town', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bayarea_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 2, 1, 1, 0, 1, 1, 3, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
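
Before interpreting the clusters, it is worth checking how many cities fall into each one. The sketch below rebuilds the label array printed above (all ones except for the positions listed) and counts the labels with `collections.Counter`. In the notebook itself the same count could be obtained directly from `kmeans.labels_`.

```python
from collections import Counter

# Rebuild the 99 labels printed above: every entry is 1 except these positions
labels = [1] * 99
for position in (17, 49, 66, 71, 81):
    labels[position] = 0
labels[78] = 2
labels[84] = 3
labels[42] = 4

sizes = Counter(labels)
for label in sorted(sizes):
    print('label', label, '->', sizes[label], 'cities')
```

With these labels, 91 of the 99 cities share label 1, label 0 covers 5 cities, and labels 2, 3 and 4 each contain a single city, so the small clusters describe individual towns rather than groups.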

In [21]:
# add clustering labels
#bayarea_venues_sorted.drop('Cluster Labels', axis=1, inplace=True)
bayarea_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

bayarea_merged = city_df

# join city_df with bayarea_venues_sorted to add the cluster label and top venues for each 'City / Town'
bayarea_merged = bayarea_merged.join(bayarea_venues_sorted.set_index('City / Town'), on='City / Town', how='right')

bayarea_merged.head() # check the last columns!

Unnamed: 0,City / Town,County,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alameda,Alameda,37.76683,-122.2453,1,Chinese Restaurant,Bubble Tea Shop,Mexican Restaurant,Bar,Sandwich Place,Italian Restaurant,Pharmacy,Spa,New American Restaurant,Middle Eastern Restaurant
1,Albany,Alameda,42.65155,-73.75521,1,Café,Coffee Shop,Park,Pub,Hotel,Donut Shop,Bank,New American Restaurant,American Restaurant,Pizza Place
2,American Canyon,Napa,38.16805,-122.25277,1,Chinese Restaurant,Indian Restaurant,Spa,Shopping Mall,Mexican Restaurant,Pharmacy,Pizza Place,Convenience Store,Sandwich Place,Tea Room
3,Antioch,Contra Costa,38.01583,-121.81974,1,Train Station,Construction & Landscaping,Bakery,Park,Café,French Restaurant,Flower Shop,Furniture / Home Store,Frozen Yogurt Shop,Fish & Chips Shop
4,Atherton,San Mateo,53.52324,-2.48942,1,Supermarket,Pub,Roller Rink,Soccer Field,Sandwich Place,Bar,Fountain,Flea Market,Fast Food Restaurant,Filipino Restaurant


#### Let's visualize the resulting clusters and analyze each one of them
* We will use **Folium** to get the map and mark clusters on it

In [22]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=8)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bayarea_merged['Latitude'], bayarea_merged['Longitude'], bayarea_merged['City / Town'], bayarea_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Analysis <a name="analysis"></a>

#### Examine Clusters
Examine each cluster and determine the discriminating venue categories that distinguish each cluster

* **Cluster 1**

In [23]:
bayarea_merged.loc[bayarea_merged['Cluster Labels'] == 0, bayarea_merged.columns[[0] + list(range(5, bayarea_merged.shape[1]))]]

Unnamed: 0,City / Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Concord,Park,Dog Run,Pharmacy,ATM,Grocery Store,Food,Fish & Chips Shop,Fish Market,Fishing Store,Flea Market
49,Monte Sereno,Gas Station,Diner,Park,Pharmacy,Bakery,Dim Sum Restaurant,Discount Store,Fish & Chips Shop,Fish Market,Fishing Store
67,Portola Valley,Pet Store,Park,Farmers Market,American Restaurant,Tennis Court,Yoga Studio,Food,Fish & Chips Shop,Fish Market,Fishing Store
72,Ross,Park,Restaurant,Café,Theater,Pizza Place,Garden,Deli / Bodega,French Restaurant,Financial or Legal Service,Fish & Chips Shop
83,San Ramon,Park,Food Truck,Metro Station,Pizza Place,Food & Drink Shop,Department Store,Sandwich Place,Fountain,Frame Store,Food Court


* **Cluster 2**

In [24]:
bayarea_merged.loc[bayarea_merged['Cluster Labels'] == 1, bayarea_merged.columns[[0] + list(range(5, bayarea_merged.shape[1]))]]

Unnamed: 0,City / Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alameda,Chinese Restaurant,Bubble Tea Shop,Mexican Restaurant,Bar,Sandwich Place,Italian Restaurant,Pharmacy,Spa,New American Restaurant,Middle Eastern Restaurant
1,Albany,Café,Coffee Shop,Park,Pub,Hotel,Donut Shop,Bank,New American Restaurant,American Restaurant,Pizza Place
2,American Canyon,Chinese Restaurant,Indian Restaurant,Spa,Shopping Mall,Mexican Restaurant,Pharmacy,Pizza Place,Convenience Store,Sandwich Place,Tea Room
3,Antioch,Train Station,Construction & Landscaping,Bakery,Park,Café,French Restaurant,Flower Shop,Furniture / Home Store,Frozen Yogurt Shop,Fish & Chips Shop
4,Atherton,Supermarket,Pub,Roller Rink,Soccer Field,Sandwich Place,Bar,Fountain,Flea Market,Fast Food Restaurant,Filipino Restaurant
5,Belmont,Coffee Shop,Smoke Shop,Salon / Barbershop,Mobile Phone Shop,Pizza Place,Grocery Store,Sandwich Place,Sushi Restaurant,Pet Store,Pet Service
6,Belvedere,Hotel,Pizza Place,Café,Bakery,Garden,Supermarket,Business Service,Track,Theater,Botanical Garden
7,Benicia,Baseball Field,Pizza Place,Italian Restaurant,Comic Shop,Sushi Restaurant,Burger Joint,Café,Shopping Plaza,Shipping Store,Coffee Shop
8,Berkeley,Ice Cream Shop,Coffee Shop,Yoga Studio,Asian Restaurant,Brewery,Sushi Restaurant,Bubble Tea Shop,Thai Restaurant,Pizza Place,Bookstore
9,Brentwood,Clothing Store,Italian Restaurant,Pub,English Restaurant,Café,Bookstore,Pizza Place,Coffee Shop,Supermarket,Nightclub


**Cluster 2 contains many restaurants of various cuisines. In particular, there are at least 3 locations where 'Indian Restaurant' is among the most common venue categories.
So, opening an Indian restaurant in the above cities or towns would not be a great idea.**

* **Cluster 3**

In [25]:
bayarea_merged.loc[bayarea_merged['Cluster Labels'] == 2, bayarea_merged.columns[[0] + list(range(5, bayarea_merged.shape[1]))]]

Unnamed: 0,City / Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
80,San Mateo,Department Store,Resort,Convenience Store,Snack Place,Yoga Studio,Food,Financial or Legal Service,Fish & Chips Shop,Fish Market,Fishing Store


**This cluster looks good as it has only 2 restaurants and 1 snack place along with public places. This makes it a very good location to open an Indian restaurant.**

* **Cluster 4**

In [26]:
bayarea_merged.loc[bayarea_merged['Cluster Labels'] == 3, bayarea_merged.columns[[0] + list(range(5, bayarea_merged.shape[1]))]]

Unnamed: 0,City / Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
86,Saratoga,Coffee Shop,Wine Bar,Playground,Theater,Yoga Studio,Food,Financial or Legal Service,Fish & Chips Shop,Fish Market,Fishing Store


**Cluster 4 has only 2 restaurants and many public places, so it can be the best place to open an Indian restaurant.**

* **Cluster 5**

In [27]:
bayarea_merged.loc[bayarea_merged['Cluster Labels'] == 4, bayarea_merged.columns[[0] + list(range(5, bayarea_merged.shape[1]))]]

Unnamed: 0,City / Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
42,Los Altos Hills,Playground,Music Venue,Food Court,Fish & Chips Shop,Fish Market,Fishing Store,Flea Market,Flower Shop,Food,Food & Drink Shop


**This cluster is similar to Clusters 3 and 4, i.e. 1 restaurant and many public places, which makes it one of the best places to open an Indian restaurant.**

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify places in the San Francisco Bay Area with a low number of restaurants (particularly Indian restaurants) in order to find the optimal location for a new Indian restaurant.

Restaurants are most numerous in Cluster 2 and moderately numerous in Cluster 1, so these will be very competitive areas for opening a new restaurant of any cuisine.
The remaining clusters, i.e. 3, 4 and 5, have very few restaurants and a good number of public places. They represent a great opportunity and are high-potential areas for opening a new Indian restaurant. These clusters correspond to San Mateo, Saratoga and Los Altos Hills.

The final decision on the optimal restaurant location will be made by entrepreneurs based on the specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location, levels of noise / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.