# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Results and Conclusion](#results)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a **Vietnamese Restaurant** in **Montreal**, Canada.

We will look for locations that are not already crowded with restaurants, and we are especially interested in areas with few or no Vietnamese restaurants. Provided those two conditions are met, we prefer a location close to the city center.

We will be using data science tools to find a few promising neighborhoods where we can open the Vietnamese Restaurant based on the criteria above.

## Data <a name="data"></a>

The following factors will influence our decision:
* number of restaurants of any type in the neighborhood
* number of Vietnamese restaurants in the neighborhood
* distance of the neighborhood from the city center

For the candidate neighborhoods we use the list of Montreal postal codes and their neighborhoods published at https://worldpostalcode.com/canada/quebec/montreal. We also need the following data sources:
* Google Maps Geocoding API, to get the latitude and longitude of each postal code
* Foursquare API, for restaurant and other venue data around each neighborhood
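One way to turn the three factors into a decision is to sort the candidate neighborhoods lexicographically: fewest Vietnamese restaurants first, then fewest restaurants overall, then shortest distance to the center. This priority order is our reading of the business problem, not a fixed rule, and the column names (`vietnamese_restaurants`, `restaurants`, `distance_km`) and neighborhoods A, B and C are hypothetical. A minimal sketch:

```python
import pandas as pd

def rank_neighborhoods(stats):
    """Sort neighborhoods by the three criteria, most important first:
    fewest Vietnamese restaurants, then fewest restaurants of any type,
    then shortest distance to the city center (km)."""
    return stats.sort_values(
        ['vietnamese_restaurants', 'restaurants', 'distance_km']
    ).reset_index(drop=True)

# illustrative numbers for three hypothetical neighborhoods
stats = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'vietnamese_restaurants': [1, 0, 0],
    'restaurants': [5, 12, 3],
    'distance_km': [0.5, 0.8, 2.0],
})

print(rank_neighborhoods(stats)['Neighborhood'].tolist())  # ['C', 'B', 'A']
```

C ranks first even though it is the farthest, because under this ordering distance only breaks ties between neighborhoods that are equal on the first two criteria.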

### Neighborhood Candidates

Let's start by importing the libraries that we will need for this project.

```python
import pandas as pd
import numpy as np
import requests
import json
from pandas import json_normalize  # the old pandas.io.json import path is deprecated
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
import geocoder
```

Let's read the CSV file that contains the neighborhoods of Montreal and their postal codes.

```python
df = pd.read_csv('MontrealNeighborhoods.csv')
df.head()
```

| | Postal Code | Neighborhood |
|---|---|---|
| 0 | H9W | Beaconsfield |
| 1 | H3B | Downtown Montreal East |
| 2 | H3H | Downtown Montreal South & West |
| 3 | H9J | Kirkland |
| 4 | H1H | Montreal North South |


Next, we get the coordinates of each neighborhood from the Google Maps Geocoding API.

```python
url = 'https://maps.googleapis.com/maps/api/geocode/json'
API_KEY = 'key'  # placeholder: replace with your own Google Maps API key

df['Latitude'] = np.nan
df['Longitude'] = np.nan

for i, postal in df['Postal Code'].items():
    params = {'address': '{}, Montreal, Quebec'.format(postal), 'key': API_KEY}
    results = requests.get(url, params=params).json()['results']
    if not results:
        # no match for this postal code: leave its coordinates as NaN
        continue
    location = results[0]['geometry']['location']
    # .loc avoids chained assignment (df['Latitude'][i] = ...), which may not write back to df
    df.loc[i, 'Latitude'] = location['lat']
    df.loc[i, 'Longitude'] = location['lng']

df.head(15)
```

| | Postal Code | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | H9W | Beaconsfield | 45.429 | -73.869 |
| 1 | H3B | Downtown Montreal East | 45.4999 | -73.5689 |
| 2 | H3H | Downtown Montreal South & West | 45.5027 | -73.5958 |
| 3 | H9J | Kirkland | 45.4486 | -73.871 |
| 4 | H1H | Montreal North South | 45.5893 | -73.642 |
| 5 | H9R | Pointe-Claire | 45.4593 | -73.8115 |
| 6 | H4W | Cote-Saint-Luc West | 45.477 | -73.6677 |
| 7 | H3A | Downtown Montreal North | 45.5035 | -73.5769 |
| 8 | H3G | Downtown Montreal Southeast | 45.4995 | -73.5826 |
| 9 | H1B | Montreal East | 45.6337 | -73.515 |
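The loop above only checks whether the `results` list is empty. A Google Geocoding API response also carries a top-level `status` field (for example `"OK"` or `"ZERO_RESULTS"`), which distinguishes "no match" from errors such as a rejected key. A minimal sketch that extracts coordinates from an already-decoded response and returns `None` otherwise; the helper name `extract_lat_lng` is hypothetical, and the sample payloads are hand-written to mirror the fields the loop reads, not captured API responses:

```python
def extract_lat_lng(payload):
    """Return (lat, lng) of the first geocoding result, or None.

    `payload` is the decoded JSON body of a Google Geocoding API response."""
    if payload.get('status') != 'OK' or not payload.get('results'):
        return None
    location = payload['results'][0]['geometry']['location']
    return location['lat'], location['lng']

# hand-written payloads mimicking the fields used in the loop above
ok = {'status': 'OK',
      'results': [{'geometry': {'location': {'lat': 45.4999, 'lng': -73.5689}}}]}
empty = {'status': 'ZERO_RESULTS', 'results': []}

print(extract_lat_lng(ok))     # (45.4999, -73.5689)
print(extract_lat_lng(empty))  # None
```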


A little research suggests that the center of Montreal lies in Downtown Montreal East (postal code H3B), so we use its coordinates as the city center.

```python
df.loc[[1]]
```

| | Postal Code | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 1 | H3B | Downtown Montreal East | 45.4999 | -73.5689 |
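The third factor, distance from the city center, is never computed explicitly. A minimal sketch using the haversine great-circle formula with a mean Earth radius of 6371 km (an assumed constant; a slightly different radius barely changes the result), taking Downtown Montreal East as the center and the rounded coordinates from the geocoding output above:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

CENTER = (45.4999, -73.5689)  # Downtown Montreal East (H3B)

# downtown coordinates as printed in the geocoding output
downtown = {
    'Downtown Montreal East': (45.4999, -73.5689),
    'Downtown Montreal South & West': (45.5027, -73.5958),
    'Downtown Montreal North': (45.5035, -73.5769),
    'Downtown Montreal Southeast': (45.4995, -73.5826),
    'Downtown Montreal Northeast': (45.5039, -73.5632),
}

for name, (lat, lng) in downtown.items():
    print('{}: {:.2f} km'.format(name, haversine_km(*CENTER, lat, lng)))
```

With these coordinates, Downtown Montreal South & West comes out at roughly 2.1 km from the center, the farthest of the five; the other three lie within about 1.1 km.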


###### Creating the Map of Montreal

```python
map_montreal = folium.Map(location=[45.4999, -73.5689], zoom_start=10)

# add markers to map
for lat, lng, postal_code, neighborhood in zip(df['Latitude'], df['Longitude'], df['Postal Code'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, postal_code)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_montreal)

map_montreal
```

We are only interested in the area around the city center (Downtown Montreal East), so we create a DataFrame containing only the downtown neighborhoods.

```python
downtown_df = df[df['Neighborhood'].str.contains('Downtown')].reset_index(drop=True)
downtown_df.head()
```

| | Postal Code | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | H3B | Downtown Montreal East | 45.4999 | -73.5689 |
| 1 | H3H | Downtown Montreal South & West | 45.5027 | -73.5958 |
| 2 | H3A | Downtown Montreal North | 45.5035 | -73.5769 |
| 3 | H3G | Downtown Montreal Southeast | 45.4995 | -73.5826 |
| 4 | H2Z | Downtown Montreal Northeast | 45.5039 | -73.5632 |


Let's create a new map that shows only the downtown neighborhoods.

```python
map_downtown = folium.Map(location=[45.4999, -73.5689], zoom_start=11)

# add markers to map
for lat, lng, label in zip(downtown_df['Latitude'], downtown_df['Longitude'], downtown_df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown)

map_downtown
```

###### Defining Foursquare Credentials and Version

```python
# Placeholders: substitute your own Foursquare credentials and keep them out of shared notebooks
CLIENT_ID = 'YOUR_CLIENT_ID'  # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version
```
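Rather than pasting credentials directly into the notebook (and printing them), they can be read from environment variables. A minimal sketch; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical choices, not names Foursquare defines:

```python
import os

def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical choices for this sketch."""
    env = os.environ if env is None else env
    names = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET')
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variable(s): ' + ', '.join(missing))
    return env[names[0]], env[names[1]]
```

In the notebook this would be used as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`, after exporting both variables in the shell that starts Jupyter.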


#### Creating a function that gets the top 100 venues within a 500-meter radius of each neighborhood

```python
LIMIT = 100  # maximum number of venues returned per neighborhood

def getNearbyVenues(names, latitudes, longitudes, radius=500):

    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # query parameters for the API request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and stop on HTTP errors (e.g. invalid credentials)
        response = requests.get(url, params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single DataFrame
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
```
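`getNearbyVenues` reads `categories[0]`, which raises an `IndexError` if Foursquare returns a venue with an empty category list. A minimal, offline sketch of the per-venue extraction that falls back to `None` instead; the helper name `venues_from_items` and the second sample venue are hypothetical, and the sample items only mirror the fields the function above reads:

```python
def venues_from_items(name, lat, lng, items):
    """Turn the 'items' list of an explore response into row tuples.

    Venues without any category get None instead of raising IndexError."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# hand-written items; the second venue is hypothetical and has no category
items = [
    {'venue': {'name': 'Square Dorchester',
               'location': {'lat': 45.499474, 'lng': -73.570848},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unnamed spot',
               'location': {'lat': 45.5, 'lng': -73.57},
               'categories': []}},
]

for row in venues_from_items('Downtown Montreal East', 45.499914, -73.568918, items):
    print(row)
```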

###### Calling the function above

```python
downtown_venues = getNearbyVenues(names=downtown_df['Neighborhood'],
                                  latitudes=downtown_df['Latitude'],
                                  longitudes=downtown_df['Longitude'])
```

```
Downtown Montreal East
Downtown Montreal South & West
Downtown Montreal North
Downtown Montreal Southeast
Downtown Montreal Northeast
```


```python
downtown_venues.head()
```

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Downtown Montreal East | 45.499914 | -73.568918 | Cathédrale Marie-Reine-du-Monde | 45.499614 | -73.569367 | Church |
| 1 | Downtown Montreal East | 45.499914 | -73.568918 | Nacarat | 45.500773 | -73.5682 | Cocktail Bar |
| 2 | Downtown Montreal East | 45.499914 | -73.568918 | Dominion Square Tavern | 45.500405 | -73.571636 | Gastropub |
| 3 | Downtown Montreal East | 45.499914 | -73.568918 | The Keg Steakhouse & Bar | 45.50073 | -73.568971 | Steakhouse |
| 4 | Downtown Montreal East | 45.499914 | -73.568918 | Square Dorchester | 45.499474 | -73.570848 | Park |


```python
downtown_venues.groupby('Neighborhood').count()
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Downtown Montreal East | 100 | 100 | 100 | 100 | 100 | 100 |
| Downtown Montreal North | 75 | 75 | 75 | 75 | 75 | 75 |
| Downtown Montreal Northeast | 100 | 100 | 100 | 100 | 100 | 100 |
| Downtown Montreal South & West | 4 | 4 | 4 | 4 | 4 | 4 |
| Downtown Montreal Southeast | 50 | 50 | 50 | 50 | 50 | 50 |
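Two neighborhoods returned exactly 100 venues, which equals `LIMIT`, so their true venue counts may be higher and their venue lists may be truncated. A minimal sketch that flags such neighborhoods, using the counts copied from the table above (in the live notebook, `downtown_venues.groupby('Neighborhood').size()` would produce the same Series):

```python
import pandas as pd

LIMIT = 100  # same cap as in getNearbyVenues

# venue counts per neighborhood, copied from the groupby output above
counts = pd.Series({
    'Downtown Montreal East': 100,
    'Downtown Montreal North': 75,
    'Downtown Montreal Northeast': 100,
    'Downtown Montreal South & West': 4,
    'Downtown Montreal Southeast': 50,
})

# a neighborhood that reaches the cap may have more venues than the API returned
capped = counts[counts >= LIMIT].index.tolist()
print(capped)  # ['Downtown Montreal East', 'Downtown Montreal Northeast']
```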


```python
downtown_venue_grouped = downtown_venues.groupby('Venue Category').count().sort_values(by=['Neighborhood'], ascending=False)

downtown_venue_grouped.head(25)
```

| Venue Category | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude |
|---|---|---|---|---|---|---|
| Hotel | 26 | 26 | 26 | 26 | 26 | 26 |
| Coffee Shop | 22 | 22 | 22 | 22 | 22 | 22 |
| Café | 16 | 16 | 16 | 16 | 16 | 16 |
| French Restaurant | 11 | 11 | 11 | 11 | 11 | 11 |
| Clothing Store | 9 | 9 | 9 | 9 | 9 | 9 |
| Japanese Restaurant | 8 | 8 | 8 | 8 | 8 | 8 |
| Asian Restaurant | 7 | 7 | 7 | 7 | 7 | 7 |
| Restaurant | 7 | 7 | 7 | 7 | 7 | 7 |
| Bakery | 6 | 6 | 6 | 6 | 6 | 6 |
| Gym | 6 | 6 | 6 | 6 | 6 | 6 |
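Every column of that count holds the same number, so `value_counts()` on the category column gives the same ranking as a single Series. A small sketch on a hypothetical set of venue categories (on the real data the call would be `downtown_venues['Venue Category'].value_counts().head(25)`):

```python
import pandas as pd

# hypothetical venues; only the 'Venue Category' column matters here
venues = pd.DataFrame({'Venue Category': ['Hotel', 'Café', 'Hotel', 'Coffee Shop', 'Hotel', 'Café']})

# counts per category, sorted from most to least frequent
category_counts = venues['Venue Category'].value_counts()
print(category_counts)
```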


## Analyze each neighborhood

```python
# one hot encoding
downtown_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
downtown_onehot['Neighborhood'] = downtown_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [downtown_onehot.columns[-1]] + list(downtown_onehot.columns[:-1])
downtown_onehot = downtown_onehot[fixed_columns]

downtown_onehot.head()
```

| | Yoga Studio | Art Museum | Arts & Crafts Store | Asian Restaurant | Bakery | Bar | Belgian Restaurant | Bistro | Bookstore | Breakfast Spot | ... | Sushi Restaurant | Swiss Restaurant | Szechuan Restaurant | Taco Place | Tattoo Parlor | Tea Room | Theater | Toy / Game Store | Vegetarian / Vegan Restaurant | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
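In the output above the first column is "Yoga Studio" rather than "Neighborhood". One likely explanation (an assumption, not verified against the raw data) is that Foursquare returned a venue whose category is itself "Neighborhood". `get_dummies` then already creates a "Neighborhood" column, the assignment overwrites it in place instead of appending a new last column, and `columns[-1]` picks the alphabetically last category. A minimal sketch that renames any such category column and then inserts the neighborhood names first; the label "Neighborhood (venue category)" and the two sample rows are hypothetical:

```python
import pandas as pd

def one_hot_with_neighborhood(venues):
    """One-hot encode 'Venue Category' and put the neighborhood name first."""
    onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
    # a venue category literally named "Neighborhood" would collide with the label column
    onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (venue category)'})
    onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
    return onehot

# hypothetical rows, including one venue whose category is "Neighborhood"
venues = pd.DataFrame({
    'Neighborhood': ['Downtown Montreal East', 'Downtown Montreal North'],
    'Venue Category': ['Yoga Studio', 'Neighborhood'],
})

onehot = one_hot_with_neighborhood(venues)
print(onehot.columns.tolist())
```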


#### Group the rows by neighborhood and take the mean frequency of occurrence of each category

```python
downtown_grouped = downtown_onehot.groupby('Neighborhood').mean().reset_index()
downtown_grouped
```

| | Neighborhood | Yoga Studio | Art Museum | Arts & Crafts Store | Asian Restaurant | Bakery | Bar | Belgian Restaurant | Bistro | Bookstore | ... | Sushi Restaurant | Swiss Restaurant | Szechuan Restaurant | Taco Place | Tattoo Parlor | Tea Room | Theater | Toy / Game Store | Vegetarian / Vegan Restaurant | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Downtown Montreal East | 0.01 | 0.0 | 0.01 | 0.01 | 0.01 | 0.0 | 0.01 | 0.01 | 0.01 | ... | 0.01 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.01 | 0.01 | 0.0 |
| 1 | Downtown Montreal North | 0.013333 | 0.013333 | 0.013333 | 0.0 | 0.013333 | 0.0 | 0.0 | 0.0 | 0.013333 | ... | 0.013333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.013333 | 0.013333 | 0.013333 |
| 2 | Downtown Montreal Northeast | 0.01 | 0.0 | 0.0 | 0.06 | 0.03 | 0.02 | 0.0 | 0.01 | 0.0 | ... | 0.0 | 0.01 | 0.0 | 0.01 | 0.0 | 0.01 | 0.01 | 0.0 | 0.02 | 0.0 |
| 3 | Downtown Montreal South & West | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Downtown Montreal Southeast | 0.02 | 0.08 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.02 | 0.0 | 0.02 | 0.02 | 0.0 | 0.0 | 0.02 | 0.02 |
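Each one-hot row marks exactly one category, so the mean of a column within a neighborhood is that category's count divided by the neighborhood's venue count. For example, Downtown Montreal North returned 75 venues, so a category seen once there has frequency 1/75 ≈ 0.013333, and in Downtown Montreal Southeast (50 venues) a single venue gives 1/50 = 0.02, which matches the table. A tiny check on hypothetical rows:

```python
import pandas as pd

# hypothetical one-hot rows: 4 venues in neighborhood "X"
onehot = pd.DataFrame({
    'Neighborhood': ['X'] * 4,
    'Bakery': [1, 0, 0, 0],
    'Hotel': [0, 1, 1, 0],
    'Café': [0, 0, 0, 1],
})

# mean of each 0/1 column = count of that category / number of venues
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```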


#### Print each neighborhood along with the top 5 most common venues

```python
num_top_venues = 5

for hood in downtown_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_grouped[downtown_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
```

```
----Downtown Montreal East----
            venue  freq
0     Coffee Shop  0.10
1           Hotel  0.06
2  Clothing Store  0.04
3      Restaurant  0.04
4            Café  0.04


----Downtown Montreal North----
            venue  freq
0           Hotel  0.11
1  Clothing Store  0.07
2  Sandwich Place  0.05
3     Coffee Shop  0.05
4            Café  0.05


----Downtown Montreal Northeast----
               venue  freq
0              Hotel  0.08
1  French Restaurant  0.06
2   Asian Restaurant  0.06
3              Plaza  0.05
4        Coffee Shop  0.05


----Downtown Montreal South & West----
           venue  freq
0       Bus Stop  0.25
1           Lake  0.25
2  Historic Site  0.25
3       Mountain  0.25
4         Museum  0.00


----Downtown Montreal Southeast----
                venue  freq
0               Hotel  0.08
1          Art Museum  0.08
2                Café  0.08
3         Coffee Shop  0.06
4  Italian Restaurant  0.04
```
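The loop above transposes one row at a time to find the top categories. `Series.nlargest` reads the same information directly off each row; a minimal sketch on a hypothetical two-neighborhood frequency table shaped like `downtown_grouped` (ties may come out in a different order than with `sort_values`):

```python
import pandas as pd

def top_venues(grouped, n=5):
    """Map each neighborhood to its n most frequent venue categories."""
    freqs = grouped.set_index('Neighborhood')
    return {hood: row.nlargest(n).round(2) for hood, row in freqs.iterrows()}

# hypothetical frequencies
grouped = pd.DataFrame({
    'Neighborhood': ['X', 'Y'],
    'Hotel': [0.5, 0.1],
    'Café': [0.3, 0.6],
    'Bakery': [0.2, 0.3],
})

for hood, top in top_venues(grouped, n=2).items():
    print(hood, top.to_dict())
```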




#### Putting the above into a pandas DataFrame

```python
def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and sort the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```

#### New DataFrame with the top 10 venues for each neighborhood

```python
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions after 3rd use the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_grouped['Neighborhood']

# fill each row with that neighborhood's most common venue categories
for ind in np.arange(downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
```

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Downtown Montreal East | Coffee Shop | Hotel | Clothing Store | Restaurant | Café | Gym | Deli / Bodega | Cosmetics Shop | Gastropub | Pub |
| 1 | Downtown Montreal North | Hotel | Clothing Store | Coffee Shop | Sandwich Place | Café | Cosmetics Shop | Gym | Japanese Restaurant | French Restaurant | Pizza Place |
| 2 | Downtown Montreal Northeast | Hotel | Asian Restaurant | French Restaurant | Plaza | Chinese Restaurant | Coffee Shop | Café | Japanese Restaurant | Bakery | Vegetarian / Vegan Restaurant |
| 3 | Downtown Montreal South & West | Bus Stop | Historic Site | Lake | Mountain | Women's Store | Cuban Restaurant | Cupcake Shop | Cycle Studio | Deli / Bodega | Department Store |
| 4 | Downtown Montreal Southeast | Art Museum | Café | Hotel | Coffee Shop | Italian Restaurant | Burger Joint | Middle Eastern Restaurant | Hawaiian Restaurant | Park | Jewelry Store |
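The `indicators` list only covers 1st to 3rd, and every other position falls through to "th". That is correct for the ten positions used here, but it would produce "21th", "22th" and "23th" if `num_top_venues` were ever raised above 20. A small helper implementing the full English rule (11th, 12th and 13th are the exceptions); the name `ordinal` is our own:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[1], '|', columns[10])  # 1st Most Common Venue | 10th Most Common Venue
```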


## Results and Conclusion <a name="results"></a>

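The table above lists the ten most common venue categories in each downtown neighborhood. Vietnamese restaurants are not among the top 10 venues of any neighborhood. In fact, "Vietnamese Restaurant" does not appear as a column of the one-hot table either (it would sort between "Vegetarian / Vegan Restaurant" and "Women's Store", which are adjacent there), so none of the venues Foursquare returned within 500 m of these neighborhoods is categorized as a Vietnamese restaurant. Two caveats apply: Downtown Montreal East and Downtown Montreal Northeast returned exactly 100 venues (the `LIMIT`), so their venue lists may be incomplete; and Downtown Montreal South & West returned only four venues, so its 5th to 10th most common venues are zero-frequency categories that carry no information. With those caveats, there appears to be room for a Vietnamese restaurant in any of these neighborhoods.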
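The comparison of restaurant frequency across neighborhoods can be made explicit by summing every category column whose name ends in "Restaurant" in the mean-frequency table (`downtown_grouped`). A minimal sketch, assuming that naming pattern is an adequate proxy for restaurants (it misses categories such as "Pizza Place", "Gastropub" or "Steakhouse", which is a judgment call), with a check for a "Vietnamese Restaurant" column; the helper name and the sample frequencies are hypothetical:

```python
import pandas as pd

def restaurant_summary(grouped):
    """Share of venues in '... Restaurant' categories, and the Vietnamese share, per neighborhood."""
    freqs = grouped.set_index('Neighborhood')
    restaurant_cols = [c for c in freqs.columns if c.endswith('Restaurant')]
    summary = pd.DataFrame({'restaurant_share': freqs[restaurant_cols].sum(axis=1)})
    # the column only exists if at least one returned venue had this category
    if 'Vietnamese Restaurant' in freqs.columns:
        summary['vietnamese_share'] = freqs['Vietnamese Restaurant']
    else:
        summary['vietnamese_share'] = 0.0
    return summary

# hypothetical frequencies shaped like downtown_grouped
grouped = pd.DataFrame({
    'Neighborhood': ['X', 'Y'],
    'Asian Restaurant': [0.25, 0.0],
    'French Restaurant': [0.25, 0.0],
    'Hotel': [0.5, 0.5],
    'Lake': [0.0, 0.5],
})

print(restaurant_summary(grouped))
```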

Downtown Montreal East, Downtown Montreal North, Downtown Montreal Northeast, and Downtown Montreal Southeast:
* Restaurants make up a larger share of their venues than in Downtown Montreal South & West; Downtown Montreal Northeast, for example, has Asian and French restaurants among its top three venue categories
* None of their returned venues is a Vietnamese restaurant, which is a good sign

Downtown Montreal South & West:
* Has the lowest frequency of restaurants: none of its four returned venues (a bus stop, a historic site, a lake and a mountain) is a restaurant
* Has no Vietnamese restaurant among its returned venues
* Lies about 2 km from the city center (Downtown Montreal East), the farthest of the five downtown neighborhoods but still close

If I were to recommend a location for a Vietnamese restaurant in Montreal, Canada to stakeholders, I would pick Downtown Montreal South & West, because it best fits the criteria set out in the introduction.