# Capstone Project - The Battle of the Neighborhoods (Week 2)

### Applied Data Science Capstone IBM/Coursera

# Table of Contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal neighborhood in **Ottawa, Canada** to open a **fast-food-style Greek restaurant**.

The client wants to open a second location for a very successful Greek restaurant chain in Ottawa.
Because this is a "fast food" type of chain where customers call in, order ahead, and then simply
pick up their order to go, the client would prefer an area with a **high population and a lot of residential
homes**, so that customers can grab food on their way home. The client is also looking for areas with **fewer fast
food restaurants**, because those are the direct competition.

## Data <a name="data"></a>

Based on our defined criteria above we will be looking at:
* Population in neighborhoods
* Residential Dwellings in Neighborhoods
* Lower number of fast food restaurants

We will use postal code location data, retrieved online and available in CSV format in the GitHub repo for this project, to plot the neighborhoods of Ottawa. We will use the Foursquare API to retrieve venues within each neighborhood. Using all this data, we will plot clusters on a map to see how the different areas of Ottawa compare in terms of venues. Having seen the clusters alongside the population of each postal code area, we can work out a suitable area to open our restaurant. Once an area is picked, the next step would be to find suitable buildings for sale or lease to house the business.

In [69]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
    
# transforming a json file into a pandas dataframe
# (pandas >= 1.0 exposes json_normalize at the top level; pandas.io.json.json_normalize is deprecated)
from pandas import json_normalize

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
from folium import plugins
from folium.plugins import HeatMap # heatmap plugin

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


#### Import postal code data from github into a pandas dataframe

In [70]:
url = 'https://raw.githubusercontent.com/SunpreetGarcha/IBM-Applied-Data-Science-Capstone-Project/master/Canadian%20Postal%20Codes.csv'
df = pd.read_csv(url)
df.head()

Unnamed: 0,Column1,Column2,Column3,Column4,Column5,Column6,Column7,Column8,Column9,Column10,Column11,Column12
0,CA,T0A,Eastern Alberta (St. Paul),Alberta,AB,,,,,54.766,-111.7174,6.0
1,CA,T0B,Wainwright Region (Tofield),Alberta,AB,,,,,53.0727,-111.5816,6.0
2,CA,T0C,Central Alberta (Stettler),Alberta,AB,,,,,52.1431,-111.6941,5.0
3,CA,T0E,Western Alberta (Jasper),Alberta,AB,,,,,53.6758,-115.0948,5.0
4,CA,T0G,North Central Alberta (Slave Lake),Alberta,AB,,,,,55.6993,-114.4529,6.0


#### Import population and dwelling data from github into a pandas dataframe

In [71]:
url2 = 'https://raw.githubusercontent.com/SunpreetGarcha/IBM-Applied-Data-Science-Capstone-Project/master/Population.csv'
pop = pd.read_csv(url2)
pop.head()

Unnamed: 0,Geographic code,Geographic name,Province or territory,"Incompletely enumerated Indian reserves and Indian settlements, 2016","Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016"
0,1,Canada,,T,35151728.0,15412443.0,14072079.0
1,A0A,A0A,Newfoundland and Labrador,,46587.0,26155.0,19426.0
2,A0B,A0B,Newfoundland and Labrador,,19792.0,13658.0,8792.0
3,A0C,A0C,Newfoundland and Labrador,,12587.0,8010.0,5606.0
4,A0E,A0E,Newfoundland and Labrador,,22294.0,12293.0,9603.0


#### Rename and drop columns appropriately 

In [72]:
pop1 = pop.drop(['Geographic name','Incompletely enumerated Indian reserves and Indian settlements, 2016','Province or territory'],axis=1)
pop1 = pop1.rename(columns = {'Geographic code':'Postal Code'})
pop1.head()

Unnamed: 0,Postal Code,"Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016"
0,1,35151728.0,15412443.0,14072079.0
1,A0A,46587.0,26155.0,19426.0
2,A0B,19792.0,13658.0,8792.0
3,A0C,12587.0,8010.0,5606.0
4,A0E,22294.0,12293.0,9603.0


#### Rename and drop columns appropriately

In [73]:
df1 = df.drop(['Column1','Column7','Column8','Column9','Column12'],axis=1)
df1.columns = ['Postal Code', 'Area', 'Province', 'Province Code','Subdivision', 'Latitude','Longitude']
df1.head()

Unnamed: 0,Postal Code,Area,Province,Province Code,Subdivision,Latitude,Longitude
0,T0A,Eastern Alberta (St. Paul),Alberta,AB,,54.766,-111.7174
1,T0B,Wainwright Region (Tofield),Alberta,AB,,53.0727,-111.5816
2,T0C,Central Alberta (Stettler),Alberta,AB,,52.1431,-111.6941
3,T0E,Western Alberta (Jasper),Alberta,AB,,53.6758,-115.0948
4,T0G,North Central Alberta (Slave Lake),Alberta,AB,,55.6993,-114.4529


#### Merge the two dataframes using postal codes

In [74]:
dfm = pd.merge(df1, pop1, on='Postal Code', how='left')
dfm.head()

Unnamed: 0,Postal Code,Area,Province,Province Code,Subdivision,Latitude,Longitude,"Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016"
0,T0A,Eastern Alberta (St. Paul),Alberta,AB,,54.766,-111.7174,59234.0,27713.0,21711.0
1,T0B,Wainwright Region (Tofield),Alberta,AB,,53.0727,-111.5816,64072.0,28009.0,24427.0
2,T0C,Central Alberta (Stettler),Alberta,AB,,52.1431,-111.6941,62701.0,28739.0,23081.0
3,T0E,Western Alberta (Jasper),Alberta,AB,,53.6758,-115.0948,43729.0,22179.0,16779.0
4,T0G,North Central Alberta (Slave Lake),Alberta,AB,,55.6993,-114.4529,42905.0,18519.0,15103.0
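
A left merge keeps every postal code from the first dataframe and fills the population columns with `NaN` wherever the census table has no matching row. The following is a minimal sketch of how such gaps could be spotted with the `indicator` option of `pd.merge`. The two small frames are toy stand-ins for `df1` and `pop1`, and `K9Z` is a made-up postal code used only for illustration.

```python
import pandas as pd

# Toy stand-ins for df1 and pop1; 'K9Z' is a hypothetical code with no census row.
codes = pd.DataFrame({'Postal Code': ['K1G', 'K1H', 'K9Z']})
census = pd.DataFrame({'Postal Code': ['K1G', 'K1H'],
                       'Population, 2016': [34075.0, 15796.0]})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left merge.
checked = pd.merge(codes, census, on='Postal Code', how='left', indicator=True)

# Postal codes that found no population data.
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

If the list is empty, every postal code picked up population and dwelling figures.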


#### Pull only the Ottawa data that we are using

In [75]:
df2 = dfm[dfm['Area'].str.contains('Ottawa')].reset_index(drop=True)
df2.head()

Unnamed: 0,Postal Code,Area,Province,Province Code,Subdivision,Latitude,Longitude,"Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016"
0,K1A,Government of Canada Ottawa and Gatineau offices,Ontario,ON,Ottawa,45.4207,-75.7023,589.0,484.0,352.0
1,K1G,Ottawa (Riverview / Hawthorne),Ontario,ON,Ottawa,45.3548,-75.5773,34075.0,15128.0,14195.0
2,K1H,Ottawa (Alta Vista),Ontario,ON,Ottawa,45.3876,-75.6593,15796.0,7180.0,6804.0
3,K1K,Ottawa (Overbrook),Ontario,ON,Ottawa,45.4448,-75.6431,29499.0,14212.0,13482.0
4,K1L,Ottawa (Vanier),Ontario,ON,Ottawa,45.44,-75.663,17021.0,10452.0,9369.0


#### See what the unique postal codes are (just for curiosity's sake)

In [76]:
df2['Postal Code'].unique()

array(['K1A', 'K1G', 'K1H', 'K1K', 'K1L', 'K1M', 'K1N', 'K1P', 'K1R',
       'K1S', 'K1V', 'K1Y', 'K1Z', 'K2A', 'K2B', 'K2C', 'K2P'],
      dtype=object)

#### Format the data so it looks nice

In [77]:
df2['Area'] = df2['Area'].str.replace('Ottawa','')
df2['Area'] = df2['Area'].str.strip('( ')
df2['Area'] = df2['Area'].str.strip(') ')
df2.head()

Unnamed: 0,Postal Code,Area,Province,Province Code,Subdivision,Latitude,Longitude,"Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016"
0,K1A,Government of Canada and Gatineau offices,Ontario,ON,Ottawa,45.4207,-75.7023,589.0,484.0,352.0
1,K1G,Riverview / Hawthorne,Ontario,ON,Ottawa,45.3548,-75.5773,34075.0,15128.0,14195.0
2,K1H,Alta Vista,Ontario,ON,Ottawa,45.3876,-75.6593,15796.0,7180.0,6804.0
3,K1K,Overbrook,Ontario,ON,Ottawa,45.4448,-75.6431,29499.0,14212.0,13482.0
4,K1L,Vanier,Ontario,ON,Ottawa,45.44,-75.663,17021.0,10452.0,9369.0
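
Removing the word "Ottawa" from the middle of a name leaves two spaces behind (the first area prints as `Government of Canada  and Gatineau offices` when it is used as plain text). A possible variant of the cleanup, sketched here on a small hand-made Series rather than on `df2`, also collapses repeated whitespace. Note that it removes every parenthesis rather than only leading and trailing ones, which is an assumption that suits these area names.

```python
import pandas as pd

# Three raw area names copied from the Ottawa rows above.
areas = pd.Series(['Government of Canada Ottawa and Gatineau offices',
                   'Ottawa (Riverview / Hawthorne)',
                   'Ottawa (Alta Vista)'])

cleaned = (areas.str.replace('Ottawa', '', regex=False)   # drop the city name
                .str.replace(r'[()]', '', regex=True)     # drop parentheses
                .str.replace(r'\s+', ' ', regex=True)     # collapse runs of whitespace
                .str.strip())                             # trim the ends
print(cleaned.tolist())
```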


In [78]:
df3 = df2.drop(['Province','Province Code'],axis=1)
df3.head()

Unnamed: 0,Postal Code,Area,Subdivision,Latitude,Longitude,"Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016"
0,K1A,Government of Canada and Gatineau offices,Ottawa,45.4207,-75.7023,589.0,484.0,352.0
1,K1G,Riverview / Hawthorne,Ottawa,45.3548,-75.5773,34075.0,15128.0,14195.0
2,K1H,Alta Vista,Ottawa,45.3876,-75.6593,15796.0,7180.0,6804.0
3,K1K,Overbrook,Ottawa,45.4448,-75.6431,29499.0,14212.0,13482.0
4,K1L,Vanier,Ottawa,45.44,-75.663,17021.0,10452.0,9369.0


#### Check the shape of the final dataframe we will be using

In [79]:
df3.shape

(17, 8)

#### Create a map of our area with labels for each neighborhood

In [80]:
location = [45.4215,-75.6972]
o_map = folium.Map(location,zoom_start=12)
for lat, lng, Area, Subdivision in zip(df3['Latitude'],
                                           df3['Longitude'],
                                           df3['Area'],
                                           df3['Subdivision']):
    label = '{}, {}'.format(Area, Subdivision)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(o_map)  
    
o_map

In [81]:
CLIENT_ID = '1MJYZYLOZVGTHAFNXUWFVQCT5JEDGTZCKZLPITYGXSSD2WKO'
CLIENT_SECRET = 'XIWD13KJ3NB1SKP4TREO2B533PAN2TK0UEAVSJG0XBN0EUYR' 
VERSION = '20180605'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 1MJTNYLOZVGTHAFNXUWFVQCT5JEDPTZCKZLPITYGXSSD2WKO
CLIENT_SECRET:XIWD13KJ3NB1SKP4TREO2B533QGN2TK0UEAVSJG0XBN0EUYR
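
Hard-coding and printing API keys makes them easy to leak when a notebook is shared. One common alternative, sketched below, is to read them from environment variables and to mask them before printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical choices for this example, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are hypothetical
    variable names chosen for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None

def mask(secret, visible=4):
    """Show only the first few characters so output never contains the full key."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Demonstration with a plain dict standing in for os.environ.
demo_env = {'FOURSQUARE_CLIENT_ID': 'abc', 'FOURSQUARE_CLIENT_SECRET': 'secret123'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT_SECRET: ' + mask(client_secret))
```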


In [82]:
neighborhood_latitude = df3.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df3.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df3.loc[0, 'Area'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Government of Canada  and Gatineau offices are 45.4207, -75.7023.


### Foursquare
Now that we have our location candidates, let's use the Foursquare API to get information on the venues, including restaurants, in each neighborhood.

In [83]:
LIMIT = 100 
radius = 500 

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url  # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=1MJTNYLOZVGTHAFNXUWFVQCT5JEDPTZCKZLPITYGXSSD2WKO&client_secret=XIWD13KJ3NB1SKP4TREO2B533QGN2TK0UEAVSJG0XBN0EUYR&v=20180605&ll=45.4207,-75.7023&radius=500&limit=100'
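
Building the query string by hand works, but it is easy to misplace a parameter. As an illustration, the same URL can be assembled from a dictionary with the standard library's `urlencode`. The function name `build_explore_url` is an invention for this sketch, and the `'ID'` and `'SECRET'` arguments are placeholders.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble a venues/explore URL from a dict of query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma inside the ll value unescaped.
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

explore_url = build_explore_url('ID', 'SECRET', '20180605', 45.4207, -75.7023, 500, 100)
print(explore_url)
```

Keeping the parameters in a dictionary also makes it simple to add or remove one without touching a long format string.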

In [84]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f02cbee3638df69a26d7071'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-5739fdb2498ea29b0ec70f01-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/ramen_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1d1941735',
         'name': 'Noodle House',
         'pluralName': 'Noodle Houses',
         'primary': True,
         'shortName': 'Noodles'}],
       'id': '5739fdb2498ea29b0ec70f01',
       'location': {'address': '153 Bank St',
        'cc': 'CA',
        'city': 'Ottawa',
        'country': 'Canada',
        'crossStreet': 'btwn Slater St & Laurier Ave',
        'distance': 305,
        'formattedAddress': ['153 Bank St (btwn Slater St & Laurier Ave)',
         'Ottawa ON K1P 5N7',
         'Canada'],
        

In [85]:
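def get_category_type(row):
    # return the name of a venue's first category, or None if it has none
    try:
        categories_list = row['categories']
    except KeyError:
        # json_normalize prefixes nested venue fields with 'venue.'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']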

In [86]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) 

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

Unnamed: 0,name,categories,lat,lng
0,Sansotei Ramen 三草亭,Noodle House,45.41892,-75.699328
1,Alt Hotel Ottawa,Hotel,45.419973,-75.698948
2,Juice Monkey,Juice Bar,45.419537,-75.699445
3,Queen St Fare,Bar,45.420948,-75.69937
4,Stroked Ego,Men's Store,45.419286,-75.699698
5,Ottawa Streat Gourmet,Food Truck,45.420846,-75.69855
6,TELUS,IT Services,45.419587,-75.699479
7,Bier Markt,Restaurant,45.421657,-75.699641
8,Sheraton Ottawa Hotel,Hotel,45.420846,-75.697712
9,Caribbean Sizzler,Caribbean Restaurant,45.419274,-75.699715


In [87]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

57 venues were returned by Foursquare.


In [88]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Area', 
                  'Area Latitude', 
                  'Area Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
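
The function above indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for any venue returned with an empty category list. A defensive variant of the flattening step might look like the sketch below. `extract_venue_rows` is a hypothetical helper, and the sample items are hand-written in the shape of the response shown earlier (the second venue is invented to exercise the empty case).

```python
def extract_venue_rows(area, items):
    """Flatten explore-style items into (area, name, lat, lng, category) tuples.

    Venues with an empty 'categories' list get None instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((area,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written sample; the second venue is invented and has no category.
sample_items = [
    {'venue': {'name': 'Sansotei Ramen',
               'location': {'lat': 45.41892, 'lng': -75.699328},
               'categories': [{'name': 'Noodle House'}]}},
    {'venue': {'name': 'Uncategorised Venue',
               'location': {'lat': 45.42, 'lng': -75.70},
               'categories': []}},
]
rows = extract_venue_rows('Parliament Hill', sample_items)
print(rows)
```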

In [89]:
o_venues = getNearbyVenues(names=df3['Area'],
                                   latitudes=df3['Latitude'],
                                   longitudes=df3['Longitude']
                                  )

Government of Canada  and Gatineau offices
Riverview / Hawthorne
Alta Vista
Overbrook
Vanier
Rockcliffe Park / New Edinburgh
Lower Town / Sandy Hill / University of
Parliament Hill
West Downtown area
The Glebe /  South /  East
Riverside Park / Hunt Club West / Riverside South / YOW
West
Westboro
Highland Park / Carlingwood
Britannia / Pinecrest
Queensway / Copeland / Carlington / Carleton Heights
Centre Town


In [90]:
print(o_venues.shape)
o_venues.head()

(348, 7)


Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Government of Canada and Gatineau offices,45.4207,-75.7023,Sansotei Ramen 三草亭,45.41892,-75.699328,Noodle House
1,Government of Canada and Gatineau offices,45.4207,-75.7023,Alt Hotel Ottawa,45.419973,-75.698948,Hotel
2,Government of Canada and Gatineau offices,45.4207,-75.7023,Juice Monkey,45.419537,-75.699445,Juice Bar
3,Government of Canada and Gatineau offices,45.4207,-75.7023,Queen St Fare,45.420948,-75.69937,Bar
4,Government of Canada and Gatineau offices,45.4207,-75.7023,Stroked Ego,45.419286,-75.699698,Men's Store


In [91]:
o_venues.groupby('Area').count()

Unnamed: 0_level_0,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Alta Vista,1,1,1,1,1,1
Britannia / Pinecrest,14,14,14,14,14,14
Centre Town,38,38,38,38,38,38
Government of Canada and Gatineau offices,57,57,57,57,57,57
Highland Park / Carlingwood,8,8,8,8,8,8
Lower Town / Sandy Hill / University of,16,16,16,16,16,16
Overbrook,17,17,17,17,17,17
Parliament Hill,38,38,38,38,38,38
Queensway / Copeland / Carlington / Carleton Heights,5,5,5,5,5,5
Riverside Park / Hunt Club West / Riverside South / YOW,3,3,3,3,3,3


In [92]:
print('There are {} unique categories.'.format(len(o_venues['Venue Category'].unique())))

There are 126 unique categories.


In [93]:
o_onehot = pd.get_dummies(o_venues[['Venue Category']], prefix="", prefix_sep="")


o_onehot['Area'] = o_venues['Area'] 

fixed_columns = [o_onehot.columns[-1]] + list(o_onehot.columns[:-1])
o_onehot = o_onehot[fixed_columns]

o_onehot.head()

Unnamed: 0,Area,Adult Boutique,Airport Service,American Restaurant,Art Gallery,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,...,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Government of Canada and Gatineau offices,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Government of Canada and Gatineau offices,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Government of Canada and Gatineau offices,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Government of Canada and Gatineau offices,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Government of Canada and Gatineau offices,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [94]:
o_onehot.shape

(348, 127)

In [95]:
o_grouped = o_onehot.groupby('Area').mean().reset_index()
o_grouped.head()

Unnamed: 0,Area,Adult Boutique,Airport Service,American Restaurant,Art Gallery,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,...,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Alta Vista,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Britannia / Pinecrest,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Centre Town,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.052632
3,Government of Canada and Gatineau offices,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,...,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.0,0.0
4,Highland Park / Carlingwood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
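
Taking the mean of one-hot columns per area gives the share of that area's venues falling in each category, which is why each row of the grouped table sums to 1. A toy example with invented areas `A` and `B` shows the arithmetic:

```python
import pandas as pd

# Invented venues: area A has 4 venues, area B has 2.
venues = pd.DataFrame({
    'Area': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Bank', 'Pub', 'Park', 'Bank'],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Area', venues['Area'])

# Mean of the indicators = fraction of the area's venues in each category.
freq = onehot.groupby('Area').mean()
print(freq)
```

Area `A` has two parks out of four venues, so its `Park` frequency is 0.5.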


In [96]:
o_grouped.shape

(16, 127)

In [97]:
num_top_venues = 5

for hood in o_grouped['Area']:
    print("----"+hood+"----")
    temp = o_grouped[o_grouped['Area'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alta Vista----
                 venue  freq
0           Playground   1.0
1       Adult Boutique   0.0
2  Moroccan Restaurant   0.0
3                Plaza   0.0
4          Pizza Place   0.0


----Britannia / Pinecrest----
                 venue  freq
0       Ice Cream Shop  0.21
1   Salon / Barbershop  0.07
2                 Bank  0.07
3  Fried Chicken Joint  0.07
4                 Park  0.07


----Centre Town----
         venue  freq
0  Yoga Studio  0.05
1  Coffee Shop  0.05
2   Restaurant  0.05
3          Pub  0.05
4    Bookstore  0.05


----Government of Canada  and Gatineau offices----
         venue  freq
0  Coffee Shop  0.11
1   Restaurant  0.07
2   Food Truck  0.05
3         Café  0.05
4        Hotel  0.05


----Highland Park / Carlingwood----
                  venue  freq
0                  Park  0.25
1           Sports Club  0.12
2          Skating Rink  0.12
3            Skate Park  0.12
4  Gym / Fitness Center  0.12


----Lower Town / Sandy Hill / University of----
      

In [98]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [99]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Area'] = o_grouped['Area']

for ind in np.arange(o_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(o_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alta Vista,Playground,Yoga Studio,Farmers Market,Cycle Studio,Dentist's Office,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
1,Britannia / Pinecrest,Ice Cream Shop,Health Food Store,Bank,Pet Store,Convenience Store,Park,Salon / Barbershop,Fast Food Restaurant,Pharmacy,Italian Restaurant
2,Centre Town,Yoga Studio,Bookstore,Restaurant,Coffee Shop,Pub,Beer Bar,Burger Joint,Pizza Place,Other Nightlife,Newsstand
3,Government of Canada and Gatineau offices,Coffee Shop,Restaurant,Café,Sushi Restaurant,Food Truck,Hotel,Sandwich Place,Gym,Middle Eastern Restaurant,Caribbean Restaurant
4,Highland Park / Carlingwood,Park,Skating Rink,Gym / Fitness Center,Gym,Skate Park,Sports Club,Dog Run,Diner,Dive Bar,Discount Store
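
The column labels above rely on a three-item suffix list with a fallback to "th", which is fine for ten columns but would produce labels such as "21th" if `num_top_venues` grew past 20. A small helper, hypothetical and shown only as an alternative, handles the general case:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Same column layout as above, built for the top 10 venues.
columns = ['Area'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns)
```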


Now we have all the venues for each neighborhood, along with a dataframe that contains the 10 most common venues for each neighborhood. This concludes our data gathering and wrangling phase. We are now ready to perform some analysis on this data to figure out an optimal spot for our restaurant.

## Methodology <a name="methodology"></a>

We are looking for high population density, lots of residential dwellings, and an area with few fast food venues.

In the previous step we have collected the required data, including postal codes, neighborhoods, top 10 venues in each neighborhood and population of each neighborhood.

The second step of our analysis will be plotting populations and densities and clustering our neighborhoods based on top 10 venues.

## Analysis <a name="analysis"></a>

In [100]:
# set number of clusters
kclusters = 5
o_grouped_clustering = o_grouped.drop('Area', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(o_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 1, 1, 1, 1, 1, 1, 4, 3])
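
To make the clustering step less of a black box, here is a bare-bones NumPy version of the idea behind k-means (Lloyd's algorithm): assign each row to its nearest centroid, recompute the centroids, and repeat until nothing moves. This is a teaching sketch under simplifying assumptions, not scikit-learn's implementation, which uses smarter initialisation and multiple restarts. The function name and the toy data are invented for illustration.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then update centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows of X chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster ends up empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy frequency rows: two areas dominated by one category, two by another.
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels, centroids = kmeans_lloyd(X, k=2)
print(labels)
```

The label numbers themselves are arbitrary; only which rows share a label is meaningful.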

In [101]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

o_merged = df3

# join the cluster labels and top venues onto df3, which holds latitude/longitude and population for each neighborhood
o_merged = o_merged.join(neighborhoods_venues_sorted.set_index('Area'), on='Area')
o_merged = o_merged.dropna().reset_index()
o_merged = o_merged.drop('index', axis=1)
o_merged.head(100) # check the last columns!

Unnamed: 0,Postal Code,Area,Subdivision,Latitude,Longitude,"Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016",Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,K1A,Government of Canada and Gatineau offices,Ottawa,45.4207,-75.7023,589.0,484.0,352.0,1.0,Coffee Shop,Restaurant,Café,Sushi Restaurant,Food Truck,Hotel,Sandwich Place,Gym,Middle Eastern Restaurant,Caribbean Restaurant
1,K1H,Alta Vista,Ottawa,45.3876,-75.6593,15796.0,7180.0,6804.0,0.0,Playground,Yoga Studio,Farmers Market,Cycle Studio,Dentist's Office,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
2,K1K,Overbrook,Ottawa,45.4448,-75.6431,29499.0,14212.0,13482.0,1.0,Bank,Grocery Store,Restaurant,Convenience Store,Pet Store,Dentist's Office,Coffee Shop,Discount Store,Beer Store,Fast Food Restaurant
3,K1L,Vanier,Ottawa,45.44,-75.663,17021.0,10452.0,9369.0,1.0,Poutine Place,Pharmacy,Liquor Store,Financial or Legal Service,Chinese Restaurant,Diner,Discount Store,Dumpling Restaurant,Dog Run,Dive Bar
4,K1M,Rockcliffe Park / New Edinburgh,Ottawa,45.4491,-75.6818,6695.0,3180.0,2919.0,2.0,Park,Playground,Bus Stop,Curling Ice,Cycle Studio,Dentist's Office,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
5,K1N,Lower Town / Sandy Hill / University of,Ottawa,45.4289,-75.6844,25063.0,16708.0,13262.0,1.0,Middle Eastern Restaurant,Hotel,Indie Movie Theater,Liquor Store,Brewery,Korean Restaurant,Restaurant,Record Shop,Business Service,Coffee Shop
6,K1P,Parliament Hill,Ottawa,45.4225,-75.7026,340.0,469.0,230.0,1.0,Coffee Shop,Food Truck,Restaurant,Hotel,Gym,Café,Department Store,Caribbean Restaurant,Plaza,Comfort Food Restaurant
7,K1R,West Downtown area,Ottawa,45.4123,-75.7108,18730.0,12095.0,10518.0,1.0,Vietnamese Restaurant,Chinese Restaurant,Bubble Tea Shop,Asian Restaurant,Grocery Store,Sushi Restaurant,Light Rail Station,Café,Korean Restaurant,Curling Ice
8,K1S,The Glebe / South / East,Ottawa,45.399,-75.6871,28660.0,14379.0,12757.0,1.0,Restaurant,Café,Pub,Bakery,Event Space,Sporting Goods Shop,Department Store,Pizza Place,French Restaurant,Park
9,K1V,Riverside Park / Hunt Club West / Riverside So...,Ottawa,45.3281,-75.6719,54835.0,22427.0,21048.0,3.0,Airport Service,Bus Station,Café,Yoga Studio,Dive Bar,Event Space,English Restaurant,Dumpling Restaurant,Dog Run,Discount Store


In [102]:
# create map
map_clusters = folium.Map(location, zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, population in zip(o_merged['Latitude'], o_merged['Longitude'],
                                  o_merged['Area'], o_merged['Cluster Labels'], o_merged['Population, 2016']):
    # the loop variable is named 'population' so it does not shadow the pop dataframe loaded earlier
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster) + ' Pop ' + str(population), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=population*.001,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)


       
map_clusters

In [103]:
for i in range(0,len(o_merged)):
   folium.Circle(
      # folium expects [latitude, longitude], in that order
      location=[o_merged.iloc[i]['Latitude'], o_merged.iloc[i]['Longitude']],
      popup=o_merged.iloc[i]['Area'],
      radius=o_merged.iloc[i]['Population, 2016']*10,  # folium.Circle radius is in meters
      color='crimson',
      fill=True,
      fill_color='crimson'
   ).add_to(map_clusters)
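
Unlike `folium.CircleMarker`, whose radius is in pixels, `folium.Circle` takes its radius in meters, so multiplying a population of 54,835 by 10 asks for a circle roughly 548 km across its radius. One way to keep the circles readable is to scale and clamp the radius, as in the sketch below. The function name, the scale factor, and the clamp limits are all arbitrary assumptions for illustration.

```python
def population_to_radius_m(population, metres_per_1000=50, min_radius=100, max_radius=3000):
    """Map a population count to a circle radius in metres, clamped to a fixed range.

    The scale (50 m per 1,000 residents) and the limits are arbitrary choices.
    """
    radius = population / 1000 * metres_per_1000
    return max(min_radius, min(max_radius, radius))

# Populations taken from the merged table above.
for name, population in [('Parliament Hill', 340.0), ('Riverside Park area', 54835.0)]:
    print(name, population_to_radius_m(population))
```

The result could then be passed as `radius=` when drawing each circle.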

In [104]:
o_merged.loc[o_merged['Cluster Labels'] == 0, o_merged.columns[[1] + list(range(5, o_merged.shape[1]))]]

Unnamed: 0,Area,"Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016",Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Alta Vista,15796.0,7180.0,6804.0,0.0,Playground,Yoga Studio,Farmers Market,Cycle Studio,Dentist's Office,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store


In [105]:
o_merged.loc[o_merged['Cluster Labels'] == 1, o_merged.columns[[1] + list(range(5, o_merged.shape[1]))]]

Unnamed: 0,Area,"Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016",Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Government of Canada and Gatineau offices,589.0,484.0,352.0,1.0,Coffee Shop,Restaurant,Café,Sushi Restaurant,Food Truck,Hotel,Sandwich Place,Gym,Middle Eastern Restaurant,Caribbean Restaurant
2,Overbrook,29499.0,14212.0,13482.0,1.0,Bank,Grocery Store,Restaurant,Convenience Store,Pet Store,Dentist's Office,Coffee Shop,Discount Store,Beer Store,Fast Food Restaurant
3,Vanier,17021.0,10452.0,9369.0,1.0,Poutine Place,Pharmacy,Liquor Store,Financial or Legal Service,Chinese Restaurant,Diner,Discount Store,Dumpling Restaurant,Dog Run,Dive Bar
5,Lower Town / Sandy Hill / University of,25063.0,16708.0,13262.0,1.0,Middle Eastern Restaurant,Hotel,Indie Movie Theater,Liquor Store,Brewery,Korean Restaurant,Restaurant,Record Shop,Business Service,Coffee Shop
6,Parliament Hill,340.0,469.0,230.0,1.0,Coffee Shop,Food Truck,Restaurant,Hotel,Gym,Café,Department Store,Caribbean Restaurant,Plaza,Comfort Food Restaurant
7,West Downtown area,18730.0,12095.0,10518.0,1.0,Vietnamese Restaurant,Chinese Restaurant,Bubble Tea Shop,Asian Restaurant,Grocery Store,Sushi Restaurant,Light Rail Station,Café,Korean Restaurant,Curling Ice
8,The Glebe / South / East,28660.0,14379.0,12757.0,1.0,Restaurant,Café,Pub,Bakery,Event Space,Sporting Goods Shop,Department Store,Pizza Place,French Restaurant,Park
10,West,19774.0,9988.0,9458.0,1.0,Coffee Shop,New American Restaurant,Art Gallery,Pub,Park,Record Shop,Plaza,Café,Pizza Place,French Restaurant
11,Westboro,20983.0,10909.0,10054.0,1.0,Coffee Shop,Sandwich Place,Bank,Restaurant,Middle Eastern Restaurant,Supermarket,Diner,Baseball Field,Shopping Mall,Bus Station
12,Highland Park / Carlingwood,16790.0,7433.0,7205.0,1.0,Park,Skating Rink,Gym / Fitness Center,Gym,Skate Park,Sports Club,Dog Run,Diner,Dive Bar,Discount Store


In [106]:
o_merged.loc[o_merged['Cluster Labels'] == 2, o_merged.columns[[1] + list(range(5, o_merged.shape[1]))]]

Unnamed: 0,Area,"Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016",Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Rockcliffe Park / New Edinburgh,6695.0,3180.0,2919.0,2.0,Park,Playground,Bus Stop,Curling Ice,Cycle Studio,Dentist's Office,Department Store,Dessert Shop,Dim Sum Restaurant,Diner


In [107]:
o_merged.loc[o_merged['Cluster Labels'] == 3, o_merged.columns[[1] + list(range(5, o_merged.shape[1]))]]

Unnamed: 0,Area,"Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016",Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Riverside Park / Hunt Club West / Riverside So...,54835.0,22427.0,21048.0,3.0,Airport Service,Bus Station,Café,Yoga Studio,Dive Bar,Event Space,English Restaurant,Dumpling Restaurant,Dog Run,Discount Store


In [108]:
o_merged.loc[o_merged['Cluster Labels'] == 4, o_merged.columns[[1] + list(range(5, o_merged.shape[1]))]]

Unnamed: 0,Area,"Population, 2016","Total private dwellings, 2016","Private dwellings occupied by usual residents, 2016",Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Queensway / Copeland / Carlington / Carleton H...,27941.0,12901.0,11863.0,4.0,Cosmetics Shop,Pharmacy,Park,Sandwich Place,Fast Food Restaurant,English Restaurant,Cycle Studio,Dentist's Office,Department Store,Dessert Shop


## Results and Discussion <a name="results"></a>

After clustering our neighborhoods based on their top 10 venues and using population to size the map markers, we can clearly see the clusters and the population in each area.

Our analysis shows that cluster 4 (Queensway / Copeland / Carlington / Carleton Heights) has a variety of venues with few fast food restaurants: Fast Food Restaurant appears only as its fifth most common venue category. Looking at the map, we can see the density of residential housing and the high population in this area (27,941 residents and 12,901 private dwellings in 2016), which makes it look like an ideal location for our restaurant. However, we have only looked at the available data, and it would be a good idea to visit the area in person to see whether there are other reasons for the low number of fast food restaurants there. The recommended zone is nevertheless a good starting point for an initial look at suitable buildings.
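
The comparison described above was made by reading the cluster tables and the map. As a rough illustration of how the same criteria could be checked in code, the sketch below ranks areas by fast food frequency (lowest first) and then by population (highest first). The area names and frequencies are invented, so the output says nothing about the real neighborhoods.

```python
import pandas as pd

# Toy per-area category frequencies; the numbers are illustrative only.
grouped = pd.DataFrame({
    'Area': ['Area A', 'Area B', 'Area C'],
    'Fast Food Restaurant': [0.20, 0.00, 0.05],
    'Park': [0.10, 0.50, 0.25],
})
population = pd.DataFrame({
    'Area': ['Area A', 'Area B', 'Area C'],
    'Population, 2016': [27941.0, 6695.0, 54835.0],
})

# Fewest fast food venues first; ties broken by the larger population.
ranked = (grouped[['Area', 'Fast Food Restaurant']]
          .merge(population, on='Area')
          .sort_values(['Fast Food Restaurant', 'Population, 2016'],
                       ascending=[True, False])
          .reset_index(drop=True))
print(ranked)
```

A ranking like this is only a starting point, since it weighs the two criteria in a fixed order rather than trading them off.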

## Conclusion <a name="conclusion"></a>

We were looking for areas in Ottawa with a high population density, a high number of residential dwellings, and a low number of fast food restaurants. We identified the Queensway / Copeland / Carlington / Carleton Heights area as the optimal place to open a Greek fast food restaurant.