<a href="https://colab.research.google.com/github/freezingMonkeys/Coursera_Capstone/blob/main/Coursera_Capstone_Final_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Coursera Capstone Project: Battle of the Neighbourhoods</h1>

<h3>BY: Dakota Chang </h3>



---



<h1>1. Introduction </h1>
Discussion of the business problem and the audience who would be interested in this project.


**Description of the Problem and Background**

*Scenario/Target Audience:*

My friend will be moving to Vancouver soon, but she is not quite familiar with the city. She is a lover of food and would like to learn about the neighbourhoods and distrubution of international foods within them. Since Vancouver has a diverse population and is a popular choice for a lot of immigrants, this project will be helpful to a lot of other people who also love trying different foods and are moving to Vancouver.

*Business Problem:*

The challenge is to find locations with a similar profile of international cuisine in close proximity to it and label it so it can be easily comprehended.



---



<h1>2. Data</h1>
Description of the data and its sources that will be used to solve the problem



**Description of the Data**

*Sources:*


*   [Postal Codes and Long Lat of Locations in BC](https://www.geonames.org/postal-codes/CA/BC/british-columbia.html): This website will be scraped and engineered into a dataframe of places in Vancouver.
*   [Foursquare](https://http://foursquare.com/): Foursqaure will be scraped for the nearby venues of each location.



*Steps of Data Cleaning and Engineering:*


1.   Scrape data from geonames.org in order to create a dataframe with postal codes, longitude and latitude of locations in BC province, Canada
2.   Engineer the data such that the dataframe would consist of the right postal codes, longitudes and latitudes.
3.   Match up the longitudes and latitudes of the collected data and use the Foursquare API to find the nearest venues of all locations
4.   Remove unrelated venues from the dataframe (i.e. keep only restaurants/international cuisines)
5.   Apply one-hot encoding to the dataframe so it can be easily processed



---



<h2><u><b>PART 1:</b></u></h2>

Step 1: Grabbing the data

In [1]:
# importing needed libraries
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup as soup

In [2]:
# parsing url content
url = 'https://www.geonames.org/postal-codes/CA/BC/british-columbia.html'
xml_page = requests.get(url).content
soup_page = soup(xml_page, 'xml')
table = soup_page.find('table', {'class': 'restable'})

Step 2: Engineering the Data into its corresponding postal codes and longtitudes+latitudes

In [3]:
# grabbing longtitudes and latitudes
lng_lats = table.findAll('a')
lng_lats[0].text

'49.323/-122.863'

In [4]:
# grabbing postal codes, town names, and admins
table_content = table.findAll('tr')
table_content = str(table_content)
table_content = table_content.split('<td>')

In [5]:
# grabbing postal codes
towns = []
postal_codes = []
for a in table_content:
  if (';<a ' not in a) and ('<small' not in a) and ('odd' not in a) and ('th' not in a) and (not a == '<tr><td/>') and (not a == 'Canada</td>') and ('British Columbia' not in a):
    if len(a)==8:
      postal_codes.append(a)
    else:
      towns.append(a)
  


In [6]:
# creating the dataframe from collected data
dictionary = {
    'lng_lats' : lng_lats[:192],
    'Postal Code': postal_codes[:192]
}
df = pd.DataFrame(dictionary)
df

Unnamed: 0,lng_lats,Postal Code
0,[[49.323/-122.863]],V3H</td>
1,[[49.221/-122.69]],V3Y</td>
2,[[49.026/-122.806]],V4B</td>
3,[[49.481/-119.586]],V2A</td>
4,[[49.866/-119.739]],V4T</td>
...,...,...
187,[[49.12/-123.117]],V7A</td>
188,[[48.783/-123.703]],V9L</td>
189,[[49.264/-122.937]],V5A</td>
190,[[49.277/-122.976]],V5B</td>


In [7]:
# cleaning the dataset
df['latitude'] = df['lng_lats'].apply(lambda x:float(str(x.text).split('/')[0]))
df['longitude'] = df['lng_lats'].apply(lambda x:float(str(x.text).split('/')[1]))

In [8]:
df.drop('lng_lats', axis=1, inplace=True)

In [9]:
df

Unnamed: 0,Postal Code,latitude,longitude
0,V3H</td>,49.323,-122.863
1,V3Y</td>,49.221,-122.690
2,V4B</td>,49.026,-122.806
3,V2A</td>,49.481,-119.586
4,V4T</td>,49.866,-119.739
...,...,...,...
187,V7A</td>,49.120,-123.117
188,V9L</td>,48.783,-123.703
189,V5A</td>,49.264,-122.937
190,V5B</td>,49.277,-122.976


In [10]:
# cleaning postal codes
df['Postal Code'] = df['Postal Code'].apply(lambda x:str(x)[:3])

In [11]:
df

Unnamed: 0,Postal Code,latitude,longitude
0,V3H,49.323,-122.863
1,V3Y,49.221,-122.690
2,V4B,49.026,-122.806
3,V2A,49.481,-119.586
4,V4T,49.866,-119.739
...,...,...,...
187,V7A,49.120,-123.117
188,V9L,48.783,-123.703
189,V5A,49.264,-122.937
190,V5B,49.277,-122.976


<h2><u><b>PART 2:</b></u></h2>

Step 1: Grab long+lats of BC from geocoder

In [12]:
from geopy.geocoders import Nominatim 
GeoLocator = Nominatim(user_agent='My-IBMNotebook')

address = 'BC, Canada'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of BC, Canada are {}, {}.'.format(latitude, longitude))



The geograpical coordinate of BC, Canada are 55.001251, -125.002441.


Step 2: Draw map using the long+lats

In [13]:
import folium
map_BC = folium.Map(location=[latitude, longitude], zoom_start=5)

# add markers to map
for lat, lng, postal in zip(df['latitude'], df['longitude'], df['Postal Code']):
    label = '{}'.format(postal)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_BC)  
    
map_BC

Step 3: Grab information of nearby venues in BC from Foursquare

In [14]:
#@title code contains private information
CLIENT_ID = 'YOUR_ID'
CLIENT_SECRET = 'YOUR_SECRET'
VERSION = '20180605'
LIMIT = 200
radius = 5000 # define radius
latitude = 49.278 # location redefined due to lack of information from Foursquare
logitude = -123.091 # location redefined due to lack of information from Foursquare
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)

In [15]:
results = requests.get(url).json()

In [16]:
import json

def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

Unnamed: 0,name,categories,lat,lng
0,Drinkwaters Social House,Restaurant,49.28543,-124.976655
1,Paulette's Cleaning Service Ltd,Home Service,49.276154,-124.98096
2,Sproat Lake Landing,Harbor / Marina,49.285698,-124.976043
3,Van Isle Construction Ltd,Business Service,49.262651,-124.949073


In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postal Code', 
                  'Postal Area Latitude', 
                  'Postal Area Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

bc_venues = getNearbyVenues(names=df['Postal Code'],
                                   latitudes=df['latitude'],
                                   longitudes=df['longitude']
                                  )

V3H
V3Y
V4B
V2A
V4T
V4V
V1A
V8K
V2R
V2S
V2T
V2V
V2W
V2X
V2Y
V2Z
V3C
V2C
V2E
V2H
V2J
V2K
V2L
V2M
V2N
V2P
V0C
V0E
V0G
V0H
V0J
V0K
V0L
V0M
V0N
V0P
V0R
V0S
V0T
V0V
V0W
V0X
V1H
V1J
V3E
V3G
V3J
V3K
V3L
V3M
V3N
V3R
V3S
V3T
V1L
V1M
V1P
V1S
V1T
V1V
V1W
V1X
V1Y
V1Z
V5P
V5R
V5S
V5T
V5V
V5W
V5X
V5Y
V5Z
V6C
V6E
V6G
V6H
V6J
V6K
V6L
V6M
V6N
V6P
V6R
V6S
V6V
V6W
V6X
V6Y
V6Z
V7B
V7C
V7E
V7G
V7H
V7J
V7K
V7L
V7M
V8S
V8T
V8V
V8W
V8X
V8Y
V8Z
V9B
V9C
V9E
V9J
V9K
V7N
V7P
V7R
V7S
V7T
V7V
V7W
V7X
V7Y
V8B
V8C
V8E
V8J
V8M
V8N
V8P
V8R
V4P
V4R
V4S
V4W
V4X
V4Z
V5C
V5E
V5G
V5H
V5J
V5K
V5L
V5M
V5N
V9N
V9R
V9S
V9T
V9V
V9W
V9X
V9Y
V9Z
V3V
V3W
V3X
V3Z
V4C
V4E
V4G
V4K
V4L
V4M
V4N
V3A
V2G
V0A
V1E
V1N
V1R
V6T
V9A
V9G
V9H
V8A
V8G
V8L
V9M
V4A
V3B
V2B
V0B
V1B
V1C
V1G
V1K
V6A
V6B
V7A
V9L
V5A
V5B
V9P


In [18]:
bc_venues

Unnamed: 0,Postal Code,Postal Area Latitude,Postal Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,V3H,49.323,-122.863,Buntzen Lake,49.324467,-122.856549,Lake
1,V3Y,49.221,-122.690,Jolly Coachman,49.222584,-122.690512,Pub
2,V3Y,49.221,-122.690,Foamers' Folly Brewing Co.,49.224774,-122.689383,Brewery
3,V3Y,49.221,-122.690,Waves Coffee House,49.222068,-122.689869,Coffee Shop
4,V3Y,49.221,-122.690,GoodLife Fitness,49.220698,-122.688491,Gym / Fitness Center
...,...,...,...,...,...,...,...
1790,V5B,49.277,-122.976,Kensington Pitch & Putt,49.276449,-122.977739,Golf Course
1791,V5B,49.277,-122.976,Zeta Café,49.280516,-122.977911,Tea Room
1792,V5B,49.277,-122.976,Kensington Park,49.277777,-122.982636,Park
1793,V5B,49.277,-122.976,Mac's Convenience Stores,49.277797,-122.982638,Convenience Store


Step 4: Engineer dataframe such that only food related venues are left

In [19]:
bc_venues['Venue Category'].unique()

array(['Lake', 'Pub', 'Brewery', 'Coffee Shop', 'Gym / Fitness Center',
       'Convenience Store', 'Vietnamese Restaurant', 'Sandwich Place',
       'Gym', 'Grocery Store', 'Japanese Restaurant', 'Bistro',
       'Elementary School', 'Waterfront', 'Café', 'Seafood Restaurant',
       'Greek Restaurant', 'Gastropub', 'Health Food Store',
       'Gas Station', 'Steakhouse', 'American Restaurant',
       'Thai Restaurant', 'Mobility Store', 'Museum', 'Ice Cream Shop',
       'Dessert Shop', 'Breakfast Spot', 'Restaurant', 'Pizza Place',
       'Fast Food Restaurant', 'Pharmacy', 'Supermarket', 'Burger Joint',
       'Factory', 'Liquor Store', 'Diner', 'Farm', 'German Restaurant',
       'Plaza', 'Bus Line', 'Business Service',
       'Construction & Landscaping', 'Campground', 'Sushi Restaurant',
       'Inn', 'Thrift / Vintage Store', 'Baseball Field', 'Hotel',
       'Motel', 'Moving Target', 'Shop & Service', 'Playground',
       'Smoothie Shop', 'Food Truck', 'Bakery', 'Park', 'Hocke

In [20]:
food_venues = ['Pub', 'Brewery', 'Coffee Shop', 'Vietnamese Restaurant', 'Sandwich Place', 'Japanese Restaurant', 'Bistro', 'Seafood Restaurant', 'Greek Restaurant', 'Gastropub', 'Steakhouse', 'American Restaurant', 'Thai Restaurant', 'Ice Cream Shop',
'Dessert Shop', 'Breakfast Spot', 'Pizza Place',
'Fast Food Restaurant', 'Burger Joint', 'Liquor Store', 'Diner', 'German Restaurant',
'Sushi Restaurant', 'Smoothie Shop', 'Food Truck', 'Bakery', 
'Food & Drink Shop','Comfort Food Restaurant',
'Tea Room', 'Deli / Bodega', 'Bar', 'Sports Bar',
'Indian Restaurant', 'Falafel Restaurant', 'Chinese Restaurant', 'Malay Restaurant', 
'Donut Shop', 'Filipino Restaurant', 'Bubble Tea Shop',
'Asian Restaurant', 'Dim Sum Restaurant', 'Korean Restaurant', 'Burrito Place', 'Candy Store', 
'Noodle House', 'Middle Eastern Restaurant', 'Fish Market', 'Fried Chicken Joint',
'Fish & Chips Shop', 'Farmers Market', 'Juice Bar', 'Ethiopian Restaurant',
'Cocktail Bar', 'Cantonese Restaurant',
'Taco Place', 'Australian Restaurant', 'Mediterranean Restaurant', 'New American Restaurant',
'Chocolate Shop', 'Irish Pub', 'Poke Place', 'Hot Dog Joint',
'Vegetarian / Vegan Restaurant', 'French Restaurant', 'Gay Bar',
'Italian Restaurant', 'South Indian Restaurant',
'Ramen Restaurant', 'Wine Shop',
'Mexican Restaurant', 'Hawaiian Restaurant',
'Himalayan Restaurant', 'Hotpot Restaurant', 'Frozen Yogurt Shop', 'Bagel Shop',
'Southern / Soul Food Restaurant', 'Caribbean Restaurant', 'Dumpling Restaurant',
'Taiwanese Restaurant', 'Portuguese Restaurant', 'Lebanese Restaurant',
'Wine Bar', 'Gaming Cafe', 'Tapas Restaurant', 
'Turkish Restaurant', 'BBQ Joint', 'Modern European Restaurant',
'Indonesian Restaurant', 'Cajun / Creole Restaurant', 
'Japanese Curry Restaurant',  'Moroccan Restaurant',
'Apres Ski Bar', 'Mongolian Restaurant', 'North Indian Restaurant',
'Salad Place', 'Beer Garden', 'Food Service','Pie Shop', 'Cheese Shop', 'Belgian Restaurant','Hookah Bar']

def food_venue(x):
  if x in food_venues:
    return 1
  return

bc_venues['food_venue'] = bc_venues['Venue Category'].apply(food_venue)

In [21]:
bc_food_venues = bc_venues.dropna()

In [22]:
bc_food_venues = bc_food_venues.drop('food_venue', axis=1)

In [23]:
bc_food_venues.groupby('Postal Code').count()

Unnamed: 0_level_0,Postal Area Latitude,Postal Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
V0A,1,1,1,1,1,1
V1A,3,3,3,3,3,3
V1E,1,1,1,1,1,1
V1J,4,4,4,4,4,4
V1K,1,1,1,1,1,1
...,...,...,...,...,...,...
V9K,2,2,2,2,2,2
V9L,10,10,10,10,10,10
V9S,4,4,4,4,4,4
V9Y,3,3,3,3,3,3


Step 5: Apply one-hot encoding to the nearby venues for KNN clustering

In [24]:
# one hot encoding
bc_onehot = pd.get_dummies(bc_food_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bc_onehot['Postal Code'] = bc_food_venues['Postal Code'] 

# move neighborhood column to the first column
fixed_columns = [bc_onehot.columns[-1]] + list(bc_onehot.columns[:-1])
bc_onehot = bc_onehot[fixed_columns]

bc_onehot.head()

Unnamed: 0,Postal Code,American Restaurant,Apres Ski Bar,Asian Restaurant,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Beer Garden,Belgian Restaurant,Bistro,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,...,Japanese Restaurant,Juice Bar,Korean Restaurant,Lebanese Restaurant,Liquor Store,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Mongolian Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Pie Shop,Pizza Place,Poke Place,Portuguese Restaurant,Pub,Ramen Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Smoothie Shop,South Indian Restaurant,Southern / Soul Food Restaurant,Sports Bar,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop
1,V3Y,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,V3Y,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,V3Y,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,V3Y,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
7,V3Y,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [25]:
bc_grouped = bc_onehot.groupby('Postal Code').mean().reset_index()
bc_grouped.head()

Unnamed: 0,Postal Code,American Restaurant,Apres Ski Bar,Asian Restaurant,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Beer Garden,Belgian Restaurant,Bistro,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,...,Japanese Restaurant,Juice Bar,Korean Restaurant,Lebanese Restaurant,Liquor Store,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Mongolian Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Pie Shop,Pizza Place,Poke Place,Portuguese Restaurant,Pub,Ramen Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Smoothie Shop,South Indian Restaurant,Southern / Soul Food Restaurant,Sports Bar,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop
0,V0A,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,V1A,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,V1E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,V1J,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,V1K,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 5

indicators = ['V']

# create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
postal_venues_sorted = pd.DataFrame(columns=columns)
postal_venues_sorted['Postal Code'] = bc_grouped['Postal Code']

for ind in np.arange(bc_grouped.shape[0]):
    postal_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bc_grouped.iloc[ind, :], num_top_venues)

postal_venues_sorted.head()

Unnamed: 0,Postal Code,1V Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,V0A,Bakery,Wine Shop,Fish & Chips Shop,Dessert Shop,Dim Sum Restaurant
1,V1A,American Restaurant,Fast Food Restaurant,German Restaurant,Asian Restaurant,Fish & Chips Shop
2,V1E,Modern European Restaurant,Wine Shop,Filipino Restaurant,Dessert Shop,Dim Sum Restaurant
3,V1J,Coffee Shop,Ice Cream Shop,Liquor Store,Filipino Restaurant,Dessert Shop
4,V1K,Sandwich Place,Wine Shop,Coffee Shop,Deli / Bodega,Dessert Shop


Step 5: Use KNN to cluster similar locations

In [27]:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 10
bc_grouped_clustering = bc_grouped.drop('Postal Code', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

# add clustering labels
postal_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
bc_merged = df

bc_merged = bc_merged.join(postal_venues_sorted.set_index('Postal Code'), on='Postal Code')

bc_merged # check the last columns!

Unnamed: 0,Postal Code,latitude,longitude,Cluster Labels,1V Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,V3H,49.323,-122.863,,,,,,
1,V3Y,49.221,-122.690,5.0,Brewery,Bistro,Pub,Sandwich Place,Japanese Restaurant
2,V4B,49.026,-122.806,5.0,Japanese Restaurant,Seafood Restaurant,American Restaurant,Ice Cream Shop,Coffee Shop
3,V2A,49.481,-119.586,5.0,Fast Food Restaurant,Coffee Shop,Pizza Place,Liquor Store,Diner
4,V4T,49.866,-119.739,,,,,,
...,...,...,...,...,...,...,...,...,...
187,V7A,49.120,-123.117,,,,,,
188,V9L,48.783,-123.703,5.0,Coffee Shop,Fast Food Restaurant,Asian Restaurant,Hookah Bar,Pizza Place
189,V5A,49.264,-122.937,5.0,Burger Joint,Wine Shop,Filipino Restaurant,Dessert Shop,Dim Sum Restaurant
190,V5B,49.277,-122.976,5.0,Vietnamese Restaurant,Tea Room,Sushi Restaurant,Coffee Shop,Italian Restaurant


In [28]:
bc_merged = bc_merged.dropna()
bc_merged['Cluster Labels'] = bc_merged['Cluster Labels'].apply(int)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [29]:
import matplotlib as plt
from matplotlib import cm
from matplotlib import colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=5)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bc_merged['latitude'], bc_merged['longitude'], bc_merged['Postal Code'], bc_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [33]:
bc_merged[bc_merged['Cluster Labels'] == 0]

Unnamed: 0,Postal Code,latitude,longitude,Cluster Labels,1V Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
101,V8V,48.417,-123.365,0,Italian Restaurant,Wine Shop,Fish & Chips Shop,Dessert Shop,Dim Sum Restaurant
177,V4A,49.037,-122.83,0,Italian Restaurant,Wine Shop,Fish & Chips Shop,Dessert Shop,Dim Sum Restaurant


In [34]:
bc_merged[bc_merged['Cluster Labels'] == 1]

Unnamed: 0,Postal Code,latitude,longitude,Cluster Labels,1V Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
66,V5S,49.218,-123.038,1,Chinese Restaurant,Farmers Market,Deli / Bodega,Pizza Place,Sushi Restaurant
69,V5W,49.233,-123.092,1,Chinese Restaurant,Pizza Place,Bubble Tea Shop,Bakery,Dessert Shop
71,V5Y,49.249,-123.11,1,Chinese Restaurant,Coffee Shop,Dessert Shop,Filipino Restaurant,Dim Sum Restaurant
72,V5Z,49.248,-123.121,1,Coffee Shop,Chinese Restaurant,Vietnamese Restaurant,Dessert Shop,Cantonese Restaurant
79,V6L,49.25,-123.166,1,Caribbean Restaurant,Bakery,Italian Restaurant,Wine Shop,Fish & Chips Shop
80,V6M,49.234,-123.145,1,Chinese Restaurant,Asian Restaurant,Sushi Restaurant,Wine Shop,Filipino Restaurant
82,V6P,49.215,-123.14,1,Chinese Restaurant,Sushi Restaurant,Bubble Tea Shop,Dessert Shop,Dim Sum Restaurant
88,V6Y,49.17,-123.137,1,Chinese Restaurant,Coffee Shop,Japanese Restaurant,Bubble Tea Shop,Bakery
110,V9K,49.347,-124.436,1,American Restaurant,Cajun / Creole Restaurant,Filipino Restaurant,Dessert Shop,Dim Sum Restaurant


In [36]:
bc_merged[bc_merged['Cluster Labels'] == 2]

Unnamed: 0,Postal Code,latitude,longitude,Cluster Labels,1V Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
50,V3N,49.227,-122.93,2,Indian Restaurant,Wine Shop,Fish & Chips Shop,Dessert Shop,Dim Sum Restaurant
70,V5X,49.216,-123.098,2,Indian Restaurant,Wine Shop,Fish & Chips Shop,Dessert Shop,Dim Sum Restaurant


In [37]:
bc_merged[bc_merged['Cluster Labels'] == 3]

Unnamed: 0,Postal Code,latitude,longitude,Cluster Labels,1V Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
43,V1J,56.25,-120.853,3,Coffee Shop,Ice Cream Shop,Liquor Store,Filipino Restaurant,Dessert Shop
54,V1L,49.5,-117.286,3,Coffee Shop,Liquor Store,Fast Food Restaurant,Filipino Restaurant,Dessert Shop
63,V1Z,49.862,-119.583,3,Japanese Restaurant,Liquor Store,Filipino Restaurant,Dessert Shop,Dim Sum Restaurant
92,V7E,49.132,-123.17,3,Pub,Liquor Store,Wine Shop,Coffee Shop,Deli / Bodega
115,V7T,49.332,-123.142,3,Mexican Restaurant,Liquor Store,Wine Shop,Fast Food Restaurant,Dessert Shop
150,V9Y,49.241,-124.803,3,Diner,Ice Cream Shop,Liquor Store,Wine Shop,Filipino Restaurant
154,V3X,49.107,-122.858,3,Liquor Store,Wine Shop,Filipino Restaurant,Dessert Shop,Dim Sum Restaurant


In [38]:
bc_merged[bc_merged['Cluster Labels'] == 4]

Unnamed: 0,Postal Code,latitude,longitude,Cluster Labels,1V Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
142,V5N,49.255,-123.067,4,Bakery,Wine Shop,Fish & Chips Shop,Dessert Shop,Dim Sum Restaurant
151,V9Z,48.375,-123.728,4,Diner,Wine Shop,Filipino Restaurant,Deli / Bodega,Dessert Shop
165,V0A,51.299,-116.939,4,Bakery,Wine Shop,Fish & Chips Shop,Dessert Shop,Dim Sum Restaurant
170,V9A,48.45,-123.419,4,Diner,Bakery,Wine Shop,Fish & Chips Shop,Dessert Shop


In [39]:
bc_merged[bc_merged['Cluster Labels'] == 5]

Unnamed: 0,Postal Code,latitude,longitude,Cluster Labels,1V Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,V3Y,49.221,-122.690,5,Brewery,Bistro,Pub,Sandwich Place,Japanese Restaurant
2,V4B,49.026,-122.806,5,Japanese Restaurant,Seafood Restaurant,American Restaurant,Ice Cream Shop,Coffee Shop
3,V2A,49.481,-119.586,5,Fast Food Restaurant,Coffee Shop,Pizza Place,Liquor Store,Diner
5,V4V,50.022,-119.405,5,Japanese Restaurant,Fast Food Restaurant,Pub,Sandwich Place,Burger Joint
6,V1A,49.683,-115.986,5,American Restaurant,Fast Food Restaurant,German Restaurant,Asian Restaurant,Fish & Chips Shop
...,...,...,...,...,...,...,...,...,...
185,V6A,49.278,-123.091,5,Cheese Shop,Pie Shop,Coffee Shop,Pub,Sandwich Place
186,V6B,49.279,-123.114,5,Coffee Shop,Taco Place,Breakfast Spot,Belgian Restaurant,Chinese Restaurant
188,V9L,48.783,-123.703,5,Coffee Shop,Fast Food Restaurant,Asian Restaurant,Hookah Bar,Pizza Place
189,V5A,49.264,-122.937,5,Burger Joint,Wine Shop,Filipino Restaurant,Dessert Shop,Dim Sum Restaurant


In [40]:
bc_merged[bc_merged['Cluster Labels'] == 6]

Unnamed: 0,Postal Code,latitude,longitude,Cluster Labels,1V Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
48,V3L,49.22,-122.9,6,Bar,Wine Shop,Fish & Chips Shop,Dessert Shop,Dim Sum Restaurant


In [41]:
bc_merged[bc_merged['Cluster Labels'] == 7]

Unnamed: 0,Postal Code,latitude,longitude,Cluster Labels,1V Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
14,V2Y,49.128,-122.624,7,Sandwich Place,Wine Shop,Coffee Shop,Deli / Bodega,Dessert Shop
184,V1K,50.112,-120.794,7,Sandwich Place,Wine Shop,Coffee Shop,Deli / Bodega,Dessert Shop


In [42]:
bc_merged[bc_merged['Cluster Labels'] == 8]

Unnamed: 0,Postal Code,latitude,longitude,Cluster Labels,1V Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
124,V8M,48.566,-123.419,8,Coffee Shop,Filipino Restaurant,Deli / Bodega,Dessert Shop,Dim Sum Restaurant
153,V3W,49.141,-122.857,8,Coffee Shop,Filipino Restaurant,Deli / Bodega,Dessert Shop,Dim Sum Restaurant
171,V9G,48.983,-123.819,8,Coffee Shop,Filipino Restaurant,Deli / Bodega,Dessert Shop,Dim Sum Restaurant


In [43]:
bc_merged[bc_merged['Cluster Labels'] == 9]

Unnamed: 0,Postal Code,latitude,longitude,Cluster Labels,1V Most Common Venue,2th Most Common Venue,3th Most Common Venue,4th Most Common Venue,5th Most Common Venue
168,V1R,49.1,-117.702,9,Food Service,Wine Shop,Filipino Restaurant,Deli / Bodega,Dessert Shop




---

<h1>Results: </h1>

<p>By simply checking the map, one can identify the similar locations and resteraunt profiles. You can check each cluster label (as printed above) for reference of what type of profile each location has based on its color. Thanks!</p>

---