# __Applied Data Science Capstone__

## Introduction  
___
#### Problem Description  

This notebook will explore the problem of a muscian or music lover moving to the Toronto area who is attempting to determine which area (borough) would best give them access to music venues.  We will attempt to take into account not only venues that exist in the borough but also their average rating to weight the score.  

Given this information we will plot the boroughs with a color scale to show which areas may be more appealing to move to.  

#### Data Description  

For this problem exploration we will use a mix of data from the following sources  
1. A list of postal codes and associated boroughs and neighborhoods scraped from the following [wiki page](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M), omitting any values that do not have any assigned borough
1. The above info will be merged with a list of locations (longitude and latitude) provided by the __geocoder__ library
1. The Foursquare API will be leveraged to pull a list of music venues in each of these locations along with their average rating so that they can be weighted

## Methodology and Analysis  
___

In [7]:
# Import required libraries
from bs4 import BeautifulSoup
import json
import numpy as np
import pandas as pd
import requests

In [63]:
# Define empty Dataframe for data
column_names = ['PostalCode', 'Borough', 'Neighborhood'] 
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,PostalCode,Borough,Neighborhood


In [64]:
# Retrieve data from wiki page
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
neighborhoods_rows = soup.select("table.wikitable")[0].select("tr")
neighborhoods_rows[:3]

[<tr>
 <th>Postal Code
 </th>
 <th>Borough
 </th>
 <th>Neighbourhood
 </th></tr>,
 <tr>
 <td>M1A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>,
 <tr>
 <td>M2A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>]

In [65]:
# Parse data from wiki page
# Skip the first row of headers
for row in neighborhoods_rows[1:]:
    postal_code, borough, neighborhood = list(map((lambda x: x.text.strip()), row.select("td")))
    if borough == "Not assigned":
        continue
    if neighborhood == "Not assigned":
        neighborhood = borough 
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood,
                                          'PostalCode': postal_code}, ignore_index=True)

In [66]:
neighborhoods.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [67]:
neighborhoods.shape

(103, 3)

___  

Use CSV file to associate location with postal codes

In [68]:
!wget -q -O "toronto_locations.csv" https://cocl.us/Geospatial_data
print("Data downloaded")

Data downloaded


In [69]:
# Import location data into df
locations_df = pd.read_csv("toronto_locations.csv")
locations_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [70]:
# Merge neighborhoods and location data into one dataframe
neighborhoods = neighborhoods.merge(locations_df, left_on="PostalCode", right_on="Postal Code")

In [71]:
# Drop extra postal code column
neighborhoods.drop(["Postal Code"], axis=1, inplace=True)

In [72]:
neighborhoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


___  

Get music venue within 1000 meters and score them based on the following criteria  
- 1 points: venue has an average Foursquare rating less than 4
- 2 points: venue has an average Foursquare rating between 4 and 6.5 or has no rating
- 3 points: venue has an average Foursquare rating between 6.5 and 8.5
- 4 points: venue has an average Foursquare rating greater than 8.5

In [73]:
# The code was removed by Watson Studio for sharing.

In [74]:
column_names = ['PostalCode', 'Score'] 
neighborhood_music_score = pd.DataFrame(columns=column_names)
neighborhood_music_score

Unnamed: 0,PostalCode,Score


In [76]:
# Get and parse all "live music" venues within 1000 meters from the location of the borough
# Establish search qualifiers
search_query = "live%20music"
radius = 1000
venue_scores = {}
print("Finished with: ", end=" ")

# Loop through each borough getting all values returned from query
for index, neigh in neighborhoods.iterrows():
    neigh_score = 0
    neigh_lat = neigh["Latitude"]
    neigh_long = neigh["Longitude"]
    url = f"https://api.foursquare.com/v2/venues/search?client_id={foursquare_id}&client_secret={foursquare_secret}&ll={neigh_lat},{neigh_long}&v={version}&query={search_query}&radius={radius}"
    results = requests.get(url).json()
    
    # Loop through each venue returned above and query for their rating
    # Ratings will be saved to venue_scores dict to avoid querying twice
    for venue in results["response"]["venues"]:
        venue_id = venue["id"]
        try:
            venue_rating = venue_scores[venue_id]
        except KeyError:
            venue_url = f"https://api.foursquare.com/v2/venues/{venue_id}?client_id={foursquare_id}&client_secret={foursquare_secret}&v=20201027"
            venue_result = requests.get(url).json()
            try:
                venue_rating = venue_result["response"]["venue"]["rating"]
            except:
                venue_rating = 0
            venue_scores[venue_id] = venue_rating
        
        # Score venues and add to borough score
        if venue_rating == 0: 
            neigh_score += 2
        elif venue_rating < 4: 
            neigh_score += 1
        elif venue_rating < 6.5:
            neigh_score += 2
        elif venue_rating < 8.5:
            neigh_score += 3
        else:
            neigh_score += 4
    print(index, end=", ")
    
    # Add scores to dataframe
    neighborhood_music_score = neighborhood_music_score.append({"PostalCode": neigh["PostalCode"],
                                                                "Score": neigh_score}, ignore_index=True)

Finished with:  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 

In [78]:
# Merge neighborhoods and scores data into one dataframe
neighborhoods = neighborhoods.merge(neighborhood_music_score, left_on="PostalCode", right_on="PostalCode")

In [83]:
neighborhoods.sort_values(by=['Score'], ascending=False).head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Score
170,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.648429,-79.38228,60
61,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,60
60,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,60
49,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,60
165,M5W,Downtown Toronto,Stn A PO Boxes,43.646435,-79.374846,60
97,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817,60
19,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,60
157,M5T,Downtown Toronto,"Kensington Market, Chinatown, Grange Park",43.653206,-79.400049,60
9,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,60
85,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,60


In [48]:
# Install and import folium for map creation
!pip install folium==0.5.0
import folium # map rendering library



Collecting folium==0.5.0
  Downloading folium-0.5.0.tar.gz (79 kB)
[K     |████████████████████████████████| 79 kB 9.1 MB/s  eta 0:00:01
[?25hCollecting branca
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Building wheels for collected packages: folium
  Building wheel for folium (setup.py) ... [?25ldone
[?25h  Created wheel for folium: filename=folium-0.5.0-py3-none-any.whl size=76240 sha256=620562184bd2efe03d0d77a66d987d2f8378dd7968b38a179bb3defd1e9f586d
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/b2/2f/2c/109e446b990d663ea5ce9b078b5e7c1a9c45cca91f377080f8
Successfully built folium
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.5.0


In [51]:
# Install colour library for coloring folium map points
!pip install colour
from colour import Color

Collecting colour
  Downloading colour-0.1.5-py2.py3-none-any.whl (23 kB)
Installing collected packages: colour
Successfully installed colour-0.1.5


In [87]:
# Create a list of colors to be used on the map to indicate which boroughs have high scores
lightskyblue = Color("lightskyblue")
colors = list(lightskyblue.range_to(Color("darkred"),61))
converted_colors = list(map((lambda x: x.hex), colors))

In [84]:
import branca

In [92]:
# Create map on toronto neighborhoods with relation to music score

toronto_latitude = 43.6529
toronto_longitude = -79.3849

map_toronto = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=10)

for lat, lng, borough, neighborhood, score in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood'], neighborhoods['Score']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=colors[score].hex,
        fill=True,
        fill_color=colors[score].hex,
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
    
colormap = branca.colormap.LinearColormap(converted_colors, vmin=0, vmax=61)
colormap.caption = "Borough music score"
colormap.add_to(map_toronto)

map_toronto