# __Applied Data Science Capstone__

## Introduction  
___
#### Problem Description  

This notebook explores the problem of a musician or music lover moving to the Toronto area who wants to determine which area (borough) would give them the best access to music venues. We will take into account not only the number of venues in each borough but also their average rating, which is used to weight the score.  
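
One possible way to make the rating-weighted score concrete is sketched below. It assumes each venue record carries a borough name and an average rating. The `venues` sample data and the `borough_scores` helper are hypothetical illustrations, not Foursquare output, and the scoring rule (sum of ratings per borough) is only one of several reasonable choices.

```python
from collections import defaultdict


def borough_scores(venues):
    """Sum venue ratings per borough (hypothetical scoring rule).

    Summing ratings means both the number of venues and their
    quality raise a borough's score.
    """
    scores = defaultdict(float)
    for borough, rating in venues:
        scores[borough] += rating
    return dict(scores)


# Made-up sample data for illustration only: (borough, average rating)
venues = [
    ("Downtown Toronto", 8.5),
    ("Downtown Toronto", 7.0),
    ("North York", 9.0),
]

scores = borough_scores(venues)
print(scores)
```

A mean-based rule would instead favor boroughs with a few excellent venues over boroughs with many average ones, so the choice depends on what "access" should mean.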

Given this information we will plot the boroughs with a color scale to show which areas may be more appealing to move to.  
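
As a rough sketch of the color scale, assuming each borough already has a numeric score, a linear interpolation between two colors is enough. The helper name `score_to_hex` and the yellow-to-red palette are assumptions made for illustration.

```python
def score_to_hex(score, min_score, max_score):
    """Linearly map a score to a hex color from light yellow to dark red."""
    if max_score == min_score:
        t = 0.0  # avoid division by zero when all scores are equal
    else:
        t = (score - min_score) / (max_score - min_score)
    t = max(0.0, min(1.0, t))  # clamp scores outside the range

    low = (255, 255, 178)  # light yellow for the lowest score
    high = (189, 0, 38)    # dark red for the highest score
    r, g, b = (round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


print(score_to_hex(0, 0, 10))
print(score_to_hex(10, 0, 10))
```

The resulting hex string can be used wherever a map marker expects a color value, such as a `fill_color` argument.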

#### Data Description  

For this exploration we will use a mix of data from the following sources:  
1. A list of postal codes and associated boroughs and neighborhoods scraped from the following [wiki page](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M), omitting any rows that do not have an assigned borough
1. The above info will be merged with a list of locations (longitude and latitude) provided by the __geocoder__ library
1. The Foursquare API will be used to pull a list of music venues in each of these locations along with their average rating so that they can be weighted

In [53]:
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import requests

In [2]:
print("Hello Capstone Project Course!")

Hello Capstone Project Course!


___

#### __Week 3: Segmenting and Clustering Neighborhoods in Toronto__  

Problem 1) Generating the dataframe

In [12]:
# Define empty Dataframe for data
column_names = ['PostalCode', 'Borough', 'Neighborhood'] 
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,PostalCode,Borough,Neighborhood


In [7]:
# Retrieve data from wiki page
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
neighborhoods_rows = soup.select("table.wikitable")[0].select("tr")
neighborhoods_rows[:3]

[<tr>
 <th>Postal Code
 </th>
 <th>Borough
 </th>
 <th>Neighbourhood
 </th></tr>,
 <tr>
 <td>M1A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>,
 <tr>
 <td>M2A
 </td>
 <td>Not assigned
 </td>
 <td>Not assigned
 </td></tr>]

In [14]:
# Parse data from wiki page
# Skip the first row of headers
rows = []
for row in neighborhoods_rows[1:]:
    postal_code, borough, neighborhood = [cell.text.strip() for cell in row.select("td")]
    # Skip postal codes with no assigned borough
    if borough == "Not assigned":
        continue
    # Fall back to the borough name when no neighborhood is assigned
    if neighborhood == "Not assigned":
        neighborhood = borough

    rows.append({'PostalCode': postal_code,
                 'Borough': borough,
                 'Neighborhood': neighborhood})

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [15]:
neighborhoods.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [16]:
neighborhoods.shape

(103, 3)
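
The cleaning rule applied in the parsing loop can also be isolated as a small function, which makes it easy to check without fetching the wiki page. This is a minimal sketch: the `clean_row` helper is hypothetical, and the third sample row is made up to exercise the neighborhood fallback.

```python
def clean_row(postal_code, borough, neighborhood):
    """Return a cleaned record, or None if the row should be skipped."""
    # Rows without an assigned borough are dropped entirely
    if borough == "Not assigned":
        return None
    # A missing neighborhood falls back to the borough name
    if neighborhood == "Not assigned":
        neighborhood = borough
    return {"PostalCode": postal_code,
            "Borough": borough,
            "Neighborhood": neighborhood}


raw_rows = [
    ("M1A", "Not assigned", "Not assigned"),
    ("M3A", "North York", "Parkwoods"),
    ("M9Z", "Example Borough", "Not assigned"),  # made-up row for illustration
]

cleaned = (clean_row(*row) for row in raw_rows)
records = [rec for rec in cleaned if rec is not None]
print(records)
```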

___  

  
Problem 2) Fill in latitude and longitude for the dataframe created above and generate a map

We use a CSV file of geospatial data to look up the latitude and longitude of each postal code in the dataframe.

In [35]:
!wget -q -O "toronto_locations.csv" https://cocl.us/Geospatial_data
print("Data downloaded")

Data downloaded


In [36]:
# Import location data into df
locations_df = pd.read_csv("toronto_locations.csv")
locations_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [45]:
# Merge neighborhoods and location data into one dataframe
neighborhoods = neighborhoods.merge(locations_df, left_on="PostalCode", right_on="Postal Code")

In [50]:
# Drop extra postal code column
neighborhoods.drop(["Postal Code"], axis=1, inplace=True)

In [51]:
neighborhoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
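
Because `merge` defaults to an inner join, any postal code without a match in the location data would be dropped silently. The sketch below shows one way to surface such rows, using the `how`, `validate`, and `indicator` parameters of `DataFrame.merge`. The small frames are stand-ins for the real data, and the `M0X` row is made up to show an unmatched code.

```python
import pandas as pd

# Stand-in frames; "M0X" is a made-up code with no coordinates
left = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M0X"],
    "Borough": ["North York", "North York", "Example Borough"],
})
right = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# A left join keeps every neighborhood row; validate raises if a
# postal code appears more than once on either side
merged = left.merge(right, left_on="PostalCode", right_on="Postal Code",
                    how="left", validate="one_to_one", indicator=True)

# Rows marked "left_only" had no coordinates in the location data
missing = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```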


In [56]:
# Install and import folium for map creation
!pip install folium==0.5.0
import folium # map rendering library



Collecting folium==0.5.0
  Downloading folium-0.5.0.tar.gz (79 kB)
Collecting branca
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Building wheels for collected packages: folium
  Building wheel for folium (setup.py) ... done
  Created wheel for folium: filename=folium-0.5.0-py3-none-any.whl size=76240 sha256=9c8e8764c5578e87351c13aaf60f4f280ec3e167379b2f30a34c08061d7b8d51
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/b2/2f/2c/109e446b990d663ea5ce9b078b5e7c1a9c45cca91f377080f8
Successfully built folium
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.5.0


In [64]:
# Create map of Toronto neighborhoods

toronto_latitude = 43.6529
toronto_longitude = -79.3849

map_toronto = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto
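
The map above is centered on hard-coded coordinates for Toronto. As an alternative sketch, the center could be derived from the data by averaging the neighborhood coordinates. The `map_center` helper is hypothetical, and the sample points are the first five rows of the merged dataframe shown earlier.

```python
def map_center(coords):
    """Return the mean (latitude, longitude) of a list of coordinate pairs."""
    lats = [lat for lat, _ in coords]
    lngs = [lng for _, lng in coords]
    return sum(lats) / len(lats), sum(lngs) / len(lngs)


# First five neighborhoods from the merged dataframe
sample_coords = [
    (43.753259, -79.329656),
    (43.725882, -79.315572),
    (43.65426, -79.360636),
    (43.718518, -79.464763),
    (43.662301, -79.389494),
]

center_lat, center_lng = map_center(sample_coords)
print(center_lat, center_lng)
```

A simple mean is adequate at city scale, where the points span only a fraction of a degree.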