## Capstone Course

In this capstone project, called Battle of the Neighborhoods, we look for a new neighborhood in Toronto to move to.
Our criterion is similarity to our old neighborhood, which we have grown to like quite a bit.
Since we got a new job on the other side of town, we have to move.

In [1]:
import pandas as pd
import numpy as np
import bs4 as bs
import matplotlib.pyplot as plt
from matplotlib import style
import pickle
import requests
import pandas_datareader as pdr
from pandas_datareader import data, wb

### 1. Getting the data and converting it to a pandas DataFrame
Using Beautiful Soup and requests.

In [2]:
wiki = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

response = requests.get(wiki)
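# Name the parser explicitly so the result does not depend on which parsers are installed
soup = bs.BeautifulSoup(response.text, "html.parser")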

table = soup.find('table', {'class': 'wikitable sortable'})

In [3]:
# Empty list for the table data
postal_codes = []
# Iterate over the table rows, skipping the header row
for row in table.find_all("tr")[1:]:
    # Collect the text of the first three cells of the row
    cells = row.find_all("td")
    entry = [cells[0].text, cells[1].text, cells[2].text]
    # Build a list of lists (one inner list per row)
    postal_codes.append(entry)

# Print the first 5 entries
print(postal_codes[:5])

[['M1A', 'Not assigned', 'Not assigned\n'], ['M2A', 'Not assigned', 'Not assigned\n'], ['M3A', 'North York', 'Parkwoods\n'], ['M4A', 'North York', 'Victoria Village\n'], ['M5A', 'Downtown Toronto', 'Harbourfront\n']]
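
The row-parsing step can be tried offline before hitting Wikipedia. The following is a minimal sketch, assuming a small hand-written HTML table (the `html` string is a hypothetical stand-in for the real page, not a copy of it). It also shows that `get_text(strip=True)` removes the trailing newline while parsing.

```python
import bs4 as bs

# Hypothetical inline table that mimics the layout of the Wikipedia table.
html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods\n</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village\n</td></tr>
</table>
"""

soup = bs.BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable sortable"})

# Skip the header row, then read the stripped text of every cell in each row.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find_all("tr")[1:]
]
print(rows)
```

Stripping at this point is a design choice; the notebook instead keeps the raw text and strips the newline in a later cell.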


In [4]:
#Convert list to Pandas Data Frame
pcodes = pd.DataFrame(postal_codes, columns=["Postcode", "Borough", "Neighbourhood"])
pcodes.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned\n
1,M2A,Not assigned,Not assigned\n
2,M3A,North York,Parkwoods\n
3,M4A,North York,Victoria Village\n
4,M5A,Downtown Toronto,Harbourfront\n


##### Clean the data: drop "Not assigned" rows and merge rows

In [5]:
# Drop rows without a borough, then write the borough into unassigned neighbourhoods
pcodes["Borough"] = pcodes["Borough"].replace("Not assigned", np.nan)
pcodes = pcodes.dropna(subset=["Borough"]).reset_index(drop=True)
# The scraped neighbourhood text still carries a trailing \n, so compare the stripped value
unassigned = pcodes["Neighbourhood"].str.strip() == "Not assigned"
pcodes.loc[unassigned, "Neighbourhood"] = pcodes.loc[unassigned, "Borough"]
pcodes.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods\n
1,M4A,North York,Victoria Village\n
2,M5A,Downtown Toronto,Harbourfront\n
3,M6A,North York,Lawrence Heights\n
4,M6A,North York,Lawrence Manor\n


In [6]:
# Strip the trailing \n from the neighbourhood names
pcodes["Neighbourhood"] = pcodes["Neighbourhood"].str.strip()
pcodes.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor


In [7]:
# Join the neighbourhoods of rows that share a Postcode
pcodes = pcodes.groupby(["Postcode", "Borough"])["Neighbourhood"].agg(",".join).reset_index()
pcodes.tail(30)

Unnamed: 0,Postcode,Borough,Neighbourhood
73,M6C,York,Humewood-Cedarvale
74,M6E,York,Caledonia-Fairbanks
75,M6G,Downtown Toronto,Christie
76,M6H,West Toronto,"Dovercourt Village,Dufferin"
77,M6J,West Toronto,"Little Portugal,Trinity"
78,M6K,West Toronto,"Brockton,Exhibition Place,Parkdale Village"
79,M6L,North York,"Downsview,North Park,Upwood Park"
80,M6M,York,"Del Ray,Keelesdale,Mount Dennis,Silverthorn"
81,M6N,York,"The Junction North,Runnymede"
82,M6P,West Toronto,"High Park,The Junction South"
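
To see what the grouping step does, here is a minimal sketch on a three-row toy frame. The values are taken from the rows shown earlier; the frame name `toy` and the result name `merged` are assumptions made for this illustration.

```python
import pandas as pd

# Toy frame: M6A appears twice with different neighbourhoods.
toy = pd.DataFrame({
    "Postcode": ["M5A", "M6A", "M6A"],
    "Borough": ["Downtown Toronto", "North York", "North York"],
    "Neighbourhood": ["Harbourfront", "Lawrence Heights", "Lawrence Manor"],
})

# One row per (Postcode, Borough); neighbourhood names are joined with commas.
merged = (
    toy.groupby(["Postcode", "Borough"])["Neighbourhood"]
    .agg(",".join)
    .reset_index()
)
print(merged)
```

The two M6A rows collapse into a single row whose neighbourhood cell holds both names.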


In [8]:
# Save the DataFrame so it doesn't have to be requested from Wikipedia every time
with open("C:/Users/cube/Documents/Studium/Python/IBM_Data_Science_Course/Datasets/Canada_Postel_Codes.pickle", "wb") as f:
    pickle.dump(pcodes, f)
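
Saving only helps if a later run reads the file back instead of scraping again. Below is a minimal sketch of such a cache, assuming a hypothetical helper `load_or_build` and a stand-in `fake_scrape` function; neither is part of the notebook. A temporary directory is used so the sketch does not touch real files.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_build(cache_path, build):
    """Return the pickled object if the file exists, otherwise build and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    obj = build()
    with cache_path.open("wb") as f:
        pickle.dump(obj, f)
    return obj


calls = []


def fake_scrape():
    # Hypothetical stand-in for the Wikipedia request; records each call.
    calls.append(1)
    return [["M3A", "North York", "Parkwoods"]]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "postal_codes.pickle"
    first = load_or_build(path, fake_scrape)   # builds and writes the cache
    second = load_or_build(path, fake_scrape)  # reads the cache, no second scrape

print(len(calls))
```

For DataFrames, pandas also offers `DataFrame.to_pickle` and `pd.read_pickle`, which would serve the same purpose.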

### 2. Add coordinates from a CSV file

In [9]:
geocodes = pd.read_csv("http://cocl.us/Geospatial_data")
geocodes.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [10]:
geocodes.rename(columns={"Postal Code":"Postcode"}, inplace = True)

In [11]:
geocodes["Postcode"] = geocodes["Postcode"].astype(str)
pcodes["Postcode"] = pcodes["Postcode"].astype(str)

In [12]:
pcodes = pd.merge(pcodes, geocodes, on=["Postcode"], how = "outer")
pcodes.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
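
An outer merge keeps postcodes that appear in only one of the two tables, which shows up as missing values. One way to check for that is the `indicator` option of `pd.merge`. The sketch below uses toy frames; the postcode `M9Z` is a made-up value used only to force a mismatch.

```python
import pandas as pd

# Toy frames: "M9Z" (hypothetical) has no coordinates, "M1E" has no borough.
left = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Etobicoke"],
})
right = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M1E"],
    "Latitude": [43.806686, 43.784535, 43.763573],
    "Longitude": [-79.194353, -79.160497, -79.188711],
})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
check = pd.merge(left, right, on="Postcode", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Postcode", "_merge"]])
```

If `unmatched` is empty for the real data, the outer merge behaves like an inner merge and no rows with missing values are introduced.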


### 3. Exploration with Folium
First, get the coordinates of Toronto with geopy.
Then display a map with folium and mark the neighbourhoods on it.

In [14]:
# Get the coordinates of Toronto
from geopy.geocoders import Nominatim
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
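
geopy's `geocode` returns `None` when it finds no match, in which case `location.latitude` fails with an `AttributeError`. The following is a minimal sketch of a guard, assuming a hypothetical helper `geocode_or_fail`. To run offline it uses a stub geocoder with placeholder coordinates instead of Nominatim; any object with a `geocode` method fits.

```python
from types import SimpleNamespace


def geocode_or_fail(geolocator, address):
    """Return (latitude, longitude) or raise a clear error if nothing is found."""
    location = geolocator.geocode(address)
    if location is None:
        raise ValueError(f"No geocoding result for {address!r}")
    return location.latitude, location.longitude


class StubGeolocator:
    """Hypothetical offline stand-in for a geopy geocoder."""

    def __init__(self, results):
        self.results = results

    def geocode(self, address):
        # Mirrors geopy's behaviour of returning None for unknown addresses.
        return self.results.get(address)


# Placeholder coordinates, not the values Nominatim would return.
stub = StubGeolocator(
    {"Toronto, ON": SimpleNamespace(latitude=43.65, longitude=-79.38)}
)
latitude, longitude = geocode_or_fail(stub, "Toronto, ON")
print(latitude, longitude)
```

In the notebook, the `geolocator` created with `Nominatim(user_agent="toronto_explorer")` could be passed in place of the stub.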

In [21]:
# Create a map of Toronto using the latitude and longitude values
import folium
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# Keep only boroughs whose name contains "Toronto" (na=False skips rows without a borough)
pcodes_near = pcodes[pcodes["Borough"].str.contains("Toronto", na=False)]

# add markers to map
for lat, lng, borough, neighborhood in zip(pcodes_near['Latitude'], pcodes_near['Longitude'], pcodes_near['Borough'], pcodes_near['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(toronto_map)
    
toronto_map