# Toronto Map Clustering and Segmentation 

This notebook scrapes Toronto postal code data from the following Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M


## 1. Beautiful Soup and Pandas To Scrape Data

In [76]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np




Using the **soup.find_all()** method and **pd.read_html**, we read the table into a dataframe in just five lines of code

In [77]:
response = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")  # fetch the page
soup = BeautifulSoup(response.content, 'lxml')
table = soup.find_all('table')[0]  # the first table on the page holds the postal codes
df = pd.read_html(str(table))[0]
df.describe()

Unnamed: 0,Postcode,Borough,Neighbourhood
count,288,288,288
unique,180,12,209
top,M8Y,Not assigned,Not assigned
freq,8,77,78
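
To make the role of **soup.find_all()** more concrete, here is a minimal sketch that parses a small inline HTML table instead of the live Wikipedia page. The HTML snippet and the names `html`, `records` and `sample_df` are illustrative assumptions, and the built-in `html.parser` is used in place of `lxml` so that no extra parser is needed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical miniature of the postal code table, inlined so no network access is needed.
html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find_all("table")[0]  # first <table> element in the document

# The first row holds the column names, the remaining rows hold the data.
rows = table.find_all("tr")
header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in rows[1:]]

sample_df = pd.DataFrame(records, columns=header)
print(sample_df)
```

**pd.read_html** performs the row and cell extraction automatically; the manual loop only shows what happens to the table element that **find_all** returns.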


### Cleaned Data

Removed every row whose Borough is **"Not assigned"**, gave each **"Not assigned"** Neighbourhood the name of its Borough, and joined the neighbourhoods that share a Postcode into a single comma-separated row

In [98]:
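# Drop rows without an assigned borough
df = df[df["Borough"] != "Not assigned"].reset_index(drop=True)
# A neighbourhood that is "Not assigned" takes the name of its borough
df["Neighbourhood"] = df["Neighbourhood"].where(df["Neighbourhood"] != "Not assigned", df["Borough"])
# Combine neighbourhoods that share a postcode into one comma-separated row
df = df.groupby(["Postcode","Borough"])["Neighbourhood"].apply(lambda x: ','.join(x)).reset_index()
df.head()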


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [79]:
df.shape

(103, 3)
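
The cleaning steps can be checked on a small hand-made dataframe. This is only a sketch: the four rows below and the names `raw`, `missing` and `clean` are made up for illustration and are not taken from the scraped table.

```python
import pandas as pd

# Hypothetical sample with an unassigned borough, a repeated postcode,
# and an assigned borough whose neighbourhood is "Not assigned".
raw = pd.DataFrame({
    "Postcode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# 1. Drop rows without a borough (copy to avoid modifying a view of raw).
clean = raw[raw["Borough"] != "Not assigned"].copy()

# 2. Fall back to the borough name where the neighbourhood is missing.
missing = clean["Neighbourhood"] == "Not assigned"
clean.loc[missing, "Neighbourhood"] = clean.loc[missing, "Borough"]

# 3. Collapse repeated postcodes into one comma-separated row.
clean = (
    clean.groupby(["Postcode", "Borough"])["Neighbourhood"]
    .apply(",".join)
    .reset_index()
)
print(clean)
```

The sample collapses to two rows: one for M5A with both neighbourhoods joined, and one for M7A whose neighbourhood is its borough name.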

## 2. Getting Lat/Long Information

Downloaded the geospatial data as a CSV file because **import geocode** was not working

In [99]:
#Downloading Data from Link

!wget -q -O 'latlong.csv' https://cocl.us/Geospatial_data
latlongdf = pd.read_csv('latlong.csv')                      #Storing in dataframe

print("Data Downloaded")

Data Downloaded


In [100]:
latlongdf.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Renamed the **"Postal Code"** column to **"Postcode"** so that the two dataframes can be merged on their Postcode values

In [101]:
# Give both dataframes the same key column name, then merge on that column
latlongdf.rename(columns = {"Postal Code":"Postcode"},inplace =True)
df_data = pd.merge(df, latlongdf, on='Postcode')
df_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
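
An inner merge silently drops postcodes that have no coordinates. One possible way to check for this, sketched on toy data, is a left merge with `indicator=True` and `validate="one_to_one"`. The frames `left` and `right` and the postcode `M9Z` are hypothetical; the two coordinate pairs are copied from the output above.

```python
import pandas as pd

# Toy frames: "M9Z" is a made-up postcode with no coordinates on the right side.
left = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
right = pd.DataFrame({
    "Postcode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# how="left" keeps every postcode from the left frame,
# validate raises if a postcode is duplicated on either side,
# indicator adds a "_merge" column naming the source of each row.
merged = pd.merge(left, right, on="Postcode", how="left",
                  validate="one_to_one", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```

An empty `unmatched` list means every postcode found coordinates, in which case the inner merge and the left merge return the same rows.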


## 3. Visualizing Data Using Folium

### A. Importing Libraries


In [None]:
!conda install -c conda-forge folium=0.5.0 --yes



In [91]:
!conda install -c conda-forge geopy --yes

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          90 KB

The following NEW packages will be INSTALLED:

    geographiclib: 1.49-py_0   conda-forge
    geopy:         1.20.0-py_0 conda-forge


Downloading and Extracting Packages
geopy-1.20.0         | 57 KB     | ##################################### | 100% 
geographiclib-1.49   | 32 KB     | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done


In [None]:
import folium
from geopy.geocoders import Nominatim 
import matplotlib.cm as cm
import matplotlib.colors as colors



Used the **geopy** module to retrieve the latitude and longitude of Toronto, Ontario, Canada

In [95]:
address = "Toronto, Ontario, Canada"
geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
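
`geocode` returns `None` when the service finds no match, in which case reading `.latitude` fails. A minimal sketch of a guard follows, assuming a fallback coordinate is acceptable. The helper `coordinates_or_default` and the stub class are hypothetical names introduced for this example; the stub stands in for `Nominatim` so that the snippet runs without network access.

```python
from typing import Tuple


def coordinates_or_default(geocoder, address: str,
                           default: Tuple[float, float]) -> Tuple[float, float]:
    """Return (latitude, longitude) for address, or default if nothing is found."""
    location = geocoder.geocode(address)
    if location is None:  # no match for this address
        return default
    return (location.latitude, location.longitude)


class _NoResultGeocoder:
    """Hypothetical stand-in for a geocoder that finds no match."""

    def geocode(self, address):
        return None


# Fallback taken from the coordinates printed above.
toronto_default = (43.653963, -79.387207)
coords = coordinates_or_default(_NoResultGeocoder(), "Toronto, Ontario, Canada",
                                toronto_default)
print(coords)
```

In the notebook the first argument would be the `geolocator` object created above.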


Created the map and added a marker for each postcode, labelled with its Neighbourhood and Borough

In [97]:
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_data['Latitude'], df_data['Longitude'], df_data['Borough'], df_data['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
    ).add_to(toronto_map)

toronto_map
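
The notebook imports `matplotlib.cm` and `matplotlib.colors` but draws every marker in blue. One possible use of those imports, offered only as a sketch, is to give each borough its own marker colour. The three-item `boroughs` list and the name `borough_colors` are assumptions for illustration; in the notebook the list could come from `df_data['Borough'].unique()`.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

# Hypothetical subset of boroughs used for the illustration.
boroughs = ["Scarborough", "North York", "Etobicoke"]

# Sample one RGBA colour per borough from the rainbow colormap.
palette = cm.rainbow(np.linspace(0, 1, len(boroughs)))

# Convert each RGBA row to a hex string usable by map markers.
borough_colors = {
    name: colors.rgb2hex(rgba) for name, rgba in zip(boroughs, palette)
}
print(borough_colors)
```

Inside the marker loop, `color=borough_colors[borough]` and `fill_color=borough_colors[borough]` could then replace the fixed colour values.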