# Segmenting and Clustering Neighborhoods in Toronto


This notebook includes three parts:

The first part retrieves the data from the link below. It extracts the rows of the table, then processes and filters them to keep only the useful information.

If you open the link given in the task, you will see that some of the subtasks are already done. For example, the table already groups neighborhoods by borough and separates them with commas, so no additional logic is needed for that.

The second part creates a pandas DataFrame from the list built in the first part. This DataFrame contains no rows where the borough or neighborhood is empty, and neighborhoods are grouped by borough.

The last part plots the data on a map of Toronto. The map is interactive and shows the different neighborhoods of Toronto, Ontario.


In [1]:
#!conda install -c anaconda beautifulsoup4

In [1]:
'''
    Necessary imports
'''
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import pandas as pd

## Retrieving the Wikipedia data

Retrieved from this link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M (the one given in the task)

In [2]:
'''
    Fetching the Wikipedia page
'''

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(url)
print(response)

<Response [200]>


In [3]:
'''
    Auxiliary function to filter the retrieved data
'''
def delete_empty_values(instance):
    while('' in instance):
        instance.remove('')
    
    return instance
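
For comparison, the same cleanup can be written without mutating the input list. This is only a minimal sketch: `drop_empty` and the sample row are hypothetical and not part of the notebook.

```python
def drop_empty(fields):
    """Return a new list without empty strings; the input list is left untouched."""
    return [field for field in fields if field != '']


# A row as it might look after splitting a table row's text on newlines
raw_row = ['', 'M3A', '', 'North York', '', 'Parkwoods', '']
clean_row = drop_empty(raw_row)
print(clean_row)
```

A list comprehension scans the row once, whereas the `while`/`remove` loop rescans the list for every empty string it deletes. For three-field rows the difference is negligible, so this is mostly a readability choice.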

In [4]:
'''
    Parsing data and filtering it
'''

soup=BeautifulSoup(response.text, "html.parser")
res=soup.find_all('tr')   #Find all the table rows

result=list()
for x in res:
    
    instance=list()
    content=x.get_text()
    
    for c in content.split('\n'):
        instance.append(c)
    
    result.append(instance)

In [5]:
'''
    Deleting useless data and keeping only rows in the correct format (three fields).
'''
new_results=list()
for instance in result:
    i=delete_empty_values(instance)
    if len(i)==3:
        new_results.append(i)

## Creating the pandas DF

In [6]:
#Now, new_results contains the information we need to create a DF. Let's create it.

#The first element contains the name of the columns. 
df=pd.DataFrame(new_results[1:], columns=new_results[0])
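
The introduction states that the DataFrame should hold no rows with an empty borough or neighborhood. If the scraped table ever contained such placeholder rows, an explicit filter could look like the sketch below. The sample rows and the placeholder text `'Not assigned'` are assumptions made for illustration, not values taken from the notebook's output.

```python
import pandas as pd

# Hypothetical sample shaped like new_results: header row first, then data rows
sample_rows = [
    ['Postal Code', 'Borough', 'Neighborhood'],
    ['M1A', 'Not assigned', 'Not assigned'],
    ['M3A', 'North York', 'Parkwoods'],
    ['M5A', 'Downtown Toronto', 'Regent Park, Harbourfront'],
]

sample_df = pd.DataFrame(sample_rows[1:], columns=sample_rows[0])

# Keep only rows whose borough is assigned (the placeholder text is an assumption)
assigned = sample_df[sample_df['Borough'] != 'Not assigned'].reset_index(drop=True)
print(assigned.shape)
```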


In [7]:
print(df.shape)
df

(103, 3)


Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [93]:
#!conda install -c conda-forge geocoder

In [None]:
'''
     The API is unreliable, so we don't use this package.
'''

import geocoder

# initialize your variable to None
lat_lng_coords = None

# loop until you get the coordinates
while(lat_lng_coords is None):
  g = geocoder.google('{}, Toronto, Ontario'.format('M8Z'))
  lat_lng_coords = g.latlng

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]
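
The loop above never terminates if the service keeps returning `None`. One way to guard against that is a bounded retry, sketched here with the standard library only. `retry_until_value` is a hypothetical helper, and the `lookup` callable stands in for the geocoder call; the coordinates in the usage lines are placeholder values.

```python
import time


def retry_until_value(lookup, attempts=5, delay_seconds=0.0):
    """Call lookup() up to `attempts` times; return the first non-None result, else None."""
    for _ in range(attempts):
        value = lookup()
        if value is not None:
            return value
        time.sleep(delay_seconds)
    return None


# Stand-in for a flaky geocoding call: fails twice, then returns placeholder coordinates
responses = iter([None, None, [43.65, -79.38]])
found = retry_until_value(lambda: next(responses))
print(found)
```

With a helper like this, the caller decides what to do when `None` comes back after the last attempt, for example falling back to a CSV of coordinates as the next cell does.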

In [8]:
'''
    Reading the CSV and creating a DataFrame
'''
ll_df=pd.read_csv('http://cocl.us/Geospatial_data')

In [9]:
'''
    Merge information
'''
result=pd.merge(df, ll_df, on='Postal Code')
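
The default inner merge silently drops postal codes that have no match in the coordinates file. A quick way to check for that is a left merge with `indicator=True`, sketched below on made-up data. The frames `boroughs` and `coords` and the postal code `'M0X'` are hypothetical.

```python
import pandas as pd

boroughs = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M0X'],   # 'M0X' is a made-up code with no coordinates
    'Borough': ['North York', 'North York', 'Example Borough'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every row of the left frame; indicator=True adds a '_merge' column
checked = pd.merge(boroughs, coords, on='Postal Code', how='left', indicator=True)

# Rows marked 'left_only' had no coordinates to join against
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

In the notebook itself the merged frame has the same 103 rows as `df`, so no postal code was lost in the merge.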


In [10]:
print(result.shape)
result

(103, 5)


Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


## Cluster and add neighborhoods to the map

In [None]:
 #!conda install -c conda-forge folium 

In [16]:
import folium

map_toronto = folium.Map(location=[43.651070, -79.347015], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(result['Latitude'], result['Longitude'], result['Borough'], result['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto
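
The map center above is hard-coded. As a rough alternative, it could be derived from the data by averaging the coordinates. The sketch below uses only the standard library and three coordinate pairs copied from the merged table; the names `points` and `center` are illustrative.

```python
from statistics import fmean

# (latitude, longitude) pairs copied from the first rows of the merged table
points = [
    (43.753259, -79.329656),
    (43.725882, -79.315572),
    (43.654260, -79.360636),
]

# Average each coordinate separately to get an approximate center
center = [fmean(lat for lat, _ in points), fmean(lng for _, lng in points)]
print(center)
```

The resulting list could then be passed as `location=` to `folium.Map` instead of the fixed values.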