# Get the neighborhoods in Toronto
We are going to obtain data for the neighborhoods in each borough of Toronto. The data comes from the Wikipedia page that lists the postal codes for the city of Toronto, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M.

The whole process can be described in the following steps.
1. Extract the information about the neighborhoods and boroughs from the wiki page.
2. Clean up the dataframe so that each postal code has its borough and its neighborhood or neighborhoods.
3. Add latitude and longitude for each postal code.
4. Visualize the neighborhoods on a map of Toronto.

Please use this link to open the Jupyter notebook to see the maps: 

https://nbviewer.jupyter.org/

### Step 1. Extract the information about the neighborhoods and boroughs from the wiki page
To perform this task we are going to use the BeautifulSoup library. Here is a tutorial link for it: https://www.youtube.com/watch?v=ng2o98k983k

#### Install the necessary libraries if they are missing, then import them into the notebook

In [1]:
! pip install beautifulsoup4



In [2]:
! pip install lxml



In [3]:
! pip install requests



In [1]:
from bs4 import BeautifulSoup
import requests
import numpy as np
import pandas as pd
import folium

#### Requesting the webpage from which we need to extract the data.

In [2]:
source=requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

Parsing the webpage data with BeautifulSoup, using the lxml parser.

In [3]:
soup=BeautifulSoup(source,'lxml')

We can now see the contents of the webpage as raw HTML. Only the first 1000 characters are shown here to keep the notebook short.

In [4]:
print(soup.prettify()[0:1000])

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className = document.documentElement.className.replace( /(^|\s)client-nojs(\s|$)/, "$1client-js$2" );
  </script>
  <script>
   (window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":890001695,"wgRevisionId":890001695,"wgArticleId":539066,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Communications in Ontario","Postal codes in Canada","Toronto","Ontario-related lists"],"wgBreakFrames":false,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wg

#### Let's extract the table that contains the borough and neighborhood information.
The table in the HTML page can be accessed with the command below, which takes the first table in the page body. We save the text of every table cell in a list that will later be written into a dataframe.

In [5]:
Table_text=[]
for match in soup.body.table.find_all('td'):
    Table_text.append(match.text)

In [6]:
Table_text[:10]

['M1A',
 'Not assigned',
 'Not assigned\n',
 'M2A',
 'Not assigned',
 'Not assigned\n',
 'M3A',
 'North York',
 'Parkwoods\n',
 'M4A']
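
The cell extraction above relies on the target table being the first one in the page body. As a hedged alternative, here is a minimal sketch on a hypothetical, hand-written HTML snippet (not the real Wikipedia markup). It assumes the table carries a class such as `wikitable`, which should be checked against the actual page source, and it uses the built-in `html.parser` so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the page, NOT the real Wikipedia markup.
sample_html = """
<html><body>
<table class="wikitable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods
</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village
</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Select the table by class instead of relying on it being the first table.
# The class name "wikitable" is an assumption for this sketch.
table = soup.find("table", class_="wikitable")

# get_text(strip=True) trims surrounding whitespace, including a trailing "\n".
cells = [td.get_text(strip=True) for td in table.find_all("td")]
print(cells)
```

Stripping the whitespace at extraction time is a design choice: it would make a later newline-removal step unnecessary, at the cost of no longer seeing the raw cell text.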

#### Let's create a dataframe and save the text from above into it

In [7]:
Neighborhoods=pd.DataFrame({'PostalCode':Table_text[0::3],'Borough':Table_text[1::3],'Neighborhood':Table_text[2::3]})
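
The stride slices `[0::3]`, `[1::3]` and `[2::3]` work because the cells are stored row by row and every row has exactly three cells. The following is a small pure-Python sketch of that idea on a hand-made list; the names `flat` and `n_cols` are introduced here only for illustration.

```python
# Flat list of cell texts in row-major order (three cells per table row).
flat = ['M1A', 'Not assigned', 'Not assigned\n',
        'M3A', 'North York', 'Parkwoods\n',
        'M4A', 'North York', 'Victoria Village\n']

n_cols = 3

# Stride slicing silently misaligns the columns if a cell is missing,
# so check the length before splitting.
assert len(flat) % n_cols == 0, "cell count is not a multiple of the column count"

postal_codes = flat[0::n_cols]    # items 0, 3, 6, ...
boroughs = flat[1::n_cols]        # items 1, 4, 7, ...
neighborhoods = flat[2::n_cols]   # items 2, 5, 8, ...

# Equivalent row-wise view: one tuple per table row.
rows = list(zip(postal_codes, boroughs, neighborhoods))
print(rows[1])
```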

In [8]:
Neighborhoods.head(20)

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned\n |
| 1 | M2A | Not assigned | Not assigned\n |
| 2 | M3A | North York | Parkwoods\n |
| 3 | M4A | North York | Victoria Village\n |
| 4 | M5A | Downtown Toronto | Harbourfront\n |
| 5 | M5A | Downtown Toronto | Regent Park\n |
| 6 | M6A | North York | Lawrence Heights\n |
| 7 | M6A | North York | Lawrence Manor\n |
| 8 | M7A | Queen's Park | Not assigned\n |
| 9 | M8A | Not assigned | Not assigned\n |


### Step 2. Clean up the DataFrame
1. Remove the trailing \n from each entry in the 'Neighborhood' column.
2. Remove any row where both 'Borough' and 'Neighborhood' have the entry 'Not assigned'.
3. If only 'Neighborhood' has the entry 'Not assigned', fill it with the value from the 'Borough' column of that row.
4. Since we want one row per unique postal code, group the neighborhoods that share the same postal code.

#### 1. Remove '\n' from the end of each entry in the column 'Neighborhood'

In [9]:
Neighborhoods.Neighborhood=Neighborhoods.Neighborhood.map(lambda x: x[:-1])
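
Slicing with `x[:-1]` removes the last character whether or not it is a newline. The sketch below, on a small made-up Series, compares it with `Series.str.rstrip('\n')`, which only removes trailing newline characters. The third entry has no trailing newline on purpose, to show the difference.

```python
import pandas as pd

# Made-up sample; the last entry deliberately has no trailing newline.
s = pd.Series(['Parkwoods\n', 'Victoria Village\n', 'Regent Park'])

sliced = s.map(lambda x: x[:-1])   # drops the last character unconditionally
stripped = s.str.rstrip('\n')      # drops only trailing newline characters

print(sliced.tolist())
print(stripped.tolist())
```

If every cell is known to end in `\n`, both approaches give the same result; `rstrip` is simply the safer choice when that is not guaranteed.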

#### 2. Drop rows where both 'Borough' and 'Neighborhood' have the entry 'Not assigned'.

In [10]:
Neighborhoods=Neighborhoods[~((Neighborhoods.Borough=='Not assigned') & (Neighborhoods.Neighborhood=='Not assigned'))]

#### 3. Next, set any 'Neighborhood' entry that is 'Not assigned' to the 'Borough' entry of that row.

In [11]:
def replace_missing(row):
    if row.Neighborhood=='Not assigned':
        row.Neighborhood=row.Borough
    return row

In [12]:
Neighborhoods=Neighborhoods.apply(replace_missing,axis=1)
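
The row-wise `apply` works, but the same replacement can also be written as a column operation with a boolean mask. This is a minimal sketch on a two-row sample frame (the frame `df` and the mask name `missing` are made up for illustration), not a change to the notebook's method.

```python
import pandas as pd

# Two-row sample: one normal row, one with a missing neighborhood.
df = pd.DataFrame({
    'PostalCode': ['M5A', 'M7A'],
    'Borough': ['Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Harbourfront', 'Not assigned'],
})

# Boolean mask marking the rows whose neighborhood is missing.
missing = df['Neighborhood'] == 'Not assigned'

# Copy the borough name into the neighborhood column for those rows only.
df.loc[missing, 'Neighborhood'] = df.loc[missing, 'Borough']
print(df)
```

On a table of this size the speed difference is negligible; the mask version mainly avoids mutating a row object inside `apply`.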

#### 4. Grouping the neighborhoods with the same postal code.

In [13]:
Neighborhoods=Neighborhoods.groupby('PostalCode').agg(list).reset_index()

In [14]:
Neighborhoods.Borough=Neighborhoods.Borough.map(lambda x: np.unique(x)[0])
Neighborhoods.Neighborhood=Neighborhoods.Neighborhood.map(lambda x: ', '.join(np.unique(x)))
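
The grouping and the two follow-up `map` calls can also be expressed as a single named aggregation. The sketch below uses a three-row sample and assumes that every row of a postal code carries the same borough, so taking the first value is enough; `sorted(set(...))` is used here in place of `np.unique` to get the unique names in alphabetical order.

```python
import pandas as pd

# Three-row sample: two neighborhoods share the postal code M5A.
df = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

grouped = (
    df.groupby('PostalCode', as_index=False)
      .agg(
          # Assumption: the borough is identical within a postal code.
          Borough=('Borough', 'first'),
          # Join the unique neighborhood names in alphabetical order.
          Neighborhood=('Neighborhood', lambda s: ', '.join(sorted(set(s)))),
      )
)
print(grouped)
```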

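We also reset the index so that the rows are numbered from 0. The `reset_index()` call after the grouping already does this, so this step is only a safeguard.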

In [15]:
Neighborhoods.reset_index(drop=True,inplace=True)

## Here is the final DataFrame, all cleaned up.

In [16]:
Neighborhoods.head(10)

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Highland Creek, Port Union, Rouge Hill |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
| 5 | M1J | Scarborough | Scarborough Village |
| 6 | M1K | Scarborough | East Birchmount Park, Ionview, Kennedy Park |
| 7 | M1L | Scarborough | Clairlea, Golden Mile, Oakridge |
| 8 | M1M | Scarborough | Cliffcrest, Cliffside, Scarborough Village West |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West |


In [17]:
print('Number of rows=',Neighborhoods.shape[0])

Number of rows= 103


### Step 3. Add latitude and longitude to each neighborhood row.

We use the CSV file of geographic coordinates provided for the assignment.

In [18]:
Postal_Geo_coordinates=pd.read_csv('http://cocl.us/Geospatial_data')

In [19]:
Postal_Geo_coordinates.head(10)

| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
| 5 | M1J | 43.744734 | -79.239476 |
| 6 | M1K | 43.727929 | -79.262029 |
| 7 | M1L | 43.711112 | -79.284577 |
| 8 | M1M | 43.716316 | -79.239476 |
| 9 | M1N | 43.692657 | -79.264848 |


Rename the 'Postal Code' column to 'PostalCode' so that 'Postal_Geo_coordinates' can be joined to the 'Neighborhoods' dataframe.

In [20]:
Postal_Geo_coordinates.rename(columns={'Postal Code':'PostalCode'},inplace=True)

In [21]:
Postal_Geo_coordinates.shape

(103, 3)

In [22]:
Postal_Geo_coordinates=Postal_Geo_coordinates.set_index('PostalCode')

In [23]:
Neighborhoods=Neighborhoods.set_index('PostalCode')

In [24]:
Neighborhoods_joined=pd.concat([Neighborhoods,Postal_Geo_coordinates],axis=1).reset_index()
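
`pd.concat` with `axis=1` aligns the two frames on their shared 'PostalCode' index. An explicit key-based merge is another way to express the same join, and it can check that each postal code appears only once on each side. This is a minimal sketch on two small sample frames (the names `neighborhoods`, `coords` and `joined` are made up here); the coordinate values are taken from the table above.

```python
import pandas as pd

neighborhoods = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighborhood': ['Malvern, Rouge', 'Highland Creek, Port Union, Rouge Hill'],
})
coords = pd.DataFrame({
    'PostalCode': ['M1C', 'M1B'],   # deliberately in a different order
    'Latitude': [43.784535, 43.806686],
    'Longitude': [-79.160497, -79.194353],
})

# how='left' keeps every neighborhood row; validate raises an error
# if a postal code is duplicated on either side.
joined = neighborhoods.merge(coords, on='PostalCode', how='left',
                             validate='one_to_one')

# Count rows that did not find coordinates.
missing_coords = int(joined['Latitude'].isna().sum())
print(joined)
print('Rows without coordinates:', missing_coords)
```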

In [28]:
Neighborhoods_joined.head(10)

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Port Union, Rouge Hill | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | East Birchmount Park, Ionview, Kennedy Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Clairlea, Golden Mile, Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffcrest, Cliffside, Scarborough Village West | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West | 43.692657 | -79.264848 |


In [31]:
print('Number of rows',Neighborhoods_joined.shape[0])

Number of rows 103


### Step 4. Visualize neighborhoods in a map of Toronto

In [27]:
toronto_lat=43.6532
toronto_long=-79.3832
# create map of toronto using latitude and longitude values
map_toronto = folium.Map(location=[toronto_lat, toronto_long], zoom_start=10)

# add one circle marker per postal code to the map
for lat, lng, borough, neighborhood in zip(
        Neighborhoods_joined['Latitude'], Neighborhoods_joined['Longitude'],
        Neighborhoods_joined['Borough'], Neighborhoods_joined['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

## Saving the data to a CSV file

In [30]:
Neighborhoods_joined.to_csv('Neighborhoods_Toronto.csv',index=False)