# IBM Data Science Professional Course on Coursera
## Capstone Project Course Assignment: Segmenting and Clustering Neighborhoods in Toronto.
### Week 3 Part 1: Build a dataframe of each postal code with its borough name and neighborhood names.

#### First, we will import the libraries that we need.

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import numpy as np

import json
from bs4 import BeautifulSoup

from geopy.geocoders import Nominatim

import requests
from pandas.io.json import json_normalize

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium

#### Now, let's scrape the data from the Wikipedia page into a dataframe using the BeautifulSoup package.

In [2]:
toronto_data = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

In [3]:
Soup = BeautifulSoup(toronto_data, 'html.parser')

In [4]:
postalcodeList = []
boroughList = []
neighborhoodList = []

In [5]:
for row in Soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if cells:  # skip rows that contain no <td> cells (e.g. a header row)
        postalcodeList.append(cells[0].text)
        boroughList.append(cells[1].text)
        neighborhoodList.append(cells[2].text.rstrip('\n'))

In [6]:
toronto_df = pd.DataFrame({"PostalCode": postalcodeList,
                           "Borough": boroughList,
                           "Neighborhood": neighborhoodList})
toronto_df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A\n,Not assigned\n,Not assigned
1,M2A\n,Not assigned\n,Not assigned
2,M3A\n,North York\n,Parkwoods
3,M4A\n,North York\n,Victoria Village
4,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront"
5,M6A\n,North York\n,"Lawrence Manor, Lawrence Heights"
6,M7A\n,Downtown Toronto\n,"Queen's Park, Ontario Provincial Government"
7,M8A\n,Not assigned\n,Not assigned
8,M9A\n,Etobicoke\n,"Islington Avenue, Humber Valley Village"
9,M1B\n,Scarborough\n,"Malvern, Rouge"
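
The output above still carries a trailing `\n` in the `PostalCode` and `Borough` columns, because only the neighborhood text was stripped during scraping. A minimal sketch of one way to clean this up with pandas follows. The small `sample_df` is a hypothetical stand-in for the scraped data, and `clean_df` is an illustrative name, not part of the notebook.

```python
import pandas as pd

# Hypothetical sample that mimics the scraped rows, trailing "\n" included.
sample_df = pd.DataFrame({
    "PostalCode": ["M1A\n", "M3A\n", "M5A\n"],
    "Borough": ["Not assigned\n", "North York\n", "Downtown Toronto\n"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Regent Park, Harbourfront"],
})

# Strip leading and trailing whitespace from every text column.
clean_df = sample_df.apply(lambda col: col.str.strip())
print(clean_df)
```

With the whitespace removed, later comparisons can use the plain string `"Not assigned"` instead of `"Not assigned\n"`.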


#### Let's drop the rows whose Borough is "Not assigned".

In [7]:
toronto_df_dropNA = toronto_df[toronto_df.Borough != "Not assigned\n"].reset_index(drop = True)
toronto_df_dropNA

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A\n,North York\n,Parkwoods
1,M4A\n,North York\n,Victoria Village
2,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront"
3,M6A\n,North York\n,"Lawrence Manor, Lawrence Heights"
4,M7A\n,Downtown Toronto\n,"Queen's Park, Ontario Provincial Government"
5,M9A\n,Etobicoke\n,"Islington Avenue, Humber Valley Village"
6,M1B\n,Scarborough\n,"Malvern, Rouge"
7,M3B\n,North York\n,Don Mills
8,M4B\n,East York\n,"Parkview Hill, Woodbine Gardens"
9,M5B\n,Downtown Toronto\n,"Garden District, Ryerson"
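
The filtering step works by building a boolean mask and keeping only the rows where it is `True`. The sketch below shows the same idea on a hypothetical three-row sample. It compares against the stripped text, so it does not depend on the trailing `\n`; that stripping is an assumption added here, not what the cell above does.

```python
import pandas as pd

# Hypothetical sample; the real dataframe comes from the scraped table.
sample_df = pd.DataFrame({
    "PostalCode": ["M1A\n", "M3A\n", "M4A\n"],
    "Borough": ["Not assigned\n", "North York\n", "North York\n"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# True for every row whose borough has been assigned.
is_assigned = sample_df["Borough"].str.strip() != "Not assigned"

# Keep the matching rows and renumber the index from 0.
filtered_df = sample_df[is_assigned].reset_index(drop=True)
print(filtered_df)
```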


#### Now, we will combine the neighborhoods that share the same postal code and borough into a single comma-separated row.

In [8]:
toronto_df_grouped = toronto_df_dropNA.groupby(["PostalCode", "Borough"], as_index = False).agg(lambda x: ", ".join(x))
toronto_df_grouped

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B\n,Scarborough\n,"Malvern, Rouge"
1,M1C\n,Scarborough\n,"Rouge Hill, Port Union, Highland Creek"
2,M1E\n,Scarborough\n,"Guildwood, Morningside, West Hill"
3,M1G\n,Scarborough\n,Woburn
4,M1H\n,Scarborough\n,Cedarbrae
5,M1J\n,Scarborough\n,Scarborough Village
6,M1K\n,Scarborough\n,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L\n,Scarborough\n,"Golden Mile, Clairlea, Oakridge"
8,M1M\n,Scarborough\n,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N\n,Scarborough\n,"Birch Cliff, Cliffside West"
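
To make the grouping step concrete, here is a minimal sketch on hypothetical data in which one postal code appears on two rows. The names `sample_df` and `grouped_df` are illustrative only.

```python
import pandas as pd

# Hypothetical sample: postal code M5A is listed twice, once per neighborhood.
sample_df = pd.DataFrame({
    "PostalCode": ["M5A", "M5A", "M1B"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "Scarborough"],
    "Neighborhood": ["Regent Park", "Harbourfront", "Malvern"],
})

# Group by postal code and borough, then join the neighborhood names
# of each group into one comma-separated string.
grouped_df = sample_df.groupby(["PostalCode", "Borough"], as_index=False).agg(
    {"Neighborhood": ", ".join}
)
print(grouped_df)
```

Because `groupby` sorts by the group keys by default, the result is ordered by postal code, which is why the output above starts at `M1B`.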


#### Where Neighborhood is "Not assigned", let's set it to the value of Borough.

In [9]:
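# Assigning to the row yielded by iterrows() does not reliably write back to the
# dataframe, so select the rows with a boolean mask and assign through .loc.
not_assigned = toronto_df_grouped["Neighborhood"] == "Not assigned"
toronto_df_grouped.loc[not_assigned, "Neighborhood"] = toronto_df_grouped.loc[not_assigned, "Borough"]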
toronto_df_grouped

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B\n,Scarborough\n,"Malvern, Rouge"
1,M1C\n,Scarborough\n,"Rouge Hill, Port Union, Highland Creek"
2,M1E\n,Scarborough\n,"Guildwood, Morningside, West Hill"
3,M1G\n,Scarborough\n,Woburn
4,M1H\n,Scarborough\n,Cedarbrae
5,M1J\n,Scarborough\n,Scarborough Village
6,M1K\n,Scarborough\n,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L\n,Scarborough\n,"Golden Mile, Clairlea, Oakridge"
8,M1M\n,Scarborough\n,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N\n,Scarborough\n,"Birch Cliff, Cliffside West"
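
None of the rows shown above has a Neighborhood of "Not assigned", so the effect of this step is not visible in the output. The sketch below illustrates the replacement on hypothetical data (the postal code `M9Z` and its values are made up for the example). It assigns through `.loc` with a boolean mask, which writes to the dataframe itself, whereas a row yielded by `iterrows()` is generally a copy.

```python
import pandas as pd

# Hypothetical sample: the second row has no neighborhood assigned.
sample_df = pd.DataFrame({
    "PostalCode": ["M1B", "M9Z"],
    "Borough": ["Scarborough", "Etobicoke"],
    "Neighborhood": ["Malvern, Rouge", "Not assigned"],
})

# Select the rows to fix, then copy the borough name into Neighborhood.
mask = sample_df["Neighborhood"] == "Not assigned"
sample_df.loc[mask, "Neighborhood"] = sample_df.loc[mask, "Borough"]
print(sample_df)
```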


##### We will display the first 5 rows.

In [14]:
toronto_df_grouped.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B\n,Scarborough\n,"Malvern, Rouge"
1,M1C\n,Scarborough\n,"Rouge Hill, Port Union, Highland Creek"
2,M1E\n,Scarborough\n,"Guildwood, Morningside, West Hill"
3,M1G\n,Scarborough\n,Woburn
4,M1H\n,Scarborough\n,Cedarbrae


##### We will display the first 11 rows.

In [13]:
toronto_df_grouped.head(11)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B\n,Scarborough\n,"Malvern, Rouge"
1,M1C\n,Scarborough\n,"Rouge Hill, Port Union, Highland Creek"
2,M1E\n,Scarborough\n,"Guildwood, Morningside, West Hill"
3,M1G\n,Scarborough\n,Woburn
4,M1H\n,Scarborough\n,Cedarbrae
5,M1J\n,Scarborough\n,Scarborough Village
6,M1K\n,Scarborough\n,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L\n,Scarborough\n,"Golden Mile, Clairlea, Oakridge"
8,M1M\n,Scarborough\n,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N\n,Scarborough\n,"Birch Cliff, Cliffside West"


#### Finally, at the end of Part 1, we will print the shape of the dataframe, that is, (number of rows, number of columns).

In [12]:
toronto_df_grouped.shape

(103, 3)
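
The `shape` attribute is a `(rows, columns)` tuple, so it can be unpacked directly. As a hedged sketch on hypothetical data, the snippet below also checks that each postal code occurs only once, which is what the grouping step is meant to guarantee. The names `sample_df`, `n_rows`, `n_cols`, and `codes_are_unique` are illustrative.

```python
import pandas as pd

# Hypothetical three-row stand-in for the grouped dataframe.
sample_df = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M1E"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighborhood": ["Malvern, Rouge", "Rouge Hill, Port Union", "Guildwood"],
})

# shape is a (number_of_rows, number_of_columns) tuple.
n_rows, n_cols = sample_df.shape

# After grouping, every postal code should appear exactly once.
codes_are_unique = sample_df["PostalCode"].is_unique
print(n_rows, n_cols, codes_are_unique)
```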