# Scraping Toronto Postal Codes from Wikipedia
-- Part 2 - Adding Long/Lat to the Postal Codes

Get Started on Step __ for <b>PART 2 -- Get Long/Lat</b><br /> 
This is Part 2 of the assignment: take the postal code list from Part 1 and use it to look up the latitude and longitude of each postal code area.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [2]:
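# download the Wikipedia page and parse it
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed
webcopy = response.text
postaldata = BeautifulSoup(webcopy, 'lxml')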

In [3]:
# create the new Toronto Dataframe
columnlabel = ['Postalcode','Borough','Neighborhood']
tor_codes = pd.DataFrame(columns = columnlabel)

In [4]:
tor_codes.head()

Unnamed: 0,Postalcode,Borough,Neighborhood


In [5]:
# read through table to get postcode, borough, neighborhood 
wikicontent = postaldata.find('div', class_='mw-parser-output')
table = wikicontent.table.tbody
postcode = 0
borough = 0
neighborhood = 0

In [6]:
tor_codes.head()

Unnamed: 0,Postalcode,Borough,Neighborhood


In [7]:
# DataFrame.append was removed in pandas 2.0, so collect the rows
# in a list and build the DataFrame once after the loop
rows = []
for tr in table.find_all('tr'):
    # the first three <td> cells are postcode, borough, neighborhood
    for i, td in enumerate(tr.find_all('td')):
        text = td.text.strip('\n').replace(']', '')
        if i == 0:
            postcode = text
        elif i == 1:
            borough = text
        elif i == 2:
            neighborhood = text
    # a row without <td> cells (the header) is recorded with the values
    # set before it, i.e. the 0 placeholders; it is dropped during cleaning
    rows.append({'Postalcode': postcode, 'Borough': borough, 'Neighborhood': neighborhood})
tor_codes = pd.DataFrame(rows, columns=columnlabel)

In [8]:
tor_codes.head()

Unnamed: 0,Postalcode,Borough,Neighborhood
0,0,0,0
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
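
The `0,0,0` first row in the output above most likely comes from the table's header row, which holds `<th>` cells rather than `<td>` cells, so the loop records the placeholder values for it. Below is a minimal sketch of one way to skip such rows while parsing. It runs on a small hand-written HTML snippet rather than the live page; the `html` sample and the name `sample_codes` are illustrative assumptions, and `html.parser` is used only to avoid the `lxml` dependency.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the Wikipedia table (not the live page)
html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

columns = ["Postalcode", "Borough", "Neighborhood"]
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) != len(columns):
        continue  # skip header rows, which have no <td> cells
    rows.append(dict(zip(columns, cells)))

sample_codes = pd.DataFrame(rows, columns=columns)
print(sample_codes)
```

With this approach no placeholder row is created, so the later `Borough != 0` filter would no longer be needed.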


In [9]:
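# clean dataframe: drop unassigned boroughs and the placeholder row
tor_codes = tor_codes[tor_codes.Borough != 'Not assigned']
tor_codes = tor_codes[tor_codes.Borough != 0].reset_index(drop=True)
# where the neighborhood is unassigned, fall back to the borough name;
# .loc writes to the DataFrame itself, unlike chained iloc[i][2] assignment
unassigned = tor_codes.Neighborhood == 'Not assigned'
tor_codes.loc[unassigned, 'Neighborhood'] = tor_codes.loc[unassigned, 'Borough']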

In [10]:
tor_codes.head()

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [11]:
tor_codes_grp = tor_codes.groupby(['Postalcode','Borough'])['Neighborhood'].apply(', '.join).reset_index()
# drop boroughs with none assigned
tor_codes_grp = tor_codes_grp.dropna()
empty = 'Not assigned'
tor_codes_grp = tor_codes_grp[(tor_codes_grp.Postalcode != empty ) & (tor_codes_grp.Borough != empty) & (tor_codes_grp.Neighborhood != empty)]

In [12]:
# helper: join a group's neighborhoods into one sorted, comma-separated string
# (not called in the cells shown here)
def neighborhood_list(grouped):    
    return ', '.join(sorted(grouped['Neighborhood'].tolist()))

In [13]:
tor_codes_grp.head(20)

Unnamed: 0,Postalcode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge"
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
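
To make the grouping step concrete, here is a small sketch on made-up rows where one postal code appears twice. The `toy` DataFrame and the name `toy_grp` are illustrative assumptions; the `groupby` call is the same one used in the notebook.

```python
import pandas as pd

# Hypothetical input: M5A is listed on two rows, one per neighborhood
toy = pd.DataFrame({
    "Postalcode": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighborhood": ["Regent Park", "Harbourfront", "Parkwoods"],
})

# One row per (Postalcode, Borough); neighborhoods joined with ", "
toy_grp = (
    toy.groupby(["Postalcode", "Borough"])["Neighborhood"]
    .apply(", ".join)
    .reset_index()
)
print(toy_grp)
```

Rows that share a postal code and borough collapse into one row, and the result is sorted by the grouping keys.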


In [14]:
print(tor_codes.shape)

(103, 3)


### PART 2 -- Get Long/Lat

In [15]:
# Read the CSV file and convert it to a dataframe

url='http://cocl.us/Geospatial_data'
postcode_df=pd.read_csv(url)
postcode_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Match the column headings: rename "Postal Code" to "Postalcode" so that both dataframes share the same key for the merge.

In [16]:
postcode_df.columns = ['Postalcode', 'Latitude', 'Longitude']
postcode_df.head()

Unnamed: 0,Postalcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
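
Assigning to `columns` positionally works here, but it silently mislabels the data if the file's column order ever changes. A possible alternative, shown on a one-row stand-in for the geospatial data (the name `geo` is an assumption for illustration), is to rename only the column that differs:

```python
import pandas as pd

# One-row stand-in for the geospatial CSV (values taken from the output above)
geo = pd.DataFrame({
    "Postal Code": ["M1B"],
    "Latitude": [43.806686],
    "Longitude": [-79.194353],
})

# Rename by name rather than by position; other columns are left untouched
geo = geo.rename(columns={"Postal Code": "Postalcode"})
print(list(geo.columns))
```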


In [17]:
merged_df=pd.merge(tor_codes,postcode_df, how='right', on = 'Postalcode')
merged_df.head(15)

Unnamed: 0,Postalcode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
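
With `how='right'`, any postal code that appears only in the geospatial file is kept with empty `Borough` and `Neighborhood` values. One way to check for such rows is the `indicator=True` option of `pd.merge`, which adds a `_merge` column. The sketch below uses small made-up frames (`codes`, `geo`, `check`, and `unmatched` are names assumed for illustration):

```python
import pandas as pd

# Hypothetical subsets: M1B is deliberately missing from `codes`
codes = pd.DataFrame({
    "Postalcode": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
    "Neighborhood": ["Parkwoods", "Victoria Village"],
})
geo = pd.DataFrame({
    "Postalcode": ["M3A", "M4A", "M1B"],
    "Latitude": [43.753259, 43.725882, 43.806686],
    "Longitude": [-79.329656, -79.315572, -79.194353],
})

# indicator=True labels each row 'both', 'left_only' or 'right_only'
check = pd.merge(codes, geo, how="right", on="Postalcode", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched)
```

An empty `unmatched` frame would mean every postal code found a partner in the other table.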


# Part 3 - Explore and cluster the neighborhoods in Toronto. 

You may decide to work only with boroughs that contain the word "Toronto" and then replicate the same analysis we did on the New York City data; it is up to you.
Just make sure:
- to add enough Markdown cells to explain what you decided to do and to report any observations you make.
- to generate maps to visualize your neighborhoods and how they cluster together.
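
If you choose to keep only the boroughs whose name contains "Toronto", a filter along these lines is one option. This is a sketch on a few rows copied from the merged output above; the names `sample` and `toronto_only` are assumptions for illustration.

```python
import pandas as pd

# A few rows copied from the merged output above
sample = pd.DataFrame({
    "Postalcode": ["M3A", "M5A", "M4B", "M5B"],
    "Borough": ["North York", "Downtown Toronto", "East York", "Downtown Toronto"],
    "Latitude": [43.753259, 43.65426, 43.706397, 43.657162],
    "Longitude": [-79.329656, -79.360636, -79.309937, -79.378937],
})

# Keep rows whose borough name contains the substring "Toronto"
toronto_only = sample[sample["Borough"].str.contains("Toronto")].reset_index(drop=True)
print(toronto_only)
```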

>> Go to Notebook:  Pt 3 - Cluster pull FourSquare Data