<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto (Part 2)</font></h1>

## Introduction
In this project, we explore, segment, and cluster the neighborhoods in the city of Toronto. Unlike the New York data, the Toronto neighborhood data is not readily available as a ready-made dataset.
A Wikipedia page, however, has all the information we need to explore and cluster the neighborhoods in Toronto. We scrape that page, wrangle and clean the data, and read it into a pandas dataframe so that it has the same structured format as the New York dataset.

Once the data is in a structured format, we replicate the analysis done on the New York City dataset to explore and cluster the neighborhoods in the city of Toronto.
This is Part 2 of the project, in which we merge location (latitude and longitude) information with the scraped data.

## Table of Contents



<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>
    

 1. <a href="#item1">Webscrape the Data from Wikipedia page (Part 1)</a>
    
 2. <a href="#item2">Preprocess and Explore the Dataset (Part 1)</a>

 3. <a href="#item3">Gathering Locations Data (Part 2)</a>

 
</font>
</div>

# Part 2 

## 3. Gathering Locations Data

Now that we have built a dataframe containing the postal code, borough name, and neighborhood name of each neighborhood, we need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data.
There are several ways to get them:

### Google Maps Geocoding API (not free)
In an older version of this course, we used the Google Maps Geocoding API to get the latitude and longitude coordinates of each neighborhood. However, Google has since started charging for the API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/.

### Geocoder Python package

Documentation: https://geocoder.readthedocs.io/index.html.
The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code: a call can return None, and repeating the same call can then return the coordinates. To make sure you get coordinates for all of the neighborhoods, you can run a while loop for each postal code. Taking postal code M5G as an example, your code would look something like this:
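
A minimal sketch of that "keep asking" idea follows, with one change to the plain while loop: it caps the number of attempts so a postal code that never resolves cannot hang the notebook. The geocoding call is passed in as a function, so the sketch does not depend on any particular provider; the names `geocode_with_retry`, `max_attempts` and `delay_seconds` are hypothetical, chosen for illustration.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def geocode_with_retry(
    lookup: Callable[[str], Optional[Sequence[float]]],
    postal_code: str,
    max_attempts: int = 10,
    delay_seconds: float = 1.0,
) -> Optional[Tuple[float, float]]:
    """Repeat a geocoding lookup until it returns coordinates.

    Returns (latitude, longitude), or None if every attempt came back empty.
    """
    query = '{}, Toronto, Ontario'.format(postal_code)
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords:  # treat None (or any empty result) as a failed attempt
            return coords[0], coords[1]
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)  # brief pause before asking again
    return None  # give up instead of looping forever
```

With the geocoder package, `lookup` could be something like `lambda q: geocoder.arcgis(q).latlng`; this assumes the ArcGIS provider from the package docs and has not been tested here, so swap in whichever provider you have access to. A `None` result tells you which postal codes to fill in from the CSV file described below.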

### CSV file download

Because the Geocoder package can be very unreliable, if you are not able to get the geographical coordinates of the neighborhoods with it, you can use this CSV file, which has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data

We will use the Geocoder package or the CSV file to create the following dataframe:

In [14]:
!pip install geocoder
import geocoder # import geocoder
import pandas as pd
import numpy as np



In [15]:
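# TorontoDF is the Part 1 dataframe; it must already be defined here (it is reloaded from CSV further down)
postalcode=TorontoDF["PostalCode"]
postalcode[1]  # the second postal code in the column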

'M1C'

In [16]:
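# Geocoder approach, kept commented out: as noted above, Google's geocoding service is no longer free
# postal_code = 'M5G'          # one postal code at a time, e.g. M5G from the text above
# lat_lng_coords = None        # initialize the result to None

# loop until the geocoder returns coordinates
# while lat_lng_coords is None:
#     g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
#     lat_lng_coords = g.latlng

# latitude = lat_lng_coords[0]
# longitude = lat_lng_coords[1]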

In [17]:
# download the ready-made coordinates file instead of geocoding
!wget -q -O 'toronto_data.csv' http://cocl.us/Geospatial_data
print('Data downloaded!')

torontoLocationData = pd.read_csv("toronto_data.csv") 

Data downloaded!


In [18]:
torontoLocationData.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [19]:
torontoLocationData.shape

(103, 3)

### Load the dataframe we prepared by scraping in the first part of the exercise

In [20]:
TorontoDF = pd.read_csv("TorontoDFpart1.csv") 

In [21]:
TorontoDF.head()

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern,Rouge |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union |
| 2 | M1E | Scarborough | Morningside,Guildwood,West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |


### Add the locations to the Data Frame

In [22]:
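# add the coordinates to TorontoDF; the assignment matches rows by index, so it assumes
# both dataframes list the postal codes in the same order (as the outputs above show)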
TorontoDF[['Latitude','Longitude']]=torontoLocationData[['Latitude','Longitude']]
TorontoDF.head(10)

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern,Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Morningside,Guildwood,West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | Ionview,Kennedy Park,East Birchmount Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Golden Mile,Clairlea,Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffside,Scarborough Village West,Cliffcrest | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Cliffside West,Birch Cliff | 43.692657 | -79.264848 |
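
The assignment above pairs rows by index, which is only correct while both files keep the postal codes in the same order. A minimal sketch of a join keyed on the postal code itself follows, using pandas `DataFrame.merge` on three rows copied from the outputs above; the coordinate rows are deliberately shuffled to show the difference, and the names `neighborhoods`, `coords`, `positional` and `merged` are illustrative only.

```python
import pandas as pd

# A few rows copied from the outputs above; the coordinate table is
# deliberately listed in a different order from the neighborhood table.
neighborhoods = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M1E'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighborhood': ['Malvern,Rouge',
                     'Highland Creek,Rouge Hill,Port Union',
                     'Morningside,Guildwood,West Hill'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1E', 'M1B', 'M1C'],
    'Latitude': [43.763573, 43.806686, 43.784535],
    'Longitude': [-79.188711, -79.194353, -79.160497],
})

# Assignment pairs rows by index label, so M1B silently gets M1E's coordinates.
positional = neighborhoods.copy()
positional[['Latitude', 'Longitude']] = coords[['Latitude', 'Longitude']]

# Merging on the postal code pairs rows by key, whatever their order.
merged = neighborhoods.merge(
    coords.rename(columns={'Postal Code': 'PostalCode'}),
    on='PostalCode',
    how='left',             # keep every neighborhood, even one without coordinates
    validate='one_to_one',  # raise MergeError if a postal code is duplicated
)
print(merged)
```

On the full data, the same call would be `TorontoDF.merge(torontoLocationData.rename(columns={'Postal Code': 'PostalCode'}), on='PostalCode', how='left')`, which yields the same five columns as the table above; a missing postal code then shows up as `NaN` rather than as a wrong coordinate.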


### Save the new dataframe to a CSV file for use in Part 3

In [23]:
TorontoDF.to_csv('TorontoDFpart2.csv', encoding='utf-8', index=False)

In [24]:
TorontoDF.shape

(103, 5)
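
The save above passes `index=False`. A minimal sketch of why that matters when Part 3 reads the file back follows: with the default `index=True`, pandas writes the row labels as an extra first column, which `pd.read_csv` returns as `Unnamed: 0`. The two rows stand in for the full `TorontoDF`, and the file is written to a temporary directory so the real CSV is not touched.

```python
import os
import tempfile

import pandas as pd

# Two rows from the table above, standing in for the full TorontoDF
df = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighborhood': ['Malvern,Rouge', 'Highland Creek,Rouge Hill,Port Union'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'TorontoDFpart2.csv')

    # Default index=True: the row labels become an extra, unnamed first column
    df.to_csv(path, encoding='utf-8')
    with_index = pd.read_csv(path)

    # index=False: only the data columns are written, so they read back unchanged
    df.to_csv(path, encoding='utf-8', index=False)
    without_index = pd.read_csv(path)

print(list(with_index.columns))
print(list(without_index.columns))
```

Neighborhood names that contain commas are quoted by `to_csv` and parsed back correctly by `read_csv`, so no extra handling is needed for them.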

## This is the end of Part 2