# Segmenting and Clustering Neighborhoods in Toronto

First, we load the file prepared in Section 1 into a dataframe.

In [11]:
import pandas as pd

In [14]:
# tnp - dataframe for Toronto's neighborhood data

tnp = pd.read_csv('torontop.csv')
# Drop the leftover 'Unnamed: 0' column (most likely the index written out in Section 1)
tnp = tnp.drop(columns='Unnamed: 0')
tnp.head()

| | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
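The `'Unnamed: 0'` column is most likely the index that `to_csv` writes by default (`index=True`) as an unnamed first column. We assume that is what happened in Section 1. A minimal sketch on two rows copied from the output above shows where the column comes from and two ways to avoid it: reading with `index_col=0`, or writing with `index=False`.

```python
import io

import pandas as pd

# Toy stand-in for the Section 1 data (values copied from the output above).
df = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
    "Neighborhood": ["Parkwoods", "Victoria Village"],
})

# to_csv writes the index by default, as an unnamed first column.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)

# Reading it back naively yields an extra 'Unnamed: 0' column ...
naive = pd.read_csv(buf)
print(list(naive.columns))  # ['Unnamed: 0', 'Postal Code', 'Borough', 'Neighborhood']

# ... whereas index_col=0 turns that first column back into the index.
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)
print(list(restored.columns))  # ['Postal Code', 'Borough', 'Neighborhood']

# Alternatively, skip writing the index at all.
buf2 = io.StringIO()
df.to_csv(buf2, index=False)
buf2.seek(0)
no_index = pd.read_csv(buf2)
```

Either approach removes the need for the explicit `drop` call.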


____________
## 2. Neighborhoods in Toronto: Coordinates

In this section, we assign geographic coordinates to each postal code in Toronto.

We first try the geocoder library:

In [15]:
!pip install geocoder



In [16]:
import geocoder

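In [17]:
# Try to geocode one postal code with the Google provider.
# The retry loop stays disabled: if the provider keeps returning None
# (e.g. no API key), `while coord is None:` would never terminate.
coord = None
postal_code = 'M5G'

g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
coord = g.latlng

print(coord)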

None


The geocoder call returns `None` (the Google geocoding service generally requires an API key), so we instead use the coordinates provided in the dataset 'Geospatial_Coordinates.csv'.
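The commented-out `while` loop in the geocoding cell would spin forever whenever the provider keeps returning `None`. A minimal sketch of a bounded retry that falls back to a precomputed lookup table is shown below. The helper `lookup_coords` and the injected `geocode` callable are hypothetical; injecting the call lets the sketch run without network access or an API key.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

LatLng = Tuple[float, float]


def lookup_coords(
    postal_code: str,
    geocode: Callable[[str], Optional[Sequence[float]]],
    fallback: Dict[str, LatLng],
    max_tries: int = 3,
) -> Optional[LatLng]:
    """Try `geocode` up to `max_tries` times, then fall back to a table lookup.

    `geocode` is a hypothetical callable; with the geocoder package it could
    wrap something like `lambda q: geocoder.google(q).latlng`.
    """
    query = "{}, Toronto, Ontario".format(postal_code)
    for _ in range(max_tries):
        result = geocode(query)
        if result is not None:
            lat, lng = result
            return (lat, lng)
    # Every attempt failed: use the precomputed coordinates, if any.
    return fallback.get(postal_code)


# Usage with a stub that always fails, mimicking the `None` seen above.
table = {"M5G": (43.657952, -79.387383)}
print(lookup_coords("M5G", lambda q: None, table))  # (43.657952, -79.387383)
```

Capping the attempts keeps the cell from hanging, and the fallback table plays the role that 'Geospatial_Coordinates.csv' plays in this notebook.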

In [6]:
df_coord=pd.read_csv('Geospatial_Coordinates.csv')
df_coord.head()

| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


We perform an inner join on the 'Postal Code' column, which is present in both *tnp* and *df_coord*.

In [18]:
# tnc - dataframe for Toronto's neighborhoods with the geographic coordinates

tnc = pd.merge(left=tnp, right=df_coord, on='Postal Code', how='inner')
tnc.head()

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
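An inner join keeps only the postal codes present in both frames, so a code missing from either side silently disappears from the result. The toy example below illustrates this. The codes and coordinates are copied from the outputs above, except `M9Z`, which is a hypothetical code with no coordinates. A left join is shown for contrast.

```python
import pandas as pd

left = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],  # M9Z: hypothetical code with no coordinates
    "Neighborhood": ["Parkwoods", "Victoria Village", "Example"],
})
right = pd.DataFrame({
    "Postal Code": ["M1B", "M3A", "M4A"],
    "Latitude": [43.806686, 43.753259, 43.725882],
    "Longitude": [-79.194353, -79.329656, -79.315572],
})

# Inner join: M9Z (no coordinates) and M1B (no neighborhood) are both dropped.
# Using on= names the shared key once, so 'Postal Code' appears a single time.
inner = pd.merge(left, right, on="Postal Code", how="inner")
print(inner["Postal Code"].tolist())  # ['M3A', 'M4A']

# Left join: every row of `left` survives; M9Z gets NaN coordinates instead.
left_join = pd.merge(left, right, on="Postal Code", how="left")
print(left_join["Latitude"].isna().sum())  # 1
```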


As a check, we query the row for postal code M5G, the one we tried to geocode above:

In [19]:
tnc[tnc['Postal Code']=='M5G']

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 24 | M5G | Downtown Toronto | Central Bay Street | 43.657952 | -79.387383 |
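For repeated lookups, one alternative to the boolean mask is to make 'Postal Code' the index and use `.loc`. This sketch uses two rows copied from the outputs above. `tnc_sample` is a hypothetical stand-in for *tnc*.

```python
import pandas as pd

tnc_sample = pd.DataFrame({
    "Postal Code": ["M5A", "M5G"],
    "Borough": ["Downtown Toronto", "Downtown Toronto"],
    "Neighborhood": ["Regent Park, Harbourfront", "Central Bay Street"],
    "Latitude": [43.65426, 43.657952],
    "Longitude": [-79.360636, -79.387383],
})

# Boolean mask, as in the cell above: returns a DataFrame (empty if no match).
by_mask = tnc_sample[tnc_sample["Postal Code"] == "M5G"]

# Index-based lookup: returns a Series for a unique label, raises KeyError if absent.
by_code = tnc_sample.set_index("Postal Code")
lat, lng = by_code.loc["M5G", ["Latitude", "Longitude"]]
print(lat, lng)  # 43.657952 -79.387383
```

The mask is convenient for a one-off query. The indexed frame reads more clearly when many codes need to be looked up.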


In [20]:
tnc.shape

(103, 5)

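The dataframe *tnc* has 103 rows, the same number as *tnp*. Since an inner join drops any unmatched postal code, this means every postal code was assigned a pair of coordinates. (*tnc* has two more columns than *tnp*: Latitude and Longitude.)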
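Comparing row counts confirms completeness only if postal codes are unique on both sides. A duplicated code in *df_coord* could inflate the count and mask a missing one. A more explicit check, sketched here with the hypothetical helper `unmatched_codes`, uses a left join with `indicator=True` and `validate='one_to_one'`. `M9Z` is again a hypothetical unmatched code.

```python
import pandas as pd


def unmatched_codes(neigh: pd.DataFrame, coords: pd.DataFrame) -> list:
    """Return postal codes in `neigh` that have no row in `coords`.

    validate='one_to_one' raises pandas.errors.MergeError if either side
    repeats a postal code, which would also distort a row-count comparison.
    """
    checked = pd.merge(
        neigh, coords, on="Postal Code", how="left",
        indicator=True, validate="one_to_one",
    )
    # The _merge column marks rows found only in `neigh` as 'left_only'.
    return checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()


neigh = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M9Z"]})  # M9Z is hypothetical
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})
print(unmatched_codes(neigh, coords))  # ['M9Z']
```

In this notebook, an empty list from `unmatched_codes(tnp, df_coord)` would confirm the conclusion drawn from the row counts.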

Finally, we save the dataframe to a .csv file for use in Section 3.

In [21]:
tnc.to_csv('torontoc.csv')