# Segmenting and Clustering Neighborhoods in Toronto: part 2

In this second part of the _**Segmenting and Clustering Neighborhoods in Toronto**_ project, we're going to add latitude and longitude columns to our dataframe.

Again, we begin by importing the libraries.

In [1]:
import pandas as pd 
import geocoder

Let's open the dataframe we created in [part 1](https://github.com/anaflvss/Coursera-Capstone/blob/master/02_neighborhoods_toronto_part_1.ipynb). This [CSV file](https://github.com/anaflvss/Coursera-Capstone/blob/master/data/post_code_toronto_1.csv) was included in the repository.

In [2]:
df = pd.read_csv('../Coursera-Capstone/data/post_code_toronto_1.csv')
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Now we're going to get the coordinates. It wasn't possible to get them using the **geocoder** library, so we will use the coordinates CSV file instead.

But first, we need to sort **'df'** so it matches the postal code order in the coordinates file.

In [3]:
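import re

# We will use these two functions to help with the sorting.
def atoi(text):
    # Convert digit-only chunks to int so they compare numerically.
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # Split into digit and non-digit chunks, e.g. 'M1B' -> ['M', 1, 'B'].
    return [atoi(c) for c in re.split(r'(\d+)', text)]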
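
To see what the sort key does, here is a small standalone sketch. The list `codes` is made up for illustration (real Toronto postal codes have a single digit); it only shows how a natural sort key differs from plain string ordering when numbers have more than one digit.

```python
import re

def atoi(text):
    # Digit-only chunks become int, everything else stays a string.
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # The capturing group keeps the digits as their own chunks.
    return [atoi(c) for c in re.split(r'(\d+)', text)]

# Hypothetical codes, chosen so that the two orderings differ.
codes = ['M10A', 'M2B', 'M1C']

key_example = natural_keys('M10A')
plain_order = sorted(codes)
natural_order = sorted(codes, key=natural_keys)

print(key_example)    # ['M', 10, 'A']
print(plain_order)    # ['M10A', 'M1C', 'M2B']
print(natural_order)  # ['M1C', 'M2B', 'M10A']
```

Plain string sorting compares character by character, so `'M10A'` lands before `'M1C'`. The natural key compares `10` and `1` as numbers instead.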

In [4]:
# Sorting df
postal_code = list(df['PostalCode'])
postal_code.sort(key=natural_keys)

postIndex = dict(zip(postal_code,range(len(postal_code))))

df['PostRank'] = df['PostalCode'].map(postIndex)
df.sort_values('PostRank', inplace=True)
df.drop('PostRank', inplace=True, axis=1)
df.reset_index(inplace=True, drop=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
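
The postal codes shown above all follow a letter, digit, letter pattern. Assuming that holds for every row in the file, plain string order would give the same result as the natural sort, so a shorter alternative is possible. The sketch below uses a toy dataframe (`toy` is an illustrative name, with a few rows taken from the outputs above) rather than the real file.

```python
import pandas as pd

# Toy stand-in for df; only a few rows, deliberately out of order.
toy = pd.DataFrame({
    'PostalCode': ['M3A', 'M1C', 'M1B', 'M4A'],
    'Borough': ['North York', 'Scarborough', 'Scarborough', 'North York'],
})

# Sort by the postal code column and rebuild a clean 0..n-1 index.
toy_sorted = toy.sort_values('PostalCode').reset_index(drop=True)

print(toy_sorted['PostalCode'].tolist())  # ['M1B', 'M1C', 'M3A', 'M4A']
```

If the codes ever contained multi-digit numbers, this shortcut would no longer match the natural order and the helper functions would be needed again.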


Now, we load the coordinates dataframe.

In [5]:
lat_lon = pd.read_csv('../Coursera-Capstone/data/Geospatial_Coordinates.csv')
lat_lon.drop('Postal Code', axis=1, inplace=True)
lat_lon.head()

Unnamed: 0,Latitude,Longitude
0,43.806686,-79.194353
1,43.784535,-79.160497
2,43.763573,-79.188711
3,43.770992,-79.216917
4,43.773136,-79.239476


Then we concatenate both dataframes column-wise. Rows are paired by index, which is why **'df'** had to be sorted and re-indexed first.

In [6]:
df = pd.concat([df, lat_lon], axis=1)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
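
Concatenating by index only works if both files list exactly the same postal codes in the same order. A possible alternative, sketched here on toy data, is to keep the `Postal Code` column and join on it, so the row order no longer matters. The names `neighborhoods`, `coords`, and `merged` are illustrative; the values are copied from the outputs above.

```python
import pandas as pd

# Toy stand-ins for the two dataframes, deliberately in different orders.
neighborhoods = pd.DataFrame({
    'PostalCode': ['M1C', 'M1B', 'M1E'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighborhood': [
        'Rouge Hill, Port Union, Highland Creek',
        'Malvern, Rouge',
        'Guildwood, Morningside, West Hill',
    ],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C', 'M1E'],
    'Latitude': [43.806686, 43.784535, 43.763573],
    'Longitude': [-79.194353, -79.160497, -79.188711],
})

# Align the key column names, then join on the postal code.
# validate='one_to_one' raises if a code is duplicated on either side.
merged = neighborhoods.merge(
    coords.rename(columns={'Postal Code': 'PostalCode'}),
    on='PostalCode',
    how='left',
    validate='one_to_one',
)

print(merged[['PostalCode', 'Latitude', 'Longitude']])
```

With `how='left'`, a postal code missing from the coordinates file shows up as `NaN` in `Latitude` and `Longitude` instead of silently shifting the other rows.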


Finally, let's save the dataframe to a new CSV file, `post_code_toronto_2.csv`.

In [7]:
#df.to_csv('post_code_toronto_2.csv', index=False)
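
As a quick illustration of what `index=False` changes, the sketch below writes a toy dataframe to an in-memory buffer instead of a file and reads it back. The dataframe `toy` and the use of `io.StringIO` are assumptions made for the example, not part of the notebook.

```python
import io
import pandas as pd

toy = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
})

# With index=False only the data columns are written.
without_index = pd.read_csv(io.StringIO(toy.to_csv(index=False)))

# With the default index=True the row labels are written as a nameless
# first column, which read_csv then names 'Unnamed: 0'.
with_index = pd.read_csv(io.StringIO(toy.to_csv()))

print(without_index.columns.tolist())  # ['PostalCode', 'Latitude']
print(with_index.columns.tolist())     # ['Unnamed: 0', 'PostalCode', 'Latitude']
```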