# Part 2. Joining Toronto Postal Codes and Corresponding Geographic Coordinates

We now have a dataframe with the postal code, borough name, and neighborhood name for each Toronto neighborhood. To use the Foursquare location data, we also need the latitude and longitude coordinates of each neighborhood.

Use the Geocoder package or the CSV file to create the following dataframe:


![alt text](https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/HZ3jNHNOEeiMwApe4i-fLg_f44f0f10ccfaf42fcbdba9813364e173_Screen-Shot-2018-06-18-at-7.18.16-PM.png?expiry=1573171200000&hmac=2iIUYIRmN0YOyohMC7895YGI3k_Lgz4WWAnBiwsRdIA)

In [2]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

import requests # library to handle requests
import io 

## Importing Toronto Postal Code Data Frame 

In [3]:
p_code = "postal_code.csv"
TorontoPostalCodes = pd.read_csv(p_code).set_index("Postcode")
TorontoPostalCodes.rename_axis("Postal Code", axis = 'index', inplace = True)
TorontoPostalCodes.head()

Postal Code,Borough,Neighborhood
M1B,Scarborough,"Rouge, Malvern"
M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
M1E,Scarborough,"Guildwood, Morningside, West Hill"
M1G,Scarborough,Woburn
M1H,Scarborough,Cedarbrae


## Getting the Geographic Coordinates of Toronto Neighbourhoods for Utilizing Foursquare Location Data

In [4]:
url = "http://cocl.us/Geospatial_data"
s = requests.get(url).content
TorontoCoordinates = pd.read_csv(io.StringIO(s.decode('utf-8')))
TorontoCoordinates.head()

,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
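
The download step above decodes the response bytes by hand before parsing. As a minimal sketch of an alternative, the bytes can be passed to pandas through `io.BytesIO`, which leaves the decoding to `read_csv` (UTF-8 by default). The `raw` value here is a hypothetical two-row stand-in for `requests.get(url).content`, so the example runs without network access; with a real response, calling `raise_for_status()` before reading `.content` would surface HTTP errors early.

```python
import io

import pandas as pd

# Hypothetical stand-in for the bytes returned by requests.get(url).content
raw = (
    b"Postal Code,Latitude,Longitude\n"
    b"M1B,43.806686,-79.194353\n"
    b"M1C,43.784535,-79.160497\n"
)

# BytesIO wraps the raw bytes in a file-like object that read_csv can parse
coords = pd.read_csv(io.BytesIO(raw))
print(coords.dtypes)
```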


## Joining Toronto Postal Code and Toronto Geographic Coordinates

In [5]:
Toronto_Neighbourhoods = pd.merge(TorontoPostalCodes, TorontoCoordinates, on='Postal Code')
Toronto_Neighbourhoods.head()

,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [6]:
Toronto_Neighbourhoods.shape

(103, 5)
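
`pd.merge` performs an inner join by default, so a postal code that appears in only one of the two tables would be dropped without warning. The sketch below shows one way to check for that, using small toy frames rather than the real data: `how="left"` keeps every postal code, `indicator=True` adds a `_merge` column recording where each row came from, and `validate="one_to_one"` raises an error if a postal code is duplicated on either side. The frame names `postal`, `coords`, and `checked` are illustrative only.

```python
import pandas as pd

# Toy frames standing in for TorontoPostalCodes and TorontoCoordinates
postal = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Left join keeps all postal codes; _merge marks rows with no coordinates
checked = pd.merge(
    postal, coords,
    on="Postal Code", how="left",
    validate="one_to_one", indicator=True,
)

# Postal codes that found no match in the coordinates table
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would confirm that the inner join above loses no postal codes.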

## Exporting Newly Created Dataframe to CSV File

In [7]:
Toronto_Neighbourhoods.to_csv('Toronto_Neighbourhoods.csv', index=False)

# Part 3. Importing and Cleaning Toronto Crime Data

In [14]:
crime = pd.read_csv('MCI_2014_to_2018.csv')
print('Dimension of the dataframe ',crime.shape)
crime.head()

Dimension of the dataframe  (8178, 29)


,X,Y,Index_,event_unique_id,occurrencedate,reporteddate,premisetype,ucr_code,ucr_ext,offence,...,occurrencedayofyear,occurrencedayofweek,occurrencehour,MCI,Division,Hood_ID,Neighbourhood,Lat,Long,ObjectId
0,-79.385193,43.659229,2349,GO-20149004286,2014-06-20T10:55:00.000Z,2014-06-20T13:20:00.000Z,Apartment,2130,210,Theft Over,...,171.0,Friday,10,Theft Over,D52,76.0,Bay Street Corridor (76),43.659229,-79.385193,2001.0
1,-79.4254,43.777592,2350,GO-20142411379,2014-07-02T00:20:00.000Z,2014-07-02T02:58:00.000Z,Outside,1457,100,Pointing A Firearm,...,183.0,Wednesday,0,Assault,D32,36.0,Newtonbrook West (36),43.777592,-79.4254,2002.0
2,-79.4254,43.777592,2351,GO-20142411379,2014-07-02T00:20:00.000Z,2014-07-02T02:58:00.000Z,Outside,1610,100,Robbery With Weapon,...,183.0,Wednesday,0,Robbery,D32,36.0,Newtonbrook West (36),43.777592,-79.4254,2003.0
3,-79.210373,43.801727,2352,GO-20142412127,2014-07-02T01:30:00.000Z,2014-07-02T05:40:00.000Z,House,2120,200,B&E,...,183.0,Wednesday,1,Break and Enter,D42,132.0,Malvern (132),43.801727,-79.210373,2004.0
4,-79.254334,43.835884,2354,GO-20142417548,2014-07-02T20:52:00.000Z,2014-07-02T20:57:00.000Z,Commercial,1430,100,Assault,...,183.0,Wednesday,20,Assault,D42,130.0,Milliken (130),43.835884,-79.254334,2005.0


In [9]:
crime.isnull().sum()

X                      0
Y                      0
Index_                 0
event_unique_id        0
occurrencedate         0
reporteddate           0
premisetype            0
ucr_code               0
ucr_ext                0
offence                0
reportedyear           0
reportedmonth          0
reportedday            0
reporteddayofyear      0
reporteddayofweek      0
reportedhour           0
occurrenceyear         4
occurrencemonth        4
occurrenceday          4
occurrencedayofyear    4
occurrencedayofweek    4
occurrencehour         0
MCI                    0
Division               1
Hood_ID                1
Neighbourhood          1
Lat                    1
Long                   1
ObjectId               1
dtype: int64

In [10]:
# Drop every row that has at least one missing value, modifying the dataframe in place
crime.dropna(inplace=True)
crime.shape

(8173, 29)
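
Calling `dropna()` with no arguments removes a row if any of its columns is missing. If only some columns matter for the later analysis, the `subset` parameter restricts the check to those columns and keeps more rows. The toy frame below is a hedged illustration: it borrows three column names from the crime data, but the values are invented.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; column names borrowed from the crime data
toy = pd.DataFrame({
    "offence": ["Assault", "B&E", "Robbery"],
    "occurrenceyear": [2014.0, np.nan, 2014.0],
    "Lat": [43.66, 43.80, np.nan],
})

# Drops any row with a missing value in any column
all_cols = toy.dropna()

# Drops a row only when its Lat value is missing
only_coords = toy.dropna(subset=["Lat"])

print(all_cols.shape, only_coords.shape)
```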

In [11]:
crime.isnull().sum()

X                      0
Y                      0
Index_                 0
event_unique_id        0
occurrencedate         0
reporteddate           0
premisetype            0
ucr_code               0
ucr_ext                0
offence                0
reportedyear           0
reportedmonth          0
reportedday            0
reporteddayofyear      0
reporteddayofweek      0
reportedhour           0
occurrenceyear         0
occurrencemonth        0
occurrenceday          0
occurrencedayofyear    0
occurrencedayofweek    0
occurrencehour         0
MCI                    0
Division               0
Hood_ID                0
Neighbourhood          0
Lat                    0
Long                   0
ObjectId               0
dtype: int64

In [12]:
# index=False avoids writing the row index as an extra unnamed column
crime.to_csv('crimeclean.csv', index=False)