# 1. Introduction

### 1.1 Background

For this project, I am a real estate agent hired by a family of four who are about to relocate from Paris, France to Toronto, Canada. The client is looking to move to a neighborhood that is safer than the city they currently live in.

Ideally, the client wishes to live in a **family-friendly, average neighborhood** where schools (elementary and high schools) are within walking distance of their home. As they are big foodies who love discovering new types of food, they would like to have a **great diversity of restaurant options** close to home. Both parents will be working at the **Michael Garron Hospital** and wish for their commute to be no longer than 20 minutes by car.

### 1.2 Problem

Moving from one country to another is not an easy task, particularly when one does not have a point of reference. This project will help determine the best borough in Toronto for this family to relocate to and ensure that their transition into this new life is as smooth as possible.

In [1]:
import pandas as pd
import numpy as np
import csv
import json
from pandas import json_normalize  # top-level import; pandas.io.json.json_normalize is deprecated since pandas 1.0
from IPython.display import Image, HTML  # IPython.display is the public import path

print('All Imported')

All Imported


### Address of the Michael Garron Hospital

In [48]:
work_location = '825 Coxwell Ave, East York, ON M4C 3E7'

# Data Source

To assess the level of crime in Toronto, we will download and analyze data from the City of Toronto's [open data portal](https://open.toronto.ca/dataset/major-crime-indicators/).

For comparison, crime figures for Paris were extracted from [this article](https://www.leparisien.fr/paris-75/paris-la-delinquance-n-en-finit-pas-de-grimper-16-10-2019-8174451.php#:~:text=Cela%20signifie%20une%20moyenne%20de%2050%20cambriolages%20quotidiens%20dans%20la%20capitale.&text=%C2%AB%20Dans%20le%20XVIIIe%20arrondissement%2C%20il,Davantage%20de%20violences.), which is sourced from the Ministère de l'Intérieur.

To help define the boroughs and neighborhoods of Toronto, [the open data portal](https://open.toronto.ca/dataset/neighbourhoods/) for the City of Toronto was accessed again.

# **Data Cleaning**

## City of Toronto Data Cleaning
#### [Toronto Open Data Portal for Major Crimes](https://open.toronto.ca/dataset/major-crime-indicators/)

In [2]:
!wget -q -O 'MCI_2014_to_2019.csv' https://opendata.arcgis.com/datasets/0c5fa2b642214e8baf0601405abccf30_0.csv?outSR=%7B%22latestWkid%22%3A3857%2C%22wkid%22%3A102100%7D
print('Data downloaded!')

Data downloaded!


In [3]:
crimes_df = pd.read_csv('MCI_2014_to_2019.csv')
crimes_df.head()

|  | X | Y | Index_ | event_unique_id | occurrencedate | reporteddate | premisetype | ucr_code | ucr_ext | offence | ... | occurrencedayofyear | occurrencedayofweek | occurrencehour | MCI | Division | Hood_ID | Neighbourhood | Long | Lat | ObjectId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -8816401.0 | 5434587.0 | 701 | GO-20141756319 | 2014/03/24 00:00:00+00 | 2014/03/24 00:00:00+00 | Commercial | 1430 | 100 | Assault | ... | 83.0 | Monday | 1 | Assault | D42 | 132 | Malvern (132) | -79.199081 | 43.800281 | 1 |
| 1 | -8837252.0 | 5413357.0 | 901 | GO-20143006885 | 2014/09/27 00:00:00+00 | 2014/09/29 00:00:00+00 | Other | 2120 | 200 | B&E | ... | 270.0 | Saturday | 16 | Break and Enter | D52 | 76 | Bay Street Corridor (76) | -79.386383 | 43.662472 | 2 |
| 2 | -8862433.0 | 5422276.0 | 702 | GO-20141756802 | 2014/03/24 00:00:00+00 | 2014/03/24 00:00:00+00 | Commercial | 2120 | 200 | B&E | ... | 83.0 | Monday | 6 | Break and Enter | D23 | 1 | West Humber-Clairville (1) | -79.612595 | 43.720406 | 3 |
| 3 | -8833104.0 | 5431887.0 | 703 | GO-20141760570 | 2014/03/24 00:00:00+00 | 2014/03/24 00:00:00+00 | Apartment | 2120 | 200 | B&E | ... | 83.0 | Monday | 15 | Break and Enter | D33 | 47 | Don Valley Village (47) | -79.349121 | 43.782772 | 4 |
| 4 | -8845311.0 | 5413667.0 | 902 | GO-20142004859 | 2014/05/03 00:00:00+00 | 2014/05/03 00:00:00+00 | Commercial | 1610 | 210 | Robbery - Business | ... | 123.0 | Saturday | 2 | Robbery | D11 | 90 | Junction Area (90) | -79.458778 | 43.66449 | 5 |


In [4]:
crimes2_df = crimes_df[['MCI', 'occurrenceyear', 'Lat', 'Long', 'Neighbourhood']]
crimes2_df.head()

|  | MCI | occurrenceyear | Lat | Long | Neighbourhood |
|---|---|---|---|---|---|
| 0 | Assault | 2014.0 | 43.800281 | -79.199081 | Malvern (132) |
| 1 | Break and Enter | 2014.0 | 43.662472 | -79.386383 | Bay Street Corridor (76) |
| 2 | Break and Enter | 2014.0 | 43.720406 | -79.612595 | West Humber-Clairville (1) |
| 3 | Break and Enter | 2014.0 | 43.782772 | -79.349121 | Don Valley Village (47) |
| 4 | Robbery | 2014.0 | 43.66449 | -79.458778 | Junction Area (90) |


Now that the dataframe contains only the desired information, I want to isolate the year 2019. Before doing so, all null and non-finite values have to be dropped. I am therefore analyzing this dataframe to see whether any null or non-finite occurrences exist:

In [5]:
type_ = crimes2_df['occurrenceyear'].dtype.kind
null_ = crimes2_df['occurrenceyear'].isnull().sum()
inf_ = np.isinf(crimes2_df['occurrenceyear']).sum()
fin_ = np.isfinite(crimes2_df['occurrenceyear']).sum()
print('Occurrence year type is:', type_)
print('There are', null_, 'null occurrences,', inf_, 'infinite numbers and', fin_, 'finite numbers in the "occurrenceyear" column.')

Occurrence year type is: f
There are 59 null occurrences, 0 infinite numbers and 206376 finite numbers in the "occurrenceyear" column.


In [6]:
crimes2_df = crimes2_df.dropna(subset=['occurrenceyear'])  # dropna returns a new dataframe, so reassign it
crimes2_df.shape

(206376, 5)

Now that the 59 null occurrences are dropped, we keep only the year 2019. This leaves 37,674 crimes that occurred in Toronto in 2019. The dataset is ready to use.

In [41]:
# Filter and reset the index in one chain, so nothing is modified in place on a slice
crimes3_df = crimes2_df.loc[crimes2_df['occurrenceyear'] == 2019.0].reset_index(drop=True)
crimes3_df.head()


|  | MCI | occurrenceyear | Lat | Long | Neighbourhood |
|---|---|---|---|---|---|
| 0 | Assault | 2019.0 | 43.810932 | -79.227135 | Malvern (132) |
| 1 | Assault | 2019.0 | 43.663906 | -79.384155 | Church-Yonge Corridor (75) |
| 2 | Assault | 2019.0 | 43.655777 | -79.380676 | Church-Yonge Corridor (75) |
| 3 | Assault | 2019.0 | 43.723015 | -79.415932 | Bedford Park-Nortown (39) |
| 4 | Break and Enter | 2019.0 | 43.648773 | -79.528748 | Islington-City Centre West (14) |


In [44]:
crimes3_df.shape

(37674, 5)
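
With the 2019 subset in hand, one possible way to rank neighbourhoods by incident count is sketched here. The `sample_df` rows are made up for illustration and stand in for `crimes3_df`, and `n` is a hypothetical parameter that would be 10 on the full data. Raw counts are not adjusted for population, so this is only a first look.

```python
import pandas as pd

# Toy stand-in for crimes3_df; these rows are illustrative, not real counts.
sample_df = pd.DataFrame({
    'MCI': ['Assault', 'Assault', 'Robbery', 'Break and Enter', 'Assault', 'Theft Over'],
    'Neighbourhood': ['Malvern (132)', 'Church-Yonge Corridor (75)',
                      'Church-Yonge Corridor (75)', 'Malvern (132)',
                      'Church-Yonge Corridor (75)', 'Junction Area (90)'],
})

# Number of recorded incidents per neighbourhood.
counts = sample_df['Neighbourhood'].value_counts()

n = 2  # assumed to be 10 on the full dataset
most_incidents = counts.nlargest(n)    # neighbourhoods with the most incidents
fewest_incidents = counts.nsmallest(n)  # neighbourhoods with the fewest incidents

print(most_incidents)
print(fewest_incidents)
```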

## City of Paris Dataset

Unfortunately, there is no open data source for crime in the Paris metro area. The data were extracted from [this article](https://www.leparisien.fr/paris-75/paris-la-delinquance-n-en-finit-pas-de-grimper-16-10-2019-8174451.php#:~:text=Cela%20signifie%20une%20moyenne%20de%2050%20cambriolages%20quotidiens%20dans%20la%20capitale.&text=%C2%AB%20Dans%20le%20XVIIIe%20arrondissement%2C%20il,Davantage%20de%20violences.), whose figures were provided by the French government. To standardize the categories, all assaults were grouped together, as were robberies and thefts. The data cover January to September inclusive. Since the figures were published as an image rather than a table, the values were entered into a Python dictionary, which was then converted into a pandas dataframe.

In [45]:
paris_crimes = {'Assault': 26299, 'Break and Enter': 13743, 'Robbery':12757, 'Theft Over': 130898, 'Other':50514}
paris = pd.Series(paris_crimes)
paris_df = pd.DataFrame(paris)
paris_df.rename(columns = {0:'Count_Paris'}, inplace = True)
paris_df

|  | Count_Paris |
|---|---|
| Assault | 26299 |
| Break and Enter | 13743 |
| Robbery | 12757 |
| Theft Over | 130898 |
| Other | 50514 |
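
Because the Paris figures cover nine months while the Toronto subset covers a full year, comparing category shares rather than raw counts may be the fairer option. The following is a minimal sketch under that assumption. The Toronto rows are hypothetical stand-ins for `crimes3_df['MCI']`, and folding unmatched Toronto categories into `'Other'` is an illustrative choice, not something the source prescribes.

```python
import pandas as pd

# Paris counts as entered in the notebook (January to September).
paris_crimes = {'Assault': 26299, 'Break and Enter': 13743, 'Robbery': 12757,
                'Theft Over': 130898, 'Other': 50514}
paris_counts = pd.Series(paris_crimes, name='Count_Paris')

# Hypothetical Toronto rows standing in for crimes3_df['MCI'].
toronto_mci = pd.Series(['Assault', 'Assault', 'Break and Enter', 'Robbery', 'Auto Theft'])

# Assumption: any Toronto category without a Paris counterpart is folded into 'Other'.
toronto_mci = toronto_mci.where(toronto_mci.isin(paris_counts.index), 'Other')
toronto_counts = toronto_mci.value_counts().rename('Count_Toronto')

# Align both cities on the same categories; a category missing in one city becomes 0.
comparison = pd.concat([paris_counts, toronto_counts], axis=1).fillna(0)

# Percentage share of each category within its city.
shares = comparison / comparison.sum() * 100
print(shares.round(1))
```

Each column of `shares` sums to 100, so it can be passed directly to a pie chart per city.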


# Techniques used

### Data Analysis

Data analysis will help us narrow down the safest neighborhoods.

1- We will create a new dataframe that contains all crimes by type.
    a.	Pie charts will then be used to compare crime in Paris and Toronto.
    
2- We will use the pandas library again to isolate the 10 most dangerous and the 10 safest neighborhoods of Toronto. Bar charts will then be used to illustrate the difference.

3- Using folium and geopy, all crimes will be plotted on a map of Toronto to provide an additional visual.

4- All neighbourhoods within 10-15 km of the hospital will be selected for further analysis.
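
The distance filter in step 4 could be sketched with the haversine formula, using only the standard library. Several things here are assumptions: the hospital coordinates are approximate (in the notebook they would come from geocoding `work_location`), the sample points reuse incident coordinates from the 2019 table rather than true neighbourhood centroids, and `max_km` is a hypothetical parameter set to the upper bound of the 10-15 km range.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

# Approximate hospital coordinates (assumed, not geocoded here).
hospital = (43.6899, -79.3250)

# Incident coordinates from the 2019 table, used as rough stand-ins for centroids.
sample_points = {
    'Malvern (132)': (43.810932, -79.227135),
    'Church-Yonge Corridor (75)': (43.663906, -79.384155),
    'Islington-City Centre West (14)': (43.648773, -79.528748),
}

max_km = 15  # hypothetical cut-off, upper bound of the 10-15 km range

distances = {name: haversine_km(*hospital, lat, lon)
             for name, (lat, lon) in sample_points.items()}
within_range = [name for name, d in distances.items() if d <= max_km]
print(within_range)
```

Straight-line distance is only a rough proxy for the 20-minute drive the client asked for, so the shortlist would still need to be checked against actual driving times.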


### Foursquare API

5- The Foursquare API will first be used to determine whether each neighborhood contains both an elementary school and a high school that are close to each other.

6- The Foursquare API will then be used to retrieve all restaurant venues in the selected neighborhoods.

7- Among the selected neighborhoods, the one with the best overall school scores and the greatest variety of restaurants will be chosen.
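
One way to quantify "variety of restaurants" in step 7 is to count the distinct restaurant categories per neighborhood. The sketch below makes no API call. The `venues` records are hypothetical placeholders for whatever the Foursquare queries return, and treating any category containing the word "Restaurant" as a restaurant is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical venue records: (neighbourhood, venue name, category).
venues = [
    ('Neighbourhood A', 'Venue 1', 'Italian Restaurant'),
    ('Neighbourhood A', 'Venue 2', 'Thai Restaurant'),
    ('Neighbourhood A', 'Venue 3', 'Italian Restaurant'),
    ('Neighbourhood A', 'Venue 4', 'Park'),
    ('Neighbourhood B', 'Venue 5', 'Indian Restaurant'),
    ('Neighbourhood B', 'Venue 6', 'Sushi Restaurant'),
    ('Neighbourhood B', 'Venue 7', 'Ethiopian Restaurant'),
]

def restaurant_variety(records):
    """Map each neighbourhood to its number of distinct restaurant categories."""
    categories = defaultdict(set)
    for neighbourhood, _name, category in records:
        # Assumption: restaurant categories contain the word 'Restaurant'.
        if 'Restaurant' in category:
            categories[neighbourhood].add(category)
    return {hood: len(cats) for hood, cats in categories.items()}

variety = restaurant_variety(venues)
most_varied = max(variety, key=variety.get)
print(variety, most_varied)
```

Counting distinct categories rather than venues keeps a neighborhood with many restaurants of the same type from outranking one with fewer but more diverse options.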