### Data and Collection of Data

For this project, we collected information on the crime rate in Toronto from 2014 to 2019.

### Based on our problem, the factors to take into account in order to make a good decision are listed below:

1. Figure out the safest borough based on Toronto Crime Rate.
2. Discover the optimal common venues and select the appropriate neighbourhood within the borough.
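As a rough illustration of step 1, the sketch below ranks boroughs by total incidents per 100,000 residents. The toy table and its values are made up for the example; only the column naming style (`Borough`, `Population`, `Assault_2019`, `Robbery_2019`) mirrors the data used in this project, and summing all crime types with equal weight is an assumption, not necessarily the criterion used later on.

```python
import pandas as pd

# Hypothetical toy data: values are invented purely for illustration.
toy = pd.DataFrame({
    "Borough": ["North York", "North York", "Scarborough", "York"],
    "Population": [20000, 30000, 50000, 10000],
    "Assault_2019": [100, 200, 250, 30],
    "Robbery_2019": [20, 30, 50, 10],
})

crime_cols = ["Assault_2019", "Robbery_2019"]

# Sum incidents and population per borough.
by_borough = toy.groupby("Borough")[["Population"] + crime_cols].sum()

# Normalise by population so large and small boroughs are comparable.
by_borough["Total_2019"] = by_borough[crime_cols].sum(axis=1)
by_borough["Rate_per_100k"] = (
    by_borough["Total_2019"] / by_borough["Population"] * 100_000
)

# The lowest rate comes first.
ranking = by_borough.sort_values("Rate_per_100k")
safest_borough = ranking.index[0]

print(ranking[["Total_2019", "Rate_per_100k"]])
print("Lowest rate:", safest_borough)
```

Ranking by rate rather than by raw counts avoids penalising boroughs simply for having more residents.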

## We will process the geographical data for Toronto in order to plot the neighbourhoods that are considered safe and secure for opening a new restaurant.


The dataset analysed is Toronto Crime Rate (https://www.kaggle.com/alincijov/toronto-crime-rate-per-neighbourhood).

The Toronto Neighbourhoods Boundary File includes 2014-2019 crime data by neighbourhood. Counts are available for Assault, Auto Theft, Break and Enter, Robbery, Theft Over and Homicide. The data also includes averages over those years and crime rates per 100,000 people by neighbourhood, based on the 2016 Census population.

In this project, we extract data from the following data sources:
1. Toronto Crime rate (https://www.kaggle.com/alincijov/toronto-crime-rate-per-neighbourhood)
2. Restaurant data for every neighbourhood will be obtained using the Foursquare API
3. The Google Geocoding API will be used to center the hexagon neighbourhood.


In [1]:
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # converts an address into latitude and longitude values

# libraries for displaying images and HTML
from IPython.display import Image, HTML, display_html


import requests # library to handle requests
# transforming a JSON file into a pandas dataframe
from pandas import json_normalize # top-level import; pandas.io.json.json_normalize was deprecated in pandas 1.0

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans


!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


### Reading from the Dataset
We now read the Toronto datasets, including the recent crime report covering 2014 to 2019.


In [2]:
!pip install lxml
import lxml
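url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
# read_html returns a list with every table found on the page; the first one holds the postal codes
df = pd.read_html(url, header=0)
df = df[0]
df.head()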





| | Postal Code | Borough | Neighbourhood |
| --- | --- | --- | --- |
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


In [11]:
toronto_Crime_df = pd.read_csv('https://raw.githubusercontent.com/Ellis0312/The-Battle-of-Neighborhoods/main/Neighbourhood_Crime_Rates.csv',index_col='OBJECTID')
toronto_Crime_df.head()

| OBJECTID | Neighbourhood | Hood_ID | Population | Assault_2014 | Assault_2015 | Assault_2016 | Assault_2017 | Assault_2018 | Assault_2019 | Assault_AVG | ... | TheftOver_2015 | TheftOver_2016 | TheftOver_2017 | TheftOver_2018 | TheftOver_2019 | TheftOver_AVG | TheftOver_CHG | TheftOver_Rate_2019 | Shape__Area | Shape__Length |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Yonge-St.Clair | 97 | 12528 | 20 | 29 | 39 | 27 | 34 | 37 | 31.0 | ... | 5 | 8 | 0 | 3 | 6 | 4.3 | 1.0 | 47.9 | 1161315.0 | 5873.270582 |
| 2 | York University Heights | 27 | 27593 | 271 | 296 | 361 | 344 | 357 | 370 | 333.2 | ... | 46 | 37 | 39 | 38 | 28 | 36.3 | -0.26 | 101.5 | 13246660.0 | 18504.777326 |
| 3 | Lansing-Westgate | 38 | 16164 | 44 | 80 | 68 | 85 | 75 | 72 | 70.7 | ... | 5 | 5 | 11 | 6 | 11 | 7.0 | 0.83 | 68.1 | 5346186.0 | 11112.109625 |
| 4 | Yorkdale-Glen Park | 31 | 14804 | 106 | 136 | 174 | 161 | 175 | 209 | 160.2 | ... | 14 | 26 | 23 | 20 | 29 | 22.5 | 0.45 | 195.9 | 6038326.0 | 10079.42692 |
| 5 | Stonegate-Queensway | 16 | 25051 | 88 | 71 | 76 | 95 | 87 | 82 | 83.2 | ... | 8 | 4 | 6 | 7 | 4 | 6.0 | -0.43 | 16.0 | 7946202.0 | 11853.189878 |
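The derived columns can be sanity-checked against the raw counts. The sketch below re-enters three of the rows shown above by hand and recomputes the values under three assumptions inferred from the numbers, not taken from the dataset's documentation: `_Rate_2019` is the 2019 count per 100,000 residents, `_CHG` is the relative change from 2018 to 2019, and `_AVG` is the mean over 2014 to 2019.

```python
import pandas as pd

# Values copied by hand from the first rows of the crime table.
sample = pd.DataFrame({
    "Neighbourhood": ["Yonge-St.Clair", "York University Heights", "Stonegate-Queensway"],
    "Population": [12528, 27593, 25051],
    "TheftOver_2018": [3, 38, 7],
    "TheftOver_2019": [6, 28, 4],
    "TheftOver_CHG": [1.0, -0.26, -0.43],
    "TheftOver_Rate_2019": [47.9, 101.5, 16.0],
    "Assault_AVG": [31.0, 333.2, 83.2],
})
assault_counts = pd.DataFrame(
    [[20, 29, 39, 27, 34, 37],
     [271, 296, 361, 344, 357, 370],
     [88, 71, 76, 95, 87, 82]],
    columns=[f"Assault_{year}" for year in range(2014, 2020)],
)

# Assumed definition: incidents per 100,000 residents.
rate = (sample["TheftOver_2019"] / sample["Population"] * 100_000).round(1)

# Assumed definition: relative change from 2018 to 2019.
chg = (
    (sample["TheftOver_2019"] - sample["TheftOver_2018"]) / sample["TheftOver_2018"]
).round(2)

# Assumed definition: mean of the six yearly counts.
avg = assault_counts.mean(axis=1).round(1)

print(pd.DataFrame({
    "rate": rate, "published_rate": sample["TheftOver_Rate_2019"],
    "chg": chg, "published_chg": sample["TheftOver_CHG"],
    "avg": avg, "published_avg": sample["Assault_AVG"],
}))
```

For these three rows the recomputed values agree with the published columns, which supports the assumed definitions without proving them for the whole file.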


In [13]:
# Dropping the rows where Borough is 'Not assigned'
df = df[df.Borough != 'Not assigned']

# Combining the neighbourhoods that share the same postal code
df = df.groupby(['Postal Code','Borough'], sort=False).agg(', '.join)
df.reset_index(inplace=True)

# Replacing the name of the neighbourhoods which are 'Not assigned' with names of Borough
df['Neighbourhood'] = np.where(df['Neighbourhood'] == 'Not assigned',df['Borough'], df['Neighbourhood'])

df.shape
df.head()

| | Postal Code | Borough | Neighbourhood |
| --- | --- | --- | --- |
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
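To make the three cleaning steps easier to follow, here is a minimal sketch of the same pattern on a tiny hand-made table. The rows are hypothetical and use a one-row-per-neighbourhood layout so that the effect of the `groupby` and `', '.join` step is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, invented for illustration.
raw = pd.DataFrame({
    "Postal Code": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Downtown Toronto"],
    "Neighbourhood": ["Not assigned", "Regent Park", "Harbourfront", "Not assigned"],
})

# 1. Drop rows that have no borough.
clean = raw[raw["Borough"] != "Not assigned"]

# 2. Keep one row per postal code, joining neighbourhood names with ", ".
clean = (
    clean.groupby(["Postal Code", "Borough"], sort=False)["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
)

# 3. Where the neighbourhood is still 'Not assigned', fall back to the borough name.
clean["Neighbourhood"] = np.where(
    clean["Neighbourhood"] == "Not assigned",
    clean["Borough"],
    clean["Neighbourhood"],
)

print(clean)
```

Passing `sort=False` keeps the postal codes in their original order, which makes the result easier to compare with the source table.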


In [14]:
# Importing the csv file containing the latitude and longitude for each postal code
latitude_longitude = pd.read_csv('https://cocl.us/Geospatial_data')
latitude_longitude.head()

| | Postal Code | Latitude | Longitude |
| --- | --- | --- | --- |
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


### Merging the two tables to get the latitude and longitude of each neighbourhood

In [15]:
# Both tables already use the column name 'Postal Code', so no renaming is needed before the merge
df = pd.merge(df,latitude_longitude,on='Postal Code')
df.head()

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- |
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |


In [17]:
# Merging the crime table into the combined table to get the crime rate for each neighbourhood

df = pd.merge(df,toronto_Crime_df,on='Neighbourhood')
df.head()


| | Postal Code | Borough | Neighbourhood | Latitude | Longitude | Hood_ID | Population | Assault_2014 | Assault_2015 | Assault_2016 | ... | TheftOver_2015 | TheftOver_2016 | TheftOver_2017 | TheftOver_2018 | TheftOver_2019 | TheftOver_AVG | TheftOver_CHG | TheftOver_Rate_2019 | Shape__Area | Shape__Length |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 43 | 17510 | 118 | 138 | 133 | ... | 6 | 4 | 5 | 4 | 5 | 5.0 | 0.25 | 28.6 | 4755219.0 | 11800.341701 |
| 1 | M6C | York | Humewood-Cedarvale | 43.693781 | -79.428191 | 106 | 14365 | 43 | 52 | 52 | ... | 3 | 5 | 6 | 7 | 4 | 4.7 | -0.43 | 27.8 | 1871263.0 | 6036.268116 |
| 2 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 | 63 | 21567 | 83 | 108 | 86 | ... | 5 | 3 | 7 | 5 | 9 | 6.2 | 0.8 | 41.7 | 3595829.0 | 11275.180743 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 | 137 | 53485 | 352 | 395 | 365 | ... | 13 | 14 | 23 | 13 | 8 | 13.7 | -0.38 | 15.0 | 12334070.0 | 18111.264992 |
| 4 | M2H | North York | Hillcrest Village | 43.803762 | -79.363452 | 48 | 16934 | 63 | 59 | 41 | ... | 5 | 2 | 4 | 7 | 6 | 5.2 | -0.14 | 35.4 | 5395666.0 | 9570.813843 |
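`pd.merge` performs an inner join by default, so any row whose `Neighbourhood` string has no exact match in the crime table is silently dropped. In the output above, M3A (Parkwoods) and M5A (Regent Park, Harbourfront) no longer appear at the top, which suggests that this is happening here. The sketch below shows one way to list the unmatched names and one possible remedy; the miniature tables, the crime-table names and the `Assault_2019` values are hypothetical and only mimic the kind of mismatch involved.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
postal = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Regent Park, Harbourfront"],
})
crime = pd.DataFrame({
    "Neighbourhood": ["Victoria Village", "Regent Park", "Parkwoods-Donalda"],
    "Assault_2019": [100, 200, 300],  # made-up counts
})

# A left merge with indicator=True keeps every postal-code row and
# records in the '_merge' column whether a crime row was found for it.
check = postal.merge(crime, on="Neighbourhood", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "Neighbourhood"].tolist()
print("Unmatched:", unmatched)

# One possible remedy: split comma-joined names into one row each before merging.
exploded = postal.assign(
    Neighbourhood=postal["Neighbourhood"].str.split(", ")
).explode("Neighbourhood")
matched = exploded.merge(crime, on="Neighbourhood", how="inner")
print(matched)
```

Splitting recovers "Regent Park" in this toy case, but names that differ in spelling (such as "Parkwoods" versus "Parkwoods-Donalda" here) would still need a manual mapping or fuzzy matching.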
