<h1><b>Machine Learning Based Clustering and Segmentation for Navigation</b></h1>

<h3><b>Introduction</b></h3>
    <p>
    An ML-based navigation algorithm that chooses routes using neighbourhood-level factors. It aims to give you the most efficient route to the desired destination while taking into account factors such as crime rate and population density.
    </p>
<h3><b>Project Contribution</b></h3>
    <p>
    The project contribution is to find correlations among crime rates, population information and income sources. This Jupyter notebook focuses on the following correlations:
        <ul>
            <li>Correlation between police stations and auto theft rates</li>
            <li>Correlation between schools and income</li>
        </ul>
    </p>
<h3><b>Prerequisites</b></h3>
<ul>
    <li>Foursquare API</li>
</ul>
<h3><b>Datasets Used</b></h3>

<h3><b>Import Statements</b></h3>

In [3696]:
from dotenv import load_dotenv
from dotenv import dotenv_values
import folium
import requests
import pandas as pd 
from pandas import json_normalize
from bs4 import BeautifulSoup as bs
import os
from sklearn.cluster import KMeans

import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


<h3><b>Foursquare API Initialization / Check</b></h3>
<h4><b>Category Codes:</b></h4>
<ul>
    <li>10000 - Arts and Entertainment</li>
    <li>11000 - Business and Professional Services</li>
    <li>12000 - Community and Government</li>
    <li>13000 - Dining and Drinking</li>
    <li>14000 - Event</li>
    <li>15000 - Health and Medicine</li>
    <li>16000 - Landmarks and Outdoors</li>
    <li>17000 - Retail</li>
    <li>18000 - Sports and Recreation</li>
    <li>19000 - Travel and Transportation</li>
</ul>

In [3697]:
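# Read the Foursquare API key from a local .env file
config = dotenv_values(".env")
url = "https://api.foursquare.com/v3/places/nearby"

# Every request sends the key in the Authorization header
headers = {"Accept": "application/json",
            "Authorization": config["API_KEY"]}

# Check request: a 200 status code on `response` indicates the request succeeded
response = requests.request("GET", url, headers=headers)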

def create_request(coords=None, location=None, categories=None, limit="10"):
    """
        Query the Foursquare Places search endpoint.

        Important:
            - Coords and location cannot be entered together
            - Location and radius cannot be entered together

        The coords will be a list with latitude and longitude, in that order.
        Location will be a list with a city and province such as ["Oshawa", "ON"].
        The category is a string from the above codes, with a default of None.
        The limit parameter is a maximum of 50, with a default of 10 results.

        Returns the parsed JSON response, or False if the arguments are
        invalid or the request does not succeed.

        Examples:
            - create_request(coords=[43.895962, -72.848752], limit="1")
            - create_request(coords=[43.895962, -72.848752], categories="10000", limit="2")
            - create_request(location=["Oshawa","ON"], limit="2")
            - create_request(location=["Oshawa","ON"], categories="10000", limit="20")
    """
    search_url = "https://api.foursquare.com/v3/places/search"

    if coords:
        # Search around a point, within a fixed radius of 100000 metres
        params = {"ll": str(coords[0]) + "," + str(coords[1]), "radius": "100000"}
    elif location:
        # Search near a named place, e.g. "Oshawa,ON"
        params = {"near": str(location[0]) + "," + str(location[1])}
    else:
        return False

    if categories:
        params["categories"] = categories
    params["limit"] = limit

    # requests percent-encodes the parameters (the comma becomes %2C)
    response = requests.get(search_url, headers=headers, params=params)

    if response.status_code == 200:
        return response.json()
    return False
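
<p>
The query string that <code>create_request</code> sends can be inspected without calling the API. The sketch below is illustrative only: <code>build_search_url</code> is a hypothetical helper that is not part of the notebook, and it uses only the standard library to reproduce the same parameters (<code>ll</code>, <code>near</code>, <code>radius</code>, <code>categories</code>, <code>limit</code>).
</p>

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.foursquare.com/v3/places/search"


def build_search_url(coords=None, location=None, categories=None, limit="10"):
    """Hypothetical helper: return the search URL without sending a request."""
    if coords:
        # ll is "latitude,longitude"; the radius mirrors the one in create_request
        params = {"ll": f"{coords[0]},{coords[1]}", "radius": "100000"}
    elif location:
        params = {"near": f"{location[0]},{location[1]}"}
    else:
        return None
    if categories:
        params["categories"] = categories
    params["limit"] = limit
    # urlencode percent-encodes the comma as %2C
    return SEARCH_URL + "?" + urlencode(params)


print(build_search_url(location=["Toronto", "ON"], categories="12070", limit="50"))
```

<p>
Printing the URL before sending it makes it easier to spot a swapped latitude and longitude or a category code that does not match the intended venue type.
</p>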

<h3><b>Creating Law Enforcement DataFrame</b></h3>

In [3698]:
latitude = 43.6532
longitude = -79.3832
# Category 12070 returns both police stations (12072) and fire stations (12071), as the output below shows
police_stations = create_request(location=["Toronto","ON"], categories="12070", limit="50")

# Flatten the nested JSON results from the Foursquare API into a DataFrame
police_venues = json_normalize(police_stations['results'], max_level=3)
police_venues.head()

Unnamed: 0,fsq_id,categories,chains,distance,link,name,timezone,geocodes.main.latitude,geocodes.main.longitude,geocodes.roof.latitude,geocodes.roof.longitude,location.address,location.country,location.cross_street,location.formatted_address,location.locality,location.neighborhood,location.postcode,location.region,related_places.children
0,4c00016237850f473c8e973f,"[{'id': 12072, 'name': 'Police Station', 'icon...",[],1441,/v3/places/4c00016237850f473c8e973f,53 Division Toronto Police Service,America/Toronto,43.706104,-79.400647,43.706104,-79.400647,75 Eglinton Ave W,CA,at Duplex Ave.,"75 Eglinton Ave W (at Duplex Ave.), Toronto ON...",Toronto,[Yonge and Eglinton],M4R 2G9,ON,
1,4cc5d24f80624688058a3e2f,"[{'id': 12072, 'name': 'Police Station', 'icon...",[],1693,/v3/places/4cc5d24f80624688058a3e2f,Toronto Police Service - 13 Division,America/Toronto,43.698433,-79.436581,43.698433,-79.436581,1435 Eglinton Ave W,CA,Allen Expressway,"1435 Eglinton Ave W (Allen Expressway), Toront...",Toronto,[York],M6C 3Z4,ON,
2,4e700aa1ae602b2e721fc1c7,"[{'id': 12071, 'name': 'Fire Station', 'icon':...",[],2106,/v3/places/4e700aa1ae602b2e721fc1c7,Toronto Fire Station 341,America/Toronto,43.694398,-79.441081,43.694398,-79.441081,555 Oakwood Ave,CA,,"555 Oakwood Ave, Toronto ON M6E 2X4",Toronto,,M6E 2X4,ON,
3,4d83b00b7e8ef04d20d8fbbd,"[{'id': 12071, 'name': 'Fire Station', 'icon':...",[],3084,/v3/places/4d83b00b7e8ef04d20d8fbbd,Toronto Fire Station 131,America/Toronto,43.726175,-79.402729,43.726175,-79.402729,3135 Yonge St,CA,Wanless Ave,"3135 Yonge St (Wanless Ave), Toronto ON M4N 2K8",Toronto,,M4N 2K8,ON,
4,4dcd5207922e8ac4247be29d,"[{'id': 12072, 'name': 'Police Station', 'icon...",[],4088,/v3/places/4dcd5207922e8ac4247be29d,University of Toronto Campus Community Police,America/Toronto,43.664817,-79.400841,43.664817,-79.400841,21 Sussex Ave,CA,btwn Huron & St. George,"21 Sussex Ave (btwn Huron & St. George), Toron...",Toronto,[Susex Ulster],M5S 1J6,ON,
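
<p>
The output above mixes police stations (category id 12072) with fire stations (12071). If only police stations are wanted, the results could be filtered on the category id before they are normalized. This is a minimal sketch on abbreviated records whose names and ids are copied from the table above; <code>is_police_station</code> is a hypothetical helper, not part of the notebook.
</p>

```python
POLICE_STATION_ID = 12072  # category id shown in the output above

# Abbreviated records in the shape of the API results
results = [
    {"name": "53 Division Toronto Police Service",
     "categories": [{"id": 12072, "name": "Police Station"}]},
    {"name": "Toronto Fire Station 341",
     "categories": [{"id": 12071, "name": "Fire Station"}]},
    {"name": "University of Toronto Campus Community Police",
     "categories": [{"id": 12072, "name": "Police Station"}]},
]


def is_police_station(venue):
    """Hypothetical helper: True if any category of the venue is a police station."""
    return any(c["id"] == POLICE_STATION_ID for c in venue["categories"])


police_only = [v for v in results if is_police_station(v)]
print([v["name"] for v in police_only])
```

<p>
In the notebook, the same list comprehension could be applied to <code>police_stations['results']</code> before calling <code>json_normalize</code>.
</p>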


<h3><b>Creating Educational DataFrame</b></h3>

In [3699]:
latitude = 43.6532
longitude = -79.3832
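# Note: 15000 is Health and Medicine in the category list above, so this query
# returns clinics and hospitals (see the output below) rather than schools
educational = create_request(location=["Toronto","ON"], categories="15000", limit="50")

# Flatten the nested JSON results from the Foursquare API into a DataFrame
education_venues = json_normalize(educational['results'], max_level=3)
education_venues.head()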

Unnamed: 0,fsq_id,categories,chains,distance,link,name,timezone,geocodes.main.latitude,geocodes.main.longitude,geocodes.roof.latitude,...,location.cross_street,location.formatted_address,location.locality,location.neighborhood,location.postcode,location.region,related_places.children,related_places.parent.fsq_id,related_places.parent.name,location.address_extended
0,4ae06ddef964a5203b7f21e3,"[{'id': 15054, 'name': 'Veterinarian', 'icon':...",[],6726,/v3/places/4ae06ddef964a5203b7f21e3,Toronto Humane Society,America/Toronto,43.657649,-79.356448,43.657649,...,Queen St. E,"11 River St (Queen St. E), Toronto ON M5A 4C2",Toronto,[Trefann],M5A 4C2,ON,,,,
1,4b1d38d8f964a520360d24e3,"[{'id': 15016, 'name': 'Medical Center', 'icon...",[],7522,/v3/places/4b1d38d8f964a520360d24e3,West Park Health Care Centre,America/Toronto,43.688064,-79.508431,,...,Jane and weston rd.,"82 Buttonwood Ave (Jane and weston rd.), Toron...",Toronto,,,ON,,,,
2,4b81a736f964a52089b530e3,"[{'id': 15054, 'name': 'Veterinarian', 'icon':...",[],12217,/v3/places/4b81a736f964a52089b530e3,Birchmount Veterinary Clinic,America/Toronto,43.761702,-79.290318,43.761702,...,,"1563 Birchmount Rd, Toronto ON M1P 2H4",Toronto,[Wexford],M1P 2H4,ON,,,,
3,4bcdebfefb84c9b650d8223e,"[{'id': 15011, 'name': 'Healthcare Clinic', 'i...",[],5260,/v3/places/4bcdebfefb84c9b650d8223e,Albany Medical Clinic,America/Toronto,43.677835,-79.358341,43.677835,...,btwn Pretoria & Erindale,"807 Broadview Ave (btwn Pretoria & Erindale), ...",Toronto,[Greektown],M4K 2P8,ON,"[{'fsq_id': '5a6f39566c08d12458f39808', 'name'...",,,
4,4ae5f615f964a5208ba321e3,"[{'id': 15014, 'name': 'Hospital', 'icon': {'p...",[],7206,/v3/places/4ae5f615f964a5208ba321e3,St. Joseph's Health Centre,America/Toronto,43.639278,-79.450095,43.639278,...,Roncesvalles,"30 the Queensway (Roncesvalles), Toronto ON M6...",Toronto,[Little Poland],M6R 1B5,ON,"[{'fsq_id': '5bbb9bc5646e3800394a73b8', 'name'...",,,


<h3><b>Gather the Postal Codes</b></h3>

In [3700]:
path = os.getcwd()
path = os.path.join(path,"datasets/neighborhood-data.csv")
postcodes = pd.read_csv(path)
postcodes.drop(postcodes.columns[[0]], axis=1, inplace=True)
postcodes.head()



Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor


<h3><b>Gather the Dataframe for Homicide Rates and Crime Rates</b></h3>

In [3701]:
path2 = os.getcwd()
path2 = os.path.join(path2,"datasets/neighbourhood-crime-rates.csv")
crimedata = pd.read_csv(path2)
crimedata.drop(crimedata.columns[[0]], axis=1, inplace=True)

path3 = os.getcwd()
path3 = os.path.join(path3,"datasets/combined-dataset.csv")
combined_data = pd.read_csv(path3)
combined_data.drop(combined_data.columns[[0]], axis=1, inplace=True)
combined_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Hood_ID,Population,Assault_AVG,Assault_Rate_2019,AutoTheft_AVG,AutoTheft_Rate_2019,Homicide_AVG,Homicide_Rate_2019,Latitude,Longitude
0,M3A,North York,Parkwoods,45,34805,159.7,454.0,31.5,91.9,0.3,2.9,43.751,-79.323
1,M4A,North York,Victoria Village,43,17510,119.3,753.9,16.5,102.8,0.7,5.7,43.735,-79.312
2,M6A,North York,Lawrence Heights,32,22372,104.0,518.5,28.5,102.8,0.2,0.0,43.722,-79.451
3,M1B,Scarborough,Rouge,131,46496,173.3,391.4,50.5,187.1,0.8,0.0,43.804,-79.165
4,M1B,Scarborough,Malvern,132,43794,278.2,760.4,47.2,162.1,1.7,2.3,43.809,-79.221
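
<p>
Since the goal of the notebook is to find correlations, one way to quantify them is a Pearson correlation matrix. The sketch below uses only the five rows displayed above, which is far too few to draw conclusions from; it is meant to show the call, not a result.
</p>

```python
import pandas as pd

# The five rows displayed above (illustration only)
sample = pd.DataFrame({
    "Population": [34805, 17510, 22372, 46496, 43794],
    "Assault_AVG": [159.7, 119.3, 104.0, 173.3, 278.2],
    "AutoTheft_AVG": [31.5, 16.5, 28.5, 50.5, 47.2],
})

# Pairwise Pearson correlation between the numeric columns
correlations = sample.corr(method="pearson")
print(correlations.round(2))
```

<p>
On the full dataset the same call could be made on a column selection such as <code>combined_data[["Population", "Assault_AVG", "AutoTheft_AVG", "Homicide_AVG"]]</code>.
</p>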


<h3><b>K-Means Clustering on Crime Data (Commented Out)</b></h3>


In [3702]:
# k = 6
# toronto_clustering = combined_data.drop(columns=["Postcode", "Population","Borough", "Neighbourhood", "Assault_AVG", "Assault_Rate_2019", "AutoTheft_Rate_2019", "Homicide_AVG" , "Homicide_Rate_2019", "Latitude", "Longitude"])
# kmeans = KMeans(n_clusters = k,random_state=0).fit(toronto_clustering)
# kmeans.labels_
# # combined_data.insert(0, "Cluster Labels", kmeans.labels_)
# combined_data
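
<p>
K-means relies on Euclidean distance, so a column with large values (such as population) can dominate one with small values (such as an auto theft average). A common remedy is to standardize the features first. The sketch below assumes two illustrative columns, Population and AutoTheft_AVG, taken from the five rows displayed earlier, and uses two clusters only because the sample is so small; it is not the configuration used in the notebook.
</p>

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Population and AutoTheft_AVG from the five rows displayed earlier
features = np.array([
    [34805, 31.5],
    [17510, 16.5],
    [22372, 28.5],
    [46496, 50.5],
    [43794, 47.2],
])

# Rescale each column to zero mean and unit variance
scaled = StandardScaler().fit_transform(features)

# Two clusters, chosen only for this tiny illustration
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
print(kmeans.labels_)
```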

<h3><b>Correlation #1: Law Enforcement Buildings in Neighbourhoods and Auto Theft Rates</b></h3>

In [3703]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# Cluster on the two columns left by the commented-out cell above and keep the
# labels in a local array, so combined_data itself is not modified
k = 6
crime_labels = KMeans(n_clusters=k, random_state=0).fit(combined_data[["Hood_ID", "AutoTheft_AVG"]]).labels_

# One hex colour per cluster
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, k))]

# Neighbourhood markers, coloured by cluster
for average, lat, lon, neighbourhood, cluster in zip(combined_data["AutoTheft_AVG"], combined_data['Latitude'], combined_data['Longitude'], combined_data['Neighbourhood'], crime_labels):
    label = folium.Popup('Neighbourhood ' + str(neighbourhood) + " Auto Theft " + str(average), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

# Law enforcement venues in black. The loop variables are named lat/lon so the
# global latitude/longitude (the map centre) are not overwritten
for name, lat, lon in zip(police_venues['name'], police_venues['geocodes.main.latitude'], police_venues['geocodes.main.longitude']):
    label = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='black',
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

<h3><b>Gather the Dataframe for Income Rates and Education Rates</b></h3>

In [None]:
path4 = os.getcwd()
path4 = os.path.join(path4,"datasets/population-dataset-combined.csv")
population_data = pd.read_csv(path4)
population_data.drop(population_data.columns[[0]], axis=1, inplace=True)
population_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Neighbourhood Number,"Total - Highest certificate, diploma or degree for the population aged 15 years and over in private households - 25% sample data","No certificate, diploma or degree",Secondary (high) school diploma or equivalency certificate,Trades certificate or diploma other than Certificate of Apprenticeship or Certificate of Qualification,Certificate of Apprenticeship or Certificate of Qualification,"College, CEGEP or other non-university certificate or diploma",...,"$45,000 to $49,999","$50,000 to $59,999","$60,000 to $69,999","$70,000 to $79,999","$80,000 to $89,999","$90,000 to $99,999","$100,000 and over","$200,000 and over",Latitude,Longitude
0,M3A,North York,Parkwoods,45,28890,4140,7660,700,605,5295,...,620,1200,1025,880,790,650,3795,890,43.751,-79.323
1,M6A,North York,Lawrence Manor,32,17080,2675,4340,505,330,2635,...,335,735,565,500,435,315,2155,755,43.726,-79.436
2,M1B,Scarborough,Rouge,131,38125,6580,11740,1020,635,7740,...,455,970,950,930,940,845,6060,1075,43.805,-79.166
3,M1B,Scarborough,Malvern,132,35885,7345,11575,1155,710,6915,...,655,1360,1200,1000,905,795,3280,225,43.809,-79.222
4,M3B,North York,Don Mills North,42,23390,2295,5150,450,345,3490,...,400,930,885,780,655,605,4615,1750,43.761,-79.411


In [3704]:
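k = 6
# Merging on Neighbourhood suffixes the other shared columns with _x (crime data) and _y (population data)
combined_dataframe = pd.merge(combined_data, population_data, on="Neighbourhood")

# Features used for clustering. Hood_ID and Neighbourhood Number are identifiers rather
# than measurements; they are kept here so the labels match the output below
toronto_clustering = combined_dataframe[["Homicide_AVG", "No certificate, diploma or degree", "Hood_ID", "Neighbourhood Number"]]
kmeans = KMeans(n_clusters = k,random_state=0).fit(toronto_clustering)
combined_dataframe.insert(0, "Cluster Labels", kmeans.labels_)
combined_dataframe.head()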


Unnamed: 0,Cluster Labels,Postcode_x,Borough_x,Neighbourhood,Hood_ID,Population,Assault_AVG,Assault_Rate_2019,AutoTheft_AVG,AutoTheft_Rate_2019,...,"$45,000 to $49,999","$50,000 to $59,999","$60,000 to $69,999","$70,000 to $79,999","$80,000 to $89,999","$90,000 to $99,999","$100,000 and over","$200,000 and over",Latitude_y,Longitude_y
0,2,M3A,North York,Parkwoods,45,34805,159.7,454.0,31.5,91.9,...,620,1200,1025,880,790,650,3795,890,43.751,-79.323
1,1,M1B,Scarborough,Rouge,131,46496,173.3,391.4,50.5,187.1,...,455,970,950,930,940,845,6060,1075,43.805,-79.166
2,1,M1B,Scarborough,Malvern,132,43794,278.2,760.4,47.2,162.1,...,655,1360,1200,1000,905,795,3280,225,43.809,-79.222
3,0,M3B,North York,Don Mills North,42,27695,80.5,267.2,21.8,151.7,...,400,930,885,780,655,605,4615,1750,43.761,-79.411
4,3,M6C,York,Humewood-Cedarvale,106,14365,46.3,320.2,16.2,111.4,...,300,510,430,360,275,240,2045,935,43.695,-79.428
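
<p>
The <code>_x</code> and <code>_y</code> suffixes in the output above appear because both frames share columns other than the merge key. The sketch below reproduces this on two small frames built from values shown in the earlier tables, then shows one possible alternative: dropping the duplicated columns from the right-hand frame before merging. The frame names <code>crime</code> and <code>population</code> are illustrative and are not used in the notebook.
</p>

```python
import pandas as pd

# Two small frames with values taken from the tables shown earlier
crime = pd.DataFrame({
    "Neighbourhood": ["Parkwoods", "Rouge"],
    "Postcode": ["M3A", "M1B"],
    "Homicide_AVG": [0.3, 0.8],
    "Latitude": [43.751, 43.804],
})
population = pd.DataFrame({
    "Neighbourhood": ["Parkwoods", "Rouge"],
    "Postcode": ["M3A", "M1B"],
    "No certificate, diploma or degree": [4140, 6580],
    "Latitude": [43.751, 43.805],
})

# Default merge: shared non-key columns get _x (left) and _y (right) suffixes
default_merge = pd.merge(crime, population, on="Neighbourhood")
print(list(default_merge.columns))

# Alternative: drop the duplicated columns from the right frame first
duplicated = [c for c in population.columns
              if c in crime.columns and c != "Neighbourhood"]
clean_merge = pd.merge(crime, population.drop(columns=duplicated), on="Neighbourhood")
print(list(clean_merge.columns))
```

<p>
With the second approach, later cells would refer to <code>Latitude</code> and <code>Longitude</code> instead of <code>Latitude_y</code> and <code>Longitude_y</code>, and the coordinates would come from the crime data.
</p>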


<h3><b>Correlation #2: Educational Buildings and Income</b></h3>


In [3705]:
# Centre the map on the Toronto coordinates explicitly rather than relying on
# latitude/longitude values left over from earlier cells
map_clusters_2 = folium.Map(location=[43.6532, -79.3832], zoom_start=10)

# One hex colour per cluster
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, k))]

# Neighbourhood markers, coloured by cluster; the popup shows the value of the
# "$200,000 and over" income column
for count, lat, lon, neighbourhood, cluster in zip(combined_dataframe["$200,000 and over"], combined_dataframe['Latitude_y'], combined_dataframe['Longitude_y'], combined_dataframe['Neighbourhood'], combined_dataframe['Cluster Labels']):
    label = folium.Popup('Neighbourhood: ' + str(neighbourhood) + " $200,000 and over: " + str(count), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters_2)


map_clusters_2