<h1><b>Machine Learning Based Clustering and Segmentation for Navigation</b></h1>

<h3><b>Introduction</b></h3>
    <p>
    This project develops an ML-based navigation algorithm that weighs several neighbourhood-level factors, such as crime rate and population density, to recommend the most efficient route to a desired destination.
    </p>
<h3><b>Project Contribution</b></h3>
    <p>
    This project looks for correlations among neighbourhood crime rates, population information and income sources. This Jupyter notebook focuses on the following correlations:
        <ul>
            <li>Correlation between law enforcement buildings and auto theft rates</li>
            <li>Correlation between schools and income</li>
        </ul>
    </p>
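
<p>
As a rough illustration of what measuring a correlation between two columns involves, the sketch below computes a Pearson correlation coefficient in plain Python. The helper name <code>pearson</code> is hypothetical, and the sample values are the Population and Assault_AVG figures from the first five rows of the combined dataset loaded later in this notebook. Pairing those two columns is only an example, not one of the correlations listed above.
</p>

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# First five rows of the combined dataset (Population, Assault_AVG)
population = [34805, 17510, 22372, 46496, 43794]
assault_avg = [159.7, 119.3, 104.0, 173.3, 278.2]

r = pearson(population, assault_avg)
print(f"Pearson r = {r:.3f}")
```

<p>
A value near 1 or -1 suggests a strong linear relationship and a value near 0 suggests none. Five rows are far too few to draw any conclusion, so the full dataset would be needed in practice.
</p>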
<h3><b>Prerequisites</b></h3>
<ul>
    <li>Foursquare API</li>
</ul>
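<h3><b>Datasets Used</b></h3>
<ul>
    <li><code>datasets/neighborhood-data.csv</code> - postal codes, boroughs and neighbourhoods</li>
    <li><code>datasets/neighbourhood-crime-rates.csv</code> - neighbourhood crime rates</li>
    <li><code>datasets/combined-dataset.csv</code> - postal codes combined with population, crime averages and rates, and coordinates</li>
</ul>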

<h3><b>Import Statements</b></h3>

In [718]:
import os

import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs
from dotenv import dotenv_values, load_dotenv
from pandas import json_normalize
from sklearn.cluster import KMeans


<h3><b>Foursquare API Initialization / Check</b></h3>
<h4><b>Category Codes:</b></h4>
<ul>
    <li>10000 - Arts and Entertainment</li>
    <li>11000 - Business and Professional Services</li>
    <li>12000 - Community and Government</li>
    <li>13000 - Dining and Drinking</li>
    <li>14000 - Event</li>
    <li>15000 - Health and Medicine</li>
    <li>16000 - Landmarks and Outdoors</li>
    <li>17000 - Retail</li>
    <li>18000 - Sports and Recreation</li>
    <li>19000 - Travel and Transportation</li>
</ul>
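
<p>
The codes above can be kept in a small lookup table so that a sub-category code from a response can be traced back to its top-level category. This is a minimal sketch: the names <code>TOP_LEVEL_CATEGORIES</code> and <code>top_level_category</code> are hypothetical, and it assumes that a sub-category code shares its first two digits with its parent. That holds for the 12071 and 12072 codes returned later in this notebook but is not confirmed here for the whole taxonomy.
</p>

```python
# Top-level Foursquare category codes, copied from the list above
TOP_LEVEL_CATEGORIES = {
    10000: "Arts and Entertainment",
    11000: "Business and Professional Services",
    12000: "Community and Government",
    13000: "Dining and Drinking",
    14000: "Event",
    15000: "Health and Medicine",
    16000: "Landmarks and Outdoors",
    17000: "Retail",
    18000: "Sports and Recreation",
    19000: "Travel and Transportation",
}


def top_level_category(code):
    """Return the top-level category name for a code, or None if unknown.

    Assumption: sub-category codes share their first two digits with the parent.
    """
    parent = (int(code) // 1000) * 1000
    return TOP_LEVEL_CATEGORIES.get(parent)


print(top_level_category("12072"))
```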

In [719]:
config = dotenv_values(".env")
url = "https://api.foursquare.com/v3/places/nearby"

headers = {"Accept": "application/json",
            "Authorization": config["API_KEY"]}

response = requests.request("GET", url, headers=headers)

def create_request(coords=None, location=None, categories=None, limit="10"):
    """
    Query the Foursquare place search endpoint.

    Important:
        - Coords and location cannot be entered together; if both are
          given, coords takes precedence.
        - Location and radius cannot be entered together, so the radius
          is only sent with coords.

    coords is a list of [latitude, longitude].
    location is a [city, province] list such as ["Oshawa", "ON"].
    categories is a string of category codes from the list above (default None).
    limit is the maximum number of results, as a string: at most "50", default "10".

    Returns the parsed JSON response, or False if the arguments are
    invalid or the request fails.

    Examples:
        - create_request(coords=[43.895962, -72.848752], limit="1")
        - create_request(coords=[43.895962, -72.848752], categories="10000", limit="2")
        - create_request(location=["Oshawa", "ON"], limit="2")
        - create_request(location=["Oshawa", "ON"], categories="10000", limit="20")
    """
    search_url = "https://api.foursquare.com/v3/places/search"

    # Build the query parameters; requests URL-encodes them (e.g. "," -> "%2C")
    if coords:
        params = {"ll": f"{coords[0]},{coords[1]}", "radius": "100000"}
    elif location:
        params = {"near": f"{location[0]},{location[1]}"}
    else:
        return False

    if categories:
        params["categories"] = categories
    params["limit"] = limit

    response = requests.get(search_url, headers=headers, params=params)

    if response.status_code == 200:
        return response.json()
    return False

<h3><b>Creating Venue DataFrame</b></h3>

In [720]:
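# Approximate centre of Toronto, used to centre the maps
latitude = 43.6532
longitude = -79.3832

# Request up to 50 venues in category 12070 near Toronto from the Foursquare API.
# As the output below shows, this category returns both police and fire stations.
police_stations = create_request(location=["Toronto","ON"], categories="12070", limit="50")

# Flatten the nested JSON results into a DataFrame (json_normalize already returns one)
police_venues = json_normalize(police_stations['results'], max_level=3)
police_venues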


Unnamed: 0,fsq_id,categories,chains,distance,link,name,timezone,geocodes.main.latitude,geocodes.main.longitude,geocodes.roof.latitude,geocodes.roof.longitude,location.address,location.country,location.cross_street,location.formatted_address,location.locality,location.neighborhood,location.postcode,location.region,related_places.children
0,4c00016237850f473c8e973f,"[{'id': 12072, 'name': 'Police Station', 'icon...",[],1441,/v3/places/4c00016237850f473c8e973f,53 Division Toronto Police Service,America/Toronto,43.706104,-79.400647,43.706104,-79.400647,75 Eglinton Ave W,CA,at Duplex Ave.,"75 Eglinton Ave W (at Duplex Ave.), Toronto ON...",Toronto,[Yonge and Eglinton],M4R 2G9,ON,
1,4cc5d24f80624688058a3e2f,"[{'id': 12072, 'name': 'Police Station', 'icon...",[],1693,/v3/places/4cc5d24f80624688058a3e2f,Toronto Police Service - 13 Division,America/Toronto,43.698433,-79.436581,43.698433,-79.436581,1435 Eglinton Ave W,CA,Allen Expressway,"1435 Eglinton Ave W (Allen Expressway), Toront...",Toronto,[York],M6C 3Z4,ON,
2,4e700aa1ae602b2e721fc1c7,"[{'id': 12071, 'name': 'Fire Station', 'icon':...",[],2106,/v3/places/4e700aa1ae602b2e721fc1c7,Toronto Fire Station 341,America/Toronto,43.694398,-79.441081,43.694398,-79.441081,555 Oakwood Ave,CA,,"555 Oakwood Ave, Toronto ON M6E 2X4",Toronto,,M6E 2X4,ON,
3,4d83b00b7e8ef04d20d8fbbd,"[{'id': 12071, 'name': 'Fire Station', 'icon':...",[],3084,/v3/places/4d83b00b7e8ef04d20d8fbbd,Toronto Fire Station 131,America/Toronto,43.726175,-79.402729,43.726175,-79.402729,3135 Yonge St,CA,Wanless Ave,"3135 Yonge St (Wanless Ave), Toronto ON M4N 2K8",Toronto,,M4N 2K8,ON,
4,4dcd5207922e8ac4247be29d,"[{'id': 12072, 'name': 'Police Station', 'icon...",[],4088,/v3/places/4dcd5207922e8ac4247be29d,University of Toronto Campus Community Police,America/Toronto,43.664817,-79.400841,43.664817,-79.400841,21 Sussex Ave,CA,btwn Huron & St. George,"21 Sussex Ave (btwn Huron & St. George), Toron...",Toronto,[Susex Ulster],M5S 1J6,ON,
5,4c72bc447121a1cdcc4f63d1,"[{'id': 12071, 'name': 'Fire Station', 'icon':...",[],4539,/v3/places/4c72bc447121a1cdcc4f63d1,Fire Station 313,America/Toronto,43.671376,-79.376211,43.671376,-79.376211,443 Bloor St E,CA,at Glen Rd.,"443 Bloor St E (at Glen Rd.), Toronto ON M4W 1J1",Toronto,,M4W 1J1,ON,
6,50e1e4e0e4b0c6dccf466c8f,"[{'id': 12072, 'name': 'Police Station', 'icon...",[],4788,/v3/places/50e1e4e0e4b0c6dccf466c8f,Davenport Police Station,America/Toronto,43.67075,-79.459597,43.67075,-79.459597,2054 Davenport Rd,CA,at Osler St,"2054 Davenport Rd (at Osler St), Toronto ON M6...",Toronto,,M6N 1C8,ON,
7,4e3b19ebfa7645537598ff38,"[{'id': 12071, 'name': 'Fire Station', 'icon':...",[],4855,/v3/places/4e3b19ebfa7645537598ff38,Toronto Fire Station 314,America/Toronto,43.663106,-79.384201,43.663106,-79.384201,12 Grosvenor St,CA,St Luke Ln,"12 Grosvenor St (St Luke Ln), Toronto ON",Toronto,,,ON,
8,4e28e05062e17c3301a324f8,"[{'id': 12071, 'name': 'Fire Station', 'icon':...",[],4888,/v3/places/4e28e05062e17c3301a324f8,Toronto Fire Station 315,America/Toronto,43.6569,-79.404721,43.6569,-79.404721,132 Bellevue Ave,CA,at College St,"132 Bellevue Ave (at College St), Toronto ON M...",Toronto,,M5T 2N9,ON,
9,4c33ca0866e40f473e9dc88b,"[{'id': 12072, 'name': 'Police Station', 'icon...",[],4906,/v3/places/4c33ca0866e40f473e9dc88b,Toronto Police Service - 11 Division,America/Toronto,43.671002,-79.46076,43.671002,-79.46076,2054 Davenport Rd,CA,,"2054 Davenport Rd, Toronto ON M6N 1C8",Toronto,,M6N 1C8,ON,
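
<p>
The results above mix police stations (category id 12072) with fire stations (12071). If only police stations are wanted, the raw results can be filtered on the category id before they are normalized. This is a minimal sketch, not part of the original notebook: <code>filter_by_category</code> is a hypothetical helper, and <code>sample_results</code> is a hand-trimmed copy of three rows from the table above that mimics the shape of <code>police_stations['results']</code>.
</p>

```python
POLICE_STATION_ID = 12072  # category id seen in the results table above


def filter_by_category(results, category_id):
    """Keep only the venues that list the given category id."""
    return [
        venue for venue in results
        if any(cat.get("id") == category_id for cat in venue.get("categories", []))
    ]


# Trimmed stand-in for police_stations['results'] (names and ids from the table above)
sample_results = [
    {"name": "53 Division Toronto Police Service",
     "categories": [{"id": 12072, "name": "Police Station"}]},
    {"name": "Toronto Fire Station 341",
     "categories": [{"id": 12071, "name": "Fire Station"}]},
    {"name": "Davenport Police Station",
     "categories": [{"id": 12072, "name": "Police Station"}]},
]

police_only = filter_by_category(sample_results, POLICE_STATION_ID)
print([venue["name"] for venue in police_only])
```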


<h3><b>Gather the Postal Codes</b></h3>

In [721]:
path = os.getcwd()
path = os.path.join(path,"datasets/neighborhood-data.csv")
postcodes = pd.read_csv(path)
postcodes.drop(postcodes.columns[[0]], axis=1, inplace=True)
postcodes.head()



Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor


<h3><b>Gather the Dataframe for Homicide Rates and Crime Rates</b></h3>

In [722]:
path2 = os.getcwd()
path2 = os.path.join(path2,"datasets/neighbourhood-crime-rates.csv")
crimedata = pd.read_csv(path2)
crimedata.drop(crimedata.columns[[0]], axis=1, inplace=True)

path3 = os.getcwd()
path3 = os.path.join(path3,"datasets/combined-dataset.csv")
combined_data = pd.read_csv(path3)
combined_data.drop(combined_data.columns[[0]], axis=1, inplace=True)
combined_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Hood_ID,Population,Assault_AVG,Assault_Rate_2019,AutoTheft_AVG,AutoTheft_Rate_2019,Homicide_AVG,Homicide_Rate_2019,Latitude,Longitude
0,M3A,North York,Parkwoods,45,34805,159.7,454.0,31.5,91.9,0.3,2.9,43.751,-79.323
1,M4A,North York,Victoria Village,43,17510,119.3,753.9,16.5,102.8,0.7,5.7,43.735,-79.312
2,M6A,North York,Lawrence Heights,32,22372,104.0,518.5,28.5,102.8,0.2,0.0,43.722,-79.451
3,M1B,Scarborough,Rouge,131,46496,173.3,391.4,50.5,187.1,0.8,0.0,43.804,-79.165
4,M1B,Scarborough,Malvern,132,43794,278.2,760.4,47.2,162.1,1.7,2.3,43.809,-79.221


<h3><b>Map the Venues and Cluster the Neighbourhoods</b></h3>


In [723]:
map_creation = folium.Map(location=[latitude, longitude], zoom_start=10)

# Plot each venue; the loop variables are named lat/lon so they do not
# overwrite the map-centre latitude/longitude defined earlier
for name, lat, lon in zip(police_venues['name'], police_venues['geocodes.main.latitude'], police_venues['geocodes.main.longitude']):
    label = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_creation)

# Cluster the neighbourhoods with k-means; after the drop, the remaining
# feature columns are Hood_ID and AutoTheft_AVG
k = 6
toronto_clustering = combined_data.drop(columns=["Postcode", "Population","Borough", "Neighbourhood", "Assault_AVG", "Assault_Rate_2019", "AutoTheft_Rate_2019", "Homicide_AVG" , "Homicide_Rate_2019", "Latitude", "Longitude"])
kmeans = KMeans(n_clusters=k, random_state=0).fit(toronto_clustering)

# Store each neighbourhood's cluster label as the first column
combined_data.insert(0, "Cluster Labels", kmeans.labels_)
combined_data

Unnamed: 0,Cluster Labels,Postcode,Borough,Neighbourhood,Hood_ID,Population,Assault_AVG,Assault_Rate_2019,AutoTheft_AVG,AutoTheft_Rate_2019,Homicide_AVG,Homicide_Rate_2019,Latitude,Longitude
0,5,M3A,North York,Parkwoods,45,34805,159.7,454.0,31.5,91.9,0.3,2.9,43.751,-79.323
1,5,M4A,North York,Victoria Village,43,17510,119.3,753.9,16.5,102.8,0.7,5.7,43.735,-79.312
2,5,M6A,North York,Lawrence Heights,32,22372,104.0,518.5,28.5,102.8,0.2,0.0,43.722,-79.451
3,4,M1B,Scarborough,Rouge,131,46496,173.3,391.4,50.5,187.1,0.8,0.0,43.804,-79.165
4,4,M1B,Scarborough,Malvern,132,43794,278.2,760.4,47.2,162.1,1.7,2.3,43.809,-79.221
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
78,2,M4X,Downtown Toronto,Cabbagetown,71,11669,102.3,1079.8,10.7,188.5,0.3,0.0,43.665,-79.368
79,2,M4X,Downtown Toronto,St. James Town,71,11669,102.3,1079.8,10.7,188.5,0.3,0.0,43.670,-79.373
80,0,M8Y,Etobicoke,Kingsway Park South East,15,9271,25.8,302.0,16.5,302.0,0.0,0.0,43.619,-79.500
81,0,M8Y,Etobicoke,Mimico NE,17,33964,299.2,959.8,37.3,176.7,0.7,2.9,43.614,-79.495
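
<p>
To make the clustering step less of a black box, the sketch below runs the basic k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) on a single feature. It is a simplified, one-dimensional stand-in written in plain Python, not scikit-learn's implementation: <code>kmeans_1d</code> is a hypothetical helper, the initial centroids are chosen deterministically rather than with scikit-learn's default initialization, and the values are the AutoTheft_AVG figures of seven rows shown above.
</p>

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster 1-D values into k groups (k >= 2) with the basic k-means loop."""
    ordered = sorted(values)
    # Deterministic start: k evenly spaced values from the sorted data
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# AutoTheft_AVG for Parkwoods, Victoria Village, Lawrence Heights, Rouge,
# Malvern, Cabbagetown and Mimico NE (from the table above)
auto_theft_avg = [31.5, 16.5, 28.5, 50.5, 47.2, 10.7, 37.3]
labels, centroids = kmeans_1d(auto_theft_avg, k=2)
print(labels)  # [1, 0, 0, 1, 1, 0, 1]
```

<p>
With two clusters, the neighbourhoods split into a lower and a higher auto theft group. The notebook's own clustering uses six clusters and two feature columns, so its labels will differ.
</p>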


<h3><b>Correlation #1: Law Enforcement Buildings in Neighbourhoods and Auto Theft Rates</b></h3>

In [724]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# One colour per cluster, evenly spaced over the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Neighbourhoods, coloured by cluster label
for lat, lon, neighbourhood, cluster in zip(combined_data['Latitude'], combined_data['Longitude'], combined_data['Neighbourhood'], combined_data['Cluster Labels']):
    label = folium.Popup('Neighbourhood ' + str(neighbourhood), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

# Law enforcement venues, drawn in black on top of the clusters
for name, lat, lon in zip(police_venues['name'], police_venues['geocodes.main.latitude'], police_venues['geocodes.main.longitude']):
    label = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='black',
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
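
<p>
The map shows the relationship visually. One possible way to turn it into a number that can be compared against auto theft rates is the distance from each neighbourhood to its nearest station. The sketch below is an assumption about how that could be done, not a step from the original analysis: <code>haversine_km</code> and <code>nearest_venue</code> are hypothetical helpers, and the coordinates are copied from the tables earlier in this notebook.
</p>

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_venue(lat, lon, venues):
    """Return (name, distance_km) of the venue closest to the given point."""
    return min(
        ((name, haversine_km(lat, lon, vlat, vlon)) for name, vlat, vlon in venues),
        key=lambda pair: pair[1],
    )


# (name, latitude, longitude) for three police stations from the venue table
stations = [
    ("53 Division Toronto Police Service", 43.706104, -79.400647),
    ("Toronto Police Service - 13 Division", 43.698433, -79.436581),
    ("Davenport Police Station", 43.67075, -79.459597),
]

# Parkwoods coordinates from the combined dataset
name, distance_km = nearest_venue(43.751, -79.323, stations)
print(name, round(distance_km, 1))
```

<p>
Repeating this for every neighbourhood would give one distance column that could be compared against AutoTheft_AVG.
</p>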

<h3><b>Correlation #2: Educational Buildings and Income </b></h3>


<h3><b>Correlation #3: </b></h3>
