<h1><b>Machine Learning Based Clustering and Segmentation for Navigation</b></h1>

<h3><b>Introduction</b></h3>
    <p>
    This project develops an ML-based navigation algorithm that uses neighbourhood-level factors, such as crime rate and population density, to choose the most efficient route to a desired destination.
    </p>
<h3><b>Project Contribution</b></h3>
    <p>
    The project's contribution is to find correlations among crime rates, population information and income sources. This Jupyter notebook focuses on the following correlations:
        <ul>
            <li>Correlation between hospitals and homicide rates</li>
            <li>Correlation between police stations and assault rates</li>
            <li>Correlation between schools and income</li>
        </ul>
    </p>
<h3><b>Prerequisites</b></h3>
<ul>
    <li>Foursquare API key, stored in a <code>.env</code> file as <code>API_KEY</code></li>
</ul>
<h3><b>Datasets Used</b></h3>
<ul>
    <li>datasets/neighborhood-data.csv: postcodes, boroughs and neighbourhoods</li>
    <li>datasets/neighbourhood-crime-rates.csv: neighbourhood crime rates</li>
    <li>datasets/combined-dataset.csv: postcodes combined with population, crime rates and coordinates</li>
</ul>

<h3><b>Import Statements</b></h3>

In [155]:
from dotenv import load_dotenv
from dotenv import dotenv_values
import folium
import requests
import pandas as pd 
from pandas import json_normalize
from bs4 import BeautifulSoup as bs
import os
from sklearn.cluster import KMeans

import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


<h3><b>Foursquare API Initialization / Check</b></h3>
<h4><b>Category Codes:</b></h4>
<ul>
    <li>10000 - Arts and Entertainment</li>
    <li>11000 - Business and Professional Services</li>
    <li>12000 - Community and Government</li>
    <li>13000 - Dining and Drinking</li>
    <li>14000 - Event</li>
    <li>15000 - Health and Medicine</li>
    <li>16000 - Landmarks and Outdoors</li>
    <li>17000 - Retail</li>
    <li>18000 - Sports and Recreation</li>
    <li>19000 - Travel and Transportation</li>
</ul>
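<p>
To avoid typing the numeric codes by hand, the category list above could be kept in a small lookup table. This is only a sketch: the name <code>CATEGORY_CODES</code> is an assumption introduced here for illustration and is not part of the notebook or of the Foursquare API.
</p>

```python
# Hypothetical lookup table built from the category list above.
# Codes are kept as strings because create_request expects the category as a string.
CATEGORY_CODES = {
    "Arts and Entertainment": "10000",
    "Business and Professional Services": "11000",
    "Community and Government": "12000",
    "Dining and Drinking": "13000",
    "Event": "14000",
    "Health and Medicine": "15000",
    "Landmarks and Outdoors": "16000",
    "Retail": "17000",
    "Sports and Recreation": "18000",
    "Travel and Transportation": "19000",
}

# Look up the code used later for the hospital search
hospital_category = CATEGORY_CODES["Health and Medicine"]
print(hospital_category)
```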

In [156]:
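# Read the Foursquare API key from the local .env file
config = dotenv_values(".env")
url = "https://api.foursquare.com/v3/places/nearby"

headers = {"Accept": "application/json",
           "Authorization": config["API_KEY"]}

# Send a test request to check that the API key is accepted
response = requests.request("GET", url, headers=headers)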

def create_request(coords=None, location=None, categories=None, limit="10"):
    """
    Query the Foursquare Places search endpoint.

    Important:
        - coords and location cannot be entered together
        - location and radius cannot be entered together

    coords is a list with latitude and longitude, in that order.
    location is a list with a city and province, such as ["Oshawa", "ON"].
    categories is a string from the above codes, with a default of None.
    limit is the number of results to return: a maximum of 50, with a default of 10.

    Returns the parsed JSON response, or False if the arguments are
    invalid or the request does not return status 200.

    Examples:
        - create_request(coords=[43.895962, -72.848752], limit="1")
        - create_request(coords=[43.895962, -72.848752], categories="10000", limit="2")
        - create_request(location=["Oshawa", "ON"], limit="2")
        - create_request(location=["Oshawa", "ON"], categories="10000", limit="20")
    """
    search_url = "https://api.foursquare.com/v3/places/search"
    params = {"limit": limit}

    if coords:
        # Search within 100 km of the given latitude/longitude
        params["ll"] = str(coords[0]) + "," + str(coords[1])
        params["radius"] = "100000"
    elif location:
        # Search near a named place; no radius is sent in this case
        params["near"] = str(location[0]) + "," + str(location[1])
    else:
        return False

    if categories is not None:
        params["categories"] = categories

    # requests URL-encodes the parameters (the comma becomes %2C)
    response = requests.get(search_url, headers=headers, params=params)

    if response.status_code == 200:
        return response.json()
    return False
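<p>
To inspect the URL that a search would use without spending an API call, the query string can be built offline. The sketch below uses only the standard library; the helper name <code>build_search_url</code> is hypothetical and mirrors the arguments of <code>create_request</code> rather than replacing it.
</p>

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.foursquare.com/v3/places/search"

def build_search_url(coords=None, location=None, categories=None, limit="10"):
    """Hypothetical helper: return the search URL without sending a request."""
    params = {}
    if coords:
        # coords is [latitude, longitude]; the radius matches the notebook's value
        params["ll"] = f"{coords[0]},{coords[1]}"
        params["radius"] = "100000"
    elif location:
        # location is [city, province]
        params["near"] = f"{location[0]},{location[1]}"
    else:
        return None
    if categories:
        params["categories"] = categories
    params["limit"] = limit
    # urlencode percent-encodes reserved characters such as the comma
    return SEARCH_URL + "?" + urlencode(params)

print(build_search_url(coords=[43.6532, -79.3832], categories="15000", limit="50"))
print(build_search_url(location=["Oshawa", "ON"], limit="2"))
```

<p>
Encoding the parameters this way also handles place names that contain spaces, which plain string concatenation would leave unescaped.
</p>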

<h3><b>Creating Venue DataFrame</b></h3>

In [157]:
latitude = 43.6532 
longitude = -79.3832
hospitals = create_request(coords=[latitude, longitude], categories="15000", limit="50")

# Flatten the Foursquare JSON results into a DataFrame
hospital_venues = json_normalize(hospitals['results'], max_level=3)
# Drop the columns that are not needed, selected by position
hospital_venues.drop(hospital_venues.columns[[0, 1, 2, 3, 4, 9, 10, 11, 12, 13, 17, 18, 19, 20, 21]], axis=1, inplace=True)

# Display the remaining venue columns
hospital_venues


Unnamed: 0,name,timezone,geocodes.main.latitude,geocodes.main.longitude,location.formatted_address,location.locality,location.neighborhood,location.address_extended,location.po_box
0,St. Joseph's Health Centre,America/Toronto,43.639278,-79.450095,"30 the Queensway (Roncesvalles), Toronto ON M6...",Toronto,[Little Poland],,
1,M-wing: Sunnybrook,America/Toronto,43.721781,-79.376429,"2075 Bayview Ave (Sunnybrook Hospital), Toront...",Toronto,,,
2,St Michael's Hospital,America/Toronto,43.653818,-79.37758,"30 Bond St (at Queen St E), Toronto ON M5B 1W8",Toronto,[Downtown Toronto],# 800,
3,VEC Veterinary Emergency Clinic,America/Toronto,43.673978,-79.389731,"920 Yonge St (Entrance on McMurrich St), Toron...",Toronto,,# 117,
4,Oakville Trafalgar Memorial Hospital,America/Toronto,43.45054,-79.764593,"3001 Hospital Gate (Dundas & Third Line), Oakv...",Oakville,,,
5,Hospital For Sick Children,America/Toronto,43.656932,-79.388829,"555 University Ave (at Gerrard St.), Toronto O...",Toronto,[Chinatown],,
6,Toronto Rehabilitation Institute,America/Toronto,43.656552,-79.389423,"550 University Ave (at Elm St.), Toronto ON M5...",Toronto,[Chinatown],# 3000,
7,Toronto General Hospital,America/Toronto,43.658762,-79.388292,"190 Elizabeth St (at Gerrard St W), Toronto ON...",Toronto,,,
8,Princess Margaret Cancer Centre,America/Toronto,43.658143,-79.390705,"610 University Ave (at Gerrard St. W), Toronto...",Toronto,,,
9,Women's College Hospital,America/Toronto,43.661526,-79.38741,"76 Grenville St (at Bay St.), Toronto ON M5S 1B2",Toronto,[Chinatown],Suite 214,
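<p>
Dropping columns by position depends on the order in which the API happens to return its fields. A possible alternative, sketched below, is to select the wanted columns by name. The <code>sample_response</code> dictionary is a hand-written stand-in shaped like the output above (its <code>fsq_id</code> value is a placeholder), not a real API response.
</p>

```python
import pandas as pd

# Hand-written stand-in for a Foursquare response; field values are taken
# from the table above, and "fsq_id" is a placeholder for an unwanted field.
sample_response = {
    "results": [
        {
            "fsq_id": "hypothetical-id",
            "name": "St Michael's Hospital",
            "timezone": "America/Toronto",
            "geocodes": {"main": {"latitude": 43.653818, "longitude": -79.37758}},
            "location": {
                "formatted_address": "30 Bond St (at Queen St E), Toronto ON M5B 1W8",
                "locality": "Toronto",
                "neighborhood": ["Downtown Toronto"],
            },
        },
        {
            "fsq_id": "hypothetical-id-2",
            "name": "Toronto General Hospital",
            "timezone": "America/Toronto",
            "geocodes": {"main": {"latitude": 43.658762, "longitude": -79.388292}},
            "location": {"locality": "Toronto"},
        },
    ]
}

wanted = [
    "name",
    "timezone",
    "geocodes.main.latitude",
    "geocodes.main.longitude",
    "location.formatted_address",
    "location.locality",
    "location.neighborhood",
]

venues = pd.json_normalize(sample_response["results"], max_level=3)
# reindex keeps only the wanted columns and fills absent ones with NaN
venues = venues.reindex(columns=wanted)
print(venues)
```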


<h3><b>Gather the Postal Codes</b></h3>

In [158]:
# Load the postcode-to-neighbourhood lookup table
path = os.path.join(os.getcwd(), "datasets/neighborhood-data.csv")
postcodes = pd.read_csv(path)
# Drop the first column, which is not needed
postcodes.drop(postcodes.columns[[0]], axis=1, inplace=True)
postcodes.head()



Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
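<p>
If the first column of the CSV file is a saved row index (an assumption; the layout of the file itself is not shown here), it can be consumed while reading instead of being dropped afterwards. The sketch below uses an in-memory copy of the rows shown above in place of the real file.
</p>

```python
import io
import pandas as pd

# In-memory stand-in for datasets/neighborhood-data.csv, assuming the
# first column is an unnamed row index
csv_text = """,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
"""

# index_col=0 uses the first column as the index, so no drop is needed
postcodes_sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(postcodes_sample.head())
```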


<h3><b>Gather the Dataframe for Homicide Rates and Crime Rates</b></h3>

In [159]:
# Load the neighbourhood crime rates
path2 = os.path.join(os.getcwd(), "datasets/neighbourhood-crime-rates.csv")
crimedata = pd.read_csv(path2)
crimedata.drop(crimedata.columns[[0]], axis=1, inplace=True)

# Load the combined dataset (postcodes, population, crime rates and coordinates)
path3 = os.path.join(os.getcwd(), "datasets/combined-dataset.csv")
combined_data = pd.read_csv(path3)
combined_data.drop(combined_data.columns[[0]], axis=1, inplace=True)
combined_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Hood_ID,Population,Assault_AVG,Assault_Rate_2019,AutoTheft_AVG,AutoTheft_Rate_2019,Homicide_AVG,Homicide_Rate_2019,Latitude,Longitude
0,M3A,North York,Parkwoods,45,34805,159.7,454.0,31.5,91.9,0.3,2.9,43.751,-79.323
1,M4A,North York,Victoria Village,43,17510,119.3,753.9,16.5,102.8,0.7,5.7,43.735,-79.312
2,M6A,North York,Lawrence Heights,32,22372,104.0,518.5,28.5,102.8,0.2,0.0,43.722,-79.451
3,M1B,Scarborough,Rouge,131,46496,173.3,391.4,50.5,187.1,0.8,0.0,43.804,-79.165
4,M1B,Scarborough,Malvern,132,43794,278.2,760.4,47.2,162.1,1.7,2.3,43.809,-79.221


<h3><b>Mapping Hospital Venues and Clustering Neighbourhoods</b></h3>


In [160]:
map_creation = folium.Map(location=[latitude, longitude], zoom_start=10)

# Plot each hospital venue. The loop uses lat/lon so that the Toronto
# centre stored in latitude/longitude is not overwritten.
for name, lat, lon in zip(hospital_venues['name'], hospital_venues['geocodes.main.latitude'], hospital_venues['geocodes.main.longitude']):
    label = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_creation)

# Cluster the neighbourhoods by location, using Latitude and Longitude only
k = 5
toronto_clustering = combined_data[["Latitude", "Longitude"]]
kmeans = KMeans(n_clusters=k, random_state=0).fit(toronto_clustering)
# Add each neighbourhood's cluster label as the first column
combined_data.insert(0, "Cluster Labels", kmeans.labels_)
combined_data

Unnamed: 0,Cluster Labels,Postcode,Borough,Neighbourhood,Hood_ID,Population,Assault_AVG,Assault_Rate_2019,AutoTheft_AVG,AutoTheft_Rate_2019,Homicide_AVG,Homicide_Rate_2019,Latitude,Longitude
0,1,M3A,North York,Parkwoods,45,34805,159.7,454.0,31.5,91.9,0.3,2.9,43.751,-79.323
1,1,M4A,North York,Victoria Village,43,17510,119.3,753.9,16.5,102.8,0.7,5.7,43.735,-79.312
2,0,M6A,North York,Lawrence Heights,32,22372,104.0,518.5,28.5,102.8,0.2,0.0,43.722,-79.451
3,2,M1B,Scarborough,Rouge,131,46496,173.3,391.4,50.5,187.1,0.8,0.0,43.804,-79.165
4,2,M1B,Scarborough,Malvern,132,43794,278.2,760.4,47.2,162.1,1.7,2.3,43.809,-79.221
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
79,4,M4X,Downtown Toronto,Cabbagetown,71,11669,102.3,1079.8,10.7,188.5,0.3,0.0,43.665,-79.368
80,4,M4X,Downtown Toronto,St. James Town,71,11669,102.3,1079.8,10.7,188.5,0.3,0.0,43.670,-79.373
81,3,M8Y,Etobicoke,Kingsway Park South East,15,9271,25.8,302.0,16.5,302.0,0.0,0.0,43.619,-79.500
82,3,M8Y,Etobicoke,Mimico NE,17,33964,299.2,959.8,37.3,176.7,0.7,2.9,43.614,-79.495
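<p>
The clustering step can be tried in isolation on the nine coordinate pairs visible in the output above. This is a sketch rather than the notebook's exact run: it uses three clusters instead of five because the sample is so small, and it sets <code>n_init</code> explicitly so that the behaviour does not depend on the scikit-learn version.
</p>

```python
import numpy as np
from sklearn.cluster import KMeans

# Latitude/longitude pairs copied from the rows shown above
sample_coords = np.array([
    [43.751, -79.323],  # Parkwoods
    [43.735, -79.312],  # Victoria Village
    [43.722, -79.451],  # Lawrence Heights
    [43.804, -79.165],  # Rouge
    [43.809, -79.221],  # Malvern
    [43.665, -79.368],  # Cabbagetown
    [43.670, -79.373],  # St. James Town
    [43.619, -79.500],  # Kingsway Park South East
    [43.614, -79.495],  # Mimico NE
])

# Fit k-means on the raw coordinates (three clusters for this small sample)
sample_kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(sample_coords)

sample_labels = sample_kmeans.labels_            # one cluster index per neighbourhood
sample_centres = sample_kmeans.cluster_centers_  # one [lat, lon] centre per cluster
print(sample_labels)
print(sample_centres)
```

<p>
Because only latitude and longitude are passed in, the clusters group neighbourhoods that are geographically close; crime and population figures play no part in the labels. The distances are Euclidean in degrees, which is a reasonable approximation over an area the size of a city.
</p>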


<h3><b>Correlation #1: Medical Buildings and Homicide Rates</b></h3>

In [161]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# One colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Plot each neighbourhood, coloured by its cluster label
for lat, lon, neighbourhood, cluster in zip(combined_data['Latitude'], combined_data['Longitude'], combined_data['Neighbourhood'], combined_data['Cluster Labels']):
    label = folium.Popup(str(neighbourhood) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

# Overlay the hospital venues in black
for name, lat, lon in zip(hospital_venues['name'], hospital_venues['geocodes.main.latitude'], hospital_venues['geocodes.main.longitude']):
    label = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='black',
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
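<p>
The map shows the relationship visually; one way to put a number on it is to measure each neighbourhood's distance to its nearest hospital and correlate that with the homicide rate. The sketch below is an assumption about how this could be done, not the notebook's method. It uses five neighbourhoods and four hospitals copied from the outputs above, which is far too few rows for a meaningful result and only illustrates the mechanics. The helper names <code>haversine_km</code> and <code>pearson</code> are introduced here.
</p>

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# (name, latitude, longitude) copied from the hospital venue table
hospital_points = [
    ("St. Joseph's Health Centre", 43.639278, -79.450095),
    ("M-wing: Sunnybrook", 43.721781, -79.376429),
    ("St Michael's Hospital", 43.653818, -79.37758),
    ("Toronto General Hospital", 43.658762, -79.388292),
]

# (name, latitude, longitude, Homicide_Rate_2019) copied from combined_data.head()
neighbourhood_points = [
    ("Parkwoods", 43.751, -79.323, 2.9),
    ("Victoria Village", 43.735, -79.312, 5.7),
    ("Lawrence Heights", 43.722, -79.451, 0.0),
    ("Rouge", 43.804, -79.165, 0.0),
    ("Malvern", 43.809, -79.221, 2.3),
]

# Distance from each neighbourhood to its nearest hospital
nearest_km = [
    min(haversine_km(lat, lon, h_lat, h_lon) for _, h_lat, h_lon in hospital_points)
    for _, lat, lon, _ in neighbourhood_points
]
homicide_rates = [rate for _, _, _, rate in neighbourhood_points]

r = pearson(nearest_km, homicide_rates)
print("Nearest hospital distances (km):", [round(d, 2) for d in nearest_km])
print("Pearson r:", round(r, 3))
```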

<h3><b>Correlation #2: Educational Buildings and Education </b></h3>


<h3><b>Mapping Police Venues</b></h3>
