<h1><b>Machine Learning Based Clustering and Segmentation for Navigation</b></h1>

<h3><b>Introduction</b></h3>
    <p>
    This project develops an ML-based navigation algorithm that weighs several neighbourhood-level factors, such as crime rate and population density, to give you the most efficient route to the desired destination.
    </p>
<h3><b>Project Contribution</b></h3>
    <p>
    The project's contribution is to find correlations among crime rates, population information and income sources. This Jupyter notebook focuses on the following correlations:
        <ul>
            <li>Correlation between hospitals and homicide rates</li>
            <li>Correlation between police stations and assault rates</li>
            <li>Correlation between schools and income</li>
        </ul>
    </p>
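<p>
As a minimal sketch of how any one of these correlations could be measured, the block below computes a Pearson correlation coefficient in plain Python. The function name <code>pearson</code> and the input values are illustrative assumptions, not part of the project's data or code.
</p>

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy values only, chosen to exercise the function; these are NOT real counts.
hospital_counts = [1, 2, 3, 4]
homicide_rates = [8.0, 6.0, 4.0, 2.0]
r = pearson(hospital_counts, homicide_rates)
print(r)
```

<p>
A value near 1 or -1 indicates a strong linear relationship and a value near 0 indicates little linear relationship. With pandas, <code>Series.corr</code> computes the same coefficient by default.
</p>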
<h3><b>Prerequisites</b></h3>
<ul>
    <li>Foursquare API</li>
</ul>
<h3><b>Datasets Used</b></h3>

<h3><b>Import Statements</b></h3>

In [1]:
from dotenv import load_dotenv
from dotenv import dotenv_values
import folium
import requests
import pandas as pd 
from pandas import json_normalize
from bs4 import BeautifulSoup as bs
import os

<h3><b>Foursquare API Initialization / Check</b></h3>
<h4><b>Category Codes:</b></h4>
<ul>
    <li>10000 - Arts and Entertainment</li>
    <li>11000 - Business and Professional Services</li>
    <li>12000 - Community and Government</li>
    <li>13000 - Dining and Drinking</li>
    <li>14000 - Event</li>
    <li>15000 - Health and Medicine</li>
    <li>16000 - Landmarks and Outdoors</li>
    <li>17000 - Retail</li>
    <li>18000 - Sports and Recreation</li>
    <li>19000 - Travel and Transportation</li>
</ul>
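
<p>
For convenience, the codes above could be kept in a lookup table so that requests can refer to categories by name. This is only a sketch: <code>CATEGORY_CODES</code> and <code>category_code</code> are hypothetical names that the rest of this notebook does not use.
</p>

```python
# Top-level Foursquare category codes, copied from the list above.
CATEGORY_CODES = {
    "10000": "Arts and Entertainment",
    "11000": "Business and Professional Services",
    "12000": "Community and Government",
    "13000": "Dining and Drinking",
    "14000": "Event",
    "15000": "Health and Medicine",
    "16000": "Landmarks and Outdoors",
    "17000": "Retail",
    "18000": "Sports and Recreation",
    "19000": "Travel and Transportation",
}


def category_code(name):
    """Return the code for a category name (case-insensitive), or None if unknown."""
    wanted = name.strip().lower()
    for code, label in CATEGORY_CODES.items():
        if label.lower() == wanted:
            return code
    return None


print(category_code("Health and Medicine"))
```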

In [2]:
# Read the API key from the local .env file
config = dotenv_values(".env")
url = "https://api.foursquare.com/v3/places/nearby"

headers = {"Accept": "application/json",
           "Authorization": config["API_KEY"]}

# Send a test request to check the connection and the API key
response = requests.request("GET", url, headers=headers)

def create_request(coords=None, location=None, categories=None, limit="10"):
    """
        Query the Foursquare place search endpoint.

        Important:
            - Coords and location cannot be entered together
            - Location and radius cannot be entered together

        coords is a list of [latitude, longitude].
        location is a [city, province] list such as ["Oshawa", "ON"].
        categories is a string from the above codes, with a default of None.
        limit is a string giving the number of results (maximum 50, default "10").

        Returns the parsed JSON response, or False if no coords/location
        is given or the request does not return status code 200.

        Examples:
            - create_request(coords=[43.895962, -72.848752], limit="1")
            - create_request(coords=[43.895962, -72.848752], categories="10000", limit="2")
            - create_request(location=["Oshawa", "ON"], limit="2")
            - create_request(location=["Oshawa", "ON"], categories="10000", limit="20")
    """
    params = {"limit": limit}
    if coords:
        # Search around a latitude/longitude pair with a fixed 100 km radius
        params["ll"] = str(coords[0]) + "," + str(coords[1])
        params["radius"] = "100000"
    elif location:
        # Search near a named place; radius is not allowed with "near"
        params["near"] = str(location[0]) + "," + str(location[1])
    else:
        return False

    if categories:
        params["categories"] = categories

    # requests URL-encodes the parameters (the comma becomes %2C)
    search_url = "https://api.foursquare.com/v3/places/search"
    response = requests.get(search_url, params=params, headers=headers)

    if response.status_code == 200:
        return response.json()
    return False
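
<p>
The query string that <code>create_request</code> sends can be previewed without spending an API call. The sketch below rebuilds it using only the standard library. <code>build_search_url</code> is a hypothetical helper, not part of the notebook, and it assumes the same parameters (<code>ll</code>, <code>near</code>, <code>radius</code>, <code>categories</code>, <code>limit</code>) that appear in the function above.
</p>

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.foursquare.com/v3/places/search"


def build_search_url(coords=None, location=None, categories=None, limit="10"):
    """Return the search URL for the given arguments, or None if neither
    coords nor location is provided. No request is sent."""
    params = {}
    if coords:
        # [latitude, longitude] with the fixed 100 km radius used above
        params["ll"] = f"{coords[0]},{coords[1]}"
        params["radius"] = "100000"
    elif location:
        # [city, province]
        params["near"] = f"{location[0]},{location[1]}"
    else:
        return None
    if categories:
        params["categories"] = categories
    params["limit"] = limit
    # urlencode percent-encodes the comma as %2C
    return SEARCH_URL + "?" + urlencode(params)


print(build_search_url(location=["Toronto", "ON"], categories="15000", limit="50"))
```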

<h3><b>Creating Venue DataFrame</b></h3>

In [10]:
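# Coordinates of Toronto, used later to centre the map
latitude = 43.6532
longitude = -79.3832
hospitals = create_request(location = ["Toronto", "ON"], categories="15000", limit="50")

# Flatten the nested JSON returned by the Foursquare API into a DataFrame
hospital_venues = json_normalize(hospitals['results'], max_level=3)
# Drop unneeded columns by position, keeping the name, coordinates, locality, neighbourhood and postcode
hospital_venues.drop(hospital_venues.columns[[0, 1, 2, 3, 5, 8, 9, 10, 11, 13, 12, 17, 18, 19, 20, 21]], axis=1, inplace=True)

# Display the resulting DataFrame
hospital_venues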


Unnamed: 0,name,geocodes.main.latitude,geocodes.main.longitude,location.locality,location.neighborhood,location.postcode
0,Toronto Humane Society,43.657649,-79.356448,Toronto,[Trefann],M5A 4C2
1,West Park Health Care Centre,43.688064,-79.508431,Toronto,,
2,Birchmount Veterinary Clinic,43.761702,-79.290318,Toronto,[Wexford],M1P 2H4
3,Albany Medical Clinic,43.677835,-79.358341,Toronto,[Greektown],M4K 2P8
4,St. Joseph's Health Centre,43.639278,-79.450095,Toronto,[Little Poland],M6R 1B5
5,M-wing: Sunnybrook,43.721781,-79.376429,Toronto,,M4N 3M5
6,VEC Veterinary Emergency Clinic,43.673978,-79.389731,Toronto,,M4W 3C7
7,St Michael's Hospital,43.653818,-79.37758,Toronto,[Downtown Toronto],M5B 1W8
8,York Hill Endodontics,43.699655,-79.425217,Toronto,[York],M5P 3L1
9,Eglinton Veterinary Facilities,43.705266,-79.404445,Toronto,[Yonge and Eglinton],M4R 1A8


<h3><b>Scraping the Wikipedia Page for Postal Codes</b></h3>

In [4]:
path = os.getcwd()
path = os.path.join(path,"datasets/neighborhood-data.csv")
postcodes = pd.read_csv(path)
postcodes.drop(postcodes.columns[[0]], axis=1, inplace=True)
postcodes.head()



Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
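
<p>
The venue table stores full postal codes such as <code>M5A 4C2</code>, while this table is keyed by three-character codes such as <code>M5A</code>. One possible way to line the two up is to reduce each full code to its first three characters (the forward sortation area). The helper below is a sketch under that assumption, and <code>forward_sortation_area</code> is a hypothetical name.
</p>

```python
def forward_sortation_area(postcode):
    """Return the first three characters of a Canadian postal code in upper
    case, or None when the value is missing or too short."""
    if not isinstance(postcode, str):
        # Missing values (None or NaN) have no code to extract
        return None
    compact = postcode.replace(" ", "").upper()
    return compact[:3] if len(compact) >= 3 else None


print(forward_sortation_area("M5A 4C2"))
```

<p>
Applied to the venue postcode column (for example with <code>Series.map</code>), this would give a key that can be matched against <code>Postcode</code>.
</p>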


<h3><b>Scraping for Homicide Rates and Crime Rates</b></h3>

In [20]:
path2 = os.getcwd()
path2 = os.path.join(path2,"datasets/neighbourhood-crime-rates.csv")
crimedata = pd.read_csv(path2)
crimedata.drop(crimedata.columns[[0]], axis=1, inplace=True)
crimedata.head()

Unnamed: 0,Neighbourhood,Hood_ID,Population,Assault_AVG,Assault_Rate_2019,AutoTheft_AVG,AutoTheft_Rate_2019,Homicide_AVG,Homicide_Rate_2019
0,Yonge-St.Clair,97,12528,31.0,295.3,4.3,47.9,0.0,0.0
1,York University Heights,27,27593,333.2,1340.9,106.3,521.9,0.8,0.0
2,Lansing-Westgate,38,16164,70.7,445.4,23.7,198.0,1.7,0.0
3,Yorkdale-Glen Park,31,14804,160.2,1411.8,55.5,412.1,1.2,6.8
4,Stonegate-Queensway,16,25051,83.2,327.3,28.7,135.7,0.0,0.0
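
<p>
The <code>*_Rate_2019</code> columns appear to be incidents per 100,000 residents. This is inferred from the numbers shown, not stated by the dataset. For example, a single homicide (an assumed count) in Yorkdale-Glen Park, with a population of 14,804, would give the listed rate of 6.8. A sketch of that calculation, with the hypothetical helper <code>rate_per_100k</code>:
</p>

```python
def rate_per_100k(incidents, population):
    """Incidents per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return incidents / population * 100_000


# Assumed count of 1 homicide; the population is taken from the table above.
print(round(rate_per_100k(1, 14804), 1))
```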


<h3><b>Combining DataFrames</b></h3>

<h3><b>Mapping Hospital Venues</b></h3>


In [19]:
map_creation = folium.Map(location=[latitude, longitude], zoom_start=10)

# Use distinct loop variable names so the map-centre latitude/longitude are not overwritten
for name, lat, lng in zip(hospital_venues['name'], hospital_venues['geocodes.main.latitude'], hospital_venues['geocodes.main.longitude']):
    label = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_creation)

map_creation
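
<p>
The map above is centred on fixed coordinates. An alternative is to centre it on the venues themselves. The sketch below averages the coordinates, which is a reasonable approximation over a city-sized area. <code>centroid</code> is a hypothetical helper, and the two points are the first two rows of the venue table.
</p>

```python
def centroid(points):
    """Arithmetic mean of (latitude, longitude) pairs."""
    if not points:
        raise ValueError("need at least one point")
    lats = [p[0] for p in points]
    lngs = [p[1] for p in points]
    return sum(lats) / len(lats), sum(lngs) / len(lngs)


# Toronto Humane Society and West Park Health Care Centre
center = centroid([(43.657649, -79.356448), (43.688064, -79.508431)])
print(center)
```

<p>
The resulting pair could be passed as <code>location</code> to <code>folium.Map</code> in place of the fixed coordinates.
</p>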

<h3><b>Correlation #1: Hospitals and Homicide Rates</b></h3>


<h3><b>Mapping Police Venues</b></h3>
