# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera
### By: Matias Garib

This Jupyter Notebook contains all the code and brief comments for the Coursera Capstone project. The full report is available in the following GitHub repository: https://github.com/MatiasGarib/Coursera_Capstone

## Table of contents
* [Introduction: Business Problem]
* [Importing Datasets]
* [Data Cleaning]
* [Methodology]
* [Analysis]
* [Results and Discussion]
* [Conclusion]

--------------------
<h2> Introduction </h2>


People want to start going out and visiting restaurants, but they want to visit places with the best hygiene practices. The questions we want to answer, for the city of San Francisco, are: which are the cleanest restaurants in each neighborhood? Which are the safest neighborhoods to go out to eat?

--------------------

-----------------------------
<h2> Importing Datasets </h2>

1. The first dataset to be used consists of a **GeoJSON file with the names and boundaries of 92 San Francisco neighborhoods (GeoJSON)** 
2. **Foursquare APIs (URI)**
3. City of San Francisco Health Department’s **hygiene inspection program (CSV)** 

-----------------------------


In [356]:
pip install sodapy

Note: you may need to restart the kernel to use updated packages.


In [317]:
pip install fuzzy_pandas

Note: you may need to restart the kernel to use updated packages.


In [1]:
#Import required libraries
import pandas as pd
import numpy as np
import requests
import folium
import fuzzy_pandas as fpd
from sodapy import Socrata



<h3>SF neighborhoods and Hygiene Data</h3>

The neighborhoods and hygiene inspection datasets are easily accessible thanks to the Socrata API provided by the San Francisco government.

In [52]:
#Import Hygiene and Nhoods dataframes
client = Socrata("data.sfgov.org", None)
results = client.get("pyih-qa8i", limit=60000)
hygiene_df=pd.DataFrame.from_records(results)

nhoods=client.get("743h-p4bq", limit=60000)
nhoods_df=pd.DataFrame.from_records(nhoods)



In [53]:
print(hygiene_df.shape)
print(nhoods_df.shape)

(53973, 23)
(92, 4)


In [54]:
nhoods_df.head(2)

Unnamed: 0,sfar_distr,the_geom,nbrhood,nid
0,District 6 - Central North,"{'type': 'MultiPolygon', 'coordinates': [[[[-1...",Alamo Square,6e
1,District 6 - Central North,"{'type': 'MultiPolygon', 'coordinates': [[[[-1...",Anza Vista,6a


In [55]:
hygiene_df.head(2)

Unnamed: 0,business_id,business_name,business_address,business_city,business_state,business_postal_code,inspection_id,inspection_date,inspection_type,violation_id,...,inspection_score,business_latitude,business_longitude,business_location,:@computed_region_fyvs_ahh9,:@computed_region_p5aj_wyqh,:@computed_region_rxqg_mtj9,:@computed_region_yftq_j783,:@computed_region_bh8s_q3mv,:@computed_region_ajp5_b2md
0,69618,Fancy Wheatfield Bakery,1362 Stockton St,San Francisco,CA,94133,69618_20190304,2019-03-04T00:00:00.000,Complaint,69618_20190304_103130,...,,,,,,,,,,
1,97975,BREADBELLY,1408 Clement St,San Francisco,CA,94118,97975_20190725,2019-07-25T00:00:00.000,Routine - Unscheduled,97975_20190725_103124,...,96.0,,,,,,,,,


<h3> Foursquare </h3>

We will now use the Foursquare API to search for each neighborhood's restaurants.

In [56]:
#Setup my Foursquare account
client_id = 'U0BHFR2CGBOER0NS2E3LDULEVT032SXA3KVWLR2U1RTQBJCV' # your Foursquare ID
client_secret = 'WRRQIHUGH45BSIKD4HCNE5ZXRNAK3E1JJNIXVNRVBNYLZYEC' # your Foursquare Secret
version = '20180605' # Foursquare API version
category= '4d4b7105d754a06374d81259' #Food Category
limit=1000


print('Your credentials:')
print('CLIENT_ID: ' + client_id)
print('CLIENT_SECRET:' + client_secret)

Your credentials:
CLIENT_ID: U0BHFR2CGBOER0NS2E3LDULEVT032SXA3KVWLR2U1RTQBJCV
CLIENT_SECRET:WRRQIHUGH45BSIKD4HCNE5ZXRNAK3E1JJNIXVNRVBNYLZYEC


In [57]:
#I define a function that will get venues from all the neighborhoods listed in our nhoods dataframe
def getVenuesLoc(names, radius=1000):

    venues_list=[]
    unexplored_nhoods=[]
    explored_nhoods=[]
    for name in names:
        # build the API request; requests URL-encodes the query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': client_id,
            'client_secret': client_secret,
            'v': version,
            'near': '{},San Francisco, CA'.format(name),
            'categoryId': category,
            'radius': radius,
            'limit': limit}

        # make the GET request and skip neighbourhoods that return an error
        results = requests.get(url, params=params).json()
        if 'errorType' in results['meta']:
            print("Couldn't get venues from:", name)
            unexplored_nhoods.append(name)
        else:
            print(name)
            explored_nhoods.append(name)
            results = results["response"]['groups'][0]['items']
            # keep only relevant information for each nearby venue
            venues_list.append([(
                name,
                v['venue']['name'],
                v['venue']['location']['lat'],
                v['venue']['location']['lng'],
                v['venue']['categories'][0]['name']) for v in results])

    # build the dataframe once, after all neighbourhoods have been queried
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighbourhood', 'Venue', 'Venue_Latitude', 'Venue_Longitude', 'Venue_Category'])

    return(nearby_venues, explored_nhoods, unexplored_nhoods)
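
As a rough illustration of the response handling in the function above, the sketch below parses hand-written payloads that mimic the structure the function reads (`meta`, `response`, `groups`, `items`, `venue`). The helper name `parse_venues`, the sample venue and the error value are made up for this example and are not real API output.

```python
def parse_venues(nhood, payload):
    """Return (nhood, name, lat, lng, category) tuples, or None if the API reported an error."""
    if 'errorType' in payload['meta']:
        return None
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        rows.append((nhood,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     venue['categories'][0]['name']))
    return rows

# Hypothetical payloads, written by hand for illustration only
ok_payload = {
    'meta': {'code': 200},
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Example Cafe',
                   'location': {'lat': 37.77, 'lng': -122.43},
                   'categories': [{'name': 'Cafe'}]}}]}]},
}
error_payload = {'meta': {'code': 400, 'errorType': 'failed_geocode'}, 'response': {}}

parsed_ok = parse_venues('Alamo Square', ok_payload)
parsed_error = parse_venues('Bayview', error_payload)
print(parsed_ok)
print(parsed_error)
```

Returning `None` for failed lookups keeps the "explored" and "unexplored" bookkeeping separate from the parsing itself.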

Because some neighborhoods are grouped under a single name (separated by '/'), we split them before applying the function.

In [58]:
nhood_names=[]
for name in nhoods_df['nbrhood']:
    if '/' in name:
        split_name=name.split('/',1)
        nhood_names.append(split_name[0].strip())
        nhood_names.append(split_name[1].strip())
    else:
        nhood_names.append(name)
    

In [59]:
sf_venues, explored_nhoods, unexplored_nhoods = getVenuesLoc(nhood_names)

Alamo Square
Anza Vista
Balboa Terrace
Couldn't get venues from: Bayview
Bernal Heights
Buena Vista Park
Ashbury Heights
Couldn't get venues from: Central Richmond
Central Sunset
Clarendon Heights
Couldn't get venues from: Corona Heights
Cow Hollow
Crocker Amazon
Couldn't get venues from: Diamond Heights
Downtown
Duboce Triangle
Couldn't get venues from: Eureka Valley
Couldn't get venues from: Dolores Heights
Excelsior
Financial District
Couldn't get venues from: Barbary Coast
Couldn't get venues from: Yerba Buena
Forest Hill
Couldn't get venues from: Forest Hills Extension
Forest Knolls
Glen Park
Golden Gate Heights
Golden Gate Park
Haight Ashbury
Hayes Valley
Hunters Point
Ingleside
Ingleside Heights
Ingleside Terrace
Couldn't get venues from: Inner Mission
Inner Parkside
Couldn't get venues from: Inner Richmond
Inner Sunset
Jordan Park
Laurel Heights
Couldn't get venues from: Lake Street
Monterey Heights
Couldn't get venues from: Lake Shore
Lakeside
Lone Mountain
Lower Pacific Heigh

In [60]:
print(sf_venues.shape)
print(hygiene_df.shape)
print(nhoods_df.shape)

(6120, 5)
(53973, 23)
(92, 4)


------------------------------------------
<h2> Data Cleaning </h2>


By now we have three datasets imported:
- SF Neighborhoods (92, 4)
- Hygiene inspections (53973, 23)
- SF Venues from Foursquare API (6120, 5)

Our approach will be the following:

1. Clean the hygiene inspections df by removing all inspections without an inspection score and removing duplicates (many restaurants are inspected more than once, so we keep the latest inspection score).

2. Append the hygiene columns to our venues df. Names can vary slightly between the two dataframes, so we use the *Fuzzy Merge* function to match similar but not exact strings.

------------------------------------------
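
The first cleaning step can be sketched on toy data as follows. The rows are invented for illustration. Unlike the notebook, which deduplicates on `business_name` alone, this variant also keys on `business_address`, on the assumption that chains with several locations should keep one row per location.

```python
import pandas as pd

# Made-up inspection records; dates are ISO strings, so sorting them as text is chronological
toy = pd.DataFrame({
    'business_name': ['Cafe A', 'Cafe A', 'Cafe B', 'Cafe B', 'Cafe C'],
    'business_address': ['1 Main St', '1 Main St', '2 Oak St', '9 Pine St', '3 Elm St'],
    'inspection_date': ['2018-01-10T00:00:00.000', '2019-03-05T00:00:00.000',
                        '2019-02-01T00:00:00.000', '2018-06-01T00:00:00.000',
                        '2019-05-01T00:00:00.000'],
    'inspection_score': ['90', '96', '88', '75', None],
})

# Drop inspections without a score, then keep the most recent row per business and address
scored = toy[toy['inspection_score'].notna()]
latest = (scored.sort_values('inspection_date', ascending=False)
                .drop_duplicates(subset=['business_name', 'business_address'], keep='first')
                .sort_values(['business_name', 'business_address'])
                .reset_index(drop=True))
print(latest)
```

Here Cafe C disappears because it has no score, Cafe A keeps only its 2019 inspection, and both Cafe B locations survive.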

In [61]:
hygiene_df.columns

Index(['business_id', 'business_name', 'business_address', 'business_city',
       'business_state', 'business_postal_code', 'inspection_id',
       'inspection_date', 'inspection_type', 'violation_id',
       'violation_description', 'risk_category', 'business_phone_number',
       'inspection_score', 'business_latitude', 'business_longitude',
       'business_location', ':@computed_region_fyvs_ahh9',
       ':@computed_region_p5aj_wyqh', ':@computed_region_rxqg_mtj9',
       ':@computed_region_yftq_j783', ':@computed_region_bh8s_q3mv',
       ':@computed_region_ajp5_b2md'],
      dtype='object')

In [62]:
#Removing Nan Values and keeping the important columns
hygiene_df=hygiene_df[['business_name','business_address','inspection_date', 'inspection_type','violation_description', 'risk_category','inspection_score' ]]
hygiene_df = hygiene_df[hygiene_df['inspection_score'].notna()]
hygiene_df

Unnamed: 0,business_name,business_address,inspection_date,inspection_type,violation_description,risk_category,inspection_score
1,BREADBELLY,1408 Clement St,2019-07-25T00:00:00.000,Routine - Unscheduled,Inadequately cleaned or sanitized food contact...,Moderate Risk,96
2,Hakkasan San Francisco,1 Kearny St,2018-04-18T00:00:00.000,Routine - Unscheduled,Inadequate and inaccessible handwashing facili...,Moderate Risk,88
4,Tselogs,552 Jones St,2018-04-12T00:00:00.000,Routine - Unscheduled,Improper thawing methods,Moderate Risk,94
8,"The Estate Kitchen, LLC",799 Bryant St,2018-04-16T00:00:00.000,Routine - Unscheduled,Improper food storage,Low Risk,86
9,Beloved Cafe,3338 24th St,2018-05-02T00:00:00.000,Routine - Unscheduled,Low risk vermin infestation,Low Risk,96
...,...,...,...,...,...,...,...
53967,El Gran Taco Loco,4591 Mission St.,2019-05-06T00:00:00.000,Routine - Unscheduled,Insufficient hot water or running water,Moderate Risk,76
53968,Blue Bottle Coffee,2 South Park,2019-05-06T00:00:00.000,Routine - Unscheduled,Inadequately cleaned or sanitized food contact...,Moderate Risk,80
53970,Philz Coffee,300 Folsom St,2019-05-06T00:00:00.000,Routine - Unscheduled,Foods not protected from contamination,Moderate Risk,92
53971,El Gran Taco Loco,4591 Mission St.,2019-05-06T00:00:00.000,Routine - Unscheduled,Inadequate food safety knowledge or lack of ce...,Moderate Risk,76


In [63]:
#Removing duplicates from the hygiene df
hygiene_df=hygiene_df.sort_values('inspection_date', ascending=False).drop_duplicates(subset='business_name', keep='first')
hygiene_df

Unnamed: 0,business_name,business_address,inspection_date,inspection_type,violation_description,risk_category,inspection_score
14552,Frisco Fried,5176 03rd St,2019-10-03T00:00:00.000,Routine - Unscheduled,Low risk vermin infestation,Low Risk,92
6787,Cafe Majestic,1500 SUTTER St,2019-10-03T00:00:00.000,Routine - Unscheduled,Low risk vermin infestation,Low Risk,84
13344,Sears Fine Food,439 Powell St,2019-10-03T00:00:00.000,Routine - Unscheduled,Unapproved or unmaintained equipment or utensils,Low Risk,91
11908,Tokyo Express,160 Spear St Lobby ID,2019-10-03T00:00:00.000,Routine - Unscheduled,Foods not protected from contamination,Moderate Risk,87
14354,SHERIDAN ELEMENTARY SCHOOL,431 CAPITOL Ave,2019-10-03T00:00:00.000,Routine - Unscheduled,Inadequate food safety knowledge or lack of ce...,Moderate Risk,92
...,...,...,...,...,...,...,...
25331,Sally's Restaurant and Deli,300 De Haro St #332,2016-10-06T00:00:00.000,Routine - Unscheduled,Unclean or degraded floors walls or ceilings,Low Risk,71
27998,CATER THYME,1 UNITED NATIONS Plz,2016-10-05T00:00:00.000,Routine - Unscheduled,,,100
24684,Way To Life Foods,1 United Nations Plaza,2016-10-05T00:00:00.000,Routine - Unscheduled,,,100
24270,Hey Hey Gourmet,1 United Nations Plaza,2016-10-05T00:00:00.000,Routine - Unscheduled,,,100


In [64]:
#Fuzzy Merge between the two datasets
sf_venues=fpd.fuzzy_merge(sf_venues, hygiene_df,
                        keep='all',
                        left_on=['Venue'],
                        right_on=['business_name'],
                        method='metaphone',
                        ignore_nonalpha=True,
                        ignore_nonlatin=True,
                        ignore_case=True,
                        join='inner')

sf_venues.head()

Unnamed: 0,Neighbourhood,Venue,Venue_Latitude,Venue_Longitude,Venue_Category,business_name,business_address,inspection_date,inspection_type,violation_description,risk_category,inspection_score
0,Alamo Square,Little Star Pizza,37.777489,-122.438281,Pizza Place,Little Star Pizza,846 Divisadero,2018-04-24T00:00:00.000,Routine - Unscheduled,,,100
1,Alamo Square,Brenda's Meat & Three,37.778265,-122.438584,Southern / Soul Food Restaurant,Brendas Meat & Three,919 DIVISADERO ST,2019-03-13T00:00:00.000,Routine - Unscheduled,Unapproved or unmaintained equipment or utensils,Low Risk,92
2,Alamo Square,The Mill,37.776425,-122.43797,Bakery,The Mill,736 DIVISADERO St,2019-04-11T00:00:00.000,Routine - Unscheduled,Unapproved or unmaintained equipment or utensils,Low Risk,88
3,Alamo Square,Jane the Bakery,37.783797,-122.434283,Bakery,Jane the Bakery,1875 Geary Blvd,2019-07-03T00:00:00.000,Routine - Unscheduled,Unclean or unsanitary food contact surfaces,High Risk,87
4,Alamo Square,The Progress,37.783745,-122.432972,American Restaurant,The Progress,1525 Fillmore St,2019-02-14T00:00:00.000,Routine - Unscheduled,Moderate risk food holding temperature,Moderate Risk,90


In [65]:
sf_venues.shape

(5223, 12)
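
A phonetic merge can pair names that sound alike to the algorithm but are clearly different businesses; the merged data contains pairs such as Zaytoon / ST. ANNE and 1760 / 903 (the latter presumably because non-alphabetic characters are ignored). One possible sanity check, sketched here with the standard library's `difflib` rather than with `fuzzy_pandas`, is to score each matched pair and keep only the close ones. The helper name and the 0.8 cut-off are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Similarity in [0, 1] between two names, ignoring case and non-alphanumeric characters."""
    def clean(s):
        return ''.join(ch for ch in s.lower() if ch.isalnum())
    return SequenceMatcher(None, clean(a), clean(b)).ratio()

# Venue / business_name pairs taken from the merged output
pairs = [("Brenda's Meat & Three", "Brendas Meat & Three"),
         ("Zaytoon", "ST. ANNE"),
         ("1760", "903")]

threshold = 0.8  # hypothetical cut-off; tune it by inspecting borderline pairs
kept = [(a, b) for a, b in pairs if name_similarity(a, b) >= threshold]
print(kept)
```

Applied to the full dataframe, the same score could be stored as an extra column and used to filter rows before the analysis.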

In [70]:
print(sf_venues['risk_category'].isna().sum())
print(sf_venues['violation_description'].isna().sum())
print((sf_venues['inspection_score']==100).sum())

312
312
286


In [68]:
# We notice that rows with no violation description and no risk category assigned are the ones with inspection score 100
sf_venues['inspection_score'] = sf_venues['inspection_score'].astype(int)
sf_venues[(sf_venues['violation_description'].isna()) & (sf_venues['risk_category'].isna()) & (sf_venues['inspection_score']==100)]

Unnamed: 0,Neighbourhood,Venue,Venue_Latitude,Venue_Longitude,Venue_Category,business_name,business_address,inspection_date,inspection_type,violation_description,risk_category,inspection_score
0,Alamo Square,Little Star Pizza,37.777489,-122.438281,Pizza Place,Little Star Pizza,846 Divisadero,2018-04-24T00:00:00.000,Routine - Unscheduled,,,100
22,Alamo Square,Lady Falcon Coffee Club,37.775969,-122.433959,Food Truck,Lady Falcon Coffee Club,Beach Chalet Soccer Field Parking Lot,2017-04-15T00:00:00.000,Routine - Unscheduled,,,100
26,Alamo Square,Zaytoon,37.775185,-122.437896,Mediterranean Restaurant,ST. ANNE,1320 14th Ave,2018-11-30T00:00:00.000,Routine - Unscheduled,,,100
39,Alamo Square,Gardenias,37.786109,-122.432710,Restaurant,Gratta Wines,5273 B 03rd St,2019-05-20T00:00:00.000,Routine - Unscheduled,,,100
83,Anza Vista,Little Star Pizza,37.777489,-122.438281,Pizza Place,Little Star Pizza,846 Divisadero,2018-04-24T00:00:00.000,Routine - Unscheduled,,,100
...,...,...,...,...,...,...,...,...,...,...,...,...
5165,Nob Hill,Acquerello,37.791669,-122.421407,Italian Restaurant,Acquerello,1722 Sacramento St,2018-11-07T00:00:00.000,Routine - Unscheduled,,,100
5171,Nob Hill,1760,37.793206,-122.421211,New American Restaurant,903,1566 Carroll Ave,2019-04-18T00:00:00.000,Routine - Unscheduled,,,100
5175,Nob Hill,Another Cafe,37.790169,-122.415404,Café,Another Cafe,1191 Pine St,2019-06-17T00:00:00.000,Routine - Unscheduled,,,100
5195,Nob Hill,Rue Lepic,37.790913,-122.410770,French Restaurant,Rue Lepic,900 Pine St,2019-06-03T00:00:00.000,Routine - Unscheduled,,,100


In [81]:
#So we change them to No Risk and No Violation
sf_venues['violation_description'] = np.where((sf_venues['inspection_score']==100) , 'No Violation', sf_venues['violation_description'])
sf_venues['risk_category'] = np.where((sf_venues['inspection_score']==100) , 'No risk', sf_venues['risk_category'])
sf_venues

Unnamed: 0,Neighbourhood,Venue,Venue_Latitude,Venue_Longitude,Venue_Category,business_name,business_address,inspection_date,inspection_type,violation_description,risk_category,inspection_score
0,Alamo Square,Little Star Pizza,37.777489,-122.438281,Pizza Place,Little Star Pizza,846 Divisadero,2018-04-24T00:00:00.000,Routine - Unscheduled,No Violation,No risk,100
1,Alamo Square,Brenda's Meat & Three,37.778265,-122.438584,Southern / Soul Food Restaurant,Brendas Meat & Three,919 DIVISADERO ST,2019-03-13T00:00:00.000,Routine - Unscheduled,Unapproved or unmaintained equipment or utensils,Low Risk,92
2,Alamo Square,The Mill,37.776425,-122.437970,Bakery,The Mill,736 DIVISADERO St,2019-04-11T00:00:00.000,Routine - Unscheduled,Unapproved or unmaintained equipment or utensils,Low Risk,88
3,Alamo Square,Jane the Bakery,37.783797,-122.434283,Bakery,Jane the Bakery,1875 Geary Blvd,2019-07-03T00:00:00.000,Routine - Unscheduled,Unclean or unsanitary food contact surfaces,High Risk,87
4,Alamo Square,The Progress,37.783745,-122.432972,American Restaurant,The Progress,1525 Fillmore St,2019-02-14T00:00:00.000,Routine - Unscheduled,Moderate risk food holding temperature,Moderate Risk,90
...,...,...,...,...,...,...,...,...,...,...,...,...
5218,Nob Hill,Osso Steakhouse,37.791447,-122.413530,Steakhouse,Osso Steakhouse,1177 California St,2019-06-03T00:00:00.000,Routine - Unscheduled,Inadequately cleaned or sanitized food contact...,Moderate Risk,96
5219,Nob Hill,Batter Bakery,37.789551,-122.420776,Bakery,Batter Bakery,1517 Pine St,2018-08-21T00:00:00.000,Routine - Unscheduled,Wiping cloths not clean or properly stored or ...,Low Risk,98
5220,Nob Hill,Nobhill Pizza & Shawerma,37.790767,-122.419747,Pizza Place,Nobhill Pizza & Shawerma,1534 California St,2019-09-23T00:00:00.000,Routine - Unscheduled,High risk food holding temperature,High Risk,93
5221,Nob Hill,Kasa Indian Eatery,37.789655,-122.420449,Indian Restaurant,Kasa Indian Eatery,4001 18th St,2019-09-23T00:00:00.000,Routine - Unscheduled,Insufficient hot water or running water,Moderate Risk,86


In [91]:
# We finally drop the remaining NaN values to get our final df
sf_venues = sf_venues[sf_venues['violation_description'].notna()]
sf_venues = sf_venues[sf_venues['risk_category'].notna()]
sf_venues

Unnamed: 0,Neighbourhood,Venue,Venue_Latitude,Venue_Longitude,Venue_Category,business_name,business_address,inspection_date,inspection_type,violation_description,risk_category,inspection_score
0,Alamo Square,Little Star Pizza,37.777489,-122.438281,Pizza Place,Little Star Pizza,846 Divisadero,2018-04-24T00:00:00.000,Routine - Unscheduled,No Violation,No risk,100
1,Alamo Square,Brenda's Meat & Three,37.778265,-122.438584,Southern / Soul Food Restaurant,Brendas Meat & Three,919 DIVISADERO ST,2019-03-13T00:00:00.000,Routine - Unscheduled,Unapproved or unmaintained equipment or utensils,Low Risk,92
2,Alamo Square,The Mill,37.776425,-122.437970,Bakery,The Mill,736 DIVISADERO St,2019-04-11T00:00:00.000,Routine - Unscheduled,Unapproved or unmaintained equipment or utensils,Low Risk,88
3,Alamo Square,Jane the Bakery,37.783797,-122.434283,Bakery,Jane the Bakery,1875 Geary Blvd,2019-07-03T00:00:00.000,Routine - Unscheduled,Unclean or unsanitary food contact surfaces,High Risk,87
4,Alamo Square,The Progress,37.783745,-122.432972,American Restaurant,The Progress,1525 Fillmore St,2019-02-14T00:00:00.000,Routine - Unscheduled,Moderate risk food holding temperature,Moderate Risk,90
...,...,...,...,...,...,...,...,...,...,...,...,...
5218,Nob Hill,Osso Steakhouse,37.791447,-122.413530,Steakhouse,Osso Steakhouse,1177 California St,2019-06-03T00:00:00.000,Routine - Unscheduled,Inadequately cleaned or sanitized food contact...,Moderate Risk,96
5219,Nob Hill,Batter Bakery,37.789551,-122.420776,Bakery,Batter Bakery,1517 Pine St,2018-08-21T00:00:00.000,Routine - Unscheduled,Wiping cloths not clean or properly stored or ...,Low Risk,98
5220,Nob Hill,Nobhill Pizza & Shawerma,37.790767,-122.419747,Pizza Place,Nobhill Pizza & Shawerma,1534 California St,2019-09-23T00:00:00.000,Routine - Unscheduled,High risk food holding temperature,High Risk,93
5221,Nob Hill,Kasa Indian Eatery,37.789655,-122.420449,Indian Restaurant,Kasa Indian Eatery,4001 18th St,2019-09-23T00:00:00.000,Routine - Unscheduled,Insufficient hot water or running water,Moderate Risk,86


Nice! We now have 5196 restaurants in our df, each with location information (neighborhood, latitude, longitude) as well as hygiene data. We can map them to finish this section.

In [92]:
sf_center = [37.7749, -122.4194]
sf_map = folium.Map(location=sf_center, zoom_start=13)
folium.Marker(sf_center, popup='City Center').add_to(sf_map)
for name, lat, lng in zip(sf_venues.Venue, sf_venues.Venue_Latitude, sf_venues.Venue_Longitude):
    color = 'blue'
    folium.CircleMarker([lat, lng], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(sf_map)
sf_map

------------------------------------------
<h2> Methodology </h2>

This project's main goal is to classify each of San Francisco's neighborhoods according to the infection risk of their restaurants.

(1) We've already imported and cleaned the datasets and now have a large dataset of restaurants and their hygiene inspection results.

(2) We'll first perform a visual inspection of the data, visualizing where the high restaurant density spots are across San Francisco. We will also create choropleth maps with the inspection scores per neighborhood.

(3) The final analysis will focus on the neighborhood divisions. We will create indicators of risk density to cluster neighborhoods according to their restaurants' hygiene practices and density. The best neighborhoods to go out to dinner will be the ones with the most restaurants and where these have high hygiene standards.


------------------------------------------
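
The indicators mentioned in step (3) are not defined at this point in the notebook. One plausible reading, sketched on invented data, is a per-neighborhood table with the number of restaurants, the mean inspection score and the share of high-risk restaurants. The column names `n_restaurants`, `mean_score` and `high_risk_share` are assumptions made for this example.

```python
import pandas as pd

# Made-up rows with the same columns as the cleaned venues dataframe
toy = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'B', 'B'],
    'inspection_score': [100, 90, 80, 70, 90],
    'risk_category': ['No risk', 'Low Risk', 'High Risk', 'High Risk', 'Moderate Risk'],
})

# One row per neighborhood: restaurant count, mean score, fraction flagged as high risk
indicators = (toy.assign(high_risk=toy['risk_category'].eq('High Risk'))
                 .groupby('Neighbourhood')
                 .agg(n_restaurants=('inspection_score', 'size'),
                      mean_score=('inspection_score', 'mean'),
                      high_risk_share=('high_risk', 'mean')))
print(indicators)
```

A table of this shape could then be fed to a clustering step, with the features scaled first so that the restaurant count does not dominate.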

<h3> Visual Inspection: Restaurant Heatmaps </h3>

In [130]:
#We first pre-process the data for inspection
#astype returns a new dataframe, so we avoid assigning into a slice of sf_venues (SettingWithCopyWarning)
restaurants_latlng=sf_venues[['Venue_Latitude','Venue_Longitude']].astype(float)
restaurants_latlng_list=restaurants_latlng.values.tolist()

In [167]:
#Heat Map of all the restaurants analyzed

from folium import plugins
from folium.plugins import HeatMap

sf_geo = 'https://data.sfgov.org/resource/743h-p4bq.geojson'

def boroughs_style(feature):
    return { 'color': 'blue', 'fill': False, 'stroke': True, 'weight': 0.5 }

sf_map = folium.Map(location=sf_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(sf_map)
HeatMap(restaurants_latlng_list, min_opacity=0.2, max_val=0.1, radius=15).add_to(sf_map)
folium.GeoJson(sf_geo, name='json', style_function=boroughs_style,).add_to(sf_map)
sf_map

In [186]:

sf_venues['Venue_Latitude']=sf_venues['Venue_Latitude'].astype(float)
sf_venues['Venue_Longitude']=sf_venues['Venue_Longitude'].astype(float)

#Pre Processing to produce heat maps of restaurants of none, low, medium and high risk
no_risk=sf_venues.loc[sf_venues['risk_category'] == 'No risk']
low_risk=sf_venues.loc[sf_venues['risk_category'] == 'Low Risk']
mod_risk=sf_venues.loc[sf_venues['risk_category'] == 'Moderate Risk']
high_risk=sf_venues.loc[sf_venues['risk_category'] == 'High Risk']

#We then create a list to input each heatmap

#No risk
norisk_latlng=no_risk[['Venue_Latitude','Venue_Longitude']]
norisk_latlng_list=norisk_latlng.values.tolist()

#Low Risk
lowrisk_latlng=low_risk[['Venue_Latitude','Venue_Longitude']]
lowrisk_latlng_list=lowrisk_latlng.values.tolist()

#Moderate Risk
modrisk_latlng=mod_risk[['Venue_Latitude','Venue_Longitude']]
modrisk_latlng_list=modrisk_latlng.values.tolist()

#High Risk
highrisk_latlng=high_risk[['Venue_Latitude','Venue_Longitude']]
highrisk_latlng_list=highrisk_latlng.values.tolist()
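
The four per-category lists built above follow the same pattern, so they could also be produced in one pass. The sketch below does this with `groupby` on made-up coordinates; the dictionary name `latlng_by_risk` is an assumption introduced for the example.

```python
import pandas as pd

# Made-up venues; coordinates start as strings, as they can after the merge
toy = pd.DataFrame({
    'risk_category': ['No risk', 'Low Risk', 'Low Risk', 'High Risk'],
    'Venue_Latitude': ['37.77', '37.78', '37.79', '37.80'],
    'Venue_Longitude': ['-122.43', '-122.42', '-122.41', '-122.40'],
})

# Convert the coordinates once, then split them into one [lat, lng] list per risk category
coords = toy[['Venue_Latitude', 'Venue_Longitude']].astype(float)
latlng_by_risk = {risk: grp.values.tolist()
                  for risk, grp in coords.groupby(toy['risk_category'])}
print(latlng_by_risk['Low Risk'])
```

Categories with no rows are simply absent from the dictionary, so `latlng_by_risk.get('Moderate Risk', [])` is a safe way to read them.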

In [None]:
# No risk heatmap

In [193]:
norisk_map = folium.Map(location=sf_center, zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(norisk_map)
HeatMap(norisk_latlng_list, min_opacity=0.2, max_val=0.1, radius=15).add_to(norisk_map)
folium.GeoJson(sf_geo, name='json', style_function=boroughs_style,).add_to(norisk_map)
norisk_map

In [194]:
lowrisk_map = folium.Map(location=sf_center, zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(lowrisk_map)
HeatMap(lowrisk_latlng_list, min_opacity=0.2, max_val=0.1, radius=15).add_to(lowrisk_map)
folium.GeoJson(sf_geo, name='json', style_function=boroughs_style,).add_to(lowrisk_map)
lowrisk_map

In [195]:
modrisk_map = folium.Map(location=sf_center, zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(modrisk_map)
HeatMap(modrisk_latlng_list, min_opacity=0.2, max_val=0.1, radius=15).add_to(modrisk_map)
folium.GeoJson(sf_geo, name='json', style_function=boroughs_style,).add_to(modrisk_map)
modrisk_map

In [196]:
highrisk_map = folium.Map(location=sf_center, zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(highrisk_map)
HeatMap(highrisk_latlng_list, min_opacity=0.2, max_val=0.1, radius=15).add_to(highrisk_map)
folium.GeoJson(sf_geo, name='json', style_function=boroughs_style,).add_to(highrisk_map)
highrisk_map

In [199]:
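# create a numpy array of length 6 with linear spacing from the minimum inspection score to the maximum inspection score
threshold_scale = np.linspace(sf_venues['inspection_score'].min(),
                              sf_venues['inspection_score'].max(),
                              6, dtype=int)
threshold_scale = threshold_scale.tolist() # change the numpy array to a list
threshold_scale[-1] = threshold_scale[-1] + 1 # make sure that the last value of the list is greater than the maximum inspection score

# average the inspection scores so that there is one value per neighborhood
nhood_scores = sf_venues.groupby('Neighbourhood', as_index=False)['inspection_score'].mean()

# choropleth of the mean inspection score per neighborhood
# note: key_on assumes the GeoJSON stores the name under 'nbrhood', as in nhoods_df;
# names that were split on '/' will not match a grouped polygon name
# the 'Mapbox Bright' tiles are not available in recent folium versions, so we reuse 'cartodbpositron'
cloro_map = folium.Map(location=sf_center, zoom_start=12, tiles='cartodbpositron')
cloro_map.choropleth(
    geo_data=sf_geo,
    data=nhood_scores,
    columns=['Neighbourhood', 'inspection_score'],
    key_on='feature.properties.nbrhood',
    threshold_scale=threshold_scale,
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Mean inspection score',
    reset=True
)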
cloro_map
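
To make the threshold scale concrete, here is the same computation on a few invented scores. With a minimum of 50 and a maximum of 100, six linearly spaced integer thresholds are produced, and the last one is bumped by 1 so that the maximum score falls inside the final bin.

```python
import numpy as np

scores = [50, 72, 88, 100]  # made-up inspection scores, for illustration only

# Six evenly spaced integer thresholds between the minimum and maximum score
scale = np.linspace(min(scores), max(scores), 6, dtype=int).tolist()
scale[-1] = scale[-1] + 1  # the top threshold must exceed the maximum score
print(scale)  # [50, 60, 70, 80, 90, 101]
```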