<a href="https://colab.research.google.com/github/ALK26/Projects/blob/master/San_Francisco_Police_Dept_Incidents_2016_Folium_Maps.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Datasets:

# San Francisco Police Department Incidents for the year 2016 - Police Department Incidents from San Francisco public data portal. 
# Incidents derived from San Francisco Police Department (SFPD) Crime Incident Reporting system. 
# Updated daily, showing data for the entire year of 2016.
# Addresses and locations have been anonymized by moving them to mid-block or to an intersection.


In [2]:
# Import Primary Modules:

import numpy as np  # useful for scientific computing in Python
import pandas as pd # primary data structure library


In [3]:
# Exploring Datasets with pandas and Folium
# Toolkits: this notebook relies on pandas and NumPy for data wrangling and analysis. The primary plotting library used to explore this data is Folium.

# Generating Maps with Python - creating maps for different objectives.
# To do that, we work with another Python visualization library, namely Folium.
# Folium was developed for the sole purpose of visualizing geospatial data.
# Other libraries, such as plotly, can also visualize geospatial data, but they might cap how many API calls you can make within a defined time frame.
# Folium, on the other hand, is completely free.

# Introduction to Folium
# Folium is a powerful Python library that helps create several types of Leaflet maps. 
# The fact that the Folium results are interactive makes this library very useful for dashboard building.

# From the official Folium documentation page:
# Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. 
# Manipulate your data in Python, then visualize it on a Leaflet map via Folium.

# Folium makes it easy to visualize data that's been manipulated in Python on an interactive Leaflet map. 
# It enables both the binding of data to a map for choropleth visualizations as well as passing Vincent/Vega visualizations as markers on the map.

# The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. 
# Folium supports both GeoJSON and TopoJSON overlays, as well as the binding of data to those overlays to create choropleth maps with color-brewer color schemes.

# Install Folium
# Folium may not be available by default, so install it before importing it.

!pip install -q folium
import folium

print('Folium installed and imported!')

Folium installed and imported!


In [4]:
# Generating the world map is straightforward in Folium.
# Simply create a Folium Map object and then display it. What is attractive about Folium maps is that they are interactive, so you can zoom into any region of interest regardless of the initial zoom level.

# define the world map
world_map = folium.Map()

# display world map
world_map

In [5]:
# You can customize this default definition of the world map by specifying the centre of your map and the initial zoom level.
# All locations on a map are defined by their respective latitude and longitude values.
# For example, you can create a map and pass in a center with latitude and longitude values of [0, 0].

# For a defined center, you can also define the initial zoom level into that location when the map is rendered.
# The higher the zoom level, the more the map is zoomed into the center.

# Example: create a map centered around Canada 

# define the world map centered around Canada with a low zoom level
world_map = folium.Map(location=[56.130, -106.35], zoom_start=4)

# display world map
world_map
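
Folium expects `location` as `[latitude, longitude]`, in that order, and swapping the two values is an easy mistake to make. As a minimal sketch (the helper name `validate_location` is hypothetical and not part of Folium), a small range check can catch swapped or out-of-range values before a map is created:

```python
def validate_location(location):
    """Return (lat, lon) as floats, or raise ValueError if a value is out of range."""
    lat, lon = (float(value) for value in location)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} is outside [-90, 90]; were the values swapped?")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} is outside [-180, 180]")
    return lat, lon


# latitude first, then longitude (the coordinates used for Canada above)
canada = validate_location([56.130, -106.35])
```

The checked pair could then be passed on as `location=list(canada)`.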

In [6]:
# create the map again with a higher zoom level

# define the world map centered around Canada with a higher zoom level
world_map = folium.Map(location=[56.130, -106.35], zoom_start=8)

# display world map
world_map
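
To get a feel for what the two zoom levels mean: in the standard web-map tiling scheme that Leaflet's tile layers follow, zoom level z splits the world into 2^z by 2^z tiles, so each extra level doubles the detail along each axis. The sketch below only does that arithmetic; the function names are illustrative and nothing here calls Folium.

```python
def tiles_per_side(zoom):
    """Number of tiles along one axis of the world map at a given zoom level."""
    return 2 ** zoom


def degrees_of_longitude_per_tile(zoom):
    """Width of a single tile in degrees of longitude."""
    return 360.0 / tiles_per_side(zoom)


# compare the two zoom levels used for the maps of Canada
for zoom in (4, 8):
    print(zoom, tiles_per_side(zoom), degrees_of_longitude_per_tile(zoom))
```

Going from zoom 4 to zoom 8 therefore makes each tile 16 times narrower, which is why the second map shows a much smaller area.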

In [7]:
# Example: Create a map of Mexico with a zoom level of 4.

# define Mexico's geolocation coordinates
mexico_latitude = 23.6345 
mexico_longitude = -102.5528

# define the map centered around Mexico with a zoom level of 4
mexico_map = folium.Map(location=[mexico_latitude, mexico_longitude], zoom_start=4)

# display the map of Mexico
mexico_map

In [8]:
# Another feature of Folium is that it can generate different map styles.

# A. Stamen Toner Maps - high-contrast B+W (black and white) maps. They are perfect for data mashups and exploring river meanders and coastal zones.

# Example: create a Stamen Toner map of Canada with a zoom level of 4.
world_map = folium.Map(location=[56.130, -106.35], zoom_start=4, tiles='Stamen Toner')

# display map
world_map

In [9]:
# B. Stamen Terrain Maps - maps that feature hill shading and natural vegetation colors. They showcase advanced labeling and linework generalization of dual-carriageway roads.

# create a Stamen Terrain map of Canada with zoom level 4.
world_map = folium.Map(location=[56.130, -106.35], zoom_start=4, tiles='Stamen Terrain')

# display map
world_map

In [10]:
# Example: Create a map of Mexico to visualize its hill shading and natural vegetation. Use a zoom level of 6.

# define Mexico's geolocation coordinates
mexico_latitude = 23.6345 
mexico_longitude = -102.5528

# define the map centered around Mexico with a zoom level of 6
mexico_map = folium.Map(location=[mexico_latitude, mexico_longitude], zoom_start=6, tiles='Stamen Terrain')

# display the map of Mexico
mexico_map

In [11]:
# Maps with Markers

# Download the data on San Francisco Police Department incidents and read it into a pandas dataframe using the pandas read_csv() method:

df_incidents = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Data_Files/Police_Department_Incidents_-_Previous_Year__2016_.csv')
print('Dataset downloaded and read into a pandas dataframe!')

Dataset downloaded and read into a pandas dataframe!


In [12]:
# Take a look at the first five rows of the dataset.

df_incidents.head()

Unnamed: 0,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId
0,120058272,WEAPON LAWS,POSS OF PROHIBITED WEAPON,Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212120
1,120058272,WEAPON LAWS,"FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE",Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212168
2,141059263,WARRANTS,WARRANT ARREST,Monday,04/25/2016 12:00:00 AM,14:59,BAYVIEW,"ARREST, BOOKED",KEITH ST / SHAFTER AV,-122.388856,37.729981,"(37.7299809672996, -122.388856204292)",14105926363010
3,160013662,NON-CRIMINAL,LOST PROPERTY,Tuesday,01/05/2016 12:00:00 AM,23:50,TENDERLOIN,NONE,JONES ST / OFARRELL ST,-122.412971,37.785788,"(37.7857883766888, -122.412970537591)",16001366271000
4,160002740,NON-CRIMINAL,LOST PROPERTY,Friday,01/01/2016 12:00:00 AM,00:30,MISSION,NONE,16TH ST / MISSION ST,-122.419672,37.76505,"(37.7650501214668, -122.419671780296)",16000274071000


In [13]:
# So each row consists of 13 features:

# IncidntNum: Incident Number
# Category: Category of crime or incident
# Descript: Description of the crime or incident
# DayOfWeek: The day of week on which the incident occurred
# Date: The Date on which the incident occurred
# Time: The time of day on which the incident occurred
# PdDistrict: The police department district
# Resolution: The resolution of the crime in terms of whether the perpetrator was arrested or not
# Address: The closest address to where the incident took place
# X: The longitude value of the crime location
# Y: The latitude value of the crime location
# Location: A tuple of the latitude and the longitude values
# PdId: The police department ID

# Find out how many entries there are in the dataset.

df_incidents.shape

(150500, 13)
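
Note that X holds the longitude and Y the latitude, while the Location column stores the pair in (latitude, longitude) order as text. As a small sketch, assuming Location is read in as a string like the one shown in the first row above, it can be turned back into two floats with the standard library (`parse_location` is a hypothetical helper name):

```python
import ast


def parse_location(text):
    """Parse a '(lat, lon)' string into a (lat, lon) tuple of floats."""
    lat, lon = ast.literal_eval(text)
    return float(lat), float(lon)


# Location value copied from the first row of the dataframe
lat, lon = parse_location("(37.775420706711, -122.403404791479)")
```

The result is already in the `[latitude, longitude]` order that Folium uses for marker positions.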

In [14]:
# So the dataframe consists of 150,500 crimes, which took place in the year 2016.
# To reduce computational cost, just work with the first 100 incidents in this dataset.

# get the first 100 crimes in the df_incidents dataframe
limit = 100
df_incidents = df_incidents.iloc[0:limit, :]

# confirm that our dataframe now consists only of 100 crimes.
df_incidents.shape

(100, 13)
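
Slicing after loading still reads all 150,500 rows first. When only the first rows are needed, reading can stop early instead; pandas' `read_csv` accepts an `nrows` argument for this. The same idea is sketched below with the standard library only, using an in-memory excerpt of the rows shown earlier in place of the real file (`SAMPLE_CSV` and `first_rows` are illustrative names, and only a few of the columns are included):

```python
import csv
import io
from itertools import islice

# a few columns from the first five rows shown above
SAMPLE_CSV = """IncidntNum,Category,PdDistrict,X,Y
120058272,WEAPON LAWS,SOUTHERN,-122.403405,37.775421
120058272,WEAPON LAWS,SOUTHERN,-122.403405,37.775421
141059263,WARRANTS,BAYVIEW,-122.388856,37.729981
160013662,NON-CRIMINAL,TENDERLOIN,-122.412971,37.785788
160002740,NON-CRIMINAL,MISSION,-122.419672,37.76505
"""


def first_rows(handle, limit):
    """Read at most `limit` data rows from an open CSV file-like object."""
    reader = csv.DictReader(handle)
    return list(islice(reader, limit))


rows = first_rows(io.StringIO(SAMPLE_CSV), limit=3)
print(len(rows), rows[-1]["Category"])
```

Either way, the first rows of a file are not a random sample, so patterns seen in them may not hold for the whole year.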

In [15]:
# Visualize where these crimes took place in the city of San Francisco. We will use the default style and initialize the zoom level to 12.

# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42

# create the map
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# display the map of San Francisco
sanfran_map
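
The centre above is hard-coded. One alternative, sketched here under the assumption that the incidents themselves are a good guide to where the map should point, is to average their coordinates. The five points are the (Y, X) pairs from the first five rows shown earlier, and `centre_of` is a hypothetical helper:

```python
from statistics import mean

# (latitude, longitude) pairs taken from the Y and X columns of the first five rows
points = [
    (37.775421, -122.403405),
    (37.775421, -122.403405),
    (37.729981, -122.388856),
    (37.785788, -122.412971),
    (37.76505, -122.419672),
]


def centre_of(points):
    """Return the mean latitude and mean longitude of a list of (lat, lon) pairs."""
    lats, lons = zip(*points)
    return mean(lats), mean(lons)


centre = centre_of(points)
```

For these five points the result lands close to the hand-picked values of 37.77 and -122.42.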

In [16]:
# Superimpose the locations of the crimes onto the map. The way to do that in Folium is to create a feature group with its own features and style and then add it to the sanfran_map.

# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.vector_layers.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add incidents to map
sanfran_map.add_child(incidents)

In [17]:
# You can also add pop-up text that is displayed when you click a marker.
# Make each marker display the category of the crime when clicked.

# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.vector_layers.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add pop-up text to each marker on the map
latitudes = list(df_incidents.Y)
longitudes = list(df_incidents.X)
labels = list(df_incidents.Category)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(sanfran_map)

# add incidents to map
sanfran_map.add_child(incidents)

In [18]:
# Now you can tell what crime category occurred at each marker.

# If you find the map too congested with all these markers, there are two remedies to this problem.
# The simpler solution is to remove these location markers and just add the text to the circle markers themselves, as follows:

# create map and display it
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# loop through the 100 crimes and add each to the map
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(sanfran_map)

# show map
sanfran_map
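
The pop-up label does not have to be the category alone. As a sketch, the Category and Descript columns could be combined into one label; because pop-up text may be rendered as HTML, the example escapes it first as a precaution. `make_label` is a hypothetical helper, not a Folium function:

```python
from html import escape


def make_label(category, descript):
    """Build a pop-up label such as 'CATEGORY: DESCRIPTION', escaped for HTML."""
    return escape(f"{category}: {descript}")


# values taken from the first and fourth rows shown earlier
print(make_label("WEAPON LAWS", "POSS OF PROHIBITED WEAPON"))
print(make_label("NON-CRIMINAL", "LOST PROPERTY"))
```

The resulting string could be passed as `popup=` in place of `label` in the loop above.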

In [19]:
# The other, more proper remedy is to group the markers into different clusters.
# Each cluster is then represented by the number of crimes in each neighborhood.
# These clusters can be thought of as pockets of San Francisco which you can then analyze separately.

# To implement this, start off by instantiating a MarkerCluster object and adding all the data points in the dataframe to this object.

from folium import plugins

# start again with a clean copy of the map of San Francisco
sanfran_map = folium.Map(location = [latitude, longitude], zoom_start = 12)

# instantiate a marker cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(sanfran_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# display map
sanfran_map

# Notice how, when you zoom out all the way, all markers are grouped into one cluster, the global cluster, of 100 markers or crimes, which is the total number of crimes in our dataframe.
# Once you start zooming in, the global cluster will start breaking up into smaller clusters. Zooming in all the way will result in individual markers.
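
To build intuition for why clusters split apart as you zoom in, here is a deliberately simplified stand-in: points are binned into a square grid, and the grid cells shrink as the "zoom" increases. This is not the algorithm the MarkerCluster plugin uses; `grid_clusters` and the cell sizes are assumptions chosen for illustration, and the points are the first five incidents shown earlier.

```python
from collections import Counter
from math import floor

# (latitude, longitude) pairs from the first five rows of the dataframe
points = [
    (37.775421, -122.403405),
    (37.775421, -122.403405),
    (37.729981, -122.388856),
    (37.785788, -122.412971),
    (37.76505, -122.419672),
]


def grid_clusters(points, cell_size_deg):
    """Count how many points fall into each grid cell of the given size in degrees."""
    return Counter(
        (floor(lat / cell_size_deg), floor(lon / cell_size_deg))
        for lat, lon in points
    )


zoomed_out = grid_clusters(points, cell_size_deg=1.0)    # large cells
zoomed_in = grid_clusters(points, cell_size_deg=0.001)   # small cells
print(len(zoomed_out), len(zoomed_in))
```

With large cells every point shares one cell, like the single global cluster. With small cells only the two incidents recorded at the same address stay together.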