# [SOC-88] Mapping Crime in San Francisco

### Professor David Harding

## Table of Contents

[Introduction](#intro)

[The Data](#data)

[Base Maps](#1)
   - [Question 1](#q1)

[Markers](#2)
   - [Question 2](#q2)
   - [Question 3](#q3)
   - [Question 4](#q4)
    
[Choropleth Maps](#3)
   - [Question 5](#q5)
   - [Question 6](#q6)



## Introduction <a id='intro'></a>

In this homework, you will practice different data mapping techniques you learned about in lecture and lab. The data has been taken from [SF Data](https://data.sfgov.org/), San Francisco's open data site. 

There are two main data files used in this assignment: **SFPD_incidents_2020.csv** and **sfpd-police-districts.geojson**. 

The first file, **SFPD_incidents_2020.csv**, has records of all police incidents that took place in 2020. Its columns include the latitude and longitude of each incident, the police district and neighborhood in which it occurred, the time and date of the report, the type of crime, and more.

The second file, **sfpd-police-districts.geojson**, contains geographic information about the boundaries of San Francisco police districts. These boundaries are necessary for making choropleth plots.

---


We will begin by running a code cell that will load the libraries you'll be using.

In [1]:
# load the necessary software
from datascience import *
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import datetime
import folium
import json

## The Data <a id='data'></a>


Our main dataset comes from the [city of San Francisco's open data portal](https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-2018-to-Present/wg3w-h783). 

Run the next cell to load the incident data.

In [2]:
# load SF Police Incident Data, 2020
incidents = Table.read_table('Data/SFPD_incidents_2020.csv')
incidents.show(5)

Incident Datetime,Incident Date,Incident Time,Incident Year,Incident Day of Week,Report Datetime,Row ID,Incident ID,Incident Number,CAD Number,Report Type Code,Report Type Description,Filed Online,Incident Code,Incident Category,Incident Subcategory,Incident Description,Resolution,Intersection,CNN,Police District,Analysis Neighborhood,Supervisor District,Latitude,Longitude,point
2020/01/07 04:00:00 AM,2020/01/07,04:00,2020,Tuesday,2020/02/06 09:41:00 AM,90012406372,900124,206027460,,II,Coplogic Initial,1.0,6372,Larceny Theft,Larceny Theft - Other,"Theft, Other Property, $50-$200",Open or Active,,,Out of SF,,,,,
2020/01/31 03:00:00 PM,2020/01/31,15:00,2020,Friday,2020/01/31 03:00:00 PM,89876863010,898768,190783352,,IS,Initial Supplement,,63010,Warrant,Other,"Warrant Arrest, Local SF Warrant",Cite or Arrest Adult,,,Out of SF,,,,,
2020/04/07 10:00:00 AM,2020/04/07,10:00,2020,Tuesday,2020/04/07 09:39:00 PM,95329775025,953297,200227624,200983000.0,IS,Initial Supplement,,75025,Non-Criminal,Non-Criminal,Search Warrant Service,Cite or Arrest Adult,,,Out of SF,,,,,
2020/02/12 02:00:00 PM,2020/02/12,14:00,2020,Wednesday,2020/02/13 02:30:00 PM,90267006244,902670,206033291,,II,Coplogic Initial,1.0,6244,Larceny Theft,Larceny - From Vehicle,"Theft, From Locked Vehicle, >$950",Open or Active,,,Northern,,,,,
2020/01/13 08:10:00 PM,2020/01/13,20:10,2020,Monday,2020/01/13 09:29:00 PM,90358806224,903588,206035407,,II,Coplogic Initial,1.0,6224,Larceny Theft,Larceny - From Vehicle,"Theft, From Unlocked Vehicle, >$950",Open or Active,,,Richmond,,,,,


Each row in this table represents a different incident reported to the San Francisco Police Department (SFPD). Most of the columns are fairly intuitive, but we'll narrow down to a few that are of particular interest:

- `Incident Number` and `Incident ID` are ID numbers that identify each incident, used for organization within the police department.

- `Incident Category` classifies the incident as one of 49 types. We can see all possible categories using the `group` method.

In [3]:
# show the unique categories from 'Incident Category'
incidents.group("Incident Category").show()

Incident Category,count
Arson,197
Assault,3347
Burglary,4078
Case Closure,176
Civil Sidewalks,25
Courtesy Report,137
Disorderly Conduct,982
Drug Offense,1356
Drug Violation,19
Embezzlement,85


- `Incident Description` gives more information on what occurred during the incident. You can think of the `Incident Description` as a subtype of the categories in `Incident Category`. There are too many unique descriptions to list them all, and this level of detail is finer than we need for the analysis later in this homework. It makes more sense to select a particular category and then list the possible descriptions for only that category. In the next cell, you can view the possible `Incident Description` values for incidents falling under the category `"Non-Criminal"`.

In [4]:
# show the unique incident descriptions for the 'Non-Criminal' category
incidents.where("Incident Category", "Non-Criminal").group('Incident Description').show()

Incident Description,count
Aided Case,528
Aided Case -Property for Destruction,27
"Aided Case, Injured or Sick Person",66
"Aided Case, Sick Person",9
"Aided case, Naloxone Deployment",132
"Dog, Barking",1
"Dog, Bite or Attack",104
"Dog, Stray or Vicious",4
"Firearm, Turned In by Public",37
Found Property,763


- `Resolution` gives information about what the police did in response to each incident. Once again, we can view all possible resolution options using the `group` method.

In [5]:
# show the unique resolutions
incidents.group("Resolution").show()

Resolution,count
Cite or Arrest Adult,11945
Exceptional Adult,185
Open or Active,45652
Unfounded,260


- Finally, `Latitude`, `Longitude`, and `point` give geographic data about the incident. Think of `Latitude` as your `Y` and `Longitude` as your `X`. `point` combines the latitude and longitude into a single pair of decimal numbers.
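
For incidents that have a location, `point` looks like `(37.717906309071864, -122.40013147694049)`. If you ever need the two numbers separately, one possible way to split such a value is sketched below. This assumes the value is stored as text in exactly that parenthesized form, which may not hold for every row (many rows have no location at all).

```python
import ast

# Example value in the format of the `point` column (assumed to be stored as text)
point_text = "(37.717906309071864, -122.40013147694049)"

# literal_eval safely parses the text into a Python tuple of two floats
latitude, longitude = ast.literal_eval(point_text)

# folium expects locations as [latitude, longitude], i.e. [Y, X]
location = [latitude, longitude]
print(location)
```

In this homework the separate `Latitude` and `Longitude` columns are usually the simpler choice, since they are already numeric.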

## Base Maps <a id='1'></a>

### Question 1 <a id='q1'></a>
Create a base map centered on San Francisco. Adjust the zoom start to get an appropriate view of San Francisco and add appropriate map tiles.

*Note: we're going to be creating several maps in this homework, so it's easier to create variables for the starting coordinates, zoom, and tiles, and use them over and over again, rather than rewrite them in every map we make.*

In [None]:
# add coordinates for San Francisco
sf_coordinates = [37.773972, -122.431297]
sf_zoom_start = 10
sf_tiles = ...

# create a map of San Francisco
sf_map = folium.Map(location=..., zoom_start=..., tiles=...)
sf_map

## Markers <a id='2'></a>

In the next two cells, we've isolated two police incidents for you.

In [6]:
incidentA = incidents.where("Incident ID", 922733)
incidentA

Incident Datetime,Incident Date,Incident Time,Incident Year,Incident Day of Week,Report Datetime,Row ID,Incident ID,Incident Number,CAD Number,Report Type Code,Report Type Description,Filed Online,Incident Code,Incident Category,Incident Subcategory,Incident Description,Resolution,Intersection,CNN,Police District,Analysis Neighborhood,Supervisor District,Latitude,Longitude,point
2020/04/26 09:00:00 PM,2020/04/26,21:00,2020,Sunday,2020/04/27 03:20:00 PM,92273306244,922733,200263101,201182000.0,II,Initial,,6244,Larceny Theft,Larceny - From Vehicle,"Theft, From Locked Vehicle, >$950",Open or Active,HARKNESS AVE \ SAN BRUNO AVE,20556000.0,Ingleside,Visitacion Valley,10,37.7179,-122.4,"(37.717906309071864, -122.40013147694049)"


In [7]:
incidentB = incidents.where("Incident ID", 886770)
incidentB

Incident Datetime,Incident Date,Incident Time,Incident Year,Incident Day of Week,Report Datetime,Row ID,Incident ID,Incident Number,CAD Number,Report Type Code,Report Type Description,Filed Online,Incident Code,Incident Category,Incident Subcategory,Incident Description,Resolution,Intersection,CNN,Police District,Analysis Neighborhood,Supervisor District,Latitude,Longitude,point
2020/01/01 02:41:00 AM,2020/01/01,02:41,2020,Wednesday,2020/01/01 02:41:00 AM,88677026031,886770,200000389,200011000.0,II,Initial,,26031,Arson,Arson,Arson of Vehicle,Open or Active,23RD ST \ CAROLINA ST,33046000.0,Bayview,Potrero Hill,10,37.7547,-122.4,"(37.75469183949377, -122.39957958699718)"


### Question 2 <a id='q2'></a>
Create a marker for each of the above incidents, including:
* incident location
* an appropriate and informative pop-up (appears when you click on the marker)
* an appropriate and informative tooltip (appears when you hover over the marker)
* an appropriate color and type for the icon, given the type of incident

You can view the list of icon options at https://getbootstrap.com/docs/3.3/components/

In [None]:
# a clean map for the markers
marker_map = folium.Map(location=sf_coordinates, zoom_start=sf_zoom_start, tiles=sf_tiles)

# For Incident A
coordinateA = [..., ...]
popupA = ...
tooltipA = ...
folium.Marker(location=..., popup=..., 
              tooltip=..., icon=folium.Icon(color=..., icon=...)).add_to(marker_map)

# view the map
marker_map

In [None]:
# For Incident B
coordinateB = [..., ...]
popupB = ...
tooltipB = ...
folium.Marker(location=..., popup=..., 
              tooltip=..., icon=folium.Icon(color=..., icon=...)).add_to(marker_map)
marker_map

Next, we'd like to map all incidents of disorderly conduct.

First, we make a table that only contains disorderly conduct incidents.

In [8]:
# filter for just disorderly conduct
disorderly = incidents.where("Incident Category", "Disorderly Conduct")
disorderly.show(3)

Incident Datetime,Incident Date,Incident Time,Incident Year,Incident Day of Week,Report Datetime,Row ID,Incident ID,Incident Number,CAD Number,Report Type Code,Report Type Description,Filed Online,Incident Code,Incident Category,Incident Subcategory,Incident Description,Resolution,Intersection,CNN,Police District,Analysis Neighborhood,Supervisor District,Latitude,Longitude,point
2020/02/19 08:57:00 PM,2020/02/19,20:57,2020,Wednesday,2020/02/19 09:15:00 PM,96214819057,962148,200127290,200504000.0,IS,Initial Supplement,,19057,Disorderly Conduct,Intimidation,Terrorist Threats,Exceptional Adult,ANDOVER ST \ RICHLAND AVE,21200000.0,Ingleside,Bernal Heights,9,37.7356,-122.417,"(37.73560466456792, -122.41675952114524)"
2020/01/01 12:20:00 AM,2020/01/01,00:20,2020,Wednesday,2020/01/01 12:36:00 AM,88675519090,886755,200000094,200010000.0,II,Initial,,19090,Disorderly Conduct,Drunkenness,"Alcohol, Under Influence Of In Public Place",Open or Active,EDDY ST \ JONES ST,24929000.0,Tenderloin,Tenderloin,6,37.7839,-122.413,"(37.7839325760642, -122.41259527758581)"
2020/01/01 12:20:00 AM,2020/01/01,00:20,2020,Wednesday,2020/01/01 12:36:00 AM,88679519090,886795,200000094,200010000.0,IS,Initial Supplement,,19090,Disorderly Conduct,Drunkenness,"Alcohol, Under Influence Of In Public Place",Cite or Arrest Adult,EDDY ST \ JONES ST,24929000.0,Tenderloin,Tenderloin,6,37.7839,-122.413,"(37.7839325760642, -122.41259527758581)"


In [9]:
# This cell removes all the disorderly conduct incidents that didn't report a location
import math
no_loc_incidents = []
for i in range(disorderly.num_rows):
    incident = disorderly.take(i)
    if math.isnan(incident.column('Latitude').item(0)):
        no_loc_incidents.append(i)
disorderly.remove(no_loc_incidents)
disorderly.show(3)

Incident Datetime,Incident Date,Incident Time,Incident Year,Incident Day of Week,Report Datetime,Row ID,Incident ID,Incident Number,CAD Number,Report Type Code,Report Type Description,Filed Online,Incident Code,Incident Category,Incident Subcategory,Incident Description,Resolution,Intersection,CNN,Police District,Analysis Neighborhood,Supervisor District,Latitude,Longitude,point
2020/02/19 08:57:00 PM,2020/02/19,20:57,2020,Wednesday,2020/02/19 09:15:00 PM,96214819057,962148,200127290,200504000.0,IS,Initial Supplement,,19057,Disorderly Conduct,Intimidation,Terrorist Threats,Exceptional Adult,ANDOVER ST \ RICHLAND AVE,21200000.0,Ingleside,Bernal Heights,9,37.7356,-122.417,"(37.73560466456792, -122.41675952114524)"
2020/01/01 12:20:00 AM,2020/01/01,00:20,2020,Wednesday,2020/01/01 12:36:00 AM,88675519090,886755,200000094,200010000.0,II,Initial,,19090,Disorderly Conduct,Drunkenness,"Alcohol, Under Influence Of In Public Place",Open or Active,EDDY ST \ JONES ST,24929000.0,Tenderloin,Tenderloin,6,37.7839,-122.413,"(37.7839325760642, -122.41259527758581)"
2020/01/01 12:20:00 AM,2020/01/01,00:20,2020,Wednesday,2020/01/01 12:36:00 AM,88679519090,886795,200000094,200010000.0,IS,Initial Supplement,,19090,Disorderly Conduct,Drunkenness,"Alcohol, Under Influence Of In Public Place",Cite or Arrest Adult,EDDY ST \ JONES ST,24929000.0,Tenderloin,Tenderloin,6,37.7839,-122.413,"(37.7839325760642, -122.41259527758581)"
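
The cell above finds rows whose `Latitude` is `nan` (not a number) and removes them. The same check can be written as a single filter. The sketch below uses plain Python and made-up rows rather than the `datascience` table, so it only illustrates the idea.

```python
import math

# Made-up rows; nan marks an incident with no reported location
sample_rows = [
    {"Incident ID": 1, "Latitude": 37.7356, "Longitude": -122.417},
    {"Incident ID": 2, "Latitude": float("nan"), "Longitude": float("nan")},
    {"Incident ID": 3, "Latitude": 37.7839, "Longitude": -122.413},
]

# Keep only the rows whose latitude is an actual number
located_rows = [row for row in sample_rows if not math.isnan(row["Latitude"])]

print(len(located_rows), "of", len(sample_rows), "rows have a location")
```

Note that `nan` cannot be detected with `==`, because `nan` is not equal to anything, including itself. That is why `math.isnan` is used.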


### Question 3 <a id='q3'></a>
Fill in the code below to create markers for all disorderly conduct incidents.

As in question 2, choose the appropriate coordinates, popup, tooltip, color, and icon for the type of incident.

In [None]:
# create a clean map for the disorderly conduct incidents
disorderly_map = folium.Map(location=sf_coordinates, tiles=sf_tiles, zoom_start=sf_zoom_start)

# make a marker for each disorderly conduct incident
for i in range(disorderly.num_rows):
    incidentC = disorderly.take(i)
    coordinateC = [..., ...]
    popupC = ...
    tooltipC = ...
    folium.Marker(location=..., popup=..., 
                  tooltip=..., icon=folium.Icon(color=..., icon=...)).add_to(disorderly_map)
    
# show the map
disorderly_map

### Question 4 <a id='q4'></a>

Describe the features you chose for questions 1, 2, and 3, including:
* map tiles
* marker icon
* marker color
* marker popup and tooltip

Why were those features good for the data in those questions?

*Replace this line with your answer*

## Choropleth Maps <a id='3'></a>

In this section, you're going to create a choropleth map with the number of non-criminal mental health-related incidents in each police district.

First, we need to load the geojson file that gives the boundaries for the police districts. Run the next cell to load the geojson.

In [None]:
# load SFPD district boundaries
# the with-block closes the file once it has been read
with open('Data/sf-police-districts.geojson') as f:
    sf_districts = json.load(f)
sf_districts
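
GeoJSON files of this kind are typically a `FeatureCollection`: a list of features, each with a `geometry` (the boundary) and `properties` (attributes such as a name). The sketch below builds a tiny invented example to show how to look inside one. The `district` property name mirrors the `key_on` value used later in this assignment, but the shapes and names here are made up.

```python
import json

# A tiny, invented FeatureCollection with two square "districts"
toy_geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"district": "NORTH"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 1], [1, 1], [1, 2], [0, 2], [0, 1]]]}},
    {"type": "Feature",
     "properties": {"district": "SOUTH"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}}
  ]
}
"""
toy_districts = json.loads(toy_geojson_text)

# Collect the name stored in each feature's properties
district_names = [feature["properties"]["district"] for feature in toy_districts["features"]]
print(district_names)
```

GeoJSON stores each coordinate as `[longitude, latitude]`, the reverse of the `[latitude, longitude]` order that folium uses for `location`.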

You can see the districts overlaid onto the San Francisco Map by running the next cell.

In [None]:
# make the folium geojson object and add to a map of SF
m = folium.Map(sf_coordinates, zoom_start=sf_zoom_start, tiles=sf_tiles)

folium.GeoJson(
    sf_districts,
    style_function=lambda feature: {
        'fillColor': 'white',
        'color': 'blue',
        'weight': 2,
        'dashArray': '5, 5'
    }
).add_to(m)
m

To make our choropleth overlay, we must first get the counts of mental health incidents by district. First, we use the `where` method to select only the incidents that have a description of "Mental Health Detention".

In [None]:
# mental health related incidents
mental_health = incidents.where("Incident Description", are.equal_to("Mental Health Detention"))
mental_health.show(3)

Next, we use `group` to get the counts of mental health incidents per police district.

In [None]:
# get the counts of mental health incidents by district
mental_health_by_district = mental_health.group("Police District")
mental_health_by_district
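
For intuition, `group` tallies how many rows share each value in a column. A minimal plain-Python sketch of the same idea, using a handful of made-up district values rather than the real table:

```python
from collections import Counter

# Made-up district values, one per incident (not real data)
sample_districts = ["Mission", "Central", "Mission", "Tenderloin", "Mission", "Central"]

# Count how many times each district appears
district_counts = Counter(sample_districts)

# Print one line per district, in alphabetical order
for district, count in sorted(district_counts.items()):
    print(district, count)
```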

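And finally, we convert the counts to a DataFrame so that they work with Folium. The district names are also converted to uppercase so that they can be matched to the district names in the geojson file.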

In [None]:
# convert to DataFrame
mental_health_df = mental_health_by_district.to_df()
mental_health_df['Police District'] = mental_health_df['Police District'].str.upper()
mental_health_df.head()
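
A choropleth can only color a district if its name in the counts table is spelled exactly like the name in the geojson. One way to check this before mapping is to compare the two sets of names. The sketch below uses invented names for illustration, so treat it as a pattern, not as the actual contents of either file.

```python
# Invented district names as they might appear in the geojson properties
geojson_names = {"CENTRAL", "MISSION", "TENDERLOIN"}

# Invented names as they might appear in the counts table before cleaning
table_names = ["Central", "Mission", "Out of SF"]

# Uppercase the table names, mirroring the .str.upper() step above
cleaned_names = {name.upper() for name in table_names}

# Names in the table with no matching boundary, and boundaries with no data
unmatched_in_table = cleaned_names - geojson_names
missing_from_table = geojson_names - cleaned_names
print(unmatched_in_table, missing_from_table)
```

Any name that shows up in either set is worth a second look, since that row or that boundary will not be paired with a counterpart on the map.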

### Question 5 <a id='q5'></a>
Complete the following code to create a choropleth overlay showing the counts of mental health incidents for each police district. Choose an appropriate and informative:
* fill color (using a colormap)
* fill opacity
* legend name

You can find colormap options at https://matplotlib.org/gallery/color/colormap_reference.html

In [None]:
# create a clean map for the choropleth
m = folium.Map(sf_coordinates, zoom_start=sf_zoom_start, tiles=sf_tiles)

# create the choropleth overlay
folium.Choropleth(
    geo_data=...,
    data=...,
    columns=['...','...'],
    key_on='feature.properties.district',
    fill_color=...,
    fill_opacity=...,
    legend_name=...
).add_to(m)

m

### Question 6 <a id='q6'></a>
Explain your design choices for the choropleth map. What options did you consider? What options did you end up choosing, and why? Be sure to reference the context of the data when you explain your choices.

*Replace this line with your response*

----


Data Science Modules: http://data.berkeley.edu/education/modules

Data Science Offerings at Berkeley: https://data.berkeley.edu/academics/undergraduate-programs/data-science-offerings



Notebook developed by: Keeley Takimoto, Harold Cha