# [SOC-88] Mapping Lab 2: Choropleth Maps
## Professor David Harding

### Table of Contents
- [Introduction](#intro)
- [1. Intro to Geojson](#1)
- [2. Intro to Choropleth Maps](#2)
- [3. Intro to Colormaps](#3)
- [4. Choropleth Overlays](#4)
    - [Question 1: Reading Colormaps](#q1)
    - [Question 2: Bins](#q2)
    - [Question 3: Your Turn!](#q3)
- [Challenge Question](#q4)

**Dependencies**

In [None]:
# just run this cell
from datascience import *
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import folium
import json
import os

# Introduction <a id='intro'></a>
In this lab, we will cover the creation and design of choropleth maps. We will use the same police incident data as in the visualization homework and mapping lab 1. As a refresher, the police incident data is from [Open Data Minneapolis](http://opendata.minneapolismn.gov/). It contains records of all police incidents, and its columns include the latitude and longitude of each incident, the police precinct and neighborhood in which it occurred, the time and date of the report, the type of crime, and more.

In [None]:
incidents = Table().read_table('data/Police_Incidents_2019.csv')
incidents.show(5)

# 1. Intro to Geojson <a id = '1'></a>
[Geojson](https://geojson.org/) is a file format used to represent various types of geographical data. We won't get into the details, but geojson files store geographic data in a simple way that computers can load quickly. Once loaded into Python with the `json` library, a geojson file becomes a dictionary that may contain data about shapes on a map, each defined by a series of coordinates, along with names and other relevant information.
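
As a rough illustration of that structure, the sketch below parses a tiny hand-written geojson string. The single square feature, its `PRECINCT` value, and its coordinates are made up for illustration; real files hold many features with far longer coordinate lists.

```python
import json

# A made-up, minimal geojson document containing one square "precinct"
toy_geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"PRECINCT": "1"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-93.28, 44.97], [-93.25, 44.97],
                         [-93.25, 44.99], [-93.28, 44.99],
                         [-93.28, 44.97]]]
      }
    }
  ]
}
"""

# json.loads turns the text into nested Python dictionaries and lists
toy_geojson = json.loads(toy_geojson_text)

first_feature = toy_geojson["features"][0]
print(toy_geojson["type"])                      # kind of geojson object
print(first_feature["properties"]["PRECINCT"])  # label attached to the shape
print(first_feature["geometry"]["type"])        # kind of shape
print(len(first_feature["geometry"]["coordinates"][0]))  # points in the outer ring
```

Note that geojson stores each point as `[longitude, latitude]`, the reverse of the `[latitude, longitude]` order that folium uses for map centers.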

Our neighborhood geojson file, loaded below as `neighborhoods`, contains the names and boundaries of neighborhoods in Minneapolis. Similarly, the police precincts geojson file, loaded as `precincts`, contains information about the borders of Minneapolis police precincts.

These geojson files let us visualize boundaries on a map. Below, we use the `neighborhoods` and `precincts` geojson files to show the boundaries of neighborhoods in black and the precinct boundaries in blue.

In [None]:
with open('data/Minneapolis_Neighborhoods.geojson') as f:
    neighborhoods = json.load(f)
with open('data/Minneapolis_Police_Precincts.geojson') as f:
    precincts = json.load(f)

In [None]:
# example of what the json looks like
precincts

In [None]:
minneapolis_coords = [44.977, -93.265] # this is the geographic center of minneapolis
m = folium.Map(minneapolis_coords, zoom_start=12)

# neighborhoods
folium.GeoJson(
    neighborhoods,
    style_function=lambda feature: {
        'fillColor': 'white',
        'color': 'black',
        'weight': 2,
        'dashArray': '5, 5'
    }
).add_to(m)

# precincts
folium.GeoJson(
    precincts,
    style_function=lambda feature: {
        'fillColor': 'white',
        'color': 'blue',
        'weight': 2,
        'dashArray': '5, 5'
    }
).add_to(m)

m
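
The `style_function` argument receives each geojson feature as a dictionary and returns a dictionary of style options; the lambdas above ignore the feature and always return the same style. As a sketch of the same idea with a named function, the hypothetical `precinct_style` below picks a different outline for one precinct. The toy features and the choice to highlight precinct '1' are invented for illustration.

```python
def precinct_style(feature):
    """Return a style dictionary for one geojson feature."""
    # Hypothetical rule: outline precinct '1' in thicker red,
    # and every other precinct in the blue used above.
    is_highlighted = feature["properties"].get("PRECINCT") == "1"
    return {
        "fillColor": "white",
        "color": "red" if is_highlighted else "blue",
        "weight": 3 if is_highlighted else 2,
        "dashArray": "5, 5",
    }

# Made-up features holding only the property the function reads
toy_features = [
    {"type": "Feature", "properties": {"PRECINCT": "1"}},
    {"type": "Feature", "properties": {"PRECINCT": "2"}},
]
styles = [precinct_style(f) for f in toy_features]
for feature, style in zip(toy_features, styles):
    print(feature["properties"]["PRECINCT"], style["color"], style["weight"])
```

Passing `style_function=precinct_style` to `folium.GeoJson` in place of a lambda should apply the rule to every feature in the file.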

# 2. Intro to Choropleth Maps <a id = '2'></a>
A choropleth overlay, or choropleth map, is a type of map that uses color to represent statistical information. Choropleth maps can be used to convey information such as population density, poverty rates, unemployment rates, or in our case, police incidents. They are often used when visualizing a statistic that varies geographically. Through the use of mapping techniques, a choropleth overlay can succinctly represent geographical differences between areas. Below is an example of a choropleth map that shows the unemployment rate by state. Later on in the lab, we will create our own choropleth map based on the police incidents data.

<img src="images/choropleth-example.PNG" style="width:600px">

# 3. Intro to Colormaps <a id='3'></a>


A colormap is a collection of colors used to represent sequences of information on a given scale. Matplotlib has many different colormap options, but there are a lot of things to keep in mind when choosing a colormap besides which one looks the prettiest. Matplotlib has a [useful guide](https://matplotlib.org/tutorials/colors/colormaps.html) for choosing colormaps. Some things to keep in mind:

- what type of data are you representing?
- is there a critical value in the data from which other values deviate?
- is there an intuitive color scheme? (for example, political preferences, or green is good and red is bad)

Most often, colormaps follow a uniform scale, where equal steps in the data are represented as equal steps in the color space.
There are many different colormap options, some of which are shown below. The full list can be found [here](https://matplotlib.org/gallery/color/colormap_reference.html).

<img src="images/colormap-example.PNG" style="width:600px">

To set a colormap in matplotlib, we pass a value to the `cmap` argument. For example, `cmap = 'viridis'` sets the colormap to follow the viridis pattern as seen above. Colormap options are split into several categories based on function, and different options may convey different meanings. For example, one may want to use a different colormap when plotting qualitative data with no particular order than when plotting diverging data where information deviates around a meaningful point. In folium, colormaps are set with the parameter `fill_color`, and only a limited set of colormaps is supported: 'BuGn', 'BuPu', 'GnBu', 'OrRd', 'PuBu', 'PuBuGn', 'PuRd', 'RdPu', 'YlGn', 'YlGnBu', 'YlOrBr', and 'YlOrRd'.
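
As a minimal sketch of how a colormap turns numbers into colors, the code below samples the `viridis` colormap in matplotlib. The counts are made up, and `Normalize` is used here only to rescale them to the 0 to 1 range that a colormap expects.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, to_hex

counts = [100, 200, 300, 400, 500]  # made-up values, equally spaced

cmap = plt.get_cmap("viridis")
norm = Normalize(vmin=min(counts), vmax=max(counts))  # rescale counts to [0, 1]

# Equal steps in the data become equal steps along the colormap
hex_colors = [to_hex(cmap(norm(c))) for c in counts]
for count, color in zip(counts, hex_colors):
    print(count, color)
```

Swapping `"viridis"` for another name from the colormap reference shows how the same values read under a different color scheme.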

# 4. Choropleth Overlays <a id='4'></a>
In this section, we will go over how to add a choropleth overlay to a folium map. This will build on mapping lab 1, where you were introduced to basic mapping in folium. First, we'll load a blank map of Minneapolis.

In [None]:
minneapolis_coords = [44.977, -93.265]
m = folium.Map(minneapolis_coords, zoom_start=12)
m

In order to visualize the number of incidents by police precinct, we need to group our incidents data.

In [None]:
precinct_incidents = incidents.group('precinct')
precinct_incidents

'UI' seems to be some type of label for an unidentified or missing precinct, and since this won't be displayed in our choropleth overlay we can simply remove it from our table.

In [None]:
# keep the first five rows (the numbered precincts), dropping the 'UI' row
precinct_incidents = precinct_incidents.take[:5]
precinct_incidents

Folium uses [pandas dataframes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) instead of tables from the `datascience` library, so we will need to convert our table of incidents per precinct to a dataframe in order to shade by counts. We have provided code to do so in the cell below. A pandas dataframe is very similar to a table, but has slightly different functionality and methods.

We also need to make sure that the precinct labels match the labels in our geojson file, and since they're strings in the geojson we must convert them in the dataframe.

In [None]:
precinct_incidents_df = precinct_incidents.to_df()
precinct_incidents_df['precinct'] = ['1', '2', '3', '4', '5']
precinct_incidents_df
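
The cell above assigns labels by position, which works only while the rows stay in precinct order. As a sketch of a label-based alternative written directly in pandas, the code below counts incidents per precinct, casts the labels to strings, and drops the 'UI' row by name. The rows are invented, and the sketch assumes (without having checked the real file) that the precinct labels can be cast with `astype(str)`.

```python
import pandas as pd

# Invented stand-in for the incidents data: one row per incident
toy_incidents = pd.DataFrame(
    {"precinct": [1, 1, 2, 3, 3, 3, "UI", 4, 5, 5]}
)

# Cast labels to strings so they can match string labels in a geojson file
toy_incidents["precinct"] = toy_incidents["precinct"].astype(str)

# Count incidents per precinct (analogous to grouping a table by 'precinct')
toy_counts = toy_incidents.groupby("precinct").size().reset_index(name="count")

# Drop the unidentified precinct by label rather than by row position
toy_counts = toy_counts[toy_counts["precinct"] != "UI"].reset_index(drop=True)
print(toy_counts)
```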

In the cell below, we create a choropleth map that shows the number of incidents per precinct in Minneapolis. We pass in our `precinct_incidents_df` to the keyword argument `data`, which provides the information for the color overlay. `columns` and `key_on` provide information on how to link the data from the dataframe to the geojson. The keyword arguments `bins`, `fill_color`, `fill_opacity`, and `legend_name` allow us to customize the design of our overlay. 

In [None]:
m = folium.Map(minneapolis_coords, zoom_start=12)
folium.Choropleth(
    geo_data=precincts,
    data=precinct_incidents_df,
    columns=['precinct', 'count'],
    key_on='feature.properties.PRECINCT',
    bins = 5,
    fill_color='YlOrRd',
    fill_opacity=0.8,
    legend_name='Number of Incidents'
).add_to(m)
m
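
To make the role of `columns` and `key_on` concrete, here is a plain-Python sketch of the lookup they describe. It illustrates the idea only and is not folium's actual implementation; the features, the counts, and the helper name `lookup_key` are made up.

```python
# Made-up geojson features and (precinct, count) rows
toy_features = [
    {"type": "Feature", "properties": {"PRECINCT": "1"}},
    {"type": "Feature", "properties": {"PRECINCT": "2"}},
    {"type": "Feature", "properties": {"PRECINCT": "3"}},
]
toy_rows = [("1", 120), ("2", 45)]  # key column first, value column second

def lookup_key(feature, key_on):
    """Follow a dotted path such as 'feature.properties.PRECINCT'."""
    value = feature
    for part in key_on.split(".")[1:]:  # skip the leading 'feature'
        value = value[part]
    return value

counts_by_key = dict(toy_rows)

matched = {}
for feature in toy_features:
    key = lookup_key(feature, "feature.properties.PRECINCT")
    # A feature with no matching row has no value to color by
    matched[key] = counts_by_key.get(key)

print(matched)
```

Because the match is by exact value, the string `'1'` would not match the integer `1`, which is why the precinct labels were converted to strings before mapping.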

With so many keyword arguments, you have a lot of control over the design of your choropleth map. You can change the colormap with `fill_color`, the bins with `bins`, the opacity of the shading with `fill_opacity`, and a lot more.

**Note:** Since we are "binding" data, as in displaying data from a table in our map, we are limited to a few supported colormaps. Above, we have used `YlOrRd`, which creates a scale of the colors yellow, orange, and red. The folium-supported colormaps are as follows: 'BuGn', 'BuPu', 'GnBu', 'OrRd', 'PuBu', 'PuBuGn', 'PuRd', 'RdPu', 'YlGn', 'YlGnBu', 'YlOrBr', and 'YlOrRd'. When selecting the colormap for your choropleth overlay, be sure to choose from the supported ones.

### Question 1: Reading Colormaps <a id='q1'></a>
Look at the choropleth map above. What can you say about the precinct that has the darkest red color? Look at the values of the `precinct_incidents` table and sort it by count in descending order. What do you notice about the distribution of counts? Does it have a major effect on the coloring of the map?

*Replace this line with your answer*

In [None]:
# your code here
precinct_incidents...

With the `bins` argument, we have control over how we divide up the coloring. We can pass in a list of the specific bins we want, so theoretically we could have bins so that each precinct is in its own bin.  
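
As a sketch of what a list of bin edges does, the code below uses `np.histogram` to count how many precincts fall between each pair of edges. The counts and edges are invented; the first and last edges are chosen to match the smallest and largest counts so that every precinct falls into some bin.

```python
import numpy as np

toy_counts = [120, 45, 300, 80, 210]  # made-up counts for five precincts

# Six edges define five bins, chosen so each count lands in its own bin
edges = [45, 60, 100, 150, 250, 300]

per_bin, returned_edges = np.histogram(toy_counts, bins=edges)
print(per_bin)  # number of precincts in each bin
```

With these edges every bin holds exactly one precinct, so each precinct would get its own color on the map.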

If the counts have an unusual distribution with possible outliers, using quantiles as bin edges may help us better see how incidents are distributed. `quantile` is a method available on pandas dataframes and on their individual columns, and we've provided an example of how to use it below.

In [None]:
precinct_bins = list(precinct_incidents_df['count'].quantile([0, 0.25, 0.5, 0.75, 1]))
precinct_bins
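
To see why quantile edges help when there are outliers, the sketch below compares equally spaced edges with quantile edges on made-up counts in which one area is far larger than the rest. `pd.cut` is used here only to report which bin each value lands in.

```python
import numpy as np
import pandas as pd

# Made-up counts: four similar areas and one outlier
toy_counts = pd.Series([40, 55, 60, 75, 900])

# Equal-width edges: four bins of the same width between min and max
equal_edges = np.linspace(toy_counts.min(), toy_counts.max(), 5)

# Quantile edges: each bin holds roughly the same number of areas
quantile_edges = toy_counts.quantile([0, 0.25, 0.5, 0.75, 1]).to_numpy()

equal_bins = pd.cut(toy_counts, bins=equal_edges, labels=False, include_lowest=True)
quantile_bins = pd.cut(toy_counts, bins=quantile_edges, labels=False, include_lowest=True)

print("equal-width bins:", equal_bins.tolist())
print("quantile bins:   ", quantile_bins.tolist())
```

With equal-width edges the four similar areas share one bin and would get the same color, while quantile edges spread them across several bins.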

In [None]:
m = folium.Map(minneapolis_coords, zoom_start=12)
folium.Choropleth(
    geo_data=precincts,
    data=precinct_incidents_df,
    columns=['precinct', 'count'],
    key_on='feature.properties.PRECINCT',
    bins = precinct_bins,
    fill_color='YlOrRd',
    fill_opacity=0.8,
    legend_name='Number of Incidents'
).add_to(m)
m

### Question 2: Bins <a id='q2'></a>
Does this map with more specific bins tell you anything more about the geographic distribution of incidents than the first map? In what scenarios would having more specific bins, or bins of different size, help in understanding where incidents take place?

*Replace this line with your answer*

### Question 3: Your turn! <a id='q3'></a>
Now that you've seen how to create a choropleth overlay for number of incidents per precinct, let's try making a choropleth overlay for the number of incidents per neighborhood. First, group the incidents data so that you get the count of incidents per neighborhood.

In [None]:
neighborhood_incidents = incidents...
neighborhood_incidents

As mentioned before, folium uses pandas dataframes instead of datascience tables. Run the cell below to convert your grouped table into a dataframe.

In [None]:
neighborhood_incidents_df = neighborhood_incidents.to_df()
neighborhood_incidents_df.head()

In the cell below, fill in the keyword arguments for `bins`, `fill_color`, `fill_opacity`, and `legend_name`. Think about how the data are distributed, and make your design choices appropriately. 

In [None]:
m = folium.Map(minneapolis_coords, zoom_start=12)
folium.Choropleth(
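    geo_data=neighborhoods, # neighborhood geojson file
    data=neighborhood_incidents_df, # your neighborhood counts dataframe
    columns=[neighborhood_incidents_df.columns[0], 'count'], # neighborhood name column, then counts
    key_on='feature.properties.BDNAME', # neighborhood name from geojson file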
    bins = ...,
    fill_color=...,
    fill_opacity=...,
    legend_name=...
).add_to(m)
m

Below, explain the design choices you made for your choropleth overlay.

*Replace this line with your answer*

# Challenge Question: Police Use of Force <a id='q4'></a>

In the first visualization homework, we also looked at the dataset for police use of force. We've loaded that data below in the `force` table. Use what you've learned in this lab to create a choropleth overlay showing the number of use-of-force cases in each neighborhood. We've provided the skeleton code for the map, but you are responsible for getting the force data into the correct format. Remember that folium takes in pandas dataframes; see the questions above for how to convert a table to a dataframe.

In [None]:
force = Table().read_table('data/Police_Use_of_Force.csv')
force.show(5)
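
Before mapping, it can help to check that the names in your dataframe match the names in the geojson file, since an area with no matching row has no value to shade. The sketch below does this with set differences. `BDNAME` and `Neighborhood` are the names used in the skeleton code for this question, while the neighborhood names and the trailing-space mismatch are invented.

```python
import pandas as pd

# Made-up stand-ins for the geojson file and the grouped dataframe
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"BDNAME": "Alpha Park"}},
        {"type": "Feature", "properties": {"BDNAME": "Beta Hills"}},
        {"type": "Feature", "properties": {"BDNAME": "Gamma Lake"}},
    ],
}
toy_df = pd.DataFrame(
    {"Neighborhood": ["Alpha Park", "Beta Hills ", "Delta Flats"],
     "count": [12, 7, 3]}
)

geo_names = {f["properties"]["BDNAME"] for f in toy_geojson["features"]}
df_names = set(toy_df["Neighborhood"])

# Names that appear on one side only will not be linked on the map
print("in dataframe only:", sorted(df_names - geo_names))
print("in geojson only:  ", sorted(geo_names - df_names))
```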

In [None]:
# your code here

In [None]:
m = folium.Map(minneapolis_coords, zoom_start=12)
folium.Choropleth(
    geo_data=neighborhoods, # neighborhood geojson file
    data=..., # your neighborhood counts dataframe
    columns=['Neighborhood', 'count'], # columns of your dataframe 
    key_on='feature.properties.BDNAME', # neighborhood name from geojson file
    bins = ...,
    fill_color=...,
    fill_opacity=...,
    legend_name=...
).add_to(m)
m

**Describe your design choices for your map below**

*Replace this line with your answer*

## Bibliography 

- Folium - Image example of choropleth map. https://python-visualization.github.io/folium/quickstart.html
- Matplotlib - Image example of colormap. https://matplotlib.org/gallery/color/colormap_reference.html
- Open Data Minneapolis - Police Incident and Police Force data. http://opendata.minneapolismn.gov/

---

Notebook developed by: Keilyn Yuzuki

Data Science Modules: http://data.berkeley.edu/education/modules

Data Science Offerings at Berkeley: https://data.berkeley.edu/academics/undergraduate-programs/data-science-offerings