# [LEGALST-123] Lab 09: Folium Heatmaps Lab

---

In this lab, students will learn how to construct a heatmap, as well as an interactive heatmap. This will also be a component of the take-home problem set. The lab builds on top of the folium labs from last week.


In [1]:
# dependencies
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import folium
import json
import os

In [2]:
!pip install folium --upgrade
import folium.plugins # The Folium Javascript Map Library
from folium.plugins import HeatMap
from folium.plugins import HeatMapWithTime



---

## The Data <a id='data'></a>
---

Today we'll be working with data on Berkeley crime calls, courtesy of the Berkeley Police Department. Take a look at the metadata [here.](https://data.cityofberkeley.info/Public-Safety/Berkeley-PD-Calls-for-Service/k2nh-s5h5)

Note: this data set has already undergone a fair amount of cleaning to format it for our purposes (e.g. extracting the longitude and latitude, removing null values, and dropping irrelevant columns). You can see the original data at the source website.

Then, run the cell below to load the data into a DataFrame.

In [3]:
calls = pd.read_csv('data/berkeley_crime_0218.csv', index_col=0)
calls.head(5)

CASENO,OFFENSE,CVLEGEND,BLKADDR,City,State,Day,Lat,Lon,timestamp
17076632,BURGLARY AUTO,BURGLARY - VEHICLE,1300 SAN PABLO AVE,Berkeley,CA,Monday,37.880262,-122.295809,2017-12-18 09:45:00
17092227,THEFT MISD. (UNDER $950),LARCENY,1600 SHATTUCK AVE,Berkeley,CA,Monday,37.878112,-122.269114,2017-10-30 10:25:00
18004102,BURGLARY COMMERCIAL,BURGLARY - COMMERCIAL,1400 SHATTUCK AVE,Berkeley,CA,Saturday,37.881957,-122.269551,2018-01-20 05:23:00
17065730,ALCOHOL OFFENSE,LIQUOR LAW VIOLATION,SOLANO AVENUE & COLUSA AVE,Berkeley,CA,Saturday,37.891368,-122.279257,2017-10-28 20:08:00
18000630,BURGLARY AUTO,BURGLARY - VEHICLE,800 POTTER ST,Berkeley,CA,Wednesday,37.851255,-122.292509,2018-01-03 19:00:00


When working with any new data set, it's a good idea to get to know it first. Use the following cell and the information on the City of Berkeley data page linked above to answer some basic questions:
- What information does this table contain? What are the different columns?
- How large is the data set? 
- What kinds of questions could we answer using this data set?

<b> Solution </b> :

In [4]:
# what are dimensions of dataframe
print('shape of dataframe as rows, columns is ',calls.shape)
# what are the columns
print('variables: ', list(calls.columns))

shape of dataframe as rows, columns is  (5030, 9)
variables:  ['OFFENSE', 'CVLEGEND', 'BLKADDR', 'City', 'State', 'Day', 'Lat', 'Lon', 'timestamp']
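
To get a first feel for the columns, a quick frequency count is often enough. The following is a minimal sketch, not part of the original lab: it rebuilds the five rows shown in the `calls.head(5)` output above as a small stand-in DataFrame (`sample_calls` is a hypothetical name) so that it runs without the CSV file. On the real data you would call the same methods on `calls`.

```python
import pandas as pd

# Stand-in for `calls`: the five rows printed by calls.head(5) above.
sample_calls = pd.DataFrame(
    {
        "CVLEGEND": ["BURGLARY - VEHICLE", "LARCENY", "BURGLARY - COMMERCIAL",
                     "LIQUOR LAW VIOLATION", "BURGLARY - VEHICLE"],
        "Day": ["Monday", "Monday", "Saturday", "Saturday", "Wednesday"],
        "Lat": [37.880262, 37.878112, 37.881957, 37.891368, 37.851255],
        "Lon": [-122.295809, -122.269114, -122.269551, -122.279257, -122.292509],
    },
    index=pd.Index([17076632, 17092227, 18004102, 17065730, 18000630],
                   name="CASENO"),
)

# How many calls fall into each offense category?
category_counts = sample_calls["CVLEGEND"].value_counts()

# How many calls were logged on each day of the week?
day_counts = sample_calls["Day"].value_counts()

# Rough geographic extent of the calls (min and max of each coordinate).
extent = sample_calls[["Lat", "Lon"]].agg(["min", "max"])

print(category_counts)
print(day_counts)
print(extent)
```

Counts like these suggest the kinds of questions the table can answer: which offense categories dominate, whether some weekdays are busier, and how far the calls spread geographically.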


## Heatmap <a id='data'></a>

Let's see if we can figure out what a heatmap does and why it is useful. But first, we're going to quickly review how to use `folium.Map`. Again, you should consult the [folium quickstart guide](https://python-visualization.github.io/folium/quickstart.html) for a refresher in case you forget how folium works!

Plot a map of the United States again using folium.Map.

<b>Reminder</b>: The location is given in the order lat, lon, and the larger `zoom_start` is, the more zoomed in the map is.

In [5]:
# First, we create a folium Map
example_map1 = folium.Map([39.83, -98.59], zoom_start=6)
example_map1

### Key Note

`HeatMap` does not accept DataFrames, so you will need to provide a list of lat, lon pairs, i.e. a list of lists.

Imagine that it looks something like this: `[[lat, lon],[lat, lon],[lat, lon],[lat, lon],[lat, lon]]`. This means if you were given a DataFrame, there are a few steps you'd have to take.

1. Make sure the lat and lon are floats.
2. Filter the DataFrame for the correct rows and columns.

What is something else you believe you'll need to check for to make sure that Heatmap will work?

<b> Solution </b> : Check for NaNs. 

Our data set today has already had the NaNs filtered out, but that might not be true for data you work with in the future...
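
The checklist above (floats, correct columns, no NaNs) can be sketched as follows. This is an illustrative example rather than the lab's own code: `raw` is a hypothetical DataFrame with string coordinates, one missing value, and one unparseable value, built only to show each step.

```python
import pandas as pd

# Hypothetical messy input: coordinates stored as strings,
# with one missing latitude and one longitude that is not a number.
raw = pd.DataFrame({
    "Lat": ["37.880262", "37.878112", None, "37.891368"],
    "Lon": ["-122.295809", "-122.269114", "-122.269551", "not recorded"],
    "OFFENSE": ["BURGLARY AUTO", "THEFT MISD. (UNDER $950)",
                "BURGLARY COMMERCIAL", "ALCOHOL OFFENSE"],
})

# 1. Keep only the coordinate columns.
coords = raw[["Lat", "Lon"]]

# 2. Convert to floats; values that cannot be parsed become NaN.
coords = coords.apply(pd.to_numeric, errors="coerce")

# 3. Drop every row where either coordinate is missing.
coords = coords.dropna()

# 4. Convert to the list-of-lists shape that HeatMap expects.
pairs = coords.values.tolist()
print(pairs)
```

Only the first two rows survive here, because the third has no latitude and the fourth has a longitude that cannot be read as a number.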

Run the next cell to generate a set of dummy `[[lat, lon]]` pairs for the HeatMap. Don't worry about the information itself. Instead, note how the array is formatted.

In [6]:
# The expression below draws 100 pairs of standard-normal random numbers (multiplying by [[1, 1]] leaves them unscaled)
# and adds them to the pair [39.83, -98.59] to get 100 latitude, longitude pairs near the center of the USA.
data = (np.random.normal(size=(100, 2)) *
        np.array([[1, 1]]) +
        np.array([[39.83, -98.59]])).tolist()

# Print first 10 sample points
data[:10]

[[38.6597033082904, -98.73327092660016],
 [40.65569799095573, -98.68904399931328],
 [39.58970172158679, -98.74610400764332],
 [40.31974338896467, -97.8155217866357],
 [40.205296277297066, -98.11471230794902],
 [39.98062254495512, -99.68613698891127],
 [38.96125317908028, -98.79038641200029],
 [38.542039324635056, -97.52940662699187],
 [41.32149009544228, -99.38955883982115],
 [39.78904322396528, -100.19112325550377]]

Then we can plot it on the map! The process is pretty simple:
1. Create a heatmap layer using the constructor `HeatMap(your_lat_lon_data)`
2. Add that layer to your existing map with `add_to(your_map)`

In [7]:
# Add the HeatMap to the map
HeatMap(data).add_to(example_map1)

example_map1

Play around with your new Heatmap. What is it plotting? What kinds of things would a Heatmap be useful for?

<b> Solution</b>: 

In this case we're plotting some random noise distributed geographically around the center of the US. Heatmaps are generally really useful for visualizing geographic distributions (economics, crime, etc).

### Try It Out

Now, try making your own heatmap using the Berkeley PD call data. First, plot a Folium Map of Berkeley, just like you did last week.

In [8]:
#Plot the map of Berkeley
berk_coords = (37.87, -122.27) # Solution
berk_map = folium.Map(location=berk_coords, zoom_start=13) 
berk_map

Next, extract your latitude and longitude data from the `calls` DataFrame and save them to the variables `lat` and `lon`. We want the data as a NumPy array, so index the DataFrame by the correct column (e.g. `calls["Column_I_Want"]`) to get a Series, then call `.values` to get an array.

In [9]:
lat = calls['Lat'].values 
lon = calls['Lon'].values 
lat

array([37.880262, 37.878112, 37.881957, ..., 37.87718 , 37.858132,
       37.862763])

We have the right data, but it isn't in the right shape: we want an array of arrays, where the first column is latitudes, the second column is longitudes, and each row is a `[lat, lon]` pair (see the example above). We can do this by:
1. **Stacking** the `lat` array on top of the `lon` array into one larger array with `np.vstack`
2. **Transposing** our stacked array so the latitude and longitude are vertical columns, not horizontal rows.

Hint 1: the stacking function call looks something like `np.vstack((top_array, bottom_array))`

Hint 2: you can transpose an array by calling `.transpose()` on the array

In [10]:
call_locs = np.vstack((lat, lon)).transpose().tolist() 
call_locs[:5]

[[37.880262, -122.295809],
 [37.878112, -122.269114],
 [37.881957, -122.269551],
 [37.891368, -122.279257],
 [37.851255, -122.292509]]
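
The stack-then-transpose recipe is one of several equivalent ways to build the pairs. As a hedged aside (this alternative is not part of the original lab), `np.column_stack` produces the same array in one step. The short arrays below are the first three values from the output above.

```python
import numpy as np

# First three latitudes and longitudes from the output above.
lat = np.array([37.880262, 37.878112, 37.881957])
lon = np.array([-122.295809, -122.269114, -122.269551])

# The lab's approach: stack into shape (2, n), then transpose to (n, 2).
stacked = np.vstack((lat, lon))
pairs_a = stacked.transpose()

# Equivalent one-step alternative: treat each 1-D array as a column.
pairs_b = np.column_stack((lat, lon))

print(stacked.shape)                     # (2, 3)
print(pairs_a.shape)                     # (3, 2)
print(np.array_equal(pairs_a, pairs_b))  # True

# HeatMap wants plain Python lists, so convert at the end.
call_locs_sample = pairs_b.tolist()
```

Checking `.shape` before and after the transpose is a quick way to confirm that each row is one `[lat, lon]` pair.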

Now, you have everything you need to make your HeatMap! Do so in the cell below.

In [11]:
# Create a Heatmap with the call data.
heatmap = HeatMap(call_locs, radius = 10) 

# Add it to your Berkeley map.
berk_map.add_child(heatmap)

What conclusions can you draw from this Heatmap?

<b>Solution</b>:

**The heatmap shows several things. First, the calls seem to be recorded by intersection location, more or less. Second, the calls are concentrated along the main streets, especially south of campus, and at major intersections along University and San Pablo. The more residential parts of Berkeley, both north and south, are pretty quiet, except for a few hot spots (like California at Derby-Ward or thereabouts). Third, there are very few calls in North Berkeley and the Hills, except at Solano and Colusa and at Marin and Euclid (traffic calls?).**

## HeatMapWithTime <a id='data'></a>

Now what do you think is different with `HeatMapWithTime`?

<b> Solution</b>: 

We can encode an added dimension of time in our data. This lets us see how the distributions evolve over a time window, which is really useful for understanding how social phenomena change.

In this example, we'll again use dummy data to show how it works. It follows a similar process to HeatMap. First, create another Folium Map centered at the geographical center of the USA.

In [12]:
# Create a folium Map at the USA's center
example_map2 = folium.Map([39.83, -98.59], zoom_start=6) # Solution
example_map2

Next, we will create more dummy location data to simulate locations associated with different dates. Don't worry too much about the code here, but you do need to understand how the output is shaped and why it needs to be shaped like that.

In [13]:
# This cell builds an array of initial data to display on our HeatMapWithTime. Just as before, these are dummy
# values: 100 random points scattered around the center of the USA, meant to simulate different locations in the area.
# Each point is a [lat, lon] pair; the time dimension is added further down.
np.random.seed(3141592)
initial_data = (
    np.random.normal(size=(100, 2)) * np.array([[1, 1]]) +
    np.array([[39.83, -98.59]])
)

# Create a small random offset for each point, then build 100 time steps:
# step i holds every initial point shifted by i times its offset.
# You don't need to know how to write this code
move_data = np.random.normal(size=(100, 2)) * 0.01

data = [(initial_data + move_data * i).tolist() for i in range(100)]
data[1][:5]

[[39.357912262232645, -99.3031447037493],
 [39.53515318785451, -99.33544006103158],
 [38.402343290775484, -100.42410963192582],
 [40.34765236417265, -100.02499065653514],
 [39.7451469875597, -99.02805186698448]]
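
The nesting matters more than the numbers. As a minimal sketch (the names `n_frames`, `n_points`, and `frames` are illustrative, and smaller sizes are used so the result is easy to inspect), the structure below has the same shape as `data`: an outer list with one entry per time step, each holding a list of `[lat, lon]` pairs.

```python
import numpy as np

n_frames, n_points = 4, 3  # the lab uses 100 and 100

rng = np.random.default_rng(0)
start = rng.normal(size=(n_points, 2)) + np.array([39.83, -98.59])
step = rng.normal(size=(n_points, 2)) * 0.01  # small per-frame drift

# Frame i holds every point shifted i steps from its starting position.
frames = [(start + step * i).tolist() for i in range(n_frames)]

print(len(frames))        # number of time steps
print(len(frames[0]))     # points in each time step
print(len(frames[0][0]))  # each point is a [lat, lon] pair
```

The outer length is what has to match the number of date labels supplied alongside the data.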

Since we're using HeatMapWithTime, we need an extra parameter: the dates for each list of lat/lon pairs. Run the next cell to create one.

In [14]:
# Generate a set of dates for this dummy data.
# Luckily for us, when you try this out for yourself, the dates will come with your data set.
# You don't need to write out this code, but do look it over and see if you can understand it.
from datetime import datetime, timedelta

time_index = [
    (datetime.now() + k * timedelta(1)).strftime('%Y-%m-%d') for
    k in range(len(data))
]
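
If you prefer pandas for date handling, `pd.date_range` can build an equivalent index. This is an alternative sketch, not the lab's method: it uses a fixed start date (an arbitrary choice, made so the result is reproducible) instead of `datetime.now()`, and `n_frames` stands in for `len(data)`.

```python
import pandas as pd

n_frames = 5  # stands in for len(data), which is 100 in the lab

# One label per frame, one day apart, formatted as YYYY-MM-DD strings.
time_index_alt = (
    pd.date_range(start="2019-01-01", periods=n_frames, freq="D")
    .strftime("%Y-%m-%d")
    .tolist()
)

print(time_index_alt)
```

Either way, the result is a plain list of strings with one label per time step.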

Finally, create the HeatMapWithTime by calling the constructor on the data and setting the index to the set of dates you generated. Then, add it to your Map.

In [15]:
# This is how to run HeatMapWithTime. It looks similar to the code we saw above, right?
m = folium.Map([39.83, -98.59], zoom_start=6)

hm = HeatMapWithTime(
    data,
    index=time_index,
    auto_play=True,
    max_opacity=0.3
)

hm.add_to(m)

m

Now try for yourself using the Berkeley `calls` data set.

The first step is to get the data into the correct format. Create a new DataFrame with two columns: Date, containing the data in the calls "timestamp" column, and Location, containing the call location data you used to make your HeatMap (the stacked and transposed latitudes and longitudes).

Note: the current timestamps record both the date and the time of each filing event. An hourly resolution per frame is probably not interesting, so we recommend chopping off the time component of the timestamp and grouping by day.

In [20]:
type(call_locs)

list

In [16]:
# Create a new Dataframe with the date and call location data
calls_loc_time = pd.DataFrame(
    data = {'Date': [c.split(" ")[0] for c in calls['timestamp']], 'Location': call_locs})

# Group by filing day and aggregate entries as a list
calls_loc_time = calls_loc_time.groupby('Date')['Location'].apply(list).reset_index()
calls_loc_time.head()

,Date,Location
0,2017-08-13,"[[37.870948, -122.27733], [37.862542, -122.290..."
1,2017-08-14,"[[37.871167, -122.268285], [37.856111, -122.26..."
2,2017-08-15,"[[37.873687, -122.268616], [37.877232, -122.27..."
3,2017-08-16,"[[37.878112, -122.269114], [37.871828, -122.27..."
4,2017-08-17,"[[37.873454, -122.27209], [37.895849, -122.263..."
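
The string split above relies on every timestamp having the form `YYYY-MM-DD HH:MM:SS`. A slightly more defensive variant, offered here as a sketch rather than as the lab's solution, parses the column with `pd.to_datetime` first. In `sample`, the first two rows are copied from the `calls.head(5)` output; the third row is hypothetical and was added only so that one day has more than one call.

```python
import pandas as pd

sample = pd.DataFrame({
    "timestamp": ["2017-12-18 09:45:00", "2017-10-30 10:25:00",
                  "2017-12-18 21:10:00"],  # third row is hypothetical
    "Lat": [37.880262, 37.878112, 37.870000],
    "Lon": [-122.295809, -122.269114, -122.270000],
})

# Parse the strings and keep only the calendar day as a YYYY-MM-DD label.
dates = pd.to_datetime(sample["timestamp"]).dt.strftime("%Y-%m-%d")

# Pair up the coordinates row by row.
locations = sample[["Lat", "Lon"]].values.tolist()

# One row per day, holding the list of [lat, lon] pairs for that day.
by_day = (
    pd.DataFrame({"Date": dates, "Location": locations})
    .groupby("Date")["Location"]
    .apply(list)
    .reset_index()
)
print(by_day)
```

Parsing first means a malformed timestamp raises an error straight away instead of silently producing an odd date label.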


Next, extract the dates and the grouped locations into two variables to put in your HeatMapWithTime, the same way we did above with the regular HeatMap. A minor note: above we converted Series objects to arrays by calling `.values`, which returns a NumPy array, a type that doesn't quite work with the HeatMapWithTime object. Simply wrap the call as `list(your_series.values)` to solve this.

In [17]:
berk_dates = list(calls_loc_time['Date'].values) 
berk_loc_by_date = [list(loc_list) for loc_list in calls_loc_time['Location'].values] 

Finally, create a Folium map of Berkeley, then create a [HeatMapWithTime](https://python-visualization.github.io/folium/docs-v0.5.0/plugins.html) and add it to your Berkeley map. The call looks like `HeatMapWithTime(<grouped locations>, index=<dates>)`. Click the link for more documentation. Also, try adding the argument `auto_play=True`.

In [18]:
# Plot the heatmap of Berkeley crime
berk_coords = (37.87, -122.27) # Solution
berk_map2 = folium.Map(location=berk_coords, zoom_start=13) # Solution


hmwt_berk = HeatMapWithTime(
    berk_loc_by_date, 
    index=berk_dates, 
    auto_play=True
)

hmwt_berk.add_to(berk_map2)
berk_map2

What conclusions can you draw from this Heatmap?

<b>Solution</b>:

As with the static version of this heatmap, we see a lot more crime concentrated to the south and west of campus. Variations over time include the number of crimes per day and some geographical movement. It's interesting to see that certain blocks (Shattuck near campus) consistently have crime reports. An interesting next step may be to classify these reports by type of crime to see if certain offense types exhibit greater locational clustering than others.
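
Splitting the reports by offense type, as a follow-up analysis, could be sketched like this. It is an illustration under assumptions: `sample_calls` rebuilds the five rows from the `calls.head(5)` output so the snippet runs on its own, and `locations_by_category` is a hypothetical name. On the real data, each list in the resulting dictionary could be passed to `HeatMap` to draw one layer per category.

```python
import pandas as pd

# Stand-in for `calls`: the five rows printed by calls.head(5) earlier.
sample_calls = pd.DataFrame({
    "CVLEGEND": ["BURGLARY - VEHICLE", "LARCENY", "BURGLARY - COMMERCIAL",
                 "LIQUOR LAW VIOLATION", "BURGLARY - VEHICLE"],
    "Lat": [37.880262, 37.878112, 37.881957, 37.891368, 37.851255],
    "Lon": [-122.295809, -122.269114, -122.269551, -122.279257, -122.292509],
})

# Map each offense category to its list of [lat, lon] pairs.
locations_by_category = {
    category: group[["Lat", "Lon"]].values.tolist()
    for category, group in sample_calls.groupby("CVLEGEND")
}

for category, locs in locations_by_category.items():
    print(category, len(locs))
```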

---
2019 Changes developed by Adithya Girish

Data Science Modules: http://data.berkeley.edu/education/modules