### LEGALST-190 Lab 2/13

---

In this lab, students will learn how to construct a heatmap, as well as an interactive heatmap that changes over time. This will also be a component of the take-home problem set. It builds on last week's folium labs.


In [1]:
# dependencies
from datascience import *
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import folium
import json
import os

In [2]:
!pip install folium --upgrade
import folium.plugins # The Folium Javascript Map Library
from folium.plugins import HeatMap
from folium.plugins import HeatMapWithTime


---

## The Data <a id='data'></a>
---

Today we'll be working with data on Berkeley crime calls, courtesy of the Berkeley Police Department. Take a look at the metadata [here.](https://data.cityofberkeley.info/Public-Safety/Berkeley-PD-Calls-for-Service/k2nh-s5h5)

Note: this data set has already undergone a fair amount of cleaning to format it for our purposes (e.g. extracting the longitude and latitude, removing null values, and dropping irrelevant columns). You can see the original data at the source website.

Run the cell below to load the data into a Table.

In [3]:
calls = Table.read_table('data/berkeley_crime_0218.csv', index_col=0)
calls.show(5)

| OFFENSE | CVLEGEND | BLKADDR | City | State | Day | Lat | Lon | timestamp |
|---|---|---|---|---|---|---|---|---|
| BURGLARY AUTO | BURGLARY - VEHICLE | 1300 SAN PABLO AVE | Berkeley | CA | Monday | 37.8803 | -122.296 | 2017-12-18 09:45:00 |
| THEFT MISD. (UNDER $950) | LARCENY | 1600 SHATTUCK AVE | Berkeley | CA | Monday | 37.8781 | -122.269 | 2017-10-30 10:25:00 |
| BURGLARY COMMERCIAL | BURGLARY - COMMERCIAL | 1400 SHATTUCK AVE | Berkeley | CA | Saturday | 37.882 | -122.27 | 2018-01-20 05:23:00 |
| ALCOHOL OFFENSE | LIQUOR LAW VIOLATION | SOLANO AVENUE & COLUSA AVE | Berkeley | CA | Saturday | 37.8914 | -122.279 | 2017-10-28 20:08:00 |
| BURGLARY AUTO | BURGLARY - VEHICLE | 800 POTTER ST | Berkeley | CA | Wednesday | 37.8513 | -122.293 | 2018-01-03 19:00:00 |


When working with any new data set, it's a good idea to get to know it first. Use a new code cell and the information on the City of Berkeley data page linked above to answer some basic questions:
- What information does this table contain? What are the different columns?
- How large is the data set? 
- What kinds of questions could we answer using this data set?
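If you want to see what answering the first two questions looks like in code, here is a minimal pandas sketch (pandas is already imported above) run on only the five sample rows shown, pasted in as text so it needs no data file. It is an illustration rather than the lab's intended approach: with the Table itself, `calls.num_rows`, `calls.labels`, and `calls.group("CVLEGEND")` answer the same questions.

```python
import io

import pandas as pd

# The five sample rows displayed above, pasted in as CSV text (no data file needed)
sample_csv = """OFFENSE,CVLEGEND,BLKADDR,City,State,Day,Lat,Lon,timestamp
BURGLARY AUTO,BURGLARY - VEHICLE,1300 SAN PABLO AVE,Berkeley,CA,Monday,37.8803,-122.296,2017-12-18 09:45:00
THEFT MISD. (UNDER $950),LARCENY,1600 SHATTUCK AVE,Berkeley,CA,Monday,37.8781,-122.269,2017-10-30 10:25:00
BURGLARY COMMERCIAL,BURGLARY - COMMERCIAL,1400 SHATTUCK AVE,Berkeley,CA,Saturday,37.882,-122.27,2018-01-20 05:23:00
ALCOHOL OFFENSE,LIQUOR LAW VIOLATION,SOLANO AVENUE & COLUSA AVE,Berkeley,CA,Saturday,37.8914,-122.279,2017-10-28 20:08:00
BURGLARY AUTO,BURGLARY - VEHICLE,800 POTTER ST,Berkeley,CA,Wednesday,37.8513,-122.293,2018-01-03 19:00:00
"""
sample = pd.read_csv(io.StringIO(sample_csv))

print(sample.shape)                        # (5, 9): 5 rows, 9 columns
print(list(sample.columns))                # the column names
print(sample["CVLEGEND"].value_counts())   # how often each offense category appears
```

On the full `calls` Table, the same checks tell you how many calls there are and which categories dominate, which is a good starting point for the third question.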

## Heatmap <a id='data'></a>

Let's see if we can figure out what a heatmap does and why it is useful. But first, we're going to quickly review how to use `folium.Map`. Again, consult the [folium quickstart guide](https://python-visualization.github.io/folium/quickstart.html) for a refresher in case you forget how folium works!

Plot a map of the United States again using folium.Map.

<b>Reminder</b>: Coordinates are given in the order latitude, longitude, and the larger `zoom_start` is, the more zoomed in the map is.

In [4]:
# First, we create a folium Map
example_map1 = folium.Map([39.83, -98.59], zoom_start=6)
example_map1

### Key Note

HeatMap does not accept a Table, so you will need to provide a list of `[lat, lon]` pairs, i.e. a list of lists.

Imagine that it looks something like this: `[[lat, lon],[lat, lon],[lat, lon],[lat, lon],[lat, lon]]`. This means if you were given a Table, there are a few steps you'd have to take.

1. Make sure the lat and lon are floats.
2. Filter the Table for the correct rows and columns.

What else do you think you'll need to check to make sure that HeatMap will work?

Our data set today has already had the NaNs filtered out, but that might not be true for data you work with in the future...
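Here is one way to handle both checks (converting to floats and dropping missing values), sketched with NumPy. The raw string values and the blank entries are hypothetical, made up for illustration, and `to_float` is a hypothetical helper. `np.column_stack` is used here to pair the two columns; later in the lab you'll build the same shape with `np.vstack` and `.transpose()`.

```python
import numpy as np

# Hypothetical raw coordinates, e.g. read from a file as strings, with some blanks
raw_lat = ["37.8803", "37.8781", "", "37.882"]
raw_lon = ["-122.296", "-122.269", "-122.27", ""]

def to_float(s):
    """Hypothetical helper: convert a string to float, treating blanks as missing (NaN)."""
    return float(s) if s.strip() else np.nan

lat = np.array([to_float(s) for s in raw_lat])
lon = np.array([to_float(s) for s in raw_lon])

# Keep only rows where BOTH latitude and longitude are present
keep = ~np.isnan(lat) & ~np.isnan(lon)
pairs = np.column_stack((lat[keep], lon[keep])).tolist()
print(pairs)  # [[37.8803, -122.296], [37.8781, -122.269]]
```

Dropping a row when either coordinate is missing keeps latitudes and longitudes aligned; filtering each column separately could pair a latitude with the wrong longitude.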

Run the next cell to generate a set of dummy `[[lat, lon]]` pairs for the HeatMap. Don't worry about the information itself. Instead, note how the array is formatted.

In [5]:
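# The first two lines generate a 100 x 2 array of random numbers from a standard normal distribution.
# The third line adds them to the pair [39.83, -98.59] (the USA's center) to get 100 latitude, longitude pairs near it,
# and .tolist() turns the result into a plain list of [lat, lon] lists.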
data = (np.random.normal(size=(100, 2)) *
        np.array([[1, 1]]) +
        np.array([[39.83, -98.59]])).tolist()
data

[[40.10934418951442, -98.316709413751],
 [40.22956072106896, -96.44774343976287],
 [39.87211266576602, -99.1430352392865],
 [39.56570911289435, -97.49733564478818],
 [40.002962493313, -99.11961217175848],
 [40.20948233543553, -97.98135963323405],
 [38.66840255810513, -99.41457302613887],
 [40.94704598947841, -98.63678739597316],
 [39.75856120654914, -98.87142844013334],
 [39.900915250498755, -99.86608956582039],
 [40.73432761011113, -97.16192233349608],
 [39.68990206474196, -97.24761716286774],
 [39.49264561596287, -99.00886095408727],
 [38.47840834494385, -98.47038871634376],
 [39.370954471640154, -98.72596954860235],
 [40.105889807509016, -99.72834369411348],
 [39.567608518636405, -98.77171869294949],
 [40.27648078555396, -98.14457276104234],
 [39.46890435629817, -100.50675230739144],
 [39.897517549258325, -99.35701276982527],
 [38.90053487665428, -99.39219564022254],
 [41.83935426988349, -97.9679677585273],
 [39.84188250981375, -98.25291715442432],
 ...

Then we can plot it on the map! The process is pretty simple:
1. Create a HeatMap using `HeatMap(your_lat_lon_data)`.
2. Add that HeatMap to your existing map with `.add_to(your_map)`.

In [6]:
# Add the HeatMap to the map
HeatMap(data).add_to(example_map1)

example_map1

Play around with your new HeatMap. What is it plotting? What kinds of things would a heatmap be useful for?

### Try It Out

Now, try making your own HeatMap using the Berkeley PD call data. First, plot a Folium Map of Berkeley, just like you did last week.

In [7]:
#Plot the map of Berkeley
berk_coords = ...
berk_map = ...
berk_map


Next, extract your latitude and longitude data from the `calls` Table and save each to the variables `lat` and `lon`. We want the data as a numpy array, so don't use the `select` function; instead, index the Table by the correct column (e.g. `calls["Column_I_Want"]`).

In [8]:
lat = ...
lon = ...
lat


We have the right data, but it isn't in the right shape: we want an array of arrays, where the first column is latitudes, the second column is longitudes, and each row is a `[lat, lon]` pair (see the example above). We can do this by:
1. **Stacking** the `lat` array on top of the `lon` array into one larger array with `np.vstack`
2. **Transposing** our stacked array so the latitude and longitude are vertical columns, not horizontal rows.

- Hint 1: the stacking function call looks something like `np.vstack((top_array, bottom_array))`.
- Hint 2: you can transpose an array by calling `.transpose()` on the array.
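To see what these two steps do to the shape, here is a small sketch on three coordinate pairs copied from the first rows of `calls` shown above, standing in for the full columns. `np.column_stack` appears only as a cross-check that builds the same array in one call.

```python
import numpy as np

# Three latitude/longitude values taken from the sample rows of `calls`
lat = np.array([37.8803, 37.8781, 37.882])
lon = np.array([-122.296, -122.269, -122.27])

stacked = np.vstack((lat, lon))   # shape (2, 3): row 0 is latitudes, row 1 is longitudes
call_locs = stacked.transpose()   # shape (3, 2): each row is one [lat, lon] pair

print(stacked.shape, call_locs.shape)  # (2, 3) (3, 2)

# Cross-check: column_stack pairs the arrays column-wise in a single step
same = np.array_equal(call_locs, np.column_stack((lat, lon)))
```

Before transposing, each row holds one kind of coordinate; after transposing, each row holds one location, which is the format HeatMap expects.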

In [9]:
call_locs = ...
call_locs[1]


Now, you have everything you need to make your HeatMap! Do so in the cell below.

In [10]:
# Create a HeatMap with the call data.
heatmap = ...

# Add it to your Berkeley map.
...


What conclusions can you draw from this Heatmap?

## HeatMapWithTime <a id='data'></a>

How do you think HeatMapWithTime differs from HeatMap?

In this example, we'll again use dummy data to show how it works. It follows a similar process to HeatMap. First, create another Folium Map centered at the geographical center of the USA.

In [11]:
# Create a folium Map at the USA's center
example_map2 = ...
example_map2


Next, we will create more dummy location data to simulate locations associated with different dates. Don't worry too much about the code here, but you do need to understand how the output is shaped and why it needs to be shaped like that.

In [12]:
# This cell builds the initial data to display on our HeatMapWithTime. Just as before, these are dummy
# [lat, lon] pairs: 100 random points scattered around [48, 5] (a location in northeastern France, not the USA).
# The order is still lat, lon; the time dimension comes from the list of frames built below.
np.random.seed(3141592)
initial_data = (
    np.random.normal(size=(100, 2)) * np.array([[1, 1]]) +
    np.array([[48, 5]])
)

# Create a small random offset for each point; frame i moves every point by i times its offset,
# giving 100 frames of 100 [lat, lon] pairs each. You don't need to know how to write this code.
move_data = np.random.normal(size=(100, 2)) * 0.01

data = [(initial_data + move_data * i).tolist() for i in range(100)]
data[1]

[[47.52791226223265, 4.286855296250705],
 [47.70515318785451, 4.254559938968417],
 [46.572343290775486, 3.16589036807419],
 [48.517652364172655, 3.565009343464859],
 [47.9151469875597, 4.561948133015522],
 [47.73009213894775, 6.400078703194206],
 [49.26936459309292, 3.5699714093014285],
 [47.851397175019756, 3.8319427791390397],
 [48.04477172261817, 6.316007103139816],
 [46.479416288127055, 4.682820499868778],
 [48.24059429677636, 4.343909573828612],
 [49.09604239901807, 5.094088143933742],
 [48.28316532514145, 5.662784381971345],
 [48.12325977596014, 5.782001071996817],
 [46.44347099963196, 4.770453864106374],
 [49.29117678153405, 6.136711330741487],
 [47.00836652976502, 6.286928736906839],
 [46.86855706102264, 5.113019811657515],
 [48.612612228434514, 4.677966117580723],
 [48.84324127910088, 4.533402809342304],
 [49.816563455994675, 3.759361045283218],
 [47.402859931934984, 5.299758773151624],
 [47.59727798048993, 4.887621415574294],
 [47.96991918344748, 5.560256321942765],
 ...
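To make the shape concrete: `data` is a list of frames, one per time step, and each frame is itself a list of `[lat, lon]` pairs. The sketch below rebuilds the dummy data (dropping the `* np.array([[1, 1]])` factor, which multiplies by 1 and changes nothing) and checks its shape. In your Berkeley data the frames will generally hold different numbers of points, so a single rectangular array like this only works for the dummy data.

```python
import numpy as np

np.random.seed(3141592)
initial_data = np.random.normal(size=(100, 2)) + np.array([[48, 5]])  # 100 points near [48, 5]
move_data = np.random.normal(size=(100, 2)) * 0.01                    # per-point drift per frame

# Frame i = starting points shifted by i times their drift
data = [(initial_data + move_data * i).tolist() for i in range(100)]

frames = np.array(data)
print(frames.shape)  # (100, 100, 2): 100 frames x 100 points x [lat, lon]

# data[k][j] is point j at time step k
```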

Since we're using HeatMapWithTime, we need an extra parameter: the dates for each list of lat/lon pairs. Run the next cell to create one.

In [13]:
# Generate a set of dates for this dummy data.
# Fortunately, the Berkeley data set already comes with timestamps, so you won't need this step when you try it yourself.
# You don't need to write out this code, but do look it over and see if you can understand it.
from datetime import datetime, timedelta

time_index = [
    (datetime.now() + k * timedelta(1)).strftime('%Y-%m-%d') for
    k in range(len(data))
]
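Because `datetime.now()` changes every time the cell runs, here is a variant that uses a fixed start date instead (the date itself is an arbitrary assumption), so the labels are reproducible. The key requirement is the same either way: one label in `index` for each frame in the data. The three small frames are hypothetical and deliberately hold different numbers of points, as the Berkeley frames will.

```python
from datetime import date, timedelta

# Hypothetical frames of dummy [lat, lon] pairs, one list per time step
frames = [
    [[48.0, 5.0], [48.5, 5.2]],
    [[48.1, 5.0]],
    [[48.2, 5.1], [48.4, 5.3], [47.9, 4.8]],
]

start = date(2018, 2, 13)  # assumed fixed start date instead of datetime.now()
time_index = [(start + timedelta(days=k)).isoformat() for k in range(len(frames))]
print(time_index)  # ['2018-02-13', '2018-02-14', '2018-02-15']
```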

Finally, create the HeatMapWithTime by calling its constructor on the data and setting `index` to the list of dates you generated. Then, add it to a Map.

In [14]:
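# This is how to run HeatMapWithTime. Looks similar to the HeatMap code we saw above, right?
# The dummy data is centered near [48, 5], so we make a new map there instead of reusing example_map2.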
m = folium.Map([48., 5.], zoom_start=6)

hm = HeatMapWithTime(
    data,
    index=time_index,
    auto_play=True,
)

hm.add_to(m)

m

Now try for yourself using the Berkeley `calls` data set.

The first step is to get the data into the correct format. Create a new Table with two columns: Date, containing the data in the calls "timestamp" column, and Location, containing the call location data you used to make your HeatMap (the stacked and transposed latitudes and longitudes).

Hint: check your 1-18 lab, or the `datascience` Table documentation for [creating](http://data8.org/datascience/_autosummary/datascience.tables.Table.with_columns.html#datascience.tables.Table.with_columns) and [grouping](http://data8.org/datascience/_autosummary/datascience.tables.Table.group.html) Tables. You're going to want to call `group` with the `list` function as the aggregator.
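If the Table steps feel opaque, here is the same grouping idea sketched in plain Python with `collections.defaultdict`, not the `datascience` API. The three records are hypothetical (the first and third reuse sample rows; the second is made up), and keeping only the `YYYY-MM-DD` part of each timestamp is our own assumption: grouping on the full timestamp would likely give close to one frame per call.

```python
from collections import defaultdict

# Hypothetical (timestamp, lat, lon) records shaped like rows of `calls`
records = [
    ("2018-01-03 19:00:00", 37.8513, -122.293),
    ("2018-01-03 21:15:00", 37.8700, -122.270),  # made-up second call on the same day
    ("2018-01-20 05:23:00", 37.8820, -122.270),
]

groups = defaultdict(list)
for ts, lat, lon in records:
    day = ts[:10]                  # keep only the 'YYYY-MM-DD' part of the timestamp
    groups[day].append([lat, lon])

dates = sorted(groups)                     # one frame label per day, in order
locs_by_date = [groups[d] for d in dates]  # one list of [lat, lon] pairs per frame
print(dates)  # ['2018-01-03', '2018-01-20']
```

The two outputs line up position by position, which is exactly the pairing HeatMapWithTime needs between its data and its `index`.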

In [None]:
# Create a new table with the date and call location data, grouped by the date. 

locs_and_dates = ...
locs_and_dates.show(5)

Next, extract the dates and the grouped locations into two variables to put in your HeatMapWithTime. Note:

* HeatMapWithTime needs lists, so you'll need to convert your dates to a list using `.tolist()`
* The Table `group` method converts everything to arrays, and each array needs to be converted to a list. This is super annoying, so we've given you the code to do it. Just extract the grouped locations from the correct column and put the extracted data in the ellipsis on the second line.
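Why bother converting? Folium writes the map data into the page's JavaScript, and plain lists are a safe format for that. As a rough illustration (not folium's actual code path), Python's own `json` module rejects NumPy arrays but accepts the same data after `.tolist()`. The two pairs below are hypothetical.

```python
import json

import numpy as np

# One grouped entry behaves like a sequence of arrays, one [lat, lon] array per call
group = [np.array([37.8513, -122.293]), np.array([37.87, -122.27])]

try:
    json.dumps(group)
except TypeError as err:
    print("NumPy arrays are not JSON serializable:", err)

# Same conversion as the inner part of the provided list comprehension
as_lists = [x.tolist() for x in group]
print(json.dumps(as_lists))  # [[37.8513, -122.293], [37.87, -122.27]]
```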

In [None]:
berk_dates = ...
berk_loc_by_date = [[x.tolist() for x in y] for y in ...]

Finally, create a Folium map of Berkeley, then create a [HeatMapWithTime](https://python-visualization.github.io/folium/docs-v0.5.0/plugins.html) and add it to your Berkeley map. The call looks like `HeatMapWithTime(<grouped locations>, index=<dates>)`. Click the link for more documentation, and try adding the argument `auto_play=True`.

In [None]:
#Plot the heatmap of Berkeley crime
berk_coords = ...
berk_map2 = ...


hmwt_berk = HeatMapWithTime(
    ...,
    index=...,
)

hmwt_berk.add_to(berk_map2)
berk_map2

What conclusions can you draw from this HeatMapWithTime?