# Introduction

This lesson is the first of two that will show how to create maps in Python using Folium.

[Folium](https://python-visualization.github.io/folium/) is a wrapper that automates creating Leaflet maps. These maps (like Bokeh graphs) are interactive: the user can zoom in and out, pan, etc., to explore the map.

The mapmaker doesn't need to work with HTML, CSS, or JavaScript: everything can be done within the Python ecosystem.

This makes it *much* easier to create a wide variety of different types of maps.

This lesson will show how to use Folium to create maps with points and circles, how to change their size and color, and how to annotate them with popup texts.

In [None]:
!pip3 install folium
!pip3 install xyzservices

from xyzservices import TileProvider

In [1]:
# import our tool libraries
import pandas as pd
import folium

According to this [Kaggle notebook](https://www.kaggle.com/code/alexisbcook/exercise-interactive-maps), adding

In [2]:
def embed_map(m, file_name):
    from IPython.display import IFrame
    m.save(file_name)
    return IFrame(file_name, width='100%', height='500px')

## Get the Data

Unlike many of our lessons, this one will draw on historical data, specifically a database of Civil War battles.

The data we will be working with is [Jeffrey Arnold's](https://github.com/jrnold/acw_battle_data) [dataset](https://acw-battle-data.readthedocs.io/en/latest/). Arnold's data doesn't include lat/long data, so Karsdorp, Kestemont, and Allen added it as part of their "Narrating with Maps," chapter 7 of their excellent [*Humanities Data Analysis: Case Studies with Python*](https://www.amazon.com/Humanities-Data-Analysis-Studies-Python/dp/0691172366) (Princeton: Princeton University Press, 2021). Their code and data can be found at [Zenodo](https://zenodo.org/record/3563075).

I have taken the data developed as part of this chapter and saved it to my [GitHub repo for this class](https://github.com/adamlporter/DataAnalysisClass). We will load the data from that repo.



In [2]:
df = pd.read_csv('https://raw.githubusercontent.com/adamlporter/DataAnalysisClass/master/cwsac_battle_locations_with_lat_long_tabdelim.csv',
                 sep = '\t',
                 parse_dates=['start_date','end_date'])

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 382 entries, 0 to 381
Data columns (total 25 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   battle           382 non-null    object        
 1   url              382 non-null    object        
 2   battle_name      382 non-null    object        
 3   other_names      197 non-null    object        
 4   state            382 non-null    object        
 5   locations        382 non-null    object        
 6   campaign         382 non-null    object        
 7   start_date       382 non-null    datetime64[ns]
 8   end_date         382 non-null    datetime64[ns]
 9   operation        382 non-null    int64         
 10  assoc_battles    2 non-null      object        
 11  results_text     382 non-null    object        
 12  result           382 non-null    object        
 13  forces_text      381 non-null    object        
 14  strength         59 non-null     float64  

## Draw a basic map

Let's start with a quick-and-dirty map.

Folium first requires us to initiate a Map object. We need to specify the center location of the map, the tile set we want to use, and a zoom level.

I generally center my maps on the center of the data. We can calculate this easily by finding the midpoint between the min and max values for lat / lon:
```
center = [ (df['lat'].max() + df['lat'].min())/2,
             (df['lon'].max() + df['lon'].min())/2]
```


### Select a basemap

Folium includes a number of [different tile sets](https://xyzservices.readthedocs.io/en/stable/gallery.html) that will provide the basemap, upon which our data will be displayed. The default is `OpenStreetMap` but this doesn't make sense for a Civil War map. So we will use the `OpenTopoMap` map instead.

Finally, we need to specify a `zoom_start` value. This will vary depending on the size of the map.
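
Choosing `zoom_start` by trial and error is one option. As a hedged alternative, the sketch below computes the center and the south-west / north-east corners of the data in plain Python. The function name `center_and_bounds` and the sample coordinates are made up for illustration; with the lesson's DataFrame, the same numbers come from `df['lat'].min()`, `df['lat'].max()`, and so on.

```python
def center_and_bounds(lats, lons):
    """Return the bounding-box midpoint and its [south-west, north-east] corners."""
    south, north = min(lats), max(lats)
    west, east = min(lons), max(lons)
    center = [(south + north) / 2, (west + east) / 2]
    bounds = [[south, west], [north, east]]
    return center, bounds

# Illustrative coordinates only (not taken from the battle data)
sample_lats = [30.0, 35.5, 40.0]
sample_lons = [-95.0, -80.0, -77.0]

center, bounds = center_and_bounds(sample_lats, sample_lons)
print(center)   # midpoint of the bounding box
print(bounds)   # the two corners of the bounding box
```

Folium's `Map.fit_bounds()` accepts a pair of corners in this form and picks a zoom level that shows both, which can stand in for a hand-tuned `zoom_start`.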

In [6]:
provider = TileProvider.from_qms("OpenTopoMap")

center = [ (df['lat'].max() + df['lat'].min())/2,(df['lon'].max() + df['lon'].min())/2]

m = folium.Map(location=center,
               zoom_start = 5)
folium.TileLayer(provider).add_to(m)

m # draw the (blank) map
#embed_map(m,'m1.html')

After we have initiated our map object, we can plot our data. Pandas gurus argue against using `.iterrows()` because it is slow, but most maps use relatively small dataframes, so speed isn't *that* important.

For our first map, we will just iterate over the dataframe and plot battle locations. Folium has a large number of different ways to plot data.

If we use the `.Marker` method and just provide location information (lat/lon coordinates), Folium will create a map with the familiar 'info-sign' icons.

In [None]:
center = [ (df['lat'].max() + df['lat'].min())/2,(df['lon'].max() + df['lon'].min())/2]
m = folium.Map(location=center,
               zoom_start = 5)
folium.TileLayer(provider).add_to(m)

for idx,row in df.iterrows():
    folium.Marker(location = [row['lat'],row['lon']]
                         ).add_to(m)

m

### Change Glyphs and Colors

Rather than having a little white circle, you can specify a different icon for the pin.
Folium supports Bootstrap, so you can specify an icon from the [glyph list](https://getbootstrap.com/docs/3.3/components/) to put on the marker.

You can also specify different colors from the [default list](https://python-visualization.github.io/folium/modules.html#Icon):
```
['red', 'blue', 'green', 'purple', 'orange', 'darkred',
 'lightred', 'beige', 'darkblue', 'darkgreen', 'cadetblue',
 'darkpurple', 'white', 'pink', 'lightblue', 'lightgreen',
 'gray', 'black', 'lightgray']
```

To see this, I have created lists of the glyphs (not all of them!) and colors; we will assign them randomly to the different pins:

In [None]:
import random

m = folium.Map(location=center,
               zoom_start = 5)
folium.TileLayer(provider).add_to(m)

colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred',
        'lightred', 'beige', 'darkblue', 'darkgreen', 'cadetblue',
        'darkpurple', 'white', 'pink', 'lightblue', 'lightgreen',
        'gray', 'black', 'lightgray']

icons = ['home','flag','time','road','cog','trash','lock','headphones',
         'tag','facetime-video','book','tags','tint','question-sign','info-sign',
         'screenshot','exclamation-sign','plane','leaf','fire','gift','ok-sign',
         'certificate','thumbs-up','heart-empty','paperclip']

for idx,row in df.iterrows():
    folium.Marker(location = [row.lat,row.lon],
                icon=folium.Icon(icon = random.choice(icons),
                                 color = random.choice(colors)),
                         ).add_to(m)
m

This map looks silly, which is inappropriate given the seriousness of the subject. More Americans died in the Civil War than in any other war in which the US has fought.

Mapmakers should think carefully about the sorts of colors and glyphs that are appropriate for the topic.

Folium allows us to add popup information to the markers by using either the `popup=` or `tooltip=` parameters.
* `popup=` requires the user to click on the point to see the information
* `tooltip=` will display the information as the user moves their pointer across the map

We can assign data from the dataframe directly, if it is a string. If the data we want to include in the popup is a number or date, we will need to do a little manipulation to turn it into a string. This is straightforward with [f-string](https://realpython.com/python-f-strings/) formatting.
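
As a small illustration of that manipulation, the sketch below formats a date and a number with f-strings. The values are made up and simply stand in for one row of the DataFrame.

```python
from datetime import datetime

# Made-up values standing in for one row of the DataFrame
start_date = datetime(1863, 5, 4)
casualties = 12345.0

date_text = f"{start_date:%Y-%m-%d}"    # strftime codes work inside the braces
casualty_text = f"{casualties:,.0f}"    # thousands separator, no decimal places

print(date_text)
print(casualty_text)
```

The part after the colon is a format specification: `%Y-%m-%d` is applied to the date, and `,.0f` tells Python to insert thousands separators and drop the decimals.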

In [None]:
m = folium.Map(location=center,
               zoom_start = 5)
folium.TileLayer(provider).add_to(m)

for idx,row in df.iterrows():
  popup_text = f"{row.campaign}<br>Date: {row.start_date:%Y-%m-%d}<br>Casualties: {row.casualties:,.0f}"

  folium.Marker(location = [row.lat,row.lon],
                icon=folium.Icon(icon='star',color = 'green'),
                popup = popup_text
                         ).add_to(m)

m

F-string formatting gives mapmakers enormous control over the text that appears.

For example, the campaign name includes dates:
```
Sand Creek Campaign [November 1864]
```
But the database includes specific dates. We don't need to specify a temporary variable: the f-string allows us to modify the text in place:
```python
row.campaign.split('[')[0]
```
will split the campaign text string at the opening bracket ("["). This results in two text strings; we select the first one with the `[0]` index.

If we want to boldface the labels for "Date" and "Casualties", we can use the HTML tags `<B>...</B>` to boldface the text between them.

Doing both of these, we can create a new f-string:

```python
  popup_text = f"{row.campaign.split('[')[0]}<br><B>Date:</B> {row.start_date:%Y-%m-%d}<br><B>Casualties:</B> {row.casualties:,.0f}"
```
I will redraw the map above with the new popup_text string. I will also change this to a `tooltip`, so you can see the difference.

In [None]:
m = folium.Map(location=center, zoom_start = 5)
folium.TileLayer(provider).add_to(m)

for idx,row in df.iterrows():
  popup_text = f"{row.campaign.split('[')[0]}<br><B>Date:</B> {row.start_date:%Y-%m-%d}<br><B>Casualties:</B> {row.casualties:,.0f}"

  folium.Marker(location = [row.lat,row.lon],
                icon=folium.Icon(icon='star',color = 'green'),
                tooltip = popup_text
                         ).add_to(m)

m
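
Popup and tooltip text is rendered as HTML, which is why the `<B>` tags take effect; it also means that line breaks are written as `<br>`. The helper below is a minimal sketch of how the popup text could be assembled in one place. The name `popup_html` and the sample values are hypothetical, and escaping the data with `html.escape` is an added precaution rather than something the lesson requires.

```python
from html import escape

def popup_html(campaign, date_text, casualty_text):
    """Build popup HTML: bold labels, <br> line breaks, bracketed dates removed."""
    name = campaign.split('[')[0].strip()   # drop "[...]" and the trailing space
    lines = [
        escape(name),                        # escape &, <, > in the data itself
        f"<b>Date:</b> {escape(date_text)}",
        f"<b>Casualties:</b> {escape(casualty_text)}",
    ]
    return "<br>".join(lines)

# Illustrative values only
text = popup_html("Sand Creek Campaign [November 1864]", "1864-11-29", "1,234")
print(text)
```

The returned string can be passed to either `popup=` or `tooltip=`.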

## Other Markers

There are situations where the markers used on the above maps will be perfectly adequate. One problem with them is that they are large and tend to overlap each other, making it hard to see some markers, especially on crowded maps.

Folium has two other markers that work better in this situation: `.Circle()` and `.CircleMarker()`. The radius of the first is measured in meters; the second, in pixels. If we use `.Circle()`, the marker covers a fixed area on the ground, so it will grow and shrink on screen as we zoom in and out of the map; if we use `.CircleMarker()`, the dot will remain the same size on screen.

These markers have some of the same attributes (color and popup) as the `.Marker()` we used before, but additionally, we can specify
* `radius` - the size of the circle (in meters or pixels)
* `fill` - should the circle be filled in or not
* `fill_opacity` - how opaque should the circle be (this is useful if we have larger circles that might overlie each other)
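
To make the meters-versus-pixels distinction concrete, the sketch below estimates how many screen pixels a radius given in meters covers at a given zoom level. It assumes the standard Web Mercator tiling used by Leaflet basemaps (256-pixel tiles, roughly 156,543 meters per pixel at zoom 0 on the equator). The function names are hypothetical and the result is an approximation.

```python
import math

EQUATOR_M_PER_PX_ZOOM0 = 156543.03392  # Web Mercator, 256-pixel tiles

def meters_per_pixel(lat, zoom):
    """Approximate ground distance covered by one pixel at this latitude and zoom."""
    return EQUATOR_M_PER_PX_ZOOM0 * math.cos(math.radians(lat)) / (2 ** zoom)

def circle_radius_in_pixels(radius_m, lat, zoom):
    """On-screen radius of a Circle whose radius is given in meters."""
    return radius_m / meters_per_pixel(lat, zoom)

# A 10 km Circle near latitude 37 grows on screen as we zoom in;
# a CircleMarker with radius=3 would stay 3 pixels at every zoom.
for zoom in (5, 6, 7):
    print(zoom, round(circle_radius_in_pixels(10_000, 37.0, zoom), 1))
```

Each step up in zoom doubles the on-screen size of a `.Circle()`, which is why it suits features with a real-world extent, while `.CircleMarker()` suits simple point symbols.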

In [None]:
m = folium.Map(location=center, zoom_start = 5)
folium.TileLayer(provider).add_to(m)

for idx,row in df.iterrows():
    popup_text = f"{row.battle_name}<br>Date: {row.start_date:%Y-%m-%d}<br>Casualties: {row.casualties:,.0f}"
    folium.CircleMarker(location = [row.lat,row.lon],
                color = 'red',
                radius = 3,
                fill = True,
                fill_opacity = 1,
                tooltip = popup_text
                ).add_to(m)

m


When designing a map, users should think about the various ways they can convey information to the viewer. The map above shows all the battles in the database. Users can hover over the points to get information about the battle (date, casualties, etc.). But we can actually convey some of this information graphically.

For example, we can vary the size of the marker to indicate the number of casualties. We can also change the color of the marker to indicate the year of the battle.

The former is pretty straightforward: we can set the radius value based on the casualty figures. To ensure the circles are not too small, we can use the `max()` function to select the larger of a default value and the casualty-based value.

The latter is a bit more complex: we need to define a `dictionary` with key/value pairs for the years and the colors we wish to use. When we iterate through the dataframe, the color will be set by looking up the values in the dictionary.
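
Both ideas can be tried out on plain numbers before drawing anything. The sketch below is one possible arrangement, not the lesson's own code: the helper names, the integer year keys, the `'gray'` fallback, and the handling of missing casualty figures are all assumptions added for illustration.

```python
import math

YEAR_COLORS = {1861: 'red', 1862: 'blue', 1863: 'green',
               1864: 'purple', 1865: 'orange'}

def marker_radius(casualties, minimum=3, per_pixel=1000):
    """One pixel per `per_pixel` casualties, but never smaller than `minimum`."""
    if casualties is None or math.isnan(casualties):
        return minimum                      # missing data: fall back to the floor
    return max(minimum, casualties / per_pixel)

def marker_color(year, default='gray'):
    """Look up the color for a year; unknown years get `default`, not a KeyError."""
    return YEAR_COLORS.get(year, default)

print(marker_radius(500), marker_radius(23000), marker_radius(float('nan')))
print(marker_color(1863), marker_color(1866))
```

Using `.get()` with a default keeps the loop from stopping when a row has a year that is not in the dictionary.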

In [None]:
m = folium.Map(location=center, zoom_start = 5)
folium.TileLayer(provider).add_to(m)

color_dict = {'1861':'red','1862':'blue','1863':'green','1864':'purple','1865':'orange'}

for idx,row in df.iterrows():
    popup_text = f"Battle Name: {row.battle_name}<br>Date: {row.start_date:%Y-%m-%d}<br>Casualties: {row.casualties:,.0f}"
    folium.CircleMarker(location = [row.lat,row.lon],
                radius = max(3,row.casualties/1000),
                color = color_dict[f"{row.start_date:%Y}"],
                fill = True,
                tooltip = popup_text
                ).add_to(m)

m

## Filter by Year

This map conveys more information than the earlier one: we can see which battles had the most casualties and, by paying attention to the colors, get some idea of how the war progressed.

It might be easier for us to look at one year at a time. This is easy to do with Pandas: we just need to filter the dataframe to select the year we wish to map.

In [None]:
import datetime

m = folium.Map(location=center, zoom_start = 5)
folium.TileLayer(provider).add_to(m)

year = 1863 # <= specify a year here and use it to filter the date field in the next row
in_year = df['start_date'].dt.year == year # True for battles that started in that year

for idx,row in df[in_year].iterrows(): # <= the filtered DF will only have battles in the year specified
    popup_text = f"Battle Name: {row.battle_name}<br>Date: {row.start_date:%Y-%m-%d}<br>Casualties: {row.casualties:,.0f}"
    folium.CircleMarker(location = [row.lat,row.lon],
                radius = max(3,row.casualties/1000),
                color = 'red',
                fill = True,
                tooltip = popup_text
                ).add_to(m)

m

We could produce a series of maps, year by year, to show the different battles in the Civil War.

Folium makes it possible for us to plot these all on the same map by specifying different layers.

In the following map, we loop through the years 1861 to 1865. For each year:
1. We define a layer as a `FeatureGroup()` with a specific name (the year value) and add it to the map.
1. We create a filtered version of the DF that includes only rows of data for the desired year.
1. We iterate through the filtered DF to create our markers. These are added **to the layer** (not to the map, as in earlier examples).

After the loop, we add a `LayerControl()` to the map. This is a little box in the upper right corner: if you click on it, it will display radio buttons for the different layers, so you can click on the one you want to display.

When we initialized the map, we told Folium **not** to have a basemap (`tiles = None`). This is because if we initialize the map with basetiles, it will appear in the Control box. But we don't want to be able to turn off the basemap! Instead, we add the basemap with the `TileLayer()` method, where we specify `control = False`, to prevent THIS layer from appearing in the control box.

In [None]:
m = folium.Map(location=center,zoom_start = 5,tiles = None)
folium.TileLayer(provider,overlay=True,control=False).add_to(m)

for year in range(1861,1866):
    layer = folium.FeatureGroup(name = year,overlay = False).add_to(m)

    in_year = df['start_date'].dt.year == year # True for battles that started in that year

    for idx,row in df[in_year].iterrows():
        popup_text = f"Battle Name: {row.battle_name}<br>Date: {row.start_date:%Y-%m-%d}<br>Casualties: {row.casualties:,.0f}"
        folium.CircleMarker(location = [row.lat,row.lon],
                    radius = row.casualties/1000,
                    color = 'red',
                    fill = True,
                    tooltip = popup_text
                    ).add_to(layer)

folium.LayerControl().add_to(m)

m


## Group and Heatmap

Folium provides several other visualizations for mapmakers, notably a *cluster visualization* and a *heatmap visualization*. Because the Civil War data isn't particularly good for showing how these tools can be used (they excel when faced with many individual points), we will load the *Washington Post*'s Fatal Force database. We will use this data because we have examined it previously, so it should be familiar.

### Group / Cluster Map

Folium will group items together and, as the user zooms in, ungroup them to show the individual points.

To use this, we need to import a special Folium tool and tell the system to add a `marker_cluster` layer to the map. (The variable name can, of course, be anything the mapmaker desires.)
```python
from folium.plugins import MarkerCluster
marker_cluster = folium.plugins.MarkerCluster().add_to(m)
```
When we iterate over the data, rather than adding the points to the map (as we did above), we add them to the `marker_cluster` layer.

We can add any of the markers we described above to the layer (`.Marker()`, `.Circle()`, `.CircleMarker()` etc.); as the user zooms in, these are the points that will be revealed.

In [None]:
ff_df = pd.read_csv('https://raw.githubusercontent.com/washingtonpost/data-police-shootings/master/v2/fatal-police-shootings-data.csv',parse_dates = ['date'])
ff_df = ff_df[ff_df['latitude'].notna()] # drop rows that do not have lat/lon data

In [None]:
provider = TileProvider.from_qms("OpenStreetMap.Mapnik")

m = folium.Map(location=[40, -96], zoom_start=4)

folium.TileLayer(provider).add_to(m)
# switching the basemap (tiles) to a modern map that shows streets and other information

from folium.plugins import MarkerCluster
marker_cluster = folium.plugins.MarkerCluster().add_to(m)

for idx,row in ff_df.iterrows():
    lat, lon = row.latitude, row.longitude
    folium.CircleMarker(location=[lat,lon],
        color = 'red',
        radius = 3,
        fill = True,
        fill_opacity = 1
        ).add_to(marker_cluster)

m


We can also format the markers using the tools described above.

In [None]:
m = folium.Map(location=[40, -96], zoom_start=4)
folium.TileLayer(provider).add_to(m)

from folium.plugins import MarkerCluster
marker_cluster = folium.plugins.MarkerCluster().add_to(m)

color_dict = {'2015':'red','2016':'orange','2017':'darkred',
              '2018': 'green','2019':'blue','2020':'purple',
              '2021':'white','2022':'black','2023':'darkgreen'}

for idx,row in ff_df.iterrows():
    popup_text = f"Year:{row.date:%Y}<br>Race:{row.race}<br>Age:{row.age:.0f}"
    lat, lon = row.latitude, row.longitude
    # the dataset keeps growing: years missing from color_dict fall back to gray
    folium.Marker(location=[lat,lon],
                  icon=folium.Icon(color = color_dict.get(f"{row.date:%Y}", 'gray')
                                  ),
                  tooltip = popup_text,
                  ).add_to(marker_cluster)

m

### Heatmap



The [heatmap](https://python-visualization.github.io/folium/plugins.html#folium-plugins) tool doesn't allow us to add points with information (as we did above). Instead, it takes just a list of lat/lon points and processes them to create the heatmap. So rather than iterating through the DF, we will just pass the list of lat/lon points to the tool.
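
Each point may also carry an optional third value, a weight, in the form [lat, lon, weight]. The sketch below shows one way such a list could be prepared while skipping incomplete rows. The function name `weighted_points` and the sample records are made up for illustration.

```python
import math

def weighted_points(rows):
    """Turn (lat, lon, weight) records into [lat, lon, weight] lists, skipping gaps."""
    points = []
    for lat, lon, weight in rows:
        if any(value is None or math.isnan(value) for value in (lat, lon, weight)):
            continue                        # incomplete rows cannot be placed
        points.append([lat, lon, weight])
    return points

# Illustrative records only: (latitude, longitude, weight)
records = [(39.8, -77.2, 5.0), (float('nan'), -90.9, 3.0), (32.3, -90.9, 2.0)]
points = weighted_points(records)
print(points)
```

With the battle data, the records could come from something like `df[['lat','lon','casualties']].itertuples(index=False)`, so that larger battles contribute more to the heatmap.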

In [None]:
m = folium.Map(location=[40, -96], zoom_start=4)
folium.TileLayer(provider).add_to(m)

from folium.plugins import HeatMap

points = ff_df[['latitude','longitude']].values.tolist()

HeatMap(points,radius = 12).add_to(m)

m

Folium can present a heatmap of data that changes over time. This doesn't work especially well for the fatal force data because the data is sporadic. It would work better for things like traffic flows along major highways, since there is almost always a low-level base usage, which peaks during rush-hour.

But we can see how to arrange the data so that -- should you get a data set that would work better -- you can use this tool.

The `HeatMapWithTime()` method expects to have two groups of data, of the same length.
* `data=` needs to be a list of points in the form [lat,lon]
* `index=` needs to be a list of dates

One way to create this is to set up a dictionary, using some [tools](https://docs.python.org/3/library/collections.html) from the collections library.

The following code creates a dictionary whose default value is a `list`. It iterates over the DF with [`itertuples()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.itertuples.html), which returns each row as a named tuple.

For each row, it creates a string version of the date (in this case, I've specified Year-Month, e.g. "2020-01") to use as the key and appends the row's [lat, lon] pair to the list for that month.

The data looks like this:
```python
            [('2015-01',
              [[47.246826, -123.121592],
               [45.4874214, -122.8916961],
               [37.694766, -97.280554],
               [39.380084, -76.820805]]),
             ('2015-02',
              [[40.273404, -76.712841],
               [34.417432, -117.176872],
               [35.917642, -77.54755],
               [33.619301, -114.450926]]),
             ('2015-03',
              [[34.043131, -118.244634],
               [29.704199, -95.621853],
               ...
```
It then uses `sorted()` to order the dictionary's items by their first element (the date key) and `OrderedDict()` to keep them in that order.

In the `HeatMapWithTime()` call:
* `list(data.values())` pulls out the values from the dictionary -- the lists of lat/lon values -- and turns them into a list.

* `list(data.keys())` does the same thing for the dictionary's keys, which are the dates.

(I found this solution on [Stack Overflow](https://stackoverflow.com/questions/64325958/heatmapwithtime-plugin-in-folium).)
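
The grouping step can be tried on a handful of records before running it on the full dataset. The sketch below uses made-up records and hypothetical variable names; it relies only on the standard library.

```python
from collections import defaultdict
from datetime import date

# Illustrative records only: (date, latitude, longitude)
records = [
    (date(2015, 2, 3), 40.27, -76.71),
    (date(2015, 1, 2), 47.25, -123.12),
    (date(2015, 1, 9), 45.49, -122.89),
]

by_month = defaultdict(list)               # a missing key starts as an empty list
for day, lat, lon in records:
    by_month[day.strftime("%Y-%m")].append([lat, lon])

ordered = dict(sorted(by_month.items()))   # sort by the "YYYY-MM" key

frames = list(ordered.values())            # one list of [lat, lon] pairs per month
labels = list(ordered.keys())              # the matching month labels
print(labels)
print(frames)
```

Since Python 3.7 a plain `dict` preserves insertion order, so `dict(sorted(...))` behaves like the `OrderedDict` version here. `frames` and `labels` have the same length, which is what `data=` and `index=` expect.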

In [None]:
m = folium.Map(location=[40, -96], zoom_start=4)
folium.TileLayer(provider).add_to(m)

# https://stackoverflow.com/questions/64325958/heatmapwithtime-plugin-in-folium

from folium.plugins import HeatMapWithTime

from collections import defaultdict, OrderedDict

data = defaultdict(list)

for row in ff_df.itertuples():
  data[row.date.strftime("%Y-%m")].append([row.latitude,row.longitude])

data = OrderedDict(sorted(data.items(),key = lambda t: t[0]))

hm = HeatMapWithTime(data = list(data.values()),
                     index = list(data.keys()),
                     radius = 10,
                     auto_play = True)

hm.add_to(m)

m

# Your Turn

Pick either of the two datasets used above and experiment a bit:
* Take the Civil War battle data and create a cluster or heat map.
  * Advanced: make a heatmap with time for the Civil War data.
* Take the Fatal Force data and draw some maps with Markers or CircleMarkers.
* Make a map of the FF data that has layers for the different years (2015-2022).