---

<img src="./images/anchormen-logo.png" width="500">

---

# Spatial Visualization using geopandas

<img src="./images/choropleth_map_world.png" width="1000">

*Note: This notebook can be viewed as an interactive slideshow using the RISE library*

- RISE: https://github.com/damianavila/RISE
- 4 minute video tutorial: https://www.youtube.com/watch?v=sXyFa_r1nxA

## Contents

- A. Mapping libraries overview
- B. Geopandas overview
- C. Mapping example
- D. Choropleth maps
    - Exercise
- E. Working with Shapefiles
    - Exercise
- F. Interactive visualizations
    - Exercise
- G. Optional: Demos
    - Satellite data demo
- H. Summary & Additional Resources

# Introduction

### Power of maps
- Cholera outbreak of 1854 in Soho, London, mapped by John Snow.
- Outbreak caused by a water pump: cholera does not spread by air, but via water.


<img src="./images/cholera_outbreak_map.png" width="800">

### Maps are projections 
- To visualize the spherical globe on a flat surface, it has to be projected onto that surface in some way.
- All projections cause distortions. There is no single best projection; which one is best depends on the application.
- The Mercator projection was developed for maritime purposes by Gerardus Mercator and introduced in 1569. It preserves angles but distorts areas, especially near the poles.

<img src="./images/mercator_projection.jpg" width="500">

#### Example: Mercator projection area distortion
<img src="./images/greenland_africa_mercator_projection.png" width="1000">
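
The size of this distortion can be estimated with a few lines of Python. The sketch below assumes the standard spherical Mercator formulas, in which small areas at latitude φ are inflated by a factor of 1 / cos²(φ). The function names and the rough latitudes are illustrative choices, not part of the notebook's data.

```python
import math


def mercator_y(lat_deg, radius=1.0):
    """Mercator northing for a latitude in degrees, on a sphere of the given radius."""
    lat = math.radians(lat_deg)
    return radius * math.log(math.tan(math.pi / 4 + lat / 2))


def mercator_area_scale(lat_deg):
    """Factor by which Mercator inflates small areas at this latitude."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


# Rough, illustrative latitudes (assumptions made for this example)
places = [("Equator", 0), ("Utrecht", 52), ("Central Greenland", 72)]
for name, lat in places:
    print(f"{name:18s} lat={lat:2d}  area inflated x{mercator_area_scale(lat):.2f}")
```

Because the factor grows without bound towards the poles, regions at high latitudes look far larger on a Mercator map than regions of similar true area near the equator.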

# A. Useful mapping libraries

- **Geopandas**
    - `+` Lightweight, ties several libraries together: pandas, shapely, gdal, fiona, pyproj, rtree
    - `+` Implements GeoDataFrame, GeoSeries objects
    - `-` Some dependencies have been known to cause (OS dependent) issues

Notes:
- pandas: for data handling, supports DataFrames, Series
- geos: Geometry Engine, Open Source; the C++ geometry library that `shapely` builds on
- shapely: supports geometries (points, lines, polygons) and geometric operations on them
- gdal: Geospatial Data Abstraction Library, translator library for raster and vector geospatial data formats
- fiona: read and write real-world data using multi-layered GIS formats (such as shapefiles)
- pyproj: performs cartographic transformations and geodetic computations (projections)
- rtree: advanced spatial indexing features (such as nearest neighbor search, intersection search, etc.)

- **Gmplot**
    - `+` For quick plotting on google maps and generating interactive html
    - `-` Not very extensive

- **Plotly**
    - `+` For interactive plotting (not specifically spatial data)
    - `-` Some features require online plotly or MapBox account

- **Bokeh**
    - `+` For interactive plotting (not specifically spatial data)
    - `+` Supports interactive handlers (such as drop-down menus)
    - `-` Bit more complicated (runs local Bokeh server)

- **mplleaflet**
    - `+` Converts Matplotlib plots into Leaflet web maps, with OpenStreetMap background
    - `+` Easy to use
    - `+` Interactive
    - `-` Doesn't scale well

- **Folium**
    - `+` Builds Leaflet web maps directly from Python, with OpenStreetMap background
    - `+` Easy to use
    - `+` Interactive
    - `-` Doesn't scale well

- **Basemap**
    - `+` Powerful library
    - `+` Extensive projections
    - `+` Easy plotting coastlines, countries, rivers, backgrounds, etc
    - `+` Adjusting plotting granularity
    - `-` Bit harder to install (on Windows)
    - `-` Can be slow for some plots
    - `-` Harder to plot choropleth maps

## Today's Focus

- Mostly `geopandas` and some `mplleaflet` for interactivity
- Optionally: `Folium`, some useful APIs

# B. Geopandas Introduction

## Info Sources

- Geopandas documentation: http://geopandas.org

## Data Structures

`geopandas` implements two main data structures:

- **GeoSeries**
- **GeoDataFrame**

These are subclasses of `pandas` Series and DataFrame, respectively.

`geopandas` has three basic classes of geometric objects:

- **Points** (or multi-points)
- **Lines** (or multi-lines)
- **Polygons** (or multi-polygons)

These are actually `shapely` objects
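
As a minimal sketch of what these objects look like on their own, the example below builds one geometry of each kind directly with `shapely`. The coordinates are arbitrary values chosen for illustration.

```python
from shapely.geometry import LineString, Point, Polygon

# One geometry of each basic class (arbitrary example coordinates)
point = Point(1.0, 1.0)
line = LineString([(0, 0), (2, 2)])
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])

print(point.geom_type, line.geom_type, square.geom_type)

# Geometric properties and predicates are available on the objects themselves
print(square.area)             # area of the 2 x 2 square
print(line.length)             # length of the diagonal
print(square.contains(point))  # is the point inside the square?
print(point.distance(line))    # planar distance from the point to the line
```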

## Geopandas features

GeoSeries and GeoDataFrame support **all pandas operations** (such as slicing, sampling, merging, etc).

On top of that `geopandas` supports:

- Reading, writing spatial data (Shapefiles)
- Standard datasets: world map, cities dataset
- Slicing, selecting spatial data
- Plotting maps
- Handling projections
- Distances between objects
- Spatial joins
- Geometric manipulations: centroid, boundary, convex_hull, rotate, scale, etc
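
A few of these features can be tried on a toy GeoDataFrame built by hand. This is only a sketch: the column names and values below are made up for illustration, and no coordinate reference system is set, so distances and areas are in plain planar units.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical toy data: three points with made-up names and values
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"], "value": [10, 20, 30]},
    geometry=[Point(0, 0), Point(4, 0), Point(0, 3)],
)

# Regular pandas operations still work and keep the result a GeoDataFrame
subset = gdf[gdf["value"] >= 20]
print(type(subset).__name__, len(subset))

# Geometric operations work element-wise on the geometry column
distances = gdf.geometry.distance(Point(0, 0))  # distance of each point to the origin
buffers = gdf.geometry.buffer(1.0)              # circle-like polygon around each point
print(distances.tolist())
print(buffers.area.round(2).tolist())
```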

## Installation

To install the released version, you can use pip:

    pip install geopandas

or you can install the conda package from the conda-forge channel:

    conda install -c conda-forge geopandas

[See http://geopandas.org/install.html]

# C. Mapping example

## Libraries and settings

In [None]:
# Libraries
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd

# Settings
%matplotlib inline
pd.options.mode.chained_assignment = None

## Standard datasets in geopandas

In [None]:
# Check which standard datasets are available
gpd.datasets.available

In [None]:
# Load datasets
gdf_countries = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
gdf_cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))

In [None]:
# View dataframe: countries
gdf_countries.head(3)

In [None]:
# Note the geometry column containing the spatial data
type(gdf_countries.geometry)

In [None]:
# View dataframe: cities
gdf_cities.head(3)

## Create simple spatial plot

In [None]:
# Plot the countries with default styling
gdf_countries.plot();

## Configure axes

In [None]:
# Set axes
fig, ax = plt.subplots(figsize=(12,12), subplot_kw={'aspect':'equal'})
ax.set_axis_off()
gdf_countries.plot(ax=ax, column='continent');

## Add cities

In [None]:
fig, ax = plt.subplots(figsize=(12,12), subplot_kw={'aspect':'equal'})
ax.set_axis_off()
gdf_countries.plot(ax=ax, column='continent')
# Add cities on top of countries
gdf_cities.plot(ax=ax, color='red', markersize=5);

## Annotate with country labels

In [None]:
fig, ax = plt.subplots(figsize=(12,12), subplot_kw={'aspect':'equal'})
ax.set_axis_off()
gdf_countries.plot(ax=ax, column='continent');
gdf_cities.plot(ax=ax, color='red', markersize=5)
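# Add country labels (the label text is passed positionally, because the
# keyword was renamed from 's' to 'text' in newer Matplotlib versions)
gdf_countries.apply(
    lambda x: ax.annotate(x.iso_a3, xy=x.geometry.centroid.coords[0], ha='center', size=3),
    axis=1);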

# D. Choropleth Maps

## Choropleth map

A choropleth map (from Greek χῶρος ("area/region") + πλῆθος ("multitude")) is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income. [Source: https://en.wikipedia.org/wiki/Choropleth_map]

<img src="./images/australian_demographics_christianity_anglican_persons.png" width="500">
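
The core idea, shading each area in proportion to its value, can be sketched without any plotting library. The example below linearly rescales a value to the range 0 to 1 and interpolates between two colors. It illustrates the principle only and is not how `geopandas` or `matplotlib` implement colormaps; the function name, the two default colors and the region values are assumptions made for this example.

```python
def value_to_hex(value, vmin, vmax, low_rgb=(255, 255, 204), high_rgb=(128, 0, 38)):
    """Map a value in [vmin, vmax] to a hex color between low_rgb and high_rgb."""
    if vmax == vmin:
        t = 0.0  # avoid division by zero when all values are equal
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside the range
    rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(low_rgb, high_rgb)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical measurements per region
values = {"region_a": 5.0, "region_b": 20.0, "region_c": 12.5}
vmin, vmax = min(values.values()), max(values.values())
for region, value in values.items():
    print(region, value_to_hex(value, vmin, vmax))
```

In practice a colormap (the `cmap` argument of the plot call) plays the role of the two-color interpolation used here.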

## Example data: GDP per capita

In [None]:
# Calculate GDP per capita for all countries with a nonzero population estimate, excluding Antarctica
# (gdp_md_est is in millions of dollars, hence the factor 1,000,000)
gdf_countries_selected = gdf_countries[(gdf_countries.pop_est > 0) & (gdf_countries.name != "Antarctica")].copy()
gdf_countries_selected['gdp_per_cap'] = gdf_countries_selected.gdp_md_est / gdf_countries_selected.pop_est * 1000000

In [None]:
# Check dataframe, enriched with gdp_per_cap
gdf_countries_selected.head(3)

## Create Plot

In [None]:
# Plot (requires PySal, to be installed using 'conda install pysal')
fig, ax = plt.subplots(figsize=(16,6), subplot_kw={'aspect':'equal'})
ax.set_axis_off()
gdf_countries_selected.plot(column='gdp_per_cap', cmap='RdYlBu', legend=True, ax=ax);

## Colormaps

Matplotlib supports a long list of color maps: https://matplotlib.org/users/colormaps.html

For a complete list, use the `colormaps()` function in `pyplot`:

In [None]:
# Available colormaps
plt.colormaps()

## Exercise: creating choropleth maps

See notebook: `Exercise-1-choropleth-maps.ipynb`

# E. Working with Shapefiles 

## What are Shapefiles?

In the field of geographical visualization a lot of data is stored as `shapefiles`. Shapefile is a vector data format that stores features such as **points**, **lines** and **polygons**, which can represent roads, rivers, municipality borders etc.

Despite its name, a shapefile is not a single file but a **collection of files** with a common filename prefix, stored in the same directory!

A shapefile has **3 mandatory files**, with the following extensions:

- **`.shp`** — shape format; the **feature geometry itself**
- **`.shx`** — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly
- **`.dbf`** — attribute format; columnar attributes for each shape

There are also several optional files in the shapefile format. The most significant of these is the **`.prj`** file which describes the **coordinate system** and **projection** information used.
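
Since a missing companion file is a common reason for a shapefile failing to load, it can help to check the collection before reading it. The sketch below uses only the standard library. The helper names and the example path are hypothetical, and the check assumes lowercase file extensions.

```python
from pathlib import Path

MANDATORY_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the mandatory extensions for which no file exists next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in MANDATORY_EXTENSIONS if not base.with_suffix(ext).exists()]


def has_projection_file(shp_path):
    """Check whether the optional .prj file (coordinate system info) is present."""
    return Path(shp_path).with_suffix(".prj").exists()


# Hypothetical path, used only to show the call
shp = "./geodata/example/example.shp"
print("Missing mandatory files:", missing_shapefile_parts(shp))
print("Has .prj file:", has_projection_file(shp))
```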

## Urban mapping example: the city of Utrecht

### Data source

We manually downloaded a detailed shapefile from **Urban Atlas**: https://www.eea.europa.eu/data-and-maps/data/urban-atlas/

Urban Atlas provides pan-European comparable land use and land cover data for Large Urban Zones.

### Reading shapefile

In [None]:
# Read shapefile
gdf_urban = gpd.read_file('./geodata/nl004l_utrecht/nl004l_utrecht.shp')

In [None]:
# Check geodataframe
type(gdf_urban)

In [None]:
# Check contents
gdf_urban.head(3)

### Some preprocessing (for quicker plotting)

In [None]:
# Select subset of data (for quicker plotting)
gdf_urban = gdf_urban[(gdf_urban.geometry.area > 100000)]
gdf_urban = gdf_urban[gdf_urban['ITEM'] != "Other roads and associated land"]

# Size of the selected subset
print(gdf_urban.shape)

## Visualizing the shapefile contents

In [None]:
# Plot shapes with coloring based on 'ITEM' column
fig, ax = plt.subplots(figsize=(14,14), subplot_kw={'aspect':'equal'})
gdf_urban.plot(column='ITEM', ax=ax);

Optionally, zoom in to a rectangular area using the `axis` function of `matplotlib.pyplot`:
```
fig, ax = plt.subplots(figsize=(14,14), subplot_kw={'aspect':'equal'})
gdf_urban.plot(column='ITEM', ax=ax);
plt.axis((3980000,4000000,3220000,3240000))
plt.show()
```

## Exercise: working with shapefiles

See notebook: Exercise-2-shapefiles

# F. Interactive visualizations

## Mplleaflet

A Python library for creating **interactive, zoomable plots** on top of an **OpenStreetMap** layer, in a simple way

More info: https://github.com/jwass/mplleaflet

### Installation

    pip install mplleaflet

## Example: Roads in The Netherlands

### Data

Let's have a look at a shapefile that contains all major roads in The Netherlands.

This shapefile has been downloaded from **Rijkswaterstaat**, the government office responsible for all Dutch roads, waterways, etc.

**Data source**: https://www.rijkswaterstaat.nl/apps/geoservices/geodata/dmc/nwb-wegen/geogegevens/shapefile/

### Load shapefile with roads

In [None]:
# Load shapefile
#gdf_roads = gpd.read_file('./geodata/NWB-Light/nwb-light.shp')
gdf_roads = gpd.read_file("./geodata/NWB-Light-converted/nwb-light.shp")

In [None]:
# Check contents: every row seems to contain a road
gdf_roads.head()

### Plot roads (non-interactively first)

In [None]:
# Plot
fig, ax = plt.subplots(figsize=(10,10), subplot_kw={'aspect':'equal'})
gdf_roads.plot(column='ROUTE', legend=False, ax=ax);

### Some preprocessing

In [None]:
# Filter highways only (ROUTE starting with a number)
gdf_highways = gdf_roads[gdf_roads.ROUTE.str.match('^[0-9]')]

In [None]:
gdf_highways.head()

## Making the map interactive

In [None]:
# Libraries
import mplleaflet

# Convert to standard coordinate reference system (i.e. EPSG:4326 = WGS-84)
#gdf_highways_converted = gdf_highways.to_crs({'init': 'epsg:4326'})

# Interactive visualization
fig, ax = plt.subplots(figsize=(10,10), subplot_kw={'aspect':'equal'})
plot = gdf_highways.plot(color='red', ax=ax)
mplleaflet.display(fig=plot.figure)

## Caveat

For **large numbers of datapoints** mplleaflet is observed to have issues displaying them inline in a Jupyter Notebook.

## Tip

An alternative is to **export** the rendered map to an **interactive html file**: use `mplleaflet.show()` instead of `mplleaflet.display()`.

In [None]:
# Interactive visualization, exported to html file
fig, ax = plt.subplots(figsize=(10,10), subplot_kw={'aspect':'equal'})
plot = gdf_highways.plot(color='red', ax=ax)
mplleaflet.show(fig=plot.figure)

## A short note on dealing with coordinate systems

- Note that the coordinates in the previous example were not in degrees longitude/latitude. Instead, they use the Dutch coordinate system, 'Rijksdriehoekscoördinaten'.
- `mplleaflet` can display different coordinate systems, by specifying them using the `crs` parameter (we observed issues in certain library versions though):

    `mplleaflet.display(fig, crs={'init': 'epsg:28992'})`

- `geopandas` handles coordinate systems as follows:
    - Getting coordinate system: `gdf.crs`
    - Converting to another coordinate system: `gdf.to_crs()`

In [None]:
# Get coordinate system (in this case custom defined)
gdf_highways.crs

In [None]:
# Changing coordinate system (reprojecting to EPSG:3395, World Mercator)
gdf_highways.to_crs(epsg=3395).head(3)
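
To see what such a conversion does to individual coordinates, the sketch below reprojects a single point from the Dutch grid (EPSG:28992) to longitude/latitude (EPSG:4326). The point is the reference point of the Dutch grid in Amersfoort. The variable names are chosen for this example, and the exact output digits depend on the installed projection library.

```python
import geopandas as gpd
from shapely.geometry import Point

# Reference point of the Dutch grid in Amersfoort, in metres (EPSG:28992)
gdf_rd = gpd.GeoDataFrame(
    {"name": ["Amersfoort reference point"]},
    geometry=[Point(155000, 463000)],
    crs="EPSG:28992",
)

# Reproject to WGS-84 longitude/latitude (EPSG:4326)
gdf_wgs84 = gdf_rd.to_crs(epsg=4326)
point = gdf_wgs84.geometry.iloc[0]

print(gdf_rd.crs)
print(gdf_wgs84.crs)
print(point.x, point.y)  # longitude, latitude in degrees
```

Note that `to_crs()` returns a new GeoDataFrame; the original keeps its coordinate system.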

**More info**:

- List of coordinate systems: https://epsg.io
- Rijksdriehoekscoordinaten: https://nl.wikipedia.org/wiki/Rijksdriehoeksco%C3%B6rdinaten

## Exercise: interactive spatial visualizations

See notebook: `Exercise-3-interactive-spatial-visualizations.ipynb`

## Optional Exercise

See notebook: `Exercise-OPTIONAL-interactive-visualization-with-shapefile.ipynb`

# G. Optional: Demos

## Demo: Useful Functionality

- OpenStreetMap API
- Folium visualization
- More examples

See Demo notebook

## Demo: Working with Satellite Data

- Google Earth Engine API

See `additional-notebooks/Satellite-data-google-earth-engine.ipynb`

# H. Summary & Additional Resources

## Summary

In this lesson we learned about the following topics:

- **Mapping libraries overview**: geopandas, gmplot, plotly, bokeh, mplleaflet, folium, basemap
- **Geopandas basics**: data structures, basic features, installation
- **Mapping example**: standard datasets, creating spatial plots, adding points, labels
- **Choropleth maps**: how to create them, how to use different colormaps
- **Shapefiles**: their structure; how to obtain, load, preprocess and visualize shapefiles
- **Interactive maps**: using mplleaflet, folium, dealing with coordinate systems

All in all, we saw that `geopandas` is a very useful library for handling and visualizing geospatial data. Combined with the `mplleaflet` library it is possible to create nice interactive (zoomable) visualizations on top of OpenStreetMap maps.

Optionally, in the extra demos we learned about extracting `OpenStreetMap` features, `Folium`, and working with Satellite Data using the `Google Earth Engine` API.

## Additional resources

- Geopandas documentation: http://geopandas.org/
- Geopandas example: https://gist.github.com/jorisvandenbossche/7b30ed43366a85af8626
- Geopandas video: https://www.youtube.com/watch?v=bWsA2R707BM
- Mplleaflet: https://github.com/jwass/mplleaflet
- Folium: https://python-visualization.github.io/folium
- Shapefiles (wikipedia): https://en.wikipedia.org/wiki/Shapefile
- Coordinate systems: https://epsg.io
- OpenStreetMap API: https://pypi.org/project/osmxtract/
- Google Earth Engine API: https://developers.google.com/earth-engine/python_install

## Geo data sources

- Natural Earth (10 m resolution geodata): http://www.naturalearthdata.com/downloads/10m-cultural-vectors
- Global Administrative Areas (GADM): http://www.gadm.org/
- Urban Atlas (European Environment Agency): https://www.eea.europa.eu/data-and-maps/data/copernicus-land-monitoring-service-urban-atlas
- Geodata Netherlands (Rijkswaterstaat): https://www.rijkswaterstaat.nl/apps/geoservices/geodata/dmc/

---

<img src="./images/anchormen-logo.png" width="500">

---

# Thank you for your attention!