#### GISC 420 T1 2021
# **LAB ASSIGNMENT 1**

In [None]:
%matplotlib inline

import matplotlib
import matplotlib.pyplot as pyplot
import geopandas

There's quite a lot to think about here, *but not actually very much code to write*. For submission instructions see the last cells of the notebook. But before getting to that, there is a fair bit to work through and chew on.

# Hexbins and map projection
This assignment looks at the implications of map projection for a popular visualization technique, *hexbinning*, when it is applied to geographical data over large geographical extents.

The exercise is inspired by this recent paper, which deserves an award for its great title, if nothing else:

+ Battersby, S. E., D. “daan” Strebe, and M. P. Finn. 2016. [Shapes on a plane: evaluating the impact of projection distortion on spatial binning](http://www.tandfonline.com/doi/full/10.1080/15230406.2016.1180263). Cartography and Geographic Information Science :1–12.

The essential point of the paper is that hexbinning is only a good way to visualize density if the map projection in use induces little area distortion.

To explore this we work with a global dataset of random points, and also develop some code (I've done most of this) to make hexbin maps using `geopandas`.

## Some evenly distributed (but random) point data
Before getting started, let's load the dataset we will work with.

In [None]:
random_pts = geopandas.read_file('dots.gpkg')
random_pts.plot(markersize=0.25)

I generated these data as a set of 10,000 random points, making sure they are evenly distributed (but random) across all of Earth's surface. These are completely fictitious data. In previous versions of this lab I have used earthquake data, and tried lightning flash data and even UFO data, but it turns out no real data are evenly distributed (geography is a thing after all), so the point of this lab is made better by a random dataset than by any real dataset.
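
For the curious, here is a minimal sketch of one way such points *could* be generated. This is an assumption for illustration, not necessarily how `dots.gpkg` was made, and the function name `random_points_on_sphere()` is made up. The trick is to pick latitude uniformly in $\sin(\text{latitude})$ rather than in latitude itself, so that equal areas of the sphere get equal chances.

```python
import math
import random

def random_points_on_sphere(n, seed=None):
    """Return n (lon, lat) pairs in degrees, uniform by area on a sphere."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        lon = rng.uniform(-180.0, 180.0)
        # uniform in sin(lat), not in lat, so equal areas get equal odds
        lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        points.append((lon, lat))
    return points

pts = random_points_on_sphere(10000, seed=1)
# the band between 30S and 30N covers half the sphere's area,
# so roughly half of the points should fall inside it
share_tropics = sum(1 for _, lat in pts if abs(lat) < 30) / len(pts)
print(round(share_tropics, 2))
```

Picking latitude uniformly between -90 and 90 instead would pile points up near the poles, which is exactly the kind of bias this lab is about.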

We can put these on a base map of the world, just so we are sure where we are. `geopandas` provides a built-in world map dataset. We drop Antarctica, because it often causes problems when projecting the whole Earth (and also because doing so shows you how to filter out rows).

In [None]:
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# get rid of Antarctica
world = world[world.continent != 'Antarctica'] # != is the python for 'not equal to'
# reproject to match the random points data
world = world.to_crs(random_pts.crs)

Now map them.

In [None]:
fig = pyplot.figure(figsize=(6,10))
base = fig.add_subplot(111)
world.plot(ax=base, facecolor='w', edgecolor='k', linewidth=0.35)
random_pts.plot(ax=base, markersize=0.25)

At this point, you may already smell a rat (hint: what projection are these data in?)

## Hexbin maps
The best way to appreciate what hexbins are is to try them. `pyplot` has a built in hexbin function. Give it a try by running the cell below.

In [None]:
import random
# make 1000 random x,y coordinates
x = [random.random() for __ in range(1000)]
y = [random.random() for __ in range(1000)]

fig = pyplot.figure()
ax = fig.add_subplot(111)
ax.set_aspect('equal') # this makes it look a bit nicer
# plot hexbins
pyplot.hexbin(x, y, gridsize=15, cmap='Greys')
# put the points on top for reference, note 'r.' specifies red points, not lines!
pyplot.plot(x, y, 'r.', markersize=2)

The idea is that the colored-in plot makes it easier to identify 'hotspots' or concentrations in the data. It is particularly useful with a large number of points (try changing the number of points in the cell above).

Now, this is fine as far as it goes, but may not be ideal if our $(x,y)$ coordinates are actually geographical coordinates, because lest we forget *the Earth is not flat*. 

The built-in hexbin plot doesn't draw the two axes at the same scale by default, which is why I added the line `ax.set_aspect('equal')`. But a more insidious difficulty is that geographic coordinates affect the *area* of the hexagons, so that the supposedly unbiased picture of the variations in density of points across the 'map' may not be unbiased at all.
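
To get a feel for the size of the effect, here is a rough sketch that computes the ground area of a 1° by 1° 'cell' at a few latitudes. It assumes a spherical Earth with a mean radius of 6371 km, and `cell_area_km2()` is a made-up helper name, not part of any library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth

def cell_area_km2(lat_south, lat_north, dlon_deg=1.0, radius_km=EARTH_RADIUS_KM):
    """Ground area of a lat/lon 'rectangle' on a sphere."""
    dlon = math.radians(dlon_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return radius_km ** 2 * dlon * band

# same size in degrees, very different size on the ground
for lat in (0, 30, 60, 80):
    print(lat, round(cell_area_km2(lat, lat + 1)))
```

A cell at 60° covers only about half the ground area of a cell at the equator, even though both look identical when latitude and longitude are plotted as if they were $(x,y)$ coordinates.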

### A geographic hexbin map
So... I've written a small 'wrapper' function for the hexbin function (below) which we can use instead.

Take a look in the cell below, before running it. 

You **absolutely don't need to understand all that is happening here in detail**, because you are just going to use the `get_hexbin_data()` function. I've split things into three functions, as follows.

#### `get_hexgridsize_and_extent()` 
This function determines the number of hexes in the $x$ and $y$ directions and also the $x$ and $y$ coordinate range of the plot.

In [None]:
import shapely.geometry # needed to build Polygons for the hexes GeoSeries
import shapely.affinity # needed to translate the base hexagon into position

# Returns a suitable grid dimension and extent for hexbinning
# based on an input points GeoDataFrame. nx specifies the
# desired number of hexagons in the x direction. A one hex
# wide 'buffer' is included.
def get_hexgridsize_and_extent(pts, nx=15):
    # we need this later
    sqrt3 = 3 ** 0.5
    
    # see http://geopandas.org/reference.html#geopandas.GeoSeries.total_bounds
    x_min, y_min, x_max, y_max = pts.total_bounds
    x_range = x_max - x_min
    y_range = y_max - y_min

    # use this information to give us a little more room - a buffer
    x_buffer = x_range / (nx * 2)
    y_buffer = x_buffer * sqrt3 # y needs more because hexes are taller than they are wide
    x_min = x_min - x_buffer
    x_max = x_max + x_buffer
    y_min = y_min - y_buffer
    y_max = y_max + y_buffer
    
    # the hexbin function needs a grid and extent
    grid_dimensions = (nx, int(nx * y_range / x_range / sqrt3))
    pt_extent = (x_min, x_max, y_min, y_max)
    
    return grid_dimensions, pt_extent
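
If you want to see the arithmetic at work, the sketch below repeats the same calculation on a plain bounds tuple, so it runs without `geopandas`. The helper name `grid_for_bounds()` is made up for this illustration, and the bounds used are simply the full range of longitude and latitude in degrees.

```python
def grid_for_bounds(bounds, nx=15):
    """Mirror the arithmetic of get_hexgridsize_and_extent() on a bounds tuple."""
    sqrt3 = 3 ** 0.5
    x_min, y_min, x_max, y_max = bounds
    x_range = x_max - x_min
    y_range = y_max - y_min
    # half a hex width of extra room on each side in x, scaled up for y
    x_buffer = x_range / (nx * 2)
    y_buffer = x_buffer * sqrt3
    grid = (nx, int(nx * y_range / x_range / sqrt3))
    extent = (x_min - x_buffer, x_max + x_buffer,
              y_min - y_buffer, y_max + y_buffer)
    return grid, extent

grid, extent = grid_for_bounds((-180.0, -90.0, 180.0, 90.0))
print(grid)    # (15, 4)
print(extent)
```

With 15 hexes across 360 degrees of longitude, the 180 degrees of latitude only has room for 4 rows, because the row count is scaled down by $\sqrt{3}$.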

#### `get_x_and_y()` 
This function gets simple lists of x and y coordinates from the provided points `GeoDataFrame`.

In [None]:
# Returns simple lists of the x and y coordinates
# of a supplied points GeoDataframe
def get_x_and_y(pts):
    x = [p.x for p in pts.geometry]
    y = [p.y for p in pts.geometry]
    return x, y

#### `get_hexbin_data()`
This one does the work and returns the hexbins as a `GeoDataFrame`.

In [None]:
# makes a hexbin GeoDataFrame from supplied pt layer with the specified 
# nx number of hexes across 
# the tricky part is extracting hexagons from the PathCollection
# returned by pyplot.hexbin()
def get_hexbin_data(pt_layer, nx=15, show=True):
    grid_dim, pt_extent = get_hexgridsize_and_extent(pt_layer, nx)
    x, y = get_x_and_y(pt_layer)

    # use pyplot.hexbin to perform the analysis
    # retaining the output, details of which are available at
    # http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hexbin
    hb = pyplot.hexbin(x, y, extent=pt_extent, gridsize=grid_dim, visible=show)
    
    # retrieve the base hexagon as a shapely Polygon from the hexbin results
    base_hex = shapely.geometry.polygon.Polygon([xy[0] for xy in hb.get_paths()[0].iter_segments()])
    # make the array of hexbins by iterating over the 'offsets'
    hex_shapes = [shapely.affinity.translate(base_hex, xoff=dx, yoff=dy) for (dx, dy) in hb.get_offsets()]
    # now make a geopandas GeoDataFrame with these as its geometry column
    hexes = geopandas.GeoDataFrame(geometry=geopandas.GeoSeries(hex_shapes))
    # also add the counts from the hexbin results
    hexes['n'] = list(hb.get_array())
    # set the CRS
    hexes.crs = pt_layer.crs
    
    # return the GeoDataFrame
    return hexes
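
The 'base hexagon plus offsets' idea can be pictured without `matplotlib` or `shapely`. The following is only a sketch: the helper names, the hexagon size and the offsets are all made up, and plain coordinate lists stand in for `shapely` polygons.

```python
import math

def hexagon(radius):
    """Vertices of a regular hexagon centred on the origin."""
    return [(radius * math.sin(math.radians(60 * k)),
             radius * math.cos(math.radians(60 * k))) for k in range(6)]

def translate(polygon, dx, dy):
    """Shift every vertex by (dx, dy), like shapely.affinity.translate()."""
    return [(x + dx, y + dy) for x, y in polygon]

def shoelace_area(polygon):
    """Planar area of a polygon from its vertex coordinates."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

base_hex = hexagon(10.0)
offsets = [(0.0, 0.0), (17.3, 0.0), (8.65, 60.0)]  # made-up hexagon centres
hexes = [translate(base_hex, dx, dy) for dx, dy in offsets]
areas = [shoelace_area(h) for h in hexes]
print([round(a, 2) for a in areas])
```

Every copy has exactly the same area *in coordinate units*, wherever it is placed. Whether those coordinate units correspond to equal areas on the ground is a separate question, and it depends entirely on the projection.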

### Using `get_hexbin_data()`
So now let's use this function.

Here, you need to pay close attention, because you will be asked to do something similar yourself.

First, run the function to make hexbins, retaining the result as `hexbins`.

In [None]:
hexbins = get_hexbin_data(random_pts)

Don't worry for now about the goofy shape of the output, which is the raw result from the `pyplot.hexbin()` plot.

Next we use a `geopandas` overlay operation to make a basemap we can use in a final map.

In [None]:
# overlay the world data with the hexbins 
# this will clip it to the extent we need
w = geopandas.overlay(world, hexbins, 'intersection')

And now we can build our map.

In [None]:
fig = pyplot.figure(figsize=(5,8))
base = fig.add_subplot(111)

# plot the world data
w.plot(ax=base, facecolor='grey')
# plot the hexbins
hexbins.plot(ax=base, column='n', cmap='Reds', alpha=0.75, legend = True)

Now, we are ready to put everything together and complete the assignment.  

# Assignment instructions
So... here's what you are required to do.

First, **save a new copy of this notebook to work in**. _Include your name in the file name_, so it will be easy for me to keep track of whose work it is.

Then using an **appropriate equal area projection** (use [this website](http://projectionwizard.org/) to discover the PROJ4 code for suitable projections) make a second hexbin map to go alongside the base one in the code cell below. 

To do this, you'll want to use the `to_crs()` function on the `random_pts` dataset *to make a new projected version*. You will also need to apply the same projection to the `world` dataset to get the basemap right. Make sure you *make a new variable* when you do these transformations, or things will probably get confused (call them something like `random_pts_p` and `world_p`).

When you have successfully made these maps, what do you notice? Why are the two maps different? 

In the markdown text cell below the code cell, comment on the difference you notice, and also on any aspects of the coding.

You should complete the code cell below, and also the markdown cell that follows with any commentary.

**When you are finished, save the completed notebook for upload to the dropbox provided on Blackboard.**

### Below is the code cell you should work in
You may find it useful to make a duplicate of it before you start.

In [None]:
## Make a two panel plot for the two hexbin maps
fig = pyplot.figure(figsize=(9,9)) # you might want to change the size 
fig.suptitle("Hexbin maps of evenly distributed random points\nacross two projections", fontsize=14)

ax = fig.add_subplot(121)
ax.set_title(str(random_pts.crs))

hexbins = get_hexbin_data(random_pts, show=False)
w = geopandas.overlay(world, hexbins, 'intersection')

w.plot(ax=ax, facecolor='grey')
hexbins.plot(ax=ax, column='n', cmap="Reds", alpha=0.75)

## Equal-area hexbins in panel 2
## Find an appropriate equal area projection and
## include it in the code outlined below
ax = fig.add_subplot(122)
# reproject the random points and world datasets
# and redo the hexbin mapping

### Below you can provide commentary on your answer
Focus particularly on any coding challenges you encountered, but do also consider issues relating to the different hexbin results you obtain.


**Double-click below to edit**