# Visualising Geographic Data With Python

* [Youtube Link](https://www.youtube.com/watch?v=ZIEyHdvF474)
* Speaker: Christopher Roach
* Date: 14/08/2016
* [Con info page](https://pydata.org/sfo2016/schedule/presentation/32/)

## Intro

There are [a few good visualisations of different projection inaccuracies](https://youtu.be/ZIEyHdvF474?t=7m11s).

> all models are wrong, but some are useful
> 
> George Box

The same applies to maps - [no map can show the whole world without distortion](https://youtu.be/ZIEyHdvF474?t=5m53s), because the Earth is round while a map is flat. The inaccuracy comes from the projection.

On a practical note, [basemap](https://matplotlib.org/basemap/) is the relevant add-on toolkit for Matplotlib.

### [A Tale of Two Coordinate Systems](https://youtu.be/ZIEyHdvF474?t=10m6s)

This section is an overview of moving from a geographic coordinate system to a planar one. It's a good run-through of coordinate systems, so I'll make notes in full. A geographic coordinate system is built from:

* Angular measurement.
* Surface to project those measurements onto (spheroid / ellipsoid).
* Offset for that surface.

[Another good visualisation, showing different types of level.](https://www.youtube.com/watch?v=ZIEyHdvF474&feature=youtu.be&t=11m0s)

* Geoid - The gravity profile of the Earth.
* Mean sea level.
* Spheroid / ellipsoid.

That then gives us two options for the type of datum:

* Local / regional datum - The centre of the spheroid is offset from the Earth's centre of mass.
    * Fits a local area well, but is not suited to general use.
* Geocentric datum - The centre of the spheroid coincides with the Earth's centre of mass.
    * Fits the world as a whole.
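
To make the role of the spheroid concrete: with a geocentric datum, latitude, longitude and height can be turned into Earth-centred Cartesian coordinates. Below is a minimal sketch of that standard conversion, assuming the WGS84 ellipsoid as the example geocentric datum (the talk doesn't name one, and the function and constant names are my own).

```python
import math

# WGS84 ellipsoid parameters (assumed here as the example geocentric datum)
WGS84_A = 6378137.0                 # semi-major axis, metres
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, height_m=0.0, a=WGS84_A, e2=WGS84_E2):
    """Convert latitude/longitude/height on an ellipsoid to Earth-centred
    Cartesian (X, Y, Z) coordinates in metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude
    n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - e2) + height_m) * math.sin(lat)
    return x, y, z


# On the equator at the prime meridian, X equals the semi-major axis.
print(geodetic_to_ecef(0.0, 0.0))
# At the pole, Z equals the semi-minor axis (about 6,356,752 m).
print(geodetic_to_ecef(90.0, 0.0))
```

Because the ellipsoid's centre sits at the Earth's centre of mass, the origin of these coordinates is that centre; a local datum would need an extra shift before the numbers agreed.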

Then map projections can be:

* Cylindrical - good for a linear strip around the world.
* Conical - accurate around a circular strip.
* Azimuthal - most accurate at a particular point.
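
As a concrete cylindrical example, here's a sketch of the spherical Mercator forward projection. It assumes a spherical Earth with a mean radius of 6,371 km (rather than an ellipsoid), and the function names are my own. The scale factor `1 / cos(latitude)` shows why a cylindrical projection suits a strip around the world: it stays close to 1 near the equator, where the cylinder touches the globe, and grows rapidly towards the poles.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a sphere is assumed for simplicity


def mercator(lat_deg, lon_deg, radius=EARTH_RADIUS_M, lon0_deg=0.0):
    """Forward spherical Mercator projection (a cylindrical projection)."""
    lat = math.radians(lat_deg)
    x = radius * math.radians(lon_deg - lon0_deg)
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def mercator_scale_factor(lat_deg):
    """Linear scale distortion at a given latitude: 1 / cos(latitude)."""
    return 1 / math.cos(math.radians(lat_deg))


# Distortion is negligible near the equator and grows towards the poles.
for lat in (0, 15, 30, 45, 60, 75):
    print(f"{lat:>2} deg: scale x{mercator_scale_factor(lat):.2f}")
```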

## [Good Map Design](https://www.youtube.com/watch?v=ZIEyHdvF474&feature=youtu.be&t=14m54s)

When designing a map, the choice of projection matters - what are we trying to preserve?

* Angle (Conformal) - e.g. Mercator (for navigation).
* Area (Equal Area).
* Distance (Equidistant) - remember this is distance from *a given point*.
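
To see what "distance from a given point" means in practice, here's a sketch of the azimuthal equidistant projection on a sphere. The spherical Earth, the mean radius, the function names and the city coordinates are all my assumptions. Distances measured from the map's centre match great-circle distances on the globe, but distances between any other pair of points generally don't.

```python
import math

EARTH_RADIUS_M = 6371000.0  # spherical Earth assumed


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Haversine distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def azimuthal_equidistant(lat_deg, lon_deg, lat0_deg, lon0_deg, radius=EARTH_RADIUS_M):
    """Project a point onto a plane centred on (lat0, lon0), keeping true
    distance from that centre. Undefined at the centre's antipode."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    lat0, lon0 = math.radians(lat0_deg), math.radians(lon0_deg)
    dlon = lon - lon0
    cos_c = math.sin(lat0) * math.sin(lat) + math.cos(lat0) * math.cos(lat) * math.cos(dlon)
    c = math.acos(max(-1.0, min(1.0, cos_c)))  # angular distance from the centre
    k = 1.0 if c == 0 else c / math.sin(c)
    x = radius * k * math.cos(lat) * math.sin(dlon)
    y = radius * k * (math.cos(lat0) * math.sin(lat)
                      - math.sin(lat0) * math.cos(lat) * math.cos(dlon))
    return x, y


# Centre the map on London (approximate coordinates) and check New York.
london = (51.5074, -0.1278)
new_york = (40.7128, -74.0060)
x, y = azimuthal_equidistant(*new_york, *london)
# The two figures agree, because distance from the centre is preserved.
print(round(math.hypot(x, y) / 1000), "km on the map")
print(round(great_circle_distance(*london, *new_york) / 1000), "km on the globe")
```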

As always, we lie when creating maps - just try to do it in a helpful way.

The Robinson projection gives pleasing results, as it was designed to produce an aesthetically pleasing world map rather than to optimise a particular measure of accuracy. For a thematic map it should generally be fine (though it is *not* good for reference maps).

### [Choosing the Right Map](https://www.youtube.com/watch?v=ZIEyHdvF474&feature=youtu.be&t=17m49s)

Four examples the speaker goes through:

* Choropleths
* Dot Density Maps
* Proportional / Graduated Symbol Maps - Data-based points (including charts as points).
* Cartograms
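
For proportional symbol maps, a common pitfall is scaling a symbol's radius with the data, which exaggerates large values because area grows with the square of the radius. Here's a minimal sketch of area-proportional sizing; the function name, the `max_radius` default and the population figures are made up for illustration.

```python
import math


def proportional_radii(values, max_radius=20.0):
    """Scale symbol radii so that symbol *area* is proportional to each value.

    Radius grows with the square root of the value, so a value four times
    larger gets a symbol with four times the area (twice the radius).
    """
    largest = max(values)
    return [max_radius * math.sqrt(v / largest) for v in values]


populations = [250_000, 1_000_000, 4_000_000]  # hypothetical data
print(proportional_radii(populations))  # [5.0, 10.0, 20.0]
```

If plotting with Matplotlib's `scatter`, note that its `s` argument is already a marker area (in points squared), so there the values can be scaled linearly instead.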

### [Classifying Data](https://www.youtube.com/watch?v=ZIEyHdvF474&feature=youtu.be&t=19m32s)

Four ways of doing this:

* Equal Intervals - Equal size of ranges
* Quantiles - Equal count of observations
* Natural Breaks - Maximise inter-class differences, minimise intra-class variance
    * Generally a good place to start.
    * A common algorithm is Jenks natural breaks - roughly a univariate k-means classifier.
* Manual Alterations - Always worth considering.

As always, a histogram of the data is a good place to visualise the breaks.
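
The classification schemes above can be sketched in plain Python. This is a minimal illustration under my own assumptions: the function names are hypothetical, breaks are returned as the upper bound of each class, and `k` is assumed to be no larger than the number of values. For natural breaks I've used an exact dynamic-programming (Fisher-Jenks style) formulation that minimises total within-class squared deviation, which is not necessarily what the speaker used; PySAL's mapclassify provides tested implementations.

```python
from bisect import bisect_left


def equal_interval_breaks(values, k):
    """Upper bound of each of k classes spanning equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)] + [hi]


def quantile_breaks(values, k):
    """Upper bound of each of k classes holding (roughly) equal counts."""
    data = sorted(values)
    n = len(data)
    # ceil(n * i / k) - 1 is the index of the last value in class i
    return [data[-(-n * i // k) - 1] for i in range(1, k + 1)]


def jenks_breaks(values, k):
    """Optimal 1-D breaks minimising total within-class squared deviation
    (Fisher-Jenks style dynamic programme; O(k * n^2), fine for small data)."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give the squared deviation of any run data[i:j] in O(1).
    s1, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: lowest total deviation splitting data[:j] into c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]  # where the last class begins
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j], start[c][j] = candidate, i

    # Walk back through the table to recover each class's upper bound.
    breaks, j = [], n
    for c in range(k, 0, -1):
        breaks.append(data[j - 1])
        j = start[c][j]
    return breaks[::-1]


def classify(value, breaks):
    """Index of the class a value falls into, given ascending upper bounds."""
    return bisect_left(breaks, value)


values = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # toy data with three obvious clusters
print(equal_interval_breaks(values, 3))  # [8.0, 15.0, 22]
print(quantile_breaks(values, 3))        # [3, 12, 22]
print(jenks_breaks(values, 3))           # [3, 12, 22]
```

On this toy data quantiles and natural breaks agree; on skewed real-world data they usually won't, which is where plotting the breaks over a histogram helps.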

### [Choosing the Right Colours](https://www.youtube.com/watch?v=ZIEyHdvF474&feature=youtu.be&t=21m03s)

* Sequential - Ordinal data, typically by lightness, sometimes by hue too.
* Diverging - Contrasting hue to show differences, lightness to show magnitude.
    * e.g. Two colours for the end-points of positive and negative with lightness indicating magnitude of each.
* Qualitative - Different groups, different hues.
    * So long as they don't look sequential.

As always, [Cynthia Brewer's work](http://colorbrewer2.org) is great.
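
To illustrate the structure of a diverging scheme (hue for the side of the midpoint, lightness for magnitude), here's a minimal sketch that linearly interpolates in RGB. The function names, the three anchor colours (a dark red, a near-white and a dark blue) and the example growth values are my own choices. ColorBrewer's hand-tuned palettes are preferable in practice, and Matplotlib ships ColorBrewer-derived colormaps such as `RdBu` and `Blues`.

```python
def _lerp_rgb(c1, c2, t):
    """Linearly interpolate between two RGB tuples (0-255) with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))


def _to_hex(rgb):
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def diverging_colour(value, vmin, vmax, midpoint=0.0,
                     low=(178, 24, 43), mid=(247, 247, 247), high=(33, 102, 172)):
    """Map a value to a hex colour on a two-hue diverging ramp.

    Hue says which side of the midpoint the value is on; distance from the
    light neutral midpoint (i.e. darkness) encodes magnitude.
    """
    value = max(vmin, min(vmax, value))  # clamp out-of-range values
    if value < midpoint:
        t = (midpoint - value) / (midpoint - vmin)
        return _to_hex(_lerp_rgb(mid, low, t))
    t = 0.0 if vmax == midpoint else (value - midpoint) / (vmax - midpoint)
    return _to_hex(_lerp_rgb(mid, high, t))


# Hypothetical population growth rates (%), diverging around zero.
for growth in (-8.0, -2.0, 0.0, 3.0, 12.0):
    print(growth, diverging_colour(growth, vmin=-10.0, vmax=10.0))
```

A sequential scheme is the same idea with a single ramp from a light colour to a dark one.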


## [Case Study - Population Growth](https://www.youtube.com/watch?v=ZIEyHdvF474&feature=youtu.be&t=22m04s)

A good example of improving a map is [this data blog post from the Washington Post](https://www.washingtonpost.com/news/wonk/wp/2016/04/11/the-dirty-little-secret-that-data-journalists-arent-telling-you/?utm_term=.4c5268f07ae4).

In it, the author pokes some holes in the original map:

* Poor choice of classes - in all but a few cases, only two classes are actually used.
* Poor choice of colour - this is diverging data (growth vs decline), so it calls for a diverging colour scheme.

## [Examples](https://www.youtube.com/watch?v=ZIEyHdvF474&feature=youtu.be&t=23m33s)

These come from a book the speaker wrote, [with notes on his GitHub page](https://github.com/croach/oreilly-matplotlib-course).

I don't currently have basemap installed on Windows, so I won't run through them here, but the notes linked above look like a great jumping-off point.
