# Geographic Data Visualization

In this lecture, we'll work some examples of *interactive, geographic visualization* using Python and the Plotly package. We'll be using Plotly much more in a bit when we learn about high-level, interactive HTML plotting. 

For today, we're focused on geographic visualization. Plotly makes it truly, unreasonably easy to create attractive maps. 

*To follow along with this lecture, you will need to install the `plotly` package in your PIC16B Anaconda environment. Basemaps will unfortunately not appear in the online version of these notes.*

## Creating Basemaps

In [17]:
import pandas as pd
coords = pd.DataFrame({
    "lon" : [-118.44145669988743], 
    "lat" : [34.06961990125789],
    "message" : ["Hello!"]
})
coords

|   | lon | lat | message |
|---|---|---|---|
| 0 | -118.441457 | 34.06962 | Hello! |


In [19]:
from plotly import express as px

fig = px.scatter_mapbox(coords, 
                        lat = "lat",
                        lon = "lon", 
                        hover_name = "message",
                        zoom = 17,
                        height = 300,
                        mapbox_style="open-street-map")

fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

Let's break this down a bit. The first line imports the `express` module of `plotly`, which provides a high-level interface to a variety of Plotly tools. One can also work directly with the low-level `graph_objects` module, which allows one a finer level of control over the settings of visualizations. We won't use `graph_objects` in this course. 

The magic happens starting on the third line, when we call `px.scatter_mapbox()`. The first argument must be a data frame. The `lat` and `lon` arguments tell `px` which columns contain the latitude and longitude coordinates. The `hover_name` argument specifies which column supplies the text that appears when we hover over a plotted point with our mouse. `zoom` controls the initial zoom level of the map, which can subsequently be modified by the user. `height` sets the height of the figure in pixels, which in turn affects the aspect ratio. There are many [other parameters](https://plotly.github.io/plotly.py-docs/generated/plotly.express.scatter_mapbox.html) to `px.scatter_mapbox()`. 

The `mapbox_style` argument controls which *map tiles* are used in the visualization, and the call to `fig.update_layout()` controls the amount of whitespace around the visualization. The final line actually displays the map. 

Now let's try changing up the zoom level and the map tiles. The `carto-positron` tiles from CartoDB are very low-contrast, which is helpful when creating plots that use these tiles as backgrounds. 

Maybe you dream of mountains, valleys, and beaches? 

Summing up, Plotly makes it unreasonably easy to create attractive, interactive maps in Python. Let's now go from "pretty maps" to "informative, scientific data graphics." 

# Visualizing Climate Measurement Stations

Let's now use our GHCN data on global temperatures to create some interesting visualizations. As a first step, we'll create a set of markers for different climate stations. First, let's grab the data on stations: 

In [None]:
import numpy as np

url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/noaa-ghcn/station-metadata.csv"
stations = pd.read_csv(url)
stations.head()

For the purposes of geographic plotting, the key columns here are the `LATITUDE` and `LONGITUDE` columns. Let's try plotting! 

Note that it might take a little while for the map to render. There are 27.5k points, which is kind of a lot! 

This is cool and interactive, but there are a few shortcomings if we want to display scientific information. It's hard to make comparisons -- for example, it looks like there might be a higher density of stations in the US than in many other areas, but it's hard to be sure from the map above. For comparing densities, *heatmaps* provide a useful approach. Plotly again makes this unreasonably easy. 

The colors get brighter and more intense the more stations there are in that area. We can notice a few things, such as the very high density of measurement stations in the US and Germany. 

However, it's harder to see patterns when we zoom in much more. If we want to look at patterns within Europe, for example, we might want to increase the radius. 

Experimentation with the various arguments of the heatmap function, such as `radius`, is usually necessary to obtain a good result. The [`HeatMap` plugin](https://python-visualization.github.io/folium/plugins.html) for the Folium package offers a similar set of options. 

## Geographic Scatterplots

Another thing we might want to do is color code the climate stations according to some quantitative measure. Let's compute the average temperature in March for each one over the most recent decade, and use this to color code them. 

In [None]:
interval = "2011-2020"
url = f"https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/noaa-ghcn/decades/{interval}.csv"
temps = pd.read_csv(url)

First we'll compute the average in March for each station. 
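A minimal sketch of this step is below, using a made-up miniature version of `temps`. It assumes that each row holds one station-year, that the column `VALUE3` holds the March reading, and that readings are stored in hundredths of a degree Celsius. None of these details are stated in these notes, so check them against the real data.

```python
import pandas as pd

# Toy stand-in for `temps`; column names and units are assumptions.
temps = pd.DataFrame({
    "ID"     : ["STATION_A", "STATION_A", "STATION_B", "STATION_B"],
    "Year"   : [2011, 2012, 2011, 2012],
    "VALUE3" : [1500, 1700, -200, 0]   # March, in hundredths of a degree C
})

# Average the March readings per station and convert to degrees C.
march_avgs = temps.groupby("ID")["VALUE3"].mean() / 100
march_avgs = march_avgs.reset_index(name = "march_avg")
print(march_avgs)
```

The result has one row per station, with the station `ID` and its average March temperature.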

Next, we'll *merge* the latitude/longitude data from the `stations` data frame. 
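The merge might look something like the following sketch. It assumes that both data frames share a station identifier column called `ID`. The toy values are made up.

```python
import pandas as pd

# Toy per-station averages and station metadata (assumed column names).
march_avgs = pd.DataFrame({
    "ID"        : ["STATION_A", "STATION_B"],
    "march_avg" : [16.0, -1.0]
})
stations = pd.DataFrame({
    "ID"        : ["STATION_A", "STATION_B", "STATION_C"],
    "LATITUDE"  : [34.07, 52.52, -33.87],
    "LONGITUDE" : [-118.44, 13.40, 151.21],
    "NAME"      : ["Station A", "Station B", "Station C"]
})

# Inner join (the default): keep only stations that have an average.
march_avgs = pd.merge(march_avgs, stations, on = "ID")
print(march_avgs)
```

Because `pd.merge` performs an inner join by default, `STATION_C`, which has no temperature average, is dropped from the result.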

Great! This is the data we need. Now we can supply this data to `px.scatter_mapbox`, passing as the value of `color` the name of the column that we want to use to shade the points. 

This plot makes it easy to see that countries near the equator tend to be warmer (at least in March). 

# Saving and Sharing

To save your visualization as HTML, just use `write_html` from `plotly.io`. 

You can then send this file to people you'd like to impress! You can also move this file to the `_includes` directory of your website, after which you can include it in your blog posts as demonstrated [here](https://pic16b.github.io/plotly-example/).

# Choropleths

A *choropleth* is a polygon-based visualization, in which different geographic polygons are assigned different colors. If you've ever seen a map of election results by state, or of CO2 emissions by country, you've seen a choropleth. 

Let's make one! We'll visualize the average March temperature for each country. We need two things: 

1. A data frame containing the average March temperature for each country. 
2. A GeoJSON file containing the coordinates for the country polygons. 

GeoJSON files are pretty complex, but fortunately we don't really need to interact with them too much. The code below uses the `urllib` and `json` modules to read a GeoJSON file from the web. This file contains the borders of countries. 

In [None]:
from urllib.request import urlopen
import json

countries_gj_url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/countries.geojson"

with urlopen(countries_gj_url) as response:
    countries_gj = json.load(response)

GeoJSON files often contain large quantities of metadata. For our purposes, we only need the name of each country and its shape in coordinates, which is supplied by the `geometry` field of each feature: 
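To make the structure concrete, here is a hand-written and heavily simplified GeoJSON document with a single made-up feature. The property key `ADMIN` is an assumption about how the real file labels country names, so inspect `countries_gj["features"][0]["properties"]` to confirm.

```python
import json

# A toy GeoJSON document with one fictional square "country".
toy_gj = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"ADMIN": "Squareland"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]
      }
    }
  ]
}
""")

feature = toy_gj["features"][0]
name = feature["properties"]["ADMIN"]                    # country name (assumed key)
geometry_type = feature["geometry"]["type"]              # e.g. Polygon or MultiPolygon
n_vertices = len(feature["geometry"]["coordinates"][0])  # points in the outer ring

print(name, geometry_type, n_vertices)
```

Each feature pairs a dictionary of `properties` (metadata) with a `geometry` that lists the boundary coordinates as longitude-latitude pairs.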

The next thing we need is temperature data! The code below uses the `merge` function introduced in a previous lecture to add the name of the country to the data frame containing the station temperature readings. 

In [None]:
countries_url = "https://raw.githubusercontent.com/mysociety/gaze/master/data/fips-10-4-to-iso-country-codes.csv"
countries = pd.read_csv(countries_url)

countries.head()

In [None]:
# extract the FIPS code in the temps data frame and merge
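
One possible version of this step is sketched below on toy data. It assumes that the first two characters of each station `ID` are the FIPS 10-4 country code, and that `countries` has columns named `FIPS 10-4` and `Name`. The station IDs and readings are made up.

```python
import pandas as pd

# Toy station readings; IDs are fictional but begin with a FIPS code.
temps = pd.DataFrame({
    "ID"     : ["USW00000001", "USW00000002", "GMM00000003"],
    "VALUE3" : [1200, 1400, 600]
})

# Toy FIPS lookup table (assumed column names).
countries = pd.DataFrame({
    "FIPS 10-4" : ["US", "GM"],
    "Name"      : ["United States", "Germany"]
})

# Extract the two-letter code, then merge to attach the country name.
temps["FIPS 10-4"] = temps["ID"].str[0:2]
temps = pd.merge(temps, countries, on = "FIPS 10-4")
print(temps)
```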


In [None]:
# compute the mean temperature in March, in degrees C. 
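
A sketch of this computation on toy data follows. It assumes that `VALUE3` holds March readings in hundredths of a degree Celsius and that the country name lives in a column called `Name`. The output column name `march_avg` is also an assumption.

```python
import pandas as pd

# Toy merged data: one row per station reading, with country name attached.
temps = pd.DataFrame({
    "Name"   : ["United States", "United States", "Germany"],
    "VALUE3" : [1200, 1400, 600]   # March, hundredths of a degree C (assumed)
})

# Average by country and convert to degrees C.
march_avgs_by_country = temps.groupby("Name")["VALUE3"].mean() / 100
march_avgs_by_country = march_avgs_by_country.reset_index(name = "march_avg")
print(march_avgs_by_country)
```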


And now we're done with our data prep! We now need to use `px.choropleth` to create the map. We need to pass the data frame of temperature data, the GeoJSON file, and some additional information. 

- `locations`: We need to indicate which column in `march_avgs_by_country` to use as the identifiers of countries. 
- `locationmode`: We need to specify that the values in the column passed to `locations` are names of countries and not, say, FIPS ID codes. 
- `color`: We need to state which column should be used to determine the color of each country. 

We did it! Drawing GeoJSON files can require substantial computational effort, and so it might take a while for this code to run. Note that there are a few countries that are missing data, indicated by light gray. These correspond to cases in which there wasn't an entry of `march_avgs_by_country` matching the country name in the GeoJSON. This can occur either because there truly is no data or because there was a discrepancy in the labels. In the latter case, we could improve the situation by cleaning the data. 

# Learn More

This is just a taste of geographic data visualization. There are many other kinds of tasks we might want to perform. You can find a number of helpful examples of using Plotly and Plotly Express to create attractive geographic data visualizations [here](https://plotly.com/python/maps/). 