# Choropleths

A *choropleth* is a polygon-based visualization in which each geographic region is colored according to the value of some variable. If you've ever seen a map of election results by state, or of CO2 emissions by country, you've seen a choropleth.
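
A choropleth needs some rule for turning each region's value into a color. A common approach, and roughly what a continuous color scale does, is to rescale each value into the interval [0, 1] and then interpolate between colors. The sketch below illustrates the idea in plain Python. The two endpoint colors and the region values are hypothetical, and Plotly's built-in color scales can interpolate through more than two colors.

```python
# A minimal sketch of a two-color continuous color scale: normalize each value
# to [0, 1], then linearly interpolate between two endpoint RGB colors.
# The endpoint colors are arbitrary choices for illustration.

def value_to_rgb(value, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0)):
    """Map value in [vmin, vmax] to an RGB tuple between `low` and `high`."""
    if vmax == vmin:
        t = 0.0
    else:
        # clamp so values outside the range get the endpoint colors
        t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))

# hypothetical region values (not real data)
values = {"Region A": -5.0, "Region B": 10.0, "Region C": 25.0}
vmin, vmax = min(values.values()), max(values.values())
colors = {name: value_to_rgb(v, vmin, vmax) for name, v in values.items()}
print(colors)
# {'Region A': (0, 0, 255), 'Region B': (128, 0, 128), 'Region C': (255, 0, 0)}
```

The coldest region gets the "low" color, the warmest gets the "high" color, and everything in between gets a blend.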

Let's make one! We'll visualize the average March temperature for each country. We need two things: 

1. A data frame containing the average March temperature for each country.
2. A GeoJSON file containing the coordinates for the country polygons. 

GeoJSON files are fairly complex, but fortunately we won't need to interact with them much directly. The code below uses the `json` module to read a GeoJSON file containing country borders from the web.

```python
from urllib.request import urlopen
import json

countries_gj_url = "https://raw.githubusercontent.com/pic16b-ucla/24F/main/datasets/countries.geojson"

# download the GeoJSON file and parse it into a Python dictionary
with urlopen(countries_gj_url) as response:
    countries_gj = json.load(response)
```
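
`json.load` doesn't care where its input comes from: it accepts any file-like object, which is why it works directly on the response returned by `urlopen`. The sketch below parses a tiny, made-up FeatureCollection from an in-memory string so that it runs without a network connection. The country name "Squareland" and the `name` property key are hypothetical; the point is the nesting that real files share, namely a top-level dictionary holding a list of `features`.

```python
import io
import json

# A tiny hypothetical GeoJSON FeatureCollection with one square "country".
# Real files have the same nesting but far more features and coordinates.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Squareland"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]
      }
    }
  ]
}
"""

# json.load reads from any file-like object; io.StringIO stands in for the
# HTTP response returned by urlopen
toy_gj = json.load(io.StringIO(geojson_text))

print(type(toy_gj))             # <class 'dict'>
print(toy_gj["type"])           # FeatureCollection
print(len(toy_gj["features"]))  # 1
```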

```python
import pandas as pd

# read the NOAA station temperature readings for the decade 2011-2020
interval = "2011-2020"
url = f"https://raw.githubusercontent.com/pic16b-ucla/24F/main/datasets/noaa-ghcn/decades/{interval}.csv"
temps = pd.read_csv(url)
```

GeoJSON files can be very complicated and often contain large quantities of metadata. Each country is stored as a *feature*, and for our purposes we only need two pieces of it: the name of the country, typically stored among the feature's `properties`, and its shape in coordinates, stored under the `geometry` key:

```python
countries_gj["features"][1]
```
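
Country borders in GeoJSON usually appear either as a `Polygon` (one shape) or as a `MultiPolygon` (several shapes, such as a country with islands). The sketch below walks a small, made-up list of features and keeps only the name and geometry information. The feature contents and the `name` property key are hypothetical, so check `countries_gj["features"][1]["properties"]` to see which keys the real file uses.

```python
# A sketch of pulling out just the parts of each feature we care about.
# The features and the "name" property key are hypothetical.
features = [
    {
        "type": "Feature",
        "properties": {"name": "Squareland", "pop_est": 100},
        "geometry": {"type": "Polygon",
                     "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]},
    },
    {
        "type": "Feature",
        "properties": {"name": "Twin Isles"},
        "geometry": {"type": "MultiPolygon",
                     "coordinates": [
                         [[[2, 0], [3, 0], [3, 1], [2, 0]]],
                         [[[4, 0], [5, 0], [5, 1], [4, 0]]],
                     ]},
    },
]

def summarize(feature, name_key="name"):
    """Return (name, geometry type, number of polygons) for one feature.

    Assumes the geometry is a Polygon or a MultiPolygon.
    """
    geom = feature["geometry"]
    # a Polygon is a single shape; a MultiPolygon is a list of Polygons
    n_polygons = 1 if geom["type"] == "Polygon" else len(geom["coordinates"])
    return feature["properties"][name_key], geom["type"], n_polygons

for f in features:
    print(summarize(f))
# ('Squareland', 'Polygon', 1)
# ('Twin Isles', 'MultiPolygon', 2)
```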

Now we turn to the temperature data loaded above. The code below reads a table that maps FIPS 10-4 country codes to country names, and then uses the `merge` function introduced in a previous lecture to add each country's name to the data frame of station temperature readings.

```python
# table mapping FIPS 10-4 country codes to country names
countries_url = "https://raw.githubusercontent.com/mysociety/gaze/master/data/fips-10-4-to-iso-country-codes.csv"
countries = pd.read_csv(countries_url)

countries.head()
```

```python
# the first two characters of each station ID give its FIPS 10-4 country code
temps["FIPS 10-4"] = temps["ID"].str[:2]

# merge on that code to attach country names (an inner join by default)
temps = pd.merge(temps, countries, on="FIPS 10-4")
```
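
By default, `pd.merge` performs an inner join, so any station whose two-letter code is missing from the code table is silently dropped. The toy example below, with made-up station IDs and values, shows this behavior, and also shows how `how="left"` together with `indicator=True` can flag the stations that found no match.

```python
import pandas as pd

# Hypothetical station readings and code table, mimicking the columns used above
toy_temps = pd.DataFrame({
    "ID": ["USW001", "USW002", "CA0003", "ZZ0004"],
    "VALUE3": [500, 700, -1200, 300],
})
toy_countries = pd.DataFrame({
    "FIPS 10-4": ["US", "CA"],
    "Name": ["United States", "Canada"],
})

# the first two characters of each station ID are treated as the FIPS code
toy_temps["FIPS 10-4"] = toy_temps["ID"].str[:2]

# how="inner" (the default) keeps only stations whose code appears in both frames
merged = pd.merge(toy_temps, toy_countries, on="FIPS 10-4", how="inner")
print(merged[["ID", "Name"]])  # ZZ0004 is gone

# a left join with indicator=True adds a "_merge" column marking unmatched rows
check = pd.merge(toy_temps, toy_countries, on="FIPS 10-4", how="left", indicator=True)
print(check.loc[check["_merge"] == "left_only", "ID"].tolist())  # ['ZZ0004']
```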

```python
# compute the mean March temperature (column VALUE3) in each country;
# dividing by 100 converts the stored values to degrees C
march_avgs_by_country = temps.groupby("Name")[["VALUE3"]].mean() / 100
march_avgs_by_country = march_avgs_by_country.reset_index()
march_avgs_by_country.head()
```
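
The same grouping steps can be seen on a handful of made-up rows. As the division by 100 above implies, the sketch assumes `VALUE3` stores temperatures in hundredths of a degree C; the country values here are hypothetical.

```python
import pandas as pd

# Hypothetical merged readings; VALUE3 holds March temperatures in
# hundredths of a degree C, matching the division by 100 above
toy = pd.DataFrame({
    "Name": ["Canada", "Canada", "Chile", "Chile", "Chile"],
    "VALUE3": [-1000, -500, 1500, 1800, None],
})

# mean() skips missing values (NaN) by default
avgs = toy.groupby("Name")[["VALUE3"]].mean() / 100
avgs = avgs.reset_index()  # turn the "Name" index back into a column
print(avgs)
# Canada -> -7.5, Chile -> 16.5
```

Note that this is a plain average over all station readings in a country, so regions with many stations carry more weight than sparsely covered ones; it is not an area-weighted average.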

And now we're done with our data prep! Next, we use `px.choropleth` to create the map. We pass it the data frame of temperature data, the GeoJSON file, and a few additional arguments:

- `locations`: We need to indicate which column in `march_avgs_by_country` to use as the identifiers of countries. 
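- `locationmode`: We need to specify that the values in the column passed to `locations` are country names and not, say, ISO-3 codes. With this setting, Plotly matches the names against the country outlines built into its base map; polygons from the `geojson` argument are used only when `locationmode` is `"geojson-id"`.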
- `color`: We need to state which column should be used to determine the color of each country. 

```python
from plotly import express as px

# match each row to a country by name and color it by mean March temperature
fig = px.choropleth(march_avgs_by_country,
                    geojson=countries_gj,
                    locations="Name",
                    locationmode="country names",
                    color="VALUE3",
                    height=300)

# remove the default margins around the map
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()
```

We did it! Drawing detailed country polygons can be computationally expensive, so this code may take a while to run. Note that a few countries are shown in light gray because they have no data. These are countries whose names had no matching entry in `march_avgs_by_country`. This can happen either because there truly is no data for that country or because the two sources label it differently (for example, two different spellings of the same name). In the latter case, we could improve the map through data cleaning.
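
A quick way to find labeling discrepancies is to compare the two sets of names directly. The sketch below uses made-up country names, with a hypothetical `map_names` set standing in for whatever names the map uses, to list mismatches in both directions and then repair one with a hand-written replacement dictionary.

```python
import pandas as pd

# Hypothetical country names used by the map and by our data frame
map_names = {"Examplia", "Northland", "Southland"}
df = pd.DataFrame({
    "Name": ["Republic of Examplia", "Northland", "Westland"],
    "VALUE3": [12.0, -3.5, 8.0],
})

# names in the data that the map does not recognize
unmatched = sorted(set(df["Name"]) - map_names)
print(unmatched)  # ['Republic of Examplia', 'Westland']

# hand-written fixes for labeling discrepancies (hypothetical)
fixes = {"Republic of Examplia": "Examplia"}
df["Name"] = df["Name"].replace(fixes)

# names still unmatched after cleaning have no region on the map to color
print(sorted(set(df["Name"]) - map_names))  # ['Westland']

# map regions that still have no data would be drawn without a data color
print(sorted(map_names - set(df["Name"])))  # ['Southland']
```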

# Learn More

This is just a taste of geographic data visualization; there are many other kinds of maps we might want to make. The Plotly documentation has [many helpful examples](https://plotly.com/python/maps/) of creating attractive geographic data visualizations with Plotly and Plotly Express.