**Data Visualization course - winter semester 20/21 - FU Berlin**

*Tutorials adapted from the [Information Visualization](https://infovis.fh-potsdam.de/tutorials/) course at the FH Potsdam*

# Tutorial 4: Geovisualization

In this installment of the information visualization tutorials, we will analyze and visualize geographic data, i.e., data that refers to geospatial entities. Geospatial entities can be, for example, particular places such as schools and libraries, or the political boundaries of cities and countries. Of course, this tutorial only scratches the surface: consider it a teaser for geovisualization, which has become a branch of research and practice in its own right at the intersection of geography and visualization. We will only touch on a few basic steps to get your feet wet and your hands dirty.


## 🛒 1. Prepare 

As you have come to expect by now, we first assemble our tools and then prepare the data.

In [1]:
import altair as alt
import pandas as pd
from vega_datasets import data

### Load Data

As usual, we first need to get our data into the notebook:

In [2]:
covid_data = pd.read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")


In addition to our usual dataset, we are going to use a second dataset that maps each country's ISO codes (alpha-2, alpha-3 and numeric) to the country's average coordinates.

In [3]:
code_lookup = pd.read_csv("country_lookup.csv")

In [4]:
code_lookup.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 256 entries, 0 to 255
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Country              256 non-null    object 
 1   Alpha-2 code         255 non-null    object 
 2   Alpha-3 code         256 non-null    object 
 3   Numeric code         256 non-null    int64  
 4   Latitude (average)   256 non-null    float64
 5   Longitude (average)  256 non-null    float64
dtypes: float64(2), int64(1), object(3)
memory usage: 12.1+ KB


Finally, we also need data that tells us what countries actually look like, so that we can draw them. This information is encoded in TopoJSON, an extension of GeoJSON that encodes topology using the widely used JSON format.
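
To make "encodes topology" a bit more concrete: in TopoJSON a shape's boundary is not stored as its own list of coordinates but as references into a shared list of `arcs`, so a border between two neighbouring countries is stored only once. When a `transform` is present, the arc positions are also quantized and delta-encoded. The following is a minimal, standard-library sketch of decoding such a file; the tiny `TOPOLOGY` dict is a made-up toy (not an excerpt from `world-110m.json`), the helper names are hypothetical, and only simple `Polygon` geometries are handled. In the notebook itself, Altair and Vega do this decoding for us.

```python
# A made-up toy topology: two adjacent squares that share one border (arc 0).
TOPOLOGY = {
    "type": "Topology",
    "transform": {"scale": [0.5, 0.5], "translate": [10, 50]},
    "objects": {
        "countries": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "id": 1, "arcs": [[0, 1]]},
                {"type": "Polygon", "id": 2, "arcs": [[2, -1]]},  # -1 means arc 0 reversed
            ],
        }
    },
    "arcs": [
        [[2, 0], [0, 2]],                    # shared border
        [[2, 2], [-2, 0], [0, -2], [2, 0]],  # rest of square 1
        [[2, 0], [2, 0], [0, 2], [-2, 0]],   # rest of square 2
    ],
}


def decode_arc(arc, transform):
    """Undo delta encoding and quantization for one arc."""
    sx, sy = transform["scale"]
    tx, ty = transform["translate"]
    x = y = 0
    points = []
    for dx, dy in arc:
        x += dx  # each position is stored as an offset from the previous one
        y += dy
        points.append((x * sx + tx, y * sy + ty))
    return points


def ring_coordinates(arc_indices, arcs):
    """Stitch the referenced arcs into one closed ring."""
    ring = []
    for index in arc_indices:
        # A negative index ~i (i.e. -i - 1) means "arc i, traversed backwards".
        points = arcs[index] if index >= 0 else arcs[~index][::-1]
        # Consecutive arcs share an endpoint, so skip the duplicate.
        ring.extend(points if not ring else points[1:])
    return ring


def decode_polygons(topology, name):
    """Return {feature id: list of rings} for the Polygons in one object."""
    arcs = [decode_arc(a, topology["transform"]) for a in topology["arcs"]]
    return {
        g["id"]: [ring_coordinates(r, arcs) for r in g["arcs"]]
        for g in topology["objects"][name]["geometries"]
    }


shapes = decode_polygons(TOPOLOGY, "countries")
print(shapes[1])  # [[(11.0, 50.0), (11.0, 51.0), (10.0, 51.0), (10.0, 50.0), (11.0, 50.0)]]
```

Both squares reference arc 0, so their common border exists only once in the file; this sharing is what keeps TopoJSON files compact and neighbouring borders consistent.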

In [5]:
countries = alt.topo_feature(data.world_110m.url, 'countries')

In [6]:
countries

UrlData({
  format: TopoDataFormat({
    feature: 'countries',
    type: 'topojson'
  }),
  url: 'https://vega.github.io/vega-datasets/data/world-110m.json'
})

## 2. Present

### Simple map projection

In [7]:
alt.Chart(countries).mark_geoshape(
    stroke='white',
    fill='#A9A9A9'
).project(
    type='mercator'
)

In [8]:
map = alt.Chart(countries).mark_geoshape(
    stroke='white',
    fill='#A9A9A9'
).project(
    type='mercator',
    scale=250,
    center=[20,55],
    clipExtent= [[0,0], [400, 300]]
)
map
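
What `scale` and `center` do can be seen in a small pure-Python version of a spherical Mercator projection, loosely modelled on the d3-geo conventions that Vega builds on. This is a simplified sketch, not Vega's implementation: it ignores rotation and clipping, and it assumes that the `center` coordinate lands in the middle of the 400×300 box used for `clipExtent` above (the `translate` default of `(200, 150)` is our assumption).

```python
import math


def mercator(lon, lat, scale=250.0, center=(20.0, 55.0), translate=(200.0, 150.0)):
    """Project (lon, lat) in degrees to screen (x, y) with a spherical Mercator.

    Simplified sketch: no rotation or clipping; `center` is placed at `translate`.
    """
    def raw(lon_deg, lat_deg):
        # Longitude maps linearly; latitude is stretched more and more toward the poles.
        return math.radians(lon_deg), math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))

    cx, cy = raw(*center)
    x, y = raw(lon, lat)
    # Screen y grows downward, so more northern points must move up.
    return translate[0] + scale * (x - cx), translate[1] - scale * (y - cy)


print(mercator(20, 55))      # (200.0, 150.0): the center lands in the middle of the box
print(mercator(13.4, 52.5))  # roughly Berlin: left of and below the center
```

Increasing `scale` zooms in around the center, while changing `center` pans the map. The stretching of latitudes is also why Mercator maps are usually cut off well before the poles.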

### Graduated Symbols

In [19]:
country_infections = covid_data[['iso_code', 'total_cases_per_million']].groupby('iso_code').max().reset_index()
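
Because `total_cases_per_million` is a cumulative count, taking the `max()` per `iso_code` should pick up each country's latest total (assuming the series never decreases, e.g. through later data corrections). pandas' `max()` skips missing values by default, so a country only ends up with a missing value if all of its rows are missing, which would explain the single missing value in the `info()` output below. A plain-Python sketch of the same aggregation, with invented rows and the hypothetical codes `AAA` and `BBB`:

```python
import math


def max_per_group(rows, key, value):
    """Mimic df[[key, value]].groupby(key).max(): largest non-missing value per key."""
    result = {}
    for row in rows:
        group = row[key]
        result.setdefault(group, None)  # every key appears, even if all its values are missing
        v = row[value]
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue  # skip missing values, like pandas' default skipna=True
        if result[group] is None or v > result[group]:
            result[group] = v
    return result


# Invented sample rows, not real OWID figures.
rows = [
    {"iso_code": "AAA", "total_cases_per_million": 10.0},
    {"iso_code": "AAA", "total_cases_per_million": 25.5},
    {"iso_code": "BBB", "total_cases_per_million": float("nan")},
    {"iso_code": "AAA", "total_cases_per_million": float("nan")},
]
print(max_per_group(rows, "iso_code", "total_cases_per_million"))  # {'AAA': 25.5, 'BBB': None}
```

pandas would report `NaN` where this sketch returns `None`.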

In [20]:
country_infections.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 211 entries, 0 to 210
Data columns (total 2 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   iso_code                 211 non-null    object 
 1   total_cases_per_million  210 non-null    float64
dtypes: float64(1), object(1)
memory usage: 3.4+ KB


In [21]:
merged_lookup_data = country_infections.merge(code_lookup, left_on='iso_code', right_on='Alpha-3 code').rename(columns={'Numeric code': 'id'})
merged_lookup_data

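|     | iso_code | total_cases_per_million | Country | Alpha-2 code | Alpha-3 code | id | Latitude (average) | Longitude (average) |
|-----|----------|-------------------------|---------|--------------|--------------|----|--------------------|---------------------|
| 0   | ABW | 39282.168 | Aruba | AW | ABW | 533 | 12.50 | -69.9667 |
| 1   | AFG | 1022.366 | Afghanistan | AF | AFG | 4 | 33.00 | 65.0000 |
| 2   | AGO | 190.043 | Angola | AO | AGO | 24 | -12.50 | 18.5000 |
| 3   | AIA | 199.973 | Anguilla | AI | AIA | 660 | 18.25 | -63.1667 |
| 4   | ALB | 5350.963 | Albania | AL | ALB | 8 | 41.00 | 20.0000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 213 | VNM | 11.393 | Vietnam | VN | VNM | 704 | 16.00 | 106.0000 |
| 214 | YEM | 68.900 | Yemen | YE | YEM | 887 | 15.00 | 48.0000 |
| 215 | ZAF | 11675.709 | South Africa | ZA | ZAF | 710 | -29.00 | 24.0000 |
| 216 | ZMB | 840.842 | Zambia | ZM | ZMB | 894 | -15.00 | 30.0000 |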
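
`merge` performs an inner join by default, so only `iso_code` values that also appear in the lookup's `Alpha-3 code` column survive. Codes without an ISO counterpart (OWID uses prefixed codes such as `OWID_WRL` for aggregates like the world total) are dropped silently, and a code that occurs more than once in the lookup duplicates the matching row. The `rename` then turns `Numeric code` into `id`, the field the choropleth below uses to match rows to the TopoJSON features. A plain-Python sketch of the same inner join; `inner_merge` is a hypothetical helper and the case values are invented (276 is Germany's ISO 3166-1 numeric code):

```python
def inner_merge(left, right, left_on, right_on, rename=None):
    """Sketch of pandas' default inner merge followed by a column rename.

    Overlapping column names (where pandas would add _x/_y suffixes) are not handled.
    """
    rename = rename or {}
    # Index the right table by its key; a key may occur several times.
    index = {}
    for row in right:
        index.setdefault(row[right_on], []).append(row)
    merged = []
    for row in left:
        for match in index.get(row[left_on], []):  # no match -> row is dropped
            combined = {**row, **match}  # both key columns are kept, as in pandas
            merged.append({rename.get(k, k): v for k, v in combined.items()})
    return merged


# Invented values; only the codes are real.
infections = [
    {"iso_code": "DEU", "total_cases_per_million": 1.0},
    {"iso_code": "OWID_WRL", "total_cases_per_million": 2.0},
]
lookup = [{"Country": "Germany", "Alpha-3 code": "DEU", "Numeric code": 276}]
merged = inner_merge(infections, lookup, "iso_code", "Alpha-3 code", rename={"Numeric code": "id"})
print(merged)
```

Passing `how='left'` and `indicator=True` to the real `merge` is a quick way to see which codes fail to match.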


In [22]:
symbols = alt.Chart(merged_lookup_data).mark_circle().encode(
    longitude='Longitude (average):Q',
    latitude='Latitude (average):Q',
    size=alt.Size('total_cases_per_million:Q', legend=None),
    tooltip=['Country', 'total_cases_per_million'],
).project(
    type='mercator',
    scale=250,
    center=[20,55],
    clipExtent= [[0,0], [400, 300]]
)

map + symbols
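
For circle marks, Altair's `size` channel encodes the mark's area in square pixels rather than its radius, which is the usual choice for graduated symbols: a country with four times the case rate gets a circle with four times the area, i.e. only twice the radius. If you ever size symbols yourself, the radius should therefore grow with the square root of the value. A small sketch of that rule; `symbol_radius` and its `max_radius` parameter are hypothetical and chosen only for illustration:

```python
import math


def symbol_radius(value, max_value, max_radius=20.0):
    """Radius for a graduated symbol whose *area* is proportional to value."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    # area ~ value  =>  radius ~ sqrt(value)
    return max_radius * math.sqrt(value / max_value)


for cases in (0, 2500, 10000):
    print(cases, symbol_radius(cases, max_value=10000))  # radii 0.0, 10.0, 20.0
```

Scaling the radius linearly instead would make large values look far more dominant than they are, because the perceived size grows with the area.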

### Choropleth Map

In [23]:
test_data = {c:1 for c in merged_lookup_data.id}

In [24]:
alt.Chart(countries).mark_geoshape(
    stroke='white'
).encode(
    color='total_cases_per_million:Q',
    tooltip=['Country:N', 'total_cases_per_million:Q']  # Country is a nominal (categorical) field
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=merged_lookup_data, key='id', fields=['total_cases_per_million', 'Country'])
).project(
    type='mercator',
    scale=250,
    center=[20,55],
    clipExtent= [[0,0], [400, 300]]
)
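
`transform_lookup` works the other way round from the `merge` above: it starts from the TopoJSON features and, for each feature, looks up the row of `merged_lookup_data` whose `id` matches. Like a left join, it keeps features without a match, and by default their looked-up fields are set to null. A plain-Python sketch of that behaviour; `lookup` is a hypothetical helper that assumes unique keys, the id 999 is a made-up feature without data, and the case value is invented:

```python
def lookup(features, table, key, fields, default=None):
    """Attach `fields` from `table` to every feature whose id equals the row's `key`.

    A left join: unmatched features are kept and receive `default`.
    Assumes the `key` values in `table` are unique.
    """
    index = {row[key]: row for row in table}
    enriched = []
    for feature in features:
        row = index.get(feature["id"])
        extra = {f: (row[f] if row is not None else default) for f in fields}
        enriched.append({**feature, **extra})
    return enriched


features = [{"id": 276}, {"id": 999}]  # 999 is a hypothetical feature with no data
table = [{"id": 276, "Country": "Germany", "total_cases_per_million": 1.0}]
for f in lookup(features, table, "id", ["Country", "total_cases_per_million"]):
    print(f)
# {'id': 276, 'Country': 'Germany', 'total_cases_per_million': 1.0}
# {'id': 999, 'Country': None, 'total_cases_per_million': None}
```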

### Challenge visualization!

Try to visualize a region of your choice with one of the methods shown above! Tip: try different projections as well!
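
When choosing a projection, keep in mind that Mercator preserves angles but not areas: at latitude φ its linear scale factor is 1/cos φ, so areas are inflated by 1/cos²φ. A quick check of what that means around the latitudes of the European map above (`mercator_area_inflation` is a hypothetical helper):

```python
import math


def mercator_area_inflation(lat_deg):
    """Factor by which Mercator enlarges areas at a given latitude: 1 / cos^2(lat)."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


for lat in (0, 45, 60):
    print(lat, round(mercator_area_inflation(lat), 2))  # 1.0, 2.0, 4.0
```

A country at 60°N thus appears four times larger relative to its true area than one at the equator. Equal-area projections such as `equalEarth` or `azimuthalEqualArea`, passed as `type` to `.project()`, avoid this distortion and are usually the safer choice for choropleths.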

## Sources

Tutorials & Documentation
- [Specifying Geospatial Data in Altair — Altair 4.1.0 documentation](https://altair-viz.github.io/user_guide/data.html#geospatial-data)
- [GeoPandas](https://geopandas.org)
- [OSMPythonTools](https://github.com/mocnik-science/osm-python-tools)
- [GeoPy](https://geopy.readthedocs.io/)

Additionally, I recommend looking at the procedure described in the [original tutorial](https://infovis.fh-potsdam.de/tutorials/infovis8geovis.html) by the FH Potsdam, which takes a more custom approach to visualizing geospatial data in Altair.