In [None]:
import pandas as pd
import geopandas as gpd

# `geopandas == pandas + geometry`

In [None]:
ages = pd.read_csv("data/welly-ages-final.csv")
ages

In [None]:
sa1_geoms = gpd.read_file("data/sa1-wellington.gpkg")
sa1_geoms

We can merge these datasets based on the shared SA1 codes, although we have to specify the attribute names since they don't match:

In [None]:
welly_ages = sa1_geoms.merge(
    ages, left_on = "SA12023_V1_00", right_on = "sa1_code")
welly_ages

It turns out that although the codes look the same, the left-hand table (the geometries) stores them as text, while the right-hand table (the ages data) stores them as integers (this happens fairly often...), so the codes cannot be matched directly. We can change the type of one or the other to make the matching work. Or, as below, we can add a new column with the right type and a matching name:

In [None]:
ages["SA12023_V1_00"] = ages.sa1_code.astype(str)
welly_ages = sa1_geoms.merge(ages)
welly_ages
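A quick way to spot this kind of mismatch is to compare the `dtype` of the two key columns before merging, and to check afterwards that every row found a partner. The following is a minimal sketch on made-up toy tables (the codes and counts are hypothetical, not taken from the Wellington data); it uses only `pandas`, since the logic is the same for a `GeoDataFrame`.

```python
import pandas as pd

# Hypothetical toy tables standing in for the geometries and the ages data
left = pd.DataFrame({"SA12023_V1_00": ["7000001", "7000002"], "name": ["a", "b"]})
right = pd.DataFrame({"sa1_code": [7000001, 7000002], "age_25_29": [12, 30]})

# Compare the types of the two key columns: one is text, the other integer
print(left["SA12023_V1_00"].dtype, right["sa1_code"].dtype)

# Add a text version of the integer codes under the matching column name
right["SA12023_V1_00"] = right["sa1_code"].astype(str)

# indicator=True adds a _merge column recording where each row came from
merged = left.merge(right, on="SA12023_V1_00", how="left", indicator=True)
unmatched = int((merged["_merge"] != "both").sum())
print(f"{unmatched} unmatched rows")
```

Using `how="left"` with `indicator=True` is one option among several; it keeps every geometry row and makes any codes that failed to match easy to find.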

And we can make a map!

In [None]:
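# Note: k (the number of classes) is only used when a classification
# scheme is also supplied; without one the colours are unclassed
ax = welly_ages.plot(
    column = "age_25_29", cmap = "Reds", k = 9,
    ec = "k", lw = 0.1, figsize = (10, 10))
ax.set_axis_off()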

## Before we all get too excited
Some background on `geopandas`. 

In essence, `geopandas` simply adds two classes of object to `pandas`: `GeoSeries` and `GeoDataFrame`. A `GeoSeries` is a `pandas` `Series` that contains geometries and also knows what coordinate reference system they are in. A `GeoDataFrame` is a `pandas` `DataFrame` that can contain one (or more) columns that are `GeoSeries`. Usually the geometry column will be called `geometry` or `geom`.
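To make the relationship concrete, here is a minimal sketch that builds a tiny `GeoDataFrame` by hand and inspects its columns. The two points, the `name` column, and the choice of EPSG:2193 are all made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical two-row dataset: one ordinary column plus a geometry column
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(0, 0), Point(1, 1)],
    crs="EPSG:2193",
)

# The geometry column is a GeoSeries; other columns are plain pandas Series
geom_type = type(gdf.geometry).__name__
name_type = type(gdf["name"]).__name__
print(geom_type, name_type)

# The GeoSeries carries the coordinate reference system with it
print(gdf.geometry.name, gdf.geometry.crs.to_epsg())
```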

Let's take a look at the `GeoSeries` in this dataset.

In [None]:
welly_ages.geometry

OK... that's not hugely informative. What about a single (multi)polygon?

In [None]:
welly_ages.geometry[0]

This is the `shapely` module's slightly silly way of showing us a polygon (or any other geometry for that matter). `shapely` is the underlying package on which `geopandas`'s handling of geometry is based. To get a better idea of what's going on we can `print` a geometry.

In [None]:
print(welly_ages.geometry[0])

If we want to look closer still we can use the [`shapely` API](https://shapely.readthedocs.io/) to interrogate a geometry further. For example:

In [None]:
# coordinates of the outer ring of the first polygon in the multipolygon
list(welly_ages.geometry[0].geoms[0].exterior.coords)

or

In [None]:
welly_ages.geometry[0].area

or even

In [None]:
welly_ages.geometry[0].buffer(100)

But delving deeply into the details of how geometries are handled in `geopandas` is beyond the scope of these sessions. Suffice it to say you can dig into the details of individual geometries, pick them apart, and rebuild them if needed (and if you know what you are doing).
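As a small illustration of picking a geometry apart and rebuilding it, the sketch below uses `shapely` directly on a made-up square rather than on the Wellington data. The square and the 5-unit shift are arbitrary choices for the example.

```python
from shapely.geometry import MultiPolygon, Polygon

# Hypothetical 10 x 10 square wrapped in a multipolygon
square = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
multi = MultiPolygon([square])

# Pick it apart: first polygon, then the coordinates of its outer ring
first_part = multi.geoms[0]
coords = list(first_part.exterior.coords)

# Rebuild a new polygon from modified coordinates (shifted 5 units in x)
shifted = Polygon([(x + 5, y) for x, y in coords])

# Derived geometries and measurements work on single shapes too
buffered = square.buffer(1)
print(multi.wkt)
print(square.area, shifted.bounds, buffered.area > square.area)
```

Note that the exterior ring is closed, so the first coordinate is repeated at the end of `coords`.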

You are much more likely to apply geometric operations to whole collections of geometries held in a `GeoDataFrame`. In that context, the handling of coordinate reference systems is perhaps of more interest.

In [None]:
welly_ages.crs

In [None]:
welly_ages.to_crs(3857).crs

Projecting data into a new coordinate reference system really is that simple!
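One detail worth knowing is that `to_crs` returns a new object by default and leaves the original untouched, so the result needs to be assigned to a name if you want to keep it. A minimal sketch, assuming a made-up point with approximate longitude/latitude coordinates near Wellington:

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical single-point dataset in longitude/latitude (EPSG:4326)
pts = gpd.GeoDataFrame(
    {"name": ["somewhere in Wellington"]},
    geometry=[Point(174.78, -41.29)],
    crs="EPSG:4326",
)

# to_crs returns a new GeoDataFrame; pts itself is left unchanged
pts_nztm = pts.to_crs(2193)  # NZGD2000 / New Zealand Transverse Mercator 2000

print(pts.crs.to_epsg(), pts_nztm.crs.to_epsg())
print(pts.geometry[0], pts_nztm.geometry[0])
```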

In the next notebook, we'll make some maps.