# Data Example: arealstatistik (raster), and working with Canton shapefiles (vector)

This Notebook has two goals:

1. To introduce you to the Federal dataset 'arealstatistik'
2. <a href="#shapefiles">To show how you can clip raster files with shapefiles, here using the Swiss Cantons.</a>

---

In [None]:
# Automatically reload modules before executing code
%load_ext autoreload
%autoreload 2

# define modules locations (you might have to adapt define_mod_locs.py)
%run ../sdc-notebooks/Tools/define_mod_locs.py

# Load the datacube (rioxarray must be imported before datacube so that clipping works later on)
import rioxarray as rio
import datacube
dc = datacube.Datacube()

import matplotlib.pyplot as plt

## Getting to know arealstatistik

See also the PDF about this dataset available on Moodle.

In [None]:
# config_cell

product = ['arealstatistik']

# Here, the measurements are not individual colour bands, 
# but instead are the different surveys with the desired number of classes.
measurements = ['AS85_27','AS18_27', 'AS18_4']

# At 100 m resolution (see below), it is feasible to load the whole of Switzerland at once,
# or you can specify lat/lon as usual.
longitude =  (7.05, 7.2) 
latitude =  (46.7, 46.85) 
crs = 'epsg:4326'

# time is not a valid dimension for the arealstatistik datasets - time is denoted only through the measurement name.

output_crs = 'epsg:2056'
# Let's look at the data at their native resolution of 100x100 m
resolution = -100.0, 100.0
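
The measurement names in the cell above carry both the survey and the number of classes. A minimal sketch of splitting them apart, assuming every name follows the pattern seen in the three names used here (survey tag, underscore, class count). `parse_measurement` is a hypothetical helper written for this illustration and is not part of the datacube API.

```python
def parse_measurement(name):
    """Split a name such as 'AS85_27' into (survey tag, number of classes)."""
    survey, n_classes = name.split('_')
    return survey, int(n_classes)

measurements = ['AS85_27', 'AS18_27', 'AS18_4']
parsed = [parse_measurement(m) for m in measurements]

for survey, n_classes in parsed:
    print(f'{survey}: {n_classes} classes')
```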

In [None]:
# For this example I have removed the `time` dimension (as it isn't valid), 
# and the `longitude` and `latitude` keywords (to return all of Switzerland)
ds_in = dc.load(product = product,
                measurements = measurements,
                output_crs = output_crs, 
                resolution = resolution)

In [None]:
ds_in

In [None]:
# With this horrible colour scheme we can quickly take a look at land cover in the 1985 period over 27 classes.
ds_in.AS85_27.plot(cmap='nipy_spectral', size=10)
plt.gca().set_aspect('equal')

In [None]:
# We can look at a single category like this:
ds_in.AS18_27.where(ds_in.AS18_27 == 27).plot(cmap='nipy_spectral', size=10, vmin=0, vmax=27)
plt.gca().set_aspect('equal')
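
`where` keeps the pixels for which the condition is true and, by default, replaces all others with NaN, which is why only one class is drawn. A small sketch of the same idea on a made-up 3x3 grid, using plain NumPy instead of the datacube (the class values are invented for illustration):

```python
import numpy as np

# Invented land cover classes on a tiny grid
classes = np.array([[27, 3, 27],
                    [5, 27, 1],
                    [2, 2, 27]], dtype=float)

# Keep class 27 and set every other pixel to NaN
only_27 = np.where(classes == 27, classes, np.nan)

# Count the pixels that survived the mask
n_kept = int(np.count_nonzero(~np.isnan(only_27)))
print(only_27)
print(n_kept)
```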

In [None]:
# https://stackoverflow.com/questions/9707676/defining-a-discrete-colormap-for-imshow-in-matplotlib
from matplotlib import colors
cmap = colors.ListedColormap(['black', '#eff7e4', '#c3e3ae', '#d3f0fd'])
# Five boundaries define four bins, one per class value (1 to 4)
bounds = [1, 2, 3, 4, 5]
norm = colors.BoundaryNorm(bounds, cmap.N)

ds_in.AS18_4.plot(cmap=cmap, norm=norm, size=10)
plt.gca().set_aspect('equal')
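
To see what the norm in the cell above does, the sketch below passes the four class values through the same `BoundaryNorm` and prints the colour index each one receives. It only needs Matplotlib and NumPy, not the datacube.

```python
import numpy as np
from matplotlib import colors

cmap = colors.ListedColormap(['black', '#eff7e4', '#c3e3ae', '#d3f0fd'])
norm = colors.BoundaryNorm([1, 2, 3, 4, 5], cmap.N)

# Each class value falls into one bin: [1,2), [2,3), [3,4), [4,5)
colour_index = [int(i) for i in norm(np.array([1, 2, 3, 4]))]
print(colour_index)
```

Each index selects one entry of the colour list, so class 1 is drawn in the first colour and class 4 in the last.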

<a name="shapefiles" > </a>

## Looking at canton Fribourg in the arealstatistik dataset

To do this we need some extra information: the canton boundaries. Your sgg00425 directory should contain a folder `swissBOUNDARIES3D`, which holds shapefiles of the different administrative levels of Switzerland. We downloaded these for you from the Federal Office of Topography (https://www.swisstopo.admin.ch/en/geodata/landscape/boundaries3d.html). If your folder has a different name, adapt the path in the `read_file` cell below.

In [None]:
# To work with the cantons data we need two additional modules
# We also need to be sure that rioxarray has been loaded before the datacube was opened...
# ...in the first cell of the notebook, make sure that "import rioxarray as rio" appears before "import datacube"
import geopandas as gpd
import shapely

In [None]:
# Open the Cantons shapefile
cantons = gpd.read_file('swissbounds/swissBOUNDARIES3D_1_4_TLM_KANTONSGEBIET.shp')

In [None]:
# Take a look at what data are provided with the file
cantons.head()

In [None]:
# Let's take a look at canton Fribourg
fribourg = cantons[cantons.NAME == 'Fribourg']

In [None]:
# How many rows do you expect to see here?
fribourg

In [None]:
# The cantons come from the 'swissBOUNDARIES3D' dataset. 
# As this name suggests, they contain not only X,Y data but also Z (elevation) information.
# The DataCube cannot understand the Z information, so we need to use this function here to remove it.
# Don't worry about the warning which appears!
fribourg.geometry = shapely.force_2d(fribourg.geometry)

In [None]:
fribourg.geometry
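
A minimal sketch of what `shapely.force_2d` does, using a made-up triangle rather than the canton geometry. It assumes Shapely 2.0 or later, where `force_2d` is available as a top-level function.

```python
import shapely
from shapely.geometry import Polygon

# A made-up triangle with a Z value (elevation) on every vertex
triangle_3d = Polygon([(0, 0, 500), (1, 0, 600), (0, 1, 700), (0, 0, 500)])

# Drop the Z coordinate; X and Y are left unchanged
triangle_2d = shapely.force_2d(triangle_3d)

print(triangle_3d.has_z, triangle_2d.has_z)
print(triangle_2d.area)
```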

In [None]:
# Let's plot just the canton of Fribourg
# Further information on this operation is here: https://corteva.github.io/rioxarray/stable/examples/clip_geom.html
ds_in.AS18_4.rio.clip(fribourg.geometry).plot()

In [None]:
# We can save just the canton's results for further analysis (for all measurements that we loaded)
stats_fribourg = ds_in.rio.clip(fribourg.geometry)

In [None]:
# How about a histogram to briefly summarise land cover in the period ending 2018?
# Leave out the `0` category, as these are the masked areas outside canton Fribourg
stats_fribourg.AS18_4.plot.hist(range=(1,4))

In [None]:
# If we want to take a more detailed look using Pandas, 
# then we can first use groupby to count the pixels in each category
# and then finally we save it to a Pandas Series.
stats_pd = stats_fribourg.AS18_4.groupby(stats_fribourg.AS18_4).count().to_pandas()
stats_pd
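
The groupby-and-count step above is equivalent to counting how often each class value occurs. A sketch of the same counting with plain NumPy on an invented clipped raster (all values are made up for illustration):

```python
import numpy as np

# Invented clipped raster: 0 marks pixels outside the canton
clipped = np.array([[0, 0, 2, 2],
                    [0, 1, 2, 3],
                    [0, 2, 3, 4]])

# Unique class values and the number of pixels in each
values, counts = np.unique(clipped, return_counts=True)
pixel_counts = {int(v): int(c) for v, c in zip(values, counts)}
print(pixel_counts)
```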

In [None]:
# Let's convert these to percentages.
# We need to drop the 0-class, which holds only the masked areas outside canton Fribourg.
stats_pd = stats_pd.loc[1:4]
percentages = 100 / stats_pd.sum() * stats_pd

# What we should find is that Fribourg is 55% agricultural land.
percentages
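
The sketch below repeats the percentage calculation on invented pixel counts, so that the arithmetic can be checked by hand. It also converts counts to areas, assuming the 100 m x 100 m resolution requested in the config cell, at which each pixel covers one hectare. The numbers are made up and are not the Fribourg results.

```python
import pandas as pd

# Invented pixel counts per class; class 0 is the mask outside the canton
counts = pd.Series({0: 400, 1: 120, 2: 480, 3: 300, 4: 100})

# .loc slices by label and includes both end points, so this keeps classes 1 to 4
inside = counts.loc[1:4]
percentages = 100 * inside / inside.sum()

# One pixel is 1 hectare at 100 m resolution, and 100 hectares make 1 km2
area_km2 = inside / 100

print(percentages)
print(area_km2)
```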