# Exercise 1.2: Visualize uncertainty
prepared by M.Hauser

In this exercise we will review the material from the first exercise and get to know more plotting functions:

 * `errorbar` to visualize error bars
 * `fill_between` to add uncertainty bands
 * `axhline` and `axvline` to add horizontal and vertical lines that span the whole axes
 * `axhspan` and `axvspan` to add horizontal and vertical patches that span the whole axes

As example data, we will use global mean temperature from all CMIP5 models (Taylor et al., 2012). The data was prepared in another [notebook](./../data/prepare_CMIP5_tas_time_series.ipynb).

We will develop a plot showing the time evolution and model uncertainty of global mean temperature from 1870 to 2100 using CMIP5 data.

## Imports

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

## Load & process data

In [None]:
# load data

file = "../data/cmip5_tas_rcp85_ts.nc"

ds = xr.open_dataset(file)

# read data as numpy array
year = ds.year.values
tas = ds.tas.values

> We read a netCDF file using xarray and then extract the data as numpy arrays. Usually, you would keep the data as xarray objects.

In [None]:
# print some info

print("shape of time axis:", year.shape)
print("shape of temperature data:", tas.shape)

print("")
print("Excerpt of time:", year[:5])
print("Excerpt of temperature data:", tas[:3, 0])

In [None]:
# calculate the anomaly with respect to 1971..2000

# select all years in this range
sel = (year >= 1971) & (year <= 2000)

# calculate the climatology for each model
clim = tas[:, sel].mean(axis=1)

# calculate the anomaly

# we need to add an axis such that the broadcasting works
tas_anom = tas - clim[:, np.newaxis]

### Explanation of the last line

`tas` has shape (40, 231) and `clim` has shape (40,). To calculate `tas - clim`, the variable `clim` needs to have shape (40, 1); it is then automatically broadcast to the shape (40, 231). This can be achieved with `clim[:, np.newaxis]`. For more details see the [numpy broadcasting rules](https://numpy.org/doc/stable/user/basics.broadcasting.html).
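The broadcasting step can be checked on a small synthetic array. This is only a sketch: the array `demo_tas` and its shape (4, 6) are made up for illustration and stand in for the real (40, 231) data.

```python
import numpy as np

# synthetic stand-in for the model data: 4 hypothetical "models" x 6 "years"
rng = np.random.default_rng(0)
demo_tas = rng.normal(size=(4, 6))

# one climatology value per model -> shape (4,)
demo_clim = demo_tas.mean(axis=1)

# (4, 6) - (4, 1) broadcasts to (4, 6)
demo_anom = demo_tas - demo_clim[:, np.newaxis]

# keepdims=True keeps the reduced axis, so no np.newaxis is needed
demo_anom_keepdims = demo_tas - demo_tas.mean(axis=1, keepdims=True)

print(demo_clim.shape, demo_clim[:, np.newaxis].shape, demo_anom.shape)
```

Without the extra axis, numpy would try to align the shape (4,) with the last axis of length 6 and raise an error.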

> In xarray the equivalent operation would be:

In [None]:
da_tas = ds.tas - ds.tas.sel(time=slice("1971", "2000")).mean("time")

# check the two ways give the same result
np.testing.assert_allclose(da_tas.values, tas_anom)

## Exercise 

 * Plot the multi model mean temperature anomaly (Hint: `mmm = tas_anom.mean(axis=0)`)
 * Make the line thicker (Hint: `linewidth` or `lw`).
 * Add x- and y- labels.
 * Add a title. Set the `fontsize` to 14
 * Add a horizontal line at 0. (Hint: `ax.plot(ax.get_xlim(), [0, 0], color='0.1')`)
 * Realize that this is not very helpful.
 * Use `ax.axhline` instead. Set the linewidth to 0.5, and the color to a dark gray (e.g. `"0.1"`).
 * Use `ax.axvspan` to shade the years of the climatology (1971 to 2000).

In [None]:
f, ax = plt.subplots()

### Solution

In [None]:
f, ax = plt.subplots()

mmm = tas_anom.mean(axis=0)

h = ax.plot(year, mmm, lw=2)

ax.set_ylabel("T anomaly (°C)")
ax.set_xlabel("Time")

ax.set_title("Global mean temperature", fontsize=14)

ax.axhline(0, color="0.1", lw=0.5)

ax.axvspan(1971, 2000, color="0.75")
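Why the `ax.plot(ax.get_xlim(), ...)` approach is not very helpful can be seen in a small sketch with made-up data (the names `fig_demo`, `ax_demo`, `manual_line` and `span_line` are only for illustration). The manually drawn line is ordinary data, so the axes rescale around it, while `axhline` specifies its x-extent as a fraction of the axes.

```python
import matplotlib.pyplot as plt

fig_demo, ax_demo = plt.subplots()
ax_demo.plot([1870, 2100], [-1, 4])

# a line in data coordinates from the current left to the current right limit
xlim_before = ax_demo.get_xlim()
(manual_line,) = ax_demo.plot(xlim_before, [0, 0], color="0.1")

# the new line counts as data: the limits grow and the line no longer spans the axes
xlim_after = ax_demo.get_xlim()

# axhline stores its x-extent in axes coordinates (0 = left edge, 1 = right edge)
span_line = ax_demo.axhline(0, color="0.1", lw=0.5)

print("limits before:", xlim_before)
print("limits after: ", xlim_after)
print("axhline x-data:", span_line.get_xdata())
```

Because the x-extent of `axhline` is relative to the axes, the line still spans the full width after zooming or changing the limits.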

## Exercise
 
 * Continue with the previous plot (see below).
 * Add each model as an individual line.
 * There are way too many models to differentiate between them, so paint them all in a light blue (e.g. use `"#a6bddb"` as color).

In [None]:
f, ax = plt.subplots()

mmm = tas_anom.mean(axis=0)

# this loops through each row in the array
for y in tas_anom:
    # plot here
    pass

h = ax.plot(year, mmm, lw=2)

ax.set_ylabel("T anomaly (°C)")
ax.set_xlabel("Time")

ax.set_title("Global mean temperature", fontsize=14)

ax.axhline(0, color="0.1", lw=0.5)

ax.axvspan(1971, 2000, color="0.75")

### Solution

In [None]:
f, ax = plt.subplots()

for y in tas_anom:
    ax.plot(year, y, color="#a6bddb")


mmm = tas_anom.mean(axis=0)

h = ax.plot(year, mmm, lw=2)

ax.set_ylabel("T anomaly (°C)")
ax.set_xlabel("Time")

ax.set_title("Global mean temperature", fontsize=14)

ax.axhline(0, color="0.1", lw=0.5)

ax.axvspan(1971, 2000, color="0.85")
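As an alternative to the loop, `ax.plot` also accepts a 2D array for `y` and draws one line per column. The sketch below uses synthetic data; `demo_year` and `demo_models` are hypothetical stand-ins for `year` and `tas_anom`, and the array has to be transposed so that each model becomes a column.

```python
import matplotlib.pyplot as plt
import numpy as np

# synthetic stand-in: 5 hypothetical models x 10 years
rng = np.random.default_rng(0)
demo_year = np.arange(2000, 2010)
demo_models = rng.normal(size=(5, demo_year.size))

fig_demo, ax_demo = plt.subplots()

# transpose to (years, models): every column is drawn as its own line
lines = ax_demo.plot(demo_year, demo_models.T, color="#a6bddb")

# multi model mean on top
ax_demo.plot(demo_year, demo_models.mean(axis=0), lw=2)

print("number of model lines:", len(lines))
```

The loop is easier to adapt if each model needs its own style; the 2D call is shorter when all lines look the same.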

## Errorbar

Plotting the uncertainty of data may be just as important as plotting the data itself. A basic error bar plot can be created using `plt.errorbar`.

### Create some data including uncertainty

In [None]:
x = np.arange(0, 2 * np.pi, 0.25)

y = np.sin(x) + np.random.randn(*x.shape) * 0.25

y_err = np.random.uniform(0.25, 0.75, x.shape)

In [None]:
plt.errorbar(x, y, yerr=y_err, linestyle="", marker="o")

There are various ways to format the error bars:

In [None]:
plt.errorbar(
    x,
    y,
    yerr=y_err,
    marker=".",
    linestyle="",
    color="black",
    ecolor="0.75",
    elinewidth=3,
    capsize=5,
);
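Error bars do not have to be symmetric. As a sketch with made-up numbers (`err_low` and `err_high` are hypothetical), `yerr` can also be given as two rows, where the first row is the distance below each point and the second row the distance above it.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x_demo = np.arange(0, 2 * np.pi, 0.25)
y_demo = np.sin(x_demo)

# hypothetical lower and upper errors (both given as positive distances)
err_low = rng.uniform(0.1, 0.3, x_demo.shape)
err_high = rng.uniform(0.3, 0.6, x_demo.shape)
yerr_demo = np.array([err_low, err_high])  # shape (2, N)

fig_demo, ax_demo = plt.subplots()
container = ax_demo.errorbar(
    x_demo,
    y_demo,
    yerr=yerr_demo,
    marker=".",
    linestyle="",
    color="black",
    ecolor="0.75",
    capsize=3,
)
```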

## Exercise

 * Let's replace the individual models with error bars indicating the standard deviation.
 * Replace the `ax.plot` command with `ax.errorbar`.
 * Use a slightly lighter blue (`"#74a9cf"`) for the color of the error bars.
 * The plot now has too many error bars. Read the docstring of errorbar (`ax.errorbar?`) to find out how to plot only every 5th error bar.


In [None]:
f, ax = plt.subplots()

mmm = tas_anom.mean(axis=0)

# calculate std
# std =

# replace plot
h = ax.plot(year, mmm, lw=2)

ax.set_ylabel("T anomaly (°C)")
ax.set_xlabel("Time")

ax.set_title("Global mean temperature", fontsize=14)

ax.axhline(0, color="0.1", lw=0.5)

ax.axvspan(1971, 2000, color="0.85")

### Solution

In [None]:
f, ax = plt.subplots()

mmm = tas_anom.mean(axis=0)

# calculate std
std = tas_anom.std(axis=0)

# plot errorbar
ax.errorbar(year, mmm, lw=2, yerr=std, errorevery=5, elinewidth=1, ecolor="#74a9cf")

ax.set_ylabel("T anomaly (°C)")
ax.set_xlabel("Time")

ax.set_title("Global mean temperature", fontsize=14)

ax.axhline(0, color="0.1", lw=0.5)

ax.axvspan(1971, 2000, color="0.85")

## Continuous Errors

For continuous errors the `errorbar` function is not very convenient, but we can use `fill_between`. This function takes `x`, `y1`, and `y2` as input and shades the region between `y1` and `y2`.


In [None]:
x = np.arange(0, 2 * np.pi, 0.1)

f, ax = plt.subplots()

ax.fill_between(x, np.sin(x), np.cos(x), color="0.75")
ax.plot(x, (np.sin(x) + np.cos(x)) / 2)

## Exercise

 * Let's replace the individual models with a shaded region indicating the standard deviation.
 * Use `ax.fill_between`.
 * The box indicating the reference period is drawn on top of the shaded standard deviation band. Use the `zorder` keyword in `axvspan` to correct this.
 


In [None]:
f, ax = plt.subplots()

mmm = tas_anom.mean(axis=0)

# calculate std
std = tas_anom.std(axis=0)

# plot here


h = ax.plot(year, mmm, lw=2)

ax.set_ylabel("T anomaly (°C)")
ax.set_xlabel("Time")

ax.set_title("Global mean temperature", fontsize=14)

ax.axhline(0, color="0.1", lw=0.5)

ax.axvspan(1971, 2000, color="0.85")

### Solution

In [None]:
f, ax = plt.subplots(1, 1)

mmm = tas_anom.mean(axis=0)
std = tas_anom.std(axis=0)

ax.fill_between(year, mmm - std, mmm + std, color="#a6bddb")

h = ax.plot(year, mmm, lw=2)

ax.set_ylabel("T anomaly (°C)")
ax.set_xlabel("Time")

ax.set_title("Global mean temperature", fontsize=14)

ax.axhline(0, color="0.1", lw=0.5)

ax.axvspan(1971, 2000, color="0.85", zorder=0)
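The effect of `zorder` can be inspected directly on the artists. The following sketch uses a synthetic sine curve instead of the model data (all names ending in `_demo`, as well as `band`, `line` and `span`, are only illustrative). Artists with a lower `zorder` are drawn first; artists with the same `zorder` are drawn in the order in which they were added, which is why the box ends up on top of the band when `zorder` is not set.

```python
import matplotlib.pyplot as plt
import numpy as np

x_demo = np.linspace(0, 2 * np.pi, 50)
y_demo = np.sin(x_demo)

fig_demo, ax_demo = plt.subplots()

band = ax_demo.fill_between(x_demo, y_demo - 0.3, y_demo + 0.3, color="#a6bddb")
(line,) = ax_demo.plot(x_demo, y_demo, lw=2)

# zorder=0 moves the box below the band and the line
span = ax_demo.axvspan(2, 4, color="0.85", zorder=0)

for name, artist in [("band", band), ("line", line), ("span", span)]:
    print(name, artist.get_zorder())
```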