<a href="https://colab.research.google.com/github/pySTEPS/ERAD-nowcasting-course-2022/blob/hands-on-users/hands-on-session-users/notebooks/block_02_input_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise 1: Read, transform and visualize input data

In this example, we start with a couple of radar and nowcasting basics to get you familiar with the data and the available tools:


*   Installation of the pysteps nowcasting tool and required Python packages
*   Ways to load typical radar data from meteorological offices using pysteps
*   Radar rainfall data visualization tools
*   Pre-processing steps that are needed for nowcasting with radar data.


Let's first run the helper notebook to install pysteps and configure it. This step must be repeated for every exercise (unless you copy the exercise into the same notebook), because Colab notebooks are independent of each other and the state of one notebook cannot be saved and reused in another. Simply click on the run button in the "[  ]" area of the code block below. Note that installing pysteps and all required packages will take a few minutes.

In [None]:
from google.colab import drive
import os
# mount the Google Drive folder
# don't attempt to remount if the drive is already mounted
if not os.path.exists("/content/mnt/MyDrive"):
  drive.mount("mnt")
%cd '/content/mnt/MyDrive/Colab Notebooks'
# run the helper notebook to configure the environment
%run helper_setup_pip.ipynb

## Load the example dataset

Now that we have initialized the notebook, let's import the example KNMI dataset using the [load_dataset()](https://pysteps.readthedocs.io/en/latest/generated/pysteps.datasets.load_dataset.html) helper function from the `pysteps.datasets` module. The dataset contains radar-derived rain rates from the Netherlands for the 26th of August 2010. This was a day with record rainfall in the east of the Netherlands, which locally led to floods. The time series contains 14 elements (i.e. 1 hour and 10 minutes). This data is already available in the pysteps GitHub repository, but if you have other radar data locally on your machine, you can import it with the same tool. With pysteps, you can directly import data from a variety of meteorological offices and products: KNMI, RMI Belgium, MeteoSwiss, OPERA (European composite), MRMS (Continental United States composite), FMI in Finland and the Australian BoM. Other importers have to be added manually, but since almost all meteorological offices use HDF5 formats for their radar data, this is generally not difficult when using the existing importers as examples.

In [None]:
from pysteps.datasets import load_dataset
precip, metadata, timestep = load_dataset('knmi')

The `load_dataset()` function returns the following values:

* `precip`: a numpy array with (time, y, x) dimensions
* `metadata`: a dictionary with additional information, see below
* `timestep`: separation between each sample in the time series (minutes)
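
To see how the number of elements and the timestep relate to the covered period, here is a small sketch that does not need pysteps. The 5-minute timestep and the start time are assumptions chosen for illustration; they are not read from the dataset. It also assumes that each field represents an accumulation over one timestep.

```python
from datetime import datetime, timedelta

# Hypothetical values mimicking a 14-element series sampled every 5 minutes
# (assumed for illustration, not read from pysteps).
timestep = 5  # minutes
start = datetime(2010, 8, 26, 0, 0)  # illustrative start time only
timestamps = [start + timedelta(minutes=timestep * i) for i in range(14)]

# If each field is an accumulation over one timestep, the series covers
# len(timestamps) * timestep minutes of rainfall in total.
covered_minutes = len(timestamps) * timestep
print(f"{len(timestamps)} fields cover {covered_minutes} minutes")

# The spacing between the first and last timestamp is one timestep shorter.
span = timestamps[-1] - timestamps[0]
print(f"First-to-last span: {span}")
```

With the real dataset you can do the same check using `metadata["timestamps"]` and the returned `timestep`.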



Then we can print the metadata using [pprint](https://docs.python.org/3/library/pprint.html).

In [None]:
from pprint import pprint
pprint(metadata)

This should have printed the following key-value pairs:

*   `accutime`: accumulation time (minutes) for computing the quantity contained in the data
*   `cartesian_unit`: the distance unit of the geographical coordinates
*   `institution`: institution providing the data
*   `product`: name of the product
*   `projection`: PROJ-compatible projection definition
*   `threshold`: the minimum observed value
*   `timestamps`: list of timestamps, one for each element in the returned data array
*   `transform`: applied transformation to the data values (if any)
*   `unit`: the unit of the data
*   `x1`: x-coordinate of the lower-left corner of the domain in geographical coordinates
*   `x2`: x-coordinate of the upper-right corner of the domain in geographical coordinates
*   `xpixelsize`: pixel size in x-direction (in units of `cartesian_unit`)
*   `y1`: y-coordinate of the lower-left corner of the domain in geographical coordinates
*   `y2`: y-coordinate of the upper-right corner of the domain in geographical coordinates
*   `yorigin`: 'upper' or 'lower' depending on whether the origin of the coordinate system is in the upper-left or lower-left corner
*   `ypixelsize`: pixel size in y-direction (in units of `cartesian_unit`)
*   `zerovalue`: value corresponding to no precipitation
*   `zr_a`: the a-coefficient in the Z(R) relationship Z=a*R^b applied to the data (if representing rain rate)
*   `zr_b`: the b-coefficient in the Z(R) relationship Z=a*R^b applied to the data (if representing rain rate)
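
The geographical keys fit together in a simple way, which the following sketch illustrates. The dictionary `metadata_example` and the helper functions are hypothetical and the numbers are invented; only the key names are taken from the list above.

```python
# Hypothetical metadata for a small grid (values invented for illustration).
metadata_example = {
    "x1": 0.0, "x2": 100.0,
    "y1": 0.0, "y2": 50.0,
    "xpixelsize": 2.0, "ypixelsize": 2.0,
    "cartesian_unit": "km",
    "yorigin": "upper",
}


def grid_shape(meta):
    """Return (rows, columns) implied by the domain extent and pixel size."""
    n_cols = round((meta["x2"] - meta["x1"]) / meta["xpixelsize"])
    n_rows = round((meta["y2"] - meta["y1"]) / meta["ypixelsize"])
    return n_rows, n_cols


def pixel_center(meta, row, col):
    """Return the (x, y) coordinates of the center of pixel (row, col)."""
    x = meta["x1"] + (col + 0.5) * meta["xpixelsize"]
    if meta["yorigin"] == "upper":
        # Row 0 is the top row of the array, so y decreases with the row index.
        y = meta["y2"] - (row + 0.5) * meta["ypixelsize"]
    else:
        y = meta["y1"] + (row + 0.5) * meta["ypixelsize"]
    return x, y


print(grid_shape(metadata_example))
print(pixel_center(metadata_example, 0, 0))
```

For a real dataset, the shape computed this way should match `precip.shape[1:]`.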


## Plot the data

A good first step when working with radar data is to visualize it. We will use the [plot_precip_field](https://pysteps.readthedocs.io/en/stable/generated/pysteps.visualization.precipfields.plot_precip_field.html#pysteps.visualization.precipfields.plot_precip_field) function from the `pysteps.visualization.precipfields` module to plot the data. Here we plot the last element of the time series and take its timestamp from the metadata. The plot uses the standard pysteps colormap, which is one of the colormaps implemented in pysteps. We also draw the longitude-latitude lines by supplying the `drawlonlatlines` option in `map_kwargs`. Note that in addition to the no-precipitation values (light color), there is a gray region containing NaN values (i.e. pixels outside the radar domain or without a valid measurement).

**Exercise**

Use the information in [plot_precip_field](https://pysteps.readthedocs.io/en/stable/generated/pysteps.visualization.precipfields.plot_precip_field.html#pysteps.visualization.precipfields.plot_precip_field) to plot the data using a different color map. Try some and pick the one you like. In addition, add a title to your plot.
As you may have noticed, we generally plot (radar) rainfall data on a logarithmic axis. Do you have an idea why we do that?

In [None]:
from pysteps.visualization import plot_precip_field
from matplotlib import pyplot as plt

# Disable warnings
import warnings
warnings.filterwarnings("ignore")

plt.figure(figsize=(18, 5))
# set the title to the timestamp of the last precipitation field
plt.suptitle(metadata["timestamps"][-1])

map_kwargs = {"drawlonlatlines": True}

# plot the last precipitation field
plt.subplot(111)
plot_precip_field(
    precip[-1],
    geodata=metadata,
    colorscale="pysteps",
    map_kwargs=map_kwargs
) 

## Additional datasets and data processing

Next we load the FMI dataset that we will use in the following exercises. Again, this time series contains 14 elements (i.e. 1 hour and 10 minutes). To save computation time (and waiting time for you), we aggregate the data by a factor of two, so that the grid spacing becomes 2 km instead of the original 1 km. This is done with [utils.dimension.aggregate_fields_space](https://pysteps.readthedocs.io/en/stable/generated/pysteps.utils.dimension.aggregate_fields_space.html#pysteps.utils.dimension.aggregate_fields_space). Note that the metadata is also updated so that the spatial extent of the composite does not change, only its spatial resolution.
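
To get a feeling for what aggregating to a coarser grid means, here is a minimal sketch with plain NumPy. The helper `block_mean` is hypothetical and only averages non-overlapping blocks; the pysteps function may treat NaN values, aggregation methods and grid edges differently.

```python
import numpy as np


def block_mean(field, factor):
    """Average non-overlapping factor x factor blocks of a 2D field."""
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("field dimensions must be divisible by factor")
    # Split each axis into (number of blocks, block size) and average
    # over the two block-size axes.
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))


# A tiny synthetic 4 x 4 field stands in for one radar composite.
field = np.arange(16, dtype=float).reshape(4, 4)
coarse = block_mean(field, 2)
print(field.shape, "->", coarse.shape)
print(coarse)
```

Each value in `coarse` is the mean of four neighbouring pixels, so the number of pixels drops by a factor of four while the covered area stays the same.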

**Exercise**

Run the code block below and add the colorscale that you selected in the previous step.

In [None]:
from pysteps.datasets import load_dataset
from pysteps.utils.dimension import aggregate_fields_space

plt.figure(figsize=(7, 10))

precip, metadata, timestep = load_dataset('fmi')
print(f"Original shape: {precip.shape}")
# Set the aggregation window to twice the pixel size in the x- and y-directions
precip, metadata = aggregate_fields_space(
    precip,
    metadata,
    (2*metadata["xpixelsize"], 2*metadata["ypixelsize"])
)
print(f"Shape after aggregation: {precip.shape}")

plot_precip_field(
    precip[-1],
    geodata=metadata,
    title=metadata["timestamps"][-1],
    map_kwargs=map_kwargs
)

## Rainfall rate distribution

Run the code below and inspect the rainfall rate distribution of the FMI data that you plotted in the previous step.
What can you say about the distribution? Is it a normal distribution?

In [None]:
import numpy as np

# Use the last available composite and discard any invalid values
valid_precip_values = precip[-1][~np.isnan(precip[-1])]

bins = np.linspace(0.1, 18, 20)

plt.figure()
plt.hist(valid_precip_values, bins=bins, log=True, edgecolor='black')
plt.autoscale(tight=True, axis='x')
plt.xlabel("Precipitation rate [mm/h]")
plt.ylabel("Counts")
plt.show()

### Data transformations

The histogram shows that precipitation rate values have a non-Gaussian and asymmetric distribution that is bounded at zero. Also, the probability of occurrence decays extremely fast with increasing precipitation rate values (note the logarithmic y-axis). This can cause issues when estimating the motion field or applying the nowcasting methods.

For the above reason, we can convert the precipitation rate values (in mm/h) to a more symmetric distribution by applying the following logarithmic transformation:

\begin{equation}
R\rightarrow
\begin{cases}
    10\log_{10}R, & \text{if } R\geq 0.1\text{mm h$^{-1}$} \\
    -15,          & \text{otherwise}
\end{cases}
\end{equation}

The transformed values correspond to logarithmic precipitation rates in units of dBR. The value of -15 dBR is equivalent to assigning a precipitation rate of approximately 0.03 mm h$^{-1}$ to the zeros. This can be done by using the `dB_transform` function in the [transformation](https://pysteps.readthedocs.io/en/stable/pysteps_reference/utils.html#pysteps-utils-transformation) module of pysteps.
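
The piecewise formula above can be sketched in a few lines of plain Python. The helpers `to_dbr` and `from_dbr` are hypothetical names used only for illustration; they are not the pysteps implementation, which works on whole arrays and also updates the metadata.

```python
import math


def to_dbr(rain_rate, threshold=0.1, zerovalue=-15.0):
    """Apply the piecewise transformation above to one rain rate (mm/h)."""
    if math.isnan(rain_rate):
        return rain_rate  # keep missing values missing
    if rain_rate >= threshold:
        return 10.0 * math.log10(rain_rate)
    return zerovalue


def from_dbr(value_dbr):
    """Convert a dBR value back to a rain rate in mm/h."""
    return 10.0 ** (value_dbr / 10.0)


rates = [0.0, 0.05, 0.1, 1.0, 10.0]
print([to_dbr(r) for r in rates])

# Rain rate that corresponds to the fill value of -15 dBR
print(round(from_dbr(-15.0), 4))
```

Rates below the threshold all end up at the fill value, while each factor of ten in rain rate adds 10 dBR.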

Run the code below to apply a standard dBR transformation to the FMI data.

In [None]:
from pysteps.utils import transformation

# Log-transform the data to dBR with a threshold of 0.1 mm/h and a fill value
# of -15 dBR
precip_dbr, metadata_dbr = transformation.dB_transform(
    precip,
    metadata,
    threshold=0.1,
    zerovalue=-15.0,
)

Let's again plot the distribution of the data after the transformation. What does the distribution look like now?

In [None]:
valid_precip_values = precip_dbr[-1][~np.isnan(precip_dbr[-1])]

bins = np.linspace(-10, 10, 25)

plt.figure()
plt.hist(valid_precip_values, bins=bins, edgecolor='black')
plt.autoscale(tight=True, axis='x')
plt.xlabel("Transformed precipitation rate [dBR]")
plt.ylabel("Counts")
plt.show()

In principle, the above should resemble a normal distribution. However, the left side of the distribution is closer to uniform due to the low accuracy of radar observations in this range (i.e. low signal-to-noise ratio) and the limited numerical accuracy of the storage format of the FMI data. If we want normally distributed data, it is better to apply a different transformation.

Have a look at [this example](https://pysteps.readthedocs.io/en/stable/auto_examples/data_transformations.html#sphx-glr-auto-examples-data-transformations-py) and the [transformation module](https://pysteps.readthedocs.io/en/stable/pysteps_reference/utils.html#pysteps-utils-transformation) and pick a better transformation. Apply this transformation in the code block below using the two previous code blocks as starting point. 

In [None]:
# Apply your new transformation here!

For this case, our preferred transformation is the normal quantile transformation. Did you also come to this conclusion? For more information about this method, see reference [1] at the bottom of this page.
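
The idea behind a normal quantile transformation can be sketched with the Python standard library alone: rank the values and map each rank to the matching quantile of the standard normal distribution. The function below is a hypothetical, simplified version for illustration. The plotting position `(rank + 0.5) / n` is an assumption, tied values are not treated specially, and `NQ_transform` in pysteps may make different choices.

```python
from statistics import NormalDist


def normal_quantile_transform(values):
    """Map values to standard-normal quantiles based on their ranks."""
    n = len(values)
    # Indices of the values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    transformed = [0.0] * n
    for rank, idx in enumerate(order):
        # The plotting position stays strictly inside (0, 1),
        # so the inverse CDF is always finite.
        p = (rank + 0.5) / n
        transformed[idx] = std_normal.inv_cdf(p)
    return transformed


# Synthetic, strongly skewed rain rates in mm/h (invented for illustration)
rates = [0.2, 0.1, 5.0, 0.4, 1.3, 18.0, 0.7]
z = normal_quantile_transform(rates)
for r, value in zip(rates, z):
    print(f"{r:5.1f} mm/h -> {value:+.2f}")
```

The transformed values are symmetric around zero by construction and keep the ordering of the original rain rates, which is why the skewness of the input no longer matters.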

## References

[1] K. Bogner, F. Pappenberger and H. L. Cloke. Technical Note: The normal quantile transformation and its application in a flood forecasting system, Hydrol. Earth Syst. Sci., 16, 1085-1094, https://doi.org/10.5194/hess-16-1085-2012, 2012.