# Rhine level prediction

Predict the Rhine level in Bonn.

Source for the data:

    Wasserstraßen- und Schifffahrtsverwaltung des Bundes (WSV),
    bereitgestellt durch die Bundesanstalt für Gewässerkunde (BfG).

which (unofficially) translates to

    German Federal Waterways and Shipping Administration (WSV),
    provided by the German Federal Institute of Hydrology (BfG).

You may need to install additional packages. This can be done using the following command:

In [6]:
import pandas as pd
import numpy as np
from mllab.rhinelevel import wsv, dwd
CACHE = './cache'

The dataset is expected to be stored in the folder `data` relative to this notebook.

In [7]:
levels = wsv.RiverLevelData('data/riverlevels.tar.bz2', CACHE)

Extract files...
  File Andernach-W15.zrx
  File Bingen-W15.zrx
  File Bonn-W15.zrx
  File Frankfurt-Osthafen-W15.zrx
  File Kalkofen-Neu-W15.zrx
  File Kaub-W15.zrx
  File Koblenz-Rh-W15.zrx
  File Koblenz-UP-Mosel-W15.zrx
  File Mainz-W15.zrx
  File Oberwinter-W15.zrx
  File Oestrich-W15.zrx
  File Raunheim-W15.zrx
  File Rockenau-SKA-W15.zrx
  File Speyer-W15.zrx
  File Worms-W15.zrx
Done.


Collect the station data into a pandas DataFrame.

## Real-time prediction

The real-time measurements for the stations are obtained by passing the `recent=True` parameter:

```python
df = levels.to_frame(recent=True)
```

This returns a DataFrame with the measurements from the last 30 days, also at 15-minute intervals.

In [15]:
df = levels.to_frame(recent=True)
df

Parse Andernach
Parse Bingen
Parse Bonn
Parse Frankfurt Osthafen
Parse Kalkofen Neu
Parse Kaub
Parse Koblenz
Parse Koblenz Up
Parse Mainz
Parse Oberwinter
Parse Oestrich
Parse Raunheim
Parse Rockenau Ska
Parse Speyer
Parse Worms


| | Andernach | Bingen | Bonn | Frankfurt Osthafen | Kalkofen Neu | Kaub | Koblenz | Koblenz Up | Mainz | Oberwinter | Oestrich | Raunheim | Rockenau Ska | Speyer | Worms |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024-05-26 18:00:00+02:00 | 475.0 | 326.0 | 502.0 | 175.0 | 277.0 | 390.0 | 406.0 | 389.0 | 426.0 | 424.0 | 307.0 | 151.0 | 255.0 | 542.0 | 383.0 |
| 2024-05-26 18:15:00+02:00 | 475.0 | 326.0 | 502.0 | 173.0 | 276.0 | 390.0 | 406.0 | 389.0 | 427.0 | 424.0 | 307.0 | 147.0 | 255.0 | 542.0 | 383.0 |
| 2024-05-26 18:30:00+02:00 | 475.0 | 325.0 | 502.0 | 173.0 | 277.0 | 388.0 | 406.0 | 390.0 | 426.0 | 424.0 | 307.0 | 151.0 | 255.0 | 541.0 | 383.0 |
| 2024-05-26 18:45:00+02:00 | 475.0 | 325.0 | 502.0 | 173.0 | 277.0 | 389.0 | 405.0 | 386.0 | 426.0 | 424.0 | 307.0 | 150.0 | 255.0 | 542.0 | 383.0 |
| 2024-05-26 19:00:00+02:00 | 476.0 | 325.0 | 503.0 | 172.0 | 277.0 | 388.0 | 405.0 | 386.0 | 426.0 | 424.0 | 306.0 | 150.0 | 255.0 | 541.0 | 383.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-06-26 16:45:00+02:00 | 397.0 | 309.0 | 422.0 | 159.0 | 193.0 | 366.0 | 332.0 | 282.0 | 408.0 | 350.0 | 290.0 | 124.0 | 233.0 | 558.0 | 377.0 |
| 2024-06-26 17:00:00+02:00 | 397.0 | 307.0 | 422.0 | 159.0 | 193.0 | 366.0 | 332.0 | 281.0 | 408.0 | 350.0 | 290.0 | 129.0 | 232.0 | 558.0 | 377.0 |
| 2024-06-26 17:15:00+02:00 | 396.0 | 307.0 | 422.0 | 159.0 | 192.0 | 365.0 | 332.0 | 281.0 | 408.0 | 350.0 | 289.0 | 128.0 | 236.0 | 557.0 | 377.0 |
| 2024-06-26 17:30:00+02:00 | 396.0 | 307.0 | 421.0 | 160.0 | 190.0 | 365.0 | 332.0 | 281.0 | 408.0 | 350.0 | 289.0 | 129.0 | 234.0 | 557.0 | 376.0 |


### Pandas info

The DataFrame `df` is indexed by a time series; see the [documentation](https://pandas.pydata.org/pandas-docs/stable/timeseries.html) for more details.

To get a NumPy array, run `df.values`. You can restrict the result to the columns you want, for example:

```python
df[['Worms', 'Kaub']].values
```
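
Because the index is a time series, rows can also be selected by timestamp before converting to an array. The following is a minimal sketch on synthetic data; the frame `demo` and its random values are made up for illustration and only mimic the layout of `df` (timezone-aware index, 15-minute steps).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `df` (hypothetical values): two stations,
# 8 days of 15-minute steps, timezone-aware index.
index = pd.date_range("2024-06-01 00:00", periods=8 * 96, freq="15min",
                      tz="Europe/Berlin")
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "Worms": 380.0 + rng.normal(0, 1, len(index)).cumsum(),
        "Kaub": 390.0 + rng.normal(0, 1, len(index)).cumsum(),
    },
    index=index,
)

# Partial-string indexing: all rows of one calendar day.
one_day = demo.loc["2024-06-03"]

# Label slice between two timestamps; both end points are included.
window = demo.loc["2024-06-03 06:00":"2024-06-03 12:00"]

# Downsample from 15-minute values to hourly means.
hourly = demo.resample("60min").mean()

# Convert the selected rows and columns to a NumPy array.
X = window[["Worms", "Kaub"]].values
print(one_day.shape, window.shape, hourly.shape, X.shape)
```

The same calls should work on `df` itself, since it uses the same kind of index.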

## Remarks

- Not all stations start recording at the same time; the missing measurements are stored as `NaN`.
- Measurement failures are also stored as `NaN`.
    - The [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/missing_data.html) might be helpful.
- You might want to select a subset of the available stations at the beginning.
- You can also check how far into the future you can make predictions.
- A wavelet transformation could be interesting as a feature map; check out [PyWavelets](https://pywavelets.readthedocs.io/en/latest/).
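
One possible way to deal with the `NaN` values and the station subset mentioned above is sketched here. The frame `demo`, its values, and the gap limit of two steps are assumptions chosen for illustration, not part of the dataset or of `mllab`.

```python
import numpy as np
import pandas as pd

# Small synthetic frame (hypothetical values) with gaps like those in `df`.
index = pd.date_range("2024-06-01", periods=12, freq="15min", tz="Europe/Berlin")
nan = np.nan
demo = pd.DataFrame(
    {
        "Bonn": [500.0, 501.0, nan, 503.0, 504.0, 505.0,
                 506.0, nan, nan, nan, 510.0, 511.0],
        "Kaub": [nan, nan, 392.0, 393.0, 394.0, 395.0,
                 396.0, 397.0, 398.0, 399.0, 400.0, 401.0],
        "Worms": [380.0 + i for i in range(12)],
    },
    index=index,
)

# Count the missing values per station to decide which stations to keep.
missing = demo.isna().sum()

# Select a subset of stations.
subset = demo[["Bonn", "Kaub"]]

# Linear interpolation (assumes equally spaced rows). At most 2 consecutive
# values per gap are filled, so longer gaps are only partly filled, and
# leading NaNs are left untouched.
filled = subset.interpolate(method="linear", limit=2, limit_area="inside")

# Drop the rows that still contain a NaN in any selected station.
clean = filled.dropna()
print(missing.to_dict(), len(clean))
```

Whether interpolation is appropriate depends on the gap length; for long outages, dropping the affected rows or the whole station may be the safer choice.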

In [None]:
df[['Worms', 'Kaub']].values
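
To check how far into the future predictions are feasible, one option is to turn the array into lagged feature windows and a target shifted by a chosen horizon. The helper `make_supervised` below is hypothetical (it is not part of `mllab`), uses toy data, and assumes the frame contains no `NaN` values.

```python
import numpy as np
import pandas as pd

def make_supervised(frame, target, n_lags, horizon):
    """Build (X, y) for forecasting.

    Each row of X holds the last `n_lags` rows of all columns, flattened.
    The matching entry of y is the `target` column `horizon` steps after
    the last row of that window.
    """
    values = frame.to_numpy()
    target_values = frame[target].to_numpy()
    n_samples = len(frame) - n_lags - horizon + 1
    X = np.stack([values[i:i + n_lags].ravel() for i in range(n_samples)])
    first = n_lags - 1 + horizon
    y = target_values[first:first + n_samples]
    return X, y

# Toy frame (hypothetical values) so the alignment is easy to verify.
toy = pd.DataFrame({"Bonn": np.arange(10.0), "Kaub": 100.0 + np.arange(10.0)})

# With 15-minute data, horizon=2 means predicting 30 minutes ahead.
X, y = make_supervised(toy, target="Bonn", n_lags=3, horizon=2)
print(X.shape, y)
```

Repeating this for increasing values of `horizon` and comparing the error of a fitted model gives a rough picture of how quickly the prediction quality degrades.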