In [3]:
import numpy as np
import pandas as pd
import scipy

import matplotlib as mpl
import matplotlib.pyplot as plt

import io
import requests
from zipfile import ZipFile
from pathlib import Path
import calendar

# Local Effects of Climate Change

## Question

> How did the local climate in Gießen change over the past 70 years?

## Data sources

We need data for temperature, (relative) humidity, and precipitation collected near Gießen. Such data has been collected by Deutscher Wetterdienst (DWD) and is publicly available via their open-data portal at https://opendata.dwd.de.

### Downloading the data

The function below downloads a ZIP archive containing the measurements as well as some metadata from a given URL. The archive is then extracted into the specified directory.

In [17]:
def download_and_extract(
    url: str, 
    output_path: Path = Path('tmp/dwd')
) -> None:
    """download DWD climate data from url and extract."""
    output_path.mkdir(exist_ok=True, parents=True)

    (ZipFile(io.BytesIO(requests.get(url).content))
     .extractall(path=output_path)
    )
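
The extraction step can be tried out without network access. The following is a minimal sketch, not part of the original notebook: `extract_zip_bytes` is a hypothetical helper, and the archive is a tiny in-memory stand-in for a DWD download.

```python
import io
import tempfile
from pathlib import Path
from zipfile import ZipFile


def extract_zip_bytes(payload: bytes, output_path: Path) -> list[str]:
    """Extract an in-memory ZIP archive and return the names of its members.

    Hypothetical helper for illustration. In the notebook, the payload
    would come from something like `requests.get(url, timeout=60).content`,
    ideally after calling `raise_for_status()` on the response.
    """
    output_path.mkdir(exist_ok=True, parents=True)
    with ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(path=output_path)
        return archive.namelist()


# Build a small stand-in archive in memory (made-up content).
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("produkt_demo.txt", "STATIONS_ID;MESS_DATUM\n1639;1950010100\n")

# Extract it into a temporary directory and read one file back.
with tempfile.TemporaryDirectory() as tmp:
    members = extract_zip_bytes(buffer.getvalue(), Path(tmp) / "demo")
    extracted_text = (Path(tmp) / "demo" / "produkt_demo.txt").read_text()

print(members)
```

Returning the member names makes it easy to see which data and metadata files an archive actually contains before hard-coding a file name.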

In [18]:
TMP_DIRECTORY = Path("_dwd")
DWD_data = 'https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/'
temperature_humidity = 'air_temperature/historical/stundenwerte_TU_01639_19500101_20221231_hist.zip'
precipitation = 'precipitation/historical/stundenwerte_RR_01639_19970320_20221231_hist.zip'

download_and_extract(url=DWD_data + temperature_humidity, output_path=TMP_DIRECTORY / "temperature_and_humidity")
download_and_extract(url=DWD_data + precipitation, output_path=TMP_DIRECTORY / "precipitation")

## Metadata

The downloaded datasets include specific *metadata files*. These contain the units, encodings, and meaning of the columns.

`(A)` For dates and measurement values, find out in which format they are available, and consider how you might convert them to usable and correct values for analysis with `pandas`.

* `txt` file `produkt_*_stunde_...`
    * data in a CSV-like format
    * separator ";"
* The time reference changes in 1996
    * from 1996-10-01 on: UTC
    * before that: MEZ (CET) = UTC+1
* Timestamps are encoded as YearMonthDayHour
    * pandas will import this as `int64`
    * we have to convert it
* Quantities in physical units:
    * precipitation amount in mm
    * temperature in ${}^\circ\textrm{C}$
    * relative humidity in %
    * others as codes
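
The timestamp conversion noted above could be sketched as follows. The sample values are made up, and treating 1996-10-01 00:00 as the first UTC timestamp is an assumption derived from the notes; check the metadata files before relying on it.

```python
import pandas as pd

# Made-up sample of the integer-encoded timestamps (YearMonthDayHour).
raw = pd.Series([1950010100, 1996093023, 1996100100, 2022123123], name="MESS_DATUM")

# Parse the integers via their string representation.
timestamps = pd.to_datetime(raw.astype(str), format="%Y%m%d%H")

# Assumption: everything before this instant is given in CET (UTC+1).
switch_to_utc = pd.Timestamp("1996-10-01")
is_cet = timestamps < switch_to_utc

# Shift the CET values back by one hour, then mark the result as UTC.
timestamps_utc = timestamps.where(~is_cet, timestamps - pd.Timedelta(hours=1))
timestamps_utc = timestamps_utc.dt.tz_localize("UTC")

print(timestamps_utc)
```

Doing the shift before `tz_localize` keeps the arithmetic on naive timestamps, so no daylight-saving rules interfere with the fixed one-hour offset.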

## Data Import and Cleaning

### `(R)` Importing

Import the temperature and humidity data into a DataFrame `df_th`, and the precipitation data into a DataFrame `df_p`. Consider which columns you want to keep.

In [21]:
# The file name must match the date range of the downloaded archive (…_20221231).
df_th = pd.read_csv(
    TMP_DIRECTORY / "temperature_and_humidity" / "produkt_tu_stunde_19500101_20221231_01639.txt",
    sep=";",
    encoding="latin1",
    na_values=-999,  # -999 marks missing values in the DWD files
    usecols=["MESS_DATUM", "TT_TU", "RF_TU"],
)
df_th

In [7]:
...

Ellipsis

### `(A)` Tweaking

Modify the dataframes in the following manner:

* Rename the column labels in a reasonable manner.
* Convert the datetimes at which the measurements were conducted to a proper Pandas `datetime` format (make them refer to UTC) and make this column the index of the DataFrame.
* Remove all lines with missing values (due to, e.g., failed measurements).
* Apply type conversions where appropriate.


#### Notes
* Be careful when removing "missing values" (as they are called in [the documentation of the dataset](https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/precipitation/historical/DESCRIPTION_obsgermany_climate_hourly_precipitation_historical_en.pdf)). There might be many "missing values" in some columns, but you will still want to keep the line in the DataFrame in order not to lose too many measurements. Counting the number of missing values in each column might help you get an overview.
* We suggest placing all required code in a dedicated tweaking function, e.g.

```python
def tweak_temperature_and_humidity(df: pd.DataFrame) -> pd.DataFrame:
    # your code goes here
    return df

df_th = tweak_temperature_and_humidity(df_th)
```
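
To illustrate the first note, here is a small sketch on made-up data of how counting missing values per column can guide which rows to drop. The column names are hypothetical and do not come from the DWD files.

```python
import numpy as np
import pandas as pd

# Made-up frame: one column is almost entirely missing.
df_demo = pd.DataFrame(
    {
        "temperature": [1.2, np.nan, 3.4, 5.0],
        "humidity": [80.0, 85.0, np.nan, 90.0],
        "quality_code": [np.nan, np.nan, np.nan, 3.0],
    }
)

# Overview: number of missing values in each column.
missing_per_column = df_demo.isna().sum()

# Dropping every row with any missing value keeps very little ...
dropped_everywhere = df_demo.dropna()

# ... whereas restricting the check to the columns of interest keeps more.
dropped_selectively = df_demo.dropna(subset=["temperature", "humidity"])

print(missing_per_column)
print(len(dropped_everywhere), len(dropped_selectively))
```

In this toy example the sparse `quality_code` column alone would remove three of four rows, which is why the `subset` argument of `dropna` is worth considering.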

In [8]:
...

Ellipsis

In [9]:
...

Ellipsis

### `(T)` Merge DataFrames

[Merge](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) the DataFrame containing the data for the temperature and the humidity with that containing the values for the amount of precipitation.

The new DataFrame (you might call it `df_weather`) shall also have the datetimes of the measurements in the index.

#### Notes 

* The time interval of available measurements in the *merged* DataFrame is determined by the DataFrame with the smaller interval.
* Try to use the new DataFrame in the following tasks. If you do not succeed in merging the two DataFrames, it is also fine to use the individual DataFrames to solve the tasks.
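
The first note can be seen in a small sketch with made-up measurements: an inner merge on the datetime index only keeps the timestamps present in both DataFrames. All names and values below are hypothetical.

```python
import pandas as pd

one_hour = pd.Timedelta(hours=1)

# Two made-up hourly series whose time ranges only partly overlap.
idx_th = pd.date_range("1997-03-19 22:00", periods=5, freq=one_hour, tz="UTC")
idx_p = pd.date_range("1997-03-20 00:00", periods=5, freq=one_hour, tz="UTC")

df_th_demo = pd.DataFrame(
    {"temperature": [4.0, 3.5, 3.1, 2.8, 2.6], "humidity": [88, 90, 91, 93, 94]},
    index=idx_th,
)
df_p_demo = pd.DataFrame({"precipitation": [0.0, 0.2, 0.0, 1.1, 0.0]}, index=idx_p)

# Merge on the index; "inner" keeps only timestamps found in both frames.
df_weather_demo = df_th_demo.merge(
    df_p_demo, left_index=True, right_index=True, how="inner"
)

print(df_weather_demo)
```

With `how="inner"`, the merged frame starts at the later of the two start times and ends at the earlier of the two end times; `how="outer"` would instead keep all timestamps and fill the gaps with `NaN`.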

In [10]:
df_weather = (
...
)

## Data Visualization

In the following tasks you will be asked to visualise certain aspects of the data.

Try to make the visualisations as "compelling", understandable, and expressive as possible, e.g. by adding axis labels (with units if required), plot titles, and reasonable scales.

### `(A)` Seasonal contributions to precipitation

Determine the contribution of each [**meteorological season**](https://www.dwd.de/DE/service/lexikon/Functions/glossar.html?lv3=101324&lv2=101304) to the total yearly amount of precipitation. Visualise the results in a single plot and make sure that the contributions of the individual seasons can be distinguished.
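
One possible way to assign seasons is sketched below on synthetic data (a constant 0.1 mm per hour, purely for illustration). The mapping uses the usual meteorological seasons of three full months each. As a simplifying assumption, December is counted towards the winter of its own calendar year, although a meteorological winter spans two calendar years.

```python
import pandas as pd

# Meteorological seasons: three full calendar months each.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

# Synthetic hourly precipitation for two years (hypothetical values).
index = pd.date_range(
    "2001-01-01 00:00", "2002-12-31 23:00", freq=pd.Timedelta(hours=1), tz="UTC"
)
df_demo = pd.DataFrame({"precipitation_mm": 0.1}, index=index)

# Derive grouping keys from the datetime index.
df_demo["year"] = df_demo.index.year
df_demo["season"] = df_demo.index.month.map(SEASON_BY_MONTH)

# Seasonal sums per year: one row per year, one column per season.
seasonal = df_demo.groupby(["year", "season"])["precipitation_mm"].sum().unstack("season")

# Relative contribution of each season to the yearly total.
seasonal_share = seasonal.div(seasonal.sum(axis=1), axis=0)

print(seasonal_share.round(3))
```

A table shaped like `seasonal` lends itself to a stacked bar chart, for example via `seasonal.plot.bar(stacked=True)`, where each bar is a year and each colour a season.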

In [11]:
...

Ellipsis

### `(A)` The Summer of 2003

By looking at the plot from the previous task it becomes obvious that the amount of precipitation in 2003 was much lower than in other years.

Search for hints in the data that support this observation. Generate two plots that further explain the observation (e.g. distributions, seasonal effects).

*Note*: In fact, the [summer of 2003](https://en.wikipedia.org/wiki/2003_European_heat_wave) was quite special from a meteorological point of view.

In [12]:
...

Ellipsis

In [13]:
...

Ellipsis

## Data Analysis

### `(A)` Correlation between precipitation and humidity

Make a plot that correlates the amount of *actually fallen liquid precipitation* with the relative humidity. Also add temperature information in the *same* plot.

In [14]:
...

Ellipsis

### `(A)` Correlation between humidity and temperature

* Make a plot that correlates the relative humidity with the temperature. Can you see a trend? 

In [15]:
...

Ellipsis

* Make another plot that shows the (mean) temperature and the (mean) relative humidity over the course of the day.
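
A daily cycle can be obtained by grouping on the hour of the datetime index. The sketch below uses synthetic sinusoidal data with hypothetical column names, so the numbers carry no meteorological meaning.

```python
import numpy as np
import pandas as pd

# Three days of synthetic hourly data with an afternoon temperature peak.
index = pd.date_range("2020-06-01", periods=72, freq=pd.Timedelta(hours=1), tz="UTC")
phase = (index.hour.to_numpy() - 9) / 24 * 2 * np.pi

df_demo = pd.DataFrame(
    {
        "temperature": 15 + 5 * np.sin(phase),
        "humidity": 70 - 15 * np.sin(phase),
    },
    index=index,
)

# Mean of every column for each hour of the day (0 to 23, in UTC).
diurnal_mean = df_demo.groupby(df_demo.index.hour).mean()

print(diurnal_mean.head())
```

Since the index is in UTC, the hours are UTC hours; converting with `tz_convert("Europe/Berlin")` before grouping would show the cycle in local time instead.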

In [16]:
...

Ellipsis