# Data loading module

The [`loader`](weatherlyser/loader.py) module has been prepared for your convenience to load data from two sources:

* Open Meteo API
* Czech Hydrometeorological Institute

You will not need to modify this module, except in one exercise described below.

The module already uses several techniques and libraries worth mentioning:
* [`httpx`](https://www.python-httpx.org/) library for HTTP requests.
* [`Pydantic`](https://docs.pydantic.dev/latest/) library for easier and safer deserialisation of JSON responses.
* [`Pandera`](https://pandera.readthedocs.io/en/stable/) for validating pandas DataFrame schemas.
* Type annotations to be checked by [`mypy`](https://mypy.readthedocs.io/en/stable/).

## Load data from the Open Meteo API

This is how we can use the loaders and processors in this workshop.
Note that we use the higher-level functions, as we are not interested in the details of data loading.


```python
import pandas as pd

from weatherlyser import api_models, loader, processors

# Hourly temperature and relative humidity from two weather models
request = api_models.ArchiveQueryParameters(
    latitude=50.1003,
    longitude=14.2555,
    hourly=[
        "temperature_2m",
        "relativehumidity_2m",
    ],
    start_date=pd.Timestamp("2010-03-21"),
    end_date=pd.Timestamp("2010-03-31"),
    models=["best_match", "era5"],
)

response = loader.load_open_meteo_archive_data(request)

response
```

```python
df_wide = processors.open_meteo_response_to_dataframe(response)
df_wide
```

```python
df_tidy = processors.tidy_open_meteo_dataframe(df_wide)
df_tidy
```
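The difference between the wide and the tidy layout can be illustrated with plain pandas. The sketch below is not the implementation of `processors.tidy_open_meteo_dataframe`; it assumes a hypothetical wide layout with one column per variable and model (e.g. `temperature_2m_era5`) and melts it into one row per timestamp, variable and model.

```python
import pandas as pd

# Hypothetical wide frame: one column per (variable, model) pair
wide = pd.DataFrame(
    {
        "temperature_2m_era5": [3.4, 2.9],
        "temperature_2m_best_match": [3.5, 3.0],
    },
    index=pd.DatetimeIndex(["2010-03-21 00:00", "2010-03-21 01:00"], name="time"),
)

# Melt all value columns into a single "value" column, keeping the time as an identifier
long = wide.reset_index().melt(id_vars="time", var_name="column", value_name="value")

# Split e.g. "temperature_2m_best_match" into variable="temperature_2m", model="best_match"
parts = long["column"].str.extract(r"^(?P<variable>.*?_\d+m)_(?P<model>.+)$")
tidy = pd.concat([long.drop(columns="column"), parts], axis=1)
print(tidy)
```

In the tidy form every observation is one row, which makes grouping by model or variable a plain `groupby` instead of column-name juggling.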

## Usage of Pandera in the CHMI loader

* We have defined a `CHMIDailyDataFrame` Pandera model to validate the output of `load_chmi_data`. *Do not look at it now: your task is to implement it in the next exercise.*
* This function performs a non-trivial transformation of the input Excel file into a tidy DataFrame.

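```python
import pathlib

import pandas as pd
import pandera as pa
from pandera.typing import DataFrame

from weatherlyser import pa_models

# DEFAULT_CHMI_DATA_PATH, CZ_EN_TRANSLATION and extract_and_clean_chmi_excel_sheet
# are defined elsewhere in weatherlyser/loader.py.


@pa.check_types
def load_chmi_data(
    path: str | pathlib.Path = DEFAULT_CHMI_DATA_PATH,
) -> DataFrame[pa_models.CHMIDailyDataFrame]:
    """Load historical weather data from ČHMÚ"""
    excel_data = pd.ExcelFile(path)

    # Read all sheets except the first one
    extracted_sheets = [
        extract_and_clean_chmi_excel_sheet(excel_data, sheet_name)
        for sheet_name in excel_data.sheet_names[1:]
    ]
    return (
        # Join the per-sheet frames side by side
        pd.concat(extracted_sheets, axis=1)
        # Translate Czech column names to English
        .rename(columns=CZ_EN_TRANSLATION)
        # Replace spaces with underscores so the names are valid identifiers
        .rename(columns=lambda c: c.replace(" ", "_"))
    )
```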

**Exercise:** 

*Important❗:* Before starting this exercise, check out specific versions of some files in the `weatherlyser` package,
as the current versions already contain the solution to this exercise 😏.

To do this, run the following command:

```shell
git checkout origin/remove-solution-04 -- weatherlyser/loader.py weatherlyser/pa_models.py
```

Implement a Pandera `DataFrameModel` called `CHMIDailyDataFrame` for the output of the `load_chmi_data` function and use it to validate the output of this function.

1. Create a `CHMIDailyDataFrame` class in the `weatherlyser/pa_models.py` file.
2. Add the following columns, all of type `float`, and coerce the data by default (`coerce = True` in the model's `Config`):
   ```
   average_temperature
   maximum_temperature
   minimum_temperature
   wind_speed
   air_pressure
   humidity
   precipitation
   total_snow_depth
   sunshine
   ```
3. Add a `date` index (you will need `pandera.typing.Index` for this) with a datetime type such as `pd.Timestamp`.
4. Add the `@pa.check_types` decorator to the `load_chmi_data` function and make sure its return type is annotated as `DataFrame[pa_models.CHMIDailyDataFrame]`.
5. Test your solution by running `pytest tests/test_chmi_loader.py`.

*Optional:*

6. Add additional value checks, e.g. check that `air_pressure` lies within sensible limits.
7. Use data types that reflect the data more closely and/or use less memory.

After the exercise, *restore* the original version of the files by running:

```shell
git checkout main -- weatherlyser/loader.py weatherlyser/pa_models.py
```