# Reporter Tutorial

This notebook shows how to use the `hw_predictor/components/reporter` package. The package
is designed to run automatically within Kubeflow Pipelines; this introduction shows how its
functions can be imported for experimentation in notebooks.

In [1]:
# to ensure developed modules are reloaded automatically and there's no need
# to restart the kernel
%load_ext autoreload
%autoreload 2


In [2]:
from os import chdir

# change working directory to project's root path, this improves the interaction
# with the data/ and hw_predictor/ folders
chdir("../..")

# Imports

In [3]:
import pandas as pd
import hw_predictor.components.reporter.src as reporter

# Parameters

In [4]:
input_path = "data/test/output/stations"
station_id = 330020
year = 2022

seasonal = False
save = True
output_path = "data/test/output/stations"

# Code

Before running the code, make sure the required project environment variables are set. Assuming
an `.env` file already exists in the project root directory, this can be done with the following command.

```bash
export $(cat .env | xargs)
```
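The `export` command only affects the shell it runs in and the processes started from that shell. If the variables are needed inside a notebook kernel that is already running, a similar effect can be approximated from Python. The following is a minimal sketch, not part of the project: `load_env_file` is a hypothetical helper, and it assumes the `.env` file contains plain `KEY=VALUE` lines without quoting or multi-line values.

```python
import os
from pathlib import Path


def load_env_file(path=".env", environ=os.environ):
    """Load simple KEY=VALUE lines from `path` into `environ`.

    Hypothetical helper for illustration only. Blank lines, lines starting
    with '#', and lines without '=' are skipped; quotes are not interpreted.
    """
    loaded = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip()
    environ.update(loaded)
    return loaded
```

Calling `load_env_file(".env")` after the working directory has been changed to the project root would then populate `os.environ` for the current kernel.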

As of Thu 28/12/2023, the following environment variables are needed:

```
METEOCHILE_USER=
METEOCHILE_API_KEY=
CDS_API_URL=
CDS_API_KEY=
CLUSTER_HOST=
CLUSTER_USER=
CLUSTER_PASSWORD=
```
Check with Mauro Mendoza (msmendoza@uc.cl) for the values of these variables.
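A quick check before calling the reporter can catch missing variables early. The sketch below is an illustration only: `missing_env_vars` is a hypothetical helper, and the list of names is copied from the block above.

```python
import os

# Names copied from the list of required variables in this notebook.
REQUIRED_ENV_VARS = [
    "METEOCHILE_USER",
    "METEOCHILE_API_KEY",
    "CDS_API_URL",
    "CDS_API_KEY",
    "CLUSTER_HOST",
    "CLUSTER_USER",
    "CLUSTER_PASSWORD",
]


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=os.environ):
    """Return the names that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```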

# Reporter data

In [5]:
if seasonal:
    summer_stats, winter_stats = reporter.meteochile.compute_year_stats(
        input_path,
        station_id,
        year,
        seasonal=seasonal,
        save=save,
        output_path=output_path,
    )
    yearly_summer_stats = reporter.meteochile.compute_agg_year_stats(
        summer_stats,
        station_id,
        season="summer",
        save=save,
        output_path=output_path,
    )
    yearly_winter_stats = reporter.meteochile.compute_agg_year_stats(
        winter_stats,
        station_id,
        season="winter",
        save=save,
        output_path=output_path,
    )
else:
    year_stats = reporter.meteochile.compute_year_stats(
        input_path,
        station_id,
        year,
        seasonal=seasonal,
        save=save,
        output_path=output_path,
    )
    reporter.meteochile.compute_agg_year_stats(
        year_stats,
        station_id,
        season="",
        save=save,
        output_path=output_path,
    )

In [6]:
pd.read_parquet("data/reporting/stations/330002/year_stats/default/2023_stats.parquet")

|   | start | end | duration | mean_temp | min_temp | max_temp | sum_intensity |
|---|---|---|---|---|---|---|---|
| 0 | 2023-08-26 | 2023-08-28 | 3 | 18.6 | 17.3 | 20.0 | 3.8 |
| 1 | 2023-08-10 | 2023-08-13 | 4 | 19.375 | 17.0 | 26.5 | 12.0 |
| 2 | 2023-03-29 | 2023-03-31 | 3 | 21.633333 | 21.0 | 22.5 | 4.05 |
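The rows shown above suggest that `duration` counts the days from `start` to `end` inclusive. This is an inference from the three displayed rows, not something the notebook states. A small sketch that checks the assumption, with the rows copied by hand and only the standard library used so it runs without the project data (`inclusive_days` is a hypothetical helper):

```python
from datetime import date

# Rows copied by hand from the output shown in this notebook.
rows = [
    {"start": "2023-08-26", "end": "2023-08-28", "duration": 3},
    {"start": "2023-08-10", "end": "2023-08-13", "duration": 4},
    {"start": "2023-03-29", "end": "2023-03-31", "duration": 3},
]


def inclusive_days(start, end):
    """Number of calendar days from `start` to `end`, counting both ends."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days + 1


# Assumption being checked: duration == inclusive day count.
mismatches = [
    row for row in rows
    if inclusive_days(row["start"], row["end"]) != row["duration"]
]
print("rows not matching the assumption:", len(mismatches))
```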
