# Ingestor Tutorial

This notebook demonstrates the `hw_predictor/components/ingestor` package. The package is designed
to run automatically within Kubeflow Pipelines; this introduction shows how its functions can also
be imported for experimentation in notebooks.

In [1]:
# to ensure developed modules are reloaded automatically and there's no need
# to restart the kernel
%load_ext autoreload
%autoreload 2


In [2]:
from os import chdir

# change working directory to project's root path, this improves the interaction
# with the data/ and hw_predictor/ folders
chdir("../..")

# Imports

In [3]:
import pandas as pd

import hw_predictor.components.ingestor.src as ing

# Parameters

In [4]:
input_path = "data/test/input"
station_id = 330020

# Code

Before running the code, make sure the required project environment variables are set. Assuming an
`.env` file already exists in the project root directory, the following command sets them:

```bash
export $(cat .env | xargs)
```
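The `export` command above only affects the shell it runs in, so a kernel that is already running will not see the variables. A minimal sketch of loading the same file from inside the notebook follows. `load_env_file` is a hypothetical helper, not part of `hw_predictor`, and it assumes the file contains only simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a dotenv-style file into os.environ.

    Hypothetical helper for illustration; not part of hw_predictor.
    """
    loaded = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # skip blank lines, comments and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # drop surrounding whitespace and optional quotes around the value
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded
```

Calling `load_env_file()` after the `chdir("../..")` cell would read `.env` from the project root.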

As of Thu 28/12/2023, the following environment variables are needed:

```
METEOCHILE_USER=
METEOCHILE_API_KEY=
CDS_API_URL=
CDS_API_KEY=
CLUSTER_HOST=
CLUSTER_USER=
CLUSTER_PASSWORD=
```
Check with Mauro Mendoza (msmendoza@uc.cl) for the values of these variables.
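Before calling the ingestor, it can help to confirm that none of the variables listed above is missing or empty. The check below is a sketch: `missing_env_vars` is a hypothetical helper, not part of `hw_predictor`, and the variable names are copied from the list above.

```python
import os

# names copied from the list of required variables above
REQUIRED_ENV_VARS = [
    "METEOCHILE_USER",
    "METEOCHILE_API_KEY",
    "CDS_API_URL",
    "CDS_API_KEY",
    "CLUSTER_HOST",
    "CLUSTER_USER",
    "CLUSTER_PASSWORD",
]


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]
```

An empty list from `missing_env_vars()` means every variable has a value; otherwise the returned names are the ones still to be set.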

# Get stations metadata

In [5]:
ing.meteochile.get_stations_metadata(input_path)

2023-12-28 10:44:34.417 | INFO     | hw_predictor.components.ingestor.src.meteochile.get_stations_metadata:get_stations_metadata:38 - getting/updating stations metadata...
2023-12-28 10:44:34.607 | INFO     | hw_predictor.components.ingestor.src.meteochile.get_stations_metadata:get_stations_metadata:121 - stations metadata successfully saved


In [6]:
pd.read_parquet(f"{input_path}/stations/metadata.parquet")

| national_code | WMO_code | ICAO_code | name | latitude | longitude | altitude | state_id | geographic_zone_id | data_link |
|---|---|---|---|---|---|---|---|---|---|
| 170001 | 85403 |  | Visviri Tenencia | -17.59500 | -69.47750 | 4084 | 15 | 0 | https://climatologia.meteochile.gob.cl/applica... |
| 180005 | 85406 | SCAR | Chacalluta, Arica Ap. | -18.35555 | -70.34028 | 50 | 15 | 1 | https://climatologia.meteochile.gob.cl/applica... |
| 180017 | 85405 |  | Putre | -18.20000 | -69.56250 | 3532 | 15 | 2 | https://climatologia.meteochile.gob.cl/applica... |
| 180018 | 85407 |  | Defensa Civil, Arica | -18.49111 | -70.30139 | 71 | 15 | 1 | https://climatologia.meteochile.gob.cl/applica... |
| 180042 | 85408 |  | Cerro Sombrero, Arica | -18.51250 | -70.26556 | 122 | 15 | 1 | https://climatologia.meteochile.gob.cl/applica... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 950013 | 0 |  | Base O"Higgins - INACH | -63.32167 | -57.89778 | 1 | 12 | 1 | https://climatologia.meteochile.gob.cl/applica... |
| 950014 | 0 |  | Base Prat - INACH | -62.47889 | -59.66389 | 1 | 12 | 1 | https://climatologia.meteochile.gob.cl/applica... |
| 950015 | 0 |  | Punta Armonía | -62.30361 | -59.19667 | 1 | 12 | 1 | https://climatologia.meteochile.gob.cl/applica... |
| 950016 | 0 |  | Escudero | -62.20278 | -58.96167 | 1 | 12 | 1 | https://climatologia.meteochile.gob.cl/applica... |
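Since the metadata frame is indexed by `national_code`, a single station can be looked up by its code. The sketch below uses a few rows copied from the output above so that it runs on its own; `station_info` is a hypothetical helper, not part of `hw_predictor`. In the notebook, the frame returned by `pd.read_parquet` would take the place of the sample.

```python
import pandas as pd

# sample rows copied from the metadata output above (subset of columns)
metadata = pd.DataFrame(
    {
        "national_code": [170001, 180005, 180017],
        "name": ["Visviri Tenencia", "Chacalluta, Arica Ap.", "Putre"],
        "latitude": [-17.59500, -18.35555, -18.20000],
        "longitude": [-69.47750, -70.34028, -69.56250],
        "altitude": [4084, 50, 3532],
    }
).set_index("national_code")


def station_info(metadata, station_id):
    """Return the metadata row for one station, or None if the code is unknown."""
    if station_id not in metadata.index:
        return None
    return metadata.loc[station_id]


row = station_info(metadata, 180005)
print(row["name"], row["altitude"])
```

Returning `None` for an unknown code avoids a `KeyError` when a `station_id` is mistyped or absent from the file.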


# Get Meteochile data

In [7]:
ing.meteochile.get_daily_temp_history(input_path, station_id)

2023-12-28 10:44:34.664 | INFO     | hw_predictor.components.ingestor.src.meteochile.get_daily_temp_history:get_daily_temp_history:32 - getting/updating daily temperature history for station 330020...
2023-12-28 10:44:34.906 | INFO     | hw_predictor.components.ingestor.src.meteochile.get_daily_temp_history:get_daily_temp_history:108 - daily temperature history for station 330020 successfully saved


| date | min_temp | max_temp | mean_temp | cond_mean_temp | hourly_data_count | 00_temp | 12_temp | min_temp_date | max_temp_date | process_date |
|---|---|---|---|---|---|---|---|---|---|---|
| 1950-01-01 |  |  | 22.8 |  | 3.0 |  |  |  |  |  |
| 1950-01-02 |  |  | 20.9 |  | 3.0 |  |  |  |  |  |
| 1950-01-03 |  |  | 20.3 |  | 3.0 |  |  |  |  |  |
| 1950-01-04 |  |  | 22.0 |  | 3.0 |  |  |  |  |  |
| 1950-01-05 |  |  | 22.4 |  | 3.0 |  |  |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-12-17 | 9.2 | 26.3 | 16.7 | 16.7 | 24.0 | 19.9 | 13.4 | 2023.0 | 2023.0 |  |
| 2023-12-18 | 9.3 | 30.6 | 19.4 | 19.4 | 24.0 | 22.6 | 15.9 | 2023.0 | 2023.0 |  |
| 2023-12-19 | 11.6 | 25.0 | 18.6 | 18.6 | 24.0 | 17.3 | 15.2 | 2023.0 | 2023.0 |  |
| 2023-12-20 | 8.8 | 30.8 | 18.7 | 18.7 | 24.0 | 23.5 | 14.7 | 2023.0 | 2023.0 |  |
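The early rows in the output above have no `min_temp` or `max_temp` and an `hourly_data_count` of 3.0, while recent rows have 24.0. Assuming `hourly_data_count` is the number of hourly observations behind each daily record (an interpretation, not confirmed by the package), a sketch of keeping only fully covered days could look like this. The sample rows are copied from the output above.

```python
import pandas as pd

# sample rows copied from the daily temperature output above (subset of columns)
history = pd.DataFrame(
    {
        "date": ["1950-01-01", "2023-12-17", "2023-12-18"],
        "min_temp": [None, 9.2, 9.3],
        "max_temp": [None, 26.3, 30.6],
        "mean_temp": [22.8, 16.7, 19.4],
        "hourly_data_count": [3.0, 24.0, 24.0],
    }
)
history["date"] = pd.to_datetime(history["date"])
history = history.set_index("date")

# keep only days whose record is backed by 24 hourly observations (assumed meaning)
complete_days = history[history["hourly_data_count"] == 24]
print(complete_days[["min_temp", "max_temp", "mean_temp"]])
```

Whether partially covered days should be dropped or kept depends on the downstream analysis; the filter only makes the choice explicit.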
