In [2]:
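from pathlib import Path
import os

# If the notebook sits two levels below the `ml_drought` repository root,
# change into the root so that the `src` package and relative `data/` paths resolve.
if Path('.').absolute().parents[1].name == 'ml_drought':
    os.chdir(Path('.').absolute().parents[1])

from src import engineer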

# Engineer

The `Engineer` takes the `preprocessed` data from the `data/interim/*_preprocessed/` directories and writes model-ready features to the `data/features` directory.

In doing so, the `Engineer` creates `train` and `test` data for each month-year.

The label on the directory `data/features/{experiment}/{year}_{month}` (for example, `data/features/nowcast/2015_1`) refers to the `target` timestep, so in this example `y.nc` holds the January 2015 timestep.
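The mapping from a target timestep to its feature directory can be sketched as follows. `target_dir` and `parse_target` are hypothetical helpers written for illustration, not functions from `src.engineer`; they assume only the `{year}_{month}` naming shown above, with the month not zero-padded.

```python
from pathlib import Path
from typing import Tuple


def target_dir(data_dir: Path, experiment: str, year: int, month: int) -> Path:
    """Hypothetical helper: directory holding x.nc / y.nc for a target timestep.

    Assumes the `data/features/{experiment}/{year}_{month}` layout with an
    unpadded month (e.g. `2015_1`).
    """
    return data_dir / "features" / experiment / f"{year}_{month}"


def parse_target(dir_name: str) -> Tuple[int, int]:
    """Hypothetical helper: invert target_dir, e.g. '2015_1' -> (2015, 1)."""
    year, month = dir_name.split("_")
    return int(year), int(month)


path = target_dir(Path("data"), "nowcast", 2015, 1)
print(path.as_posix())          # data/features/nowcast/2015_1
print(parse_target(path.name))  # (2015, 1), the timestep stored in y.nc
```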

<img src="img/engineer_diagram.png" style='background-color: #878787; border-radius: 25px; padding: 20px'>

### We currently have two `experiments` defined in the pipeline.

The experiment is selected with the `experiment: str` argument of the `Engineer` class.

The **`OneMonthForecast`** experiment predicts the target variable one month ahead. For example, we might use `total_precipitation` as our regressor (stored in `x.nc`) to predict the Vegetation Health Index `VHI` (stored in `y.nc`).

We therefore use December 2014 data (`total_precipitation`, plus `VHI` as an autoregressive component) to predict January 2015 `VHI`.
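A minimal sketch of which months feed `x.nc` in the one-month forecast setup, assuming the inputs are simply the `pred_months` months immediately before the target. `shift_month` and `one_month_forecast_inputs` are hypothetical helpers, not part of the pipeline's API; `pred_months` mirrors the argument of `Engineer.engineer`.

```python
from typing import List, Tuple

YearMonth = Tuple[int, int]


def shift_month(year: int, month: int, offset: int) -> YearMonth:
    """Move (year, month) by `offset` months; negative offsets go backwards."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def one_month_forecast_inputs(target: YearMonth, pred_months: int = 1) -> List[YearMonth]:
    """Hypothetical helper: months whose data go into x.nc for `target`.

    Assumption: the target month itself is excluded and only the
    `pred_months` months strictly before it are used (oldest first).
    """
    year, month = target
    return [shift_month(year, month, -k) for k in range(pred_months, 0, -1)]


print(one_month_forecast_inputs((2015, 1)))                 # [(2014, 12)]
print(one_month_forecast_inputs((2015, 1), pred_months=3))  # [(2014, 10), (2014, 11), (2014, 12)]
```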

The **`Nowcast`** experiment assumes that, at the target time, we already have information about variables other than the target variable. So we have `total_precipitation` for January 2015 and want to use it to predict January 2015 `VHI`. This makes the experiment a good way of incorporating SEAS5 forecast data.

- `x.nc` includes December 2014 `VHI` and `total_precipitation`, as well as January 2015 `total_precipitation` (non-target variable at target timestep).
- `y.nc` contains January 2015 `VHI` - our target variable.
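The split between `x.nc` and `y.nc` in the Nowcast experiment can be illustrated with plain `(variable, (year, month))` tuples. This is a hypothetical sketch (`previous_month` and `nowcast_split` are not part of `src.engineer`), and it assumes a single month of history, matching the January 2015 example.

```python
from typing import Dict, List, Tuple

YearMonth = Tuple[int, int]


def previous_month(year: int, month: int) -> YearMonth:
    """Calendar month immediately before (year, month)."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def nowcast_split(
    variables: List[str], target_variable: str, target: YearMonth
) -> Dict[str, List[Tuple[str, YearMonth]]]:
    """Hypothetical helper: assign (variable, month) pairs to x.nc or y.nc.

    x: every variable (the target included, as an autoregressive term) in the
       previous month, plus every non-target variable in the target month.
    y: only the target variable in the target month.
    """
    prev = previous_month(*target)
    x = [(var, prev) for var in variables]
    x += [(var, target) for var in variables if var != target_variable]
    y = [(target_variable, target)]
    return {"x": x, "y": y}


split = nowcast_split(["VHI", "total_precipitation"], "VHI", (2015, 1))
print(split["x"])
# [('VHI', (2014, 12)), ('total_precipitation', (2014, 12)), ('total_precipitation', (2015, 1))]
print(split["y"])  # [('VHI', (2015, 1))]
```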

In [16]:
engineer_one_month = engineer.Engineer(Path('data'), experiment='one_month_forecast')
help(engineer_one_month.engineer)

Help on method engineer in module src.engineer:

engineer(test_year: Union[int, List[int]], target_variable: str = 'VHI', pred_months: int = 12, expected_length: Union[int, NoneType] = 12) -> None method of src.engineer.Engineer instance
    Take all the preprocessed data generated by the preprocessing classes, and turn it
    into a single training file to be ingested by the machine learning models.
    
    :param test_year: Data to be used for testing. No data earlier than the earliest test year
        will be used for training. If a list is passed, a file for each year will be saved.
    :param target_variable: The variable to be predicted. Only this variable will be saved in
        the test netcdf files
    :param pred_months: The amount of months of data to feed as input to the model for
        it to make its prediction
    :param expected_length: The expected length of the x data along its time-dimension.
        If this is not None and an x array has a different time dimensi
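The docstring says the earliest test year bounds the training data and that `test_year` may be a single year or a list. A minimal sketch of that split, assuming training uses only timesteps from before the earliest test year and that every month of each test year becomes a test target; `split_timesteps` is a hypothetical helper, not the `Engineer`'s own code.

```python
from typing import List, Tuple, Union

YearMonth = Tuple[int, int]


def split_timesteps(
    timesteps: List[YearMonth], test_year: Union[int, List[int]]
) -> Tuple[List[YearMonth], List[YearMonth]]:
    """Hypothetical helper: split (year, month) timesteps into train and test.

    Assumption: train = timesteps before the earliest test year;
    test = every month of each requested test year.
    """
    test_years = [test_year] if isinstance(test_year, int) else list(test_year)
    cutoff = min(test_years)
    train = [ts for ts in timesteps if ts[0] < cutoff]
    test = [ts for ts in timesteps if ts[0] in test_years]
    return train, test


months = [(y, m) for y in range(2013, 2017) for m in range(1, 13)]
train, test = split_timesteps(months, test_year=[2015, 2016])
print(len(train), len(test))  # 24 24
print(train[-1], test[0])     # (2014, 12) (2015, 1)
```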

In [17]:
engineer_one_month.engineer(test_year=2015, target_variable='VHI', pred_months=1, expected_length=1)

Processing data/interim/reanalysis-era5-single-levels-monthly-means_preprocessed/reanalysis-era5-single-levels-monthly-means_kenya.nc
Generating data for year: 2015, target month: 1
Max date: 2015-01-31, max input date: 2014-12-31, min input date: 2014-11-30
Wrong number of y values! Expected 1, got 0; returning None
Generating data for year: 2015, target month: 2
Max date: 2015-02-28, max input date: 2015-01-31, min input date: 2014-12-31
Wrong number of y values! Expected 1, got 0; returning None
Generating data for year: 2015, target month: 3
Max date: 2015-03-31, max input date: 2015-02-28, min input date: 2015-01-31
Wrong number of y values! Expected 1, got 0; returning None
Generating data for year: 2015, target month: 4
Max date: 2015-04-30, max input date: 2015-03-31, min input date: 2015-02-28
Wrong number of y values! Expected 1, got 0; returning None
Generating data for year: 2015, target month: 5
Max date: 2015-05-31, max input date: 2015-04-30, min input date: 2015-03-31
W
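The date lines in the log follow a regular pattern. The sketch below reproduces the `Max date`, `max input date` and `min input date` values printed for `pred_months=1`, assuming all three are month-end dates and that the `x` window runs from just after the min input date up to the max input date. `month_end`, `shift` and `input_window` are hypothetical helpers; the real implementation in `src.engineer` may compute these differently.

```python
import calendar
from datetime import date
from typing import Tuple


def month_end(year: int, month: int) -> date:
    """Last calendar day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])


def shift(year: int, month: int, offset: int) -> Tuple[int, int]:
    """Move (year, month) by `offset` months; negative offsets go backwards."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1


def input_window(year: int, month: int, pred_months: int) -> Tuple[date, date, date]:
    """Hypothetical helper: (max date, max input date, min input date).

    Assumption: x covers the `pred_months` month-ends after the min input
    date, up to and including the max input date.
    """
    max_date = month_end(year, month)
    max_input = month_end(*shift(year, month, -1))
    min_input = month_end(*shift(year, month, -1 - pred_months))
    return max_date, max_input, min_input


for target_month in (1, 2):
    print("Max date: {}, max input date: {}, min input date: {}".format(
        *input_window(2015, target_month, pred_months=1)))
# Max date: 2015-01-31, max input date: 2014-12-31, min input date: 2014-11-30
# Max date: 2015-02-28, max input date: 2015-01-31, min input date: 2014-12-31
```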

In [18]:
ls data/features/one_month_forecast

[1m[36mnowcast[m[m/            [1m[36mone_month_forecast[m[m/
