# 2. Forecasting

In the following notebooks, we'll run forecasts using two **zero-shot** models, ```Moirai``` and ```Chronos```, and two **traditional models**, ```NBeats``` and ```NHiTS```. All models are run without covariates (```NHiTS``` is also run with covariates) so that the results can be compared later in the benchmark.

## 1. Pollen predictions
In the following four notebooks, we predict the amount of pollen and save the predictions to a file in the ```PREDICTIONS_DIR``` directory, and the fitting and prediction times to a file in the ```TIMING_DIR``` directory.

### 1.1. Constants
- ```FIRST_DATE``` and ```LAST_DATE```. The range of dates for which the dataset contains data.
```
from datetime import date

FIRST_YEAR = 1993
LAST_YEAR = 2023

# First and last day in the dataset
FIRST_DATE = date(FIRST_YEAR, 1, 1)
LAST_DATE = date(LAST_YEAR, 12, 31)
```

### 1.2. Parameters
- ```HORIZON_SIZE```. The prediction horizon: the number of days predicted in each forecast.
- ```INPUT_SIZE```. The input (context) size: the number of past days the model uses to make each forecast.
- ```TRAIN_SIZE```. The training size: the number of days from the dataset used to fit the model before making predictions.
- ```START_YEAR```, ```END_YEAR```. The range of years in which the predictions are made, not counting the offset.
- ```OFFSET_DAYS```. The offset, in days, between January 1st of ```START_YEAR``` and the first date that receives all ```HORIZON_SIZE``` prediction steps. For example, ```OFFSET_DAYS = -5``` means that the first date with ```HORIZON_SIZE``` predictions is December 27th of the year before ```START_YEAR``` instead of January 1st.
- ```START_TRAINING```, ```START_DATE - 1```. The date range of pollen data needed for training (fitting).
- ```START_DATE```, ```END_DATE```. The date range of pollen data needed for forecasting.
- ```OUTPUT_DIR```. Directory where the predictions will be saved.
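As a quick illustration of ```OFFSET_DAYS```, the first date that receives all ```HORIZON_SIZE``` prediction steps is January 1st of ```START_YEAR``` shifted by ```OFFSET_DAYS```. The sketch below assumes a plain ```timedelta``` shift, which only matches a leap-day-skipping shift when no February 29th falls inside the shifted range; ```first_full_horizon_date``` is a hypothetical helper, not a function from the notebooks.

```python
from datetime import date, timedelta


def first_full_horizon_date(start_year: int, offset_days: int) -> date:
    """Hypothetical helper: first target date with all HORIZON_SIZE prediction
    steps, i.e. January 1st of start_year shifted by offset_days
    (plain timedelta, so Feb 29 is counted if it falls in the range)."""
    return date(start_year, 1, 1) + timedelta(days=offset_days)


print(first_full_horizon_date(2000, -5))    # 1999-12-27
print(first_full_horizon_date(2000, -182))  # 1999-07-03
```

With ```START_YEAR = 2000``` and ```OFFSET_DAYS = -182``` this gives 1999-07-03, which is the first row with all seven predictions in the example CSV of section 1.4.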

### 1.3. Time Series, START_DATE and END_DATE Calculation
- ```en``` or ```y```. The endogenous time series: a ```pd.Series``` with the daily pollen values over the date range, used for both training and prediction.
- ```ex``` or ```X```. The exogenous time series: a ```pd.DataFrame``` with the daily ```rain```, ```tmax```, ```tmin``` and ```tmed``` values over the same date range, used for both training and prediction.

The date range (```START_TRAINING```, ```END_DATE```) is obtained from the following variables: ```START_YEAR```, ```END_YEAR```, ```INPUT_SIZE```, ```TRAIN_SIZE```, ```OFFSET_DAYS```, ```HORIZON_SIZE```.
```
from datetime import date, timedelta

# Define original boundaries
orig_start = date(START_YEAR, 1, 1)
orig_end = date(END_YEAR, 12, 31)

# Compute how many days to shift start and end for the offset window
offset_days_start = INPUT_SIZE + TRAIN_SIZE + max(0, -OFFSET_DAYS) + (HORIZON_SIZE - 1)
offset_days_end = max(OFFSET_DAYS, 0) + (HORIZON_SIZE - 1)

# Move the start back and the end forward. The notebooks skip Feb 29 when
# counting these days; a plain timedelta (used here for brevity) counts it.
START_TRAINING = orig_start - timedelta(days=offset_days_start)
END_DATE = orig_end + timedelta(days=offset_days_end)
```

**Note**: When calculating ```START_TRAINING``` and ```END_DATE```, leap years are taken into account: February 29th is skipped when counting days. ```START_DATE``` is calculated in the same way, but without adding the training size:
```
from datetime import date, timedelta

orig_start = date(START_YEAR, 1, 1)
...
offset_days_start = INPUT_SIZE + max(0, -OFFSET_DAYS) + (HORIZON_SIZE - 1)
START_DATE = orig_start - timedelta(days=offset_days_start)  # the notebooks also skip Feb 29 here
```
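The whole window computation can be sketched end to end as follows. This is a minimal sketch, assuming that "skipping leap day" means February 29th is not counted as one of the shifted days; ```shift_skipping_feb29``` and ```date_window``` are hypothetical helpers, not the notebooks' code.

```python
from datetime import date, timedelta


def shift_skipping_feb29(d: date, days: int) -> date:
    """Hypothetical helper: move `d` by `days` days (negative = backwards),
    not counting February 29th as one of the days."""
    step = timedelta(days=1 if days >= 0 else -1)
    remaining = abs(days)
    while remaining > 0:
        d += step
        if not (d.month == 2 and d.day == 29):
            remaining -= 1
    return d


def date_window(start_year, end_year, input_size, train_size,
                offset_days, horizon_size):
    """Hypothetical helper: (START_TRAINING, START_DATE, END_DATE) from the
    formulas above, skipping Feb 29 when shifting."""
    orig_start = date(start_year, 1, 1)
    orig_end = date(end_year, 12, 31)
    back = input_size + max(0, -offset_days) + (horizon_size - 1)
    forward = max(offset_days, 0) + (horizon_size - 1)
    start_training = shift_skipping_feb29(orig_start, -(back + train_size))
    start_date = shift_skipping_feb29(orig_start, -back)
    end_date = shift_skipping_feb29(orig_end, forward)
    return start_training, start_date, end_date


# Traditional-model arguments of section 1.5 with 1 year of training
print(*date_window(1999, 2023, 365, 365, -182, 7))  # 1996-06-27 1997-06-27 2024-01-06
```

Because only non-leap days are counted, each block of 365 days moves the date back exactly one calendar year, so the results stay aligned to the same day of the month.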

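The two series are then sliced from the dataset ```df``` (a daily ```pd.DataFrame``` indexed by date):
```
import pandas as pd

# Keep rows in [START_TRAINING, END_DATE]; "pollen" is endogenous, the rest exogenous
mask = (df.index >= pd.Timestamp(START_TRAINING)) & (df.index <= pd.Timestamp(END_DATE))
en = df.loc[mask, "pollen"]
ex = df.loc[mask].drop(columns="pollen")
```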
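A small self-contained sketch of that extraction, using a synthetic daily dataset in place of the confidential one (placeholder values; the column names ```rain```, ```tmax```, ```tmin```, ```tmed``` come from the description above). It also shows that a requested window extending past the last available date is simply truncated to the rows that exist, which is why the extractable range can be shorter than \[```START_TRAINING```, ```END_DATE```\].

```python
import pandas as pd

# Synthetic stand-in for the real dataset: daily rows, a pollen column
# plus the four weather covariates (placeholder values).
idx = pd.date_range("2023-12-20", "2023-12-31", freq="D")
columns = ["pollen", "rain", "tmax", "tmin", "tmed"]
df = pd.DataFrame({col: [float(i) for i in range(len(idx))] for col in columns},
                  index=idx)

START_TRAINING = pd.Timestamp("2023-12-25")
END_DATE = pd.Timestamp("2024-01-06")  # beyond the last row of df

# Label slicing on a sorted DatetimeIndex is inclusive on both ends and
# stops at the last available date instead of raising an error.
window = df.loc[START_TRAINING:END_DATE]
en = window["pollen"]               # endogenous series (pd.Series)
ex = window.drop(columns="pollen")  # exogenous covariates (pd.DataFrame)

print(en.index.min().date(), en.index.max().date(), len(en))  # 2023-12-25 2023-12-31 7
```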

**Note**: The code still runs if the dataset does not contain all the required data in \[```START_DATE```, ```END_DATE```\] for both the endogenous and the exogenous time series. When all of it is available, we obtain ```HORIZON_SIZE``` predictions for every day from ```START_YEAR``` to ```END_YEAR```, shifted by ```OFFSET_DAYS```.

**Note**: If we don't predict using covariates, the last ```HORIZON_SIZE - 1``` rows of the dataset are not needed to make the predictions. If we do predict with covariates but those rows are missing, the predictions that depend on them are made without exogenous variables.
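The behaviour described in these notes can be sketched as a rolling-origin loop. This is a minimal sketch under stated assumptions, not the notebooks' code: ```rolling_forecasts``` is a hypothetical helper, ```forecast_fn``` is a hypothetical stand-in for a model's predict call (it receives the last ```input_size``` pollen values and, when available, the covariates for the next ```horizon``` days), and a toy last-value forecaster is used so the example runs.

```python
import pandas as pd


def rolling_forecasts(en, ex, input_size, horizon, last_target,
                      forecast_fn, use_covariates=False):
    """Hypothetical rolling-origin loop over a gap-free daily index.

    Each origin uses the previous `input_size` pollen values to predict
    `horizon` days; the last origin is the one whose final target date is
    `last_target`. Returns {first target date: list of forecasts}.
    """
    day = pd.Timedelta(days=1)
    first_target = en.index[0] + input_size * day
    last_first_target = pd.Timestamp(last_target) - (horizon - 1) * day
    results = {}
    for t0 in pd.date_range(first_target, last_first_target, freq="D"):
        context = en.loc[t0 - input_size * day : t0 - day]
        targets = pd.date_range(t0, periods=horizon, freq="D")
        future_x = None
        if use_covariates and targets.isin(ex.index).all():
            # Covariates are passed only if they cover the whole horizon;
            # otherwise this origin falls back to a prediction without them.
            future_x = ex.loc[targets]
        results[t0] = list(forecast_fn(context, future_x, horizon))
    return results


def last_value_forecast(context, future_x, horizon):
    """Toy stand-in model: repeat the last observed value (ignores future_x)."""
    return [float(context.iloc[-1])] * horizon


# Toy data ends on 2023-12-31, but forecasts are requested up to 2024-01-02,
# so the last origins are predicted without covariates.
idx = pd.date_range("2023-12-01", "2023-12-31", freq="D")
en = pd.Series(range(len(idx)), index=idx, dtype=float, name="pollen")
ex = pd.DataFrame({"rain": 0.0, "tmax": 20.0, "tmin": 10.0, "tmed": 15.0}, index=idx)

preds = rolling_forecasts(en, ex, input_size=7, horizon=3,
                          last_target="2024-01-02",
                          forecast_fn=last_value_forecast, use_covariates=True)
print(len(preds))  # 24
```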

### 1.4. Predictions CSV Example
After the predictions are obtained and reformatted, the resulting table has the following structure:

| date | pred_1 | ... | pred_h | real |
|:--:|:--:|:--:|:--:|:--:|
| Target forecast date | Forecast whose 1st prediction step falls on the target date | ... | Forecast whose *h*-th prediction step falls on the target date | Value observed on the target date |

For example, consider a ```HORIZON_SIZE``` of 1 week, an ```INPUT_SIZE``` (context) of 1 year, a ```TRAIN_SIZE``` (fitting) of 0 years, ```START_YEAR = 2000```, ```END_YEAR = 2022```, ```OFFSET_DAYS = -182```, and no covariates. Then ```START_DATE``` and ```END_DATE``` are *1998-06-27* and *2023-01-06* respectively, and the resulting CSV is:
```
date,pred_1,pred_2,pred_3,pred_4,pred_5,pred_6,pred_7,real
1999-06-27,3.75,,,,,,,4.55
1999-06-28,9.51,2.65,,,,,,7.12
1999-06-29,7.32,1.74,1.06,,,,,0.84
1999-06-30,5.99,2.60,1.37,6.48,,,,9.21
1999-07-01,1.56,3.28,0.45,8.12,5.31,,,2.17
1999-07-02,4.87,0.96,2.44,3.07,9.76,6.12,,1.03
1999-07-03,6.29,4.22,0.71,1.38,3.44,7.65,2.09,8.60
...
2022-12-31,4.08,7.11,2.53,9.44,1.65,3.86,0.47,0.92
2023-01-01,,1.46,3.29,6.51,7.40,8.80,1.00,5.72
2023-01-02,,,4.28,2.81,4.24,7.71,0.93,6.64
2023-01-03,,,,9.51,2.36,8.80,0.50,2.43
2023-01-04,,,,,2.22,0.71,0.03,4.81
2023-01-05,,,,,,6.79,0.93,0.37
2023-01-06,,,,,,,5.03,7.16
```
**Notes**
- The values are random; the real data is omitted for confidentiality reasons.
- In this example the dataset, which ends on ```LAST_DATE``` (*2023-12-31*), contains covariates up to ```END_DATE``` (*2023-01-06*), so predictions with covariates could use them for every row. With ```END_YEAR = 2023```, however, ```END_DATE``` would be *2024-01-06*; since the dataset has no data from *2024-01-01* to *2024-01-06*, the forecasts whose horizon reaches those dates would be made without covariates.
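To make the ```pred_k``` layout concrete: in row *t*, ```pred_k``` holds the *k*-step-ahead forecast whose target date is *t*, so the first row only has ```pred_1``` and the last row only ```pred_h```. A minimal sketch of how such a table could be built with pandas, assuming the forecasts are stored as a dict that maps each forecast's first target date to its list of ```HORIZON_SIZE``` values; ```to_wide_table``` is a hypothetical helper, not the notebooks' code.

```python
import pandas as pd


def to_wide_table(forecasts, real, horizon):
    """Hypothetical helper.
    forecasts: {first target date: [1-step, ..., horizon-step forecasts]}
    real: pd.Series of observed values indexed by date."""
    rows = []
    for first_target, values in forecasts.items():
        for k, value in enumerate(values, start=1):
            # The k-step-ahead forecast lands k - 1 days after the first target
            target = pd.Timestamp(first_target) + pd.Timedelta(days=k - 1)
            rows.append({"date": target, "step": f"pred_{k}", "value": value})
    long = pd.DataFrame(rows)
    wide = long.pivot(index="date", columns="step", values="value")
    # Keep pred_1..pred_h in numeric order (alphabetical order breaks at pred_10)
    wide = wide.reindex(columns=[f"pred_{k}" for k in range(1, horizon + 1)])
    wide["real"] = real.reindex(wide.index)
    return wide.reset_index()


forecasts = {"1999-06-27": [3.0, 1.0, 2.0], "1999-06-28": [4.0, 5.0, 6.0]}
real = pd.Series([1.0, 2.0, 3.0, 4.0],
                 index=pd.date_range("1999-06-27", periods=4, freq="D"))
table = to_wide_table(forecasts, real, horizon=3)
print(table.to_csv(index=False))
```

Cells without a forecast stay empty (```NaN```), which produces the staircase of empty fields at the start and end of the example CSV.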

The file name will be as follows: 
```
import os

suffix = "with_covariates" if uses_covariates else "without_covariates"
file_name = f"{model_name}_{train_size}_{input_size}_{horizon_size}_{suffix}.csv"
file_path = os.path.join(predictions_dir, file_name)
```
For example, for the previous example, if ```predictions_dir``` is *./outputs/predictions/* and ```model_name``` is *moirai.deterministic.small*, then ```file_path``` will be *./outputs/predictions/moirai.deterministic.small_0_365_7_without_covariates.csv*.

### 1.5. Arguments

For traditional models:
```
# 1 week of horizon
HORIZON_SIZE = 7

# 1 year of context
INPUT_SIZE = 365 

# 1 or 5 years of training (traditional models)
TRAIN_SIZE = 365  # use 365 * 5 for 5 years

# benchmark for 1999, 2000, ..., 2022, 2023 (25 years)
START_YEAR = 1999
END_YEAR = 2023

# half a year of offset days
OFFSET_DAYS = -182
```

- **1 year of training**: ```START_TRAINING = 1996-06-27```, ```START_DATE = 1997-06-27``` and ```END_DATE = 2024-01-06```. However, since the dataset ends on ```LAST_DATE``` (*2023-12-31*), the data we can actually extract covers \[1996-06-27, 2023-12-31\].
- **5 years of training**: ```START_TRAINING = 1992-06-27```, ```START_DATE = 1997-06-27``` and ```END_DATE = 2024-01-06```. However, since the dataset only covers \[```FIRST_DATE```, ```LAST_DATE```\], the data we can actually extract covers \[1993-01-01, 2023-12-31\].

For zero-shot models:
```
# 1 week of horizon
HORIZON_SIZE = 7

# 1 year of context
INPUT_SIZE = 365 

# 0 years of training (zero-shot, no fitting)
TRAIN_SIZE = 0

# benchmark for 1999, 2000, ..., 2022, 2023 (25 years)
START_YEAR = 1999
END_YEAR = 2023

# half a year of offset days
OFFSET_DAYS = -182
```

- **0 years of training**: ```START_TRAINING = 1997-06-27```, ```START_DATE = 1997-06-27``` and ```END_DATE = 2024-01-06```. However, since the dataset ends on ```LAST_DATE``` (*2023-12-31*), the data we can actually extract covers \[1997-06-27, 2023-12-31\].

### 1.6. Timing CSV Example
After measuring how long the model takes to fit (from ```START_TRAINING``` to ```START_DATE - 1```) and to forecast (from ```START_DATE``` to ```END_DATE```), and reformatting the results, the resulting table has the following structure:

| model_name | uses_covariates | train_size | input_size | horizon_size |  offset_days | start_year | end_year | fit_time | predict_time |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| Name of the model | Whether covariates are used for the predictions (```True```) or not (```False```) | Number of days used to train (fit) the model | Number of days the model uses as context for each prediction | Number of days the model forecasts (prediction horizon) | Offset in days between January 1st of ```START_YEAR``` and the first date that has all ```HORIZON_SIZE``` prediction steps | First year for which predictions are made, not counting the offset | Last year for which predictions are made, not counting the offset | Time the model takes to complete the fitting | Time the model takes to complete all the predictions |

An example timing CSV could look like this:

```
model_name,uses_covariates,train_size,input_size,horizon_size,offset_days,start_year,end_year,fit_time,predict_time
moirai.stochastic.base,False,0,365,7,-182,2000,2022,0.45,110.84
moirai.stochastic.large,False,0,365,7,-182,2000,2022,1.12,164.85
chronos.conservative.t5.tiny,False,0,365,7,-182,2000,2022,1.72,154.86
chronos.conservative.t5.mini,False,0,365,7,-182,2000,2022,0.98,156.92
chronos.conservative.t5.small,False,0,365,7,-182,2000,2022,0.86,193.32
chronos.conservative.t5.base,False,0,365,7,-182,2000,2022,0.88,335.94
```

The file name will be as follows: 
```
from pathlib import Path

FILE_NAME = "models_runtime.csv"
...
file_path = Path(timing_dir) / FILE_NAME
```

For example, if ```timing_dir``` is *./outputs/timing/*, then ```file_path``` will be *./outputs/timing/models_runtime.csv*.
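The timing table could be produced roughly as follows: each call is wrapped with ```time.perf_counter``` and one row per run is appended to ```models_runtime.csv```. This is a minimal sketch under those assumptions, not the notebooks' code; ```timed``` and ```append_timing_row``` are hypothetical helpers, the toy calls stand in for a real model's fit and predict, and the times it records are wall-clock seconds.

```python
import csv
import tempfile
import time
from pathlib import Path

COLUMNS = ["model_name", "uses_covariates", "train_size", "input_size",
           "horizon_size", "offset_days", "start_year", "end_year",
           "fit_time", "predict_time"]


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def append_timing_row(timing_dir, row):
    """Append one row to models_runtime.csv, writing the header only when
    the file does not exist yet."""
    file_path = Path(timing_dir) / "models_runtime.csv"
    file_path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not file_path.exists()
    with file_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
    return file_path


# Toy stand-ins for a model's fit and predict calls
_, fit_time = timed(lambda: None)
_, predict_time = timed(sorted, range(100_000))

timing_dir = tempfile.mkdtemp()  # e.g. "./outputs/timing/" in practice
path = append_timing_row(timing_dir, {
    "model_name": "moirai.stochastic.base", "uses_covariates": False,
    "train_size": 0, "input_size": 365, "horizon_size": 7,
    "offset_days": -182, "start_year": 2000, "end_year": 2022,
    "fit_time": round(fit_time, 2), "predict_time": round(predict_time, 2),
})
print(path.read_text())
```

Appending instead of rewriting lets every notebook add its own rows to the same file, so all models end up in one table for the benchmark.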

## Useful resources
If you want to go deeper, here are some useful resources:

**Medium tutorial for working in sktime with Foundation Models**:
- [Forecasting using foundation models and sktime](https://medium.com/@benedikt_heidrich/forecasting-using-foundation-models-and-sktime-4d5a09909742)

**Sktime Documentation**:
- [Forecasting with sktime](https://www.sktime.net/en/stable/examples/01_forecasting.html). In-depth guide to forecasting with sktime.
- [Forecasting Pipelines, Tuning, and AutoML](https://www.sktime.net/en/stable/examples/03b_forecasting_transformers_pipelines_tuning.html). Notebook about pipelining and tuning (grid search) for time series forecasting with sktime.
- [Global Forecasting](https://www.sktime.net/en/latest/examples/01c_forecasting_hierarchical_global.html#Global-forecasting-in-sktime). Explains what global forecasting is.

**Sktime Models API Reference**:
- [Moirai](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.moirai_forecaster.MOIRAIForecaster.html)
- [Chronos](https://www.sktime.net/en/stable/api_reference/auto_generated/sktime.forecasting.chronos.ChronosForecaster.html)
- [NHiTS](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.pytorchforecasting.PytorchForecastingNHiTS.html)
- [NBeats](https://www.sktime.net/en/stable/api_reference/auto_generated/sktime.forecasting.pytorchforecasting.PytorchForecastingNBeats.html#sktime.forecasting.pytorchforecasting.PytorchForecastingNBeats.predict)

**Sktime Table of Estimators**:
- [Estimators Overview sktime](https://www.sktime.net/en/latest/estimator_overview.html#filter=forecaster&tags=%7B%22capability%3Acategorical_in_X%22%3Afalse%2C%22capability%3Ainsample%22%3Afalse%2C%22capability%3Apred_int%22%3Afalse%2C%22capability%3Apred_int%3Ainsample%22%3Afalse%2C%22capability%3Amissing_values%22%3Afalse%2C%22ignores-exogeneous-X%22%3Atrue%2C%22scitype%3Ay%22%3Afalse%2C%22requires-fh-in-fit%22%3Afalse%2C%22X-y-must-have-same-index%22%3Afalse%2C%22python_dependencies%22%3Afalse%2C%22authors%22%3Afalse%2C%22maintainers%22%3Afalse%7D)

**Sktime API Reference**:
- [Pre-trained and foundation models](https://www.sktime.net/en/latest/api_reference/forecasting.html#pre-trained-and-foundation-models)

**➡️ [Next notebook: 02-01_Forecasting_with_Moirai](../notebooks/02-01_Forecasting_with_Moirai.ipynb)**