In [1]:
from futureexpert import ExpertClient
from futureexpert.forecast import export_result_overview_to_pandas, export_forecasts_to_pandas, export_forecasts_with_overview_to_pandas
from futureexpert.plot import plot_time_series, plot_forecast, plot_backtesting

# Working with forecast results
While forecasts are calculated for your time series, a lot of additional information about your data is gathered. All of this information is available in the objects returned by `get_fc_results()`. This notebook shows you:
- export functions for result overviews
- plotting functions
- ways to extract specific information
- a detailed breakdown of the overall result structure
- explanations of the individual results
- a summary table of the available forecasting methods, with information about each method

# Differentiating between MATCHER and FORECAST reports
First, we initialize the client. Then we retrieve the 10 most recent reports by calling `get_reports()`. This list includes all FORECAST and MATCHER reports, with the newest report as the first element. Since this notebook works exclusively with forecasts, we use `get_report_type()` to identify FORECAST reports by their string representation `MongoForecastingResultSink` (MATCHER reports are identified by `CovariateSelection`).

In [None]:
client = ExpertClient()
last_reports = client.get_reports(limit=10)
most_recent_fc_report = None  # stays None if no FORECAST report is among the last 10
for report in last_reports:
    if client.get_report_type(report.report_id) == 'MongoForecastingResultSink':
        most_recent_fc_report = report
        break
most_recent_fc_report
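
If none of the last 10 reports is a FORECAST report, the loop above finds nothing. The following minimal sketch shows the same selection with `next()` and a default of `None`, which makes that case explicit. `Report` and `fake_report_types` are hypothetical stand-ins for the objects returned by `get_reports()` and for `get_report_type()`, so the snippet runs without an `ExpertClient` or any credentials.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

FORECAST_TYPE = 'MongoForecastingResultSink'
MATCHER_TYPE = 'CovariateSelection'


@dataclass
class Report:
    """Hypothetical stand-in for one entry returned by get_reports()."""
    report_id: int


def newest_report_of_type(reports: Iterable[Report],
                          get_type: Callable[[int], str],
                          wanted: str = FORECAST_TYPE) -> Optional[Report]:
    """Return the first (i.e. newest) report of the wanted type, or None."""
    return next((r for r in reports if get_type(r.report_id) == wanted), None)


# Hypothetical report types; the list is ordered newest first.
fake_report_types = {42: MATCHER_TYPE, 41: FORECAST_TYPE, 40: FORECAST_TYPE}
reports = [Report(report_id=i) for i in (42, 41, 40)]

print(newest_report_of_type(reports, fake_report_types.__getitem__))      # Report(report_id=41)
print(newest_report_of_type(reports[:1], fake_report_types.__getitem__))  # None
```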

# Get results
We then use `get_fc_results()` to retrieve the results of this report, requesting only the three best models per time series (`include_k_best_models=3`) together with the backtesting results (`include_backtesting=True`). This function always returns a list of `ForecastResult` objects with one element per time series, even if your report contains only a single time series.
Remember that results are stored for only seven days after they have been calculated. In the following steps, we look at the first element of the list.

In [None]:
forecast_results = client.get_fc_results(id=most_recent_fc_report.report_id, include_backtesting=True, include_k_best_models=3)
individual_forecast_result = forecast_results[0]
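
Because `get_fc_results()` returns one element per time series, it can be convenient to look up results by time series name instead of by list position. The sketch below builds such a lookup. It relies only on the attribute path `input.actuals.name`, which is used later in this notebook; `fake_result()` is a hypothetical stand-in built with `SimpleNamespace`, and the sketch assumes that names are unique within a report (with groupings this may not hold, in which case a different key is needed).

```python
from types import SimpleNamespace


def index_results_by_name(results):
    """Map each time series name to its result object.

    Assumes names are unique within one report; a later duplicate
    would silently overwrite an earlier one.
    """
    return {result.input.actuals.name: result for result in results}


def fake_result(name):
    # Hypothetical stand-in exposing only result.input.actuals.name.
    return SimpleNamespace(input=SimpleNamespace(actuals=SimpleNamespace(name=name)))


# The result is always a list, even for a report with a single time series.
results = [fake_result('Sales A'), fake_result('Sales B')]
by_name = index_results_by_name(results)
print(sorted(by_name))                   # ['Sales A', 'Sales B']
print(by_name['Sales B'] is results[1])  # True
```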

# Export functions
These functions export the most important features of the results to pandas data frames. For every time series in the report, the tables contain information about its best model. The functions are:
- `export_result_overview_to_pandas()`: name, grouping, a few selected time series characteristics, and model information
- `export_forecasts_to_pandas()`: `name`, `time_stamp_utc`, `point_forecast_value`, `lower_limit_value`, `upper_limit_value`
- `export_forecasts_with_overview_to_pandas()`: the forecasts and the overview combined

In [None]:
overview = export_result_overview_to_pandas(forecast_results)
overview.head(5)

In [None]:
fc = export_forecasts_to_pandas(forecast_results)
fc.head(5)

In [None]:
fc_with_overview = export_forecasts_with_overview_to_pandas(forecast_results)
fc_with_overview.head(5)
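
Conceptually, the combined export attaches the overview columns to every forecast row of the same time series. The following stdlib sketch illustrates this idea with plain dictionaries. It assumes that both tables can be joined on the `name` column, which is an assumption about how the library combines them; the `model` column and all row values are made up for illustration.

```python
def join_forecasts_with_overview(forecast_rows, overview_rows, key='name'):
    """Attach the overview columns to every forecast row with the same key.

    Forecast rows without a matching overview row are kept unchanged
    (a left join). On column clashes, the forecast row's value wins.
    """
    overview_by_key = {row[key]: row for row in overview_rows}
    return [{**overview_by_key.get(fc_row[key], {}), **fc_row} for fc_row in forecast_rows]


# Made-up example rows; 'model' is a hypothetical overview column.
overview_rows = [{'name': 'Sales A', 'model': 'Es'}]
forecast_rows = [
    {'name': 'Sales A', 'time_stamp_utc': '2024-01-01', 'point_forecast_value': 10.0},
    {'name': 'Sales A', 'time_stamp_utc': '2024-02-01', 'point_forecast_value': 12.5},
]
for row in join_forecasts_with_overview(forecast_rows, overview_rows):
    print(row)
```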

## Plots
Several plotting functions enable you to visualize different aspects of your data for one time series at a time:
- The input time series (with covariates, if available)
- The forecasts and other results (outliers, missing values, ...) for your forecast models
- The backtesting results for your forecasts

In [None]:
covariates = individual_forecast_result.input.covariates
plot_time_series(ts=individual_forecast_result.input.actuals, plot_last_x_data_points_only=50, covariate=covariates[0] if covariates else None)

In [None]:
plot_forecast(result=individual_forecast_result, plot_last_x_data_points_only=50, ranks=[1,2,3])

In [None]:
plot_backtesting(result=individual_forecast_result, plot_last_x_data_points_only=50, ranks=[1,2,3], iteration=7)

# Structure of results
Each result consists of the following fields:
- `input`
- `ts_characteristics`
- `changed_start_date`
- `changed_values`
- `models`

We will now look at the most important elements of each of these fields and different ways to access them.

A comprehensive structure for all fields can be found in the [API docs](https://discovertomorrow.github.io/futureEXPERT/forecast.html#futureexpert.forecast.ForecastResult).

## Input
The `input` field contains information about the actuals and covariates. These include the values, name, and other metadata.

In [None]:
name = individual_forecast_result.input.actuals.name
grouping = individual_forecast_result.input.actuals.grouping
values = individual_forecast_result.input.actuals.values
covariate_names = [cov.ts.name for cov in individual_forecast_result.input.covariates]
print(f'The name of the time series is {name}, and the following grouping information was defined: {grouping}.')
print(f'The last 20 values of the time series are: {values[-20:]}.')
print(f'The following covariates were used in the method selection: {covariate_names}. Covariates that have been used in the forecast creation can be found in the model results.')

## Time series characteristics
The `ts_characteristics` field provides a set of characteristics for each time series, determined during preprocessing for forecast creation:

In [None]:
individual_forecast_result.ts_characteristics

## Changed values
The fields `changed_start_date` and `changed_values` contain information about changes made during preprocessing. For each modification, they include a `change_reason`, such as removed leading zeros, replaced outliers, or handled missing values.
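
To get a quick impression of what preprocessing changed, the entries can be counted by their `change_reason`. The sketch below uses a hypothetical `ChangedValue` dataclass as a stand-in: only the `change_reason` attribute comes from the text above, while the other field and the reason strings (`'outlier'`, `'na_value'`) are invented for illustration and may differ from the actual values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ChangedValue:
    """Hypothetical stand-in for one entry of `changed_values`."""
    time_stamp_utc: str
    change_reason: str


def count_change_reasons(changes):
    """Count how often each change_reason occurs."""
    return Counter(change.change_reason for change in changes)


changes = [
    ChangedValue('2023-03-01', 'outlier'),
    ChangedValue('2023-07-01', 'na_value'),
    ChangedValue('2023-09-01', 'outlier'),
]
print(count_change_reasons(changes).most_common())  # [('outlier', 2), ('na_value', 1)]
```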

## Forecasting results
The `models` field contains a list of calculated models for the individual time series, including general information, backtesting forecasts, and future predicted values.

In [None]:
model_overview = []
for model in individual_forecast_result.models:
    model_information = {}
    model_information['name'] = model.model_name
    model_information['rank'] = model.model_selection.ranking.rank_position
    model_information['covariates'] = model.covariates
    model_overview.append(model_information)
model_overview
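
Each model carries its rank in `model_selection.ranking.rank_position`, so the best model can be picked with `min()` and the models can be ordered with `sorted()`. The sketch below assumes that rank 1 is the best model (as in `ranks=[1,2,3]` in the plotting calls) and uses `SimpleNamespace` stand-ins that mimic only the attribute paths accessed in the cell above; the model names and ranks are made up.

```python
from types import SimpleNamespace


def rank_of(model):
    return model.model_selection.ranking.rank_position


def best_model(models):
    """Return the model with the lowest rank_position (assumed: rank 1 = best)."""
    return min(models, key=rank_of)


def fake_model(name, rank, covariates=()):
    # Hypothetical stand-in mimicking the attribute paths used above.
    ranking = SimpleNamespace(rank_position=rank)
    return SimpleNamespace(model_name=name, covariates=list(covariates),
                           model_selection=SimpleNamespace(ranking=ranking))


models = [fake_model('Arima', 2), fake_model('Es', 1), fake_model('Naive', 3)]
print(best_model(models).model_name)                        # Es
print([m.model_name for m in sorted(models, key=rank_of)])  # ['Es', 'Arima', 'Naive']
```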

## Available Forecasting Methods

### Covariate extension strategy
If the provided covariates are shorter than the forecast horizon, a non-covariate forecasting method is used
to extend the forecast beyond the available horizon of the covariates. This is indicated in the model name:
`[Method] extended by [ExtensionStrategy]`.
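
Model names in this format can be split back into the base method and the extension strategy, for example to group results by base method. A minimal sketch, assuming the name contains the literal separator ` extended by ` exactly as written above; `split_extended_model_name()` is a hypothetical helper and the example name is illustrative.

```python
def split_extended_model_name(model_name: str):
    """Split '[Method] extended by [ExtensionStrategy]' into its two parts.

    Returns (model_name, None) if the name carries no extension strategy.
    """
    method, sep, strategy = model_name.partition(' extended by ')
    return (method, strategy) if sep else (model_name, None)


print(split_extended_model_name('Arima extended by SmoothExtensionStrategy'))
# ('Arima', 'SmoothExtensionStrategy')
print(split_extended_model_name('Es'))
# ('Es', None)
```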

The SmoothExtensionStrategy is a forecasting strategy that integrates the covariate-based forecast into the non-covariate forecasting method to produce a forecast free of structural breaks:

   1. Compute a forecast using all available external covariates for the forecast steps where they are available.
   2. Extend the actuals with the values from this forecast.
   3. Generate a forecast for the remaining forecast steps using a base model that does not rely
      on short external covariates, applied to the extended actuals.
   4. Combine the two forecasts: Use the covariate-based model for all forecast steps where covariates are available,
      and append the forecast from the base model for the remaining higher forecast steps.
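
The four steps can be sketched as follows. The two models are deliberately simple stand-ins, not the methods futureEXPERT uses: a hypothetical covariate model `y = beta * x` fitted by least squares through the origin, and a flat moving average of the last three values as the base model. With these numbers, the base model applied to the original actuals would forecast 14.0; applied to the extended actuals (step 2), it forecasts 18.0, which connects smoothly to the covariate-based part.

```python
from statistics import fmean


def covariate_forecast(actuals, covariate_history, covariate_future):
    """Toy covariate model: y = beta * x, beta fitted by least squares through the origin."""
    beta = (sum(y * x for y, x in zip(actuals, covariate_history))
            / sum(x * x for x in covariate_history))
    return [beta * x for x in covariate_future]


def base_forecast(actuals, steps, window=3):
    """Toy base model without covariates: flat moving average of the last `window` values."""
    level = fmean(actuals[-window:])
    return [level] * steps


def smooth_extension(actuals, covariate_history, covariate_future, horizon):
    # 1. Covariate-based forecast for the steps where covariates are available.
    known_steps = min(len(covariate_future), horizon)
    cov_fc = covariate_forecast(actuals, covariate_history, covariate_future[:known_steps])
    # 2. Extend the actuals with this forecast.
    extended_actuals = list(actuals) + cov_fc
    # 3. Forecast the remaining steps with the base model on the extended actuals.
    rest_fc = base_forecast(extended_actuals, horizon - known_steps)
    # 4. Covariate-based steps first, then the base-model steps.
    return cov_fc + rest_fc


actuals = [10.0, 12.0, 14.0, 16.0]
covariate_history = [5.0, 6.0, 7.0, 8.0]
covariate_future = [9.0, 10.0]  # covariates cover only 2 of the 4 forecast steps
print(smooth_extension(actuals, covariate_history, covariate_future, horizon=4))
# [18.0, 20.0, 18.0, 18.0]
```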
      

### Modeling seasonalities for ML methods

To make seasonal modeling possible for machine learning models, artificial covariates are generated based on reasonable seasonal patterns for the given granularity.
If a seasonality is detected in the time series or provided by the user, the potential seasonal lengths are narrowed down to multiples or divisors of the identified seasonality.
For multiple seasonalities, only the first or most significant one is used.
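
The following sketch illustrates both ideas under stated assumptions: the candidate season lengths for monthly data are hypothetical, and one-hot indicators of the position within the season are just one common way to encode seasonality as covariates; the encoding futureEXPERT actually generates is not specified here.

```python
def narrow_season_lengths(candidates, detected):
    """Keep candidate season lengths that are divisors or multiples of `detected`."""
    return [c for c in candidates if detected % c == 0 or c % detected == 0]


def seasonal_dummies(n_points, season_length):
    """One-hot indicators of the position within the season for each time step."""
    return [[1 if t % season_length == k else 0 for k in range(season_length)]
            for t in range(n_points)]


# Hypothetical candidate lengths for monthly data; 12 is the detected seasonality.
monthly_candidates = [2, 3, 4, 5, 6, 12, 24]
print(narrow_season_lengths(monthly_candidates, detected=12))  # [2, 3, 4, 6, 12, 24]
for row in seasonal_dummies(n_points=5, season_length=4):
    print(row)
```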


### Model Overview

||Method name|Category|Ability to capture seasonal patterns|Works with covariates|Source|Additional information|
|--:|:--|:--|:--|:--|:--|:--|
|0|AdaBoost|machine learning|True|True|[scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostRegressor.html)|Forecasts are generated using AdaBoost, a [Gradient Boosting Decision Tree Algorithm](https://www.future-forecasting.de/kb/ov_machine-learning/#gradient-boosting-decision-tree-algorithm), with a selected set of lagged actual values and engineered covariates, including trend and/or seasonal indicators.|
|1|Aft4Sporadic|statistical, intermittent|True|False|prognostica’s proprietary development<br><br>Uses the survival analysis implementation available in the XGBoost package (docs are available [here](https://xgboost.readthedocs.io/en/stable/tutorials/aft_survival_analysis.html)). In addition to XGBoost’s standard functionality, the Aft4Sporadic implementation estimates the scale parameter, sigma, of the residual distribution and automatically selects the best-fitting residual distribution.|Applies only to sporadic time series. This method performs best for highly intermittent time series whose nonzero values fluctuate around a constant level (i.e., no trend), and for long forecast horizons. It employs a non-linear, GBT-based survival regression to predict when the next nonzero value will occur. This regression incorporates seasonal patterns (in the sporadicity patterns!) and considers the time elapsed since the most recent nonzero values (from historical data) to estimate the interval from the current date (end of the time series) to the next expected nonzero occurrence. An average of the most recent nonzero values provides the predicted level of the next nonzero value. The approach is repeated recursively to provide forecasts for the entire forecast horizon.<br><br>Currently, external covariates are not supported.|
|2|Arima|statistical|True|True|[pmdarima](https://alkaline-ml.com/pmdarima/)|[ARIMA - _future_ knowledge base](https://www.future-forecasting.de/kb/ov_time-series-forecasting/#autoregressive-integrated-moving-average-arima)|
|3|AutoEsCov|statistical|True, up to 3 `season_length`s at once|True|prognostica’s in-house development|[Exponential smoothing with covariates - _future_ knowledge base](https://www.future-forecasting.de/kb/ov_time-series-forecasting/#exponentielleglaettung-id)|
|4|Cart|machine learning|True|True|[scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html)|Forecasts are generated using [Classification And Regression Tree (CART)](https://www.future-forecasting.de/kb/ov_machine-learning/#cart-id) with a selected set of lagged actual values and engineered covariates, including trend and/or seasonal indicators.|
|5|CatBoost|machine learning|True|True|[catboost](https://github.com/catboost/catboost)|Forecasts are generated using CatBoost, a [Gradient Boosting Decision Tree Algorithm](https://www.future-forecasting.de/kb/ov_machine-learning/#gradient-boosting-decision-tree-algorithm), with a selected set of lagged actual values and engineered covariates, including trend and/or seasonal indicators.|
|6|Croston|statistical, intermittent|False|False|prognostica's implementation, builds on statsmodels ES.|[Croston - _future_ knowledge base](https://www.future-forecasting.de/kb/ov_time-series-forecasting/#croston-id)|
|7|Ensemble|hybrid|True (depending on the base models)|True (depending on the base models)|prognostica's implementation|[Ensemble methods - _future_ knowledge base](https://www.future-forecasting.de/kb/ov_time-series-forecasting/#ensemblemethoden-id)|
|8|Es|statistical|True|False|[statsmodels](https://www.statsmodels.org/dev/examples/notebooks/generated/exponential_smoothing.html)|[Exponential smoothing - _future_ knowledge base](https://www.future-forecasting.de/kb/ov_time-series-forecasting/#exponentielleglaettung-id)|
|9|ExtraTrees|machine learning|True|True|[scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html)|Forecasts are generated using ExtraTrees, an extremely randomized trees algorithm (based on [Random Forests](https://www.future-forecasting.de/kb/ov_machine-learning/#randomforest-id)), with a selected set of lagged actual values and engineered covariates, including trend and/or seasonal indicators.|
|10|Glmnet|machine learning|True|True|[scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html)|Forecasts are generated using [Regularized Regression](https://www.future-forecasting.de/kb/ov_regression/#regularizedregression-id) with a selected set of lagged actual values and engineered covariates, including trend and/or seasonal indicators.|
|11|GranularityMovingAverage|statistical|False|False|prognostica's implementation|[Moving average - _future_ knowledge base](https://www.future-forecasting.de/kb/ov_time-series-forecasting/#gleitendermittelwert-id)|
|12|InterpolId|statistical, intermittent|True|False|prognostica’s proprietary development|Forecasting based on a generalized Croston approach.<br>The components of the decomposed time series are analyzed independently, allowing the use of different models for each component. InterpolId considers trends and seasonal patterns in both the inter-demand interval lengths and the demand values.|
|13|LightGbm|machine learning|True|True|[LightGBM](https://github.com/microsoft/LightGBM)|Forecasts are generated using LightGBM with a selected set of lagged actual values and engineered covariates, including trend and/or seasonal indicators.|
|14|LinearRegression|machine learning|True|True|[scikit-learn](https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.LinearRegression.html)|Forecasts are generated using [Linear Regression](https://www.future-forecasting.de/kb/ov_regression/#linearregression) with a selected set of lagged actual values and engineered covariates, including trend and/or seasonal indicators.|
|15|MedianAs|statistical|True|False|prognostica’s proprietary development|Based on the main seasonality of the time series, this method estimates average demand patterns within identical seasonal periods (e.g., monthly or weekly values). This method is well suited for time series with additive seasonalities or linear trends. However, the model does not accommodate covariate modeling and is not suited for multiple or multiplicative seasonalities, or non-linear trends.|
|16|MedianPattern|statistical, intermittent|True|False|prognostica’s proprietary development|This model is a variant of the MedianAs model. However, unlike MedianAs, the pattern length here is not determined by the seasonality of the data. Instead, it is set to a fixed length based on the data granularity (e.g., 12 for monthly data), assuming no trend component in the model. This approach is especially effective for forecasting sporadic time series in a cyclical manner, even when the data lacks clear seasonal behavior.|
|17|MLP|machine learning|True|True|[scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html)|Forecasts are generated using an MLP (multilayer perceptron) with a selected set of lagged actual values and engineered covariates, including trend and/or seasonal indicators.|
|18|MostCommonValue|statistical|False|False|prognostica's implementation|The most common value in the data history is used as the forecast for all forecast steps.|
|19|MovingAverage|statistical|False|False|prognostica's implementation|[Moving average - _future_ knowledge base](https://www.future-forecasting.de/kb/ov_time-series-forecasting/#gleitendermittelwert-id)|
|20|Naive|statistical|True|False|prognostica's implementation|[Naive method - _future_ knowledge base](https://www.future-forecasting.de/kb/ov_time-series-forecasting/#naiveprognose-id)|
|21|RandomForest|machine learning|True|True|[scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)|Forecasts are generated using [Random forest](https://www.future-forecasting.de/kb/ov_machine-learning/#randomforest-id) with a selected set of lagged actual values and engineered covariates, including trend and/or seasonal indicators.|
|22|SeasonLagMovingAverage|statistical|False|False|prognostica's implementation|[Moving average - _future_ knowledge base](https://www.future-forecasting.de/kb/ov_time-series-forecasting/#gleitendermittelwert-id)|
|23|Svm|machine learning|True|True|[scikit-learn](https://scikit-learn.org/1.5/modules/svm.html)|Forecasts are generated using [Support Vector Machines (SVM)](https://www.future-forecasting.de/kb/ov_machine-learning/#svm-id) with a selected set of lagged actual values and engineered covariates, including trend and/or seasonal indicators.|
|24|Tbats|statistical|True, also multiple `season_length`s|False|[tbats](https://github.com/intive-DataScience/tbats)|[Tbats - _future_ knowledge base](https://www.future-forecasting.de/kb/ov_time-series-forecasting/#tbats-id)|
|25|Theta|statistical|True|False|[statsmodels](https://www.statsmodels.org/dev/generated/statsmodels.tsa.forecasting.theta.ThetaModel.html)||
|26|Tsb|statistical, intermittent|False|False|prognostica's implementation, based on [statsmodels ES](https://www.statsmodels.org/dev/examples/notebooks/generated/exponential_smoothing.html)|[TSB - _future_ knowledge base](https://www.future-forecasting.de/kb/ov_time-series-forecasting/#tsb-id)|
|27|XGBoost|machine learning|True|True|[dmlc XGBoost](https://github.com/dmlc/xgboost/)|Forecasts are generated using [Gradient Boosting Decision Tree Algorithm](https://www.future-forecasting.de/kb/ov_machine-learning/#gradient-boosting-decision-tree-algorithm) with a selected set of lagged actual values and engineered covariates, including trend and/or seasonal indicators.|
|28|ZeroFc|statistical|False|False|prognostica's implementation|All forecast values are zero.|
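
Several of the simpler methods in the table have well-known textbook definitions. The sketch below implements textbook versions of Naive, MovingAverage, MostCommonValue, ZeroFc, and Croston's method so their behavior can be checked on a small sporadic series. These are not prognostica's implementations: the window size, the smoothing parameter `alpha`, and the initialization of Croston's method are assumptions, and seasonal variants (such as a seasonal naive forecast) are omitted.

```python
from collections import Counter
from statistics import fmean


def naive(actuals, horizon):
    """Repeat the last observed value."""
    return [actuals[-1]] * horizon


def moving_average(actuals, horizon, window=3):
    """Repeat the mean of the last `window` values."""
    return [fmean(actuals[-window:])] * horizon


def most_common_value(actuals, horizon):
    """Repeat the most frequent value of the history."""
    return [Counter(actuals).most_common(1)[0][0]] * horizon


def zero_fc(actuals, horizon):
    """Forecast zero for every step."""
    return [0.0] * horizon


def croston(actuals, horizon, alpha=0.1):
    """Textbook Croston's method for intermittent demand.

    Nonzero demand sizes and the intervals between them are smoothed
    separately with simple exponential smoothing; the forecast is
    size / interval, constant over the horizon.
    """
    size = interval = None
    periods_since_demand = 1
    for y in actuals:
        if y != 0:
            if size is None:  # initialise with the first nonzero demand
                size, interval = y, periods_since_demand
            else:
                size += alpha * (y - size)
                interval += alpha * (periods_since_demand - interval)
            periods_since_demand = 1
        else:
            periods_since_demand += 1
    if size is None:  # no demand at all in the history
        return [0.0] * horizon
    return [size / interval] * horizon


sporadic = [0, 0, 4, 0, 0, 0, 6, 0, 5]
print(naive(sporadic, 2))              # [5, 5]
print(most_common_value(sporadic, 2))  # [0, 0]
print(moving_average(sporadic, 2))
print(croston(sporadic, 2, alpha=0.5))
```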