# Monthly forecasts using economic indicators as covariates
In this example, we have sales data for which we want to create monthly forecasts. We also have a set of macro-economic indicators that we expect to strongly influence future sales. We want to find out which of these indicators improve the forecast the most and then create forecasts for the next 10 months using these indicators.
We need to do the following:

1. Preprocess data and create time series for both the covariates and the forecasting object (_CHECK-IN_).
    - For a first impression, we exclude the "Consumer index" covariate.
2. Find the best lags for each covariate (_MATCHER_).
3. Create forecasts (_FORECAST_).


For detailed documentation on the functionality, limitations and configurations of _MATCHER_, see the notebook [Cov MATCHER and FORECAST](notebooks/cov_matcher_and_forecast.ipynb).

For general documentation on _EXPERT_ (user credentials, _CHECK-IN_, _FORECAST_, ...), see the notebook [Getting Started](getting_started.ipynb).

# Initialize client

```python
from futureexpert import (DataDefinition,
                          ExpertClient,
                          FileSpecification,
                          FilterSettings,
                          ForecastingConfig,
                          LagSelectionConfig,
                          MatcherConfig,
                          MethodSelectionConfig,
                          ReportConfig,
                          TsCreationConfig)

# Enter your _EXPERT_ credentials (see the Getting Started notebook)
client = ExpertClient(user='', password='')
```

# Step 1: Prepare time series with _CHECK-IN_
We prepare time series from the raw data using _CHECK-IN_. We need to make sure that:
- The columns are defined correctly (date, value, and group columns).
- The data formats are correct (delimiter, decimal, date format).
- The covariates and the forecasting object have no missing values after the preparation (otherwise _MATCHER_ cannot calculate a result).
- All covariates and the forecasting object share the same granularity (in this case: monthly).
- The forecasting object has at least 78 data points.
- All covariates have at least 96 data points.
- A single file or data frame contains all covariates.
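Before running _CHECK-IN_, it can help to verify some of these requirements locally. The following is a minimal plain-Python sketch, not part of the `futureexpert` API: `check_series` and `is_monthly` are hypothetical helpers, a series is assumed to be a dict from month-start dates to values (with `None` marking a missing value), and the thresholds are the ones from the list above.

```python
from datetime import date
from typing import Dict, List, Optional

# Thresholds from the checklist above
MIN_POINTS_TARGET = 78
MIN_POINTS_COVARIATE = 96


def is_monthly(dates: List[date]) -> bool:
    """True if all dates are month starts and consecutive dates are exactly one month apart."""
    if any(d.day != 1 for d in dates):
        return False
    for prev, cur in zip(dates, dates[1:]):
        if (cur.year - prev.year) * 12 + (cur.month - prev.month) != 1:
            return False
    return True


def check_series(values: Dict[date, Optional[float]], min_points: int) -> List[str]:
    """Hypothetical pre-check: return the problems found (an empty list means none)."""
    problems = []
    dates = sorted(values)
    if len(dates) < min_points:
        problems.append(f"only {len(dates)} data points, need at least {min_points}")
    if any(v is None for v in values.values()):
        problems.append("contains missing values")
    if not is_monthly(dates):
        problems.append("not a gap-free monthly series")
    return problems


# Usage: a synthetic series of 80 consecutive month starts from 2008-01-01
sales_series = {}
year, month = 2008, 1
for _ in range(80):
    sales_series[date(year, month, 1)] = 1.0
    year, month = (year + 1, 1) if month == 12 else (year, month + 1)

print(check_series(sales_series, MIN_POINTS_TARGET))     # -> []
print(check_series(sales_series, MIN_POINTS_COVARIATE))  # -> ['only 80 data points, need at least 96']
```

The same 80 points are enough for the forecasting object but too few for a covariate, which is why the two thresholds are checked separately.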

```python
import futureexpert.checkin as checkin

# Check in the covariates: one time series per indicator, grouped by the "name" column
covs_check_in_results = client.check_in_time_series(raw_data_source='../example_data/monthly_business_inds.csv',
                                                    data_definition=DataDefinition(date_columns=checkin.DateColumn(name='time', format='%Y-%m-%d'),
                                                                                   value_columns=[
                                                                                       checkin.ValueColumn(name='value')],
                                                                                   group_columns=[checkin.GroupColumn(name="name")]),
                                                    config_ts_creation=TsCreationConfig(time_granularity='monthly',
                                                                                        value_columns_to_save=['value'],
                                                                                        grouping_level=["name"],
                                                                                        missing_value_handler="setToZero",
                                                                                        start_date="2007-01-01",
                                                                                        # Exclude the "Consumer index" for a first impression
                                                                                        filter=[FilterSettings(type="exclusion", variable="name", items=["Consumer index"])]),
                                                    file_specification=FileSpecification(delimiter=',', decimal='.'))
```
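Two settings in this call may not be self-explanatory. The sketch below emulates, on a few invented rows, what we assume they do: the exclusion filter drops every row whose `name` is listed in `items`, `start_date` cuts off earlier months, and `missing_value_handler="setToZero"` fills months without a value with 0, so the covariates have no missing values afterwards. `build_series` is a hypothetical helper; the sketch does not reproduce how _CHECK-IN_ is actually implemented.

```python
from datetime import date

# Invented long-format rows (time, name, value); not taken from monthly_business_inds.csv
rows = [
    (date(2006, 12, 1), "Business index", 98.7),   # before the start date, dropped
    (date(2007, 1, 1), "Business index", 101.2),
    (date(2007, 3, 1), "Business index", 99.8),    # February is missing
    (date(2007, 1, 1), "Consumer index", 95.0),
    (date(2007, 2, 1), "Consumer index", 96.1),
]


def next_month(d: date) -> date:
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def build_series(rows, excluded, start, end):
    """Hypothetical emulation: group by name, apply the exclusion filter and fill gaps with zero."""
    series = {}
    for t, name, value in rows:
        if name in excluded or not start <= t <= end:  # exclusion filter and start date cut-off
            continue
        series.setdefault(name, {})[t] = value
    for values in series.values():
        d = start
        while d <= end:
            values.setdefault(d, 0.0)  # assumed effect of missing_value_handler="setToZero"
            d = next_month(d)
    return series


result = build_series(rows, excluded={"Consumer index"}, start=date(2007, 1, 1), end=date(2007, 3, 1))
print(sorted(result["Business index"].items()))
```

Note that filling gaps with zero is a strong choice for index-like indicators; it satisfies the "no missing values" requirement but can distort the covariate if gaps are frequent.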

```python
# Check in the sales data (the forecasting object)
ts_check_in_results = client.check_in_time_series(raw_data_source='../example_data/example_customer_data.csv',
                                                  data_definition=DataDefinition(date_columns=checkin.DateColumn(name='month_start', format='%Y-%m-%d'),
                                                                                 value_columns=[checkin.ValueColumn(name='value')]),
                                                  config_ts_creation=TsCreationConfig(time_granularity='monthly',
                                                                                      start_date="2008-01-01",
                                                                                      value_columns_to_save=['value']),
                                                  file_specification=FileSpecification(delimiter=',', decimal='.'))
```

# Step 2: Find the best lags per covariate using _MATCHER_

We restrict the lag selection to lags between 0 and 6 instead of using the default range for monthly data, which is -2 to 6.

```python
config_matcher = MatcherConfig(title='Covariate selection for sales data and macro-economic indicators',
                               actuals_version=ts_check_in_results.version_id,
                               covs_versions=[covs_check_in_results.version_id],
                               # Only consider lags from 0 to 6 months
                               lag_selection=LagSelectionConfig(min_lag=0, max_lag=6))

matcher_identifier = client.start_matcher(config=config_matcher)
```
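A lag of k means that the sales value of month t is paired with the covariate value of month t - k, so a positive lag lets the indicator lead the sales. The sketch below illustrates this pairing and picks the lag with the highest absolute Pearson correlation. This is a simple stand-in chosen for illustration, not the selection procedure _MATCHER_ uses; the helper names and the synthetic series are hypothetical, and both series are assumed to cover the same months.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_pairs(target, covariate, lag):
    """Align target[t] with covariate[t - lag]; both lists cover the same months."""
    n = len(target)
    if lag >= 0:
        return covariate[: n - lag], target[lag:]
    return covariate[-lag:], target[: n + lag]


def best_lag(target, covariate, lags=range(0, 7)):
    """Illustrative stand-in for lag selection: maximize the absolute correlation."""
    scores = {lag: abs(pearson(*lagged_pairs(target, covariate, lag))) for lag in lags}
    return max(scores, key=scores.get)


# Synthetic data: sales follow the indicator with a delay of three months
indicator = [(7 * i) % 11 + 0.1 * i for i in range(48)]
sales = [0.0, 0.0, 0.0] + [2 * c + 5 for c in indicator[:-3]]
print(best_lag(sales, indicator))  # -> 3
```

Passing `lags=range(-2, 7)` corresponds to the default monthly range mentioned above; negative lags pair sales with covariate values from later months.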

# Results of covariate selection

Now we wait for the job to finish, checking its status every 10 seconds. We then get the results via `matcher_identifier`.

```python
import time

# Watch the current status of the matcher report
while not (current_status := client.get_report_status(id=matcher_identifier, include_error_reason=True)).is_finished:
    time.sleep(10)  # Wait between status requests

current_status.print()

results = client.get_matcher_results(matcher_identifier)
```
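The loop above polls until the report finishes, however long that takes. If you prefer to give up after a fixed time, a small wrapper such as the hypothetical `wait_until_finished` below works with any function that returns an object with an `is_finished` attribute. It is a sketch: it only counts the sleep intervals, not the time spent on the status requests themselves.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_until_finished(get_status: Callable[[], Any], interval: float = 10.0,
                        timeout: float = 3600.0,
                        sleep: Callable[[float], None] = time.sleep) -> Any:
    """Poll get_status() until the returned object has is_finished set, or give up after timeout seconds."""
    waited = 0.0  # counts sleep time only, not the duration of the status requests
    while True:
        status = get_status()
        if status.is_finished:
            return status
        if waited >= timeout:
            raise TimeoutError(f"report not finished after about {waited:.0f} s")
        sleep(interval)
        waited += interval


# Demo with a stand-in for client.get_report_status that finishes on the third call
answers = iter([False, False, True])
status = wait_until_finished(lambda: SimpleNamespace(is_finished=next(answers)), sleep=lambda s: None)
print(status.is_finished)  # -> True
```

In this notebook, `get_status` would be `lambda: client.get_report_status(id=matcher_identifier, include_error_reason=True)`.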

## Check results
Now we look at the ranking of the indicators. As expected, some of the indicators outperform the benchmark model, so we can expect them to improve the forecasts meaningfully.

```python
# Print the covariate ranking for each time series
for ts_result in results:
    for r in ts_result.ranking:
        print(r)
```
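To quantify "outperforming the benchmark", one can compare each candidate's error with the benchmark's error. The sketch below assumes you have copied the errors from the printed ranking into a dict (lower is better); the error metric _MATCHER_ reports is not assumed here, and the candidate names and numbers are placeholders, not results from this example.

```python
from typing import Dict, List, Tuple


def beats_benchmark(errors: Dict[str, float], benchmark: str) -> List[Tuple[str, float]]:
    """Return (name, relative error reduction) for every candidate better than the benchmark, best first."""
    base = errors[benchmark]
    better = [(name, (base - err) / base) for name, err in errors.items()
              if name != benchmark and err < base]
    return sorted(better, key=lambda item: item[1], reverse=True)


# Placeholder errors (lower is better); not results from this example
toy_errors = {"benchmark": 10.0, "indicator_a": 8.0, "indicator_b": 9.5, "indicator_c": 11.0}
print(beats_benchmark(toy_errors, "benchmark"))  # -> [('indicator_a', 0.2), ('indicator_b', 0.05)]
```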

# Step 3: Create a forecast with _FORECAST_
Now we use the `report_id` of _MATCHER_, the `version_id` of the checked-in covariates and the `version_id` of the checked-in time series to create forecasts for the next 10 months using the indicators. Since we assume that all covariates influence the forecast, we also enable the ensemble strategy (`use_ensemble=True`) to create a forecast based on all* models.


_\*Not all forecasting models necessarily go into the final result of the ensemble strategy. A few intrinsic checks and selections further improve the accuracy._

```python
fc_report_config = ReportConfig(title='Test fc with cov selection',
                                # Forecast the next 10 months, as stated above
                                forecasting=ForecastingConfig(fc_horizon=10, use_ensemble=True),
                                method_selection=MethodSelectionConfig(number_iterations=8),
                                matcher_report_id=matcher_identifier.report_id,
                                covs_versions=[covs_check_in_results.version_id])

forecast_identifier = client.start_forecast(version=ts_check_in_results.version_id, config=fc_report_config)

# Watch the current status of the forecasting report
while not (current_status := client.get_report_status(id=forecast_identifier)).is_finished:
    time.sleep(10)  # Wait between status requests

# Retrieve the final results
results = client.get_fc_results(id=forecast_identifier, include_backtesting=True, include_k_best_models=100)
```
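To illustrate the idea behind the ensemble strategy, the sketch below averages the forecasts of the k best-ranked models step by step after dropping implausible ones. Both the equal weighting and the plausibility rule (no negative sales) are assumptions made for illustration; the checks and selections _FORECAST_ applies are not documented here. `simple_ensemble`, the model names and the values are hypothetical.

```python
from typing import Dict, List


def simple_ensemble(forecasts: Dict[str, List[float]], ranking: List[str], k: int = 3) -> List[float]:
    """Equal-weight average, per forecast step, of the k best-ranked plausible forecasts."""
    # Hypothetical plausibility rule for sales data: no negative forecast values
    plausible = [name for name in ranking if min(forecasts[name]) >= 0]
    chosen = plausible[:k]
    if not chosen:
        raise ValueError("no plausible forecasts to combine")
    horizon = len(forecasts[chosen[0]])
    return [sum(forecasts[name][h] for name in chosen) / len(chosen) for h in range(horizon)]


# Invented forecasts over three steps, listed in rank order below
toy_forecasts = {
    "model_a": [10.0, 12.0, 11.0],
    "model_b": [14.0, 10.0, 13.0],
    "model_c": [-1.0, 9.0, 10.0],   # implausible: negative sales, so it is skipped
    "model_d": [12.0, 14.0, 15.0],
}
print(simple_ensemble(toy_forecasts, ranking=["model_a", "model_b", "model_c", "model_d"]))  # -> [12.0, 12.0, 13.0]
```

Skipping `model_c` lets `model_d` move into the top three, which mirrors the footnote above: not every model necessarily enters the final ensemble.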

## Overview of model ranking
Now we can look at the ranks of all models that created plausible forecasts, including the models that use covariates. In this case, the model using the "afo" covariate ranks first and the model using the Business Index covariate ranks third. Both forecasts with a covariate use the model **ExtendedCov**, which ensures that the indicator influences all forecast steps. The ensemble, a combination of the best models, ranks fourth.

```python
for ts_result in results:
    for mo in ts_result.models:
        print(f'{mo.model_name}({mo.model_selection.ranking}): {mo.covariates}')
```
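Why a model like **ExtendedCov** is needed follows from the lag arithmetic: to forecast step h after the last observed month T with a covariate at lag k, the model needs the covariate value of month T + h - k, which is only observed if h <= k. The hypothetical helper below lists which forecast steps are covered by observed covariate values. With lags between 0 and 6 and a 10-month horizon, at least the last four steps are not covered; how ExtendedCov handles these steps is not shown here.

```python
from typing import List


def covered_steps(lag: int, horizon: int) -> List[bool]:
    """For forecast steps h = 1..horizon: is covariate month T + h - lag already observed at month T?"""
    return [h - lag <= 0 for h in range(1, horizon + 1)]


print(covered_steps(lag=3, horizon=6))                # -> [True, True, True, False, False, False]
print(covered_steps(lag=6, horizon=10).count(False))  # -> 4
```

A lag of 0 covers no forecast step at all, since even the first step needs the covariate value of month T + 1.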

# Visualize the results
Use the plot functionality to inspect the results. The title of each plot shows which covariates the forecasting model used. We only focus on the ten best models according to their ranking.

```python
from futureexpert import plot

forecasts = results[0]
plot.plot_forecast(forecasts, plot_last_x_data_points_only=365, ranks=range(10))
```