# _MATCHER_ via _EXPERT_
The _MATCHER_ module ranks a given set of covariates according to their predictive power. First, for each covariate, it determines the best lag (time shift) for every forecasting object. Then, _MATCHER_ evaluates the predictive power of the models including one of the covariates against the benchmark model, which does not use any covariates.
## Requirements
To generate results with _MATCHER_, the data must meet the following conditions. If the data is not suitable, you will receive appropriate feedback.
- All covariates and the forecasting object share the same granularity.
- No missing values in the forecasting object or the covariates.
- The forecasting object has at least 78 data points.
- All covariates have at least 96 data points.
- A single file or data frame contains all covariates.
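
Before checking in the data, it can help to verify these conditions locally. The following is a minimal sketch in plain Python: `check_matcher_requirements` is a hypothetical helper and not part of `futureexpert`, and it assumes each series is a list of `(date, value)` tuples with daily granularity. The thresholds are taken from the list above.

```python
from datetime import date, timedelta
import math

MIN_POINTS_ACTUALS = 78      # minimum length of the forecasting object
MIN_POINTS_COVARIATE = 96    # minimum length of each covariate


def _has_missing(series, step):
    """Return True if a value is None/NaN or a time step is skipped."""
    for _, value in series:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return True
    dates = [d for d, _ in series]
    return any(later - earlier != step for earlier, later in zip(dates, dates[1:]))


def check_matcher_requirements(actuals, covariates, step=timedelta(days=1)):
    """Hypothetical local pre-check; returns a list of human-readable problems."""
    problems = []
    if len(actuals) < MIN_POINTS_ACTUALS:
        problems.append(f'forecasting object has only {len(actuals)} data points')
    if _has_missing(actuals, step):
        problems.append('forecasting object has missing values or gaps')
    for name, series in covariates.items():
        if len(series) < MIN_POINTS_COVARIATE:
            problems.append(f'covariate {name!r} has only {len(series)} data points')
        if _has_missing(series, step):
            problems.append(f'covariate {name!r} has missing values or gaps')
    return problems


# Usage with synthetic daily data: the covariate is too short on purpose.
start = date(2024, 1, 1)
actuals = [(start + timedelta(days=i), float(i)) for i in range(100)]
short_cov = [(start + timedelta(days=i), 1.0) for i in range(50)]
print(check_matcher_requirements(actuals, {'working_days': short_cov}))
```

An empty list means that none of the checked conditions is violated. The service performs its own validation, so this check is only a convenience.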


This notebook includes an example with daily data. For another example using monthly data, see [cov_matcher_and_forecast_monthly](notebooks/cov_matcher_and_forecast_monthly.ipynb).

In [None]:
from futureexpert import (DataDefinition,
                          ExpertClient,
                          FileSpecification,
                          ForecastingConfig,
                          MatcherConfig,
                          MethodSelectionConfig,
                          ReportConfig,
                          TsCreationConfig)

client = ExpertClient(user='', password='')

## First steps
Data versions must be created using _CHECK-IN_ for both the covariates and the time series for which a prediction will be calculated later.

Make sure that the same granularity is selected for all data. At this stage, you can also specify whether missing values should be replaced by 0.
 
See the notebook [Getting Started](getting_started.ipynb) for more details on how to configure _CHECK-IN_.

In [None]:
import futureexpert.checkin as checkin

# Check in covariates
covs_check_in_results = client.check_in_time_series(raw_data_source='../example_data/working_days_bavaria.csv',
                                                    data_definition=DataDefinition(date_columns=checkin.DateColumn(name='time', format='%Y-%m-%d'),
                                                                                   value_columns=[checkin.ValueColumn(name='value')]),
                                                    config_ts_creation=TsCreationConfig(time_granularity='daily',
                                                                                        value_columns_to_save=['value']),
                                                    file_specification=FileSpecification(delimiter=';', decimal=','))

# Check in forecasting object
ts_check_in_results = client.check_in_time_series(raw_data_source='../example_data/bicycle_data_single.csv',
                                                  data_definition=DataDefinition(date_columns=checkin.DateColumn(name='date', format='%Y-%m-%d'),
                                                                                 value_columns=[checkin.ValueColumn(name='value')]),
                                                  config_ts_creation=TsCreationConfig(time_granularity='daily',
                                                                                      value_columns_to_save=['value']),
                                                  file_specification=FileSpecification(delimiter=','))

## Configure _MATCHER_
The _MATCHER_ report requires the versions of both the covariates and the forecasting object. You can also give the report a title.
In addition, `min_lag` and `max_lag` can be set to control which time shifts are tested. If they are not set, the following default ranges are tested, depending on the granularity:

- monthly: -2 to 6
- daily: -2 to 4
- weekly, hourly and halfhourly: -2 to 12
- quarterly: -2 to 2
- yearly: -2 to 1

In some cases, results cannot be calculated for all lags. When this happens, only a subset of these lags is tested.
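
The defaults above can be written down as a small lookup table. This is only an illustrative sketch: `DEFAULT_LAG_RANGES` and `lags_to_test` are hypothetical names and not part of `futureexpert`. It assumes that the ranges are inclusive and that an unset bound falls back to its default independently of the other bound.

```python
# Default lag ranges per granularity, copied from the list above (assumed inclusive).
DEFAULT_LAG_RANGES = {
    'monthly': (-2, 6),
    'daily': (-2, 4),
    'weekly': (-2, 12),
    'hourly': (-2, 12),
    'halfhourly': (-2, 12),
    'quarterly': (-2, 2),
    'yearly': (-2, 1),
}


def lags_to_test(granularity, min_lag=None, max_lag=None):
    """Return the candidate lags, falling back to the defaults for unset bounds."""
    default_min, default_max = DEFAULT_LAG_RANGES[granularity]
    low = default_min if min_lag is None else min_lag
    high = default_max if max_lag is None else max_lag
    if low > high:
        raise ValueError('min_lag must not exceed max_lag')
    return list(range(low, high + 1))


print(lags_to_test('daily'))
print(lags_to_test('monthly', min_lag=0, max_lag=3))
```

Narrowing the range reduces the number of time shifts that have to be evaluated per covariate.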

In [None]:
config_matcher = MatcherConfig(title='Covariate MATCHER started with EXPERT',
                               actuals_version=ts_check_in_results.version_id,
                               covs_versions=[covs_check_in_results.version_id])

## Start _MATCHER_
Once the report has been defined, _MATCHER_ can be started. It might take a few minutes before the results are available; the current status can be queried with `get_report_status()`.
Possible errors that may occur during the calculation:
- Covariates contain missing values. The affected covariates are listed in the error message. If this error occurs, you should check whether the granularity has been set correctly (e.g. daily instead of monthly data). If there are still missing values in the data, they can be replaced by 0s during _CHECK-IN_. Alternatively, individual covariates can also be completely removed by setting appropriate filters at _CHECK-IN_.
- The status *no evaluation* is returned for one or more time series. In this case, some requirement for the data is not met. You will find more information on that in the status description.

In [None]:
matcher_identifier = client.start_matcher(config=config_matcher)

In [None]:
import time

# Watch the current status of the matcher report
while not (current_status := client.get_report_status(id=matcher_identifier)).is_finished:
    current_status.print()
    print('Waiting another 30 seconds to finish matcher...')
    time.sleep(30)  # Wait between status requests

current_status.print()

matcher_results = client.get_matcher_results(matcher_identifier)
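
The loop above waits indefinitely. If an upper bound on the waiting time is preferred, the polling can be wrapped in a small helper. The following is a sketch and not part of `futureexpert`: `wait_for_report` is a hypothetical function that only relies on the status object exposing `is_finished`, as used above. The stub status at the end exists purely to make the example runnable without a server.

```python
import time
from types import SimpleNamespace


def wait_for_report(get_status, interval=30.0, timeout=1800.0, sleep=time.sleep):
    """Poll get_status() until the status is finished or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.is_finished:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f'report not finished after {timeout} seconds')
        sleep(interval)  # Wait between status requests


# Stub that reports "finished" on the third request (stand-in for the real status call).
_responses = iter([False, False, True])
final_status = wait_for_report(lambda: SimpleNamespace(is_finished=next(_responses)),
                               interval=0.01, timeout=5)
print(final_status.is_finished)
```

With the client from this notebook, the callable would be `lambda: client.get_report_status(id=matcher_identifier)`.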

## Check results
Results can be inspected via the results object. For each time series, it contains a ranking of the covariates by predictive power. The covariate with rank 1 is the covariate (or indicator) with the strongest predictive power for the forecasting object. Covariates ranked better than the benchmark model, which uses no covariates, have predictive power for the forecasting object, while covariates ranked worse than the benchmark do not help to explain it. Non-leading indicators will not appear in the results.
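
To make the reading of the ranking concrete, the sketch below splits a ranking at the benchmark model. It uses a hypothetical stand-in class, `RankedModel`, with made-up covariate names and ranks rather than the real result objects. The only convention borrowed from this notebook is that the benchmark entry has no covariates.

```python
from dataclasses import dataclass, field


@dataclass
class RankedModel:
    """Hypothetical stand-in for one ranking entry (not the futureexpert class)."""
    rank: int
    covariates: list = field(default_factory=list)  # empty for the benchmark model


def split_at_benchmark(ranking):
    """Return (covariates ranked better than the benchmark, covariates ranked worse)."""
    benchmark_rank = next(m.rank for m in ranking if not m.covariates)
    better = [m.covariates[0] for m in ranking if m.covariates and m.rank < benchmark_rank]
    worse = [m.covariates[0] for m in ranking if m.covariates and m.rank > benchmark_rank]
    return better, worse


# Illustrative ranking: two covariates beat the benchmark, one does not.
example_ranking = [
    RankedModel(rank=1, covariates=['working_days']),
    RankedModel(rank=2, covariates=['temperature']),
    RankedModel(rank=3),  # benchmark model without covariates
    RankedModel(rank=4, covariates=['holidays']),
]
better, worse = split_at_benchmark(example_ranking)
print('Predictive:', better)
print('Not predictive:', worse)
```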

The indicator ranking (result of _MATCHER_) can be used as input for _FORECAST_. A forecast will be generated for every indicator in the ranking using a suitable method. 

In [None]:
for ts_result in matcher_results:
    for r in ts_result.ranking:
        print(r)

In [None]:
from futureexpert import plot

for ts_result in matcher_results:
    for model_rank in ts_result.ranking:
        if model_rank.covariates:
            plot.plot_time_series(ts_result.actuals,
                                  covariate=model_rank.covariates[0],
                                  plot_last_x_data_points_only=365)

### Do you want to adjust the results before forecasting? (optional)
If you want to adjust the results before forecasting, you can convert the _MATCHER_ ranking locally into a covariate configuration for _FORECAST_ instead of just referencing the report ID of the _MATCHER_ result.

In [None]:
covs_config = [res.convert_ranking_to_forecast_config() for res in matcher_results]
covs_config

You can adjust the results either before or after this conversion:

1. Directly adjust the list obtained from `get_matcher_results`, e.g. remove covariates or change lags.
2. Adjust the configuration after converting the results to the format needed for the forecast.

To use the adjusted results in your forecast run, use the parameter `covs_configuration` and unset the parameter `matcher_report_id`.

## Start forecast
Specify the ID of the _MATCHER_ report in the forecasting configuration. A lag no longer needs to be defined; the necessary information is taken from the _MATCHER_ result. An individual model is created for each of the seven best covariates. The model name indicates which covariate a model uses.


If you want to create forecasts with manually chosen lags, check the documentation in the notebook [forecasts_with_covariates](notebooks/forecast_with_covariates.ipynb).

In [None]:
fc_report_config = ReportConfig(title='Test fc with cov selection',
                                forecasting=ForecastingConfig(fc_horizon=10),
                                method_selection=MethodSelectionConfig(number_iterations=6),
                                matcher_report_id=matcher_identifier.report_id,
                                covs_versions=[covs_check_in_results.version_id])

forecast_identifier = client.start_forecast(version=ts_check_in_results.version_id, config=fc_report_config)

## Get the results

In [None]:
# Watch the current status of the forecasting report
while not (current_status := client.get_report_status(id=forecast_identifier)).is_finished:
    current_status.print()
    print('Waiting another 30 seconds to finish forecasting...')
    time.sleep(30)  # Wait between status requests

current_status.print()

# Retrieve the final results
results = client.get_fc_results(id=forecast_identifier, include_backtesting=True, include_k_best_models=100)

## Check used covariates
For every model of every forecasted time series, check which indicator was used.

In [None]:
for ts_result in results:
    for model in ts_result.models:
        print(f'{model.model_name}({model.model_selection.ranking}): {model.covariates}')

## Use combination of _MATCHER_ and _FORECAST_ ranking

An alternative ranking can be created using the function `combine_forecast_ranking_with_matcher_ranking`. In the result, the _MATCHER_ ranking has priority over the _FORECAST_ ranking. Of the models without covariates, only the best one from the _FORECAST_ run is added to the ranking alongside all covariate models.

In [None]:
import futureexpert.forecast
new_ranked_results = futureexpert.forecast.combine_forecast_ranking_with_matcher_ranking(forecast_results=results, matcher_results=matcher_results)

In [None]:
for ts_result in new_ranked_results:
    for model in ts_result.models:
        print(f'{model.model_name}({model.model_selection.ranking}): {model.covariates}')