### Analyzing the Effects of Top/Bottom Coding on the Accuracy of Exponential Smoothing Forecasts

***

#### Analysis Flow:

* Inputs:
    * Time series data
    * Forecasting Model
    * Protection Method

* Analysis:
    * Pre-process data
    * Fit forecasting models
    * Generate forecasts for a variety of horizons

* Outputs:
    * Forecasts for original data
    * Forecasts for confidential data
    * Accuracy for local forecasts
    * Global accuracy
    * % improvements/reduction in forecast accuracy across all series
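
As a rough illustration of the protection method named in the inputs, the sketch below caps a series at a quantile threshold. It assumes that top coding at percentage `p` replaces every value above the `(1 - p)` quantile with that quantile, and that bottom coding mirrors this at the `p` quantile. The function names `top_code` and `bottom_code` are hypothetical, and the authors' implementation in `helper_functions.py` may define the thresholds differently.

```python
import numpy as np

def top_code(series, percentage):
    """Cap values above the (1 - percentage) quantile at that quantile (assumed definition)."""
    values = np.asarray(series, dtype=float)
    threshold = np.quantile(values, 1.0 - percentage)
    return np.minimum(values, threshold)

def bottom_code(series, percentage):
    """Raise values below the `percentage` quantile to that quantile (assumed definition)."""
    values = np.asarray(series, dtype=float)
    threshold = np.quantile(values, percentage)
    return np.maximum(values, threshold)

# toy series, for illustration only
y = np.arange(1.0, 12.0)          # 1.0, 2.0, ..., 11.0
top_coded = top_code(y, 0.10)     # largest value is pulled down to the 90% quantile
bottom_coded = bottom_code(y, 0.10)  # smallest value is pulled up to the 10% quantile
```

Only the extreme values change, so the bulk of each series is untouched while its peaks (or troughs) are masked.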

***

## Import Modules

Note the functions imported from `helper_functions`: these are custom functions written by the paper authors. See `helper_functions.py` for comments and function descriptions.

In [38]:
# general modules
import pandas as pd
import numpy as np
import sktime

# import exponential smoothing forecasting model
from sktime.forecasting.exp_smoothing import ExponentialSmoothing

# functions for transformation+forecasting pipeline
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.forecasting.compose import EnsembleForecaster

# time series transformations
from sktime.transformations.series.detrend import ConditionalDeseasonalizer

# the `helper_functions.py` file contains many custom functions we wrote to aid in our analysis
# `full_coding_analysis` combines all of the following steps: train-test split the data,
# protect the data, train the models, compare accuracies, and return the accuracy results
from helper_functions import full_coding_analysis

# suppress all warnings (used here to silence messages from exponential smoothing models that fail to converge)
import warnings
warnings.filterwarnings('ignore')

Import data.

In [39]:
# import weekly finance time series
Y = pd.read_csv("../../Data/Train/Clean/weekly_finance_clean.csv")

## SES

In [40]:
# define forecasting model
# perform multiplicative deseasonalization conditional on autocorrelation test for seasonality

forecaster = TransformedTargetForecaster(
    [
        ("deseasonalize", ConditionalDeseasonalizer(model="multiplicative", sp=52)),
        ("forecast", ExponentialSmoothing(use_boxcox=False)),
    ]
)
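
For intuition about the "forecast" step, here is a minimal sketch of the simple exponential smoothing recurrence. It assumes a fixed smoothing parameter `alpha` and initializes the level with the first observation. The fitted `ExponentialSmoothing` model estimates these quantities from the data, so this is an illustration of the recurrence, not the library's implementation.

```python
def simple_exponential_smoothing(y, alpha):
    """Return the final smoothed level, which SES uses as the forecast at every horizon."""
    level = y[0]  # assumption: initialize the level with the first observation
    for observation in y[1:]:
        # new level = weighted average of the latest observation and the previous level
        level = alpha * observation + (1 - alpha) * level
    return level

# toy history, for illustration only
history = [10.0, 12.0, 11.0, 13.0]
forecast = simple_exponential_smoothing(history, alpha=0.5)
```

Because SES has no trend or seasonal component, the same value is forecast for horizons 1, 5, and 15.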

We obtain results for a combination of forecast horizons, coding types (top and bottom), and coding percentages:

* Forecast Horizons: (1, 5, 15)
* Coding Types: (Top, Bottom)
* Coding Percentages: (0.10, 0.20, 0.40)

In [4]:
results_dict_ses = {}
types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

In [5]:
for t in types:
    for p in percentages:
        for h in horizons:
            # key format: "h=<horizon>, <coding type> <coding percentage>"
            results_dict_ses[f"h={h}, {t} {p}"] = full_coding_analysis(
                Y, forecaster, forecast_horizon=h, coding_type=t, coding_percentage=p
            )
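
The nested dictionary returned by this loop can be hard to scan. The sketch below flattens a dictionary of the same shape into one row per setting and metric. The helper name `flatten_results` is hypothetical, and the example input copies two entries from the `h=1, Top 0.1` output of this notebook. Each metric holds two values whose meaning is defined in `helper_functions.py`.

```python
def flatten_results(results):
    """Turn {setting: {metric: values}} into a list of flat row dictionaries."""
    rows = []
    for setting, metrics in results.items():
        for metric, values in metrics.items():
            rows.append(
                {
                    "setting": setting,
                    "metric": metric.rstrip(":"),  # drop the trailing colon in the metric names
                    "values": [float(v) for v in values],
                }
            )
    return rows

example = {
    "h=1, Top 0.1": {
        "% Series with improved accuracy:": [29.9, 29.9],
        "% Change global accuracy:": [-82.6, -82.6],
    }
}
rows = flatten_results(example)
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(rows)` for tabular display.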

In [6]:
results_dict_ses

{'h=1, Top 0.1': {'% Series with improved accuracy:': array([29.9, 29.9]),
  '% Series with worsened accuracy:': array([61., 61.]),
  '% Series with unchanged accuracy:': array([9.1, 9.1]),
  '% Change global accuracy:': array([-82.6, -82.6])},
 'h=5, Top 0.1': {'% Series with improved accuracy:': array([26.8, 26.8]),
  '% Series with worsened accuracy:': array([64., 64.]),
  '% Series with unchanged accuracy:': array([9.1, 9.1]),
  '% Change global accuracy:': array([-53.6, -43.7])},
 'h=15, Top 0.1': {'% Series with improved accuracy:': array([26.2, 25. ]),
  '% Series with worsened accuracy:': array([63.4, 64.6]),
  '% Series with unchanged accuracy:': array([10.4, 10.4]),
  '% Change global accuracy:': array([-37. , -28.8])},
 'h=1, Top 0.2': {'% Series with improved accuracy:': array([23.8, 23.8]),
  '% Series with worsened accuracy:': array([67.7, 67.7]),
  '% Series with unchanged accuracy:': array([8.5, 8.5]),
  '% Change global accuracy:': array([-179.8, -179.8])},
 'h=5, Top 

***
***

## DES

In [7]:
# define forecasting model
# perform multiplicative deseasonalization conditional on autocorrelation test for seasonality

forecaster = TransformedTargetForecaster(
    [
        ("deseasonalize", ConditionalDeseasonalizer(model="multiplicative", sp=52)),
        ("forecast", ExponentialSmoothing(trend="additive", use_boxcox=False)),
    ]
)
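
Double exponential smoothing adds a trend term to the level. The sketch below shows Holt's linear (additive) trend recurrence with fixed parameters `alpha` and `beta`, and a simple initialization from the first two observations. Both choices are assumptions made for illustration, since the fitted model estimates its parameters and initial states.

```python
def holt_linear_forecast(y, alpha, beta, horizon):
    """Forecast `horizon` steps ahead with Holt's additive-trend recurrence."""
    level = y[0]          # assumption: initial level is the first observation
    trend = y[1] - y[0]   # assumption: initial trend is the first difference
    for observation in y[1:]:
        previous_level = level
        level = alpha * observation + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # the trend is extrapolated linearly over the forecast horizon
    return level + horizon * trend

# toy history with a perfectly linear trend, for illustration only
history = [2.0, 4.0, 6.0, 8.0]
forecast = holt_linear_forecast(history, alpha=0.5, beta=0.5, horizon=3)
```

On a perfectly linear series the recurrence reproduces the line, so the 3-step forecast continues it.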

We obtain results for a combination of forecast horizons, coding types (top and bottom), and coding percentages:

* Forecast Horizons: (1, 5, 15)
* Coding Types: (Top, Bottom)
* Coding Percentages: (0.10, 0.20, 0.40)

In [8]:
results_dict_des = {}
types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

In [9]:
for t in types:
    for p in percentages:
        for h in horizons:
            # key format: "h=<horizon>, <coding type> <coding percentage>"
            results_dict_des[f"h={h}, {t} {p}"] = full_coding_analysis(
                Y, forecaster, forecast_horizon=h, coding_type=t, coding_percentage=p
            )

In [16]:
results_dict_des

{'h=1, Top 0.1': {'% Series with improved accuracy:': array([35.4, 35.4]),
  '% Series with worsened accuracy:': array([64.6, 64.6]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-83.2, -83.2])},
 'h=5, Top 0.1': {'% Series with improved accuracy:': array([31.7, 28.7]),
  '% Series with worsened accuracy:': array([68.3, 71.3]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-47.3, -38.7])},
 'h=15, Top 0.1': {'% Series with improved accuracy:': array([35.4, 34.1]),
  '% Series with worsened accuracy:': array([64.6, 65.9]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-27.1, -20.1])},
 'h=1, Top 0.2': {'% Series with improved accuracy:': array([32.3, 32.3]),
  '% Series with worsened accuracy:': array([67.7, 67.7]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-181.7, -181.7])},
 'h=5, Top 0.2': 

***
***

## DES With Damped Trend

In [10]:
# define forecasting model
# perform multiplicative deseasonalization conditional on autocorrelation test for seasonality

forecaster = TransformedTargetForecaster(
    [
        ("deseasonalize", ConditionalDeseasonalizer(model="multiplicative", sp=52)),
        ("forecast", ExponentialSmoothing(trend="additive", damped_trend=True, use_boxcox=False)),
    ]
)
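
Setting `damped_trend=True` shrinks the trend contribution as the horizon grows. A minimal sketch of the damped-trend forecast function follows, assuming the final level, trend, and damping parameter `phi` are already known. The function name and the input values are illustrative, not taken from the fitted models.

```python
def damped_trend_forecast(level, trend, phi, horizon):
    """Forecast = level + (phi + phi^2 + ... + phi^horizon) * trend."""
    damping_sum = sum(phi ** k for k in range(1, horizon + 1))
    return level + damping_sum * trend

# illustrative final states, not estimated from the data
undamped_forecast = damped_trend_forecast(level=8.0, trend=2.0, phi=1.0, horizon=3)
damped_forecast = damped_trend_forecast(level=8.0, trend=2.0, phi=0.5, horizon=3)
```

With `phi=1.0` the expression reduces to the undamped linear trend. With `phi < 1` the forecasts level off at long horizons, which matters most for the `h=15` results.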

We obtain results for a combination of forecast horizons, coding types (top and bottom), and coding percentages:

* Forecast Horizons: (1, 5, 15)
* Coding Types: (Top, Bottom)
* Coding Percentages: (0.10, 0.20, 0.40)

In [11]:
results_dict_ddes = {}
types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

In [12]:
for t in types:
    for p in percentages:
        for h in horizons:
            # key format: "h=<horizon>, <coding type> <coding percentage>"
            results_dict_ddes[f"h={h}, {t} {p}"] = full_coding_analysis(
                Y, forecaster, forecast_horizon=h, coding_type=t, coding_percentage=p
            )

In [17]:
results_dict_ddes

{'h=1, Top 0.1': {'% Series with improved accuracy:': array([32.9, 32.9]),
  '% Series with worsened accuracy:': array([66.5, 66.5]),
  '% Series with unchanged accuracy:': array([0.6, 0.6]),
  '% Change global accuracy:': array([-82.8, -82.8])},
 'h=5, Top 0.1': {'% Series with improved accuracy:': array([31.7, 28.7]),
  '% Series with worsened accuracy:': array([67.7, 70.7]),
  '% Series with unchanged accuracy:': array([0.6, 0.6]),
  '% Change global accuracy:': array([-53.1, -43.4])},
 'h=15, Top 0.1': {'% Series with improved accuracy:': array([34.1, 31.7]),
  '% Series with worsened accuracy:': array([65.9, 68.3]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-36.1, -28.3])},
 'h=1, Top 0.2': {'% Series with improved accuracy:': array([28., 28.]),
  '% Series with worsened accuracy:': array([72., 72.]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-179.4, -179.4])},
 'h=5, Top 0.2': 

***
***

## TES

In [18]:
# define forecasting model
# seasonality is modeled inside the forecaster (multiplicative seasonal component, sp=52),
# so no separate deseasonalization step is applied

forecaster = TransformedTargetForecaster(
    [
        ("forecast", ExponentialSmoothing(trend="additive",
                                          seasonal="multiplicative",
                                          sp=52,
                                          damped_trend=False, 
                                          use_boxcox=False)),
    ]
)
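
The forecaster above has no deseasonalization step because the seasonal pattern is part of the model itself. For reference, the textbook Holt-Winters forecast equation with additive trend and multiplicative seasonality is given below, with $m = 52$ as the seasonal period. This is the standard form of the method and is offered as a sketch; the notebook does not show how the fitted library parameterizes it internally.

$$
\hat{y}_{t+h|t} = (\ell_t + h\, b_t)\, s_{t+h-m(k+1)}, \qquad k = \left\lfloor \frac{h-1}{m} \right\rfloor
$$

Here $\ell_t$ is the level, $b_t$ the trend, and $s_t$ the seasonal index. The index $t+h-m(k+1)$ selects the most recent seasonal estimate for the week being forecast.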

We obtain results for a combination of forecast horizons, coding types (top and bottom), and coding percentages:

* Forecast Horizons: (1, 5, 15)
* Coding Types: (Top, Bottom)
* Coding Percentages: (0.10, 0.20, 0.40)

In [19]:
results_dict_tes = {}
types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

In [20]:
for t in types:
    for p in percentages:
        for h in horizons:
            # key format: "h=<horizon>, <coding type> <coding percentage>"
            results_dict_tes[f"h={h}, {t} {p}"] = full_coding_analysis(
                Y, forecaster, forecast_horizon=h, coding_type=t, coding_percentage=p
            )

In [24]:
results_dict_tes

{'h=1, Top 0.1': {'% Series with improved accuracy:': array([36.6, 36.6]),
  '% Series with worsened accuracy:': array([63.4, 63.4]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-62.2, -62.2])},
 'h=5, Top 0.1': {'% Series with improved accuracy:': array([31.7, 31.7]),
  '% Series with worsened accuracy:': array([68.3, 68.3]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-51.7, -43.7])},
 'h=15, Top 0.1': {'% Series with improved accuracy:': array([34.1, 34.1]),
  '% Series with worsened accuracy:': array([65.9, 65.9]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-44.3, -36.2])},
 'h=1, Top 0.2': {'% Series with improved accuracy:': array([24.4, 24.4]),
  '% Series with worsened accuracy:': array([75.6, 75.6]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-136.9, -136.9])},
 'h=5, Top 0.2': 

***
***

## TES With Damped Trend

In [21]:
# define forecasting model
# seasonality is modeled inside the forecaster (multiplicative seasonal component, sp=52),
# so no separate deseasonalization step is applied

forecaster = TransformedTargetForecaster(
    [
        ("forecast", ExponentialSmoothing(trend="additive",
                                          seasonal="multiplicative",
                                          sp=52,
                                          damped_trend=True, 
                                          use_boxcox=False)),
    ]
)

We obtain results for a combination of forecast horizons, coding types (top and bottom), and coding percentages:

* Forecast Horizons: (1, 5, 15)
* Coding Types: (Top, Bottom)
* Coding Percentages: (0.10, 0.20, 0.40)

In [22]:
results_dict_dtes = {}
types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

In [23]:
for t in types:
    for p in percentages:
        for h in horizons:
            # key format: "h=<horizon>, <coding type> <coding percentage>"
            results_dict_dtes[f"h={h}, {t} {p}"] = full_coding_analysis(
                Y, forecaster, forecast_horizon=h, coding_type=t, coding_percentage=p
            )

In [25]:
results_dict_dtes

{'h=1, Top 0.1': {'% Series with improved accuracy:': array([35.4, 35.4]),
  '% Series with worsened accuracy:': array([64.6, 64.6]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-70.9, -70.9])},
 'h=5, Top 0.1': {'% Series with improved accuracy:': array([31.1, 27.4]),
  '% Series with worsened accuracy:': array([68.9, 72.6]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-55.7, -46.1])},
 'h=15, Top 0.1': {'% Series with improved accuracy:': array([34.1, 36.6]),
  '% Series with worsened accuracy:': array([65.9, 63.4]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-46.5, -38.4])},
 'h=1, Top 0.2': {'% Series with improved accuracy:': array([26.8, 26.8]),
  '% Series with worsened accuracy:': array([73.2, 73.2]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-139.7, -139.7])},
 'h=5, Top 0.2': 

***
***

## Ensemble Forecaster

In [29]:
# define the component models: simple, Holt (additive trend), and damped-trend exponential smoothing
ses = ExponentialSmoothing(use_boxcox=False)
holt = ExponentialSmoothing(trend="additive", damped_trend=False, use_boxcox=False)
damped = ExponentialSmoothing(trend="additive", damped_trend=True, use_boxcox=False)

forecaster = TransformedTargetForecaster(
    [
        ("deseasonalize", ConditionalDeseasonalizer(model="multiplicative", sp=52)),
        ("forecast", EnsembleForecaster(
            [
                ("ses", ses),
                ("holt", holt),
                ("damped", damped),
            ]
        )),
    ]
)
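
No aggregation function is passed to `EnsembleForecaster` above, so the combination presumably uses the library default. Assuming that default is an unweighted mean of the component forecasts, the combination step can be sketched as follows. The helper name `mean_ensemble` is hypothetical and the forecast values are made up for illustration.

```python
def mean_ensemble(forecasts):
    """Average the component forecasts horizon by horizon (assumed default aggregation)."""
    n_models = len(forecasts)
    return [sum(step) / n_models for step in zip(*forecasts.values())]

# illustrative 2-step forecasts from each component model, not real results
component_forecasts = {
    "ses": [10.0, 10.0],
    "holt": [11.0, 12.0],
    "damped": [12.0, 11.0],
}
combined = mean_ensemble(component_forecasts)
```

Averaging tends to reduce the influence of any single component that reacts badly to the coded data, which is one motivation for comparing the ensemble against the individual models.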

We obtain results for a combination of forecast horizons, coding types (top and bottom), and coding percentages:

* Forecast Horizons: (1, 5, 15)
* Coding Types: (Top, Bottom)
* Coding Percentages: (0.10, 0.20, 0.40)

In [30]:
results_dict_ensemble = {}
types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

In [31]:
for t in types:
    for p in percentages:
        for h in horizons:
            # key format: "h=<horizon>, <coding type> <coding percentage>"
            results_dict_ensemble[f"h={h}, {t} {p}"] = full_coding_analysis(
                Y, forecaster, forecast_horizon=h, coding_type=t, coding_percentage=p
            )

In [32]:
results_dict_ensemble

{'h=1, Top 0.1': {'% Series with improved accuracy:': array([34.1, 34.1]),
  '% Series with worsened accuracy:': array([65.9, 65.9]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-83.6, -83.6])},
 'h=5, Top 0.1': {'% Series with improved accuracy:': array([29.3, 26.2]),
  '% Series with worsened accuracy:': array([70.7, 73.8]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-53.6, -44. ])},
 'h=15, Top 0.1': {'% Series with improved accuracy:': array([31.7, 29.9]),
  '% Series with worsened accuracy:': array([68.3, 70.1]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-37.9, -29.8])},
 'h=1, Top 0.2': {'% Series with improved accuracy:': array([29.3, 29.3]),
  '% Series with worsened accuracy:': array([70.7, 70.7]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change global accuracy:': array([-181.6, -181.6])},
 'h=5, Top 0.2': 