### Analyzing the Effects of Top/Bottom Coding on the Accuracy of Exponential Smoothing Forecasts

***

#### Analysis Flow:

* Inputs:
    * Time series data
    * Forecasting Model
    * Protection Method

* Analysis:
    * Pre-process data
    * Fit forecasting models
    * Generate forecasts for a variety of horizons

* Outputs:
    * Forecasts for original data
    * Forecasts for confidential data
    * Accuracy for local forecasts
    * Global accuracy
    * % improvements/reduction in forecast accuracy across all series
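
The accuracy outputs listed above can be illustrated with a minimal sketch. It assumes mean absolute error (MAE) as the accuracy measure and defines the percentage change so that negative values mean the forecasts from protected data are less accurate. The metrics and sign convention actually used by `full_coding_analysis` live in `helper_functions.py` and may differ; every name below is hypothetical.

```python
import numpy as np

def mae_per_series(actual, forecast):
    """Local accuracy: one MAE per series (rows are forecast steps, columns are series)."""
    errors = np.abs(np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float))
    return errors.mean(axis=0)

def percent_change(original_error, protected_error):
    """Percentage change in accuracy; negative when the protected error is larger."""
    return (original_error - protected_error) / original_error * 100

# toy data: 2 forecast steps (rows) for 3 series (columns)
actual = np.array([[10.0, 20.0, 30.0],
                   [12.0, 22.0, 28.0]])
fc_original = np.array([[11.0, 19.0, 30.0],
                        [13.0, 23.0, 29.0]])
fc_protected = np.array([[8.0, 19.0, 30.0],
                         [10.0, 23.0, 29.0]])

# local (per-series) accuracy for forecasts from original and protected data
local_original = mae_per_series(actual, fc_original)
local_protected = mae_per_series(actual, fc_protected)

# share of series whose accuracy improved, worsened, or stayed the same
pct_improved = np.mean(local_protected < local_original) * 100
pct_worsened = np.mean(local_protected > local_original) * 100
pct_unchanged = np.mean(local_protected == local_original) * 100

# global accuracy: average the local errors, then compare
global_change = percent_change(local_original.mean(), local_protected.mean())

print(pct_improved, pct_worsened, pct_unchanged, global_change)
```

In this toy case only the first series is affected by protection, so one of three series worsens and the global error rises.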

***

## Import Modules

Note the functions imported from `helper_functions`; these are custom functions written by the paper authors. See `helper_functions.py` for comments and function descriptions.

In [1]:
# general modules
import pandas as pd
import numpy as np

# import exponential smoothing forecasting model
from sktime.forecasting.exp_smoothing import ExponentialSmoothing

# functions for transformation+forecasting pipeline
from sktime.forecasting.compose import TransformedTargetForecaster

# time series transformations
from sktime.transformations.series.detrend import ConditionalDeseasonalizer

# `helper_functions.py` contains custom functions written for this analysis.
# `full_coding_analysis` combines the full workflow: train-test split, data protection,
# model training, accuracy comparison, and returning the accuracy results.
from helper_functions import full_coding_analysis

# suppress all warnings (mainly convergence warnings from the exponential smoothing model)
import warnings
warnings.filterwarnings('ignore')

Import data.

In [2]:
# import weekly finance time series
Y = pd.read_csv("../../Data/Train/Clean/weekly_finance_clean.csv")

## Simple Exponential Smoothing (SES)

In [3]:
# define forecasting model
# perform multiplicative deseasonalization conditional on autocorrelation test for seasonality

forecaster = TransformedTargetForecaster(
    [
        ("deseasonalize", ConditionalDeseasonalizer(model="multiplicative", sp=52)),
        ("forecast", ExponentialSmoothing(use_boxcox=False)),
    ]
)
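
For intuition about the forecasting step, the recursion behind simple exponential smoothing can be written in a few lines. This is a hand-rolled sketch with a fixed smoothing parameter `alpha` and the level initialized at the first observation, both of which are assumptions made for illustration. It is not the sktime implementation, which estimates its parameters from the data.

```python
import numpy as np

def simple_exp_smoothing_forecast(y, alpha, horizon):
    """Flat h-step forecast from the recursion level_t = alpha*y_t + (1-alpha)*level_{t-1}."""
    level = y[0]  # assumed initialization: start the level at the first observation
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    # SES has no trend or seasonal term, so every horizon gets the last level
    return np.full(horizon, level)

y = np.array([10.0, 12.0, 11.0, 13.0])
forecast = simple_exp_smoothing_forecast(y, alpha=0.5, horizon=3)
print(forecast)
```

Because the forecast is flat, any change that data protection makes to the final smoothed level carries over to every forecast horizon.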

We obtain results for every combination of forecast horizon, coding type (top or bottom), and coding percentage:

* Forecast Horizons: (1, 5, 15)
* Coding Types: (Top, Bottom)
* Coding Percentages: (0.10, 0.20, 0.40)
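
One common definition of top (bottom) coding replaces every value above (below) a chosen quantile with that quantile. The sketch below assumes that definition and reads the coding percentage as the share of each series' distribution to be coded. The protection logic actually applied by `full_coding_analysis` is defined in `helper_functions.py` and may differ; `code_series` is a hypothetical name.

```python
import pandas as pd

def code_series(y, coding_type="Top", coding_percentage=0.10):
    """Clip one series at a quantile threshold (hypothetical helper, for illustration only)."""
    if coding_type == "Top":
        # cap the largest values at the (1 - p) quantile
        threshold = y.quantile(1 - coding_percentage)
        return y.clip(upper=threshold)
    if coding_type == "Bottom":
        # raise the smallest values to the p quantile
        threshold = y.quantile(coding_percentage)
        return y.clip(lower=threshold)
    raise ValueError("coding_type must be 'Top' or 'Bottom'")

y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0])
top_coded = code_series(y, "Top", 0.20)
bottom_coded = code_series(y, "Bottom", 0.20)
print(top_coded.tolist())
print(bottom_coded.tolist())
```

Values inside the threshold are left untouched, so only the tail of each series is altered.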

In [4]:
results_dict_ses = {}
types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

In [5]:
for t in types:
    for p in percentages:
        for h in horizons:
            key = f"h={h}, {t} {p}"
            results_dict_ses[key] = full_coding_analysis(
                Y, forecaster, forecast_horizon=h, coding_type=t, coding_percentage=p
            )

In [6]:
results_dict_ses

{'h=1, Top 0.1': {'% of forecasted points adjusted downward:': 72.0,
  '% of forecasted points adjusted upward:': 18.9,
  '% Series with improved accuracy:': array([29.9, 29.9]),
  '% Series with worsened accuracy:': array([61., 61.]),
  '% Series with unchanged accuracy:': array([9.1, 9.1]),
  '% Change mean global accuracy:': array([-82.6, -82.6]),
  '% Change median global accuracy:': array([-179., -179.])},
 'h=5, Top 0.1': {'% of forecasted points adjusted downward:': 73.5,
  '% of forecasted points adjusted upward:': 17.299999999999997,
  '% Series with improved accuracy:': array([26.8, 26.8]),
  '% Series with worsened accuracy:': array([64., 64.]),
  '% Series with unchanged accuracy:': array([9.1, 9.1]),
  '% Change mean global accuracy:': array([-53.6, -43.7]),
  '% Change median global accuracy:': array([-79. , -64.9])},
 'h=15, Top 0.1': {'% of forecasted points adjusted downward:': 73.0,
  '% of forecasted points adjusted upward:': 16.6,
  '% Series with improved accuracy:

***
***

## Double Exponential Smoothing (DES)

In [7]:
# define forecasting model
# perform multiplicative deseasonalization conditional on autocorrelation test for seasonality

forecaster = TransformedTargetForecaster(
    [
        ("deseasonalize", ConditionalDeseasonalizer(model="multiplicative", sp=52)),
        ("forecast", ExponentialSmoothing(trend="additive", use_boxcox=False)),
    ]
)
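
Adding `trend="additive"` turns the model into double exponential smoothing (Holt's linear trend method), which tracks a level and a slope. The sketch below uses fixed smoothing parameters and a simple initialization (level at the first observation, trend as the first difference). Both are assumptions for illustration and do not reproduce how sktime estimates or initializes the model.

```python
import numpy as np

def holt_linear_forecast(y, alpha, beta, horizon):
    """h-step forecasts from Holt's linear (additive) trend recursions."""
    level = y[0]          # assumed initialization of the level
    trend = y[1] - y[0]   # assumed initialization of the trend
    for value in y[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    steps = np.arange(1, horizon + 1)
    # forecasts extend the last level along the last estimated slope
    return level + steps * trend

y = np.array([2.0, 4.0, 6.0, 8.0])
forecast = holt_linear_forecast(y, alpha=0.5, beta=0.3, horizon=3)
print(forecast)
```

On a perfectly linear series the forecasts continue the same line, whatever the smoothing parameters. With an estimated slope, a distortion of the last few observations affects longer horizons through both the level and the trend.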

We obtain results for every combination of forecast horizon, coding type (top or bottom), and coding percentage:

* Forecast Horizons: (1, 5, 15)
* Coding Types: (Top, Bottom)
* Coding Percentages: (0.10, 0.20, 0.40)

In [8]:
results_dict_des = {}
types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

In [9]:
for t in types:
    for p in percentages:
        for h in horizons:
            key = f"h={h}, {t} {p}"
            results_dict_des[key] = full_coding_analysis(
                Y, forecaster, forecast_horizon=h, coding_type=t, coding_percentage=p
            )

In [10]:
results_dict_des

{'h=1, Top 0.1': {'% of forecasted points adjusted downward:': 75.6,
  '% of forecasted points adjusted upward:': 24.4,
  '% Series with improved accuracy:': array([35.4, 35.4]),
  '% Series with worsened accuracy:': array([64.6, 64.6]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change mean global accuracy:': array([-83.2, -83.2]),
  '% Change median global accuracy:': array([-231.7, -231.7])},
 'h=5, Top 0.1': {'% of forecasted points adjusted downward:': 81.69999999999999,
  '% of forecasted points adjusted upward:': 18.3,
  '% Series with improved accuracy:': array([31.7, 28.7]),
  '% Series with worsened accuracy:': array([68.3, 71.3]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change mean global accuracy:': array([-47.3, -38.7]),
  '% Change median global accuracy:': array([-82.3, -67. ])},
 'h=15, Top 0.1': {'% of forecasted points adjusted downward:': 72.8,
  '% of forecasted points adjusted upward:': 27.200000000000003,
  '% Series with imp

***
***

## Triple Exponential Smoothing (TES)

In [11]:
# define forecasting model
# seasonality is handled by the model's multiplicative seasonal component (sp=52),
# so no separate deseasonalization step is applied

forecaster = TransformedTargetForecaster(
    [
        ("forecast", ExponentialSmoothing(trend="additive",
                                          seasonal="multiplicative",
                                          sp=52,
                                          damped_trend=False, 
                                          use_boxcox=False)),
    ]
)

We obtain results for every combination of forecast horizon, coding type (top or bottom), and coding percentage:

* Forecast Horizons: (1, 5, 15)
* Coding Types: (Top, Bottom)
* Coding Percentages: (0.10, 0.20, 0.40)

In [12]:
results_dict_tes = {}
types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

In [13]:
for t in types:
    for p in percentages:
        for h in horizons:
            key = f"h={h}, {t} {p}"
            results_dict_tes[key] = full_coding_analysis(
                Y, forecaster, forecast_horizon=h, coding_type=t, coding_percentage=p
            )

In [14]:
results_dict_tes

{'h=1, Top 0.1': {'% of forecasted points adjusted downward:': 76.8,
  '% of forecasted points adjusted upward:': 23.200000000000003,
  '% Series with improved accuracy:': array([36.6, 36.6]),
  '% Series with worsened accuracy:': array([63.4, 63.4]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change mean global accuracy:': array([-62.2, -62.2]),
  '% Change median global accuracy:': array([-172.1, -172.1])},
 'h=5, Top 0.1': {'% of forecasted points adjusted downward:': 78.0,
  '% of forecasted points adjusted upward:': 22.0,
  '% Series with improved accuracy:': array([31.7, 31.7]),
  '% Series with worsened accuracy:': array([68.3, 68.3]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change mean global accuracy:': array([-51.7, -43.7]),
  '% Change median global accuracy:': array([-92.4, -80.8])},
 'h=15, Top 0.1': {'% of forecasted points adjusted downward:': 79.60000000000001,
  '% of forecasted points adjusted upward:': 20.4,
  '% Series with imp
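
The nested dictionaries printed above are hard to compare across settings. One option is to flatten each results dictionary into a table with one row per setting. The sketch below is illustrative: `results_to_frame` is a hypothetical helper, and the small `results` dictionary only copies a few SES values from the output above to keep the example self-contained. The meaning of the two entries in each array is defined in `helper_functions.py`, so they are labeled by position only.

```python
import numpy as np
import pandas as pd

# a small excerpt in the same shape as the results dictionaries above
results = {
    "h=1, Top 0.1": {"% of forecasted points adjusted downward:": 72.0,
                     "% Change mean global accuracy:": np.array([-82.6, -82.6])},
    "h=5, Top 0.1": {"% of forecasted points adjusted downward:": 73.5,
                     "% Change mean global accuracy:": np.array([-53.6, -43.7])},
}

def results_to_frame(results_dict):
    """Flatten {setting: {metric: scalar or array}} into one row per setting."""
    rows = {}
    for setting, metrics in results_dict.items():
        row = {}
        for name, value in metrics.items():
            label = name.rstrip(":")
            values = np.atleast_1d(value)
            if values.size == 1:
                row[label] = float(values[0])
            else:
                # array-valued metrics get one column per position
                for i, v in enumerate(values):
                    row[f"{label} [{i}]"] = float(v)
        rows[setting] = row
    return pd.DataFrame.from_dict(rows, orient="index")

table = results_to_frame(results)
print(table)
```

The same helper could be applied to `results_dict_ses`, `results_dict_des`, and `results_dict_tes` to compare the three models side by side.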