### Analyzing the Effects of Top/Bottom Coding on the Accuracy of Exponential Smoothing Forecasts

***

#### Analysis Flow:

* Inputs:
    * Time series data
    * Forecasting Model
    * Protection Method

* Analysis:
    * Pre-process data
    * Fit forecasting models
    * Generate forecasts for a variety of horizons

* Outputs:
    * Forecasts for original data
    * Forecasts for confidential data
    * Accuracy for local forecasts
    * Global accuracy
    * Percentage improvement or reduction in forecast accuracy across all series
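
The per-series accuracy outputs listed above can be illustrated with a minimal sketch. It assumes mean absolute error (MAE) as the accuracy measure and uses hypothetical names (`mae`, `compare_accuracy`); the measures and logic actually used live in `helper_functions.py` and may differ.

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast)))

def compare_accuracy(actuals, original_forecasts, protected_forecasts):
    """Share of series whose MAE improves, worsens, or is unchanged (hypothetical helper)."""
    original_errors = np.array([mae(a, f) for a, f in zip(actuals, original_forecasts)])
    protected_errors = np.array([mae(a, f) for a, f in zip(actuals, protected_forecasts)])
    n_series = len(original_errors)
    return {
        # a lower error on the protected data counts as an improvement
        "pct_improved": float(100.0 * np.sum(protected_errors < original_errors) / n_series),
        "pct_worsened": float(100.0 * np.sum(protected_errors > original_errors) / n_series),
        "pct_unchanged": float(100.0 * np.sum(protected_errors == original_errors) / n_series),
    }

# toy example: three series, two forecasted points each (made-up numbers)
actuals = [[10, 12], [5, 5], [8, 9]]
original_forecasts = [[11, 12], [5, 6], [8, 9]]
protected_forecasts = [[10, 12], [4, 7], [8, 9]]
summary = compare_accuracy(actuals, original_forecasts, protected_forecasts)
```

In this toy example one series improves, one worsens, and one is unchanged, so each share is one third.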

***

## Import Modules

Note the functions imported from `helper_functions`: these are custom functions written by the paper authors. See `helper_functions.py` for comments and function descriptions.

In [1]:
# general modules
import pandas as pd
import numpy as np
import sktime

# import exponential smoothing forecasting model
from sktime.forecasting.exp_smoothing import ExponentialSmoothing

# functions for transformation+forecasting pipeline
from sktime.forecasting.compose import TransformedTargetForecaster
from sktime.forecasting.compose import EnsembleForecaster

# time series transformations
from sktime.transformations.series.detrend import ConditionalDeseasonalizer, Detrender

# `helper_functions.py` contains custom functions written to support this analysis.
# `full_coding_analysis` runs the whole pipeline: train-test split, data protection,
# model training, accuracy comparison, and return of the accuracy results.
from helper_functions import full_coding_analysis

# suppress all warnings (mainly convergence warnings from the exponential smoothing fits)
import warnings
warnings.filterwarnings('ignore')

Import data.

In [2]:
# import weekly finance time series
Y = np.genfromtxt("../../Data/Train/Clean/weekly_finance_clean.csv", delimiter = ',', skip_header = 1)
Y = pd.DataFrame(Y)

In [3]:
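# remove the trend from each series (one series per row of Y)
detrender = Detrender()
detrended_series = [detrender.fit_transform(series) for _, series in Y.iterrows()]
# shift each detrended series upward so that every value is at least 1.0 (strictly positive)
detrended_series = [i + np.abs(np.min(i)) + 1.0 for i in detrended_series]
# reassemble into a DataFrame with one series per row
Y = pd.concat(detrended_series, axis=1).T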
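
To make this preprocessing step concrete, here is a minimal NumPy sketch of the same idea on a toy series. It assumes a straight-line trend fitted by least squares with `np.polyfit` as a stand-in for sktime's `Detrender`; the function name `detrend_and_shift` and the toy values are hypothetical.

```python
import numpy as np

def detrend_and_shift(y):
    """Remove a fitted linear trend, then shift so the minimum value is 1.0."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    # least-squares straight line through the series (assumed trend model)
    slope, intercept = np.polyfit(t, y, deg=1)
    residuals = y - (slope * t + intercept)
    # same shift as in the cell above: add |min| + 1 so all values are >= 1
    return residuals + np.abs(residuals.min()) + 1.0

toy = np.array([10.0, 12.5, 13.0, 16.5, 18.0, 19.5])
shifted = detrend_and_shift(toy)
```

Because least-squares residuals average to zero, the smallest residual is never positive, so the shifted series has a minimum of 1.0 and no remaining linear trend.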

## SES

In [4]:
# define forecasting model
# perform multiplicative deseasonalization conditional on autocorrelation test for seasonality

forecaster = TransformedTargetForecaster(
    [
        ("deseasonalize", ConditionalDeseasonalizer(model="multiplicative", sp=52)),
        ("forecast", ExponentialSmoothing(use_boxcox=False)),
    ]
)
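
For readers unfamiliar with the model, simple exponential smoothing keeps a single level that is updated as a weighted average of the newest observation and the previous level; every forecast horizon then receives that final level. The sketch below is a plain-Python illustration with a fixed smoothing parameter `alpha` and a hypothetical function name `ses_forecast`. The `ExponentialSmoothing` forecaster above estimates its parameters from the data, so its forecasts will generally differ.

```python
def ses_forecast(y, alpha, horizon):
    """Flat simple-exponential-smoothing forecast with a fixed alpha (illustrative only)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    # initialise the level at the first observation (one simple choice among several)
    level = y[0]
    for observation in y[1:]:
        # new level = weighted average of the new observation and the old level
        level = alpha * observation + (1.0 - alpha) * level
    # SES has no trend or seasonality, so all horizons share the same forecast
    return [level] * horizon

forecasts = ses_forecast([10.0, 12.0, 11.0, 13.0], alpha=0.5, horizon=3)
# forecasts == [12.0, 12.0, 12.0]
```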

We obtain results for every combination of forecast horizon, coding type (top or bottom), and coding percentage (3 × 2 × 3 = 18 configurations):

* Forecast Horizons: (1, 5, 15)
* Coding Types: (Top, Bottom)
* Coding Percentages: (0.10, 0.20, 0.40)
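
As an illustration of what a coding type and coding percentage mean, the sketch below uses one common definition: top coding caps values above the (1 - p) quantile at that quantile, and bottom coding raises values below the p quantile to that quantile. The function `code_series` is hypothetical and is not the implementation in `helper_functions.py`, which may choose thresholds differently.

```python
import numpy as np

def code_series(y, coding_type="Top", coding_percentage=0.10):
    """Top- or bottom-code a 1-D series at a quantile threshold (illustrative only)."""
    y = np.asarray(y, dtype=float)
    if coding_type == "Top":
        # values above the (1 - p) quantile are replaced by that quantile
        threshold = np.quantile(y, 1.0 - coding_percentage)
        return np.minimum(y, threshold)
    if coding_type == "Bottom":
        # values below the p quantile are replaced by that quantile
        threshold = np.quantile(y, coding_percentage)
        return np.maximum(y, threshold)
    raise ValueError("coding_type must be 'Top' or 'Bottom'")

y = np.arange(1.0, 11.0)  # 1, 2, ..., 10
top_coded = code_series(y, "Top", 0.10)        # only the value 10 is capped
bottom_coded = code_series(y, "Bottom", 0.10)  # only the value 1 is raised
```

With NumPy's default linear interpolation, the thresholds in this toy example are 9.1 and 1.9, so exactly one value is altered in each case.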

In [5]:
results_dict_ses = {}
types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

In [6]:
for t in types:
    for p in percentages:
        for h in horizons:
            results_dict_ses["h="+str(h)+", "+t+" "+str(p)] = full_coding_analysis(Y, forecaster, forecast_horizon=h, coding_type=t, coding_percentage=p)

In [7]:
results_dict_ses

{'h=1, Top 0.1': {'% of forecasted points adjusted downward:': 62.8,
  '% of forecasted points adjusted upward:': 30.5,
  '% Series with improved accuracy:': array([45.7, 45.7]),
  '% Series with worsened accuracy:': array([47.6, 47.6]),
  '% Series with unchanged accuracy:': array([6.7, 6.7]),
  '% Change mean global accuracy:': array([-25.2, -25.2]),
  '% Change median global accuracy:': array([-27.8, -27.8])},
 'h=5, Top 0.1': {'% of forecasted points adjusted downward:': 63.5,
  '% of forecasted points adjusted upward:': 27.3,
  '% Series with improved accuracy:': array([39.6, 39.6]),
  '% Series with worsened accuracy:': array([51.2, 51.2]),
  '% Series with unchanged accuracy:': array([9.1, 9.1]),
  '% Change mean global accuracy:': array([-6.4, -5.7]),
  '% Change median global accuracy:': array([-3.4, -4.5])},
 'h=15, Top 0.1': {'% of forecasted points adjusted downward:': 58.5,
  '% of forecasted points adjusted upward:': 32.4,
  '% Series with improved accuracy:': array([42.1

***
***

## DES

In [8]:
# define forecasting model
# perform multiplicative deseasonalization conditional on autocorrelation test for seasonality

forecaster = TransformedTargetForecaster(
    [
        ("deseasonalize", ConditionalDeseasonalizer(model="multiplicative", sp=52)),
        ("forecast", ExponentialSmoothing(trend="additive", use_boxcox=False)),
    ]
)

We obtain results for every combination of forecast horizon, coding type (top or bottom), and coding percentage (3 × 2 × 3 = 18 configurations):

* Forecast Horizons: (1, 5, 15)
* Coding Types: (Top, Bottom)
* Coding Percentages: (0.10, 0.20, 0.40)

In [9]:
results_dict_des = {}
types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

In [10]:
for t in types:
    for p in percentages:
        for h in horizons:
            results_dict_des["h="+str(h)+", "+t+" "+str(p)] = full_coding_analysis(Y, forecaster, forecast_horizon=h, coding_type=t, coding_percentage=p)

In [11]:
results_dict_des

{'h=1, Top 0.1': {'% of forecasted points adjusted downward:': 63.4,
  '% of forecasted points adjusted upward:': 36.6,
  '% Series with improved accuracy:': array([51.8, 51.8]),
  '% Series with worsened accuracy:': array([48.2, 48.2]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change mean global accuracy:': array([-23.1, -23.1]),
  '% Change median global accuracy:': array([-37.1, -37.1])},
 'h=5, Top 0.1': {'% of forecasted points adjusted downward:': 68.2,
  '% of forecasted points adjusted upward:': 31.8,
  '% Series with improved accuracy:': array([48.2, 48.8]),
  '% Series with worsened accuracy:': array([51.8, 51.2]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change mean global accuracy:': array([-2.2, -2.4]),
  '% Change median global accuracy:': array([-7.6, -5.1])},
 'h=15, Top 0.1': {'% of forecasted points adjusted downward:': 55.7,
  '% of forecasted points adjusted upward:': 44.3,
  '% Series with improved accuracy:': array([43.9, 47

***
***

## TES

In [12]:
# define forecasting model
# seasonality is modeled inside the forecaster (multiplicative, sp=52), so no separate deseasonalization step is used

forecaster = TransformedTargetForecaster(
    [
        ("forecast", ExponentialSmoothing(trend="additive",
                                          seasonal="multiplicative",
                                          sp=52,
                                          damped_trend=False, 
                                          use_boxcox=False)),
    ]
)

We obtain results for every combination of forecast horizon, coding type (top or bottom), and coding percentage (3 × 2 × 3 = 18 configurations):

* Forecast Horizons: (1, 5, 15)
* Coding Types: (Top, Bottom)
* Coding Percentages: (0.10, 0.20, 0.40)
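
The configuration grid can also be enumerated with `itertools.product`. This sketch only builds the dictionary keys, in the same order and format as the nested loops used in this notebook (for example `'h=1, Top 0.1'`); it does not call `full_coding_analysis`.

```python
from itertools import product

types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

# product() varies the last argument fastest, matching loops nested as
# types -> percentages -> horizons
keys = [f"h={h}, {t} {p}" for t, p, h in product(types, percentages, horizons)]
```

The list holds 18 keys, starting with `'h=1, Top 0.1'` and ending with `'h=15, Bottom 0.4'`.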

In [13]:
results_dict_tes = {}
types = ["Top", "Bottom"]
percentages = [0.10, 0.20, 0.40]
horizons = [1, 5, 15]

In [14]:
for t in types:
    for p in percentages:
        for h in horizons:
            results_dict_tes["h="+str(h)+", "+t+" "+str(p)] = full_coding_analysis(Y, forecaster, forecast_horizon=h, coding_type=t, coding_percentage=p)

In [15]:
results_dict_tes

{'h=1, Top 0.1': {'% of forecasted points adjusted downward:': 58.5,
  '% of forecasted points adjusted upward:': 41.5,
  '% Series with improved accuracy:': array([48.2, 48.2]),
  '% Series with worsened accuracy:': array([51.8, 51.8]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change mean global accuracy:': array([-9.4, -9.4]),
  '% Change median global accuracy:': array([-11.4, -11.4])},
 'h=5, Top 0.1': {'% of forecasted points adjusted downward:': 60.199999999999996,
  '% of forecasted points adjusted upward:': 39.800000000000004,
  '% Series with improved accuracy:': array([50.6, 51.2]),
  '% Series with worsened accuracy:': array([49.4, 48.8]),
  '% Series with unchanged accuracy:': array([0., 0.]),
  '% Change mean global accuracy:': array([-38.1, -34.9]),
  '% Change median global accuracy:': array([-1.4,  1.9])},
 'h=15, Top 0.1': {'% of forecasted points adjusted downward:': 60.5,
  '% of forecasted points adjusted upward:': 39.5,
  '% Series with improved