# Code for Top and Bottom Coding Protection Methods

***

Borrowing the notation from [Schneider & Lee (2022)](https://arxiv.org/abs/2106.16085):

There are $J$ time series denoted $y_j$.

The confidential (actual) value of $y_j$ at time $t$ is denoted $A_{j,t}$.

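Under `Top Coding`, the top $100p$ percent of observations in a series (those above its $1-p$ quantile) are replaced with the $1-p$ quantile. Under `Bottom Coding`, the bottom $100p$ percent of observations (those below its $p$ quantile) are replaced with the $p$ quantile. Here $p$ is a proportion between 0 and 1, e.g. $p = 0.1$ protects 10 percent of the observations.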
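In the notation above, the two rules can be written compactly. The symbols $P_{j,t}$ (the protected value) and $Q_j(\cdot)$ (the empirical quantile function of series $y_j$) are assumed here for illustration and may not match the notation of the cited paper:

$$
P_{j,t}^{\text{top}} = \min\{A_{j,t},\; Q_j(1-p)\}, \qquad P_{j,t}^{\text{bottom}} = \max\{A_{j,t},\; Q_j(p)\}.
$$

Values on the unprotected side of the threshold are left unchanged, so each rule alters only one tail of the series.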

***

In [1]:
# general modules
import numpy as np
import pandas as pd

# nice time series plots
from sktime.utils.plotting import plot_series

In [2]:
# import weekly finance series, skipping column names
Y = np.genfromtxt("../../../Data/Train/Clean/weekly_finance_clean.csv", delimiter = ',', skip_header = 1)

In [3]:
y = Y[0,:]

In [5]:
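# bottom-coding threshold: the 0.1 quantile of the series
q = np.quantile(y, q=0.1)

In [12]:
# bottom coding: keep values above q, replace all others with q
y_bc = np.array([i if i > q else q for i in y])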

In [None]:
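def coding_protection(series, coding_type, percent_protected):
    # percent_protected is a proportion in (0, 1), e.g. 0.1 for 10 percent
    if coding_type == "Bottom":
        # replace values below the p quantile with that quantile
        q = np.quantile(series, q=percent_protected)
        series_protected = np.array([i if i > q else q for i in series])
    elif coding_type == "Top":
        # replace values above the 1-p quantile with that quantile
        q = np.quantile(series, q=1-percent_protected)
        series_protected = np.array([i if i < q else q for i in series])
    else:
        raise ValueError("coding_type must be 'Bottom' or 'Top'")
    return series_protected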
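
The same protection can be sketched without a Python-level loop by using NumPy's elementwise `np.maximum` and `np.minimum`. The function name `coding_protection_vectorized` and the toy series below are hypothetical, introduced only for illustration; this is one possible variant, not necessarily how the notebook's author would write it.

```python
import numpy as np

def coding_protection_vectorized(series, coding_type, percent_protected):
    """Hypothetical vectorized sketch of top/bottom coding.

    percent_protected is assumed to be a proportion in (0, 1).
    """
    series = np.asarray(series, dtype=float)
    if coding_type == "Bottom":
        # values below the p quantile are raised to that quantile
        q = np.quantile(series, q=percent_protected)
        return np.maximum(series, q)
    if coding_type == "Top":
        # values above the 1-p quantile are lowered to that quantile
        q = np.quantile(series, q=1 - percent_protected)
        return np.minimum(series, q)
    raise ValueError("coding_type must be 'Bottom' or 'Top'")

# toy series (illustrative data, not from the weekly finance file)
toy = np.arange(1.0, 11.0)
top = coding_protection_vectorized(toy, "Top", 0.1)
bottom = coding_protection_vectorized(toy, "Bottom", 0.1)
```

Because `np.quantile` interpolates linearly by default, the threshold need not coincide with an observed value; on the toy series only the largest (for `"Top"`) or smallest (for `"Bottom"`) observation is changed.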