In [42]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import matplotlib.pyplot as plt

In [4]:
# File paths
SYNTH_ADDITIVE_PATH = 'SYNTHDataset/SYNTH_additive.csv'
SYNTH_ADDITIVE_REVERSALS_PATH = 'SYNTHDataset/SYNTH_additive_reversals.csv'

In [7]:
ts_additive = pd.read_csv(SYNTH_ADDITIVE_PATH, index_col=0)
ts_additive_reversals = pd.read_csv(SYNTH_ADDITIVE_REVERSALS_PATH, index_col=0)

The synthetic dataset with trend and the synthetic dataset with trend and reversals have different ranges. The series with an additive trend increases steadily, whereas the series with reversals repeatedly changes direction and therefore stays within a much narrower band. So far we have not been able to model reversals in a way that preserves an overall rising trend, but we are working on it.
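To illustrate why the ranges differ, here is a sketch with two hypothetical trend generators (not the generator used for the SYNTH datasets): one rises at a constant rate, the other moves by the same amount per step but flips direction every `segment_len` steps. The slope, length and segment length are arbitrary choices for illustration. The monotone trend spans about 200 units, while the reversing one stays within a band of about 20 units.

```python
import numpy as np

# Hypothetical trend generators for illustration only; they are NOT the
# data generator used to build the SYNTH datasets.
def monotone_trend(n, slope):
    """Linear trend that rises at a constant rate over n steps."""
    return slope * np.arange(n)

def trend_with_reversals(n, slope, segment_len):
    """Piecewise-linear trend whose slope flips sign every segment_len steps."""
    steps = np.arange(n)
    # +1 on even-numbered segments, -1 on odd-numbered segments
    signs = np.where((steps // segment_len) % 2 == 0, 1.0, -1.0)
    increments = slope * signs
    # Start at 0 and accumulate the signed increments
    return np.concatenate(([0.0], np.cumsum(increments[:-1])))

n, slope = 10_000, 0.02
up = monotone_trend(n, slope)
zigzag = trend_with_reversals(n, slope, segment_len=1_000)

print(f"monotone range:  {up.max() - up.min():.1f}")
print(f"reversals range: {zigzag.max() - zigzag.min():.1f}")
```

Both trends change by 0.02 at every step; only the direction changes, and that alone is enough to shrink the range by an order of magnitude.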

In [27]:
# Compare the summary statistics of the two raw series
print('Descriptive Statistics of Synthetic Data with Additive Trend')
print('-'*80)
print(ts_additive.describe().loc[['max', 'mean', 'std', 'min']])
print('='*80)
print('Descriptive Statistics of Synthetic Data with Additive Trend and Reversals')
print('-'*80)
print(ts_additive_reversals.describe().loc[['max', 'mean', 'std', 'min']])


Descriptive Statistics of Synthetic Data with Additive Trend
--------------------------------------------------------------------------------
          TARGET
max   294.487557
mean  187.090119
std    51.309079
min    76.911257
Descriptive Statistics of Synthetic Data with Additive Trend and Reversals
--------------------------------------------------------------------------------
          TARGET
max   136.278350
mean  101.670413
std    11.880123
min    65.445931


Here I simulate a chronological train/validation/test split using the same borders as the models:

In [56]:
# Borders in samples: 12 "months" for training, then 4 for validation and 4 for test
# (assuming hourly samples and 30-day months, i.e. 30*24 = 720 samples per month)
border_1 = 12*30*24
border_2 = border_1 + 4*30*24
border_3 = border_2 + 4*30*24

# Positional, chronological slicing with .iloc
train_additive = ts_additive.iloc[:border_1]
val_additive = ts_additive.iloc[border_1:border_2]
test_additive = ts_additive.iloc[border_2:border_3]

train_additive_reversals = ts_additive_reversals.iloc[:border_1]
val_additive_reversals = ts_additive_reversals.iloc[border_1:border_2]
test_additive_reversals = ts_additive_reversals.iloc[border_2:border_3]

# For demonstration, only the test sets are transformed;
# each scaler is fitted on its own training set only
scaler = StandardScaler()
scaler.fit(train_additive)
scaled_test_additive = scaler.transform(test_additive)

# Separate scaler for the series with reversals
scaler = StandardScaler()
scaler.fit(train_additive_reversals)
scaled_test_additive_reversals = scaler.transform(test_additive_reversals)
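The border arithmetic above can be packaged as a small reusable function. This is a sketch: `chronological_split` is a hypothetical helper, and the 720 samples per month assume hourly data with 30-day months. Under those assumptions the blocks hold 8,640, 2,880 and 2,880 samples, and together they use the first 14,400 samples of the series.

```python
import numpy as np

# Assumption: one sample per hour and 30-day "months", as the 30*24
# factors in the borders suggest.
HOURS_PER_MONTH = 30 * 24

def chronological_split(values, train_months=12, val_months=4, test_months=4):
    """Split an array into contiguous train/val/test blocks in time order."""
    border_1 = train_months * HOURS_PER_MONTH
    border_2 = border_1 + val_months * HOURS_PER_MONTH
    border_3 = border_2 + test_months * HOURS_PER_MONTH
    if len(values) < border_3:
        raise ValueError(f"need at least {border_3} samples, got {len(values)}")
    # Positional slicing; for a pandas object use .iloc or pass .to_numpy()
    return values[:border_1], values[border_1:border_2], values[border_2:border_3]

series = np.arange(20_000, dtype=float)  # dummy series, only the sizes matter
train, val, test = chronological_split(series)
print(len(train), len(val), len(test))  # 8640 2880 2880
```

Keeping the blocks contiguous and in time order (no shuffling) is what makes the scaling experiment meaningful: the test block always lies after the data the scaler was fitted on.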

In [57]:
print('Descriptive Statistics of Scaled Synthetic Data with Additive Trend')
print('-'*80)
print(f'max     {scaled_test_additive.max():.4f}\nmean     {scaled_test_additive.mean():.4f}\nstd     {scaled_test_additive.std():.4f}\nmin     {scaled_test_additive.min():.4f}')
print('='*80)
print('Descriptive Statistics of Scaled Synthetic Data with Additive Trend and Reversals')
print('-'*80)
print(f'max     {scaled_test_additive_reversals.max():.4f}\nmean     {scaled_test_additive_reversals.mean():.4f}\nstd     {scaled_test_additive_reversals.std():.4f}\nmin     {scaled_test_additive_reversals.min():.4f}')
print('='*80)



Descriptive Statistics of Scaled Synthetic Data with Additive Trend
--------------------------------------------------------------------------------
max     4.4837
mean     3.1865
std     0.5036
min     1.6672
Descriptive Statistics of Scaled Synthetic Data with Additive Trend and Reversals
--------------------------------------------------------------------------------
max     1.5657
mean     -0.5691
std     0.8611
min     -3.0646


The discrepancy between the StandardScaler outputs, and hence the different range/scale of the MSE scores, has two sources. First, the two series have different means and different standard deviations.
Second, each scaler is fitted on the training set only, and the series with a trend but without reversals is not mean-stationary: its test set lies far above the training mean (on average about 3.19 training standard deviations above it), whereas the test set of the series with reversals is centred much closer to it (mean -0.57).
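A toy experiment makes the second point concrete. It is a sketch under simplifying assumptions, not a reproduction of the SYNTH data: a noise-free linear trend and a stationary daily sine wave share the split sizes used above, and each is standardized with a scaler fitted on its training block only. For a pure linear trend with $n_{train}$ training, $n_{val}$ validation and $n_{test}$ test points, the scaled test mean is approximately $\sqrt{12}\,(n_{train}/2 + n_{val} + n_{test}/2)/n_{train}$, regardless of the slope; for these borders that is about 3.46. The stationary series stays centred at zero.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy series (assumptions, not the SYNTH data): a noise-free linear trend
# and a stationary sine wave with a 24-sample period.
n_train, n_val, n_test = 8640, 2880, 2880
t = np.arange(n_train + n_val + n_test, dtype=float)
trend = 0.01 * t
stationary = np.sin(2 * np.pi * t / 24)  # whole periods in every block, so mean ~0

def scaled_test_mean(series):
    """Fit StandardScaler on the training block only, then scale the test block."""
    x = series.reshape(-1, 1)  # scikit-learn expects 2-D input
    scaler = StandardScaler().fit(x[:n_train])
    return scaler.transform(x[n_train + n_val:]).mean()

# Closed-form approximation for a pure linear trend (independent of the slope)
expected = np.sqrt(12) * (n_train / 2 + n_val + n_test / 2) / n_train

print(f"trend:      {scaled_test_mean(trend):.3f} (approx. {expected:.3f})")
print(f"stationary: {scaled_test_mean(stationary):.3f}")
```

The SYNTH series contain more than a pure linear trend, so the observed scaled test mean (3.19) differs from the idealized value, but it is of the same order.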

One more consideration is how our data generator works. In its current form, instead of adding a trend to an existing series, it generates a new series with a trend; we are reworking the function to fit our needs better. Either way, the ranges of the scaled series with trend and with trend and reversals are likely to differ, because StandardScaler standardizes with the mean and standard deviation of the training set, and for a steadily trending series those statistics describe the test set poorly.
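A minimal sketch of what adding a trend to an existing series could look like, assuming a linear trend: the trend is a deterministic component added on top of the original values, so subtracting it recovers the original series exactly. `add_linear_trend` is a hypothetical helper, and the base series (a daily sine wave plus Gaussian noise) is made up for illustration; this is not our generator.

```python
import numpy as np

# Hypothetical helper (not the project's generator): add a linear trend on
# top of an existing series instead of generating a new series.
def add_linear_trend(series, slope, intercept=0.0):
    """Return series + (intercept + slope * t); the input array is not modified."""
    series = np.asarray(series, dtype=float)
    t = np.arange(series.shape[0])
    return series + intercept + slope * t

rng = np.random.default_rng(0)
t = np.arange(24 * 30)  # 30 days of hourly samples
base = 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, t.size)  # daily cycle + noise
with_trend = add_linear_trend(base, slope=0.05, intercept=100.0)

# Removing the known trend recovers the original seasonal and noise components
recovered = with_trend - (100.0 + 0.05 * t)
print(np.allclose(recovered, base))  # True
```

Because the base series is preserved, the same seasonal pattern and noise can be reused with different trends (or, later, trends with reversals), which keeps comparisons between datasets controlled.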
