# *Ambrosia* advanced metric transformation tools overview

This notebook shows how to use the ``Cuped``, ``MultiCuped`` and ``MLVarianceReducer`` classes, which are designed to reduce the variance of target metrics. The examples run on synthetically generated data. **Because the data is artificial, the results look better than you should expect on real data.**

This notebook does not cover the theory behind these techniques; that material will be added later. Use it as an API tutorial only.

In [2]:
import pandas as pd
from ambrosia.preprocessing import Cuped, MultiCuped, MLVarianceReducer

Load data

In [3]:
data = pd.read_csv('../tests/test_data/var_table.csv')

In [4]:
data.head()

Unnamed: 0,feature_1,feature_2,feature_3,target
0,-2.426916,5.575498,43.505323,187.385459
1,-2.745189,7.995822,19.942889,99.691566
2,2.437555,17.254237,33.091612,188.880782
3,6.202871,28.913551,25.026746,199.53256
4,3.099725,3.771417,26.403917,121.956238


In [5]:
target_column = 'target'

## CUPED

In [6]:
cuped = Cuped()

Fit and transform

In [7]:
cuped.fit_transform(
    dataframe=data,
    target_column=target_column,
    covariate_column='feature_2',
    transformed_name='target_cuped',
    inplace=True,
)

ambrosia LOGGER: After transformation СUPED for target, the variance is 67.0818 % of the original
ambrosia LOGGER: Variance transformation 2982.4627 ===> 2000.6892


Unnamed: 0,feature_1,feature_2,feature_3,target,target_cuped
0,-2.426916,5.575498,43.505323,187.385459,204.513107
1,-2.745189,7.995822,19.942889,99.691566,109.350175
2,2.437555,17.254237,33.091612,188.880782,169.968233
3,6.202871,28.913551,25.026746,199.532560,144.639755
4,3.099725,3.771417,26.403917,121.956238,144.651222
...,...,...,...,...,...
2995,1.277060,22.630330,36.479685,216.416345,180.913351
2996,5.124652,58.120888,13.836445,239.307014,94.281340
2997,-0.654616,3.930848,32.036205,139.957720,162.160705
2998,0.401016,29.254561,38.268808,240.608496,184.663346


Store fitted params

In [8]:
store_path_cuped = '_examples_configs/cuped_config.json'

In [9]:
cuped.get_params_dict()

{'target_column': 'target',
 'transformed_name': 'target_cuped',
 'covariate_column': 'feature_2',
 'theta': 3.085966714908545,
 'bias': 11.125671107545354}
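
The fitted parameters can be checked against the table above by hand. The sketch below does not call Ambrosia; it assumes the adjustment has the form `target - theta * (covariate - bias)`, which is an inference from the displayed numbers rather than a documented formula. The helper name `cuped_adjust` is hypothetical.

```python
# Parameters copied from the get_params_dict() output above.
theta = 3.085966714908545
bias = 11.125671107545354

# (target, feature_2, target_cuped) for the first three rows of the table above.
rows = [
    (187.385459, 5.575498, 204.513107),
    (99.691566, 7.995822, 109.350175),
    (188.880782, 17.254237, 169.968233),
]


def cuped_adjust(target, covariate, theta, bias):
    """Assumed form of the adjustment: subtract theta * (covariate - bias)."""
    return target - theta * (covariate - bias)


recomputed = [cuped_adjust(y, x, theta, bias) for y, x, _ in rows]

# Print the value shown in the table next to the recomputed one.
for (_, _, shown), value in zip(rows, recomputed):
    print(f"table: {shown:.6f}   recomputed: {value:.6f}")
```

Under this reading, `bias` plays the role of the covariate mean, so the adjustment leaves the mean of the target unchanged.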

In [10]:
cuped.store_params(store_path_cuped)

Load params

In [11]:
new_cuped = Cuped()
new_cuped.load_params(store_path_cuped)

In [12]:
new_cuped.get_params_dict()

{'target_column': 'target',
 'transformed_name': 'target_cuped',
 'covariate_column': 'feature_2',
 'theta': 3.085966714908545,
 'bias': 11.125671107545354}
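
Because the parameters are plain strings and floats, they survive a JSON round trip unchanged. The sketch below illustrates this with the standard `json` module only. It assumes the stored file mirrors the dictionary returned by `get_params_dict()`, which the notebook does not confirm, and it writes to a temporary directory rather than to `_examples_configs/`.

```python
import json
import os
import tempfile

# Dictionary copied from the get_params_dict() output above.
params = {
    "target_column": "target",
    "transformed_name": "target_cuped",
    "covariate_column": "feature_2",
    "theta": 3.085966714908545,
    "bias": 11.125671107545354,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this illustration.
    path = os.path.join(tmp_dir, "cuped_config.json")

    # Write the parameters to disk ...
    with open(path, "w") as config_file:
        json.dump(params, config_file)

    # ... and read them back.
    with open(path) as config_file:
        restored = json.load(config_file)

# Python serializes floats with their shortest round-trippable representation,
# so theta and bias come back bit-for-bit identical.
print(restored == params)
```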

## MultiCuped

In [13]:
multicuped = MultiCuped()

Fit and transform

In [14]:
multicuped.fit_transform(dataframe=data,
                         target_column=target_column,
                         covariate_columns=['feature_2', 'feature_3'],
                         transformed_name='target_multicuped',
                         inplace=True)

ambrosia LOGGER: After transformation Multi СUPED for target, the variance is 1.2779 % of the original
ambrosia LOGGER: Variance transformation 2982.4627 ===> 38.1133


Unnamed: 0,feature_1,feature_2,feature_3,target,target_cuped,target_multicuped
0,-2.426916,5.575498,43.505323,187.385459,204.513107,141.715314
1,-2.745189,7.995822,19.942889,99.691566,109.350175,140.948473
2,2.437555,17.254237,33.091612,188.880782,169.968233,149.436534
3,6.202871,28.913551,25.026746,199.532560,144.639755,156.975607
4,3.099725,3.771417,26.403917,121.956238,144.651222,150.181834
...,...,...,...,...,...,...
2995,1.277060,22.630330,36.479685,216.416345,180.913351,147.103213
2996,5.124652,58.120888,13.836445,239.307014,94.281340,152.893408
2997,-0.654616,3.930848,32.036205,139.957720,162.160705,145.165200
2998,0.401016,29.254561,38.268808,240.608496,184.663346,144.036343


Store fitted params

In [15]:
store_path_multicuped = '_examples_configs/multicuped_config.json'

In [16]:
multicuped.get_params_dict()

{'target_column': 'target',
 'transformed_name': 'target_multicuped',
 'covariate_columns': ['feature_2', 'feature_3'],
 'theta': [[3.034447972098987], [4.000919354909565]],
 'bias': 145.30970530527566}
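
As a sanity check, the `target_multicuped` values in the table can be recomputed from these parameters without Ambrosia. The sketch assumes the form `target - sum(theta_i * covariate_i) + bias`, again inferred from the displayed numbers rather than taken from the library documentation. The helper name `multicuped_adjust` is hypothetical.

```python
# Parameters copied from the get_params_dict() output above;
# theta is flattened from [[...], [...]] to a plain list.
theta = [3.034447972098987, 4.000919354909565]
bias = 145.30970530527566

# ((feature_2, feature_3), target, target_multicuped) for the first three rows.
rows = [
    ((5.575498, 43.505323), 187.385459, 141.715314),
    ((7.995822, 19.942889), 99.691566, 140.948473),
    ((17.254237, 33.091612), 188.880782, 149.436534),
]


def multicuped_adjust(target, covariates, theta, bias):
    """Assumed form: subtract the linear combination of covariates, then add bias."""
    linear_part = sum(t * x for t, x in zip(theta, covariates))
    return target - linear_part + bias


recomputed = [multicuped_adjust(y, xs, theta, bias) for xs, y, _ in rows]

# Print the value shown in the table next to the recomputed one.
for (_, _, shown), value in zip(rows, recomputed):
    print(f"table: {shown:.6f}   recomputed: {value:.6f}")
```

Under this reading, `bias` here already includes the product of `theta` and the covariate means, so its meaning differs from the `bias` reported by `Cuped`.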

In [17]:
multicuped.store_params(store_path_multicuped)

Load params

In [18]:
new_multicuped = MultiCuped()
new_multicuped.load_params(store_path_multicuped)

In [19]:
new_multicuped.get_params_dict()

{'target_column': 'target',
 'transformed_name': 'target_multicuped',
 'covariate_columns': ['feature_2', 'feature_3'],
 'theta': [[3.034447972098987], [4.000919354909565]],
 'bias': 145.30970530527566}

## ML Variance Reduction

In [20]:
mltransformer = MLVarianceReducer()

Fit and transform

In [21]:
mltransformer.fit_transform(dataframe=data,
                            target_column=target_column,
                            covariate_columns=['feature_2', 'feature_3'],
                            transformed_name='target_mlreducer',
                            inplace=True)

ambrosia LOGGER: After transformation ML approach reduce for target, the variance is 0.9774 % of the original
ambrosia LOGGER: Variance transformation 2982.4627 ===> 29.1504
ambrosia LOGGER: Prediction MSE score - 2945.29041


Unnamed: 0,feature_1,feature_2,feature_3,target,target_cuped,target_multicuped,target_mlreducer
0,-2.426916,5.575498,43.505323,187.385459,204.513107,141.715314,144.540545
1,-2.745189,7.995822,19.942889,99.691566,109.350175,140.948473,141.665906
2,2.437555,17.254237,33.091612,188.880782,169.968233,149.436534,149.703421
3,6.202871,28.913551,25.026746,199.532560,144.639755,156.975607,153.785873
4,3.099725,3.771417,26.403917,121.956238,144.651222,150.181834,150.223084
...,...,...,...,...,...,...,...
2995,1.277060,22.630330,36.479685,216.416345,180.913351,147.103213,148.719639
2996,5.124652,58.120888,13.836445,239.307014,94.281340,152.893408,151.970972
2997,-0.654616,3.930848,32.036205,139.957720,162.160705,145.165200,146.166828
2998,0.401016,29.254561,38.268808,240.608496,184.663346,144.036343,141.938850


**Note:** Watch out for overfitting, and make sure the prerequisites of each method are actually met before relying on the transformed metric.
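
One common way to limit overfitting in ML-based variance reduction is cross-fitting: each observation is adjusted with a prediction from a model that never saw it during training. The sketch below illustrates the idea in pure Python on its own synthetic data. It is not how `MLVarianceReducer` is implemented; the binned-mean "model" and the helper `fit_binned_mean` are hypothetical stand-ins for a real regressor.

```python
import random
import statistics

random.seed(0)

# Synthetic data: the target depends non-linearly on a single covariate.
n = 2000
covariate = [random.uniform(0, 10) for _ in range(n)]
target = [x ** 2 + random.gauss(0, 5) for x in covariate]


def fit_binned_mean(xs, ys, n_bins=20, low=0.0, high=10.0):
    """Tiny stand-in for an ML model: predict the mean target of the covariate's bin."""
    width = (high - low) / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int((x - low) / width), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = statistics.fmean(ys)
    # Empty bins fall back to the overall training mean.
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int((x - low) / width), n_bins - 1)]


# Two folds: each half is predicted by a model fitted on the other half.
half = n // 2
folds = [(range(0, half), range(half, n)), (range(half, n), range(0, half))]
prediction = [0.0] * n
for train_idx, predict_idx in folds:
    model = fit_binned_mean(
        [covariate[i] for i in train_idx],
        [target[i] for i in train_idx],
    )
    for i in predict_idx:
        prediction[i] = model(covariate[i])

# Subtract the centered out-of-fold prediction so the target mean is preserved.
mean_prediction = statistics.fmean(prediction)
adjusted = [y - (p - mean_prediction) for y, p in zip(target, prediction)]

print("original variance:", statistics.pvariance(target))
print("adjusted variance:", statistics.pvariance(adjusted))
```

Comparing this out-of-fold result with an in-sample fit of the same model gives a rough indication of how much of the apparent reduction comes from overfitting.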


## Final variance of the target metric

In [22]:
data[['target', 'target_cuped', 'target_multicuped', 'target_mlreducer']].var()

target               2983.457229
target_cuped         2001.356367
target_multicuped      38.126050
target_mlreducer       29.160151
dtype: float64

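The three transformations reduce the variance of the target metric to very different degrees: according to the logs above, ``Cuped`` with a single covariate leaves about 67% of the original variance, while ``MultiCuped`` and ``MLVarianceReducer`` each leave roughly 1%.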
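
The variances printed by `.var()` are slightly larger than the ones in the logger messages (for example 2983.457229 versus 2982.4627 for `target`). pandas' `.var()` divides by `n - 1` by default, so one plausible explanation is that the logger divides by `n`. This is an inference from the numbers, not something the notebook states, and the row count of 3000 used below is also an assumption. A minimal sketch of the relationship:

```python
import statistics

# A handful of target values from the table, used only to show the relationship.
values = [187.385459, 99.691566, 188.880782, 199.532560, 121.956238]
n = len(values)

population_var = statistics.pvariance(values)  # divides by n
sample_var = statistics.variance(values)       # divides by n - 1, like pandas' default

# The two estimates differ exactly by the factor n / (n - 1).
print(population_var * n / (n - 1), sample_var)

# Apply the same factor to the logged variance of the target column.
logged_variance = 2982.4627
assumed_rows = 3000  # assumption: the exact row count is not shown in the notebook
rescaled = logged_variance * assumed_rows / (assumed_rows - 1)
print(rescaled)  # compare with the 2983.457229 printed by .var()
```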

---

## Learn more

For more information on advanced metric transformation techniques, see the following resources:


* [Paper on CUPED](https://www.exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf)
* [Booking article on CUPED](https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d)
* [Avito article on ML-based criteria](https://habr.com/ru/companies/avito/articles/590105/)
* [Article with research on variance reduction techniques](https://j-sephb-lt-n.github.io/exploring_statistics/cuped_cupac_and_other_variance_reduction_techniques.html)