# `Ambrosia` in action. Building a complete A/B pipeline

In [4]:
import pandas as pd

from ambrosia.preprocessing import AggregatePreprocessor
from ambrosia.designer import Designer
from ambrosia.splitter import Splitter
from ambrosia.tester import Tester



### Using the AggregatePreprocessor, Designer, Splitter and Tester classes together

In [5]:
dataframe = pd.read_csv('../tests/test_data/week_metrics.csv')
dataframe.head()

Unnamed: 0,id,gender,watched,sessions,day,platform
0,0,Male,28.440846,4,1,android
1,1,Female,1.825271,2,1,ios
2,2,Female,46.995606,0,1,web
3,3,Female,37.310264,1,1,ios
4,4,Female,147.513105,0,1,web


### Aggregate data

In [6]:
transformer = AggregatePreprocessor()

In [7]:
df = transformer.fit_transform(dataframe, groupby_columns='id', agg_params={
    'watched' : 'sum',
    'sessions' : 'max',
    'gender' : 'simple', # simple - choose the first possible value
    'platform' : 'mode'
})

In [8]:
df

Unnamed: 0,id,watched,sessions,gender,platform
0,0,772.597224,4,Male,ios
1,1,538.076739,6,Female,android
2,2,288.492353,7,Female,android
3,3,373.620408,3,Female,ios
4,4,630.238862,8,Female,ios
...,...,...,...,...,...
4995,4995,390.133588,9,Male,android
4996,4996,544.423724,15,Female,ios
4997,4997,204.713032,6,Male,android
4998,4998,1088.642872,10,Female,web
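To make the four aggregation rules concrete, here is a minimal pure-Python sketch of per-`id` aggregation. It is not Ambrosia's implementation: the `aggregate` helper, the `AGGREGATORS` table, and the toy rows are hypothetical, `'simple'` follows the comment in the cell above (first value seen), and `'mode'` is assumed to mean the most frequent value.

```python
from collections import defaultdict
from statistics import mode

# Hypothetical mapping from rule name to a function over a list of values.
AGGREGATORS = {
    "sum": sum,
    "max": max,
    "simple": lambda values: values[0],  # first value seen for the object
    "mode": mode,                        # most frequent value (assumed meaning)
}


def aggregate(rows, groupby, agg_params):
    """Collapse several rows per object into a single row per object."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[groupby]].append(row)

    result = []
    for key, members in grouped.items():
        out = {groupby: key}
        for column, rule in agg_params.items():
            out[column] = AGGREGATORS[rule]([m[column] for m in members])
        result.append(out)
    return result


# Toy daily records, invented for illustration only.
daily = [
    {"id": 0, "watched": 10.0, "sessions": 4, "gender": "Male", "platform": "android"},
    {"id": 0, "watched": 5.5, "sessions": 1, "gender": "Male", "platform": "ios"},
    {"id": 0, "watched": 2.0, "sessions": 2, "gender": "Male", "platform": "ios"},
    {"id": 1, "watched": 1.0, "sessions": 2, "gender": "Female", "platform": "web"},
]

per_user = aggregate(daily, "id", {
    "watched": "sum",
    "sessions": "max",
    "gender": "simple",
    "platform": "mode",
})
```

Each object ends up with exactly one row, which is the shape that the design, split, and test steps expect.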


### Design the experiment: suppose we want to detect a 5% effect on the ``watched`` metric

In [9]:
designer = Designer(dataframe=df, metrics='watched')

In [10]:
designer.run('size', effects=1.05)

effects,errors (0.05; 0.2)
5.0%,893


**For our experiment setting, a sample size of about 900 objects per group is enough.**
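For intuition about where such a number comes from, the sketch below uses the textbook sample-size formula for comparing two means under a normal approximation. It is an assumption that this resembles what `Designer` computes; the helper name `sample_size_per_group` and the mean and standard deviation passed to it are hypothetical and are not taken from the data above.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(mean, std, relative_effect, alpha=0.05, beta=0.2):
    """Classical two-sample size estimate under a normal approximation.

    relative_effect follows the notebook's convention: 1.05 means +5%.
    alpha and beta are the first and second type error rates.
    """
    delta = mean * (relative_effect - 1.0)             # absolute effect to detect
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_beta = NormalDist().inv_cdf(1.0 - beta)          # power = 1 - beta
    return ceil(2.0 * (z_alpha + z_beta) ** 2 * std ** 2 / delta ** 2)


# Hypothetical metric parameters, chosen only to exercise the formula.
n = sample_size_per_group(mean=100.0, std=40.0, relative_effect=1.05)
n_larger_effect = sample_size_per_group(mean=100.0, std=40.0, relative_effect=1.10)
```

The required size grows with the variance of the metric and shrinks with the square of the effect, so doubling the target effect cuts the group size roughly fourfold.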

### Now let's split the objects into groups of the designed size, stratifying by ``gender`` and ``platform`` and using the number of ``sessions`` as the object proximity metric

In [11]:
splitter = Splitter(dataframe=df, strat_columns=['gender', 'platform'], fit_columns=['sessions'])

In [12]:
to_exp = splitter.run(groups_size=900, method='metric')

In [13]:
to_exp

Unnamed: 0,id,watched,sessions,gender,platform,group
8,8,403.033966,9,Female,android,A
1160,1160,749.689241,2,Female,android,A
353,353,569.724941,9,Female,android,A
277,277,545.971527,3,Female,android,A
1136,1136,313.996511,8,Female,android,A
...,...,...,...,...,...,...
3360,3360,897.219020,5,Male,ios,A
4859,4859,709.632395,5,Male,web,A
2687,2687,359.583783,7,Female,ios,B
4877,4877,752.709347,16,Male,android,B
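The idea behind a stratified, proximity-based split can be sketched in plain Python. This is a simplified stand-in and not the algorithm behind `method='metric'`: the function `stratified_metric_split`, its greedy nearest-neighbour matching, and the toy rows are all hypothetical. Within each stratum it draws objects for group A at random and pairs each with the closest remaining object (by the fit column) for group B.

```python
import random
from collections import defaultdict


def stratified_metric_split(rows, strat_keys, fit_key, group_size, seed=0):
    """Greedy sketch: stratify first, then match each A object to its nearest B."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[tuple(row[k] for k in strat_keys)].append(row)

    total = len(rows)
    group_a, group_b = [], []
    for members in strata.values():
        # Proportional allocation, capped at half the stratum so that every
        # A object can be matched. Rounding may shift the total slightly.
        quota = min(round(group_size * len(members) / total), len(members) // 2)
        rng.shuffle(members)
        chosen, pool = members[:quota], members[quota:]
        for a in chosen:
            match = min(pool, key=lambda r: abs(r[fit_key] - a[fit_key]))
            pool.remove(match)
            group_a.append({**a, "group": "A"})
            group_b.append({**match, "group": "B"})
    return group_a + group_b


# Toy population, invented for illustration: 6 strata with 10 objects each.
rows = [
    {
        "id": i,
        "gender": "Male" if i % 2 else "Female",
        "platform": ["android", "ios", "web"][i % 3],
        "sessions": i % 7,
    }
    for i in range(60)
]

split = stratified_metric_split(rows, ["gender", "platform"], "sessions", group_size=12)
```

Because pairs are formed inside a stratum, both groups end up with the same mix of `gender` and `platform` values, and paired objects have similar `sessions` counts.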


## After some time, the experiment is complete and the final data has been received

In [14]:
table_result = pd.read_csv('../tests/test_data/watch_result.csv')
table_result

Unnamed: 0,id,watched,group,day
0,1708,349.581133,A,1
1,24,124.224169,A,1
2,1692,14.812922,A,1
3,185,179.607284,A,1
4,205,349.539016,A,1
...,...,...,...,...
12595,4274,15.077662,B,7
12596,1504,7.062741,B,7
12597,2911,421.190194,B,7
12598,3531,207.565094,B,7


### Aggregate data again

In [15]:
transformer = AggregatePreprocessor(real_method='sum')

In [16]:
to_test = transformer.fit_transform(table_result, groupby_columns='id', real_cols='watched', categorial_cols='group')
to_test

Unnamed: 0,id,watched,group
0,6,597.833362,A
1,11,549.314234,A
2,20,564.401942,A
3,21,248.735358,A
4,23,926.048946,B
...,...,...,...
1795,4987,454.662125,A
1796,4988,404.600192,B
1797,4997,594.629770,B
1798,4998,1025.918249,B


### Get an estimate of the experiment results

In [17]:
tester = Tester(dataframe=to_test, metrics='watched', column_groups='group')

In [18]:
tester.run(effect_type='relative', method='theory')

Unnamed: 0,first_type_error,pvalue,effect,confidence_interval,metric name,group A label,group B label
0,0.05,4e-05,0.079901,"(0.0419, 0.1183)",watched,A,B


**We have a statistically significant result (p-value 4e-05), with a point estimate of the relative effect of about 8%.**
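The three quantities in the result table (relative effect, p-value, confidence interval) can be reproduced in spirit with a large-sample normal approximation. The sketch below is an assumption about the general approach, not a description of what `method='theory'` does internally: the helper `relative_effect_test` is hypothetical, it uses a z-test on the difference of means and a delta-method interval for the ratio of means, and the two samples are invented.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def relative_effect_test(group_a, group_b, alpha=0.05):
    """Return (relative effect, two-sided p-value, confidence interval)."""
    mean_a, mean_b = fmean(group_a), fmean(group_b)
    var_mean_a = variance(group_a) / len(group_a)  # variance of the mean of A
    var_mean_b = variance(group_b) / len(group_b)  # variance of the mean of B

    # Two-sided z-test on the absolute difference of means.
    z = (mean_b - mean_a) / sqrt(var_mean_a + var_mean_b)
    pvalue = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

    # Relative effect and its delta-method standard error for independent groups.
    effect = mean_b / mean_a - 1.0
    se = sqrt(var_mean_b / mean_a ** 2 + mean_b ** 2 * var_mean_a / mean_a ** 4)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return effect, pvalue, (effect - z_crit * se, effect + z_crit * se)


# Invented samples: group B is exactly 10% higher than group A on average.
group_a = [400.0, 500.0, 600.0] * 300
group_b = [440.0, 550.0, 660.0] * 300

effect, pvalue, interval = relative_effect_test(group_a, group_b)
```

A result is read the same way as the table above: the effect is significant at the chosen `alpha` when the p-value is below it, which coincides with the confidence interval for the relative effect excluding zero in most practical cases.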