# *Ambrosia* in action. Building a simple A/B pipeline on synthetic data

In this example, we build a short but complete experimental pipeline from several *Ambrosia* components. The input is one week of synthetically generated data on daily content views by users.

The tutorial should help you build a general understanding of A/B pipelines and of the tools that *Ambrosia* provides. \
We will not discuss the choice of hypothesis, criteria, or the logic behind certain parameter values.

In [1]:
import sys, os
sys.path.insert(1, os.path.realpath(os.path.pardir))

In [2]:
import pandas as pd

from ambrosia.preprocessing import AggregatePreprocessor
from ambrosia.designer import Designer
from ambrosia.splitter import Splitter
from ambrosia.tester import Tester



Load data

In [3]:
dataframe = pd.read_csv('../tests/test_data/week_metrics.csv')
dataframe.head()

Unnamed: 0,id,gender,watched,sessions,day,platform
0,0,Male,28.440846,4,1,android
1,1,Female,1.825271,2,1,ios
2,2,Female,46.995606,0,1,web
3,3,Female,37.310264,1,1,ios
4,4,Female,147.513105,0,1,web


## Aggregate data

We would like to run a fixed-horizon A/B test on users' weekly metrics, and to design the experiment we have one week of historical data. *(For a real experiment, one week of historical data would not be enough.)*

First, we aggregate the metrics by user so that each row describes one user over the whole week and the rows become independent of each other.
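
To make the idea concrete, here is a minimal sketch of per-user aggregation in plain pandas. It is not how `AggregatePreprocessor` is implemented; the toy data and the choice of `first` for the categorical column are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical toy data: two users observed on two days each.
daily = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "day": [1, 2, 1, 2],
    "watched": [10.0, 5.0, 3.0, 4.0],
    "sessions": [2, 4, 1, 0],
    "platform": ["ios", "ios", "web", "web"],
})

# One row per user: sum the real-valued metric, take the max of sessions,
# and keep the first observed platform value.
weekly = (
    daily.groupby("id", as_index=False)
         .agg(watched=("watched", "sum"),
              sessions=("sessions", "max"),
              platform=("platform", "first"))
)
print(weekly)
```

After this step each `id` appears exactly once, which is the row independence that the later steps rely on.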

In [5]:
transformer = AggregatePreprocessor()

In [6]:
df = transformer.fit_transform(dataframe, groupby_columns='id', agg_params={
    'watched' : 'sum',
    'sessions' : 'max',
    'gender' : 'simple',
    'platform' : 'mode'
})

In [7]:
df.head()

Unnamed: 0,id,watched,sessions,gender,platform
0,0,772.597224,4,Male,ios
1,1,538.076739,6,Female,android
2,2,288.492353,7,Female,android
3,3,373.620408,3,Female,ios
4,4,630.238862,8,Female,ios


## Design A/B test parameters

Let's design the experiment. Suppose we want to detect a **5% effect** on the ``watched`` metric with the standard type I and type II error rates (0.05 and 0.2). \
How many users should be in each experimental group for that scenario?

We will use the theoretical approach to calculate the parameters, and once the experiment is over we will apply the two-sample independent t-test as the statistical criterion.
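
As a rough illustration of what a theoretical size calculation involves, the sketch below applies the textbook normal-approximation formula for comparing two means. It is an assumption that this resembles what the library computes; the function name `sample_size_per_group` and the input values are hypothetical, and the effect is passed as a fraction (0.05) rather than as a multiplier (1.05).

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(mean, std, relative_effect, alpha=0.05, beta=0.2):
    """Approximate per-group size for a two-sided, two-sample test of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # quantile for the type I error rate
    z_beta = z.inv_cdf(1 - beta)        # quantile for the type II error rate
    delta = mean * relative_effect      # absolute difference to detect
    return ceil(2 * (z_alpha + z_beta) ** 2 * std ** 2 / delta ** 2)


# Hypothetical metric with mean 100 and standard deviation 30.
n = sample_size_per_group(mean=100.0, std=30.0, relative_effect=0.05)
print(n)
```

The required size grows with the metric's variance and shrinks with the square of the effect, so halving the detectable effect roughly quadruples the group size.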

In [8]:
designer = Designer(dataframe=df, metrics='watched')

In [9]:
designer.run('size', method='theory', effects=1.05)

effects,errors (0.05; 0.2)
5.0%,894


**For our experiment, about 900 objects in each experimental group are sufficient.**

## Split groups

In our business scenario, we don't need a real-time splitting system, so a batch group split is enough. We will treat the same data frame as a complete database containing unique object IDs and some useful data.

Let's split off groups of the calculated size, stratifying by the ``gender`` and ``platform`` variables. We will use the hash split approach to get a deterministic result.
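
The sketch below shows why a hash-based split is deterministic: the group depends only on the object ID and the salt. It is a simplified illustration, not the `Splitter` implementation; the helper `assign_group` is hypothetical, and stratification and group-size limits are deliberately left out.

```python
import hashlib


def assign_group(object_id, salt, groups=("A", "B")):
    """Map an ID to a group deterministically via a salted hash."""
    digest = hashlib.sha256(f"{salt}:{object_id}".encode("utf-8")).hexdigest()
    return groups[int(digest, 16) % len(groups)]


# The same ID and salt always give the same group; a new salt reshuffles.
assignment = {uid: assign_group(uid, salt="exp_322") for uid in range(10)}
print(assignment)
```

Using a separate salt per experiment keeps assignments reproducible within an experiment while avoiding the same users always landing in the same group across experiments.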

In [48]:
splitter = Splitter(dataframe=df,
                    strat_columns=['gender', 'platform'],
                    fit_columns=['sessions'])

In [54]:
splitted_groups = splitter.run(groups_size=900, method='hash', salt='exp_322')

In [55]:
splitted_groups

Unnamed: 0,id,watched,sessions,gender,platform,group
1,1,538.076739,6,Female,android,A
6,6,516.444015,10,Female,android,A
10,10,678.150205,3,Female,android,A
31,31,638.889779,11,Female,android,A
49,49,441.192430,5,Female,android,A
...,...,...,...,...,...,...
378,378,1217.191864,5,Male,android,A
258,258,1356.446101,3,Female,ios,A
1973,1973,662.959150,8,Female,android,B
4324,4324,610.512075,5,Male,web,B


Objects with these identifiers will fall into the corresponding groups. Let's wait for the end of the experiment and look at the result.

## Result measurement

The experiment has ended, and we have received one week of daily metrics for both groups. \
Let's aggregate the data to weekly values and check for statistically significant changes.

In [56]:
experiment_result = pd.read_csv('../tests/test_data/watch_result.csv')
experiment_result.head()

Unnamed: 0,id,watched,group,day
0,1708,349.581133,A,1
1,24,124.224169,A,1
2,1692,14.812922,A,1
3,185,179.607284,A,1
4,205,349.539016,A,1


Aggregate

In [58]:
transformer = AggregatePreprocessor(real_method='sum')

In [59]:
df_to_test = transformer.fit_transform(dataframe=experiment_result,
                                       groupby_columns='id',
                                       real_cols='watched',
                                       categorial_cols='group')

In [60]:
df_to_test

Unnamed: 0,id,watched,group
0,6,597.833362,A
1,11,549.314234,A
2,20,564.401942,A
3,21,248.735358,A
4,23,926.048946,B
...,...,...,...
1795,4987,454.662125,A
1796,4988,404.600192,B
1797,4997,594.629770,B
1798,4998,1025.918249,B


Evaluate the result and calculate the relative effect with the corresponding confidence interval (CI)
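
For intuition about what such an evaluation produces, here is a minimal large-sample sketch that returns a p-value, a relative effect, and a confidence interval. It uses a normal approximation and the delta method for the ratio of means; whether the library computes its interval this way is an assumption, and the function `relative_effect_test` and the toy samples are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def relative_effect_test(a, b, alpha=0.05):
    """Large-sample test of mean(b) vs mean(a) with a delta-method CI."""
    n_a, n_b = len(a), len(b)
    m_a, m_b = mean(a), mean(b)
    v_a, v_b = variance(a) / n_a, variance(b) / n_b  # variances of the means
    norm = NormalDist()

    # Two-sided p-value for the absolute difference (normal approximation).
    z_stat = (m_b - m_a) / sqrt(v_a + v_b)
    pvalue = 2 * (1 - norm.cdf(abs(z_stat)))

    # Relative effect and its delta-method standard error.
    effect = m_b / m_a - 1
    se = (m_b / m_a) * sqrt(v_b / m_b ** 2 + v_a / m_a ** 2)
    half_width = norm.inv_cdf(1 - alpha / 2) * se
    return pvalue, effect, (effect - half_width, effect + half_width)


# Hypothetical toy samples, not the tutorial data.
group_a = [10.0, 12.0, 9.0, 11.0, 10.5, 9.5]
group_b = [11.0, 13.5, 10.0, 12.0, 11.5, 10.5]
pvalue, effect, ci = relative_effect_test(group_a, group_b)
print(pvalue, effect, ci)
```

With samples this small a t-distribution would be more appropriate; the normal approximation is reasonable only for group sizes like the roughly 900 users used in this tutorial.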

In [61]:
tester = Tester(dataframe=df_to_test, metrics='watched', column_groups='group')

In [62]:
tester.run(effect_type='relative', method='theory')

Unnamed: 0,first_type_error,pvalue,effect,confidence_interval,metric name,group A label,group B label
0,0.05,4e-05,0.079901,"(0.0419, 0.1183)",watched,A,B


**For the chosen type I error rate, we obtained a statistically significant result, with a point estimate of the effect of about 8%.**