## Analysis of training A/B test results

### Table of contents
1. [Description of dataset](#Description)
2. [Main parameters](#parameters)
3. [Data cleaning](#cleaning)
4. [Intermediate conclusion](#conclusion1)
5. [Checking for normality](#normality)
6. [Summary of calculations](#summary)
7. [Conclusion and recommendations](#recommendations)

### Description of dataset <a name="Description"></a>
An A/B test was run on a web page.<br>
The goal was to increase revenue from users.<br>
The attached file contains the raw dataset produced by this test:<br>
USER_ID - unique identifier of each user;<br>
VARIANT_NAME - test group, either "control" or "variant";<br>
REVENUE - total income from that user.<br>

Please analyse the data and give your recommendations to the manager.

In [2]:
import scipy.stats as stats
import pandas as pd

import warnings
# Silence library warnings to keep the notebook output short.
warnings.filterwarnings('ignore')

In [3]:
data = pd.read_csv('./AB_Test_Results.csv', sep=';')
data.head(10)

Unnamed: 0,USER_ID,VARIANT_NAME,REVENUE
0,737,variant,0
1,2423,control,0
2,9411,control,0
3,7311,control,0
4,6174,variant,0
5,2380,variant,0
6,2849,control,0
7,9168,control,0
8,6205,variant,0
9,7548,control,0


Let's check the **main parameters**. <a name ="parameters"></a>

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   USER_ID       10000 non-null  int64 
 1   VARIANT_NAME  10000 non-null  object
 2    REVENUE      10000 non-null  object
dtypes: int64(1), object(2)
memory usage: 234.5+ KB


The REVENUE column name appears to contain extra spaces, and for some reason its dtype is 'object'.<br>
Let's check the name and correct it if needed. Let's also change the dtype from 'object' to 'float'.

In [5]:
data.columns

Index(['USER_ID', 'VARIANT_NAME', ' REVENUE '], dtype='object')

In [6]:
data = data.rename(columns={' REVENUE ': 'REVENUE'})
data.columns

Index(['USER_ID', 'VARIANT_NAME', 'REVENUE'], dtype='object')

In [7]:
data.REVENUE = data.REVENUE.str.replace(',', '.')
data.REVENUE = data.REVENUE.astype(float)
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   USER_ID       10000 non-null  int64  
 1   VARIANT_NAME  10000 non-null  object 
 2   REVENUE       10000 non-null  float64
dtypes: float64(1), int64(1), object(1)
memory usage: 234.5+ KB
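
The renaming and type conversion above can also be done at load time. The following is a minimal sketch, assuming the file uses `;` as the separator and a decimal comma; the inline sample `raw` is made up for the example and only stands in for `AB_Test_Results.csv`.

```python
import io
import pandas as pd

# Made-up three-row sample mimicking the file layout (an assumption, not real data).
raw = "USER_ID;VARIANT_NAME; REVENUE \n737;variant;0\n56;variant;2,99\n2423;control;0\n"

# decimal=',' lets pandas parse "2,99" as the float 2.99 while reading.
df = pd.read_csv(io.StringIO(raw), sep=';', decimal=',')

# Strip stray whitespace from every column name in one step.
df.columns = df.columns.str.strip()

print(df.dtypes)
```

Stripping all column names at once avoids having to know in advance which header carries the extra spaces.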


In [8]:
data.describe()

Unnamed: 0,USER_ID,REVENUE
count,10000.0,10000.0
mean,4981.0802,0.099447
std,2890.590115,2.318529
min,2.0,0.0
25%,2468.75,0.0
50%,4962.0,0.0
75%,7511.5,0.0
max,10000.0,196.01


In [9]:
v = data.groupby('USER_ID', as_index=False).agg({'VARIANT_NAME': pd.Series.nunique})
v.sample(15)

Unnamed: 0,USER_ID,VARIANT_NAME
5264,8341,1
2184,3423,1
151,244,1
2561,4001,2
3642,5746,2
3262,5123,1
1039,1616,1
6062,9578,1
3151,4951,1
5759,9113,2


<a name = "cleaning"></a>
As we can see, some USER_ID values are linked to two different groups (a value of 2 in the VARIANT_NAME column).<br> That means that these users got into the table more than once, under both "control" and "variant".<br> Such users cannot be attributed to a single group.<br> To make the data suitable for analysis, I remove every user who appears in both groups.

In [10]:
more_than_one_types = v.query('VARIANT_NAME > 1')

In [11]:
data_new = data[~data.USER_ID.isin(more_than_one_types.USER_ID)].sort_values('USER_ID')

In [12]:
data_new.shape

(6070, 3)

In [13]:
data.shape

(10000, 3)
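
The filtering steps above can be sketched more compactly with a grouped `transform`. The toy frame below is invented for illustration. The last step, summing revenue per user, is an assumption about how repeated rows inside one group should be treated; it is not something the original analysis does.

```python
import pandas as pd

# Invented toy data: user 1 is in both groups, user 3 has two rows in one group.
toy = pd.DataFrame({
    "USER_ID": [1, 1, 2, 3, 3, 4],
    "VARIANT_NAME": ["control", "variant", "control", "variant", "variant", "control"],
    "REVENUE": [0.0, 1.5, 0.0, 0.0, 2.0, 0.0],
})

# For every row, the number of distinct groups its user was seen in.
groups_per_user = toy.groupby("USER_ID")["VARIANT_NAME"].transform("nunique")

# Keep only rows of users that belong to exactly one group.
clean = toy[groups_per_user == 1]
removed_share = 1 - len(clean) / len(toy)

# Optional (assumption): collapse repeated rows into one revenue total per user.
per_user = clean.groupby(["USER_ID", "VARIANT_NAME"], as_index=False)["REVENUE"].sum()

print(f"removed share of rows: {removed_share:.1%}")
print(per_user)
```

Aggregating to one row per user would make each observation in a later test correspond to one user instead of one log row.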

### Intermediate conclusion <a name = "conclusion1"></a>
Almost 40% of the data has been removed from the dataset.<br> There were 10 000 rows, and now only 6070 remain.<br> This suggests that the split into groups was carried out incorrectly. I would recommend running an A/A test to find out what is wrong and fix it first, and then designing another A/B test for this web page.<br> However, this is a training dataset, so I am going to analyse the remaining 60% of the data, draw conclusions and give some recommendations.

In [14]:
data_new.sample(10)

Unnamed: 0,USER_ID,VARIANT_NAME,REVENUE
7949,56,variant,2.99
342,4918,control,0.0
6030,9925,control,0.0
1845,6239,control,0.0
842,5748,variant,0.0
3300,2663,variant,0.0
1446,1164,control,0.0
6840,7195,control,0.0
4518,3391,variant,0.0
4748,6114,variant,0.0


In [15]:
data_new.VARIANT_NAME.value_counts()

variant    3044
control    3026
Name: VARIANT_NAME, dtype: int64
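
One way to check whether the two counts above are consistent with an even split is a chi-square goodness-of-fit test. This is only a sketch: it assumes the experiment was designed as a 50/50 split, and it counts rows rather than unique users.

```python
from scipy import stats

# Row counts per group, taken from the value_counts() output above.
observed = [3044, 3026]

# With no f_exp given, chisquare tests against equal expected counts (a 50/50 split).
chi2, p_value = stats.chisquare(observed)

print(f"chi2 = {chi2:.3f}, p-value = {p_value:.3f}")
```

A large p-value here would only say that the remaining rows are balanced; it says nothing about why so many users ended up in both groups.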

Each group now holds about 50% of the data.<br>
Let's check whether the revenue **is normally distributed or not**. <a name = "normality"></a>

In [16]:
alpha = 0.05

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed.
st = stats.shapiro(data_new.REVENUE)

print('Distribution is {}normal\n'.format('not ' if st[1] < alpha else ''))

Distribution is not normal
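
The SciPy documentation notes that for samples larger than 5000 the Shapiro-Wilk p-value may be inaccurate, and the sample here has 6070 rows. As a complementary check, the sketch below looks at the share of zeros and the skewness. It runs on synthetic data whose shape (about 98% zeros) is an assumption chosen to resemble REVENUE, not the real column.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for REVENUE: mostly zeros plus a few positive payments (assumed shape).
n = 6000
revenue = np.where(rng.random(n) < 0.02, rng.exponential(5.0, n), 0.0)

# A normal distribution cannot put most of its mass on one value or be strongly skewed.
zero_share = float((revenue == 0).mean())
skewness = float(stats.skew(revenue))

print(f"share of zeros: {zero_share:.3f}, skewness: {skewness:.2f}")
```

On the real data the same two numbers can be computed from `data_new.REVENUE` directly.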



Because the distribution is not normal, let's use the Mann-Whitney test.

In [17]:
control = data_new.query('VARIANT_NAME == "control"')
test = data_new.query('VARIANT_NAME == "variant"')

In [18]:
stats.mannwhitneyu(x=control['REVENUE'].values, y=test['REVENUE'].values)

MannwhitneyuResult(statistic=4622832.0, pvalue=0.2444173738649208)
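
The sketch below shows how such a result can be read against a significance level. It uses synthetic data: the helper `fake_revenue` and its parameters are hypothetical and exist only for this illustration. The `alternative="two-sided"` argument is spelled out so the hypothesis being tested is explicit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def fake_revenue(n, pay_rate, mean_payment):
    """Hypothetical generator: mostly zeros, a few positive payments."""
    pays = rng.random(n) < pay_rate
    return np.where(pays, rng.exponential(mean_payment, n), 0.0)

control_rev = fake_revenue(3000, 0.018, 8.0)
variant_rev = fake_revenue(3000, 0.014, 4.0)

alpha = 0.05
# H0: a value drawn from one group is equally likely to be larger or smaller
# than a value drawn from the other group.
res = stats.mannwhitneyu(control_rev, variant_rev, alternative="two-sided")

if res.pvalue < alpha:
    verdict = "reject H0: the groups differ significantly"
else:
    verdict = "fail to reject H0: no significant difference detected"

print(f"p-value = {res.pvalue:.4f} -> {verdict}")
```

Failing to reject the null hypothesis is not proof that the groups are identical; it only means the test did not detect a difference at the chosen alpha.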

In [19]:
control_pays = control[control['REVENUE'] > 0]
test_pays = test[test['REVENUE'] > 0]

number_of_con_pay = control_pays.shape[0]
number_of_test_pay = test_pays.shape[0]

print(f'Number of paying users in control group is {number_of_con_pay}')
print(f'Number of paying users in variant group is {number_of_test_pay}')

Number of paying users in control group is 54
Number of paying users in variant group is 43


In [20]:
test_pays_sum = test_pays.REVENUE.sum()
control_pays_sum = control_pays.REVENUE.sum()
print(f'Total revenue from users in control group is {round(control_pays_sum, 3)}')
print(f'Total revenue from users in variant group is {round(test_pays_sum, 3)}')

Total revenue from users in control group is 470.56
Total revenue from users in variant group is 179.32


In [21]:
arppu_test = test_pays.REVENUE.sum() / number_of_test_pay
arppu_control = control_pays.REVENUE.sum() / number_of_con_pay
print(f'ARPPU in control group is {round(arppu_control, 3)}')
print(f'ARPPU in variant group is {round(arppu_test, 3)}')

ARPPU in control group is 8.714
ARPPU in variant group is 4.17


In [22]:
number_of_test = test.shape[0]
number_of_control = control.shape[0]

arpu_test = test_pays.REVENUE.sum() / number_of_test
arpu_control = control_pays.REVENUE.sum() / number_of_control

print(f'ARPU in control group is {round(arpu_control, 3)}')
print(f'ARPU in variant group is {round(arpu_test, 3)}')

ARPU in control group is 0.156
ARPU in variant group is 0.059


### Summary of calculations <a name = "summary"></a>

In [25]:
d = {'Number of paying users': [number_of_test_pay, number_of_con_pay, round(number_of_con_pay/number_of_test_pay, 1) ], 
     'Total revenue': [test_pays_sum, control_pays_sum, round(control_pays_sum/test_pays_sum, 3)],
     'ARPU': [arpu_test, arpu_control, round(arpu_control/arpu_test, 3)],
     'ARPPU': [arppu_test, arppu_control, round(arppu_control/arppu_test, 3)]
    }
# The third row holds control / variant ratios, not absolute differences.
index = ['variant', 'control', 'ratio (control / variant)']
result = pd.DataFrame(data=d, index=index)
result

Unnamed: 0,Number of paying users,Total revenue,ARPU,ARPPU
variant,43.0,179.32,0.058909,4.170233
control,54.0,470.56,0.155506,8.714074
ratio (control / variant),1.3,2.624,2.64,2.09
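
The per-group metrics computed step by step above can also be derived in a single grouped aggregation. The sketch below uses an invented toy frame; here ARPU is total revenue divided by all users in the group, and ARPPU is total revenue divided by paying users only.

```python
import pandas as pd

# Invented toy data: four users per group.
toy = pd.DataFrame({
    "VARIANT_NAME": ["control"] * 4 + ["variant"] * 4,
    "REVENUE": [0.0, 0.0, 4.0, 6.0, 0.0, 0.0, 0.0, 2.0],
})

summary = toy.groupby("VARIANT_NAME").agg(
    users=("REVENUE", "size"),
    paying_users=("REVENUE", lambda s: (s > 0).sum()),
    total_revenue=("REVENUE", "sum"),
)

# ARPU: average revenue per user; ARPPU: average revenue per paying user.
summary["ARPU"] = summary["total_revenue"] / summary["users"]
summary["ARPPU"] = summary["total_revenue"] / summary["paying_users"]

print(summary)
```

Keeping all metrics in one frame makes it harder for the numerators and denominators of the two groups to get mixed up.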


### Conclusion and recommendations <a name = "recommendations"></a>
According to the Mann-Whitney test, the p-value is about 0.24, which is greater than alpha (0.05). This means that we cannot reject the null hypothesis: the test found no statistically significant difference between the control and variant revenue distributions.<br>
At the same time, all key metrics of the variant group are lower than those of the control group.<br>
The goal of the changes to this web page was to increase revenue, and the data gives no evidence that this goal was reached.<br>
In addition, I have to mention that I removed almost 40% of the rows from the raw dataset because their users appeared in both groups, and I suspect that the assignment of users to groups was incorrect.<br>
Based on this analysis, I recommend going back to the beginning and:
1. Plan and execute an A/A test to identify mistakes in the separation mechanism
2. Analyse and choose another hypothesis
3. Design and execute another A/B test of this hypothesis
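
One common way to make sure a user can never land in both groups is deterministic, hash-based assignment. The sketch below is purely illustrative: the function `assign_group` and the salt value are hypothetical and say nothing about how the original experiment was actually implemented.

```python
import hashlib

def assign_group(user_id: int, salt: str = "experiment-1") -> str:
    """Hypothetical assignment rule: the same user_id always maps to the same group."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "control" if int(digest, 16) % 2 == 0 else "variant"

# Simulate an A/A-style sanity check on 10 000 user ids.
groups = [assign_group(u) for u in range(1, 10001)]
share_control = groups.count("control") / len(groups)

print(f"share of users in control: {share_control:.3f}")
print("user 737 is stable:", assign_group(737) == assign_group(737))
```

Because the group depends only on the user id and the salt, repeated visits by the same user cannot produce rows in both groups.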

