# A/B testing for a made-up project: Kittengram (365 Data Science project)

*Kittengram* is an analog of Instagram where users post photos of their cats (a machine algorithm checks that only cats are posted, and a small group of researchers handles the complex cases).

**The main business task** facing the project and requiring an A/B test

Kittengram received an offer from advertisers with a relevant product (cat food). The advertising involves affiliate program royalties. 

**The main purpose of the test**
is to compare the control group and the test group on interest in this type of advertising and conversion from it. The main metrics of interest are listed below.


---

**Success metrics** - CTR + affiliate revenue, number of users making purchases, number of customers

**Guardrail metric** - stable DAU


**A hypothesis**
Users see more relevant ads -> they are more likely to click -> they are more likely to purchase -> higher user satisfaction -> higher affiliate revenue

**H1** - With the new type of ads, users are more likely to click on ads, and CTR per user will increase

**H0** - The new type of ads will have no effect on user engagement and will not affect CTR

**Minimum Detectable Effect** - a difference of at least 91 DAU between the test and control groups (0.33%) and a CTR difference of at least 1.8 pp
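
The CTR part of the MDE can be turned into a rough sample-size check. The sketch below is not part of the original analysis: it assumes a two-sided test at alpha = 0.05 with 80% power (neither value is stated in the project brief) and a per-row CTR standard deviation of about 1.73, which is the value measured in the CTR section of this notebook. `sample_size_per_group` is a hypothetical helper name.

```python
import math
from scipy.stats import norm


def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    comparison of two means with a common standard deviation."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the significance level
    z_beta = norm.ppf(power)           # critical value for the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# sigma: CTR standard deviation seen later in the notebook (assumed to hold in advance)
# delta: minimum detectable effect for CTR, in percentage points
n_ctr = sample_size_per_group(sigma=1.73, delta=1.8)
print(n_ctr)
```

Under these assumptions the required number of observations per group is very small compared with the roughly 30,000 users assigned to each group, so the test should be well powered for the CTR metric.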

# A/B test analysis

## Assignments

In [None]:
#Import libraries

import pandas as pd
import numpy as np
import altair as alt
alt.data_transformers.disable_max_rows()
from datetime import datetime
from scipy.stats import ttest_ind

In [None]:
#Import dataset

data = pd.read_csv("data/assignments.csv")

In [None]:
data.head()

Unnamed: 0,userid,ts,groupid
0,c5d77c89-33a3-4fe3-9e31-179dec09d49c,2021-11-02T07:31:42Z,0
1,9061d751-7a94-44d3-8792-5ca5ec59aa89,2021-11-13T07:43:51Z,0
2,a5b70ae7-f07c-4773-9df4-ce112bc9dc48,2021-11-20T19:26:07Z,0
3,d2646662-269f-49de-aab1-8776afced9a3,2021-11-20T11:09:02Z,0
4,2d9b23b7-4e5e-4162-9f0f-49e593fdd2b5,2021-11-04T07:42:07Z,0


In [None]:
#Check the timestamp format: parse the first row and print it as a date string

print(datetime.strptime(data.head(1)['ts'][0], '%Y-%m-%dT%H:%M:%SZ').strftime("%Y-%m-%d"))

In [None]:
#New column with the date

data['dt'] = data['ts'].map(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%SZ').strftime("%Y-%m-%d"))
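
The same date column can also be built without a row-by-row `lambda`. A minimal sketch, assuming the `ts` strings always follow the format shown in `data.head()`; the two-row `sample` frame is a stand-in built from the first two timestamps above, not the real dataset.

```python
import pandas as pd

# Stand-in frame using the first two timestamps printed by data.head()
sample = pd.DataFrame({"ts": ["2021-11-02T07:31:42Z", "2021-11-13T07:43:51Z"]})

# Parse the whole column at once, then format each value as YYYY-MM-DD
parsed = pd.to_datetime(sample["ts"], format="%Y-%m-%dT%H:%M:%SZ")
sample["dt"] = parsed.dt.strftime("%Y-%m-%d")

print(sample["dt"].tolist())
```

The vectorized version produces the same strings as the `map` call and is usually faster on large frames.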

In [None]:
data.head()

Unnamed: 0,userid,ts,groupid,dt
0,c5d77c89-33a3-4fe3-9e31-179dec09d49c,2021-11-02T07:31:42Z,0,2021-11-02
1,9061d751-7a94-44d3-8792-5ca5ec59aa89,2021-11-13T07:43:51Z,0,2021-11-13
2,a5b70ae7-f07c-4773-9df4-ce112bc9dc48,2021-11-20T19:26:07Z,0,2021-11-20
3,d2646662-269f-49de-aab1-8776afced9a3,2021-11-20T11:09:02Z,0,2021-11-20
4,2d9b23b7-4e5e-4162-9f0f-49e593fdd2b5,2021-11-04T07:42:07Z,0,2021-11-04


In [None]:
#The mean of groupid is close to 0.5, so users are split almost evenly between the two groups

data.describe()

Unnamed: 0,groupid
count,60000.0
mean,0.500817
std,0.500003
min,0.0
25%,0.0
50%,1.0
75%,1.0
max,1.0


In [None]:
#Actual number of users

data.groupby(['groupid']).count()

Unnamed: 0_level_0,userid,ts,dt
groupid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,29951,29951,29951
1,30049,30049,30049
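
Whether a 29,951 / 30,049 split is consistent with an intended 50/50 assignment can be checked with a chi-square goodness-of-fit test (often called a sample ratio mismatch check). This is an addition to the original analysis and assumes the intended split was equal, which the notebook implies but does not state.

```python
from scipy.stats import chisquare

# Observed group sizes taken from the groupby output above
observed = [29951, 30049]

# With no expected frequencies given, chisquare tests against an equal split
srm = chisquare(observed)
print(srm.statistic, srm.pvalue)

# A large p-value means the split is consistent with 50/50 assignment
print("possible sample ratio mismatch" if srm.pvalue < 0.05 else "split looks balanced")
```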


In [None]:
#How many users were assigned to each group on each day

data_count = data.groupby(['groupid','dt']).count().reset_index()

In [None]:
data_count.head()

Unnamed: 0,groupid,dt,userid,ts
0,0,2021-11-01,1497,1497
1,0,2021-11-02,1467,1467
2,0,2021-11-03,1532,1532
3,0,2021-11-04,1509,1509
4,0,2021-11-05,1503,1503


In [None]:
#What the assignments look like over time

alt.Chart(data_count).mark_line(size=3).encode(
    alt.X('dt'),
    alt.Y('userid'),
    color='groupid:O',
    tooltip=['userid']
).properties(
    width=600,
    height=400
)

#Uniformly distributed across all the dates of our test

## Pre-test metrics

### User activity

In [None]:
#Import a file with user activity

data_act = pd.read_csv("data/activity_all.csv")

In [None]:
data_act.head()

Unnamed: 0,userid,dt,groupid,activity_level
0,a5b70ae7-f07c-4773-9df4-ce112bc9dc48,2021-10-01,0,0
1,d2646662-269f-49de-aab1-8776afced9a3,2021-10-01,0,0
2,c4d1cfa8-283d-49ad-a894-90aedc39c798,2021-10-01,1,0
3,6889f87f-5356-4904-a35a-6ea5020011db,2021-10-01,0,0
4,dbee604c-474a-4c9d-b013-508e5a0e3059,2021-10-01,1,0


In [None]:
data_act.groupby(['groupid','dt']).describe()

#The first rows shown are the control group in early October (average activity level around 5 per day); the last rows are the test group in late November, during the test (around 10 per day)

Unnamed: 0_level_0,Unnamed: 1_level_0,activity_level,activity_level,activity_level,activity_level,activity_level,activity_level,activity_level,activity_level
Unnamed: 0_level_1,Unnamed: 1_level_1,count,mean,std,min,25%,50%,75%,max
groupid,dt,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2
0,2021-10-01,29951.0,5.241762,6.516640,0.0,0.0,1.0,10.0,20.0
0,2021-10-02,29951.0,5.255885,6.509838,0.0,0.0,1.0,10.0,20.0
0,2021-10-03,29951.0,5.266068,6.511458,0.0,0.0,1.0,10.0,20.0
0,2021-10-04,29951.0,5.212447,6.511711,0.0,0.0,1.0,10.0,20.0
0,2021-10-05,29951.0,5.177590,6.512791,0.0,0.0,1.0,10.0,20.0
...,...,...,...,...,...,...,...,...,...
1,2021-11-26,30049.0,10.031216,5.770582,0.0,5.0,10.0,15.0,20.0
1,2021-11-27,30049.0,10.026024,5.774141,0.0,5.0,10.0,15.0,20.0
1,2021-11-28,30049.0,9.975307,5.788257,0.0,5.0,10.0,15.0,20.0
1,2021-11-29,30049.0,9.970781,5.799546,0.0,5.0,10.0,15.0,20.0


In [None]:
#How many users per day had an activity level greater than zero

data_act.query('activity_level > 0').groupby(['dt', 'groupid']).count().reset_index().head()

Unnamed: 0,dt,groupid,userid,activity_level
0,2021-10-01,0,15337,15337
1,2021-10-01,1,15297,15297
2,2021-10-02,0,15354,15354
3,2021-10-02,1,15421,15421
4,2021-10-03,0,15423,15423


In [None]:
alt.Chart(data_act.query('activity_level > 0').groupby(['dt', 'groupid']).count().reset_index()).mark_line(size=3).encode(
    alt.X('dt'),
    alt.Y('userid'),
    color='groupid:O',
    tooltip=['userid']
).properties(
    width=600,
    height=400
)

In [None]:
#Control group

(
    data_act.query('activity_level > 0 and groupid == 0 and dt >= "2021-11-01"')
    .groupby(['dt','groupid']).count().reset_index()[['groupid','activity_level']].describe()
)

Unnamed: 0,groupid,activity_level
count,30.0,30.0
mean,0.0,15782.0
std,0.0,371.077276
min,0.0,15163.0
25%,0.0,15335.0
50%,0.0,15990.5
75%,0.0,16045.0
max,0.0,16147.0


In [None]:
#Test group

(
    data_act.query('activity_level > 0 and groupid == 1 and dt >= "2021-11-01"')
    .groupby(['dt','groupid']).count().reset_index()[['groupid','activity_level']].describe()
)

Unnamed: 0,groupid,activity_level
count,30.0,30.0
mean,1.0,29302.433333
std,0.0,30.417422
min,1.0,29255.0
25%,1.0,29280.0
50%,1.0,29300.0
75%,1.0,29321.0
max,1.0,29382.0


In [None]:
#Average activity level per group after the start of the test

data_act.query('dt >= "2021-11-01"').groupby(['groupid']).describe()

Unnamed: 0_level_0,activity_level,activity_level,activity_level,activity_level,activity_level,activity_level,activity_level,activity_level
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max
groupid,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2
0,898530.0,5.402211,6.55557,0.0,0.0,1.0,11.0,20.0
1,901470.0,9.996304,5.78868,0.0,5.0,10.0,15.0,20.0


In [None]:
#Before the test

data_act.query('dt < "2021-11-01"').groupby('groupid').describe()

Unnamed: 0_level_0,activity_level,activity_level,activity_level,activity_level,activity_level,activity_level,activity_level,activity_level
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max
groupid,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2
0,928481.0,5.245635,6.521184,0.0,0.0,1.0,10.0,20.0
1,931519.0,5.240952,6.520811,0.0,0.0,1.0,10.0,20.0
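
The pre-test activity levels of the two groups look almost identical, and this can be confirmed directly from the summary statistics above without reloading the data. A sketch using `scipy.stats.ttest_ind_from_stats`; it is not part of the original notebook and treats each user-day row as an independent observation, which is an assumption.

```python
from scipy.stats import ttest_ind_from_stats

# Means, standard deviations and counts copied from the describe() output above
pre = ttest_ind_from_stats(
    mean1=5.245635, std1=6.521184, nobs1=928481,   # control group, before the test
    mean2=5.240952, std2=6.520811, nobs2=931519,   # test group, before the test
)
print(pre.statistic, pre.pvalue)
```

A p-value well above 0.05 here would indicate no detectable pre-test bias in activity level between the groups.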


In [None]:
data_act_count = data_act.query('activity_level > 0').groupby(['groupid','dt']).count().reset_index()

In [None]:
data_act_count.head()

Unnamed: 0,groupid,dt,userid,activity_level
0,0,2021-10-01,15337,15337
1,0,2021-10-02,15354,15354
2,0,2021-10-03,15423,15423
3,0,2021-10-04,15211,15211
4,0,2021-10-05,15126,15126


In [None]:
alt.Chart(data_act_count).mark_line(size=3).encode(
    alt.X('dt'),
    alt.Y('userid'),
    color='groupid:O',
    tooltip=['userid']
).properties(
    width=600,
    height=400
)

### Comparing the activity between the groups

By the activity levels

In [None]:
#Pass activity levels to the T test

data_act.query('groupid == 0')['activity_level'].to_numpy()

array([ 0,  0,  0, ..., 20, 20, 20])

In [None]:
#Print the result

res = ttest_ind(data_act.query('groupid == 0 and dt >= "2021-11-01"')['activity_level'].to_numpy(),
                data_act.query('groupid == 1 and dt >= "2021-11-01"')['activity_level'].to_numpy()).pvalue

print(res)

0.0


In [None]:
#The difference is highly statistically significant: the p-value is so small that it underflows to 0.0, so the change is very unlikely to be due to chance

"{:.100f}".format(res)

'0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'
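
A p-value that underflows to zero says the difference is real but not how large it is. One way to express the size is Cohen's d, computed here from the test-period summary statistics shown earlier. This is an added sketch, not the author's method; it pools the two variances with equal weights, which assumes the groups are about the same size (898,530 and 901,470 rows).

```python
import math

# Test-period activity level statistics from the describe() output above
mean_control, std_control = 5.402211, 6.55557
mean_test, std_test = 9.996304, 5.78868

# Equal-weight pooled standard deviation (assumes near-equal group sizes)
pooled_std = math.sqrt((std_control ** 2 + std_test ** 2) / 2)

# Standardized difference between the group means
cohens_d = (mean_test - mean_control) / pooled_std
print(round(cohens_d, 2))
```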

By the number of active users

In [None]:
before = data_act_count.query('dt < "2021-11-01"')

In [None]:
after = data_act_count.query('dt >= "2021-11-01"')

In [None]:
before.head()

Unnamed: 0,groupid,dt,userid,activity_level
0,0,2021-10-01,15337,15337
1,0,2021-10-02,15354,15354
2,0,2021-10-03,15423,15423
3,0,2021-10-04,15211,15211
4,0,2021-10-05,15126,15126


Checking for the pretest bias on activity.

In [None]:
np.mean(before.query('groupid == 0')['userid'].to_numpy())

15320.870967741936

In [None]:
np.mean(before.query('groupid == 1')['userid'].to_numpy())

15352.516129032258

In [None]:
#The p-value (about 0.16) is above 0.05, so there is no statistically significant difference between the groups before the test

res = ttest_ind(before.query('groupid == 0')['userid'].to_numpy(), before.query('groupid == 1')['userid']
                .to_numpy()).pvalue

print(res)

0.1630842353828083


In [None]:
"{:.100f}".format(res)

'0.1630842353828083068911780628695851191878318786621093750000000000000000000000000000000000000000000000'

In [None]:
np.mean(after.query('groupid == 0')['userid'].to_numpy())

15782.0

In [None]:
#User activity during test period

np.mean(after.query('groupid == 1')['userid'].to_numpy())

29302.433333333334

In [None]:
#P-value for the control group vs. the test group during the test period

res = ttest_ind(after.query('groupid == 0')['userid'].to_numpy(), after.query('groupid == 1')['userid']
                .to_numpy()).pvalue

print(res)

6.590603584107244e-84


In [None]:
#The test affected user activity regardless of how we measure it (activity level or number of active users)

"{:.100f}".format(res)

'0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000065906035841072442'
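
The four daily-active-user averages computed above can be combined into a single difference-in-differences figure, which removes the small drift seen in the control group. This is an added illustration using only the means printed in this section; comparing it with the 91 DAU threshold assumes that threshold refers to the same daily active user count.

```python
# Average number of active users per day, copied from the cells above
before_control, before_test = 15320.870967741936, 15352.516129032258
after_control, after_test = 15782.0, 29302.433333333334

# Change within each group from the pre-test period to the test period
change_control = after_control - before_control
change_test = after_test - before_test

# Difference-in-differences: change in test group beyond the control group's drift
did = change_test - change_control
mde_dau = 91  # minimum detectable effect stated at the top of the notebook

print(round(change_control, 1), round(change_test, 1), round(did, 1))
print(did > mde_dau)
```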

### Click through rate (CTR)

In [None]:
#Import file with ctr

data_ctr = pd.read_csv("data/ctr_all.csv")

In [None]:
data_ctr.head()

Unnamed: 0,userid,dt,groupid,ctr
0,60389fa7-2d71-4cdf-831c-c2bb277ffa1e,2021-11-13,0,31.81
1,b59cb225-d160-4851-92d2-7cc8120a2f63,2021-11-13,0,30.46
2,aa336050-934e-453f-a5b0-dd881fcd114e,2021-11-13,0,34.25
3,8df767f4-a10f-4322-a722-676b7e02b372,2021-11-13,0,34.92
4,a74762ed-4da0-42ab-91d2-40d7e808dfe9,2021-11-13,0,34.95


In [None]:
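#numeric_only=True skips the string userid column, which recent pandas versions refuse to average
data_ctr_avg = data_ctr.groupby(['groupid','dt']).mean(numeric_only=True).reset_index()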

In [None]:
#Average CTR for each date for each group

alt.Chart(data_ctr_avg).mark_line(size=5).encode(
    alt.X('dt'),
    alt.Y('ctr'),
    color='groupid:O',
    tooltip=['ctr']
).properties(
    width=600,
    height=400
)

In [None]:
before = data_ctr.query('dt < "2021-11-01"')[['groupid', 'ctr']]

In [None]:
after = data_ctr.query('dt >= "2021-11-01"')[['groupid', 'ctr']]

In [None]:
after

Unnamed: 0,groupid,ctr
0,0,31.81
1,0,30.46
2,0,34.25
3,0,34.92
4,0,34.95
...,...,...
2303403,1,37.27
2303404,1,39.14
2303405,1,40.05
2303406,1,38.14


In [None]:
before.query('groupid == 0')['ctr'].to_numpy().mean()

33.00091277553074

In [None]:
before.query('groupid == 1')['ctr'].to_numpy().mean()

32.99957172093258

In [None]:
after.query('groupid == 0')['ctr'].to_numpy().mean()

32.996977569382835

In [None]:
#About 5 percentage points difference between group 0 and group 1 after the start of the test

after.query('groupid == 1')['ctr'].to_numpy().mean()

37.99695912626142
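
Because CTR is already a percentage, it helps to separate the absolute lift (in percentage points) from the relative lift. A short sketch using the two test-period means printed above; the 1.8 pp threshold is the MDE stated at the top of the notebook.

```python
# Test-period mean CTR per group, copied from the cells above
ctr_control = 32.996977569382835
ctr_test = 37.99695912626142

abs_lift_pp = ctr_test - ctr_control   # absolute lift in percentage points
rel_lift = abs_lift_pp / ctr_control   # relative lift as a fraction of the control CTR
mde_pp = 1.8                           # minimum detectable effect for CTR

print(f"absolute lift: {abs_lift_pp:.2f} pp")
print(f"relative lift: {rel_lift:.1%}")
print(f"exceeds MDE: {abs_lift_pp > mde_pp}")
```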

In [None]:
before.query('groupid == 0')['ctr'].to_numpy().std()

1.7336979501682888

In [None]:
before.query('groupid == 1')['ctr'].to_numpy().std()

1.7296548367391134

In [None]:
after.query('groupid == 0')['ctr'].to_numpy().std()

1.7331985918552912

In [None]:
after.query('groupid == 1')['ctr'].to_numpy().std()

1.7323710606903675

In [None]:
#Compare group 0 to group 1 before the test: the difference is very small and not significant, so the two groups had very similar CTR

res = ttest_ind(before.query('groupid == 0')['ctr'].to_numpy(), before.query('groupid == 1')['ctr']
                .to_numpy()).pvalue

print(res)

0.705741417344299


In [None]:
res = ttest_ind(after.query('groupid == 0')['ctr'].to_numpy(), after.query('groupid == 1')['ctr']
                .to_numpy()).pvalue
print(res)

0.0


In [None]:
#After the start of the test the difference in CTR is highly significant: the observed increase of about 5 percentage points is well above the minimum detectable effect of 1.8 pp

"{:.100f}".format(res)

'0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'

Based on these results, the test was successful: CTR in the test group rose by about 5 percentage points (from roughly 33 to roughly 38), and user activity also increased substantially.