# A/B Test Challenge



---

#### What is an A/B Test? 

An A/B test is a research methodology for decision support that lets you measure the impact of a change in a product (e.g., a digital product). For this challenge you will analyse the data from an A/B test performed on a digital product in which a new set of sponsored ads was introduced.


#### Measure of success

Metrics are needed to measure the success of your product. They are typically split into the following categories:

- __Engagement-based metrics:__ number of users, number of downloads, number of active users, user retention, etc.

- __Revenue and monetization metrics:__ ads and affiliate links, subscription-based, in-app purchases, etc.

- __Technical metrics:__ service level indicators (uptime of the app, downtime of the app, latency).



---

## Metrics understanding

In this part you must analyse the metrics involved in the test. We will focus on the following metrics:

- Activity level + Daily active users (DAU).

- Click-through rate (CTR)

### Activity level

In the following part you must perform every calculation you consider necessary in order to answer the following questions:

- How many activity levels can you find in the dataset? (An activity level of zero means no activity.)

- How many users are there at each activity level?

- How many activity levels are there per day, and how many records does each activity level have?

At the end of this section you must provide your conclusions about the _activity level_ of the users.

__Dataset:__ `activity_pretest.csv`

In [1]:
# your-code
import pandas as pd
activity = pd.read_csv("./abtest/activity_pretest.csv")
activity


| Unnamed: 0 | userid | dt | activity_level |
|---|---|---|---|
| 0 | a5b70ae7-f07c-4773-9df4-ce112bc9dc48 | 2021-10-01 | 0 |
| 1 | d2646662-269f-49de-aab1-8776afced9a3 | 2021-10-01 | 0 |
| 2 | c4d1cfa8-283d-49ad-a894-90aedc39c798 | 2021-10-01 | 0 |
| 3 | 6889f87f-5356-4904-a35a-6ea5020011db | 2021-10-01 | 0 |
| 4 | dbee604c-474a-4c9d-b013-508e5a0e3059 | 2021-10-01 | 0 |
| ... | ... | ... | ... |
| 1859995 | 200d65e6-b1ce-4a47-8c2b-946db5c5a3a0 | 2021-10-31 | 20 |
| 1859996 | 535dafe4-de7c-4b56-acf6-aa94f21653bc | 2021-10-31 | 20 |
| 1859997 | 0428ca3c-e666-4ef4-8588-3a2af904a123 | 2021-10-31 | 20 |
| 1859998 | a8cd1579-44d4-48b3-b3d6-47ae5197dbc6 | 2021-10-31 | 20 |
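
The activity-level questions in this section can be approached with a few `groupby` calls. The sketch below is only an illustration: it runs on a tiny made-up frame that reuses the column names of `activity_pretest.csv` (the user ids and values are invented), so its numbers are not the challenge's results.

```python
import pandas as pd

# Toy data with the same columns as activity_pretest.csv (values are made up).
toy = pd.DataFrame({
    "userid": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "dt": ["2021-10-01"] * 3 + ["2021-10-02"] * 3,
    "activity_level": [0, 5, 5, 0, 0, 20],
})

# How many distinct activity levels exist (0 means no activity)?
n_levels = toy["activity_level"].nunique()

# How many distinct users were seen at each activity level?
users_per_level = toy.groupby("activity_level")["userid"].nunique()

# How many records does each activity level have on each day?
records_per_day_level = toy.groupby(["dt", "activity_level"]).size()

# How many distinct activity levels appear on each day?
levels_per_day = toy.groupby("dt")["activity_level"].nunique()

print(n_levels)
print(users_per_level)
print(records_per_day_level)
print(levels_per_day)
```

On the real dataset the same calls would be applied to the `activity` frame loaded above.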


### Daily active users (DAU)

![ab_test](./img/user_activity_ab_testing.JPG)


Daily active users (DAU) refers to the number of users that are active per day (an activity level of zero means no activity). You must calculate this metric and provide your insights about it.

__Dataset:__ `activity_pretest.csv`

In [2]:
total_users = activity["userid"].nunique()
total_users

60000

In [16]:
# your-code
active_users = activity.loc[(activity['activity_level'] > 0) & (activity["dt"] == "2021-10-03")].count()

active_users

userid            30785
dt                30785
activity_level    30785
dtype: int64

In [17]:
active_users_per_date = {}


for date in activity['dt'].unique():
  
    active_users_count = activity.loc[(activity['activity_level'] > 0) & (activity['dt'] == date)].count()
    active_users_per_date[date] = active_users_count

In [14]:
# Active users as a percentage of the total user base
DAU_per_day = (active_users / total_users) * 100
DAU_per_day

userid            51.056667
dt                51.056667
activity_level    51.056667
dtype: float64

In [18]:
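# active_users_count holds the counts for the last date processed by the loop above,
# so this is the activity ratio for that single date, not an average across days.
DAU_average = ((active_users_count/total_users)*100).mean()
DAU_average
# The activity ratio is about 50.9%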

50.865
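
One way to get DAU for every day at once, and then a true average across days, is to filter the active rows and count distinct users per date. This is a minimal sketch on made-up data that reuses the column names of `activity_pretest.csv`; the variable names `dau`, `dau_pct` and `dau_pct_mean` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy data with the same columns as activity_pretest.csv (values are made up).
toy = pd.DataFrame({
    "userid": ["u1", "u2", "u3", "u4", "u1", "u2", "u3", "u4"],
    "dt": ["2021-10-01"] * 4 + ["2021-10-02"] * 4,
    "activity_level": [0, 5, 10, 0, 3, 5, 20, 1],
})

total_users = toy["userid"].nunique()

# Keep only active rows, then count distinct users per day.
active = toy[toy["activity_level"] > 0]
dau = active.groupby("dt")["userid"].nunique()

# Share of the user base that is active each day, in percent.
dau_pct = dau / total_users * 100

# Average of the daily percentages across all days.
dau_pct_mean = dau_pct.mean()

print(dau)
print(dau_pct_mean)
```

Because `dau` is a Series indexed by date, it can be plotted or summarised directly without a manual loop.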

### Click-through rate (CTR)

![ab_test](./img/ad_click_through_rate_ab_testing.JPG)

Click-through rate (CTR) refers to the percentage of the ads shown to a user on a given day that the user clicked. You must analyse this metric (e.g., average CTR per day) and provide your insights about it.

__Dataset:__ `ctr_pretest.csv`

In [19]:
# your-code

ctr = pd.read_csv("./abtest/ctr_pretest.csv")
ctr



| Unnamed: 0 | userid | dt | ctr |
|---|---|---|---|
| 0 | 4b328144-df4b-47b1-a804-09834942dce0 | 2021-10-01 | 34.28 |
| 1 | 34ace777-5e9d-40b3-a859-4145d0c35c8d | 2021-10-01 | 34.67 |
| 2 | 8028cccf-19c3-4c0e-b5b2-e707e15d2d83 | 2021-10-01 | 34.77 |
| 3 | 652b3c9c-5e29-4bf0-9373-924687b1567e | 2021-10-01 | 35.42 |
| 4 | 45b57434-4666-4b57-9798-35489dc1092a | 2021-10-01 | 35.04 |
| ... | ... | ... | ... |
| 950870 | a09a3687-b71a-4a67-b1ef-9b05c9770c4c | 2021-10-31 | 32.33 |
| 950871 | c843a595-b94c-42e1-b2fe-ec096070681e | 2021-10-31 | 30.09 |
| 950872 | edcdf0c1-3d8f-47e8-b7dd-05505749eb69 | 2021-10-31 | 35.71 |
| 950873 | 76b7a9ae-98fa-4c77-869d-594a4ef7282d | 2021-10-31 | 34.76 |


In [20]:
total_users_ctr = ctr["userid"].nunique()
total_users_ctr

60000

In [28]:
ctr_per_date = {}


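for x in ctr['dt'].unique():
    # Note: this is the mean over the whole dataset, not only the rows for date x,
    # so every date receives the same overall average.
    ctr_average_per_date = ctr["ctr"].mean()
    ctr_per_date[x] = ctr_average_per_date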

In [29]:
ctr_average_per_date

33.00024155646116
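
A per-day average CTR needs the rows to be grouped by date before taking the mean. The following is a small sketch on made-up data with the column names of `ctr_pretest.csv`; the names `toy_ctr`, `ctr_per_day` and `overall` are illustrative.

```python
import pandas as pd

# Toy data with the same columns as ctr_pretest.csv (values are made up).
toy_ctr = pd.DataFrame({
    "userid": ["u1", "u2", "u1", "u2"],
    "dt": ["2021-10-01", "2021-10-01", "2021-10-02", "2021-10-02"],
    "ctr": [30.0, 34.0, 32.0, 36.0],
})

# Average CTR for each day (one value per date).
ctr_per_day = toy_ctr.groupby("dt")["ctr"].mean()

# Overall average CTR across all rows, for comparison.
overall = toy_ctr["ctr"].mean()

print(ctr_per_day)
print(overall)
```

The per-day Series shows how CTR moves over time, while the single overall mean hides any daily variation.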

---

## Pretest metrics 

In this section you will analyse the metrics using the dataset that includes the results for the test and control groups, but only for the pretest data (i.e., prior to November 1st, 2021). You must provide insights about the metrics (__Activity level__, __DAU__ and __CTR__) and also perform a hypothesis test to determine whether there is any statistically significant difference between the groups prior to the start of the experiment. You must try different approaches (i.e., __z-test__ and __t-test__) and compare the results.


__Datasets:__ `activity_all.csv`, `ctr_all.csv`
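
As a rough illustration of the z-test mentioned above, the sketch below compares two samples of daily metric values, one per group, using only the standard library. The helper name `two_sample_ztest`, the unpooled standard error, the sample values and the significance level of 0.05 are all assumptions made for this example, not part of the challenge.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_ztest(a, b):
    """Two-sided z-test for a difference in means between samples a and b."""
    # Standard error of the difference, using each sample's own variance.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Made-up daily metric values (for example, daily CTR averages) per group.
control = [33.1, 32.8, 33.0, 33.2, 32.9, 33.0]
test = [33.0, 33.1, 32.9, 33.2, 32.8, 33.0]

z, p = two_sample_ztest(control, test)
alpha = 0.05  # assumed significance level
print(f"z = {z:.3f}, p = {p:.3f}, reject H0: {p < alpha}")
```

Each sample here has one observation per day, so the test has several data points per group to estimate the variance from. With pretest data, a large p-value is the expected outcome, since the groups should not differ before the change is introduced.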

---

## Experiment metrics 

In this section you must perform the same analysis as in the previous section, but using the data generated during the experiment (i.e., after November 1st, 2021). You must provide insights about the metrics (__Activity level__, __DAU__ and __CTR__) and also perform a hypothesis test to determine whether there is any statistically significant difference between the groups during the experiment. You must try different approaches (i.e., __z-test__ and __t-test__) and compare the results.


__Datasets:__ `activity_all.csv`, `ctr_all.csv`
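
For the t-test, one possible approach is Welch's two-sample t-test, which does not assume equal variances between groups. The sketch below uses `scipy.stats.ttest_ind` with `equal_var=False` on made-up daily CTR averages; the data, the choice of Welch's variant and the 0.05 significance level are assumptions for illustration only.

```python
from scipy import stats

# Made-up daily CTR averages per group during an experiment window.
control = [33.0, 32.9, 33.1, 33.0, 32.8, 33.2]
test = [37.9, 38.1, 38.0, 37.8, 38.2, 38.0]

# Welch's t-test: two independent samples, variances not assumed equal.
result = stats.ttest_ind(test, control, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

alpha = 0.05  # assumed significance level
print(f"t = {t_stat:.2f}, p = {p_value:.4g}, reject H0: {p_value < alpha}")
```

With only a handful of daily observations per group, the t-test is usually the more cautious choice; as the number of observations grows, the z-test and t-test results converge.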

In [87]:
from statsmodels.stats.weightstats import ztest

In [30]:
# your-code

activity_all = pd.read_csv("./abtest/activity_all.csv")
activity_all


| Unnamed: 0 | userid | dt | groupid | activity_level |
|---|---|---|---|---|
| 0 | a5b70ae7-f07c-4773-9df4-ce112bc9dc48 | 2021-10-01 | 0 | 0 |
| 1 | d2646662-269f-49de-aab1-8776afced9a3 | 2021-10-01 | 0 | 0 |
| 2 | c4d1cfa8-283d-49ad-a894-90aedc39c798 | 2021-10-01 | 1 | 0 |
| 3 | 6889f87f-5356-4904-a35a-6ea5020011db | 2021-10-01 | 0 | 0 |
| 4 | dbee604c-474a-4c9d-b013-508e5a0e3059 | 2021-10-01 | 1 | 0 |
| ... | ... | ... | ... | ... |
| 3659995 | f0126b50-ad74-4480-9250-41b50a408932 | 2021-11-30 | 0 | 20 |
| 3659996 | 6ffe1efe-2e5d-427f-95ff-cc862c46c798 | 2021-11-30 | 1 | 20 |
| 3659997 | f2073207-25dd-4127-a893-b70106d5ead7 | 2021-11-30 | 0 | 20 |
| 3659998 | 0416f2be-3ab8-481b-873c-3678b4705ecf | 2021-11-30 | 1 | 20 |


In [31]:
ctr_all = pd.read_csv("./abtest/ctr_all.csv")
ctr_all

| Unnamed: 0 | userid | dt | groupid | ctr |
|---|---|---|---|---|
| 0 | 60389fa7-2d71-4cdf-831c-c2bb277ffa1e | 2021-11-13 | 0 | 31.81 |
| 1 | b59cb225-d160-4851-92d2-7cc8120a2f63 | 2021-11-13 | 0 | 30.46 |
| 2 | aa336050-934e-453f-a5b0-dd881fcd114e | 2021-11-13 | 0 | 34.25 |
| 3 | 8df767f4-a10f-4322-a722-676b7e02b372 | 2021-11-13 | 0 | 34.92 |
| 4 | a74762ed-4da0-42ab-91d2-40d7e808dfe9 | 2021-11-13 | 0 | 34.95 |
| ... | ... | ... | ... | ... |
| 2303403 | 932e0348-ea2d-4b98-8782-aa84420f0796 | 2021-11-12 | 1 | 37.27 |
| 2303404 | 6775a825-6d3d-4dc3-9335-cad061736752 | 2021-11-12 | 1 | 39.14 |
| 2303405 | a7b55365-21f1-4123-b2b5-485a8c7b98da | 2021-11-12 | 1 | 40.05 |
| 2303406 | a6fa937c-6f40-4f04-b15b-f1de09e179db | 2021-11-12 | 1 | 38.14 |


In [34]:
total_users_all = activity_all["userid"].nunique()
total_users_all

60000

In [57]:
filtered_df_pre = activity_all.loc[(activity_all['dt'] >= '2021-10-01')
                     & (activity_all['dt'] < '2021-11-21')]
filtered_df_post = activity_all.loc[(activity_all['dt'] >= '2021-11-21')
                     & (activity_all['dt'] <= '2021-11-30')]

In [58]:
filtered_df_pre

| Unnamed: 0 | userid | dt | groupid | activity_level |
|---|---|---|---|---|
| 0 | a5b70ae7-f07c-4773-9df4-ce112bc9dc48 | 2021-10-01 | 0 | 0 |
| 1 | d2646662-269f-49de-aab1-8776afced9a3 | 2021-10-01 | 0 | 0 |
| 2 | c4d1cfa8-283d-49ad-a894-90aedc39c798 | 2021-10-01 | 1 | 0 |
| 3 | 6889f87f-5356-4904-a35a-6ea5020011db | 2021-10-01 | 0 | 0 |
| 4 | dbee604c-474a-4c9d-b013-508e5a0e3059 | 2021-10-01 | 1 | 0 |
| ... | ... | ... | ... | ... |
| 3648717 | bb8e43c6-10fb-4bc2-b405-d12fa552085b | 2021-11-20 | 0 | 20 |
| 3648718 | 34c311d8-9627-4894-8189-98625f924435 | 2021-11-20 | 1 | 20 |
| 3648719 | 520e467b-a702-4d19-b455-b27a76ce0c07 | 2021-11-20 | 1 | 20 |
| 3648720 | c631da62-d2ad-4327-ba85-4f7994fe9e27 | 2021-11-20 | 0 | 20 |


In [76]:
active_users_per_date_pre_1 = {}


for date in filtered_df_pre['dt'].unique():
  
    active_users_count_1 = filtered_df_pre.loc[(filtered_df_pre['activity_level'] > 0) & (filtered_df_pre['groupid'] == 1)& (filtered_df_pre['dt'] == date)].count()
    active_users_per_date_pre_1[date] = active_users_count_1

In [77]:
active_users_per_date_pre_0 = {}


for date in filtered_df_pre['dt'].unique():
  
    active_users_count_0 = filtered_df_pre.loc[(filtered_df_pre['activity_level'] > 0) & (filtered_df_pre['groupid'] == 0)& (filtered_df_pre['dt'] == date)].count()
    active_users_per_date_pre_0[date] = active_users_count_0

In [78]:
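# Note: active_users_count_1 and active_users_count_0 hold only the last date from the loops above,
# and the denominator is the whole user base rather than each group's size.
DAU_pre_1 = ((active_users_count_1/total_users_all)*100).mean()
DAU_pre_0 = ((active_users_count_0/total_users_all)*100).mean()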
DAU_pre_1


48.88166666666667

In [79]:
DAU_pre_0

26.816666666666666

In [81]:
active_users_per_date_post_1 = {}


for date in filtered_df_post['dt'].unique():
  
    active_users_count_post_1 = filtered_df_post.loc[(filtered_df_post['activity_level'] > 0) & (filtered_df_post['groupid'] == 1)& (filtered_df_post['dt'] == date)].count()
    active_users_per_date_post_1[date] = active_users_count_post_1

In [83]:
active_users_per_date_post_0 = {}


for date in filtered_df_post['dt'].unique():
  
    active_users_count_post_0 = filtered_df_post.loc[(filtered_df_post['activity_level'] > 0) & (filtered_df_post['groupid'] == 0)& (filtered_df_post['dt'] == date)].count()
    active_users_per_date_post_0[date] = active_users_count_post_0

In [85]:
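# Note: active_users_count_post_1 and active_users_count_post_0 hold only the last date from the loops above,
# and the denominator is the whole user base rather than each group's size.
DAU_post_1 = ((active_users_count_post_1/total_users_all)*100).mean()
DAU_post_0 = ((active_users_count_post_0/total_users_all)*100).mean()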
DAU_post_1

48.97

In [86]:
DAU_post_0

25.319999999999997

In [88]:
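# Each list holds only two numbers (group 0 and group 1), so the z-test below
# compares pre vs. post with two observations per sample.
prevalues = [DAU_pre_0, DAU_pre_1]
postvalues = [DAU_post_0, DAU_post_1]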

In [89]:
ztest(prevalues, postvalues, value=0) 

(0.043541246318900925, 0.9652700858780925)

In [101]:
filtered_dfctr_pre1 = ctr_all.loc[(ctr_all['dt'] >= '2021-10-01')
                                 & (ctr_all['dt'] < '2021-11-21')
                                 & (ctr_all['groupid'] == 1)]
filtered_dfctr_post1 = ctr_all.loc[(ctr_all['dt'] >= '2021-11-21')
                     & (ctr_all['dt'] <= '2021-11-30') & (ctr_all['groupid'] == 1)]

In [102]:
filtered_dfctr_pre0 = ctr_all.loc[(ctr_all['dt'] >= '2021-10-01')
                                 & (ctr_all['dt'] < '2021-11-21')
                                 & (ctr_all['groupid'] == 0)]
filtered_dfctr_post0 = ctr_all.loc[(ctr_all['dt'] >= '2021-11-21')
                     & (ctr_all['dt'] <= '2021-11-30') & (ctr_all['groupid'] == 0)]

In [106]:
ctr_per_date_pre0 = {}


for x in filtered_dfctr_pre0['dt'].unique():
    # Note: mean over the whole filtered frame, not only the rows for date x
    ctr_average_per_date_pre0 = filtered_dfctr_pre0["ctr"].mean()
    ctr_per_date_pre0[x] = ctr_average_per_date_pre0

In [104]:
ctr_per_date_pre1 = {}


for x in filtered_dfctr_pre1['dt'].unique():
    # Note: mean over the whole filtered frame, not only the rows for date x
    ctr_average_per_date_pre1 = filtered_dfctr_pre1["ctr"].mean()
    ctr_per_date_pre1[x] = ctr_average_per_date_pre1

In [107]:
ctr_per_date_post0 = {}


for x in filtered_dfctr_post0['dt'].unique():
    # Note: mean over the whole filtered frame, not only the rows for date x
    ctr_average_per_date_post0 = filtered_dfctr_post0["ctr"].mean()
    ctr_per_date_post0[x] = ctr_average_per_date_post0

In [105]:
ctr_per_date_post1 = {}


for x in filtered_dfctr_post1['dt'].unique():
    # Note: mean over the whole filtered frame, not only the rows for date x
    ctr_average_per_date_post1 = filtered_dfctr_post1["ctr"].mean()
    ctr_per_date_post1[x] = ctr_average_per_date_post1

In [108]:
ctr_average_per_date_pre0

32.99960369409587

In [109]:
ctr_average_per_date_pre1

35.75883772832273

In [110]:
ctr_average_per_date_post0

32.9955344038649

In [111]:
ctr_average_per_date_post1

37.99125890566141

In [112]:
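# Each list holds only two numbers (group 0 and group 1), so the z-test below
# compares pre vs. post with two observations per sample.
prevaluesctr = [ctr_average_per_date_pre0, ctr_average_per_date_pre1]
postvaluesctr = [ctr_average_per_date_post0, ctr_average_per_date_post1]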

In [113]:
ztest(prevaluesctr, postvaluesctr, value=0) 

(-0.39045470013855804, 0.6962003462918562)

---

## Conclusions

Please provide your conclusions after the analyses and your recommendation whether we may or may not implement the changes in the digital product.

In [7]:
# your-conclusions

#H0: Changes after 21st November result in worse results for group 1
#H1: Changes after 21st November result in better results for group 1
#alpha = 0.05
#The null hypothesis (H0) can't be rejected, so the changes shouldn't be implemented


---