# A/B Test Challenge



---

#### What is an A/B Test? 

An A/B test is a decision-making support and research methodology that allows you to measure the impact of a change in a product (e.g., a digital product). In this challenge you will analyse the data resulting from an A/B test performed on a digital product in which a new set of sponsored ads is included.


#### Measure of success

Metrics are needed to measure the success of your product. They are typically split into the following categories: 

- __Engagement-based metrics:__ number of users, number of downloads, number of active users, user retention, etc.

- __Revenue and monetization metrics:__ ads and affiliate links, subscription-based, in-app purchases, etc.

- __Technical metrics:__ service level indicators (uptime of the app, downtime of the app, latency).



---

## Metrics understanding

In this part you must analyse the metrics involved in the test. We will focus on the following metrics:

- Activity level and daily active users (DAU).

- Click-through rate (CTR).

### Activity level

In this part, perform every calculation you consider necessary to answer the following questions:

- How many activity levels can you find in the dataset? (An activity level of zero means no activity.)

- How many users are there at each activity level?

- How many activity levels are there per day, and how many records are there for each activity level?

At the end of this section you must provide your conclusions about the _activity level_ of the users.

__Dataset:__ `activity_pretest.csv`
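The three questions above map onto a few pandas aggregations. The following is a minimal sketch on a small made-up frame that only assumes the `userid`, `dt` and `activity_level` columns of `activity_pretest.csv`; the names `toy`, `active` and `records_per_day_level` are hypothetical. Note that `pd.crosstab` counts rows, which equals the number of users only if each user has a single record per day.

```python
import pandas as pd

# Small made-up frame with the same columns as activity_pretest.csv.
toy = pd.DataFrame({
    "userid": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "dt": ["2021-10-01"] * 3 + ["2021-10-02"] * 3,
    "activity_level": [0, 1, 2, 1, 1, 0],
})

# Drop inactive rows (activity level 0 means no activity).
active = toy[toy["activity_level"] != 0]

# Q1: how many distinct non-zero activity levels exist.
n_levels = active["activity_level"].nunique()

# Q2: how many distinct users appear at each activity level.
users_per_level = active.groupby("activity_level")["userid"].nunique()

# Q3: number of records for each (day, activity level) pair, as a day x level table.
records_per_day_level = pd.crosstab(active["dt"], active["activity_level"])

print(n_levels)
print(users_per_level)
print(records_per_day_level)
```

Missing (day, level) combinations show up as 0 in the crosstab, which makes days with an unusual activity mix easy to spot.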

In [1]:
import pandas as pd
from zipfile import ZipFile
import numpy as np
from statsmodels.stats.weightstats import ztest
from scipy import stats
import seaborn as sns
import matplotlib.pylab as plt

In [2]:
# your-code

activity_pretest = pd.read_csv('./data/activity_pretest.csv')
activity_pretest.head()

# Guille: bad4cc056

Unnamed: 0,userid,dt,activity_level
0,a5b70ae7-f07c-4773-9df4-ce112bc9dc48,2021-10-01,0
1,d2646662-269f-49de-aab1-8776afced9a3,2021-10-01,0
2,c4d1cfa8-283d-49ad-a894-90aedc39c798,2021-10-01,0
3,6889f87f-5356-4904-a35a-6ea5020011db,2021-10-01,0
4,dbee604c-474a-4c9d-b013-508e5a0e3059,2021-10-01,0


In [3]:
activity_pretest.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1860000 entries, 0 to 1859999
Data columns (total 3 columns):
 #   Column          Dtype 
---  ------          ----- 
 0   userid          object
 1   dt              object
 2   activity_level  int64 
dtypes: int64(1), object(2)
memory usage: 42.6+ MB


In [4]:
# Remove rows with activity_level == 0 (no activity)
activity_pretest = activity_pretest[activity_pretest['activity_level'] != 0].reset_index(drop=True)
activity_pretest

Unnamed: 0,userid,dt,activity_level
0,428070b0-083e-4c0e-8444-47bf91e99fff,2021-10-01,1
1,93370f9c-56ef-437f-99ff-cb7c092d08a7,2021-10-01,1
2,0fb7120a-53cf-4a51-8b52-bf07b8659bd6,2021-10-01,1
3,ce64a9d8-07d9-4dca-908d-5e1e4568003d,2021-10-01,1
4,e08332f0-3a5c-4ed2-b957-87e464e89b97,2021-10-01,1
...,...,...,...
950870,200d65e6-b1ce-4a47-8c2b-946db5c5a3a0,2021-10-31,20
950871,535dafe4-de7c-4b56-acf6-aa94f21653bc,2021-10-31,20
950872,0428ca3c-e666-4ef4-8588-3a2af904a123,2021-10-31,20
950873,a8cd1579-44d4-48b3-b3d6-47ae5197dbc6,2021-10-31,20


In [5]:
# How many activity levels you can find in the dataset? (Activity level of zero means no activity).

print(activity_pretest['activity_level'].unique()) 
print(activity_pretest['activity_level'].nunique()) # 20 non-zero activity levels (1-20); level 0 was removed above

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20]
20


In [6]:
# What is the amount of users for each activity level?

users_per_activity = activity_pretest.groupby('activity_level')['userid'].nunique()
users_per_activity # e.g., activity level 1 has 33,688 distinct users

activity_level
1     33688
2     33761
3     33634
4     33502
5     33820
6     33789
7     33337
8     33365
9     33636
10    33784
11    33730
12    33649
13    33586
14    33703
15    33409
16    33831
17    33333
18    33668
19    33756
20    20215
Name: userid, dtype: int64

In [7]:
# How many activity levels are there per day, and how many users are there at each activity level?
daily_act_levels = activity_pretest.groupby(['dt','activity_level'])['userid'].nunique()
daily_act_levels.to_frame()

dt,activity_level,userid
2021-10-01,1,1602
2021-10-01,2,1507
2021-10-01,3,1587
2021-10-01,4,1551
2021-10-01,5,1586
...,...,...
2021-10-31,16,1499
2021-10-31,17,1534
2021-10-31,18,1531
2021-10-31,19,1616


### Daily active users (DAU)

![ab_test](./img/user_activity_ab_testing.JPG)


Daily active users (DAU) is the number of users who are active on a given day (an activity level of zero means no activity). Calculate this metric and provide your insights about it.

__Dataset:__ `activity_pretest.csv`
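A minimal sketch of the DAU definition above, on hypothetical toy data shaped like `activity_pretest.csv`. One detail the plain `groupby` hides: a day on which nobody is active disappears from the result, so the sketch reindexes over all observed days and reports such days as 0.

```python
import pandas as pd

# Hypothetical toy data shaped like activity_pretest.csv.
toy = pd.DataFrame({
    "userid": ["u1", "u2", "u3", "u1", "u2", "u3", "u1"],
    "dt": ["2021-10-01"] * 3 + ["2021-10-02"] * 3 + ["2021-10-03"],
    "activity_level": [3, 0, 5, 0, 0, 1, 0],
})

all_days = sorted(toy["dt"].unique())

# DAU: number of distinct users with a non-zero activity level on each day.
dau = (
    toy[toy["activity_level"] > 0]
    .groupby("dt")["userid"]
    .nunique()
    # Days on which nobody was active would otherwise vanish; report them as 0.
    .reindex(all_days, fill_value=0)
)
print(dau)
```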

In [8]:
activity_pretest.head()

Unnamed: 0,userid,dt,activity_level
0,428070b0-083e-4c0e-8444-47bf91e99fff,2021-10-01,1
1,93370f9c-56ef-437f-99ff-cb7c092d08a7,2021-10-01,1
2,0fb7120a-53cf-4a51-8b52-bf07b8659bd6,2021-10-01,1
3,ce64a9d8-07d9-4dca-908d-5e1e4568003d,2021-10-01,1
4,e08332f0-3a5c-4ed2-b957-87e464e89b97,2021-10-01,1


In [9]:
# your-code

# How many active users per day?
daily_act_users = activity_pretest.groupby('dt')['userid'].nunique().reset_index()
daily_act_users

Unnamed: 0,dt,userid
0,2021-10-01,30634
1,2021-10-02,30775
2,2021-10-03,30785
3,2021-10-04,30599
4,2021-10-05,30588
5,2021-10-06,30639
6,2021-10-07,30637
7,2021-10-08,30600
8,2021-10-09,30902
9,2021-10-10,30581


### Click-through rate (CTR)

![ab_test](./img/ad_click_through_rate_ab_testing.JPG)

Click-through rate (CTR) is the percentage of the ads shown to a user on a given day that the user clicks on. Analyse this metric (e.g., the average CTR per day) and provide your insights about it.

__Dataset:__ `ctr_pretest.csv`
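`ctr_pretest.csv` only ships a precomputed `ctr` column, so the sketch below assumes hypothetical raw `clicks` and `impressions` columns to show how the definition above is computed. It also contrasts the average of per-user CTRs (what the cells below compute) with a pooled daily CTR (total clicks over total impressions), which gives more weight to users who see more ads.

```python
import pandas as pd

# Hypothetical raw counts; the real dataset only contains the resulting `ctr`.
toy = pd.DataFrame({
    "userid": ["u1", "u2", "u1", "u2"],
    "dt": ["2021-10-01", "2021-10-01", "2021-10-02", "2021-10-02"],
    "clicks": [1, 3, 2, 0],
    "impressions": [4, 6, 5, 10],
})

# Per-user, per-day CTR as a percentage of ads shown that were clicked.
toy["ctr"] = 100 * toy["clicks"] / toy["impressions"]

# Average of the per-user CTRs for each day.
mean_user_ctr = toy.groupby("dt")["ctr"].mean()

# Pooled CTR per day: total clicks / total impressions.
daily = toy.groupby("dt")[["clicks", "impressions"]].sum()
pooled_ctr = 100 * daily["clicks"] / daily["impressions"]

print(mean_user_ctr)
print(pooled_ctr)
```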

In [10]:
ctr_pretest = pd.read_csv('./data/ctr_pretest.csv')
ctr_pretest

Unnamed: 0,userid,dt,ctr
0,4b328144-df4b-47b1-a804-09834942dce0,2021-10-01,34.28
1,34ace777-5e9d-40b3-a859-4145d0c35c8d,2021-10-01,34.67
2,8028cccf-19c3-4c0e-b5b2-e707e15d2d83,2021-10-01,34.77
3,652b3c9c-5e29-4bf0-9373-924687b1567e,2021-10-01,35.42
4,45b57434-4666-4b57-9798-35489dc1092a,2021-10-01,35.04
...,...,...,...
950870,a09a3687-b71a-4a67-b1ef-9b05c9770c4c,2021-10-31,32.33
950871,c843a595-b94c-42e1-b2fe-ec096070681e,2021-10-31,30.09
950872,edcdf0c1-3d8f-47e8-b7dd-05505749eb69,2021-10-31,35.71
950873,76b7a9ae-98fa-4c77-869d-594a4ef7282d,2021-10-31,34.76


In [11]:
# your-code

# Average CTR per day (mean of the per-user CTRs recorded on that day)
user_daily_ctr = ctr_pretest.groupby('dt')['ctr'].agg('mean').reset_index()
user_daily_ctr

Unnamed: 0,dt,ctr
0,2021-10-01,32.993446
1,2021-10-02,32.991664
2,2021-10-03,32.995086
3,2021-10-04,32.992995
4,2021-10-05,33.004375
5,2021-10-06,33.018564
6,2021-10-07,32.9885
7,2021-10-08,32.998654
8,2021-10-09,33.005082
9,2021-10-10,33.007134


In [12]:
# Mean of the daily CTR averages over the whole month

ctr_mean = user_daily_ctr['ctr'].mean()
ctr_mean

33.00024304382363

---

## Pretest metrics 

In this section you will analyse the metrics using the dataset that includes the results for both the test and control groups, but only for the pretest data (i.e., prior to November 1st, 2021). You must provide insights about the metrics (__Activity level__, __DAU__ and __CTR__) and also perform a hypothesis test to determine whether there is any statistically significant difference between the groups before the start of the experiment. You must try different approaches (i.e., a __z-test__ and a __t-test__) and compare the results.


__Datasets:__ `activity_all.csv`, `ctr_all.csv`
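To compare the z-test and the t-test, it helps to see that with pooled variances both use the same statistic and differ only in the reference distribution (standard normal vs. Student t with n1 + n2 - 2 degrees of freedom). A minimal sketch on synthetic CTR-like samples (the mean 33.0, standard deviation 2.0 and sample sizes are assumptions, not the real data), checked against `statsmodels`' `ztest` (default `usevar="pooled"`) and `scipy`'s `ttest_ind` (default `equal_var=True`):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import ztest

rng = np.random.default_rng(0)
# Synthetic CTR-like samples (hypothetical parameters, not the real data).
a = rng.normal(33.0, 2.0, size=50_000)
b = rng.normal(33.0, 2.0, size=50_000)

# Pooled two-sample statistic, written out by hand.
n1, n2 = len(a), len(b)
pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
stat = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# z-test: standard normal reference; t-test: Student t reference.
p_z = 2 * stats.norm.sf(abs(stat))
p_t = 2 * stats.t.sf(abs(stat), df=n1 + n2 - 2)

z_sm, p_z_sm = ztest(a, b)            # statsmodels, pooled variance by default
t_sp, p_t_sp = stats.ttest_ind(a, b)  # scipy, Student's t by default

print(stat, z_sm, t_sp)
print(p_z, p_t)
```

With tens of thousands of observations per group the two p-values are practically identical, which is why the z-test and t-test outputs in this notebook show the same statistic.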

# Control Group

### 1. Activity level

In [13]:
# your-code

activity_all = pd.read_csv('./data/activity_all.csv')
activity_all['dt'] = pd.to_datetime(activity_all['dt'])
activity_all

Unnamed: 0,userid,dt,groupid,activity_level
0,a5b70ae7-f07c-4773-9df4-ce112bc9dc48,2021-10-01,0,0
1,d2646662-269f-49de-aab1-8776afced9a3,2021-10-01,0,0
2,c4d1cfa8-283d-49ad-a894-90aedc39c798,2021-10-01,1,0
3,6889f87f-5356-4904-a35a-6ea5020011db,2021-10-01,0,0
4,dbee604c-474a-4c9d-b013-508e5a0e3059,2021-10-01,1,0
...,...,...,...,...
3659995,f0126b50-ad74-4480-9250-41b50a408932,2021-11-30,0,20
3659996,6ffe1efe-2e5d-427f-95ff-cc862c46c798,2021-11-30,1,20
3659997,f2073207-25dd-4127-a893-b70106d5ead7,2021-11-30,0,20
3659998,0416f2be-3ab8-481b-873c-3678b4705ecf,2021-11-30,1,20


In [14]:
activity_all.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3660000 entries, 0 to 3659999
Data columns (total 4 columns):
 #   Column          Dtype         
---  ------          -----         
 0   userid          object        
 1   dt              datetime64[ns]
 2   groupid         int64         
 3   activity_level  int64         
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 111.7+ MB


In [15]:
activity_all['groupid'].value_counts()

groupid
1    1832989
0    1827011
Name: count, dtype: int64

- `groupid == 0`: control group (GC)
- `groupid == 1`: experiment group (GE)
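The encoding above can also be applied with `Series.map`. The cells below use `.replace`, which silently leaves any unexpected code untouched; `.map` turns unknown codes into `NaN`, so an assertion can catch them. A short sketch on toy data, where `GROUP_LABELS` is a hypothetical name:

```python
import pandas as pd

# Hypothetical mapping name; 0 = control (GC), 1 = experiment (GE).
GROUP_LABELS = {0: "GC", 1: "GE"}

toy = pd.DataFrame({"groupid": [0, 1, 1, 0]})
labels = toy["groupid"].map(GROUP_LABELS)

# Any code missing from GROUP_LABELS becomes NaN, so this fails loudly.
assert labels.notna().all(), "unexpected groupid code"
toy["groupid"] = labels
print(toy["groupid"].tolist())
```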

In [16]:
# Replacing the groupid values
activity_all['groupid'] = activity_all['groupid'].replace({0: 'GC', 1: 'GE'})
activity_all

Unnamed: 0,userid,dt,groupid,activity_level
0,a5b70ae7-f07c-4773-9df4-ce112bc9dc48,2021-10-01,GC,0
1,d2646662-269f-49de-aab1-8776afced9a3,2021-10-01,GC,0
2,c4d1cfa8-283d-49ad-a894-90aedc39c798,2021-10-01,GE,0
3,6889f87f-5356-4904-a35a-6ea5020011db,2021-10-01,GC,0
4,dbee604c-474a-4c9d-b013-508e5a0e3059,2021-10-01,GE,0
...,...,...,...,...
3659995,f0126b50-ad74-4480-9250-41b50a408932,2021-11-30,GC,20
3659996,6ffe1efe-2e5d-427f-95ff-cc862c46c798,2021-11-30,GE,20
3659997,f2073207-25dd-4127-a893-b70106d5ead7,2021-11-30,GC,20
3659998,0416f2be-3ab8-481b-873c-3678b4705ecf,2021-11-30,GE,20


### 1.1. Activity level before the experiment

In [17]:
gc_activity_before = activity_all[activity_all["groupid"]== 'GC']
gc_activity_before

Unnamed: 0,userid,dt,groupid,activity_level
0,a5b70ae7-f07c-4773-9df4-ce112bc9dc48,2021-10-01,GC,0
1,d2646662-269f-49de-aab1-8776afced9a3,2021-10-01,GC,0
3,6889f87f-5356-4904-a35a-6ea5020011db,2021-10-01,GC,0
6,82b1f3a8-57cc-4d2e-96c4-3664150f53e5,2021-10-01,GC,0
7,9dcc4eed-c222-4323-b2f6-d91edaba5d0e,2021-10-01,GC,0
...,...,...,...,...
3659988,b2a18b8c-00c7-4023-aa7e-e2b12d5bb5d3,2021-11-30,GC,20
3659989,c9737b7f-eb1b-4733-9eaa-7d538d86fb3d,2021-11-30,GC,20
3659995,f0126b50-ad74-4480-9250-41b50a408932,2021-11-30,GC,20
3659997,f2073207-25dd-4127-a893-b70106d5ead7,2021-11-30,GC,20


In [18]:
gc_activity_before = gc_activity_before[gc_activity_before['dt'] < '2021-11-01'].reset_index(drop=True)
gc_activity_before

Unnamed: 0,userid,dt,groupid,activity_level
0,a5b70ae7-f07c-4773-9df4-ce112bc9dc48,2021-10-01,GC,0
1,d2646662-269f-49de-aab1-8776afced9a3,2021-10-01,GC,0
2,6889f87f-5356-4904-a35a-6ea5020011db,2021-10-01,GC,0
3,82b1f3a8-57cc-4d2e-96c4-3664150f53e5,2021-10-01,GC,0
4,9dcc4eed-c222-4323-b2f6-d91edaba5d0e,2021-10-01,GC,0
...,...,...,...,...
928476,2ffce3bd-f7c6-4752-9141-ad887eea6938,2021-10-31,GC,20
928477,1a0dc2cf-c05a-40ad-86b8-d24809295ee2,2021-10-31,GC,20
928478,59f581ac-ff18-40f7-8253-cd8e7612bded,2021-10-31,GC,20
928479,200d65e6-b1ce-4a47-8c2b-946db5c5a3a0,2021-10-31,GC,20


### 2. DAU

### 2.1. Daily active users before the experiment

In [19]:
gc_dau_before = activity_all[activity_all["groupid"] =='GC']
gc_dau_before = gc_dau_before[gc_dau_before["activity_level"] != 0]
gc_dau_before = gc_dau_before[gc_dau_before["dt"] < "2021-11-01"]
gc_dau_before = gc_dau_before.groupby('dt')['userid'].nunique().reset_index()
gc_dau_before

Unnamed: 0,dt,userid
0,2021-10-01,15337
1,2021-10-02,15354
2,2021-10-03,15423
3,2021-10-04,15211
4,2021-10-05,15126
5,2021-10-06,15335
6,2021-10-07,15346
7,2021-10-08,15357
8,2021-10-09,15371
9,2021-10-10,15277


### 3. CTR

In [20]:
# your-code
ctr_all = pd.read_csv('./data/ctr_all.csv')
ctr_all['dt'] = pd.to_datetime(ctr_all['dt'])
ctr_all

Unnamed: 0,userid,dt,groupid,ctr
0,60389fa7-2d71-4cdf-831c-c2bb277ffa1e,2021-11-13,0,31.81
1,b59cb225-d160-4851-92d2-7cc8120a2f63,2021-11-13,0,30.46
2,aa336050-934e-453f-a5b0-dd881fcd114e,2021-11-13,0,34.25
3,8df767f4-a10f-4322-a722-676b7e02b372,2021-11-13,0,34.92
4,a74762ed-4da0-42ab-91d2-40d7e808dfe9,2021-11-13,0,34.95
...,...,...,...,...
2303403,932e0348-ea2d-4b98-8782-aa84420f0796,2021-11-12,1,37.27
2303404,6775a825-6d3d-4dc3-9335-cad061736752,2021-11-12,1,39.14
2303405,a7b55365-21f1-4123-b2b5-485a8c7b98da,2021-11-12,1,40.05
2303406,a6fa937c-6f40-4f04-b15b-f1de09e179db,2021-11-12,1,38.14


In [21]:
ctr_all['groupid'].value_counts()

groupid
1    1355001
0     948407
Name: count, dtype: int64

In [22]:
# Replacing the groupid values
ctr_all['groupid'] = ctr_all['groupid'].replace({0: 'GC', 1: 'GE'})
ctr_all

Unnamed: 0,userid,dt,groupid,ctr
0,60389fa7-2d71-4cdf-831c-c2bb277ffa1e,2021-11-13,GC,31.81
1,b59cb225-d160-4851-92d2-7cc8120a2f63,2021-11-13,GC,30.46
2,aa336050-934e-453f-a5b0-dd881fcd114e,2021-11-13,GC,34.25
3,8df767f4-a10f-4322-a722-676b7e02b372,2021-11-13,GC,34.92
4,a74762ed-4da0-42ab-91d2-40d7e808dfe9,2021-11-13,GC,34.95
...,...,...,...,...
2303403,932e0348-ea2d-4b98-8782-aa84420f0796,2021-11-12,GE,37.27
2303404,6775a825-6d3d-4dc3-9335-cad061736752,2021-11-12,GE,39.14
2303405,a7b55365-21f1-4123-b2b5-485a8c7b98da,2021-11-12,GE,40.05
2303406,a6fa937c-6f40-4f04-b15b-f1de09e179db,2021-11-12,GE,38.14


### 3.1. CTR before the experiment

In [23]:
gc_ctr_before = ctr_all[ctr_all["groupid"]== 'GC']
gc_ctr_before = gc_ctr_before[gc_ctr_before['dt'] < '2021-11-01'].reset_index(drop=True)
gc_ctr_before

Unnamed: 0,userid,dt,groupid,ctr
0,4b328144-df4b-47b1-a804-09834942dce0,2021-10-01,GC,34.28
1,34ace777-5e9d-40b3-a859-4145d0c35c8d,2021-10-01,GC,34.67
2,8028cccf-19c3-4c0e-b5b2-e707e15d2d83,2021-10-01,GC,34.77
3,652b3c9c-5e29-4bf0-9373-924687b1567e,2021-10-01,GC,35.42
4,45b57434-4666-4b57-9798-35489dc1092a,2021-10-01,GC,35.04
...,...,...,...,...
474942,b2f30e48-d012-4687-a93f-8000ab04e565,2021-10-31,GC,31.54
474943,1184bf5f-7036-4a6e-afc1-b0c8f1dba2de,2021-10-31,GC,31.92
474944,8b20e638-d933-489f-8139-7d7ca93aa8e0,2021-10-31,GC,32.05
474945,2071887d-c673-4d39-81dc-5c384bcb8458,2021-10-31,GC,30.25


# Experiment Group

### 1.1. Activity level before the experiment

In [24]:
ge_activity_before = activity_all[activity_all["groupid"]== 'GE']
ge_activity_before = ge_activity_before[ge_activity_before['dt'] < '2021-11-01'].reset_index(drop=True)
ge_activity_before

Unnamed: 0,userid,dt,groupid,activity_level
0,c4d1cfa8-283d-49ad-a894-90aedc39c798,2021-10-01,GE,0
1,dbee604c-474a-4c9d-b013-508e5a0e3059,2021-10-01,GE,0
2,9b2f41cf-350d-4073-b9d4-3848d0c0b1b5,2021-10-01,GE,0
3,c55c0d67-6b95-4d19-bf7d-4c33911da83f,2021-10-01,GE,0
4,de9807bb-a7ff-4334-812e-34bb15a8f573,2021-10-01,GE,0
...,...,...,...,...
931514,93179304-6690-4932-bb68-6db1a18c747a,2021-10-31,GE,20
931515,a2551ab2-abd6-46a1-9f05-e9d2318ddf35,2021-10-31,GE,20
931516,535dafe4-de7c-4b56-acf6-aa94f21653bc,2021-10-31,GE,20
931517,0428ca3c-e666-4ef4-8588-3a2af904a123,2021-10-31,GE,20


### 2.1. Daily active users before the experiment

In [25]:
ge_dau_before = activity_all[activity_all["groupid"] =='GE']
ge_dau_before = ge_dau_before[ge_dau_before["activity_level"] != 0]
ge_dau_before = ge_dau_before[ge_dau_before["dt"] < "2021-11-01"]
ge_dau_before = ge_dau_before.groupby('dt')['userid'].nunique().reset_index()
ge_dau_before

Unnamed: 0,dt,userid
0,2021-10-01,15297
1,2021-10-02,15421
2,2021-10-03,15362
3,2021-10-04,15388
4,2021-10-05,15462
5,2021-10-06,15304
6,2021-10-07,15291
7,2021-10-08,15243
8,2021-10-09,15531
9,2021-10-10,15304


### 3.1. CTR before the experiment

In [26]:
ge_ctr_before = ctr_all[ctr_all["groupid"]== 'GE']
ge_ctr_before = ge_ctr_before[ge_ctr_before['dt'] < '2021-11-01'].reset_index(drop=True)
ge_ctr_before

Unnamed: 0,userid,dt,groupid,ctr
0,381e40b0-5529-4bc6-a3f6-6a687c7cde66,2021-10-01,GE,31.27
1,1797453f-f558-42f6-9a2f-55b95dd37e71,2021-10-01,GE,32.18
2,f8efefba-4782-4104-8fbf-7f4381dfb6d6,2021-10-01,GE,31.20
3,8a18c870-b2e2-4a47-9b30-0859f5854dcc,2021-10-01,GE,31.19
4,d472fbc3-d580-49f7-9ba4-ef002cc80606,2021-10-01,GE,35.62
...,...,...,...,...
475923,a09a3687-b71a-4a67-b1ef-9b05c9770c4c,2021-10-31,GE,32.33
475924,c843a595-b94c-42e1-b2fe-ec096070681e,2021-10-31,GE,30.09
475925,edcdf0c1-3d8f-47e8-b7dd-05505749eb69,2021-10-31,GE,35.71
475926,76b7a9ae-98fa-4c77-869d-594a4ef7282d,2021-10-31,GE,34.76


---

## Experiment metrics 

In this section you must perform the same analysis as in the previous section, but using the data generated during the experiment (i.e., from November 1st, 2021 onward). You must provide insights about the metrics (__Activity level__, __DAU__ and __CTR__) and also perform a hypothesis test to determine whether there is any statistically significant difference between the groups during the experiment. You must try different approaches (i.e., a __z-test__ and a __t-test__) and compare the results.


__Datasets:__ `activity_all.csv`, `ctr_all.csv`
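The cells below repeat the same filter for every group and period. As an alternative, a single group-by can produce all four means at once. A minimal sketch, assuming the GC/GE relabelling done earlier and the November 1st, 2021 cut-off; `summarize_by_period` is a hypothetical helper, shown here on toy data:

```python
import pandas as pd


def summarize_by_period(df, value_col, start="2021-11-01"):
    """Mean of `value_col` per group, before vs. from `start` on (hypothetical helper)."""
    dt = pd.to_datetime(df["dt"])
    period = (dt >= pd.Timestamp(start)).map({False: "before", True: "during"})
    grouped = df.groupby([df["groupid"], period.rename("period")])[value_col]
    # Rows: groupid (GC/GE); columns: period (before/during).
    return grouped.mean().unstack("period")


# Toy data shaped like ctr_all.csv after replacing 0/1 with GC/GE.
toy = pd.DataFrame({
    "userid": ["u1", "u2", "u1", "u2"],
    "dt": ["2021-10-15", "2021-10-15", "2021-11-05", "2021-11-05"],
    "groupid": ["GC", "GE", "GC", "GE"],
    "ctr": [33.0, 33.2, 32.9, 38.1],
})
summary = summarize_by_period(toy, "ctr")
print(summary)
```

For DAU or activity level, the same pattern applies after first computing the per-day values.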

# Control Group

### 1.2. Activity level during the experiment

In [27]:
# your-code
gc_activity_during = activity_all[activity_all["groupid"]== 'GC']
gc_activity_during = gc_activity_during[gc_activity_during['dt'] >= '2021-11-01'].reset_index(drop=True)
gc_activity_during

Unnamed: 0,userid,dt,groupid,activity_level
0,d2646662-269f-49de-aab1-8776afced9a3,2021-11-01,GC,0
1,d6d51bbc-4005-4b61-86ae-a3e239235341,2021-11-01,GC,0
2,65cc7f97-ca08-4ac8-9076-8abbbea5d95d,2021-11-01,GC,0
3,1d8901b1-5fea-4376-8370-843363886e18,2021-11-01,GC,0
4,2614b27d-1449-4d38-8f8d-524daf95361a,2021-11-01,GC,0
...,...,...,...,...
898525,b2a18b8c-00c7-4023-aa7e-e2b12d5bb5d3,2021-11-30,GC,20
898526,c9737b7f-eb1b-4733-9eaa-7d538d86fb3d,2021-11-30,GC,20
898527,f0126b50-ad74-4480-9250-41b50a408932,2021-11-30,GC,20
898528,f2073207-25dd-4127-a893-b70106d5ead7,2021-11-30,GC,20


### 2.2. Daily active users during the experiment

In [28]:
gc_dau_during = activity_all[activity_all["groupid"] =='GC']
gc_dau_during = gc_dau_during[gc_dau_during["activity_level"] != 0]
gc_dau_during = gc_dau_during[gc_dau_during["dt"] >= "2021-11-01"]
gc_dau_during = gc_dau_during.groupby('dt')['userid'].nunique().reset_index()
gc_dau_during

Unnamed: 0,dt,userid
0,2021-11-01,15989
1,2021-11-02,16024
2,2021-11-03,16049
3,2021-11-04,16040
4,2021-11-05,16045
5,2021-11-06,15991
6,2021-11-07,16133
7,2021-11-08,16119
8,2021-11-09,15953
9,2021-11-10,15990


### 3.2. CTR during the experiment

In [29]:
gc_ctr_during = ctr_all[ctr_all["groupid"]== 'GC']
gc_ctr_during = gc_ctr_during[gc_ctr_during['dt'] >= '2021-11-01'].reset_index(drop=True)
gc_ctr_during

Unnamed: 0,userid,dt,groupid,ctr
0,60389fa7-2d71-4cdf-831c-c2bb277ffa1e,2021-11-13,GC,31.81
1,b59cb225-d160-4851-92d2-7cc8120a2f63,2021-11-13,GC,30.46
2,aa336050-934e-453f-a5b0-dd881fcd114e,2021-11-13,GC,34.25
3,8df767f4-a10f-4322-a722-676b7e02b372,2021-11-13,GC,34.92
4,a74762ed-4da0-42ab-91d2-40d7e808dfe9,2021-11-13,GC,34.95
...,...,...,...,...
473455,26c10c02-8ede-4beb-be14-32f8cca044ff,2021-11-12,GC,33.28
473456,ae235c4b-96a7-4f34-923e-08531a5f340a,2021-11-12,GC,34.15
473457,81daf7da-ba09-451f-b100-f15ed284977e,2021-11-12,GC,35.79
473458,38338581-e093-4202-8c9f-975004e221e3,2021-11-12,GC,31.82


# Experiment Group

### 1.2. Activity level during the experiment

In [30]:
ge_activity_during = activity_all[activity_all["groupid"]== 'GE']
ge_activity_during = ge_activity_during[ge_activity_during['dt'] >= '2021-11-01'].reset_index(drop=True)
ge_activity_during

Unnamed: 0,userid,dt,groupid,activity_level
0,39e33daf-6964-46a1-8b99-036ba08de05f,2021-11-01,GE,0
1,e1cf870f-b7c8-46e7-83cd-31a86a31375c,2021-11-01,GE,0
2,b1306532-9772-4e87-a4f5-ee5fef48783c,2021-11-01,GE,0
3,038d0ef3-3f78-465a-9c2f-ff3fe11b932a,2021-11-01,GE,0
4,f6e31dd1-7842-4dad-9270-c3b4f4fc8a59,2021-11-01,GE,0
...,...,...,...,...
901465,05f00021-052d-493c-94a7-554702d7f3a1,2021-11-30,GE,20
901466,219e12b3-49dc-4fc1-b947-c0683a8a400f,2021-11-30,GE,20
901467,cbc2d82c-7940-42fa-9dc5-7790d11b06b5,2021-11-30,GE,20
901468,6ffe1efe-2e5d-427f-95ff-cc862c46c798,2021-11-30,GE,20


### 2.2. Daily active users during the experiment

In [31]:
ge_dau_during = activity_all[activity_all["groupid"] =='GE']
ge_dau_during = ge_dau_during[ge_dau_during["activity_level"] != 0]
ge_dau_during = ge_dau_during[ge_dau_during["dt"] >= "2021-11-01"]
ge_dau_during = ge_dau_during.groupby('dt')['userid'].nunique().reset_index()
ge_dau_during

Unnamed: 0,dt,userid
0,2021-11-01,29318
1,2021-11-02,29289
2,2021-11-03,29306
3,2021-11-04,29267
4,2021-11-05,29336
5,2021-11-06,29306
6,2021-11-07,29255
7,2021-11-08,29263
8,2021-11-09,29286
9,2021-11-10,29340


### 3.2. CTR during the experiment

In [32]:
ge_ctr_during = ctr_all[ctr_all["groupid"]== 'GE']
ge_ctr_during = ge_ctr_during[ge_ctr_during['dt'] >= '2021-11-01'].reset_index(drop=True)
ge_ctr_during

Unnamed: 0,userid,dt,groupid,ctr
0,cd5df711-42f7-4684-9ae8-f6a72383bb28,2021-11-13,GE,40.39
1,fe630199-265b-4542-a103-a74d66abeb22,2021-11-13,GE,37.70
2,4b519a79-b1a4-40b0-9369-be9e2a2699af,2021-11-13,GE,35.47
3,30a8c7b1-ed8a-4cf2-888e-b8e110ba88d9,2021-11-13,GE,40.07
4,88ab26e4-2e67-4397-a5ec-8c2a384372f5,2021-11-13,GE,40.76
...,...,...,...,...
879068,932e0348-ea2d-4b98-8782-aa84420f0796,2021-11-12,GE,37.27
879069,6775a825-6d3d-4dc3-9335-cad061736752,2021-11-12,GE,39.14
879070,a7b55365-21f1-4123-b2b5-485a8c7b98da,2021-11-12,GE,40.05
879071,a6fa937c-6f40-4f04-b15b-f1de09e179db,2021-11-12,GE,38.14


# Z-Test for CTRs

In [38]:
# Z-Test GC before vs GE before

'''
H0: GC before == GE before
H1: GC before != GE before
alpha == 0.05
'''
hypothesis_mean = gc_ctr_before['ctr'].mean()
sample_mean = ge_ctr_before['ctr'].mean()
alpha = 0.05

Z_score, p_value = ztest(x1=gc_ctr_before['ctr'][:], x2=ge_ctr_before['ctr'][:])

print(f'hypothesis mean: {hypothesis_mean}', f'\nsample mean: {sample_mean}',
      f'\nalpha: {alpha}',f'\nZ_score: {Z_score}', f'\np-value: {p_value}')
print(p_value < alpha)

hypothesis mean: 33.00091277553074 
sample mean: 32.99957172093258 
alpha: 0.05 
Z_score: 0.3775817380268587 
p-value: 0.7057413330705573
False


In [39]:
# Z-Test GC before vs GC during

'''
H0: GC before == GC during
H1: GC before != GC during
alpha == 0.05
'''
hypothesis_mean = gc_ctr_before['ctr'].mean()
sample_mean = gc_ctr_during['ctr'].mean()
alpha = 0.05

Z_score, p_value = ztest(x1=gc_ctr_before['ctr'][:], x2=gc_ctr_during['ctr'][:])

print(f'hypothesis mean: {hypothesis_mean}', f'\nsample mean: {sample_mean}',
      f'\nalpha: {alpha}',f'\nZ_score: {Z_score}', f'\np-value: {p_value}')
print(p_value < alpha)

hypothesis mean: 33.00091277553074 
sample mean: 32.996977569382835 
alpha: 0.05 
Z_score: 1.1054087229140204 
p-value: 0.2689825257460209
False


In [40]:
# Z-Test GE before vs GE during

'''
H0: GE before == GE during
H1: GE before != GE during
alpha == 0.05
'''
hypothesis_mean = ge_ctr_before['ctr'].mean()
sample_mean = ge_ctr_during['ctr'].mean()
alpha = 0.05

Z_score, p_value = ztest(x1=ge_ctr_before['ctr'][:], x2=ge_ctr_during['ctr'][:])

print(f'hypothesis mean: {hypothesis_mean}', f'\nsample mean: {sample_mean}',
      f'\nalpha: {alpha}',f'\nZ_score: {Z_score}', f'\np-value: {p_value}')
print(p_value < alpha)

hypothesis mean: 32.99957172093258 
sample mean: 37.99695912626142 
alpha: 0.05 
Z_score: -1603.8146799084154 
p-value: 0.0
True


In [41]:
# Z-Test GC during vs GE during

'''
H0: GC during == GE during
H1: GC during != GE during
alpha == 0.05
'''
hypothesis_mean = gc_ctr_during['ctr'].mean()
sample_mean = ge_ctr_during['ctr'].mean()
alpha = 0.05

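# Note: unlike the other tests, this one uses a random subsample of 30 rows per group,
# so the statistic changes on every run; pass the full columns for a comparable result.
Z_score, p_value = ztest(x1=gc_ctr_during['ctr'].sample(30), x2=ge_ctr_during['ctr'].sample(30))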

print(f'hypothesis mean: {hypothesis_mean}', f'\nsample mean: {sample_mean}',
      f'\nalpha: {alpha}',f'\nZ_score: {Z_score}', f'\np-value: {p_value}')
print(p_value < alpha)

hypothesis mean: 32.996977569382835 
sample mean: 37.99695912626142 
alpha: 0.05 
Z_score: -11.955147438191725 
p-value: 6.102517285193991e-33
True


# t-Test for CTRs

In [42]:
from scipy.stats import ttest_ind

In [43]:
# t-Test GC before vs GE before

'''
H0: GC before == GE before
H1: GC before != GE before
alpha == 0.05
'''

hypothesis_mean = gc_ctr_before['ctr'].mean()
sample_mean = ge_ctr_before['ctr'].mean()
alpha = 0.05

t_stat, p_value = ttest_ind(gc_ctr_before['ctr'], ge_ctr_before['ctr'])

print(f'hypothesis mean: {hypothesis_mean}', f'\nsample mean: {sample_mean}',
      f'\nalpha: {alpha}', f'\nt-statistic: {t_stat}', f'\np-value: {p_value}')
print(p_value < alpha)

hypothesis mean: 33.00091277553074 
sample mean: 32.99957172093258 
alpha: 0.05 
t-statistic: 0.3775817380268587 
p-value: 0.705741417344299
False


In [44]:
# t-Test GC before vs GC during

'''
H0: GC before == GC during
H1: GC before != GC during
alpha == 0.05
'''
hypothesis_mean = gc_ctr_before['ctr'].mean()
sample_mean = gc_ctr_during['ctr'].mean()
alpha = 0.05

t_stat, p_value = ttest_ind(gc_ctr_before['ctr'], gc_ctr_during['ctr'])

print(f'hypothesis mean: {hypothesis_mean}', f'\nsample mean: {sample_mean}',
      f'\nalpha: {alpha}', f'\nt-statistic: {t_stat}', f'\np-value: {p_value}')
print(p_value < alpha)


hypothesis mean: 33.00091277553074 
sample mean: 32.996977569382835 
alpha: 0.05 
t-statistic: 1.1054087229140201 
p-value: 0.26898280616065884
False


In [45]:
# t-Test GE before vs GE during

'''
H0: GE before == GE during
H1: GE before != GE during
alpha == 0.05
'''
hypothesis_mean = ge_ctr_before['ctr'].mean()
sample_mean = ge_ctr_during['ctr'].mean()
alpha = 0.05

t_stat, p_value = ttest_ind(ge_ctr_before['ctr'], ge_ctr_during['ctr'])

print(f'hypothesis mean: {hypothesis_mean}', f'\nsample mean: {sample_mean}',
      f'\nalpha: {alpha}', f'\nt-statistic: {t_stat}', f'\np-value: {p_value}')
print(p_value < alpha)


hypothesis mean: 32.99957172093258 
sample mean: 37.99695912626142 
alpha: 0.05 
t-statistic: -1603.8146799084154 
p-value: 0.0
True


In [46]:
# t-Test GC during vs GE during

'''
H0: GC during == GE during
H1: GC during != GE during
alpha == 0.05
'''
hypothesis_mean = gc_ctr_during['ctr'].mean()
sample_mean = ge_ctr_during['ctr'].mean()
alpha = 0.05

t_stat, p_value = ttest_ind(gc_ctr_during['ctr'], ge_ctr_during['ctr'], equal_var=False)

print(f'hypothesis mean: {hypothesis_mean}', f'\nsample mean: {sample_mean}',
      f'\nalpha: {alpha}', f'\nt-statistic: {t_stat}', f'\np-value: {p_value}')
print(p_value < alpha)


hypothesis mean: 32.996977569382835 
sample mean: 37.99695912626142 
alpha: 0.05 
t-statistic: -1600.5618238144957 
p-value: 0.0
True


---

## Conclusions

Please provide your conclusions after the analyses and your recommendation whether we may or may not implement the changes in the digital product.

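Before the experiment, neither the z-test nor the t-test finds a statistically significant difference in click-through rate between the control (GC) and experiment (GE) groups (p ≈ 0.71 > alpha = 0.05), so the two groups were comparable when the experiment started. The control group's CTR also did not change significantly between the pretest and experiment periods (p ≈ 0.27 in both tests), so there is no sign of a product-wide shift unrelated to the experiment.

In contrast, the experiment group's mean CTR rose from about 33.0% before the experiment to about 38.0% during it, and both "GE before" vs. "GE during" and "GC during" vs. "GE during" are statistically significant (p < alpha in every test). Where both tests use the full data, the z-test and the t-test give the same statistic and practically the same p-value, because with hundreds of thousands of observations per group the t distribution is essentially the standard normal.

**Recommendation:**

Implement the change (the new set of sponsored ads) in the digital product: it is associated with a statistically significant increase in click-through rate of roughly 5 percentage points, while the control group's CTR stayed flat.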
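A p-value says whether a difference exists, not how large it is. A minimal sketch of reporting the absolute difference, the relative lift and a normal-approximation 95% confidence interval (Welch standard error) for two groups; the data here are synthetic and only mimic the magnitudes seen in the CTR tests (about 33.0 vs. 38.0), so the same lines would need to be run on `gc_ctr_during['ctr']` and `ge_ctr_during['ctr']` to get the real figures.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic stand-ins for the control and experiment CTRs (hypothetical parameters).
control = rng.normal(33.0, 2.0, size=10_000)
treatment = rng.normal(38.0, 2.0, size=10_000)

# Absolute difference in percentage points and relative lift in percent.
diff = treatment.mean() - control.mean()
lift_pct = 100 * diff / control.mean()

# Welch standard error and a normal-approximation 95% interval for the difference.
se = np.sqrt(treatment.var(ddof=1) / len(treatment) + control.var(ddof=1) / len(control))
z = stats.norm.ppf(0.975)
ci_low, ci_high = diff - z * se, diff + z * se

print(f"diff={diff:.3f} pp, lift={lift_pct:.1f}%, 95% CI=({ci_low:.3f}, {ci_high:.3f})")
```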

---