# A/B Test Challenge



---

#### What is an A/B Test? 

An A/B test is a decision-support and research methodology that lets you measure the impact of a change in a product (e.g. a digital product). For this challenge you will analyse the data from an A/B test performed on a digital product in which a new set of sponsored ads was introduced.


#### Measure of success

Metrics are needed to measure the success of your product. They are typically split into the following categories: 

- __Engagement-based metrics:__ number of users, number of downloads, number of active users, user retention, etc.

- __Revenue and monetization metrics:__ ads and affiliate links, subscription-based, in-app purchases, etc.

- __Technical metrics:__ service level indicators (uptime of the app, downtime of the app, latency).



---

## Metrics understanding

In this part you must analyse the metrics involved in the test. We will focus on the following metrics:

- Activity level + Daily active users (DAU).

- Click-through rate (CTR)

### Activity level

In the following part you must perform every calculation you consider necessary in order to answer the following questions:

- How many activity levels can you find in the dataset? (An activity level of zero means no activity.)

- How many users are there at each activity level?

- How many activity levels are there per day, and how many records are there for each activity level?

At the end of this section you must provide your conclusions about the _activity level_ of the users.

__Dataset:__ `activity_pretest.csv`

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import trim_mean   
from statsmodels import robust     
import wquantiles                   
import plotly.express as px

import seaborn as sns
import matplotlib.pyplot as plt
from statsmodels.stats.weightstats import ztest
from scipy.stats import ttest_ind

In [2]:
df = pd.read_csv('/Users/anadeondarza/Desktop/ironhack_data/challenges/ab_test_challenge/data/activity_pretest.csv')
df

Unnamed: 0,userid,dt,activity_level
0,a5b70ae7-f07c-4773-9df4-ce112bc9dc48,2021-10-01,0
1,d2646662-269f-49de-aab1-8776afced9a3,2021-10-01,0
2,c4d1cfa8-283d-49ad-a894-90aedc39c798,2021-10-01,0
3,6889f87f-5356-4904-a35a-6ea5020011db,2021-10-01,0
4,dbee604c-474a-4c9d-b013-508e5a0e3059,2021-10-01,0
...,...,...,...
1859995,200d65e6-b1ce-4a47-8c2b-946db5c5a3a0,2021-10-31,20
1859996,535dafe4-de7c-4b56-acf6-aa94f21653bc,2021-10-31,20
1859997,0428ca3c-e666-4ef4-8588-3a2af904a123,2021-10-31,20
1859998,a8cd1579-44d4-48b3-b3d6-47ae5197dbc6,2021-10-31,20


In [3]:
# How many activity levels are in the dataset? 21 distinct values (0-20): 20 active levels plus 0, which means no activity.
array = pd.unique(df['activity_level'])
array

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20])

In [4]:
#What is the amount of users for each activity level?

In [5]:
df_userid = df.drop(['dt'], axis=1)

In [6]:
userid = df_userid.groupby(['activity_level']).count() 
df_user=userid.reset_index()
df_user

Unnamed: 0,activity_level,userid
0,0,909125
1,1,48732
2,2,49074
3,3,48659
4,4,48556
5,5,49227
6,6,48901
7,7,48339
8,8,48396
9,9,48820
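The count above tallies records (one row per user per day), so a user who sits at the same level on several days is counted several times. A minimal sketch on made-up data, assuming the same column names as `activity_pretest.csv` (the `toy` frame and its values are hypothetical), contrasts record counts with distinct users per level:

```python
import pandas as pd

# Toy data (hypothetical values) with the same columns as activity_pretest.csv
toy = pd.DataFrame({
    "userid": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "dt": ["2021-10-01", "2021-10-02", "2021-10-01",
           "2021-10-01", "2021-10-02", "2021-10-03"],
    "activity_level": [0, 2, 2, 5, 5, 0],
})

per_level = toy.groupby("activity_level").agg(
    records=("userid", "count"),         # rows (user-days) at this level
    unique_users=("userid", "nunique"),  # distinct users ever seen at this level
)
print(per_level)
```

Which of the two columns answers "how many users per level" depends on whether a user-day or a distinct user is the unit of interest; the notebook's tables report the former.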


In [7]:
df_0 = df.drop(df[df['activity_level']== 0].index)
df_0

Unnamed: 0,userid,dt,activity_level
909125,428070b0-083e-4c0e-8444-47bf91e99fff,2021-10-01,1
909126,93370f9c-56ef-437f-99ff-cb7c092d08a7,2021-10-01,1
909127,0fb7120a-53cf-4a51-8b52-bf07b8659bd6,2021-10-01,1
909128,ce64a9d8-07d9-4dca-908d-5e1e4568003d,2021-10-01,1
909129,e08332f0-3a5c-4ed2-b957-87e464e89b97,2021-10-01,1
...,...,...,...
1859995,200d65e6-b1ce-4a47-8c2b-946db5c5a3a0,2021-10-31,20
1859996,535dafe4-de7c-4b56-acf6-aa94f21653bc,2021-10-31,20
1859997,0428ca3c-e666-4ef4-8588-3a2af904a123,2021-10-31,20
1859998,a8cd1579-44d4-48b3-b3d6-47ae5197dbc6,2021-10-31,20


In [8]:
df_0.describe()

Unnamed: 0,activity_level
count,950875.0
mean,10.256362
std,5.635938
min,1.0
25%,5.0
50%,10.0
75%,15.0
max,20.0


In [9]:
#How many activity levels do you have per day and how many records per each activity level?

In [10]:
df_per_day = df_0.drop(['userid'], axis=1)
per_day = df_per_day.groupby(['dt']).count() 
df_reset=per_day.reset_index()
df_reset

Unnamed: 0,dt,activity_level
0,2021-10-01,30634
1,2021-10-02,30775
2,2021-10-03,30785
3,2021-10-04,30599
4,2021-10-05,30588
5,2021-10-06,30639
6,2021-10-07,30637
7,2021-10-08,30600
8,2021-10-09,30902
9,2021-10-10,30581


In [11]:
df_reset['activity_level'].mean()

30673.387096774193

In [12]:
#How many records per each activity level?

In [13]:
df5 = df_0.groupby(['activity_level']).count()
df_reset_1 = df5.reset_index()
df_reset_1

Unnamed: 0,activity_level,userid,dt
0,1,48732,48732
1,2,49074,49074
2,3,48659,48659
3,4,48556,48556
4,5,49227,49227
5,6,48901,48901
6,7,48339,48339
7,8,48396,48396
8,9,48820,48820
9,10,48943,48943


In [14]:
df4 = df_0.groupby(['dt','activity_level']).count()
df_reset_0 = df4.reset_index()
df_reset_0

Unnamed: 0,dt,activity_level,userid
0,2021-10-01,1,1602
1,2021-10-01,2,1507
2,2021-10-01,3,1587
3,2021-10-01,4,1551
4,2021-10-01,5,1586
...,...,...,...
615,2021-10-31,16,1499
616,2021-10-31,17,1534
617,2021-10-31,18,1531
618,2021-10-31,19,1616


In [15]:
df9 = df_reset_0.groupby(['dt']).sum()
df9 

Unnamed: 0_level_0,activity_level,userid
dt,Unnamed: 1_level_1,Unnamed: 2_level_1
2021-10-01,210,30634
2021-10-02,210,30775
2021-10-03,210,30785
2021-10-04,210,30599
2021-10-05,210,30588
2021-10-06,210,30639
2021-10-07,210,30637
2021-10-08,210,30600
2021-10-09,210,30902
2021-10-10,210,30581
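Summing the `activity_level` column, as in the cell above, adds up the level labels themselves: 210 = 1 + 2 + ... + 20, which only indirectly shows that all 20 non-zero levels appear each day. A more direct sketch, on hypothetical toy data with the same column names, counts the distinct levels per day and the records per level per day:

```python
import pandas as pd

# Toy data (hypothetical values); column names follow activity_pretest.csv
toy = pd.DataFrame({
    "userid": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "dt": ["2021-10-01"] * 3 + ["2021-10-02"] * 3,
    "activity_level": [1, 1, 3, 2, 3, 3],
})

# Keep only active records (level 0 means no activity)
active = toy[toy["activity_level"] > 0]

# Number of distinct activity levels observed on each day
levels_per_day = active.groupby("dt")["activity_level"].nunique()

# Records per (day, level), laid out as a day x level table
records_per_level_day = (
    active.groupby(["dt", "activity_level"]).size().unstack(fill_value=0)
)
print(levels_per_day)
print(records_per_level_day)
```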


### Daily active users (DAU)

![ab_test](./img/user_activity_ab_testinG.JPG)


Daily active users (DAU) is the number of users who are active on a given day (an activity level of zero means no activity). You must calculate this metric and provide your insights about it.

__Dataset:__ `activity_pretest.csv`
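As a minimal sketch of the definition, assuming one row per user per day and the column names of `activity_pretest.csv` (the `toy` data below is hypothetical), DAU can be computed by dropping inactive rows and counting distinct users per day:

```python
import pandas as pd

# Toy data (hypothetical values); one row per user per day
toy = pd.DataFrame({
    "userid": ["u1", "u2", "u3", "u1", "u2", "u3"],
    "dt": ["2021-10-01"] * 3 + ["2021-10-02"] * 3,
    "activity_level": [0, 4, 7, 2, 0, 0],
})

dau = (
    toy[toy["activity_level"] > 0]  # level 0 means the user was not active
    .groupby("dt")["userid"]
    .nunique()                      # distinct active users on each day
    .rename("dau")
)
print(dau)
```

Using `nunique` rather than `count` guards against a user appearing twice on the same day; if the data really holds one row per user per day, both give the same number.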

In [16]:
DAU_pretest = pd.read_csv('/Users/anadeondarza/Desktop/ironhack_data/challenges/ab_test_challenge/data/activity_pretest.csv')
DAU_pretest.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1860000 entries, 0 to 1859999
Data columns (total 3 columns):
 #   Column          Dtype 
---  ------          ----- 
 0   userid          object
 1   dt              object
 2   activity_level  int64 
dtypes: int64(1), object(2)
memory usage: 42.6+ MB


In [17]:
DAU_pretest

Unnamed: 0,userid,dt,activity_level
0,a5b70ae7-f07c-4773-9df4-ce112bc9dc48,2021-10-01,0
1,d2646662-269f-49de-aab1-8776afced9a3,2021-10-01,0
2,c4d1cfa8-283d-49ad-a894-90aedc39c798,2021-10-01,0
3,6889f87f-5356-4904-a35a-6ea5020011db,2021-10-01,0
4,dbee604c-474a-4c9d-b013-508e5a0e3059,2021-10-01,0
...,...,...,...
1859995,200d65e6-b1ce-4a47-8c2b-946db5c5a3a0,2021-10-31,20
1859996,535dafe4-de7c-4b56-acf6-aa94f21653bc,2021-10-31,20
1859997,0428ca3c-e666-4ef4-8588-3a2af904a123,2021-10-31,20
1859998,a8cd1579-44d4-48b3-b3d6-47ae5197dbc6,2021-10-31,20


In [18]:
DAU_pretest_0 = DAU_pretest.drop(DAU_pretest[DAU_pretest['activity_level']== 0].index)
DAU_pretest_0

Unnamed: 0,userid,dt,activity_level
909125,428070b0-083e-4c0e-8444-47bf91e99fff,2021-10-01,1
909126,93370f9c-56ef-437f-99ff-cb7c092d08a7,2021-10-01,1
909127,0fb7120a-53cf-4a51-8b52-bf07b8659bd6,2021-10-01,1
909128,ce64a9d8-07d9-4dca-908d-5e1e4568003d,2021-10-01,1
909129,e08332f0-3a5c-4ed2-b957-87e464e89b97,2021-10-01,1
...,...,...,...
1859995,200d65e6-b1ce-4a47-8c2b-946db5c5a3a0,2021-10-31,20
1859996,535dafe4-de7c-4b56-acf6-aa94f21653bc,2021-10-31,20
1859997,0428ca3c-e666-4ef4-8588-3a2af904a123,2021-10-31,20
1859998,a8cd1579-44d4-48b3-b3d6-47ae5197dbc6,2021-10-31,20


In [19]:
DAU_pretest_0['activity_level'].mean()

10.256361772052058

In [20]:
DAU_pretest_0.groupby(['dt']).agg({'userid': 'count', 'activity_level': 'count'}).reset_index()

Unnamed: 0,dt,userid,activity_level
0,2021-10-01,30634,30634
1,2021-10-02,30775,30775
2,2021-10-03,30785,30785
3,2021-10-04,30599,30599
4,2021-10-05,30588,30588
5,2021-10-06,30639,30639
6,2021-10-07,30637,30637
7,2021-10-08,30600,30600
8,2021-10-09,30902,30902
9,2021-10-10,30581,30581


In [21]:
DAU_pretest_0.describe()

Unnamed: 0,activity_level
count,950875.0
mean,10.256362
std,5.635938
min,1.0
25%,5.0
50%,10.0
75%,15.0
max,20.0


Over the date range, all activity levels have roughly the same number of records, except level 20. 
Day to day, the number of users per activity level varies, with a daily average of about 30,673 active users (30673.387096774193).

### Click-through rate (CTR)

![ab_test](./img/ad_click_through_rate_ab_testing.JPG)

Click-through rate (CTR) is the percentage of the ads shown to a user on a given day that the user clicks. You must analyse this metric (e.g. average CTR per day) and provide your insights about it.

__Dataset:__ `ctr_pretest.csv`
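The dataset ships a precomputed `ctr` column, so the raw counts are not available here. Purely as an illustration of the definition, the sketch below assumes hypothetical `clicks` and `impressions` columns, derives the per-user daily CTR as a percentage, and averages it per day:

```python
import pandas as pd

# Toy data: `clicks` and `impressions` are hypothetical columns,
# not part of ctr_pretest.csv (which only contains the final `ctr`)
toy = pd.DataFrame({
    "userid": ["u1", "u2", "u1", "u2"],
    "dt": ["2021-10-01", "2021-10-01", "2021-10-02", "2021-10-02"],
    "clicks": [3, 1, 2, 4],
    "impressions": [10, 4, 8, 10],
})

# CTR as a percentage of the ads shown to that user on that day
toy["ctr"] = 100 * toy["clicks"] / toy["impressions"]

# Average CTR per day (unweighted mean over users)
avg_ctr_per_day = toy.groupby("dt")["ctr"].mean()
print(avg_ctr_per_day)
```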

In [22]:
ctr_pretest = pd.read_csv('/Users/anadeondarza/Desktop/ironhack_data/challenges/ab_test_challenge/data/ctr_pretest.csv')
ctr_pretest

Unnamed: 0,userid,dt,ctr
0,4b328144-df4b-47b1-a804-09834942dce0,2021-10-01,34.28
1,34ace777-5e9d-40b3-a859-4145d0c35c8d,2021-10-01,34.67
2,8028cccf-19c3-4c0e-b5b2-e707e15d2d83,2021-10-01,34.77
3,652b3c9c-5e29-4bf0-9373-924687b1567e,2021-10-01,35.42
4,45b57434-4666-4b57-9798-35489dc1092a,2021-10-01,35.04
...,...,...,...
950870,a09a3687-b71a-4a67-b1ef-9b05c9770c4c,2021-10-31,32.33
950871,c843a595-b94c-42e1-b2fe-ec096070681e,2021-10-31,30.09
950872,edcdf0c1-3d8f-47e8-b7dd-05505749eb69,2021-10-31,35.71
950873,76b7a9ae-98fa-4c77-869d-594a4ef7282d,2021-10-31,34.76


In [23]:
ctr_pretest.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 950875 entries, 0 to 950874
Data columns (total 3 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   userid  950875 non-null  object 
 1   dt      950875 non-null  object 
 2   ctr     950875 non-null  float64
dtypes: float64(1), object(2)
memory usage: 21.8+ MB


In [24]:
ctr = ctr_pretest.groupby(['dt']).mean(numeric_only=True)  # average CTR per day; skips the non-numeric userid column
ctr_0 = ctr.reset_index()
ctr_0

Unnamed: 0,dt,ctr
0,2021-10-01,32.993446
1,2021-10-02,32.991664
2,2021-10-03,32.995086
3,2021-10-04,32.992995
4,2021-10-05,33.004375
5,2021-10-06,33.018564
6,2021-10-07,32.9885
7,2021-10-08,32.998654
8,2021-10-09,33.005082
9,2021-10-10,33.007134


In [25]:
user = ctr_pretest.groupby(['userid']).mean(numeric_only=True)  # average CTR per user; skips the non-numeric dt column
ctr_1 = user.reset_index()
ctr_1

Unnamed: 0,userid,ctr
0,0002a1ca-0b76-41cd-91e6-9aa51947b7fc,32.826429
1,00037d4d-ebfa-4a99-9d3e-adbefd6dae3a,32.511500
2,0004c8bb-df77-43b2-a93c-7398e9bc5175,33.472500
3,00051943-ca03-49d2-aafc-138439e5459c,32.270000
4,0007262b-b62e-447a-9021-232ad25df9ed,33.185714
...,...,...
59995,fff8c764-169f-4ee0-92f3-4d858a485d5c,33.118750
59996,fffb68bd-be7f-48e4-80bb-41f7354983ca,32.355333
59997,fffd73e1-a42e-49cb-9b17-1c5aae62ab46,32.810000
59998,fffdf2f8-7f61-4fb3-b5fc-6323a72290a7,33.468824


In [26]:
ctr_pretest.describe()

Unnamed: 0,ctr
count,950875.0
mean,33.000242
std,1.731677
min,30.0
25%,31.5
50%,33.0
75%,34.5
max,36.0


---

## Pretest metrics 

In this section you will analyse the metrics using the dataset that includes the results for the test and control groups, but only for the pretest data (i.e. prior to November 1st, 2021). You must provide insights about the metrics (__Activity level__, __DAU__ and __CTR__) and also perform a hypothesis test to determine whether there is any statistically significant difference between the groups prior to the start of the experiment. You must try different approaches (i.e. __z-test__ and __t-test__) and compare the results.


__Datasets:__ `activity_all.csv`, `ctr_all.csv`
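A minimal sketch of the pretest split, assuming the `dt` and `groupid` columns of `activity_all.csv` (the `toy` frame and the `experiment_start` name are hypothetical): parse the dates once, keep the rows before the experiment start, and summarise each group.

```python
import pandas as pd

# Toy data (hypothetical values); column names follow activity_all.csv
toy = pd.DataFrame({
    "userid": ["u1", "u2", "u1", "u2"],
    "dt": ["2021-10-30", "2021-10-31", "2021-11-01", "2021-11-02"],
    "groupid": [0, 1, 0, 1],
    "activity_level": [3, 5, 4, 9],
})
toy["dt"] = pd.to_datetime(toy["dt"])

experiment_start = pd.Timestamp("2021-11-01")

# .copy() makes the subset independent of `toy`, so later column
# assignments on `pre` do not trigger SettingWithCopyWarning
pre = toy[toy["dt"] < experiment_start].copy()

# Per-group record count and mean activity level before the experiment
summary = pre.groupby("groupid")["activity_level"].agg(["count", "mean"])
print(summary)
```

Converting `dt` before filtering keeps the comparison a true date comparison rather than a string one, and doing it on the full frame avoids repeating the conversion for each group.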

# Activity

In [27]:
DAU_pretest = pd.read_csv('/Users/anadeondarza/Desktop/ironhack_data/challenges/ab_test_challenge/data/activity_all.csv')
DAU_pretest.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3660000 entries, 0 to 3659999
Data columns (total 4 columns):
 #   Column          Dtype 
---  ------          ----- 
 0   userid          object
 1   dt              object
 2   groupid         int64 
 3   activity_level  int64 
dtypes: int64(2), object(2)
memory usage: 111.7+ MB


In [28]:
DAU_pretestt = DAU_pretest.drop(DAU_pretest[DAU_pretest['activity_level']== 0].index)
DAU_pretestt

Unnamed: 0,userid,dt,groupid,activity_level
1356592,428070b0-083e-4c0e-8444-47bf91e99fff,2021-10-01,1,1
1356593,93370f9c-56ef-437f-99ff-cb7c092d08a7,2021-10-01,1,1
1356594,0fb7120a-53cf-4a51-8b52-bf07b8659bd6,2021-10-01,1,1
1356595,ce64a9d8-07d9-4dca-908d-5e1e4568003d,2021-10-01,0,1
1356596,e08332f0-3a5c-4ed2-b957-87e464e89b97,2021-10-01,1,1
...,...,...,...,...
3659995,f0126b50-ad74-4480-9250-41b50a408932,2021-11-30,0,20
3659996,6ffe1efe-2e5d-427f-95ff-cc862c46c798,2021-11-30,1,20
3659997,f2073207-25dd-4127-a893-b70106d5ead7,2021-11-30,0,20
3659998,0416f2be-3ab8-481b-873c-3678b4705ecf,2021-11-30,1,20


In [29]:
grupo_0_act = DAU_pretestt[DAU_pretestt['groupid'] == 0]
grupo_0_act

Unnamed: 0,userid,dt,groupid,activity_level
1356595,ce64a9d8-07d9-4dca-908d-5e1e4568003d,2021-10-01,0,1
1356597,34ace777-5e9d-40b3-a859-4145d0c35c8d,2021-10-01,0,1
1356598,420a60e9-6394-4324-b02c-ab372609968e,2021-10-01,0,1
1356599,6f6b36ef-bd93-4399-a2f4-996c96d3e0a7,2021-10-01,0,1
1356600,7dfbbc2e-6e71-4128-848d-be83df79b921,2021-10-01,0,1
...,...,...,...,...
3659988,b2a18b8c-00c7-4023-aa7e-e2b12d5bb5d3,2021-11-30,0,20
3659989,c9737b7f-eb1b-4733-9eaa-7d538d86fb3d,2021-11-30,0,20
3659995,f0126b50-ad74-4480-9250-41b50a408932,2021-11-30,0,20
3659997,f2073207-25dd-4127-a893-b70106d5ead7,2021-11-30,0,20


In [30]:
grupo_1_act = DAU_pretestt[DAU_pretestt['groupid'] ==1]
grupo_1_act

Unnamed: 0,userid,dt,groupid,activity_level
1356592,428070b0-083e-4c0e-8444-47bf91e99fff,2021-10-01,1,1
1356593,93370f9c-56ef-437f-99ff-cb7c092d08a7,2021-10-01,1,1
1356594,0fb7120a-53cf-4a51-8b52-bf07b8659bd6,2021-10-01,1,1
1356596,e08332f0-3a5c-4ed2-b957-87e464e89b97,2021-10-01,1,1
1356602,75d936e5-257e-4b78-a7b3-96acb30ce6c1,2021-10-01,1,1
...,...,...,...,...
3659992,05f00021-052d-493c-94a7-554702d7f3a1,2021-11-30,1,20
3659993,219e12b3-49dc-4fc1-b947-c0683a8a400f,2021-11-30,1,20
3659994,cbc2d82c-7940-42fa-9dc5-7790d11b06b5,2021-11-30,1,20
3659996,6ffe1efe-2e5d-427f-95ff-cc862c46c798,2021-11-30,1,20


# Activity analysis, group 0: grupo_0_act

In [31]:
grupo_0_act = grupo_0_act.copy()  # work on an explicit copy to avoid SettingWithCopyWarning
grupo_0_act['dt'] = pd.to_datetime(grupo_0_act['dt'])
grupo_0_act.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 948407 entries, 1356595 to 3659999
Data columns (total 4 columns):
 #   Column          Non-Null Count   Dtype         
---  ------          --------------   -----         
 0   userid          948407 non-null  object        
 1   dt              948407 non-null  datetime64[ns]
 2   groupid         948407 non-null  int64         
 3   activity_level  948407 non-null  int64         
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 36.2+ MB


In [32]:
g0_act_pre = grupo_0_act[grupo_0_act['dt'] < '2021-11-01']
g0_act_pre

Unnamed: 0,userid,dt,groupid,activity_level
1356595,ce64a9d8-07d9-4dca-908d-5e1e4568003d,2021-10-01,0,1
1356597,34ace777-5e9d-40b3-a859-4145d0c35c8d,2021-10-01,0,1
1356598,420a60e9-6394-4324-b02c-ab372609968e,2021-10-01,0,1
1356599,6f6b36ef-bd93-4399-a2f4-996c96d3e0a7,2021-10-01,0,1
1356600,7dfbbc2e-6e71-4128-848d-be83df79b921,2021-10-01,0,1
...,...,...,...,...
3625427,2ffce3bd-f7c6-4752-9141-ad887eea6938,2021-10-31,0,20
3625429,1a0dc2cf-c05a-40ad-86b8-d24809295ee2,2021-10-31,0,20
3625430,59f581ac-ff18-40f7-8253-cd8e7612bded,2021-10-31,0,20
3625439,200d65e6-b1ce-4a47-8c2b-946db5c5a3a0,2021-10-31,0,20


In [33]:
g0_act_pre_userid = g0_act_pre.groupby(['activity_level']).count() 
g0_act_pre_userid_user=g0_act_pre_userid.reset_index()
g0_act_pre_userid_user

Unnamed: 0,activity_level,userid,dt,groupid
0,1,24390,24390,24390
1,2,24376,24376,24376
2,3,24241,24241,24241
3,4,24307,24307,24307
4,5,24552,24552,24552
5,6,24552,24552,24552
6,7,24296,24296,24296
7,8,24322,24322,24322
8,9,24428,24428,24428
9,10,24317,24317,24317


In [34]:
#g0_act_pre_day = g0_act_pre.drop(['userid'], axis=1)
g0_act_pre_day = g0_act_pre.groupby(['dt']).count() 
g0_act_pre_day_reset=g0_act_pre_day.reset_index()
g0_act_pre_day_reset

Unnamed: 0,dt,userid,groupid,activity_level
0,2021-10-01,15337,15337,15337
1,2021-10-02,15354,15354,15354
2,2021-10-03,15423,15423,15423
3,2021-10-04,15211,15211,15211
4,2021-10-05,15126,15126,15126
5,2021-10-06,15335,15335,15335
6,2021-10-07,15346,15346,15346
7,2021-10-08,15357,15357,15357
8,2021-10-09,15371,15371,15371
9,2021-10-10,15277,15277,15277


In [35]:
g0_act_pre_2 = g0_act_pre.groupby(['dt','activity_level']).count()
g0_act_pre_2_reset_0 = g0_act_pre_2.reset_index()
g0_act_pre_2_reset_0

Unnamed: 0,dt,activity_level,userid,groupid
0,2021-10-01,1,792,792
1,2021-10-01,2,755,755
2,2021-10-01,3,799,799
3,2021-10-01,4,800,800
4,2021-10-01,5,798,798
...,...,...,...,...
615,2021-10-31,16,739,739
616,2021-10-31,17,724,724
617,2021-10-31,18,756,756
618,2021-10-31,19,803,803


In [36]:
g0_act_pre.describe()

Unnamed: 0,groupid,activity_level
count,474947.0,474947.0
mean,0.0,10.254769
std,0.0,5.636209
min,0.0,1.0
25%,0.0,5.0
50%,0.0,10.0
75%,0.0,15.0
max,0.0,20.0


In [37]:
g0_act_post = grupo_0_act[grupo_0_act['dt'] >= '2021-11-01']
g0_act_post

Unnamed: 0,userid,dt,groupid,activity_level
1405325,27f9ec3c-37bf-459a-b94b-f2aff84cd96f,2021-11-01,0,1
1405327,c34e51cf-4b66-420f-94d0-2a0397b29d83,2021-11-01,0,1
1405331,479f1a5e-be3a-4a55-85d1-7fe97a6cc2f7,2021-11-01,0,1
1405335,77e076ee-0b7c-464a-b078-3a25c8f089e7,2021-11-01,0,1
1405339,1299e929-1dd7-4a76-8f97-b485641ee1e1,2021-11-01,0,1
...,...,...,...,...
3659988,b2a18b8c-00c7-4023-aa7e-e2b12d5bb5d3,2021-11-30,0,20
3659989,c9737b7f-eb1b-4733-9eaa-7d538d86fb3d,2021-11-30,0,20
3659995,f0126b50-ad74-4480-9250-41b50a408932,2021-11-30,0,20
3659997,f2073207-25dd-4127-a893-b70106d5ead7,2021-11-30,0,20


# Activity analysis, group 1: grupo_1_act

In [38]:
grupo_1_act

Unnamed: 0,userid,dt,groupid,activity_level
1356592,428070b0-083e-4c0e-8444-47bf91e99fff,2021-10-01,1,1
1356593,93370f9c-56ef-437f-99ff-cb7c092d08a7,2021-10-01,1,1
1356594,0fb7120a-53cf-4a51-8b52-bf07b8659bd6,2021-10-01,1,1
1356596,e08332f0-3a5c-4ed2-b957-87e464e89b97,2021-10-01,1,1
1356602,75d936e5-257e-4b78-a7b3-96acb30ce6c1,2021-10-01,1,1
...,...,...,...,...
3659992,05f00021-052d-493c-94a7-554702d7f3a1,2021-11-30,1,20
3659993,219e12b3-49dc-4fc1-b947-c0683a8a400f,2021-11-30,1,20
3659994,cbc2d82c-7940-42fa-9dc5-7790d11b06b5,2021-11-30,1,20
3659996,6ffe1efe-2e5d-427f-95ff-cc862c46c798,2021-11-30,1,20


In [39]:
grupo_1_act = grupo_1_act.copy()  # work on an explicit copy to avoid SettingWithCopyWarning
grupo_1_act['dt'] = pd.to_datetime(grupo_1_act['dt'])
grupo_1_act.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1355001 entries, 1356592 to 3659998
Data columns (total 4 columns):
 #   Column          Non-Null Count    Dtype         
---  ------          --------------    -----         
 0   userid          1355001 non-null  object        
 1   dt              1355001 non-null  datetime64[ns]
 2   groupid         1355001 non-null  int64         
 3   activity_level  1355001 non-null  int64         
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 51.7+ MB


In [40]:
grupo_1_act_pre = grupo_1_act[grupo_1_act['dt'] < '2021-11-01']
grupo_1_act_pre

Unnamed: 0,userid,dt,groupid,activity_level
1356592,428070b0-083e-4c0e-8444-47bf91e99fff,2021-10-01,1,1
1356593,93370f9c-56ef-437f-99ff-cb7c092d08a7,2021-10-01,1,1
1356594,0fb7120a-53cf-4a51-8b52-bf07b8659bd6,2021-10-01,1,1
1356596,e08332f0-3a5c-4ed2-b957-87e464e89b97,2021-10-01,1,1
1356602,75d936e5-257e-4b78-a7b3-96acb30ce6c1,2021-10-01,1,1
...,...,...,...,...
3625437,93179304-6690-4932-bb68-6db1a18c747a,2021-10-31,1,20
3625438,a2551ab2-abd6-46a1-9f05-e9d2318ddf35,2021-10-31,1,20
3625440,535dafe4-de7c-4b56-acf6-aa94f21653bc,2021-10-31,1,20
3625441,0428ca3c-e666-4ef4-8588-3a2af904a123,2021-10-31,1,20


In [41]:
grupo_1_act_post = grupo_1_act[grupo_1_act['dt'] >= '2021-11-01']
grupo_1_act_post 

Unnamed: 0,userid,dt,groupid,activity_level
1405324,37e721ba-4b26-4196-abd1-2435da67d619,2021-11-01,1,1
1405326,26162641-e802-4f79-b2ec-6b79845aad89,2021-11-01,1,1
1405328,90c3c10b-5767-41d2-b142-f8a859782cbd,2021-11-01,1,1
1405329,4509302c-c10d-4a56-8730-8dab6523e26d,2021-11-01,1,1
1405330,dac72108-96e4-4c30-a129-6b61910c7c44,2021-11-01,1,1
...,...,...,...,...
3659992,05f00021-052d-493c-94a7-554702d7f3a1,2021-11-30,1,20
3659993,219e12b3-49dc-4fc1-b947-c0683a8a400f,2021-11-30,1,20
3659994,cbc2d82c-7940-42fa-9dc5-7790d11b06b5,2021-11-30,1,20
3659996,6ffe1efe-2e5d-427f-95ff-cc862c46c798,2021-11-30,1,20


In [42]:
grupo_1_act_pre

Unnamed: 0,userid,dt,groupid,activity_level
1356592,428070b0-083e-4c0e-8444-47bf91e99fff,2021-10-01,1,1
1356593,93370f9c-56ef-437f-99ff-cb7c092d08a7,2021-10-01,1,1
1356594,0fb7120a-53cf-4a51-8b52-bf07b8659bd6,2021-10-01,1,1
1356596,e08332f0-3a5c-4ed2-b957-87e464e89b97,2021-10-01,1,1
1356602,75d936e5-257e-4b78-a7b3-96acb30ce6c1,2021-10-01,1,1
...,...,...,...,...
3625437,93179304-6690-4932-bb68-6db1a18c747a,2021-10-31,1,20
3625438,a2551ab2-abd6-46a1-9f05-e9d2318ddf35,2021-10-31,1,20
3625440,535dafe4-de7c-4b56-acf6-aa94f21653bc,2021-10-31,1,20
3625441,0428ca3c-e666-4ef4-8588-3a2af904a123,2021-10-31,1,20


In [43]:
g1_act_pre_userid = grupo_1_act_pre.groupby(['activity_level']).count() 
g1_act_pre_userid_user=g1_act_pre_userid.reset_index()
g1_act_pre_userid_user

Unnamed: 0,activity_level,userid,dt,groupid
0,1,24342,24342,24342
1,2,24698,24698,24698
2,3,24418,24418,24418
3,4,24249,24249,24249
4,5,24675,24675,24675
5,6,24349,24349,24349
6,7,24043,24043,24043
7,8,24074,24074,24074
8,9,24392,24392,24392
9,10,24626,24626,24626


In [44]:
g1_act_pre_day = grupo_1_act_pre.groupby(['dt']).count() 
g1_act_pre_day_reset=g1_act_pre_day.reset_index()
g1_act_pre_day_reset

Unnamed: 0,dt,userid,groupid,activity_level
0,2021-10-01,15297,15297,15297
1,2021-10-02,15421,15421,15421
2,2021-10-03,15362,15362,15362
3,2021-10-04,15388,15388,15388
4,2021-10-05,15462,15462,15462
5,2021-10-06,15304,15304,15304
6,2021-10-07,15291,15291,15291
7,2021-10-08,15243,15243,15243
8,2021-10-09,15531,15531,15531
9,2021-10-10,15304,15304,15304


In [45]:
g1_act_pre_2 = grupo_1_act_pre.groupby(['dt','activity_level']).count()
g1_act_pre_2_reset_0 = g1_act_pre_2.reset_index()
g1_act_pre_2_reset_0

Unnamed: 0,dt,activity_level,userid,groupid
0,2021-10-01,1,810,810
1,2021-10-01,2,752,752
2,2021-10-01,3,788,788
3,2021-10-01,4,751,751
4,2021-10-01,5,788,788
...,...,...,...,...
615,2021-10-31,16,760,760
616,2021-10-31,17,810,810
617,2021-10-31,18,775,775
618,2021-10-31,19,813,813


In [46]:
grupo_1_act_pre.describe()

Unnamed: 0,groupid,activity_level
count,475928.0,475928.0
mean,1.0,10.257951
std,0.0,5.635674
min,1.0,1.0
25%,1.0,5.0
50%,1.0,10.0
75%,1.0,15.0
max,1.0,20.0


# CTR

In [47]:
CTR_pretest = pd.read_csv('/Users/anadeondarza/Desktop/ironhack_data/challenges/ab_test_challenge/data/ctr_all.csv')
CTR_pretest.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2303408 entries, 0 to 2303407
Data columns (total 4 columns):
 #   Column   Dtype  
---  ------   -----  
 0   userid   object 
 1   dt       object 
 2   groupid  int64  
 3   ctr      float64
dtypes: float64(1), int64(1), object(2)
memory usage: 70.3+ MB


In [48]:
CTR_pretest['dt'] = pd.to_datetime(CTR_pretest['dt'])
CTR_pretest.info()
CTR_pretest

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2303408 entries, 0 to 2303407
Data columns (total 4 columns):
 #   Column   Dtype         
---  ------   -----         
 0   userid   object        
 1   dt       datetime64[ns]
 2   groupid  int64         
 3   ctr      float64       
dtypes: datetime64[ns](1), float64(1), int64(1), object(1)
memory usage: 70.3+ MB


Unnamed: 0,userid,dt,groupid,ctr
0,60389fa7-2d71-4cdf-831c-c2bb277ffa1e,2021-11-13,0,31.81
1,b59cb225-d160-4851-92d2-7cc8120a2f63,2021-11-13,0,30.46
2,aa336050-934e-453f-a5b0-dd881fcd114e,2021-11-13,0,34.25
3,8df767f4-a10f-4322-a722-676b7e02b372,2021-11-13,0,34.92
4,a74762ed-4da0-42ab-91d2-40d7e808dfe9,2021-11-13,0,34.95
...,...,...,...,...
2303403,932e0348-ea2d-4b98-8782-aa84420f0796,2021-11-12,1,37.27
2303404,6775a825-6d3d-4dc3-9335-cad061736752,2021-11-12,1,39.14
2303405,a7b55365-21f1-4123-b2b5-485a8c7b98da,2021-11-12,1,40.05
2303406,a6fa937c-6f40-4f04-b15b-f1de09e179db,2021-11-12,1,38.14


In [49]:
CTR_pretest_0 = CTR_pretest[CTR_pretest['groupid'] == 0]
CTR_pretest_0

Unnamed: 0,userid,dt,groupid,ctr
0,60389fa7-2d71-4cdf-831c-c2bb277ffa1e,2021-11-13,0,31.81
1,b59cb225-d160-4851-92d2-7cc8120a2f63,2021-11-13,0,30.46
2,aa336050-934e-453f-a5b0-dd881fcd114e,2021-11-13,0,34.25
3,8df767f4-a10f-4322-a722-676b7e02b372,2021-11-13,0,34.92
4,a74762ed-4da0-42ab-91d2-40d7e808dfe9,2021-11-13,0,34.95
...,...,...,...,...
2274125,26c10c02-8ede-4beb-be14-32f8cca044ff,2021-11-12,0,33.28
2274126,ae235c4b-96a7-4f34-923e-08531a5f340a,2021-11-12,0,34.15
2274127,81daf7da-ba09-451f-b100-f15ed284977e,2021-11-12,0,35.79
2274128,38338581-e093-4202-8c9f-975004e221e3,2021-11-12,0,31.82


In [50]:
CTR_pretest_0_pre = CTR_pretest_0[CTR_pretest_0['dt'] < '2021-11-01']
CTR_pretest_0_pre

Unnamed: 0,userid,dt,groupid,ctr
808703,4b328144-df4b-47b1-a804-09834942dce0,2021-10-01,0,34.28
808704,34ace777-5e9d-40b3-a859-4145d0c35c8d,2021-10-01,0,34.67
808705,8028cccf-19c3-4c0e-b5b2-e707e15d2d83,2021-10-01,0,34.77
808706,652b3c9c-5e29-4bf0-9373-924687b1567e,2021-10-01,0,35.42
808707,45b57434-4666-4b57-9798-35489dc1092a,2021-10-01,0,35.04
...,...,...,...,...
1744262,b2f30e48-d012-4687-a93f-8000ab04e565,2021-10-31,0,31.54
1744263,1184bf5f-7036-4a6e-afc1-b0c8f1dba2de,2021-10-31,0,31.92
1744264,8b20e638-d933-489f-8139-7d7ca93aa8e0,2021-10-31,0,32.05
1744265,2071887d-c673-4d39-81dc-5c384bcb8458,2021-10-31,0,30.25


In [51]:
CTR_posttest_0_post = CTR_pretest_0[CTR_pretest_0['dt'] >= '2021-11-01']
#CTR_posttest_0_post

In [52]:
CTR_pretest_1 = CTR_pretest[CTR_pretest['groupid'] == 1]
CTR_pretest_1

Unnamed: 0,userid,dt,groupid,ctr
15973,cd5df711-42f7-4684-9ae8-f6a72383bb28,2021-11-13,1,40.39
15974,fe630199-265b-4542-a103-a74d66abeb22,2021-11-13,1,37.70
15975,4b519a79-b1a4-40b0-9369-be9e2a2699af,2021-11-13,1,35.47
15976,30a8c7b1-ed8a-4cf2-888e-b8e110ba88d9,2021-11-13,1,40.07
15977,88ab26e4-2e67-4397-a5ec-8c2a384372f5,2021-11-13,1,40.76
...,...,...,...,...
2303403,932e0348-ea2d-4b98-8782-aa84420f0796,2021-11-12,1,37.27
2303404,6775a825-6d3d-4dc3-9335-cad061736752,2021-11-12,1,39.14
2303405,a7b55365-21f1-4123-b2b5-485a8c7b98da,2021-11-12,1,40.05
2303406,a6fa937c-6f40-4f04-b15b-f1de09e179db,2021-11-12,1,38.14


In [53]:
CTR_pretest_1_pre = CTR_pretest_1[CTR_pretest_1['dt'] < '2021-11-01']
CTR_pretest_1_pre

Unnamed: 0,userid,dt,groupid,ctr
824040,381e40b0-5529-4bc6-a3f6-6a687c7cde66,2021-10-01,1,31.27
824041,1797453f-f558-42f6-9a2f-55b95dd37e71,2021-10-01,1,32.18
824042,f8efefba-4782-4104-8fbf-7f4381dfb6d6,2021-10-01,1,31.20
824043,8a18c870-b2e2-4a47-9b30-0859f5854dcc,2021-10-01,1,31.19
824044,d472fbc3-d580-49f7-9ba4-ef002cc80606,2021-10-01,1,35.62
...,...,...,...,...
1759573,a09a3687-b71a-4a67-b1ef-9b05c9770c4c,2021-10-31,1,32.33
1759574,c843a595-b94c-42e1-b2fe-ec096070681e,2021-10-31,1,30.09
1759575,edcdf0c1-3d8f-47e8-b7dd-05505749eb69,2021-10-31,1,35.71
1759576,76b7a9ae-98fa-4c77-869d-594a4ef7282d,2021-10-31,1,34.76


In [54]:
CTR_posttest_1_post = CTR_pretest_1[CTR_pretest_1['dt'] >= '2021-11-01']
#CTR_posttest_1_post

In [55]:
CTR_pretest_1_pre

Unnamed: 0,userid,dt,groupid,ctr
824040,381e40b0-5529-4bc6-a3f6-6a687c7cde66,2021-10-01,1,31.27
824041,1797453f-f558-42f6-9a2f-55b95dd37e71,2021-10-01,1,32.18
824042,f8efefba-4782-4104-8fbf-7f4381dfb6d6,2021-10-01,1,31.20
824043,8a18c870-b2e2-4a47-9b30-0859f5854dcc,2021-10-01,1,31.19
824044,d472fbc3-d580-49f7-9ba4-ef002cc80606,2021-10-01,1,35.62
...,...,...,...,...
1759573,a09a3687-b71a-4a67-b1ef-9b05c9770c4c,2021-10-31,1,32.33
1759574,c843a595-b94c-42e1-b2fe-ec096070681e,2021-10-31,1,30.09
1759575,edcdf0c1-3d8f-47e8-b7dd-05505749eb69,2021-10-31,1,35.71
1759576,76b7a9ae-98fa-4c77-869d-594a4ef7282d,2021-10-31,1,34.76


# CTR analysis, group 0 (pretest)

In [56]:
CTR_pretest_0_pre

Unnamed: 0,userid,dt,groupid,ctr
808703,4b328144-df4b-47b1-a804-09834942dce0,2021-10-01,0,34.28
808704,34ace777-5e9d-40b3-a859-4145d0c35c8d,2021-10-01,0,34.67
808705,8028cccf-19c3-4c0e-b5b2-e707e15d2d83,2021-10-01,0,34.77
808706,652b3c9c-5e29-4bf0-9373-924687b1567e,2021-10-01,0,35.42
808707,45b57434-4666-4b57-9798-35489dc1092a,2021-10-01,0,35.04
...,...,...,...,...
1744262,b2f30e48-d012-4687-a93f-8000ab04e565,2021-10-31,0,31.54
1744263,1184bf5f-7036-4a6e-afc1-b0c8f1dba2de,2021-10-31,0,31.92
1744264,8b20e638-d933-489f-8139-7d7ca93aa8e0,2021-10-31,0,32.05
1744265,2071887d-c673-4d39-81dc-5c384bcb8458,2021-10-31,0,30.25


In [57]:
ctr_pre_0_dt = CTR_pretest_0_pre.groupby(['dt']).mean(numeric_only=True)  # daily means; skips the non-numeric userid column
ctr_pre_0_dt_in = ctr_pre_0_dt.reset_index()
#ctr_pre_0_dt_in 

In [58]:
CTR_pretest_0_pre_user = CTR_pretest_0_pre.groupby(['userid']).mean(numeric_only=True)  # per-user means; skips the dt column
CTR_pretest_0_pre_user_in = CTR_pretest_0_pre_user.reset_index()
#CTR_pretest_0_pre_user_in

In [59]:
CTR_pretest_0_pre.describe()

Unnamed: 0,groupid,ctr
count,474947.0,474947.0
mean,0.0,33.000913
std,0.0,1.7337
min,0.0,30.0
25%,0.0,31.5
50%,0.0,33.0
75%,0.0,34.5
max,0.0,36.0


# CTR analysis, group 1 (pretest)

In [60]:
CTR_pretest_1_pre

Unnamed: 0,userid,dt,groupid,ctr
824040,381e40b0-5529-4bc6-a3f6-6a687c7cde66,2021-10-01,1,31.27
824041,1797453f-f558-42f6-9a2f-55b95dd37e71,2021-10-01,1,32.18
824042,f8efefba-4782-4104-8fbf-7f4381dfb6d6,2021-10-01,1,31.20
824043,8a18c870-b2e2-4a47-9b30-0859f5854dcc,2021-10-01,1,31.19
824044,d472fbc3-d580-49f7-9ba4-ef002cc80606,2021-10-01,1,35.62
...,...,...,...,...
1759573,a09a3687-b71a-4a67-b1ef-9b05c9770c4c,2021-10-31,1,32.33
1759574,c843a595-b94c-42e1-b2fe-ec096070681e,2021-10-31,1,30.09
1759575,edcdf0c1-3d8f-47e8-b7dd-05505749eb69,2021-10-31,1,35.71
1759576,76b7a9ae-98fa-4c77-869d-594a4ef7282d,2021-10-31,1,34.76


In [61]:
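ctr_pre_1_dt = CTR_pretest_1_pre.groupby(['dt']).mean(numeric_only=True)  # daily means; skips the non-numeric userid column
ctr_pre_1_dt_in = ctr_pre_1_dt.reset_index()
ctr_pre_0_dt_in  # note: this displays the group 0 table; use ctr_pre_1_dt_in to view group 1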

Unnamed: 0,dt,groupid,ctr
0,2021-10-01,0.0,32.980627
1,2021-10-02,0.0,33.004056
2,2021-10-03,0.0,33.002006
3,2021-10-04,0.0,32.990363
4,2021-10-05,0.0,33.014167
5,2021-10-06,0.0,33.02166
6,2021-10-07,0.0,32.976366
7,2021-10-08,0.0,33.003955
8,2021-10-09,0.0,33.024208
9,2021-10-10,0.0,33.002929


In [62]:
CTR_pretest_1_pre_user = CTR_pretest_1_pre.groupby(['userid']).mean(numeric_only=True)  # per-user means; skips the dt column
CTR_pretest_1_pre_user_in = CTR_pretest_1_pre_user.reset_index()
CTR_pretest_1_pre_user_in

Unnamed: 0,userid,groupid,ctr
0,0004c8bb-df77-43b2-a93c-7398e9bc5175,1.0,33.472500
1,00051943-ca03-49d2-aafc-138439e5459c,1.0,32.270000
2,000bd06a-23a3-4773-b9fb-cdceb64899b9,1.0,33.270714
3,00175478-40a3-4806-830d-dcf0cc593f8b,1.0,33.105882
4,001a9d23-7549-44dc-8add-819cdf3d564f,1.0,33.341765
...,...,...,...
30044,ffe7931f-491d-4ab1-a541-e42c8bd2737c,1.0,33.072143
30045,ffede027-c669-44bc-845b-360e35e802c5,1.0,33.107000
30046,fff33f17-41ce-4c40-b147-35ddca524426,1.0,33.202143
30047,fffd73e1-a42e-49cb-9b17-1c5aae62ab46,1.0,32.810000


In [63]:
CTR_pretest_1_pre.describe()

Unnamed: 0,groupid,ctr
count,475928.0,475928.0
mean,1.0,32.999572
std,0.0,1.729657
min,1.0,30.0
25%,1.0,31.5
50%,1.0,33.0
75%,1.0,34.5
max,1.0,36.0


# Z-test: activity and CTR (pretest)

In [64]:
# Activity level (pretest)
# Null hypothesis: both groups behave the same before the experiment.

In [65]:
#z-test Activity Level
z_score, p_value = ztest(g0_act_pre['activity_level'],grupo_1_act_pre['activity_level'])

print(f'z_score: {z_score}', f'\np-value: {p_value}')

z_score: -0.27521370941856227 
p-value: 0.7831520549245693
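To make the test statistic less of a black box, the sketch below recomputes a two-sample z statistic by hand from the `describe()` summaries reported earlier for the two pretest groups. It uses the unpooled standard error, whereas statsmodels' `ztest` pools the two variances by default (`usevar='pooled'`); with sample sizes and variances this close, the two should agree to several decimals, so the result should land close to the values printed above.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics copied from the describe() outputs of the pretest groups
n0, mean0, std0 = 474947, 10.254769, 5.636209  # group 0 (g0_act_pre)
n1, mean1, std1 = 475928, 10.257951, 5.635674  # group 1 (grupo_1_act_pre)

# Standard error of the difference in means (unpooled variances)
standard_error = sqrt(std0**2 / n0 + std1**2 / n1)

# z statistic under the null hypothesis of equal means
z = (mean0 - mean1) / standard_error

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.4f}, p-value = {p_value:.4f}")
```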


In [66]:
g0_act_pre['activity_level'].mean()

10.254769479541928

In [67]:
grupo_1_act_pre['activity_level'].mean()

10.257950782471298

Both samples behave similarly: with a p-value of about 0.78 we fail to reject the null hypothesis.
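For reference, a sketch of the statistic behind the z-scores reported in this section, assuming `ztest` runs with its default pooled-variance setting (`usevar='pooled'`), as in the calls here. The symbols are notation introduced for this note only: $\bar{x}_g$, $s_g^2$ and $n_g$ are the sample mean, sample variance and size of group $g$, and $\Delta_0$ is the hypothesised difference (the `value` argument, 0 by default).

```latex
z = \frac{(\bar{x}_0 - \bar{x}_1) - \Delta_0}
         {\sqrt{s_p^2\left(\dfrac{1}{n_0} + \dfrac{1}{n_1}\right)}},
\qquad
s_p^2 = \frac{(n_0 - 1)\,s_0^2 + (n_1 - 1)\,s_1^2}{n_0 + n_1 - 2},
\qquad
p = 2\,\bigl(1 - \Phi(|z|)\bigr)
```

Here $\Phi$ is the standard normal CDF. Since the two group means above differ by only about 0.003, $z$ stays close to zero and the two-sided p-value is large.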

In [68]:
# z-test on the number of users per day
# Null hypothesis: both groups behave the same way before the experiment.
z_score, p_value = ztest(g0_act_pre_day_reset['userid'], g1_act_pre_day_reset['userid'], value=0)

print(f'z_score: {z_score}', f'\np-value: {p_value}')

z_score: -1.4121065242323187 
p-value: 0.15791859802311015


In [69]:
g0_act_pre_day_reset['userid'].mean()

15320.870967741936

In [70]:
g1_act_pre_day_reset['userid'].mean()

15352.516129032258

Both samples behave similarly: with a p-value of about 0.16 we fail to reject the null hypothesis.

In [71]:
# CTR pre-test
# Null hypothesis: both groups behave the same way before the experiment.

In [72]:
z_score, p_value = ztest(CTR_pretest_0_pre['ctr'], CTR_pretest_1_pre['ctr'], value=0)

print(f'z_score: {z_score}', f'\np-value: {p_value}')


z_score: 0.3775817380268587 
p-value: 0.7057413330705573


In [73]:
CTR_pretest_0_pre['ctr'].mean()

33.0009127755312

In [74]:
CTR_pretest_1_pre['ctr'].mean()

32.99957172093207

Both samples behave similarly: with a p-value of about 0.71 we fail to reject the null hypothesis.

In [None]:
"""
We can conclude that, before the experiment, the two samples behave similarly.
"""

---

## Experiment metrics 

In this section you must perform the same analysis as in the previous section, but using the data generated during the experiment (i.e.: after November 1st, 2021). You must provide insights about the metrics (__Activity level__, __DAU__ and __CTR__) and also perform a hypothesis test in order to determine whether there is any statistically significant difference between the groups during the experiment. You must try different approaches (i.e.: __z-test__ and __t-test__) and compare the results.


__Datasets:__ `activity_all.csv`, `ctr_all.csv`
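To illustrate the requested comparison between a z-test and a t-test, here is a minimal sketch on synthetic data (not the challenge datasets). The arrays `control` and `treatment` and their value ranges are assumptions made up for this example. The z-test p-value is derived from the same pooled statistic as Student's t-test, only with a standard normal reference distribution, which should match what `ztest` from statsmodels computes with its default `usevar='pooled'`.

```python
import numpy as np
from scipy.stats import norm, ttest_ind

rng = np.random.default_rng(42)

# Synthetic stand-ins for per-record CTR values (NOT the challenge data).
# Group sizes differ on purpose, as they do in the experiment data.
control = rng.uniform(30.0, 36.0, size=4000)
treatment = rng.uniform(30.05, 36.05, size=8000)

# Student's t-test: pooled variance, t reference distribution.
student = ttest_ind(control, treatment)
# Welch's t-test: does not assume equal variances in both groups.
welch = ttest_ind(control, treatment, equal_var=False)

# z-test p-value: same pooled statistic, standard normal reference distribution.
z_stat = student.statistic
p_z = 2 * norm.sf(abs(z_stat))

print(f"Student t: stat={student.statistic:.4f}, p={student.pvalue:.6f}")
print(f"Welch t:   stat={welch.statistic:.4f}, p={welch.pvalue:.6f}")
print(f"z-test:    stat={z_stat:.4f}, p={p_z:.6f}")
```

With thousands of observations per group the t distribution is practically indistinguishable from the normal, so both approaches should lead to the same decision. The choice matters more for small samples, such as a few dozen daily counts.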

# Activity Analysis Group 0

In [75]:
g0_act_post

Unnamed: 0,userid,dt,groupid,activity_level
1405325,27f9ec3c-37bf-459a-b94b-f2aff84cd96f,2021-11-01,0,1
1405327,c34e51cf-4b66-420f-94d0-2a0397b29d83,2021-11-01,0,1
1405331,479f1a5e-be3a-4a55-85d1-7fe97a6cc2f7,2021-11-01,0,1
1405335,77e076ee-0b7c-464a-b078-3a25c8f089e7,2021-11-01,0,1
1405339,1299e929-1dd7-4a76-8f97-b485641ee1e1,2021-11-01,0,1
...,...,...,...,...
3659988,b2a18b8c-00c7-4023-aa7e-e2b12d5bb5d3,2021-11-30,0,20
3659989,c9737b7f-eb1b-4733-9eaa-7d538d86fb3d,2021-11-30,0,20
3659995,f0126b50-ad74-4480-9250-41b50a408932,2021-11-30,0,20
3659997,f2073207-25dd-4127-a893-b70106d5ead7,2021-11-30,0,20


In [76]:
g0_act_post_userid = g0_act_post.groupby(['activity_level']).count() 
g0_act_post_userid_user=g0_act_post_userid.reset_index()
g0_act_post_userid_user

Unnamed: 0,activity_level,userid,dt,groupid
0,1,24734,24734,24734
1,2,24472,24472,24472
2,3,24247,24247,24247
3,4,24188,24188,24188
4,5,24150,24150,24150
5,6,24191,24191,24191
6,7,24276,24276,24276
7,8,24037,24037,24037
8,9,24283,24283,24283
9,10,24039,24039,24039


In [77]:
g0_act_post_day = g0_act_post.groupby(['dt']).count() 
g0_act_post_day_reset=g0_act_post_day.reset_index()
g0_act_post_day_reset

Unnamed: 0,dt,userid,groupid,activity_level
0,2021-11-01,15989,15989,15989
1,2021-11-02,16024,16024,16024
2,2021-11-03,16049,16049,16049
3,2021-11-04,16040,16040,16040
4,2021-11-05,16045,16045,16045
5,2021-11-06,15991,15991,15991
6,2021-11-07,16133,16133,16133
7,2021-11-08,16119,16119,16119
8,2021-11-09,15953,15953,15953
9,2021-11-10,15990,15990,15990


In [78]:
g0_act_post_2 = g0_act_post.groupby(['dt','activity_level']).count()
g0_act_post_2_reset_0 = g0_act_post_2.reset_index()
g0_act_post_2_reset_0

Unnamed: 0,dt,activity_level,userid,groupid
0,2021-11-01,1,819,819
1,2021-11-01,2,861,861
2,2021-11-01,3,838,838
3,2021-11-01,4,847,847
4,2021-11-01,5,845,845
...,...,...,...,...
595,2021-11-30,16,822,822
596,2021-11-30,17,788,788
597,2021-11-30,18,771,771
598,2021-11-30,19,769,769


In [79]:
g0_act_post.describe()

Unnamed: 0,groupid,activity_level
count,473460.0,473460.0
mean,0.0,10.25229
std,0.0,5.642184
min,0.0,1.0
25%,0.0,5.0
50%,0.0,10.0
75%,0.0,15.0
max,0.0,20.0


# Activity Analysis Group 1

In [80]:
grupo_1_act_post

Unnamed: 0,userid,dt,groupid,activity_level
1405324,37e721ba-4b26-4196-abd1-2435da67d619,2021-11-01,1,1
1405326,26162641-e802-4f79-b2ec-6b79845aad89,2021-11-01,1,1
1405328,90c3c10b-5767-41d2-b142-f8a859782cbd,2021-11-01,1,1
1405329,4509302c-c10d-4a56-8730-8dab6523e26d,2021-11-01,1,1
1405330,dac72108-96e4-4c30-a129-6b61910c7c44,2021-11-01,1,1
...,...,...,...,...
3659992,05f00021-052d-493c-94a7-554702d7f3a1,2021-11-30,1,20
3659993,219e12b3-49dc-4fc1-b947-c0683a8a400f,2021-11-30,1,20
3659994,cbc2d82c-7940-42fa-9dc5-7790d11b06b5,2021-11-30,1,20
3659996,6ffe1efe-2e5d-427f-95ff-cc862c46c798,2021-11-30,1,20


In [81]:
g1_act_post_userid = grupo_1_act_post.groupby(['activity_level']).count() 
g1_act_post_userid_user=g1_act_post_userid.reset_index()
g1_act_post_userid_user

Unnamed: 0,activity_level,userid,dt,groupid
0,1,45183,45183,45183
1,2,45507,45507,45507
2,3,45181,45181,45181
3,4,45148,45148,45148
4,5,44807,44807,44807
5,6,45007,45007,45007
6,7,44744,44744,44744
7,8,45180,45180,45180
8,9,45369,45369,45369
9,10,44687,44687,44687


In [82]:
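g1_act_post_day = grupo_1_act_post.groupby(['dt']).count()
# Note: reset_index() is applied to the raw frame here, not to g1_act_post_day,
# so the output below lists individual records; the per-day counts are computed in cell In [99].
g1_act_post_day_reset = grupo_1_act_post.reset_index()
g1_act_post_day_reset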

Unnamed: 0,index,userid,dt,groupid,activity_level
0,1405324,37e721ba-4b26-4196-abd1-2435da67d619,2021-11-01,1,1
1,1405326,26162641-e802-4f79-b2ec-6b79845aad89,2021-11-01,1,1
2,1405328,90c3c10b-5767-41d2-b142-f8a859782cbd,2021-11-01,1,1
3,1405329,4509302c-c10d-4a56-8730-8dab6523e26d,2021-11-01,1,1
4,1405330,dac72108-96e4-4c30-a129-6b61910c7c44,2021-11-01,1,1
...,...,...,...,...,...
879068,3659992,05f00021-052d-493c-94a7-554702d7f3a1,2021-11-30,1,20
879069,3659993,219e12b3-49dc-4fc1-b947-c0683a8a400f,2021-11-30,1,20
879070,3659994,cbc2d82c-7940-42fa-9dc5-7790d11b06b5,2021-11-30,1,20
879071,3659996,6ffe1efe-2e5d-427f-95ff-cc862c46c798,2021-11-30,1,20


In [99]:
g1_act_post_2 = grupo_1_act_post.groupby(['dt']).count()
g1_act_post_2_reset_0 = g1_act_post_2.reset_index()
g1_act_post_2_reset_0

Unnamed: 0,dt,userid,groupid,activity_level
0,2021-11-01,29318,29318,29318
1,2021-11-02,29289,29289,29289
2,2021-11-03,29306,29306,29306
3,2021-11-04,29267,29267,29267
4,2021-11-05,29336,29336,29336
5,2021-11-06,29306,29306,29306
6,2021-11-07,29255,29255,29255
7,2021-11-08,29263,29263,29263
8,2021-11-09,29286,29286,29286
9,2021-11-10,29340,29340,29340


In [84]:
grupo_1_act_post.describe()

Unnamed: 0,groupid,activity_level
count,879073.0,879073.0
mean,1.0,10.250989
std,0.0,5.634872
min,1.0,1.0
25%,1.0,5.0
50%,1.0,10.0
75%,1.0,15.0
max,1.0,20.0


# CTR Analysis Group 0

In [102]:
CTR_posttest_0_post

Unnamed: 0,userid,dt,groupid,ctr
0,60389fa7-2d71-4cdf-831c-c2bb277ffa1e,2021-11-13,0,31.81
1,b59cb225-d160-4851-92d2-7cc8120a2f63,2021-11-13,0,30.46
2,aa336050-934e-453f-a5b0-dd881fcd114e,2021-11-13,0,34.25
3,8df767f4-a10f-4322-a722-676b7e02b372,2021-11-13,0,34.92
4,a74762ed-4da0-42ab-91d2-40d7e808dfe9,2021-11-13,0,34.95
...,...,...,...,...
2274125,26c10c02-8ede-4beb-be14-32f8cca044ff,2021-11-12,0,33.28
2274126,ae235c4b-96a7-4f34-923e-08531a5f340a,2021-11-12,0,34.15
2274127,81daf7da-ba09-451f-b100-f15ed284977e,2021-11-12,0,35.79
2274128,38338581-e093-4202-8c9f-975004e221e3,2021-11-12,0,31.82


In [103]:
ctr_post_0_dt = CTR_posttest_0_post.groupby(['dt']).mean(numeric_only=True)
ctr_post_0_dt_in = ctr_post_0_dt.reset_index()
ctr_post_0_dt_in

Unnamed: 0,dt,groupid,ctr
0,2021-11-01,0.0,32.982671
1,2021-11-02,0.0,33.014983
2,2021-11-03,0.0,33.008268
3,2021-11-04,0.0,32.986679
4,2021-11-05,0.0,33.004766
5,2021-11-06,0.0,32.998322
6,2021-11-07,0.0,33.006228
7,2021-11-08,0.0,32.990779
8,2021-11-09,0.0,33.021025
9,2021-11-10,0.0,32.993291


In [104]:
CTR_pretest_0_post_user = CTR_posttest_0_post.groupby(['userid']).mean(numeric_only=True)
CTR_pretest_0_post_user_in = CTR_pretest_0_post_user.reset_index()
CTR_pretest_0_post_user_in 

Unnamed: 0,userid,groupid,ctr
0,0002a1ca-0b76-41cd-91e6-9aa51947b7fc,0.0,32.362727
1,00037d4d-ebfa-4a99-9d3e-adbefd6dae3a,0.0,32.052667
2,0007262b-b62e-447a-9021-232ad25df9ed,0.0,33.325789
3,00077af5-c5b2-4c10-ab23-d4c7a1eea3f1,0.0,32.592778
4,000858b8-e8ce-4a6a-8d3a-55a4346a6076,0.0,33.038750
...,...,...,...
29946,fff5daf2-fa11-45f3-9a83-ce37f22e11f9,0.0,33.732941
29947,fff7326c-419e-4c26-bba9-4b588f6e1eb0,0.0,32.960909
29948,fff8c764-169f-4ee0-92f3-4d858a485d5c,0.0,33.274667
29949,fffb68bd-be7f-48e4-80bb-41f7354983ca,0.0,33.172353


In [105]:
CTR_pretest_0_post_user_in.describe()

Unnamed: 0,groupid,ctr
count,29951.0,29951.0
mean,0.0,32.997001
std,0.0,0.444622
min,0.0,31.262727
25%,0.0,32.7
50%,0.0,32.994375
75%,0.0,33.294167
max,0.0,35.26


# CTR Analysis Group 1

In [106]:
CTR_posttest_1_post

Unnamed: 0,userid,dt,groupid,ctr
15973,cd5df711-42f7-4684-9ae8-f6a72383bb28,2021-11-13,1,40.39
15974,fe630199-265b-4542-a103-a74d66abeb22,2021-11-13,1,37.70
15975,4b519a79-b1a4-40b0-9369-be9e2a2699af,2021-11-13,1,35.47
15976,30a8c7b1-ed8a-4cf2-888e-b8e110ba88d9,2021-11-13,1,40.07
15977,88ab26e4-2e67-4397-a5ec-8c2a384372f5,2021-11-13,1,40.76
...,...,...,...,...
2303403,932e0348-ea2d-4b98-8782-aa84420f0796,2021-11-12,1,37.27
2303404,6775a825-6d3d-4dc3-9335-cad061736752,2021-11-12,1,39.14
2303405,a7b55365-21f1-4123-b2b5-485a8c7b98da,2021-11-12,1,40.05
2303406,a6fa937c-6f40-4f04-b15b-f1de09e179db,2021-11-12,1,38.14


In [107]:
ctr_post_1_dt = CTR_posttest_1_post.groupby(['dt']).mean(numeric_only=True)
ctr_post_1_dt_in = ctr_post_1_dt.reset_index()
ctr_post_1_dt_in

Unnamed: 0,dt,groupid,ctr
0,2021-11-01,1.0,37.994619
1,2021-11-02,1.0,38.013656
2,2021-11-03,1.0,37.995562
3,2021-11-04,1.0,37.988512
4,2021-11-05,1.0,38.002816
5,2021-11-06,1.0,38.013127
6,2021-11-07,1.0,37.995762
7,2021-11-08,1.0,37.994871
8,2021-11-09,1.0,38.013217
9,2021-11-10,1.0,37.990059


In [108]:
CTR_pretest_1_post_user = CTR_posttest_1_post.groupby(['userid']).mean(numeric_only=True)
CTR_pretest_1_post_user_in = CTR_pretest_1_post_user.reset_index()
CTR_pretest_1_post_user_in 

Unnamed: 0,userid,groupid,ctr
0,0004c8bb-df77-43b2-a93c-7398e9bc5175,1.0,37.437241
1,00051943-ca03-49d2-aafc-138439e5459c,1.0,38.481034
2,000bd06a-23a3-4773-b9fb-cdceb64899b9,1.0,38.544000
3,00175478-40a3-4806-830d-dcf0cc593f8b,1.0,38.008667
4,001a9d23-7549-44dc-8add-819cdf3d564f,1.0,38.445333
...,...,...,...
30044,ffe7931f-491d-4ab1-a541-e42c8bd2737c,1.0,38.493103
30045,ffede027-c669-44bc-845b-360e35e802c5,1.0,38.523793
30046,fff33f17-41ce-4c40-b147-35ddca524426,1.0,38.174667
30047,fffd73e1-a42e-49cb-9b17-1c5aae62ab46,1.0,37.701000


In [109]:
CTR_posttest_1_post.describe()

Unnamed: 0,groupid,ctr
count,879073.0,879073.0
mean,1.0,37.996959
std,0.0,1.732372
min,1.0,35.0
25%,1.0,36.5
50%,1.0,38.0
75%,1.0,39.5
max,1.0,41.0


# Z-test Activity and CTR POST

In [110]:
# Activity Level POST
# Null hypothesis: both groups behave the same way during the experiment.

In [111]:
#z-test Activity Level
z_score, p_value = ztest(g0_act_post['activity_level'],grupo_1_act_post['activity_level'])

print(f'z_score: {z_score}', f'\np-value: {p_value}')

z_score: 0.12793424967290937 
p-value: 0.8982010064247459


In [112]:
g0_act_post['activity_level'].mean()

10.252289528154437

In [113]:
grupo_1_act_post['activity_level'].mean()

10.250989394509899

In [120]:
# Comparing the activity level of the two samples shows no significant difference (p ≈ 0.90), so we fail to reject the null hypothesis.


In [115]:
z_score, p_value = ztest(g0_act_post_day_reset['userid'], g1_act_post_2_reset_0['userid'])

print(f'z_score: {z_score}', f'\np-value: {p_value}')

z_score: -198.89904948926164 
p-value: 0.0


In [121]:
# Here the two samples behave differently when we compare the number of users per day,
# so we reject the null hypothesis.
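The comparison above works on roughly thirty daily counts per group. As a hedged sketch of how such a DAU series could be built and tested, the example below counts distinct active users per day and group and then applies Welch's t-test, which is arguably a safer default than a z-test for so few observations. The frame `activity`, the user pools and the activation probabilities are all synthetic assumptions, not the challenge data.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
days = pd.date_range("2021-11-01", periods=10).strftime("%Y-%m-%d")

# Build a small hypothetical activity log: 50 users per group, 10 days.
rows = []
for groupid, p_active in [(0, 0.5), (1, 0.8)]:
    users = [f"g{groupid}-u{i}" for i in range(50)]
    for dt in days:
        for userid in users:
            # activity_level 0 means the user was not active that day
            level = int(rng.integers(1, 21)) if rng.random() < p_active else 0
            rows.append((userid, dt, groupid, level))

activity = pd.DataFrame(rows, columns=["userid", "dt", "groupid", "activity_level"])

# DAU: distinct users with activity_level > 0, per day and group.
active = activity[activity["activity_level"] > 0]
dau = active.groupby(["dt", "groupid"])["userid"].nunique().unstack("groupid")

# Welch's t-test on the two daily series.
result = ttest_ind(dau[0], dau[1], equal_var=False)
print(dau.mean())
print(f"t={result.statistic:.3f}, p={result.pvalue:.3g}")
```

Using `nunique()` rather than `count()` guards against a user appearing more than once on the same day. If each user has at most one record per day, both give the same numbers.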

In [117]:
# CTR post-test
# Null hypothesis: both groups behave the same way during the experiment.

In [119]:
z_score, p_value = ztest(CTR_posttest_0_post['ctr'], CTR_posttest_1_post['ctr'], value=0)

print(f'z_score: {z_score}', f'\np-value: {p_value}')

z_score: -1600.7913068017688 
p-value: 0.0


In [None]:
# Here the two samples behave differently when we compare the users' click-through rate,
# so we reject the null hypothesis.
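A p-value of 0.0 says the difference is not due to chance, but not how large it is. A minimal sketch of how the size of the CTR change could be reported, using a normal-approximation confidence interval for the difference in means. The helper `mean_diff_ci` is hypothetical, and the data are synthetic, drawn from ranges that mirror the min/max values in the `describe()` outputs, not the real records.

```python
import numpy as np
from scipy.stats import norm

def mean_diff_ci(x, y, alpha=0.05):
    """Return (diff, lower, upper) for mean(y) - mean(x), normal approximation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = y.mean() - x.mean()
    # Standard error with separate (unpooled) variances per group.
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    half_width = norm.ppf(1 - alpha / 2) * se
    return diff, diff - half_width, diff + half_width

rng = np.random.default_rng(7)
# Synthetic CTR values (assumed ranges): control in [30, 36], treatment in [35, 41].
control = rng.uniform(30, 36, size=5000)
treatment = rng.uniform(35, 41, size=5000)

diff, lower, upper = mean_diff_ci(control, treatment)
print(f"CTR uplift: {diff:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Reporting the uplift with its interval makes the recommendation easier to weigh against the cost of the change than the test statistic alone.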

---

## Conclusions

Please provide your conclusions after the analyses and your recommendation whether we may or may not implement the changes in the digital product.

In [8]:
# your-conclusions


'''

Both samples behaved the same way before the experiment, but once the change was applied
the number of users per day and the CTR differ between the groups (mean CTR of about 38 in group 1
against about 33 in group 0), so the change does have an effect.

In principle, therefore, implementing the change is recommended.

'''

---