In [56]:
import pandas as pd
import scipy.stats as stats
import numpy as np
import seaborn as sns

In [57]:
df = pd.read_csv('/Users/liu/Desktop/experiment_dataset.csv')
df.head()

Unnamed: 0.1,Unnamed: 0,Age,Location,Device,Variant,Time Spent,CTR
0,0,62,Location2,Device2,Control,13.928669,0.084776
1,1,18,Location1,Device1,Variant B,11.310518,0.096859
2,2,21,Location2,Device1,Variant B,24.8421,0.09763
3,3,21,Location1,Device3,Variant B,20.0613,0.109783
4,4,57,Location1,Device2,Variant B,34.495503,0.068579


In [58]:
df = df.drop(df.filter(regex='Unnamed'), axis=1)
#drop duplicates
df.drop_duplicates
df.head()

Unnamed: 0,Age,Location,Device,Variant,Time Spent,CTR
0,62,Location2,Device2,Control,13.928669,0.084776
1,18,Location1,Device1,Variant B,11.310518,0.096859
2,21,Location2,Device1,Variant B,24.8421,0.09763
3,21,Location1,Device3,Variant B,20.0613,0.109783
4,57,Location1,Device2,Variant B,34.495503,0.068579


In [59]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Age         1000 non-null   int64  
 1   Location    1000 non-null   object 
 2   Device      1000 non-null   object 
 3   Variant     1000 non-null   object 
 4   Time Spent  1000 non-null   float64
 5   CTR         1000 non-null   float64
dtypes: float64(2), int64(1), object(3)
memory usage: 47.0+ KB


In [60]:
# By location
df.groupby('Location')[['CTR','Time Spent']].mean()

Unnamed: 0_level_0,CTR,Time Spent
Location,Unnamed: 1_level_1,Unnamed: 2_level_1
Location1,0.110217,22.707286
Location2,0.108517,22.648998
Location3,0.108708,22.787691


In [61]:
# checking CTR
loc_1 = df[df['Location'] == 'Location1']['CTR']
loc_2 = df[df['Location'] == 'Location2']['CTR']
loc_3 = df[df['Location'] == 'Location2']['CTR']

p_val = stats.f_oneway(loc_1, loc_2, loc_3)
print(p_val)

F_onewayResult(statistic=0.5898006646540357, pvalue=0.5546275105072024)


In [62]:
# checking Time Spent
loc_1 = df[df['Location'] == 'Location1']['Time Spent']
loc_2 = df[df['Location'] == 'Location2']['Time Spent']
loc_3 = df[df['Location'] == 'Location2']['Time Spent']

p_val = stats.f_oneway(loc_1, loc_2, loc_3)
print(p_val)

F_onewayResult(statistic=0.012290518940782385, pvalue=0.9877848478672077)


For all 3 locations, it seems that the users behave the same. There is no statistically significant difference between them - this means we can comfortably assign all users to any variant of our experiment.

In [63]:
# checking CTR
loc_1 = df[df['Device'] == 'Device1']['CTR']
loc_2 = df[df['Device'] == 'Device2']['CTR']
loc_3 = df[df['Device'] == 'Device3']['CTR']

p_val = stats.f_oneway(loc_1, loc_2, loc_3)
print(p_val)

F_onewayResult(statistic=0.7105872492654717, pvalue=0.4916042399968955)


In [64]:
# checking Time spent
loc_1 = df[df['Device'] == 'Device1']['Time Spent']
loc_2 = df[df['Device'] == 'Device2']['Time Spent']
loc_3 = df[df['Device'] == 'Device3']['Time Spent']

p_val = stats.f_oneway(loc_1, loc_2, loc_3)
print(p_val)

F_onewayResult(statistic=0.26645371811833884, pvalue=0.7661459958744103)


For all 3 devices, it seems that the users behave the same. There is no statistically significant difference between them - this means we can comfortably assign all users to any variant of our experiment.

In [65]:
# Separate data for control group and variant groups
VariantA = df[df['Variant'] == 'Variant A']
VariantB = df[df['Variant'] == 'Variant B']
Control = df[df['Variant'] == 'Control']

In [66]:
 # checking CTR
VariantA = df[df['Variant'] == 'Variant A']['CTR']
VariantB = df[df['Variant'] == 'Variant B']['CTR']
Control = df[df['Variant'] == 'Control']['CTR']
p_val = stats.f_oneway(VariantA, VariantB, Control)
print(p_val)


F_onewayResult(statistic=93.58891593622702, pvalue=5.638952705781554e-38)


In [67]:
# checking Time spent
VariantA = df[df['Variant'] == 'Variant A']['Time Spent']
VariantB = df[df['Variant'] == 'Variant B']['Time Spent']
Control = df[df['Variant'] == 'Control']['Time Spent']
p_val = stats.f_oneway(VariantA, VariantB, Control)
print(p_val)


F_onewayResult(statistic=75.60840947416146, pvalue=2.676826588910432e-31)


Based on the ANOVA, we see that both the Time Spent and CTR have statistically significant results!

In [71]:
VariantA = df[df['Variant'] == 'Variant A']['CTR']
VariantB = df[df['Variant'] == 'Variant B']['CTR']
Control = df[df['Variant'] == 'Control']['CTR']
p_val = stats.ttest_ind(VariantA, Control)
p_val_2 = stats.ttest_ind(VariantB, Control)
p_val_3 = stats.ttest_ind(VariantA, VariantB)
print(p_val)
print(p_val_2)
print(p_val_3)

Ttest_indResult(statistic=13.829424737499187, pvalue=1.9602781373243157e-38)
Ttest_indResult(statistic=6.4718143491783255, pvalue=1.8743198199982106e-10)
Ttest_indResult(statistic=7.08499696316128, pvalue=3.587180487986577e-12)


we see there are significant differences between both variants and the control. This means we should have one variant that performs better than the others

In [72]:
VariantA = df[df['Variant'] == 'Variant A']['Time Spent']
VariantB = df[df['Variant'] == 'Variant B']['Time Spent']
Control = df[df['Variant'] == 'Control']['Time Spent']
p_val = stats.ttest_ind(VariantA, Control)
p_val_2 = stats.ttest_ind(VariantB, Control)
p_val_3 = stats.ttest_ind(VariantA, VariantB)
print(p_val)
print(p_val_2)
print(p_val_3)

Ttest_indResult(statistic=12.142363487472364, pvalue=8.488565644996449e-31)
Ttest_indResult(statistic=8.174237395991806, pvalue=1.496358076285182e-15)
Ttest_indResult(statistic=3.6788175394209075, pvalue=0.0002534771014765265)


we see there are significant differences between both variants and the control. This means we should have one variant that performs better than the others

In [74]:
df.groupby('Variant')[['CTR']].mean()

Unnamed: 0_level_0,CTR
Variant,Unnamed: 1_level_1
Control,0.098554
Variant A,0.120269
Variant B,0.108933


CTR is slightly better than Control, and since we know the results are statistically significant, but both variants are better than control.

In [75]:
df.groupby('Variant')[['Time Spent']].mean()

Unnamed: 0_level_0,Time Spent
Variant,Unnamed: 1_level_1
Control,20.070781
Variant A,24.805547
Variant B,23.343783


Time Spent is slightly better than Control, and since we know the results are statistically significant, but both variants are better than control.

Q3

The p-values for both CTR and Time Spent are:

CTR:
Control vs. Variant A: p-value = 1.9602781373243157e-38 < 0.05

Control vs. Variant B: p-value = 1.8743198199982106e-10 < 0.05

Variant A vs. Variant B: p-value = 3.587180487986577e-12 < 0.05

Time Spent:
Control vs. Variant A: p-value = 8.488565644996449e-31 < 0.05

Control vs. Variant B: p-value = 1.496358076285182e-15 < 0.05

Variant A vs. Variant B: p-value = 0.0002534771014765265 < 0.05

Since the p-values are less than the significance level (alpha = 0.05), we reject the null hypothesis and conclude that there is a statistically significant difference between the variant groups and the control group.

So,based on the analysis and statistical testing. Both Variant A and Variant B results a CTR lift and time lift compared to the control group.

As there are significant differences between both variants, one variant that performs better than the others. From the average CTR and Time Spent we can see Variant A perform better than Variant B.

I recommend the engineering team deploy Variant A. 


Q4 roll-out plan

Phase 1: Release Variant A to a small percentage (10%) of the user base. Monitor the CTR and Time Spent and gather user feedback during this phase.

Phase 2: After analyzing the performance and gathering feedback from Phase 1, expand the release to 25% of the user base. This will help validate the results on a larger scale.

Phase 3: Once Variant A has shown consistent positive results and received positive feedback from users, expand the release to 50% of the user base.

Phase 4: Finally, roll out Variant A to the remaining 50% of the user base, effectively replacing the control group with the new feature.



