In [44]:
import numpy as np
import pandas as pd
from scipy import stats
import seaborn as sns

dt=pd.read_csv('/Users/xinyuanliang/Desktop/experiment_dataset.csv')
dt.head(10)

| | Unnamed: 0 | Age | Location | Device | Variant | Time Spent | CTR |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 62 | Location2 | Device2 | Control | 13.928669 | 0.084776 |
| 1 | 1 | 18 | Location1 | Device1 | Variant B | 11.310518 | 0.096859 |
| 2 | 2 | 21 | Location2 | Device1 | Variant B | 24.8421 | 0.09763 |
| 3 | 3 | 21 | Location1 | Device3 | Variant B | 20.0613 | 0.109783 |
| 4 | 4 | 57 | Location1 | Device2 | Variant B | 34.495503 | 0.068579 |
| 5 | 5 | 27 | Location3 | Device1 | Variant B | 26.129246 | 0.149341 |
| 6 | 6 | 37 | Location3 | Device3 | Variant B | 20.525362 | 0.095788 |
| 7 | 7 | 39 | Location2 | Device1 | Variant A | 21.525217 | 0.149985 |
| 8 | 8 | 54 | Location3 | Device2 | Control | 21.910608 | 0.135535 |
| 9 | 9 | 41 | Location1 | Device2 | Variant A | 27.642788 | 0.137266 |


### Data Cleaning

In [45]:
# Drop the redundant 'Unnamed: 0' column (its values repeat the row index)
dt.drop('Unnamed: 0', axis=1, inplace=True)
dt.head(10)

| | Age | Location | Device | Variant | Time Spent | CTR |
|---|---|---|---|---|---|---|
| 0 | 62 | Location2 | Device2 | Control | 13.928669 | 0.084776 |
| 1 | 18 | Location1 | Device1 | Variant B | 11.310518 | 0.096859 |
| 2 | 21 | Location2 | Device1 | Variant B | 24.8421 | 0.09763 |
| 3 | 21 | Location1 | Device3 | Variant B | 20.0613 | 0.109783 |
| 4 | 57 | Location1 | Device2 | Variant B | 34.495503 | 0.068579 |
| 5 | 27 | Location3 | Device1 | Variant B | 26.129246 | 0.149341 |
| 6 | 37 | Location3 | Device3 | Variant B | 20.525362 | 0.095788 |
| 7 | 39 | Location2 | Device1 | Variant A | 21.525217 | 0.149985 |
| 8 | 54 | Location3 | Device2 | Control | 21.910608 | 0.135535 |
| 9 | 41 | Location1 | Device2 | Variant A | 27.642788 | 0.137266 |
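The values in `Unnamed: 0` match the row index (0, 1, 2, ...), which suggests the CSV was saved together with its index. Assuming that is the cause, a minimal alternative to dropping the column afterwards is to pass `index_col=0` to `pd.read_csv`, so the saved index is used as the DataFrame index. The two-row CSV below is made up for illustration.

```python
import io

import pandas as pd

# Tiny made-up CSV: the first, unnamed column is a previously saved row index.
csv_text = """,Age,Variant,CTR
0,62,Control,0.084776
1,18,Variant B,0.096859
"""

# Default read: the saved index comes back as an extra 'Unnamed: 0' column.
df_default = pd.read_csv(io.StringIO(csv_text))
print(list(df_default.columns))  # ['Unnamed: 0', 'Age', 'Variant', 'CTR']

# index_col=0 turns that first column into the index, so nothing needs dropping.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df_indexed.columns))  # ['Age', 'Variant', 'CTR']
```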


In [46]:
dt.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Age         1000 non-null   int64  
 1   Location    1000 non-null   object 
 2   Device      1000 non-null   object 
 3   Variant     1000 non-null   object 
 4   Time Spent  1000 non-null   float64
 5   CTR         1000 non-null   float64
dtypes: float64(2), int64(1), object(3)
memory usage: 47.0+ KB


In [21]:
# Find missing values
dt.isnull().sum()

Age           0
Location      0
Device        0
Variant       0
Time Spent    0
CTR           0
dtype: int64

### Q1 Analyze the results to determine which feature (if any) results in CTR or Time Spent lift.

Before comparing the variants, we check that users are comparable across segments: time spent and CTR should not differ significantly by Location or by Device. We compare the segment means and use one-way ANOVA tests (at the conventional 5% significance level) to support the analysis.
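The location and device checks that follow repeat the same filter-then-test pattern for each segment and metric. A minimal sketch of a reusable helper, assuming the column names used in this notebook; `anova_by_segment` is a hypothetical name and the demo data are synthetic, not the experiment's.

```python
import numpy as np
import pandas as pd
from scipy import stats


def anova_by_segment(df, segment_col, metric_col):
    """One-way ANOVA of metric_col across all levels of segment_col."""
    # One array of metric values per segment level (e.g. Location1..Location3).
    samples = [group[metric_col].to_numpy() for _, group in df.groupby(segment_col)]
    return stats.f_oneway(*samples)


# Synthetic data with the notebook's column names (values are made up).
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    "Location": rng.choice(["Location1", "Location2", "Location3"], size=n),
    "Device": rng.choice(["Device1", "Device2", "Device3"], size=n),
    "Time Spent": rng.normal(22.7, 7.0, size=n),
    "CTR": rng.normal(0.109, 0.03, size=n),
})

for segment in ["Location", "Device"]:
    for metric in ["Time Spent", "CTR"]:
        res = anova_by_segment(demo, segment, metric)
        print(f"{segment:8s} {metric:10s} F={res.statistic:.3f} p={res.pvalue:.3f}")
```

Calling `anova_by_segment(dt, "Location", "Time Spent")` on the notebook's data should reproduce the F statistic and p-value of the corresponding cell, because a one-way ANOVA does not depend on the order of the groups.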

In [22]:
# By location
dt.groupby('Location')[['Time Spent','CTR']].mean()

| Location | Time Spent | CTR |
|---|---|---|
| Location1 | 22.707286 | 0.110217 |
| Location2 | 22.648998 | 0.108517 |
| Location3 | 22.787691 | 0.108708 |


In [23]:
# checking time spent
loc_1 = dt[dt['Location'] == 'Location1']['Time Spent']
loc_2 = dt[dt['Location'] == 'Location2']['Time Spent']
loc_3 = dt[dt['Location'] == 'Location3']['Time Spent']

# f_oneway returns both the F statistic and the p-value
anova_result = stats.f_oneway(loc_1, loc_2, loc_3)
print(anova_result)

F_onewayResult(statistic=0.05357883967057365, pvalue=0.9478339402848069)


The ANOVA test results suggest that there is no significant difference in the average time spent among different locations (p-value = 0.9478).
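The F statistic is the ratio of the between-group mean square to the within-group mean square, so a small F (such as 0.0536 here) means the location means barely differ relative to the spread inside each location. The sketch below recomputes F and its p-value from sums of squares on a few made-up numbers and can be checked against `stats.f_oneway`; `one_way_f` is a hypothetical helper, not part of SciPy.

```python
import numpy as np
from scipy import stats


def one_way_f(*groups):
    """F statistic and p-value of a one-way ANOVA, computed from sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), all_values.size
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    # Upper-tail probability of the F(k - 1, n - k) distribution.
    p_value = stats.f.sf(f_stat, k - 1, n - k)
    return f_stat, p_value


a = [21.0, 23.5, 22.1, 24.0]
b = [22.8, 20.9, 23.3, 22.0]
c = [23.1, 22.4, 21.7, 24.2]
print(one_way_f(a, b, c))
print(stats.f_oneway(a, b, c))
```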

In [24]:
# checking CTR
loc_1 = dt[dt['Location'] == 'Location1']['CTR']
loc_2 = dt[dt['Location'] == 'Location2']['CTR']
loc_3 = dt[dt['Location'] == 'Location3']['CTR']

# f_oneway returns both the F statistic and the p-value
anova_result = stats.f_oneway(loc_1, loc_2, loc_3)
print(anova_result)

F_onewayResult(statistic=0.5792245145655729, pvalue=0.5605211716238133)


The ANOVA test results indicate that there is no significant difference in the average CTR among different locations (p-value = 0.5605).


In [47]:
# By device
dt.groupby('Device')[['Time Spent', 'CTR']].mean()

| Device | Time Spent | CTR |
|---|---|---|
| Device1 | 22.635032 | 0.109634 |
| Device2 | 22.890021 | 0.109868 |
| Device3 | 22.612276 | 0.107993 |


In [25]:
# checking time spent
device_1 = dt[dt['Device'] == 'Device1']['Time Spent']
device_2 = dt[dt['Device'] == 'Device2']['Time Spent']
device_3 = dt[dt['Device'] == 'Device3']['Time Spent']

# f_oneway returns both the F statistic and the p-value
anova_result = stats.f_oneway(device_1, device_2, device_3)
print(anova_result)

F_onewayResult(statistic=0.2664537181183386, pvalue=0.7661459958744103)



The ANOVA test results indicate that there is no significant difference in the average time spent among different devices (p-value = 0.7661).

In [26]:
# checking CTR
device_1 = dt[dt['Device'] == 'Device1']['CTR']
device_2 = dt[dt['Device'] == 'Device2']['CTR']
device_3 = dt[dt['Device'] == 'Device3']['CTR']

# f_oneway returns both the F statistic and the p-value
anova_result = stats.f_oneway(device_1, device_2, device_3)
print(anova_result)

F_onewayResult(statistic=0.7105872492654717, pvalue=0.4916042399968955)



The ANOVA test results suggest that there is no significant difference in the average CTR among different devices (p-value = 0.4916).

### Calculate the average CTR & Time Spent for both variants

In [27]:
# Calculate the average CTR for Variant A
varA_avg_ctr = dt[dt['Variant'] == 'Variant A']['CTR'].mean()
varA_avg_ctr

0.12026949300288214

In [28]:
# Calculate the average CTR for Variant B
varB_avg_ctr = dt[dt['Variant'] == 'Variant B']['CTR'].mean()
varB_avg_ctr

0.1089330399532712

The difference between the two variants is small, but the average CTR of Variant A is higher than that of Variant B (0.120269 vs. 0.108933), i.e. Variant A yields a higher CTR on average. Whether this gap is statistically significant is tested in Q2.

In [29]:
# Calculate the average Time Spent for Variant A
varA_avg_time_spent = dt[dt['Variant'] == 'Variant A']['Time Spent'].mean()
varA_avg_time_spent


24.805547386576052

In [30]:
# Calculate the average Time Spent for Variant B
varB_avg_time_spent = dt[dt['Variant'] == 'Variant B']['Time Spent'].mean()
varB_avg_time_spent

23.343782979234575

The difference between the two variants is small, but the average time spent for Variant A is higher than for Variant B (24.805547 vs. 23.343783), i.e. users exposed to Variant A spend more time on average. Whether this gap is statistically significant is tested in Q2.
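Q1 asks about "lift", which is usually expressed relative to the control. A minimal sketch, assuming lift is defined as (group mean - control mean) / control mean; `relative_lift` is a hypothetical helper and the demo rows are made up.

```python
import pandas as pd


def relative_lift(df, group_col="Variant", control="Control", metrics=("Time Spent", "CTR")):
    """Relative lift of each group's mean over the control mean: (mean - control) / control."""
    means = df.groupby(group_col)[list(metrics)].mean()
    # Subtracting/dividing by a row Series aligns on columns and broadcasts over groups.
    return (means - means.loc[control]) / means.loc[control]


demo = pd.DataFrame({
    "Variant": ["Control", "Control", "Variant A", "Variant A", "Variant B", "Variant B"],
    "Time Spent": [18.0, 22.0, 24.0, 26.0, 21.0, 23.0],
    "CTR": [0.09, 0.11, 0.12, 0.13, 0.10, 0.12],
})
print(relative_lift(demo))
```

Applied to the group means shown under Q2, this definition gives roughly +23.6% time spent and +22.0% CTR for Variant A, and roughly +16.3% time spent and +10.5% CTR for Variant B.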

### Q2 Conduct statistical testing to determine if there is a statistically significant difference between the features and the control group.

In [49]:
# mean values for each group
dt.groupby('Variant')[['Time Spent','CTR']].mean()

| Variant | Time Spent | CTR |
|---|---|---|
| Control | 20.070781 | 0.098554 |
| Variant A | 24.805547 | 0.120269 |
| Variant B | 23.343783 | 0.108933 |


## Checking Time Spent

In [31]:
var_A_time_spent = dt[dt['Variant']=='Variant A']['Time Spent']
var_B_time_spent = dt[dt['Variant']=='Variant B']['Time Spent']
control_group_time_spent = dt[dt['Variant']=='Control']['Time Spent']

## T-test (Time Spent)

Compare Variant A & control group:

H0 : The mean of time spent for Variant A = the mean of time spent for the control group.

H1 : The mean of time spent for Variant A ≠ the mean of time spent for the control group.


In [32]:
t_stat_A, p_val_A = stats.ttest_ind(var_A_time_spent, control_group_time_spent)

print(f"T-statistic: {t_stat_A:.5f}")
print(f"P-value: {p_val_A:.5f}")

T-statistic: 12.14236
P-value: 0.00000


The T-statistic of 12.14236 and the p-value (below 0.00001, printed as 0.00000) indicate a significant difference in mean time spent between Variant A and the control group. (Reject H0 and conclude H1.)
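By default, `stats.ttest_ind` runs Student's two-sample t-test, which pools the two sample variances and assumes they are equal. The sketch below recomputes that pooled t-statistic and its two-sided p-value on made-up numbers so the reported values can be traced back to means, variances and sample sizes; `pooled_t_test` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def pooled_t_test(x, y):
    """Two-sided Student's t-test with pooled variance (scipy's ttest_ind default)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    # Pooled variance from the unbiased (ddof=1) sample variances.
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    t_stat = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / nx + 1 / ny))
    dof = nx + ny - 2
    # Two-sided p-value: probability of |T| at least this large under H0.
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return t_stat, p_value


variant = [24.1, 26.3, 23.8, 25.9, 24.7]
control = [19.8, 21.2, 20.5, 18.9, 20.1]
print(pooled_t_test(variant, control))
print(stats.ttest_ind(variant, control))
```

A positive t-statistic means the first sample has the larger mean, which is why the variant-vs-control tests above can be read as the variant outperforming the control.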

Compare Variant B & control group:

H0 : The mean of time spent for Variant B = the mean of time spent for the control group.

H1 : The mean of time spent for Variant B ≠ the mean of time spent for the control group.

In [33]:
t_stat_B, p_val_B = stats.ttest_ind(var_B_time_spent, control_group_time_spent)

print(f"T-statistic: {t_stat_B:.5f}")
print(f"P-value: {p_val_B:.5f}")

T-statistic: 8.17424
P-value: 0.00000


The T-statistic of 8.17424 and the p-value (below 0.00001, printed as 0.00000) suggest a significant difference in mean time spent between Variant B and the control group. (Reject H0 and conclude H1.)

Compare Variant A & Variant B:

H0 : The mean of time spent for Variant A = the mean of time spent for Variant B.

H1 : The mean of time spent for Variant A ≠ the mean of time spent for Variant B.

In [34]:
t_stat_C, p_val_C = stats.ttest_ind(var_A_time_spent, var_B_time_spent)

print(f"T-statistic: {t_stat_C:.5f}")
print(f"P-value: {p_val_C:.5f}")

T-statistic: 3.67882
P-value: 0.00025


There is also a significant difference in mean time spent between Variant A and Variant B, with a T-statistic of 3.67882 and a p-value of 0.00025. (Reject H0 and conclude H1.)
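All t-tests in this notebook use the default `equal_var=True`. As a swapped-in robustness check, not part of the original analysis, Welch's t-test (`equal_var=False`) drops the equal-variance assumption. A minimal sketch on synthetic samples with deliberately unequal spread:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic samples with unequal spread (made-up numbers, not the experiment's data).
group_a = rng.normal(24.8, 9.0, size=330)
group_b = rng.normal(23.3, 6.0, size=340)

student = stats.ttest_ind(group_a, group_b)                 # equal_var=True (default)
welch = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's t-test

print(f"Student: t={student.statistic:.3f}, p={student.pvalue:.4f}")
print(f"Welch:   t={welch.statistic:.3f}, p={welch.pvalue:.4f}")
```

On the notebook's data the same check would be `stats.ttest_ind(var_A_time_spent, var_B_time_spent, equal_var=False)`; if it leads to the same decision, the equal-variance assumption is not what drives the conclusion.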

## Checking CTR

In [35]:
var_A_ctr = dt[dt['Variant']=='Variant A']['CTR']
var_B_ctr = dt[dt['Variant']=='Variant B']['CTR']
control_group_ctr = dt[dt['Variant']=='Control']['CTR']

## T-test (CTR)
Compare Variant A & control group:

H0 : The mean of CTR for Variant A = the mean of CTR for the control group.

H1 : The mean of CTR for Variant A ≠ the mean of CTR for the control group.

In [36]:
# t-test between control group and Variant A
t_stat_A, p_val_A = stats.ttest_ind(var_A_ctr, control_group_ctr)

print(f"T-statistic: {t_stat_A:.5f}")
print(f"P-value: {p_val_A:.5f}")

T-statistic: 13.82942
P-value: 0.00000


The T-statistic of 13.82942 and the p-value (below 0.00001, printed as 0.00000) suggest a significant difference in mean CTR between Variant A and the control group. (Reject H0 and conclude H1.)

Compare Variant B & control group:

H0 : The mean of CTR for Variant B = the mean of CTR for the control group.

H1 : The mean of CTR for Variant B ≠ the mean of CTR for the control group.

In [37]:
# t-test between control group and Variant B
t_stat_B, p_val_B = stats.ttest_ind(var_B_ctr, control_group_ctr)

print(f"T-statistic: {t_stat_B:.5f}")
print(f"P-value: {p_val_B:.5f}")

T-statistic: 6.47181
P-value: 0.00000


The T-statistic of 6.47181 and the p-value (below 0.00001, printed as 0.00000) indicate a significant difference in mean CTR between Variant B and the control group. (Reject H0 and conclude H1.)

Compare Variant A & Variant B:

H0 : The mean of CTR for Variant A = the mean of CTR for Variant B.

H1 : The mean of CTR for Variant A ≠ the mean of CTR for Variant B.

In [38]:
# t-test between Variant A and Variant B
t_stat_C, p_val_C = stats.ttest_ind(var_A_ctr, var_B_ctr)

print(f"T-statistic: {t_stat_C:.5f}")
print(f"P-value: {p_val_C:.5f}")

T-statistic: 7.08500
P-value: 0.00000


There is also a significant difference in mean CTR between Variant A and Variant B, with a T-statistic of 7.08500 and a p-value below 0.00001 (printed as 0.00000). (Reject H0 and conclude H1.)
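Each metric is tested with three pairwise t-tests, which raises the chance of at least one false positive. A Bonferroni adjustment (multiply each p-value by the number of comparisons, capped at 1) is one standard correction; it is an addition here, not something the original analysis applies. With the p-values reported above (0.00025 or smaller), every comparison stays significant after multiplying by 3. A minimal sketch, where `pairwise_t_tests` is a hypothetical helper and the demo data are synthetic:

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats


def pairwise_t_tests(df, group_col, metric_col):
    """Two-sided t-tests for every pair of groups, with Bonferroni-adjusted p-values."""
    groups = {name: g[metric_col].to_numpy() for name, g in df.groupby(group_col)}
    pairs = list(combinations(sorted(groups), 2))
    rows = []
    for g1, g2 in pairs:
        res = stats.ttest_ind(groups[g1], groups[g2])
        # Bonferroni: scale by the number of comparisons, never above 1.
        p_adj = min(res.pvalue * len(pairs), 1.0)
        rows.append({"pair": f"{g1} vs {g2}", "t": res.statistic,
                     "p": res.pvalue, "p_bonferroni": p_adj})
    return pd.DataFrame(rows)


rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "Variant": np.repeat(["Control", "Variant A", "Variant B"], 100),
    "CTR": np.concatenate([rng.normal(m, 0.03, 100) for m in (0.099, 0.120, 0.109)]),
})
print(pairwise_t_tests(demo, "Variant", "CTR"))
```

Because group names are sorted, the control comes first in its pairs, so a variant that beats the control shows a negative t-statistic here (the notebook's cells put the variant first).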

## ANOVA test

H0: Mean time spent is the same for the control group, Variant A and Variant B.

H1: Mean time spent differs for at least one of the three groups.

In [39]:
# checking time spent
var_A_time_spent = dt[dt['Variant']=='Variant A']['Time Spent']
var_B_time_spent = dt[dt['Variant']=='Variant B']['Time Spent']
control_group_time_spent = dt[dt['Variant']=='Control']['Time Spent']

F_stats,p_val = stats.f_oneway(var_A_time_spent, var_B_time_spent, control_group_time_spent)
print("F-statistic: {:.4f}".format(F_stats))
print("p-value: {:.4f}".format(p_val))



F-statistic: 75.6084
p-value: 0.0000


The F-statistic of 75.6084 and the p-value (below 0.0001, printed as 0.0000) indicate that mean time spent differs significantly among the control group, Variant A and Variant B. (Reject H0 and conclude H1.)

H0: Mean CTR is the same for the control group, Variant A and Variant B.

H1: Mean CTR differs for at least one of the three groups.

In [40]:
# checking CTR
var_A_ctr = dt[dt['Variant']=='Variant A']['CTR']
var_B_ctr = dt[dt['Variant']=='Variant B']['CTR']
control_group_ctr = dt[dt['Variant']=='Control']['CTR']

F_stats,p_val = stats.f_oneway(var_A_ctr,var_B_ctr,control_group_ctr)
print("F-statistic: {:.4f}".format(F_stats))
print("p-value: {:.4f}".format(p_val))

F-statistic: 93.5889
p-value: 0.0000


The F-statistic of 93.5889 and the p-value (below 0.0001, printed as 0.0000) suggest that mean CTR differs significantly among the control group, Variant A and Variant B. (Reject H0 and conclude H1.)
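With about 1,000 users, even modest differences produce very small p-values, so an effect size helps judge how large the differences are. Eta squared (between-group sum of squares divided by total sum of squares) is one common choice; it is an addition here, not part of the original analysis. It can also be recovered from F as F(k - 1) / (F(k - 1) + (n - k)). A minimal sketch with hypothetical helper names and made-up numbers:

```python
import numpy as np
from scipy import stats


def eta_squared(*groups):
    """Share of total variance explained by group membership: SS_between / SS_total."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return ss_between / ss_total


def eta_squared_from_f(f_stat, k, n):
    """The same quantity from a one-way ANOVA F statistic with k groups and n observations."""
    return f_stat * (k - 1) / (f_stat * (k - 1) + (n - k))


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 5.0, 7.0]
c = [3.0, 3.0, 6.0, 8.0]
print(eta_squared(a, b, c))
print(eta_squared_from_f(stats.f_oneway(a, b, c).statistic, k=3, n=12))
```

Plugging in the F statistics above (n = 1000 users, k = 3 groups) gives eta squared of roughly 0.13 for time spent and 0.16 for CTR, i.e. the variant explains a moderate share of the variation in each metric.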

### Q3 Summarize your results. Make a recommendation to the engineering team about which feature to deploy. 






Based on the analysis of the dataset, there were no significant differences in average time spent or click-through rate across locations (Location1, Location2 and Location3), as indicated by the high p-values from the one-way ANOVA tests. Likewise, the analysis by device (Device1, Device2 and Device3) did not reveal any significant differences, which suggests that user behaviour is similar across these segments.

We then compared three groups, Variant A, Variant B and the control group, on two metrics: average time spent and click-through rate (CTR).

Both Variant A and Variant B outperform the control in time spent and CTR: each has a higher mean than the control, and the corresponding t-tests give positive t-statistics with p-values far below 0.05. These results suggest that deploying either Variant A or Variant B may increase user engagement and interaction compared to the control.

When directly comparing Variant A and Variant B, we find significant differences in both time spent and CTR: the t-statistic is 3.67882 for time spent and 7.08500 for CTR. These results indicate a significant performance difference between the two variants.

The ANOVA tests provide additional support for these conclusions. The F-statistic is 75.6084 for time spent and 93.5889 for CTR, indicating that the mean of each metric differs significantly among the three groups (control, Variant A and Variant B).

In summary, both variant A and variant B outperform the control group in terms of time spent and click-through rate. Deploying either variant is likely to increase user engagement and interaction. 

However, Variant A also outperforms Variant B: in the direct comparison it has a significantly higher mean time spent (24.81 vs. 23.34, p = 0.00025) and a significantly higher mean CTR (0.1203 vs. 0.1089, p < 0.00001). I therefore recommend that the engineering team deploy Variant A.

### Q4 Create a roll-out plan. How quickly will you introduce the feature to your audience?

The roll-out plan needs to ensure a smooth transition and minimize potential risks and disruptions. Regular communication with the development team, stakeholders and audiences throughout the rollout process is critical.

The rollout plan can include a phased implementation strategy, starting with a small target group to gather initial feedback and evaluate impact. This initial phase helps validate statistical results and provides the opportunity to optimize functionality based on user response.
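Starting with a small target group requires a stable way to decide which users see the feature. One common approach, sketched here as an assumption rather than the team's actual mechanism, is to hash each user ID into 100 buckets and expose users whose bucket is below the current rollout percentage. The salt string and user-ID format are hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, salt: str = "variant-a-rollout") -> int:
    """Map a user ID to a stable bucket in [0, 100) using a SHA-256 hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_exposed(user_id: str, rollout_percent: int) -> bool:
    """True if the user's bucket falls inside the current rollout percentage."""
    return rollout_bucket(user_id) < rollout_percent


users = [f"user-{i}" for i in range(10_000)]
exposed = sum(is_exposed(u, 5) for u in users)
print(f"{exposed} of {len(users)} users exposed at 5%")
```

Because the bucket depends only on the user ID and salt, raising the percentage keeps everyone who was already exposed and adds new users, so individual users do not flip between experiences as the rollout grows.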

If the initial phase is successful, Variant A can be rolled out gradually to larger audiences while key performance indicators (time spent and CTR) and user feedback are monitored closely. This iterative process allows for adjustments and optimizations that improve the user experience before the feature reaches all users.
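The "expand only while the metrics hold up" rule can be made explicit. A minimal sketch, assuming a hypothetical ramp schedule (1%, 5%, 25%, 50%, 100%) and a simple guardrail: advance one stage only if the exposed group's metric is not below the control's (within an optional tolerance), otherwise roll back to 0%. The stages, tolerance and rollback rule are illustrative choices, not part of the original plan.

```python
# Hypothetical ramp schedule: percent of users exposed to Variant A at each stage.
RAMP_STAGES = [1, 5, 25, 50, 100]


def next_stage(current_percent, variant_metric, control_metric, tolerance=0.0):
    """Return the next rollout percentage given the latest metric readings."""
    # Guardrail: if the variant falls below control (beyond the tolerance), roll back.
    if variant_metric < control_metric * (1 - tolerance):
        return 0
    later = [p for p in RAMP_STAGES if p > current_percent]
    # Advance to the next stage, or stay at full rollout once 100% is reached.
    return later[0] if later else current_percent


print(next_stage(5, 0.1203, 0.0986))  # 25
```

In practice each stage should run long enough to re-check time spent and CTR with the same tests used in Q2 before the percentage is raised.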

Following a structured rollout plan with ongoing monitoring, feedback integration and effective communication helps ensure a positive user experience and maximizes the benefits to both users and the business. The speed of deployment should be determined by factors such as the complexity of the feature, technical feasibility and resource availability.