### MBAN 6110 Assignment 2

In [1]:
import numpy as np
import pandas as pd
from scipy import stats
import seaborn as sns

In [2]:
dt = pd.read_csv('/Users/jiawenli/Desktop/MBAN_6110T/Assignment 2/experiment_dataset.csv')
dt

Unnamed: 0.1,Unnamed: 0,Age,Location,Device,Variant,Time Spent,CTR
0,0,62,Location2,Device2,Control,13.928669,0.084776
1,1,18,Location1,Device1,Variant B,11.310518,0.096859
2,2,21,Location2,Device1,Variant B,24.842100,0.097630
3,3,21,Location1,Device3,Variant B,20.061300,0.109783
4,4,57,Location1,Device2,Variant B,34.495503,0.068579
...,...,...,...,...,...,...,...
995,995,39,Location2,Device2,Variant B,17.252030,0.092211
996,996,38,Location3,Device2,Control,30.075898,0.078151
997,997,60,Location2,Device3,Control,31.929223,0.125213
998,998,35,Location2,Device2,Variant B,14.680299,0.095423


##### Data Cleaning Process

Drop the unnecessary column: Unnamed:0

In [3]:
dt.drop('Unnamed: 0', axis=1, inplace = True)
dt.head(10)

Unnamed: 0,Age,Location,Device,Variant,Time Spent,CTR
0,62,Location2,Device2,Control,13.928669,0.084776
1,18,Location1,Device1,Variant B,11.310518,0.096859
2,21,Location2,Device1,Variant B,24.8421,0.09763
3,21,Location1,Device3,Variant B,20.0613,0.109783
4,57,Location1,Device2,Variant B,34.495503,0.068579
5,27,Location3,Device1,Variant B,26.129246,0.149341
6,37,Location3,Device3,Variant B,20.525362,0.095788
7,39,Location2,Device1,Variant A,21.525217,0.149985
8,54,Location3,Device2,Control,21.910608,0.135535
9,41,Location1,Device2,Variant A,27.642788,0.137266


In [4]:
dt.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Age         1000 non-null   int64  
 1   Location    1000 non-null   object 
 2   Device      1000 non-null   object 
 3   Variant     1000 non-null   object 
 4   Time Spent  1000 non-null   float64
 5   CTR         1000 non-null   float64
dtypes: float64(2), int64(1), object(3)
memory usage: 47.0+ KB


Check for missing values

In [5]:
dt.isnull().sum()

Age           0
Location      0
Device        0
Variant       0
Time Spent    0
CTR           0
dtype: int64

Great news, there is no missing value in the dataset. This means our dataset is clean, and ready to do data analysis.

In [6]:
dt.drop_duplicates()

Unnamed: 0,Age,Location,Device,Variant,Time Spent,CTR
0,62,Location2,Device2,Control,13.928669,0.084776
1,18,Location1,Device1,Variant B,11.310518,0.096859
2,21,Location2,Device1,Variant B,24.842100,0.097630
3,21,Location1,Device3,Variant B,20.061300,0.109783
4,57,Location1,Device2,Variant B,34.495503,0.068579
...,...,...,...,...,...,...
995,39,Location2,Device2,Variant B,17.252030,0.092211
996,38,Location3,Device2,Control,30.075898,0.078151
997,60,Location2,Device3,Control,31.929223,0.125213
998,35,Location2,Device2,Variant B,14.680299,0.095423


This means that there is no duplicate in the dataset. This is good.

For outliers, we decide not to exclude them since they could provide useful insights.

#### Q1. Analyze the result

(a) Before we can design our experiments, we need to make sure that users' behavior is similar. This means that we need to make sure that all users in different regions, and using different devices do not differ too much in terms of click-through rates and average time spent. Only on this basis can we start the design of the experiment between the control group and the feature.

In [8]:
# First, we group the user by location
dt.groupby('Location')[['Time Spent', 'CTR']].mean()

Unnamed: 0_level_0,Time Spent,CTR
Location,Unnamed: 1_level_1,Unnamed: 2_level_1
Location1,22.707286,0.110217
Location2,22.648998,0.108517
Location3,22.787691,0.108708


It looks like there is not much difference between the average time spent and click-through rate of the three locations. Let's verify each metric with an ANOVA test. If indeed there is no significant difference, then we will get a high p-value

In [14]:
# Perform an ANOVA test for Time spent
loc_1 = dt[dt['Location'] == 'Location1']['Time Spent']
loc_2 = dt[dt['Location'] == 'Location2']['Time Spent']
loc_3= dt[dt['Location'] == 'Location3']['Time Spent']

F_stats, p_value = stats.f_oneway(loc_1, loc_2, loc_3)
print("ANOVA test:")
print("F-statistic: {:.4f}".format(F_stats))
print("p-value: {:.4f}".format(p_value))

ANOVA test:
F-statistic: 0.0536
p-value: 0.9478


In [15]:
# Perform an ANOVA test for CTR
loc1 = dt[dt['Location'] == 'Location1']['CTR']
loc2 = dt[dt['Location'] == 'Location2']['CTR']
loc3= dt[dt['Location'] == 'Location3']['CTR']

F_statistics, p_value = stats.f_oneway(loc1, loc2, loc3)
print("ANOVA test:")
print("F-statistic: {:.4f}".format(F_statistics))
print("p-value: {:.4f}".format(p_value))

ANOVA test:
F-statistic: 0.5792
p-value: 0.5605


The result indicates the following: 
1. The p-value of 0.95 indicates that there is no significant difference in time spent by users in different locations.
2. The p-value of 0.56 indicates that there is no significant difference in CTR from users in different locations.

In [12]:
# Now, let's group by different devices
dt.groupby('Device')[['Time Spent', 'CTR']].mean()

Unnamed: 0_level_0,Time Spent,CTR
Device,Unnamed: 1_level_1,Unnamed: 2_level_1
Device1,22.635032,0.109634
Device2,22.890021,0.109868
Device3,22.612276,0.107993


Again, it looks like there is not much difference between the average time spent and click-through rate of the three device. Let's also verify each metric with an ANOVA test.

In [17]:
# Perform an ANOVA test for Time spent by different devices
device_1 = dt[dt['Device'] == 'Device1']['Time Spent']
device_2 = dt[dt['Device'] == 'Device2']['Time Spent']
device_3= dt[dt['Device'] == 'Device3']['Time Spent']

F_stats, p_value = stats.f_oneway(device_1, device_2, device_3)
print("ANOVA test:")
print("F-statistic: {:.4f}".format(F_stats))
print("p-value: {:.4f}".format(p_value))

ANOVA test:
F-statistic: 0.2665
p-value: 0.7661


In [18]:
# Perform an ANOVA test for CTR by different devices
device1 = dt[dt['Device'] == 'Device1']['CTR']
device2 = dt[dt['Device'] == 'Device2']['CTR']
device3= dt[dt['Device'] == 'Device3']['CTR']

F_statistics, p_value = stats.f_oneway(device1, device2, device3)
print("ANOVA test:")
print("F-statistic: {:.4f}".format(F_statistics))
print("p-value: {:.4f}".format(p_value))

ANOVA test:
F-statistic: 0.7106
p-value: 0.4916


The result indicates the following: 
1. The p-value of 0.77 indicates that there is no significant difference in time spent by users with different devices.
2. The p-value of 0.49 indicates that there is no significant difference in CTR from users with different devices.

##### (b) Click Through rate analysis:

We first calculate the average CTR for each variant, using a method that adds up the CTR values for all that variant and then divides them by the total number of data points for that variant.

Once the average hit rate for variant A and variant B is obtained, the two can be compared.

First, Calculate the average CTR for variant A

In [19]:
dt_var_A = dt[dt['Variant'] == 'Variant A']
dt_var_A

Unnamed: 0,Age,Location,Device,Variant,Time Spent,CTR
7,39,Location2,Device1,Variant A,21.525217,0.149985
9,41,Location1,Device2,Variant A,27.642788,0.137266
13,30,Location2,Device3,Variant A,26.208502,0.087875
15,56,Location2,Device2,Variant A,7.800901,0.069781
17,41,Location2,Device3,Variant A,32.699437,0.114626
...,...,...,...,...,...,...
983,48,Location2,Device1,Variant A,28.171759,0.144396
984,20,Location1,Device3,Variant A,31.648129,0.121159
985,37,Location1,Device3,Variant A,24.543317,0.137414
992,44,Location1,Device2,Variant A,20.161483,0.098470


In [20]:
(dt_var_A['CTR'].sum())/330

0.12026949300288214

Thus, the average hit rate for variant A is roughly 0.1203

Then, Calculate the average CTR for variant B

In [21]:
dt_var_B = dt[dt['Variant'] == 'Variant B']
dt_var_B

Unnamed: 0,Age,Location,Device,Variant,Time Spent,CTR
1,18,Location1,Device1,Variant B,11.310518,0.096859
2,21,Location2,Device1,Variant B,24.842100,0.097630
3,21,Location1,Device3,Variant B,20.061300,0.109783
4,57,Location1,Device2,Variant B,34.495503,0.068579
5,27,Location3,Device1,Variant B,26.129246,0.149341
...,...,...,...,...,...,...
989,31,Location2,Device2,Variant B,32.952763,0.098476
991,35,Location3,Device1,Variant B,22.409808,0.100951
995,39,Location2,Device2,Variant B,17.252030,0.092211
998,35,Location2,Device2,Variant B,14.680299,0.095423


In [22]:
(dt_var_B['CTR'].sum())/330

0.1089330399532712

Thus, the average hit rate for variant B is approximately 0.1089

Now, we can compare the two average hit rate for variant A and variant B  with 0.1203 and 0.1089 respectively.

Here we can see that the average click-through rate of variant A is higher than that of variant B, which indicates an elevated click-through rate for variant A.

##### (c) Analysis of the average time spent by clients for different variants:

We can calculate the average time spent by the client for each variant by adding the values of the time spent for the corresponding variant and dividing them by the total number of data for that variant.
Then compare the average time spent by the client on variant A and variant B, respectively.

First, Calculate the average time spent for variant A

In [23]:
(dt_var_A['Time Spent'].sum())/330

24.805547386576052

Thus, the average time spent on variant A is about 24.81

Next, calculate the average time spent for variant B

In [24]:
(dt_var_B['Time Spent'].sum())/330

23.343782979234575

The average time spent on variant B is roughly 23.34

The average time spent for variant A is slightly higher than that of variant B, which indicates that the time spent for variant A has been improved.

However, it is worth to mention that the difference between Variant A and Variant B is not particularly dramatic, both in terms of average time spent and click-through rate. Then it is better to use statistical testing to further analyze the increase and change in click-through rates between the variants

#### Q2. Conduct statistical testing to determine if there is a statistically significant difference between the features and the control group.

In [25]:
from scipy import stats

##### 2.1 Let us see the CTR first

In [26]:
#Get the CTR data for the control group and name it control_group
control_group = dt[dt['Variant'] == 'Control']['CTR']

#Get the CTR data for the variant A and name it variant_A
variant_A = dt[dt['Variant'] == 'Variant A']['CTR']

#Get the CTR data for the variant B and name it variant_B
variant_B = dt[dt['Variant'] == 'Variant B']['CTR']

Let's do t-test first, to see if there is a significant difference between the CTR of control group and Variant A:

Null hypothesis for Variant A: There is no significant difference between the Variant A and the control group in terms of CTR.
 
Alternative hypothesis for Variant A: There is a statistically significant difference between the Variant A and the control group in terms of CTR.

In [27]:
# Perform t-test between control group and Variant A
t_stats_a, p_value_a = stats.ttest_ind(control_group, variant_A)

Similarly, let's also do a t-test to see if there is a significant difference between the CTR of control group and that of Variant B:

Null hypothesis for Variant B: The mean of CTR for Variant B is equal to the mean of CTR for the control group.

Alternative hypothesis for Variant B: The mean of CTR in Variant B does not equal to the mean of CTR for the control group.

In [28]:
# Perform t-test between control group and Variant B
t_stats_b, p_value_b = stats.ttest_ind(control_group, variant_B)

Print out the test score amd make a comparison between Control Group and Variant A

In [29]:
# Print the t-statistic and p-value for each comparison
print("T-statistics: {:.4f}".format(t_stats_a))
print("p-value: {:.4f}".format(p_value_a))

T-statistics: -13.8294
p-value: 0.0000


Similarly, do the same thing amd make a comparison between Control Group and Variant B

In [30]:
print("T-statistics: {:.4f}".format(t_stats_b))
print("p-value: {:.4f}".format(p_value_b))

T-statistics: -6.4718
p-value: 0.0000


We found that the p-value of both tests was zero.
A p-value of zero indicates that there is extremely strong evidence against the null hypothesis. It suggests that if the null hypothesis is true, then it is highly likely that the observed data did not occur by chance.

We should reject the null hypothesis in favor of the alternative hypothesis. 
And so, there is a statistically significant difference between the Variant A and the control group in terms of CTR.

Similarly, there is a statistically significant difference between the Variant B and the control group in terms of CTR.

##### Perform ANOVA test

Null hypothesis: There is no significant difference among the groups.

Alternative hypothesis: There exists significant differences among the groups.

In [31]:
F_stats, p_value = stats.f_oneway(control_group, variant_A, variant_B)

##### Print the F-statistic and p-value

In [32]:
print("ANOVA test:")
print("F-statistic: {:.4f}".format(F_stats))
print("p-value: {:.4f}".format(p_value))

ANOVA test:
F-statistic: 93.5889
p-value: 0.0000


The resulting F-statistic with a value of 93.5889 measures the variability between groups as well as the ratio of variability within each group. The higher F-statistic indicates that there is a large variation between the groups.

Therefore, we reject the null hypothesis in favor of the alternative hypothesis. That is, there is a significant difference among the control groups and variant features.

##### 2.2 Next, let us focus on the Average Time spent

Simialrly, let's do t-test first, to see if there is a significant difference between the Average time spent for control group and Variant A:

Null hypothesis for Variant A: There is no significant difference between the Variant A and the control group in terms of Average time spent.
 
Alternative hypothesis for Variant A: There is a statistically significant difference between the Variant A and the control group in terms of Average time spent.

In [33]:
#Get the avg time spent data for the control group and name it control_group
control_group_time = dt[dt['Variant'] == 'Control']['Time Spent']

#Get the avg time spent data for the variant A and name it variant_A
variant_A_time = dt[dt['Variant'] == 'Variant A']['Time Spent']

#Get the avg time spent data for the variant B and name it variant_B
variant_B_time = dt[dt['Variant'] == 'Variant B']['Time Spent']

In [34]:
# Perform t-test between control group and Variant A
t_stats_a_time, p_value_a_time = stats.ttest_ind(control_group, variant_A_time)

# Perform t-test between control group and Variant B
t_stats_b_time, p_value_b_time = stats.ttest_ind(control_group, variant_B_time)

Print the t-statistic and p-value for variant A and control group

In [35]:
# Print the t-statistic and p-value for each comparison
print("T-statistics: {:.4f}".format(t_stats_a_time))
print("p-value: {:.4f}".format(p_value_a_time))

T-statistics: -91.7710
p-value: 0.0000


Print the t-statistic and p-value for variant B and control group

In [36]:
print("T-statistics: {:.4f}".format(t_stats_b_time))
print("p-value: {:.4f}".format(p_value_b_time))

T-statistics: -81.7996
p-value: 0.0000


The result of t-statistics and p-value for average time spent also indicates that there is a significant difference between the features and the control group.

##### 2.3 Let's figure out which variant is better by the mean

In [39]:
dt.groupby('Variant')[['Time Spent','CTR']].mean()

Unnamed: 0_level_0,Time Spent,CTR
Variant,Unnamed: 1_level_1,Unnamed: 2_level_1
Control,20.070781,0.098554
Variant A,24.805547,0.120269
Variant B,23.343783,0.108933


According to the results, variant A is a bit better.

#### Q3. Summarize your results. Make a recommendation to the engineering team about which feature to deploy. 

A p-value equal to 0 for both variant A and variant B compared to the control group would imply that the observed differences are extremely statistically significant. a p-value of 0 would indicate that the probability of obtaining the observed differences with the null hypothesis being true is essentially impossible.

Given that both variants are extremely significantly different compared to the control group, we can then conclude that both variant A and variant B outperform the control group to some extent.

Based on the statistical test and data analysis, I think we should deploy variant A. Because variant A shows a bigger gap in the result than B. Therefore, based on the measured results alone, my recommendation to the engineering team is to deploy variant A.


#### Q4. Create a roll-out plan. How quickly will you introduce the feature to your audience?

When creating a rollout plan to introduce a new feature to your audience, it is important to thoroughly and carefully consider factors such as the complexity of the new feature, potential risks that may exist, the impact on users, and the ability of your infrastructure to handle and host the increased usage. The following is the outline I designed for this rollout plan:

1. Begin by introducing the feature to only a small percentage of your audience.
2. Monitor the performance of new features. Gather feedbacks from the users.
3. Identify and modify any potential problems or bugs. Make improvements.
4. Incremental expansion. Gradually and incrementally expand the scope.
5. Iteratively evaluate the performance and make necessary adjustments to ensure user satisfaction.
