#### Analysing the Impact of Altering the Paywall Headline on Subscription Rate

An A/B test is a randomized experiment in which two or more variants of a variable are shown to different segments of customers to determine which variant is most effective at improving key performance indicators (KPIs). Here are some scenarios where A/B testing is used:

- Streaming services: Determining whether changing the movie recommendation algorithm increases user engagement
- E-commerce: Determining which product page layout results in the highest proportion of checkouts
- Product & service advertising: Determining whether using emojis in advertisement headlines results in higher click rates

In this case study, we will analyse the A/B test results of an online learning platform. The platform operates on a freemium model: some of its courses are free, but the advanced courses are locked behind a paywall. The platform provides a two-week trial for its advanced courses. At the end of the trial period, users are directed to a paywall. This A/B test compared two paywall variants with different headlines:


- "We hope you enjoyed learning with us. Please consider subscribing to learn more." (Current, Control)
- "Your trial has ended! Subscribe now so you dont miss out!" (Test)

In [19]:
# imports
import numpy as np
import pandas as pd
from scipy.stats import norm

import configparser

config = configparser.ConfigParser()
config.read('config.ini')

pd.options.display.max_columns=None
pd.options.display.max_rows=None
pd.options.display.max_colwidth=None

In [4]:
PATH = config['Paths']['ab_test_data']
df_ab = pd.read_csv(f'{PATH}ab_testing_results.csv', delimiter=",")
print(df_ab.info())

df_ab.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45883 entries, 0 to 45882
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   uid         45883 non-null  int64 
 1   country     45883 non-null  object
 2   gender      45883 non-null  object
 3   group       45883 non-null  object
 4   device      45883 non-null  object
 5   subscribed  45883 non-null  object
dtypes: int64(1), object(5)
memory usage: 2.1+ MB
None


        uid  country  gender    group   device  subscribed
0  72629692      FRA       F  control  android         Yes
1  25633647      GBR       F  control  android          No
2  31206551      BRA       M     test      ios          No
3  87162368      USA       M  control  android         Yes
4  88562222      USA       M     test  android          No


Before we can analyse the results, we need to ensure that the A/B test was deployed correctly. Random assignment is key for the test to yield trustworthy results. The distributions of country, gender and device should therefore be approximately the same in the control and test groups, which reduces the risk of a confounding variable affecting our results. The sample sizes of the two groups should also be large (more than 30 is a common rule of thumb) and roughly equal. Violating any of these criteria is sub-optimal practice and could lead to misleading test results.

In [13]:
# check dist of country for control & test groups
df_ab.groupby('group')[['country']].value_counts(normalize=True)

group    country
control  USA        0.301056
         BRA        0.196358
         MEX        0.125429
         DEU        0.082142
         TUR        0.078013
         FRA        0.062280
         GBR        0.060237
         ESP        0.042157
         CAN        0.030336
         AUS        0.021991
test     USA        0.309128
         BRA        0.196336
         MEX        0.115502
         DEU        0.078954
         TUR        0.076156
         GBR        0.062210
         FRA        0.061598
         ESP        0.042144
         CAN        0.035630
         AUS        0.022340
Name: proportion, dtype: float64

In [12]:
# check dist of gender and device for control and test groups
df_ab.groupby('group')[['gender', 'device']].value_counts(normalize=True)

group    gender  device 
control  M       android    0.255161
                 ios        0.250641
         F       ios        0.250120
                 android    0.244078
test     F       android    0.251814
         M       ios        0.251071
                 android    0.250721
         F       ios        0.246393
Name: proportion, dtype: float64

In [15]:
print(f'No. of customers in total sample: {df_ab.uid.nunique()}')

# check sample size of control and test groups
df_ab['group'].value_counts(normalize=True)

No. of customers in total sample: 45883


group
control    0.501471
test       0.498529
Name: proportion, dtype: float64

We can observe that the distributions of country, gender and device are similar in the control and test groups, and that the two groups are of nearly equal size, so the deployment is consistent with good A/B testing practice.
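The visual comparison of proportions can be complemented with a formal check. The following is a minimal sketch, assuming a chi-square test of independence between group assignment and each covariate is an acceptable balance check; the helper name `balance_pvalues` and the synthetic data are illustrative and are not part of the original analysis.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def balance_pvalues(df, group_col, covariates):
    """Chi-square test of independence between group assignment and each covariate.

    Returns a dict mapping covariate name -> p-value. A small p-value suggests
    the covariate is distributed differently across the groups.
    """
    results = {}
    for col in covariates:
        # contingency table of counts: rows = groups, columns = covariate levels
        table = pd.crosstab(df[group_col], df[col])
        chi2, p_value, dof, expected = chi2_contingency(table)
        results[col] = p_value
    return results


# Synthetic stand-in data for illustration only (NOT the platform's data)
rng = np.random.default_rng(0)
n = 2000
demo = pd.DataFrame({
    "group": rng.choice(["control", "test"], size=n),
    "gender": rng.choice(["F", "M"], size=n),
    "device": rng.choice(["android", "ios"], size=n),
})

balance = balance_pvalues(demo, "group", ["gender", "device"])
print(balance)
```

On the real data, the same helper could be called as `balance_pvalues(df_ab, 'group', ['country', 'gender', 'device'])`. With a sample this large, very small imbalances can produce small p-values, so the result is best read alongside the proportions themselves.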

We will now perform a hypothesis test to draw a conclusion from the A/B test.

#### Business Requirement

The platform considers the test paywall worth implementing only if its subscription rate exceeds that of the control group by more than 3 percentage points. Thus, the null and alternative hypotheses for our hypothesis test are as follows:

- `Hypo(0): p(test) - p(control) = 3%`
- `Hypo(1): p(test) - p(control) > 3%`

This is a one-sided two-sample proportion test, as we are testing whether `p(test) - p(control) > 3%`. We will conduct the test at the `5%` significance level, computing the test statistic under the assumption that `Hypo(0)` is true.
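For reference, the test statistic used in this analysis can be written out as follows. The notation is introduced here for illustration: $\hat{p}_T$ and $\hat{p}_C$ are the sample subscription rates of the test and control groups, $n_T$ and $n_C$ are the group sizes, and $\hat{p}$ is the pooled (weighted mean) proportion.

$$
\hat{p} = \frac{n_T\,\hat{p}_T + n_C\,\hat{p}_C}{n_T + n_C},
\qquad
z = \frac{(\hat{p}_T - \hat{p}_C) - 0.03}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n_T} + \dfrac{\hat{p}(1-\hat{p})}{n_C}}}
$$

Under `Hypo(0)`, $z$ is treated as approximately standard normal, so at the 5% level the one-sided test rejects `Hypo(0)` when $z$ exceeds roughly $1.645$.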

In [16]:
alpha = 0.05

# get sample proportions & sample sizes
sample_data = df_ab.groupby('group')['subscribed'].value_counts(normalize=True)
sample_size = df_ab['group'].value_counts()

sample_test = sample_data[('test','Yes')]
sample_control = sample_data[('control','Yes')]

size_test = sample_size['test']
size_control = sample_size['control']

# compute diff of sample proportions
print(sample_test-sample_control)

0.04007702404356617


So the subscription rate of the test sample is about 4 percentage points higher than that of the control sample. We need to determine whether it exceeds the 3-point threshold by a statistically significant margin.

In [17]:
# compute weighted mean of sample proportions
sample_weighted_mean = ((size_test * sample_test) + (size_control * sample_control))/(size_test + size_control)

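# calculate std error of diff in sample proportions using the pooled proportion
# (note: pooling assumes both groups share one proportion; with a non-zero
# hypothesised difference, an unpooled std error is a common alternative)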
sample_weighted_mean_se = np.sqrt((sample_weighted_mean * (1-sample_weighted_mean)) / size_test + 
                                  (sample_weighted_mean * (1-sample_weighted_mean)) / size_control)

# compute z-score
z_score = ((sample_test - sample_control) - 0.03)/sample_weighted_mean_se
print(z_score)

2.158522748076968
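The calculation can also be wrapped in a reusable function. This is a minimal sketch rather than the notebook's own code: the function name `two_proportion_ztest`, its parameters, and the example counts are hypothetical. It supports both the pooled standard error used in this analysis and the unpooled one, which is often preferred when the hypothesised difference is not zero.

```python
import numpy as np
from scipy.stats import norm


def two_proportion_ztest(x_test, n_test, x_control, n_control,
                         null_diff=0.0, pooled=True):
    """One-sided z-test of H1: p_test - p_control > null_diff.

    x_* are success counts, n_* are group sizes.
    Returns (z_score, p_value).
    """
    p_test = x_test / n_test
    p_control = x_control / n_control
    if pooled:
        # single proportion estimated from both groups combined
        p_pool = (x_test + x_control) / (n_test + n_control)
        se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_control))
    else:
        # each group contributes its own variance estimate
        se = np.sqrt(p_test * (1 - p_test) / n_test
                     + p_control * (1 - p_control) / n_control)
    z = (p_test - p_control - null_diff) / se
    p_value = norm.sf(z)  # upper-tail probability
    return z, p_value


# Hypothetical counts for illustration only (not taken from the dataset)
z_pooled, p_pooled = two_proportion_ztest(60, 100, 50, 100)
z_unpooled, p_unpooled = two_proportion_ztest(60, 100, 50, 100, pooled=False)
print(z_pooled, p_pooled)
print(z_unpooled, p_unpooled)
```

For this dataset, the counts could be taken from `value_counts()` without `normalize=True`, with `null_diff=0.03`.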


Now that we have calculated the test statistic, we can proceed to calculate the p-value. The p-value is defined as the probability of obtaining a result equal to or more extreme than the result observed, assuming the null hypothesis is true.

- p_value > alpha indicates that there is insufficient evidence to reject `Hypo(0)`
- p_value <= alpha indicates that there is evidence against `Hypo(0)`

In [24]:
# compute p-value
p_value = 1 - norm.cdf(z_score)

print(f'p-value: {round(p_value,3)}')
print(f'alpha value: {alpha}')
print()
print(f'p_value > alpha: {p_value > alpha}')
print(f'p_value <= alpha: {p_value <= alpha}')

p-value: 0.015
alpha value: 0.05

p_value > alpha: False
p_value <= alpha: True
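As a side note on the computation, `scipy.stats.norm` also provides the survival function `norm.sf`, which returns the upper-tail probability directly. The sketch below reuses the z-score printed earlier; the extreme value `z_extreme` is a hypothetical input chosen only to show where the two forms differ numerically.

```python
from scipy.stats import norm

z_score = 2.158522748076968  # z-score reported earlier in this analysis

# upper-tail probability; the same quantity as 1 - norm.cdf(z_score)
p_upper_tail = norm.sf(z_score)
print(round(p_upper_tail, 3))

# For a very large statistic, 1 - cdf loses precision to floating-point
# rounding, while sf still returns a small positive number.
z_extreme = 10.0  # hypothetical value, for illustration only
print(1 - norm.cdf(z_extreme), norm.sf(z_extreme))
```

For a z-score of the size seen here the two forms agree, so this only matters for much stronger effects.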


Since the p-value is less than the significance level, there is evidence against the null hypothesis, and we reject it. There is evidence at the 5% significance level that the subscription rate of the test group exceeds that of the control group by more than 3 percentage points. Hence, the platform should go ahead and implement the proposed paywall headline!

Moreover, we can calculate a 95% confidence interval for the difference in subscription rates as follows:

In [26]:
# margin of error: two-sided 95% critical value * std error
margin = norm.ppf(1 - 0.025) * sample_weighted_mean_se
conf_int_95 = [(sample_test - sample_control) - margin, (sample_test - sample_control) + margin]
print(conf_int_95)

[0.030926967962698025, 0.04922708012443431]


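We can be 95% confident that this interval contains the true increase in subscription rate from switching to the proposed paywall headline: intervals constructed this way capture the true difference in about 95% of repeated experiments. The lower bound of about 3.1 percentage points lies just above the 3-point threshold.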
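The interval above reuses the pooled standard error from the hypothesis test. A common alternative for confidence intervals is the unpooled (Wald) standard error, which does not assume that the two groups share one proportion. The sketch below illustrates that variant; the function name `diff_proportion_ci` and the example counts are hypothetical and are not taken from the dataset.

```python
import math
from scipy.stats import norm


def diff_proportion_ci(x_test, n_test, x_control, n_control, level=0.95):
    """Wald confidence interval for p_test - p_control (unpooled std error).

    x_* are success counts, n_* are group sizes.
    Returns (lower, upper).
    """
    p_test = x_test / n_test
    p_control = x_control / n_control
    # each group contributes its own variance estimate
    se = math.sqrt(p_test * (1 - p_test) / n_test
                   + p_control * (1 - p_control) / n_control)
    # two-sided critical value, e.g. about 1.96 for level=0.95
    z_crit = norm.ppf(1 - (1 - level) / 2)
    diff = p_test - p_control
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical counts for illustration only
lower, upper = diff_proportion_ci(x_test=60, n_test=100,
                                  x_control=50, n_control=100)
print(round(lower, 4), round(upper, 4))
```

With samples as large as the ones in this test, the pooled and unpooled intervals would be expected to be close, though not identical.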