## CLICK THROUGH RATE ($CTR$)

This dataset was provided by <name>. It is simulated data from an experiment that checks whether the click-through rate increases when the text on the website's engagement button is changed from "secure free trial" to "enroll now". 

In this notebook, a hypothesis test is conducted to determine whether there is a difference between the $CTRs$ of the experimental group and the control group.

The overall strategy is to examine the dataset to understand it, clean and prepare it for the test.

In [None]:
# in this cell, all the needed libraries are imported.
import numpy as np # numpy for linear algebra.
import pandas as pd # pandas for data preparation and cleaning.
import plotly.express as px # plotly express for graphing
import plotly.graph_objects as go # graph objects for the custom distribution plots
from scipy.stats import chi2_contingency, norm, chi2 # chi2_contingency and chi2 for the chi-squared test, norm for the z-test.
color_palette = ['#E74C3C','#2ECC71'] # definition of color palette.

In [2]:
# load the dataset and examine five randomly sampled rows.
rec_sys_df = pd.read_csv('./data/ab_test_click_data.csv')
rec_sys_df.sample(5)

Unnamed: 0,user_id,click,group,timestamp
1540,1541,1,exp,2024-01-02 01:40:00
9809,9810,1,exp,2024-01-07 19:29:00
6193,6194,1,exp,2024-01-05 07:13:00
16407,16408,1,con,
12803,12804,0,con,


The dataset contains four variables:
- user_id: unique identifier for all the users in the experiment.
- click: a binary column having a 1 if a user clicked and 0 if otherwise.
- group: contains the two groups of the experiment. `exp` for the experiment group and `con` for the control group.
- timestamp: the time in which the click event happened.

The next cell takes a peek at some basic information about the dataset.

In [3]:
# display the metadata of the dataset.
rec_sys_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   user_id    20000 non-null  int64 
 1   click      20000 non-null  int64 
 2   group      20000 non-null  object
 3   timestamp  10000 non-null  object
dtypes: int64(2), object(2)
memory usage: 625.1+ KB



The dataset consists of $20k$ entries indexed from $0$ to $19,999$. The `user_id` and `click` columns have the `int64` dtype, while `group` and `timestamp` have the generic pandas `object` dtype, which here holds strings. 

While all other columns are fully populated, `timestamp` has only $10k$ non-null values, which is half of the rows. 

Finally, the total memory in use is about 625 KB.

In [4]:
# check for duplicates
rec_sys_df.duplicated().sum()

np.int64(0)

There is no duplicated row! 

In [5]:
# check the rows for null entries.
rec_sys_df.isna().sum()

user_id          0
click            0
group            0
timestamp    10000
dtype: int64

As was seen earlier, the `timestamp` column has $10k$ null entries. Since the test relies only on the `group` and `click` columns, the timestamp will not be used in this test. 
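Before setting the column aside, it can be worth checking whether the missing timestamps are spread evenly or concentrated in one group. The sketch below is only an illustration: `toy_df` is a small made-up frame that mimics the column layout, not the experiment data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame that mimics the column layout of the dataset.
toy_df = pd.DataFrame({
    'group': ['exp', 'exp', 'exp', 'con', 'con', 'con'],
    'click': [1, 0, 1, 0, 1, 0],
    'timestamp': ['2024-01-02 01:40:00', '2024-01-05 07:13:00', np.nan,
                  np.nan, np.nan, np.nan],
})

# Share of missing timestamps within each group
# (0.0 means none are missing, 1.0 means all are missing).
missing_share = toy_df['timestamp'].isna().groupby(toy_df['group']).mean()
print(missing_share)
```

Running the same expression on `rec_sys_df` would show how the $10k$ null entries are split between `exp` and `con`.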

In [6]:
# display a statistical summary of the numeric variables of the dataset.
rec_sys_df.describe()

Unnamed: 0,user_id,click
count,20000.0,20000.0
mean,10000.5,0.40525
std,5773.647028,0.490953
min,1.0,0.0
25%,5000.75,0.0
50%,10000.5,0.0
75%,15000.25,1.0
max,20000.0,1.0


The statistics of `user_id` carry little meaning because it is only an identifier. For the `click` variable, the average is $0.40525$, which is the overall click-through rate because `click` is binary.

Although summary statistics of pandas' `object` columns are less commonly inspected in practice, they can reveal useful information.

In [7]:
# statistics of the object variables.
rec_sys_df.describe(exclude='number')

Unnamed: 0,group,timestamp
count,20000,10000
unique,2,10000
top,exp,2024-01-07 22:23:00
freq,10000,1


The important variable to look at here is `group`. The total count is $20k$, as seen before. There are $2$ unique entries: the experimental group and the control group. The output reports the experimental group as the most frequent value, but its frequency is $10k$, exactly half of the rows, which shows that the two groups are the same size.

The following cells take a closer look at the distribution of each variable. Because the variables of interest are categorical and mostly binary, the outcome is largely predictable, but it is better to confirm than to assume.

> `variable_count()` was intended to be used only for this experiment. If there is a need to use it elsewhere, it has to be modified so that the total number of observations per group is detected or passed as an argument, instead of being hard-coded in the first block of code. Hard-coding is easier here since each group has the same number of observations.

In [8]:
# define a function to count the unique entries of any column. call the function to display the count of the click column
def variable_count(df, group_col:str, filter_col:str=None):

    '''variable_count counts the total occurrence of the unique values in a column with the corresponding
    percentages.

    Arguments:
    ===============================
    df: a dataframe object. 
    group_col: a string object. the name of the column of interest.
    filter_col: an optional string object. if given, its values are counted within each value of group_col.

    Return:
    ===============================
    returns a dataframe object.
    '''
    if filter_col:
        distribution_count = df.groupby(group_col)[filter_col].value_counts().reset_index()
        # each group has exactly 10000 observations in this experiment, hence the hard-coded divisor.
        distribution_count['percentage'] = distribution_count['count'] / 10000 * 100
        distribution_count['label'] = distribution_count['percentage'].round(2).astype(str) + '%'
        # cast to string so that plotly treats the values as categories.
        distribution_count[filter_col] = distribution_count[filter_col].astype(str)
        return distribution_count
    click_variable_count = df[group_col].value_counts().reset_index()
    click_variable_count['percentage'] = click_variable_count['count'] / df.shape[0] * 100
    click_variable_count['label'] = click_variable_count['percentage'].round(2).astype(str) + '%'
    return click_variable_count

click_variable_count = variable_count(rec_sys_df, group_col='click')
click_variable_count

Unnamed: 0,click,count,percentage,label
0,0,11895,59.475,59.48%
1,1,8105,40.525,40.52%


The entries of the `click` variable were counted. The experiment recorded more 'no click' than 'click' events: 'no click' makes up $59.475\%$ of the observations while 'click' makes up the remaining $40.525\%$. 

In the next cell, the above outcome is visualized in a bar chart.

In [9]:
# plot the click count 
fig = px.bar(
    click_variable_count.astype({'click': str}),  # cast click to str so plotly applies the discrete palette
    x='click', 
    y='count', 
    color='click', 
    text='label',
    title='Click Distribution Across The Experimental and Control Groups',
    color_discrete_sequence=color_palette 
    )
fig

As was seen earlier, the `group` variable is divided into: 'experiment' and 'control'. In the cell that follows, the distribution is examined.

In [10]:
# call the variable_count function to count the entries in group. 
group_count = variable_count(rec_sys_df, 'group')
group_count

Unnamed: 0,group,count,percentage,label
0,exp,10000,50.0,50.0%
1,con,10000,50.0,50.0%


From the table above, the users are split equally between the two groups, with each group making up $50\%$ of the experiment. The Group Distribution Plot below visualizes the table.

In [11]:
# plot the group distribution across the dataframe.
fig = px.bar(
    group_count, 
    x='group', 
    y='count', 
    color='group', 
    text='label',
    title='Group Distribution Plot' 
    )
fig

### CLICK DISTRIBUTION ACROSS THE TWO GROUPS.

The final phase of the exploratory analysis is to understand how the clicks are distributed across the two groups. This provides a first impression of which group had more interaction in the experiment. The resulting table is shown first, followed by its plot.

In [12]:
# call variable_count() to count the clicks per group, passing 'group' to group_col and 'click' to filter_col
click_distribution_across_groups = variable_count(rec_sys_df, group_col='group', filter_col='click')
click_distribution_across_groups

Unnamed: 0,group,click,count,percentage,label
0,con,0,8011,80.11,80.11%
1,con,1,1989,19.89,19.89%
2,exp,1,6116,61.16,61.16%
3,exp,0,3884,38.84,38.84%
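The hard-coded group size of $10000$ in `variable_count()` could be avoided by deriving each group's total from the data. The following is a minimal sketch of that idea, not a replacement for the function above: `share_within_group` and `toy_df` are hypothetical names, and the toy data deliberately has unequal group sizes.

```python
import pandas as pd

def share_within_group(df: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Count each value of value_col inside every group and express it as a
    percentage of that group's own total (hypothetical helper)."""
    counts = df.groupby([group_col, value_col]).size().reset_index(name='count')
    # transform('sum') broadcasts each group's total back onto its rows.
    group_totals = counts.groupby(group_col)['count'].transform('sum')
    counts['percentage'] = counts['count'] / group_totals * 100
    return counts

# Made-up data with unequal group sizes (4 'exp' rows, 2 'con' rows).
toy_df = pd.DataFrame({
    'group': ['exp', 'exp', 'exp', 'exp', 'con', 'con'],
    'click': [1, 1, 1, 0, 0, 1],
})
result = share_within_group(toy_df, 'group', 'click')
print(result)
```

Because the divisor is computed per group, the percentages within every group add up to $100$ even when the groups differ in size.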


In [13]:
fig = px.bar(
    click_distribution_across_groups,
    x='group',
    y='count',
    text='label',
    title='Click Distribution Across The Experimental and Control Groups',
    color='click',
    barmode='group',
    color_discrete_sequence=color_palette 
)
fig

In the control group, out of $10,000$ users, $8,011$ didn't click the 'secure free trial' button while $1,989$ clicked it. In the experiment group, out of $10,000$ users, $6,116$ clicked on the 'enroll now' button and $3,884$ didn't. This already suggests a difference in $CTR$ between the two groups. The following sections test whether this difference is real or could have arisen by chance. The testing starts with the chi-squared test of independence.

## CHI-SQUARED TEST OF INDEPENDENCE.

The aim is to use the chi-squared test to test if `click` variable is independent of the button type. The hypothesis would be stated as follows:
$$
\begin{aligned}
H_{0}: & \; \text{click is independent of button text} \\
H_{a}: & \; \text{click is not independent of button text}
\end{aligned}
$$

The objective is to test the above hypothesis using the dataset. The new button text is supported if there is enough statistical evidence against $H_{0}$ at a given $\alpha$-level. To do this, a contingency table is used. A contingency table is simply a table containing the observed click and no-click counts across the control and experimental groups. For this test, it is constructed from the `group` and `click` variables with pandas' `crosstab()` function. 
> There is a vast literature on the chi-squared test that explains how to compute the expected frequency table under the assumption that $H_{0}$ is true. Here, only the chi-squared test statistic is stated.

The formula for the chi-squared test statistic follows:

$$
\chi^{2} = \sum_{i=1}^r \sum_{j=1}^c \frac{(O_{ij}-E_{ij})^2}{E_{ij}}
$$

Where:

- $O_{ij} = \text{Observed frequency in row}\; i \; \text{column}\;j$
- $E_{ij} = \text{Expected frequency in row}\; i \; \text{column}\;j$
- $r = \text{ number of rows}$
- $c = \text{ number of columns}$

And:
$$
E_{ij} = \frac{(\text{row total of } i)(\text{column total of } j)}{\text{grand total}}
$$
With degrees of freedom:

$$
df = (r-1)(c-1)
$$

With the above, a $p$-value is calculated.

So, for a given $\alpha$, if $p \leq \alpha$, there is enough statistical evidence to reject $H_{0}$. Below is the contingency table for the dataset:



In [14]:
# construct a contingency table using pd.crosstab() passing 'group' and 'click' as row and column respectively.
contigency_table = pd.crosstab(rec_sys_df['group'],rec_sys_df['click'])
contigency_table

click,0,1
group,Unnamed: 1_level_1,Unnamed: 2_level_1
con,8011,1989
exp,3884,6116


For the actual chi-squared test, `chi2_contingency()` is used with the following $\alpha$ value:

$$
\alpha = 0.05
$$

In [15]:
# unpack the values returned by chi2_contingency() after passing the contingency table to it.
chi_2, p_value, degree_of_freedom, expected = chi2_contingency(contigency_table)
print(f'Chi-Square Statistic: {chi_2:.4f}') # the test statistic
print(f'p-value: {p_value:.10f}') # the p-value
print(f'Degree of Freedom: {degree_of_freedom}') # degree of freedom
print('Expected Frequencies:')
print(expected) # expected frequency table.

Chi-Square Statistic: 3531.5957
p-value: 0.0000000000
Degree of Freedom: 1
Expected Frequencies:
[[5947.5 4052.5]
 [5947.5 4052.5]]
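As a cross-check, the expected frequencies and the statistic can be rebuilt by hand from the formulas stated earlier. One detail matters here: for a table with one degree of freedom, `chi2_contingency()` applies Yates' continuity correction by default (`correction=True`), so the plain formula gives a slightly different value. The sketch below uses the observed counts from the contingency table; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts copied from the contingency table
# (rows: con, exp; columns: click = 0, click = 1).
observed = np.array([[8011, 1989],
                     [3884, 6116]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()

# E_ij = (row total of i)(column total of j) / grand total
expected_manual = row_totals * col_totals / grand_total

# Plain Pearson statistic, exactly as in the formula.
chi2_plain = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Yates' continuity correction: shrink each |O - E| by 0.5 before squaring.
chi2_yates = ((np.abs(observed - expected_manual) - 0.5) ** 2 / expected_manual).sum()

# SciPy's value with its default settings, for comparison.
chi2_scipy, _, _, _ = chi2_contingency(observed)

print(expected_manual)
print(f'plain: {chi2_plain:.4f}, Yates: {chi2_yates:.4f}, scipy: {chi2_scipy:.4f}')
```

The corrected value agrees with the statistic printed above, while the uncorrected one is slightly larger. With counts this large the correction has no practical effect on the conclusion.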


In [None]:
# Parameters
df = degree_of_freedom
alpha = 0.05
chi2_stat = chi_2

chi2_crit = chi2.ppf(1 - alpha, df)

# x and y values for chi-squared distribution
x = np.linspace(0, chi2.ppf(0.999, df), 500)
y = chi2.pdf(x, df)

# Create Plotly figure
fig = go.Figure()

# Add chi-squared distribution curve
fig.add_trace(go.Scatter(x=x, y=y, mode='lines', name=f'Chi-Squared(df={df})'))

# Shade rejection region (right tail)
fig.add_trace(go.Scatter(
    x=x[x > chi2_crit],
    y=y[x > chi2_crit],
    fill='tozeroy',
    mode='none',
    name='Rejection Region',
    fillcolor='rgba(255,0,0,0.4)'
))

# Vertical line for test statistic (off-scale)
fig.add_trace(go.Scatter(
    x=[chi2_stat, chi2_stat],
    y=[0, 0.1],  # small stub, since it's far off
    mode='lines',
    line=dict(color='green', dash='dash', width=2),
    name=f'Test Statistic = {chi2_stat:.2f} (off-chart)'
))

# Vertical line for chi-squared critical value
fig.add_trace(go.Scatter(
    x=[chi2_crit, chi2_crit],
    y=[0, chi2.pdf(chi2_crit, df)],
    mode='lines',
    line=dict(color='blue', dash='dash', width=2),
    name=f'Chi2 Critical = {chi2_crit:.2f}'
))

# Update layout with tighter zoom & stretched y-axis
fig.update_layout(
    title=f'Chi-Squared Test (df={df}, α={alpha})<br>Statistic = {chi2_stat:.2f}, Critical = {chi2_crit:.2f}',
    xaxis_title='Chi-Squared Value',
    yaxis_title='Probability Density',
    template='simple_white',
    xaxis=dict(range=[0, 10]),   # zoom in around the rejection region
    yaxis=dict(range=[0, 2.0])   # << stretch y-axis to see the peak
)

fig.show()

From the print-out, the test statistic is very large, which is strong evidence against the null hypothesis. With the $p$-value practically zero (it is smaller than the ten decimal places printed), there is enough statistical evidence to reject $H_{0}$. Therefore, click behaviour is not independent of the text on the button. 

The chart above visualizes the chi-squared distribution along with the rejection region. The test statistic lies far off the chart to the right, deep inside the rejection region.


### Z-Test.

The z-test helps to determine whether the difference between the two groups' $CTR$ is significant. Even though it does not by itself quantify the magnitude, the sign of the test statistic gives insight into the direction of the difference if the null hypothesis is rejected.

> Because the populations' standard deviations are not known, the ideal test would have been a t-test. But because both samples are large, a z-test can be applied, since the Central Limit Theorem guarantees that the sampling distribution of the $CTR$ is approximately normal.

The procedure for conducting the test starts with stating the hypotheses.
Let $p_{1}$ and $p_{2}$ denote the $CTR$ of the experimental group and the control group respectively. Then,

$$
\begin{aligned}
H_{0}: & \; p_{1} - p_{2} = 0 \\
H_{a}: & \; p_{1} - p_{2} \neq 0.
\end{aligned}
$$

The test statistic $Z$ is given as:
$$
Z = \frac{\bar{p}_{1}  - \bar{p}_{2}}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}}
$$

Where:

- $\bar{p}_{1}$ and $\bar{p}_{2}$ are the estimated $CTR$s of population $1$ and population $2$ respectively,
- $\hat{p}$ is the pooled estimate of the $CTR$ across both groups, given by:
$$
\hat{p} = \frac{\bar{p}_{1}n_{1}  + \bar{p}_{2}n_{2}}{n_{1} + n_{2}}
$$
- $n_{1}$ and $n_{2}$ are the sample sizes of population $1$ and population $2$ respectively.

For a given $\alpha$, the objective is to find enough statistical evidence in the sample data to reject the null hypothesis. Just like in the chi-squared test, the $p$-value is computed and compared with $\alpha$. If the $p$-value is less than or equal to $\alpha$, the null hypothesis is rejected.

To proceed with the test, the needed values stated above are computed in the next cell.

In [None]:
# define group_count() to compute the total number of observations in the two groups.
def group_count(df: pd.DataFrame, col: str):
    '''
    takes a dataframe and a column name, counts the values of that column and returns the number of
    observations in each of its two groups.

    Arguments
    ====================================
    df: pandas dataframe object. The dataframe to be filtered.
    col: string object. the name of the column in df that holds the groups.
    N/B: col must have exactly two unique groups. The group that appears first in df is treated
    as the experimental group and the other as the control group.

    Return
    ===================================
    Returns the number of observations in each group as (N_exp, N_con).
    '''
    groups = df[col].unique()
    if len(groups) != 2:
        # raise instead of returning None so that tuple unpacking by the caller fails with a clear message.
        raise ValueError(f'{col} must have exactly two groups, found {len(groups)}')
    count_by_group = df[col].value_counts()

    exp_group = groups[0]
    con_group = groups[1]

    N_exp = count_by_group.loc[exp_group]
    N_con = count_by_group.loc[con_group]

    print('Experimental Group: ', N_exp)
    print('Control Group: ', N_con)

    return N_exp, N_con

N_exp, N_con = group_count(rec_sys_df, 'group')


Experimental Group:  10000
Control Group:  10000


The figures above have appeared before, which confirms that the function works as intended. In the next cell, $\bar{p}_{1}$ and $\bar{p}_{2}$ are calculated. 

In [None]:
def p_bar(df: pd.DataFrame, col: list):

    '''
    p_bar calculates the estimated CTR for two groups.

    Arguments:
    ==============================
    df: A pandas dataframe containing the two groups.

    col: A list object containing the name of the group column and the name of the click column, in that order.

    Return:
    returns two floating point numbers which are the estimated CTRs for the control and experimental groups, in that order.

    N/B: there must be exactly two groups.
    '''

    if len(df[col[0]].unique()) > 2:
        print(f'{col[0]} has more than two groups')
        return
    if len(df[col[0]].unique()) < 2:
        print(f'{col[0]} has less than two groups')
        return

    # use the dataframe passed in, not the global rec_sys_df.
    N_exp, N_con = group_count(df, col[0])
    
    # sort=False keeps the groups in order of appearance, matching group_count().
    total_clicks_by_group = df.groupby(col[0], sort=False)[col[1]].sum()
    groups = total_clicks_by_group.index.tolist()
    exp_group = groups[0]
    con_group = groups[1]
    p_bar_con = total_clicks_by_group.loc[con_group] / N_con
    p_bar_exp = total_clicks_by_group.loc[exp_group] / N_exp

    print(f'Control Group Sample CTR Estimate {(p_bar_con)}')
    print(f'Experimental Group Sample CTR Estimate {(p_bar_exp)}')

    return p_bar_con, p_bar_exp

p_bar_con, p_bar_exp = p_bar(rec_sys_df, ['group', 'click'])


Experimental Group:  10000
Control Group:  10000
Control Group Sample CTR Estimate 0.1989
Experimental Group Sample CTR Estimate 0.6116


The estimated $CTR$s for the two groups are displayed above. The experimental group has a higher $CTR$ compared to the control group, as seen before. 

To justify the use of the z-test, the next cell defines a function that checks whether the sampling distributions of the two estimated $CTR$s are approximately normal. 
> If the sampling distributions of the estimated CTRs are approximately normal, it can be shown that the sampling distribution of their difference is also normal. The proof is beyond the scope of this study.

In [None]:
def group_normality_test(group_con_num: int, group_exp_num: int, group_con_CTR: float, group_exp_CTR: float):
    '''
    group_normality_test checks whether the sampling distribution of the CTR of each of the two groups
    is approximately normal, by requiring at least 5 expected successes and 5 expected failures.

    Arguments:
    =============================
    group_con_num: int object. number of observations in the control group.
    group_exp_num: int object. number of observations in the experiment group.
    group_con_CTR: float object. estimated CTR of the control group.
    group_exp_CTR: float object. estimated CTR of the experiment group.
    '''
    group_con_success_count = group_con_num * group_con_CTR
    group_con_fail_count = group_con_num * (1 - group_con_CTR)

    group_exp_success_count = group_exp_num * group_exp_CTR
    group_exp_fail_count = group_exp_num * (1 - group_exp_CTR)

    if group_con_success_count >= 5 and group_con_fail_count >= 5:
        print('The Sampling Distribution of The Control Group is Approximately Normal')
    else:
        print('The Sampling Distribution of The Control Group is Not Approximately Normal')

    if group_exp_success_count >= 5 and group_exp_fail_count >= 5:
        print('The Sampling Distribution of The Experiment Group is Approximately Normal')
    else:
        print('The Sampling Distribution of The Experiment Group is Not Approximately Normal')

group_normality_test(N_con, N_exp, p_bar_con, p_bar_exp)

The Sampling Distribution of The Control Group is Approximately Normal
The Sampling Distribution of The Experiment Group is Approximately Normal


Now, it is safe to assume that the sampling distribution of the point estimator of the difference between the experiment group $CTR$ and the control group $CTR$ is approximately normal.

In the next cell, the point estimator of the difference between the groups' $CTRs$ and the pooled $CTR$ estimate $\hat{p}$ are computed using the formulas stated above.

In [None]:
# subtract the CTR of the control group from the CTR of the experiment group 
point_est_of_the_diff_btn_con_and_exp = p_bar_exp - p_bar_con
print(f'point estimator of the difference between exp group and con group:  {point_est_of_the_diff_btn_con_and_exp:.2f}')

# calculate the pooled CTR estimate, the p_hat
pooled_point_estimate = ((N_con * p_bar_con) + (N_exp * p_bar_exp))/(N_con + N_exp)
print('pooled CTR estimate: ', pooled_point_estimate)

point estimator of the difference between exp group and con group:  0.41
pooled CTR estimate:  0.40525


The point estimator for the difference between the two groups is approximately $0.41$ while the pooled $CTR$ estimate is $0.40525$. The standard error and the z-statistic are computed in the next cell.

In [None]:
# Standard error using the formula stated above
standard_error = np.sqrt(pooled_point_estimate*(1-pooled_point_estimate)*((1/N_con)+(1/N_exp)))

# z_statistic using the formula stated above
z_statistic = point_est_of_the_diff_btn_con_and_exp/standard_error
print(f'standard error: {standard_error:.4f}',f'z_statistic: {z_statistic:.4f}')

standard error: 0.0069 z_statistic: 59.4416
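The steps above can also be collected into a single function, which makes the calculation easier to reuse and to check. This is a minimal sketch that mirrors the pooled formula stated earlier; `two_proportion_z_test` is a hypothetical helper, and the counts are the ones reported in the click distribution table.

```python
import numpy as np
from scipy.stats import norm

def two_proportion_z_test(clicks_1, n_1, clicks_2, n_2):
    """Pooled two-proportion z-test (hypothetical helper).
    Returns the z statistic and the two-tailed p-value."""
    p_1 = clicks_1 / n_1
    p_2 = clicks_2 / n_2
    # Pooled CTR under H0: both groups share one click probability.
    p_pooled = (clicks_1 + clicks_2) / (n_1 + n_2)
    se = np.sqrt(p_pooled * (1 - p_pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / se
    p_val = 2 * norm.sf(abs(z))
    return z, p_val

# Experiment group: 6116 clicks out of 10000; control group: 1989 out of 10000.
z, p_val = two_proportion_z_test(6116, 10_000, 1989, 10_000)
print(f'z = {z:.4f}, z squared = {z**2:.2f}')
```

For a $2 \times 2$ table, the square of this z statistic equals the Pearson chi-squared statistic computed without a continuity correction, which is why the two tests lead to the same conclusion here.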


$59.4416$ is an extremely large value for a z-test statistic. It results from the large observed difference (about $0.41$) relative to the very small standard error ($0.0069$). This already suggests that the null hypothesis will be rejected. The $p$-value is computed in the next cell.

In [None]:
# since it is a two-tailed test, the one-tailed p-value is doubled.
p_value = 2 * norm.sf(abs(z_statistic))
print('p-value: ', p_value)

p-value:  0.0
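The printed value of `0.0` does not mean the probability is exactly zero. The true $p$-value is smaller than the smallest positive number a 64-bit float can hold (on the order of $10^{-308}$), so it underflows. Working on the log scale avoids this. The sketch below assumes the same counts as before and uses `norm.logsf`, the logarithm of the survival function; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Rebuild the pooled z statistic from the reported CTRs.
p_exp, p_con, n = 6116 / 10_000, 1989 / 10_000, 10_000
p_pooled = (p_exp + p_con) / 2  # equal group sizes, so the pooled CTR is the simple mean
z = (p_exp - p_con) / np.sqrt(p_pooled * (1 - p_pooled) * (2 / n))

# Natural log of the two-tailed p-value: log(2) + log(P(Z > |z|)).
log_p = np.log(2) + norm.logsf(abs(z))

# Convert to a base-10 exponent for readability.
log10_p = log_p / np.log(10)
print(f'log10(p-value) is roughly {log10_p:.0f}')
```

The base-10 exponent gives a sense of just how small the $p$-value is, even though it cannot be represented directly as a float.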


In [None]:
# Parameters needed for the plot
mu = 0 # the mean of a standard normal distribution
sigma = 1 # the standard deviation of a standard normal distribution
alpha = 0.05 # the given alpha value for the test
Z_crit = norm.ppf(1 - alpha / 2) # the boundary of the rejection regions
Test_stat = z_statistic  # the computed test statistic

# x and y values for normal distribution
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 500)
y = norm.pdf(x, mu, sigma)

# Create a plotly graphic object
fig = go.Figure()

# Add the standard normal curve to the graphic object
fig.add_trace(go.Scatter(x=x, y=y, mode='lines', name='Standard Normal Distribution'))

# add and shade the left side rejection region (left)
fig.add_trace(go.Scatter(
    x=x[x < -Z_crit],
    y=y[x < -Z_crit],
    fill='tozeroy',
    mode='none',
    name='Rejection Region',
    fillcolor='rgba(255,0,0,0.4)'
))

# add and shade the right side rejection region (right)
fig.add_trace(go.Scatter(
    x=x[x > Z_crit],
    y=y[x > Z_crit],
    fill='tozeroy',
    mode='none',
    fillcolor='rgba(255,0,0,0.4)',
    showlegend=False
))

# Long vertical line for test statistic
fig.add_trace(go.Scatter(
    x=[Test_stat, Test_stat],
    y=[0, 0.4],  # Stretches the line vertically
    mode='lines',
    line=dict(color='green', dash='dash', width=2),
    name=f'Test Statistic = {Test_stat:.2f}'
))

#  the rejection region boundary (Z-critical lines, right)
fig.add_trace(go.Scatter(
    x=[Z_crit, Z_crit],
    y=[0, 0.4],
    mode='lines',
    line=dict(color='blue', dash='dash'),
    name=f'Z-critical = ±{Z_crit:.2f}'
))

#  the rejection region boundary (Z-critical lines, left)
fig.add_trace(go.Scatter(
    x=[-Z_crit, -Z_crit],
    y=[0, 0.4],
    mode='lines',
    line=dict(color='blue', dash='dash'),
    showlegend=False
))

# Update layout (extend y-axis)
fig.update_layout(
    title='Gaussian Distribution with Rejection Region<br>(A/B Testing for LunarTech CTA Button)',
    xaxis_title='Z-value',
    yaxis_title='Probability Density',
    yaxis=dict(range=[0, 0.55]),  # Extend y-axis range
    legend=dict(x=0.4, y=0.99),
    template='simple_white'
)

fig.show()


From the plot above, one can easily see how far into the right rejection region of the standard normal plot the test statistic lies. This is strong statistical evidence against the null hypothesis.

In the cell below, a function that uses the $p$-value to validate the test is defined.

In [None]:
def z_test_validator(alpha, p_value):
    '''z_test_validator is used to validate the z-test by comparing the p_value with the alpha value.

    Arguments:
    =============================
    alpha: a float object, the given alpha value.
    p_value: a float object calculated from the test.
    '''
    if p_value <= alpha:
        print(f'With an alpha value of {alpha} and a p_value of {p_value}, \nthere is enough statistical evidence to reject the null hypothesis')
    else:
        print(f'With an alpha value of {alpha} and a p_value of {p_value}, \nthere is not enough statistical evidence to reject the null hypothesis')


z_test_validator(alpha=alpha, p_value=p_value)

With an alpha value of 0.05 and a p_value of 0.0, 
there is enough statistical evidence to reject the null hypothesis


In the cell that follows, a $100(1-\alpha)\%$ confidence interval is constructed to further support the claim. The expectation is that the hypothesized difference in $CTR$ $(0)$ should not lie in the resulting $95\%$ confidence interval.

For an $\alpha$ of $0.05$, the confidence interval $CI$ is given by:
$$
CI = [(\bar{p}_1 - \bar{p}_{2}) \pm ME]
$$
where $ME$, the margin of error, is given by:
$$
ME = Z_{0.025} \cdot \sqrt{\frac{\bar{p}_{1}(1-\bar{p}_{1})}{n_{1}}+\frac{\bar{p}_{2}(1-\bar{p}_{2})}{n_{2}}}
$$

> The standard error here is different from the one used in the hypothesis test. In the hypothesis test, the standard error is based on the assumption that the null hypothesis is true, in which case the variability of both groups is pooled. In confidence interval estimation, the observed variability of each group is used to compute the standard error.

In [None]:
def confidence_interval_calculator(z_crit, p_bar_exp, p_bar_con, N_exp, N_con):
    '''
    confidence_interval_calculator computes the confidence interval of the difference between two population proportions.

    Parameters:
    ================================
    z_crit: floating point object. the z critical value of the experiment.

    p_bar_exp: floating point object. The estimated proportion of group 1.

    p_bar_con: floating point object. The estimated proportion of group 2.
    
    N_exp: int object. Number of observations in group 1.

    N_con: int object. Number of observations in group 2.

    Return: The function returns a list object with the first item being the lower bound of the 
    interval and the second item being the upper bound of the interval
    '''

    p_exp_success = p_bar_exp
    p_exp_failure = 1 - p_exp_success

    p_con_success = p_bar_con
    p_con_failure = 1 - p_con_success

    me = z_crit * np.sqrt(((p_exp_success*p_exp_failure)/N_exp) + ((p_con_success*p_con_failure)/N_con))
    
    point_estimator = p_bar_exp - p_bar_con

    upper_bound =  point_estimator + me
    lower_bound = point_estimator - me

    return [round(float(lower_bound), 4), round(float(upper_bound), 4)]

CI = confidence_interval_calculator(z_crit=Z_crit, p_bar_exp=p_bar_exp, p_bar_con=p_bar_con, N_exp=N_exp, N_con=N_con)

print('The 95% interval is: ', CI)

The 95% interval is:  [0.4004, 0.425]
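To make the distinction between the two standard errors concrete, the sketch below computes both side by side from the sample $CTR$s reported earlier and rebuilds the interval from the unpooled one. It is only an illustration of the formulas above; the variable names are not taken from the notebook's functions.

```python
import numpy as np
from scipy.stats import norm

# Sample CTRs and group sizes reported earlier in the notebook.
p_exp, p_con = 0.6116, 0.1989
n_exp = n_con = 10_000

# Pooled standard error: used in the hypothesis test, assumes H0 (equal CTRs).
p_pooled = (p_exp * n_exp + p_con * n_con) / (n_exp + n_con)
se_pooled = np.sqrt(p_pooled * (1 - p_pooled) * (1 / n_exp + 1 / n_con))

# Unpooled standard error: used in the confidence interval.
se_unpooled = np.sqrt(p_exp * (1 - p_exp) / n_exp + p_con * (1 - p_con) / n_con)

z_crit = norm.ppf(0.975)  # two-sided 95% critical value
diff = p_exp - p_con
lower = diff - z_crit * se_unpooled
upper = diff + z_crit * se_unpooled

print(f'pooled SE: {se_pooled:.4f}, unpooled SE: {se_unpooled:.4f}')
print(f'95% CI: [{lower:.4f}, {upper:.4f}]')
```

The unpooled standard error is slightly smaller than the pooled one here because the two sample $CTR$s differ substantially, so each group's own variance is lower than the variance implied by the pooled $CTR$.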


The interval above excludes $0$, which further confirms that there is a difference between the $CTR$s of the two groups. Because the whole interval lies in $\mathbb{R}^{+}$, the difference is positive. This means that the $CTR$ of the experimental group is higher than the $CTR$ of the control group.