# AB-test analysis in a Python Notebook

This notebook walks through how to analyze the results of an AB-test using Python.

# Step 1 – Import data from SQL

We begin by importing the AB-test data from an SQL table.

Given the table `conversionista-se.Sam.demo_abtest`, import all the data from the AB-test using BigQuery SQL magic. Save the data in a dataframe called `abtest_data`.

In [5]:
%%bigquery abtest_data
SELECT *
FROM conversionista-se.Sam.demo_abtest

Now, let's print the size and head of the data to verify that it's been correctly loaded.

In [6]:
print(abtest_data.shape)
abtest_data.head()

(19889, 3)


| | user_pseudo_id | variant | purchase |
|---|---|---|---|
| 0 | 419246200.0 | a | 0 |
| 1 | 963152400.0 | a | 0 |
| 2 | 127549500.0 | a | 0 |
| 3 | 317772100.0 | a | 0 |
| 4 | 369581400.0 | a | 0 |
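
Beyond eyeballing the head of the data, a few automated sanity checks can catch problems early. The following is a minimal sketch, assuming the three columns shown above; `validate_abtest_data` is a hypothetical helper name and the small example frame is made up for illustration.

```python
import pandas as pd

def validate_abtest_data(df: pd.DataFrame) -> dict:
    """Hypothetical helper: basic sanity checks on the loaded AB-test data."""
    return {
        # Missing values in the columns used later in the analysis
        "missing_values": int(
            df[["user_pseudo_id", "variant", "purchase"]].isna().sum().sum()
        ),
        # Users that appear on more than one row
        "duplicate_users": int(df["user_pseudo_id"].duplicated().sum()),
        # Users that were exposed to more than one variant
        "users_in_both_variants": int(
            (df.groupby("user_pseudo_id")["variant"].nunique() > 1).sum()
        ),
        # Variant labels other than the expected 'a' and 'b'
        "unexpected_variants": sorted(set(df["variant"].dropna()) - {"a", "b"}),
        # Purchase flags other than 0 or 1
        "non_binary_purchases": int((~df["purchase"].isin([0, 1])).sum()),
    }

# Made-up frame for illustration; in the notebook, pass abtest_data instead
example = pd.DataFrame({
    "user_pseudo_id": [1.0, 2.0, 3.0, 3.0],
    "variant": ["a", "b", "a", "b"],
    "purchase": [0, 1, 0, 2],
})
report = validate_abtest_data(example)
print(report)
```

If any of these counts is non-zero for the real data, it is worth resolving before moving on, since the later steps assume one row per user and a binary `purchase` column.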


# Step 2 – Print summary statistics

Now that the data has been imported, let's print some summary statistics like how many users and purchases there are in each variant, using Pandas.

In [8]:
import pandas as pd

# Ensure the 'variant' column is treated as a category
abtest_data['variant'] = pd.Categorical(abtest_data['variant'])

# Calculate summary statistics
summary = abtest_data.groupby('variant', observed = False).agg({
    'user_pseudo_id': 'nunique',  # Count unique users
    'purchase': 'sum'      # Sum of purchases
}).rename(columns={
    'user_pseudo_id': 'total_users',
    'purchase': 'total_purchases'
})

# Calculate conversion rate
summary['conversion_rate'] = summary['total_purchases'] / summary['total_users']

summary

| variant | total_users | total_purchases | conversion_rate |
|---|---|---|---|
| a | 10045 | 512 | 0.050971 |
| b | 9844 | 593 | 0.06024 |


# Step 3 – SRM Check

Next, let's verify that the experiment has not been subject to a sample ratio mismatch (SRM). That is, we want to ensure that the randomization into each group has been done correctly.

We'll do this using a chi-squared goodness-of-fit test, which checks whether the observed number of users in each variant differs significantly from the counts expected under the intended 50/50 split.

In [14]:
import numpy as np
from scipy.stats import chisquare

# Count the number of users in each variant
observed = abtest_data['variant'].value_counts().sort_index()

# Calculate the expected counts (assuming 50/50 split)
total_users = observed.sum()
expected_counts = np.array([total_users / 2, total_users / 2])

# Perform chi-squared goodness-of-fit test (observed vs. expected counts)
chi2, p_value = chisquare(f_obs=observed, f_exp=expected_counts)

# Print results
print("Sample Ratio Mismatch Test")
print("-------------------------")
print(f"Observed counts: A = {observed['a']}, B = {observed['b']}")
print(f"Expected counts: A = {expected_counts[0]:.0f}, B = {expected_counts[1]:.0f}")
print(f"Chi-squared statistic: {chi2:.4f}")
print(f"p-value: {p_value:.4f}")

# Interpret the results
alpha = 0.05  # significance level
if p_value < alpha:
    print("\nThere is evidence of a Sample Ratio Mismatch (p < 0.05).")
    print("The assignment to variants may not have been truly random.")
else:
    print("\nNo evidence of a Sample Ratio Mismatch (p >= 0.05).")
    print("The assignment to variants appears to be random.")

# Calculate and print the actual ratios
total = observed.sum()
ratio_a = observed['a'] / total
ratio_b = observed['b'] / total
print(f"\nActual ratios: A = {ratio_a:.4f}, B = {ratio_b:.4f}")

Sample Ratio Mismatch Test
-------------------------
Observed counts: A = 10045, B = 9844
Expected counts: A = 9944, B = 9944
Chi-squared statistic: 2.0313
p-value: 0.1541

No evidence of a Sample Ratio Mismatch (p >= 0.05).
The assignment to variants appears to be random.

Actual ratios: A = 0.5051, B = 0.4949
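
Not every experiment is designed with an even split. As a minimal sketch, the SRM check can be wrapped in a function that tests the observed counts against any intended allocation with a chi-squared goodness-of-fit test. The name `srm_check` and its parameters are hypothetical and introduced only for illustration; `scipy.stats.chisquare` is the only library call used.

```python
import numpy as np
from scipy.stats import chisquare

def srm_check(observed_counts, expected_ratios, alpha=0.05):
    """Hypothetical helper: goodness-of-fit test against an intended split."""
    observed = np.asarray(observed_counts, dtype=float)
    ratios = np.asarray(expected_ratios, dtype=float)
    # Scale the intended ratios so the expected counts sum to the observed total
    expected = observed.sum() * ratios / ratios.sum()
    statistic, p_value = chisquare(f_obs=observed, f_exp=expected)
    return statistic, p_value, bool(p_value < alpha)

# User counts per variant from Step 2, tested against the intended 50/50 split
statistic, p_value, srm_detected = srm_check([10045, 9844], [0.5, 0.5])
print(f"chi2 = {statistic:.4f}, p = {p_value:.4f}, SRM detected: {srm_detected}")
```

For a 70/30 design, the same call would take `[0.7, 0.3]` as `expected_ratios`; the counts must be listed in the same order as the ratios.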


# Step 4 – Test for differences between variants

Finally, we'll test whether the variants differ significantly from each other. More precisely, we'll test whether the conversion rate in variant B is statistically significantly higher than that of variant A.

We'll do this using a one-tailed z-test for proportions with a significance level $\alpha$ of 0.05.
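
For reference, the pooled z-statistic computed in the cell below can be written out as follows. The symbols are notation introduced here for illustration: $x_A, x_B$ are the number of purchases and $n_A, n_B$ the number of users in each variant, $\hat{p}_A = x_A / n_A$ and $\hat{p}_B = x_B / n_B$ are the conversion rates, and $\Phi$ is the standard normal CDF.

$$
\hat{p} = \frac{x_A + x_B}{n_A + n_B}, \qquad
z = \frac{\hat{p}_B - \hat{p}_A}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}, \qquad
p\text{-value} = 1 - \Phi(z)
$$

Under the null hypothesis that both variants share the same conversion rate, $z$ is approximately standard normal, so a large positive $z$ is evidence that B converts better than A.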

In [15]:
import numpy as np
from scipy import stats

# Calculate conversion rates and sample sizes for each variant
conv_rate_a = abtest_data[abtest_data['variant'] == 'a']['purchase'].mean()
conv_rate_b = abtest_data[abtest_data['variant'] == 'b']['purchase'].mean()
n_a = abtest_data[abtest_data['variant'] == 'a'].shape[0]
n_b = abtest_data[abtest_data['variant'] == 'b'].shape[0]

# Calculate the pooled standard error
p_pooled = (conv_rate_a * n_a + conv_rate_b * n_b) / (n_a + n_b)
se_pooled = np.sqrt(p_pooled * (1 - p_pooled) * (1/n_a + 1/n_b))

# Calculate the z-score
z_score = (conv_rate_b - conv_rate_a) / se_pooled

# Calculate the p-value (one-tailed test)
p_value = 1 - stats.norm.cdf(z_score)

# Print results
print("Z-Test for Proportions: Variant B vs Variant A")
print("----------------------------------------------")
print(f"Conversion Rate A: {conv_rate_a:.4f}")
print(f"Conversion Rate B: {conv_rate_b:.4f}")
print(f"Sample Size A: {n_a}")
print(f"Sample Size B: {n_b}")
print(f"Z-Score: {z_score:.4f}")
print(f"P-value (one-tailed): {p_value:.4f}")

# Interpret the results
alpha = 0.05
if p_value < alpha:
    print("\nThe conversion rate in Variant B is statistically significantly higher than in Variant A (p < 0.05).")
    print("We reject the null hypothesis.")
else:
    print("\nThere is not enough evidence to conclude that the conversion rate in Variant B")
    print("is statistically significantly higher than in Variant A (p >= 0.05).")
    print("We fail to reject the null hypothesis.")

# Calculate and print relative improvement
relative_improvement = (conv_rate_b - conv_rate_a) / conv_rate_a
print(f"\nRelative improvement: {relative_improvement:.2%}")

Z-Test for Proportions: Variant B vs Variant A
----------------------------------------------
Conversion Rate A: 0.0510
Conversion Rate B: 0.0602
Sample Size A: 10045
Sample Size B: 9844
Z-Score: 2.8532
P-value (one-tailed): 0.0022

The conversion rate in Variant B is statistically significantly higher than in Variant A (p < 0.05).
We reject the null hypothesis.

Relative improvement: 18.19%
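
The p-value says whether the difference is distinguishable from zero, but not how large it plausibly is. As a complementary sketch, a Wald confidence interval for the absolute difference in conversion rates can be computed from the counts in Step 2. The helper name `diff_confidence_interval` is hypothetical, and the interval is two-sided, unlike the one-tailed test above.

```python
from math import sqrt
from statistics import NormalDist

def diff_confidence_interval(x_a, n_a, x_b, n_b, confidence=0.95):
    """Hypothetical helper: Wald interval for the difference p_b - p_a."""
    p_a, p_b = x_a / n_a, x_b / n_b
    # Unpooled standard error: the variants are not assumed to share one rate
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, about 1.96 for 95% confidence
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z_crit * se, diff + z_crit * se

# Purchases and users per variant, taken from the summary table in Step 2
low, high = diff_confidence_interval(512, 10045, 593, 9844)
print(f"95% CI for the absolute difference: [{low:.4f}, {high:.4f}]")
```

An interval that excludes zero points in the same direction as the z-test, while its width indicates how precisely the uplift has been estimated.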
