# Statistical methods for A/B testing, segmentation and cohort analysis in Python

## 1. A/B testing

As an example of how to compare metrics such as conversion rate or click-through rate, I connected my website johannes.vc to BigQuery via Google Analytics, and BigQuery to this analytics notebook.

**Objective:**
- Compare click-through traffic on Mobile and Web (Desktop).
- Test whether the difference is significant (5% level) or highly significant (1% level).

In [1]:
%load_ext google.cloud.bigquery
import pandas as pd
import numpy as np

In [None]:
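%%bigquery df
SELECT
    -- fold tablets into the Mobile group; `group` is a reserved word, so the alias is quoted
    CASE
        WHEN device.deviceCategory = 'Tablet' THEN 'Mobile'
        ELSE device.deviceCategory END AS `group`,
    CASE
        WHEN totals.pageviews > 2 THEN 0
        ELSE 1 END AS metric
FROM `johannesvc.johannesvc.johannesvc`
-- filter on the source column (an alias cannot be referenced in WHERE);
-- adjust the literals if your export stores deviceCategory in lowercase
WHERE device.deviceCategory IN ('Mobile', 'Tablet', 'Desktop')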

In [None]:
# get the metric (binary 1-0) for each group
group_a = df[df['group'] == 'Mobile']['metric']
group_b = df[df['group'] == 'Desktop']['metric']

In [None]:
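from scipy.stats import ttest_ind

# Perform an independent two-sample t-test (two-sided, pooled variance by default)
t_stat, p_value = ttest_ind(group_a, group_b)

print(f"T-statistic: {t_stat}, P-value: {p_value}")

# Interpret the result at the two levels named in the objective
alpha = 0.05         # significance level
alpha_strict = 0.01  # level for "highly significant"
if p_value < alpha_strict:
    print("There is a highly significant difference between the two groups (1% level).")
elif p_value < alpha:
    print("There is a statistically significant difference between the two groups.")
else:
    print("There is no statistically significant difference between the two groups.")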

## Explanation
In the context of a t-test, the `t_stat` (t-statistic) and `p_value` (p-value) are key outcomes that provide statistical measures to help evaluate the significance of the difference between two groups. Here's what they represent:

### T-Statistic (`t_stat`)

- The t-statistic is a ratio that compares the difference between the means of two groups to the variability of the groups. It's calculated using the formula:

  $t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$

  where $\bar{x}_1$ and $\bar{x}_2$ are the sample means of the two groups, and $s_{\bar{x}_1 - \bar{x}_2}$ is the standard error of the difference between the two means.

- The t-statistic measures how many standard errors the difference between the two means is away from 0. A larger absolute value of the t-statistic indicates a greater difference between the groups relative to the variability within the groups.
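
The formula can be checked by hand. The following is a minimal sketch on synthetic binary data (the sample sizes and rates are made up for illustration, not taken from the website data). It assumes the pooled-variance form of the standard error, which is what `ttest_ind` uses with its default `equal_var=True`.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic binary outcomes standing in for the two groups (assumption).
rng = np.random.default_rng(seed=0)
sample_a = rng.binomial(1, 0.30, size=400)
sample_b = rng.binomial(1, 0.40, size=500)

n_a, n_b = len(sample_a), len(sample_b)

# Numerator: difference between the two sample means.
mean_diff = sample_a.mean() - sample_b.mean()

# Pooled sample variance (ddof=1), matching ttest_ind's default equal_var=True.
pooled_var = (
    (n_a - 1) * sample_a.var(ddof=1) + (n_b - 1) * sample_b.var(ddof=1)
) / (n_a + n_b - 2)

# Denominator: standard error of the difference between the two means.
std_err = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual = mean_diff / std_err
t_scipy, _ = ttest_ind(sample_a, sample_b)
print(f"manual t: {t_manual:.4f}, scipy t: {t_scipy:.4f}")
```

Both values should agree up to floating-point error. If the two groups have clearly different variances, passing `equal_var=False` to `ttest_ind` switches to Welch's t-test, which uses a different standard error.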

### P-Value (`p_value`)

- The p-value is a probability that measures the evidence against the null hypothesis. It tells you how likely it is to observe a test statistic as extreme as, or more extreme than, the one observed, assuming that the null hypothesis is true.

- In the context of an A/B test, the null hypothesis typically states that there is no difference between the means of the two groups. A small p-value (typically ≤ 0.05) indicates that the observed data would be very unlikely under the null hypothesis, leading to the rejection of the null hypothesis. This suggests that there is a statistically significant difference between the two groups.

- The significance level (α) is a threshold chosen before the test is conducted, against which the p-value is compared. Commonly, α is set to 0.05. If the p-value is less than or equal to α, the difference between the groups is considered statistically significant.
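
To make the link between the t-statistic and the p-value concrete, here is a small sketch that derives the two-sided p-value from the t-distribution and compares it with α. The data is synthetic and the variable names are illustrative; it assumes the default pooled-variance test, so the degrees of freedom are $n_1 + n_2 - 2$.

```python
import numpy as np
from scipy import stats

# Synthetic binary outcomes for two groups (assumption, for illustration only).
rng = np.random.default_rng(seed=1)
sample_a = rng.binomial(1, 0.30, size=400)
sample_b = rng.binomial(1, 0.40, size=500)

t_demo, p_scipy = stats.ttest_ind(sample_a, sample_b)

# Degrees of freedom for the pooled-variance (equal_var=True) test.
dof = len(sample_a) + len(sample_b) - 2

# Two-sided p-value: probability in both tails beyond |t| under the null hypothesis.
p_manual = 2 * stats.t.sf(abs(t_demo), dof)

alpha = 0.05  # chosen before looking at the data
print(f"manual p: {p_manual:.6f}, scipy p: {p_scipy:.6f}")
print("reject H0" if p_manual <= alpha else "fail to reject H0")
```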

Together, the t-statistic and p-value provide a framework for making decisions in hypothesis testing:

- The **t-statistic** gives a measure of the magnitude of the difference between the groups, normalized by the variability of the data.
- The **p-value** gives the probability of observing such a difference (or more extreme) if the null hypothesis were true.

The combination of these outcomes helps researchers and data analysts determine whether the differences observed in their data (such as conversion rates or other metrics in A/B tests) are statistically significant, that is, unlikely to be explained by random variation alone.

## 2. Cohort analysis and segmentation
In cohort analysis and segmentation, these principles still apply, but the approach and context differ. Cohort analysis and segmentation involve breaking down data into specific groups (cohorts) based on shared characteristics or behaviors over time, and then analyzing these groups to understand trends, behaviors, and outcomes.

### 2.1 Cohort Analysis

- **Purpose**: Cohort analysis tracks the behavior of groups of users over time to identify trends or patterns. These cohorts could be based on the users' first purchase date, sign-up date, or any other event that marks their entry point into the dataset.

- **Application of Statistical Tests**: You can apply statistical tests to compare metrics across different cohorts. For example, if you're looking at monthly cohorts based on sign-up date and you want to compare the 6-month retention rate across cohorts, you could use a t-test to compare the retention rates between any two cohorts to see if the difference in retention is statistically significant.
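
As a rough sketch of such a cohort comparison, the example below builds a hypothetical user table with a sign-up month and a binary 6-month retention flag, computes the retention rate per cohort, and runs a t-test between two cohorts. The column names (`signup_month`, `retained_6m`) and all numbers are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(seed=2)

# Hypothetical user table: one row per user (synthetic data, not real retention figures).
users = pd.DataFrame({
    "signup_month": np.repeat(["2024-01", "2024-02", "2024-03"], 300),
    "retained_6m": np.concatenate([
        rng.binomial(1, 0.35, 300),
        rng.binomial(1, 0.45, 300),
        rng.binomial(1, 0.40, 300),
    ]),
})

# The retention rate of a cohort is the mean of its binary retention flag.
retention = users.groupby("signup_month")["retained_6m"].mean()
print(retention)

# Compare two cohorts with an independent two-sample t-test.
jan = users.loc[users["signup_month"] == "2024-01", "retained_6m"]
feb = users.loc[users["signup_month"] == "2024-02", "retained_6m"]
t_cohort, p_cohort = ttest_ind(jan, feb)
print(f"T-statistic: {t_cohort:.4f}, P-value: {p_cohort:.4f}")
```

When many cohort pairs are compared this way, the chance of at least one false positive grows with the number of tests, so the significance level is usually tightened accordingly.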

### 2.2 Segmentation

- **Purpose**: Segmentation involves dividing a broader market or customer base into smaller segments based on certain criteria like demographics, behavior, or psychographics. The goal is to understand differences between these segments to tailor products, services, or marketing strategies.

- **Application of Statistical Tests**: Similar to cohort analysis, you can use statistical tests to compare key metrics across different segments. For instance, if you have segmented your users into high, medium, and low engagement based on their activity levels, you might use t-tests or ANOVA (Analysis of Variance) to compare average revenue across these segments. ANOVA is particularly useful for comparing more than two groups. 
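
The engagement example can be sketched as follows. The customer table, the `activity` score and the `revenue` column are synthetic assumptions; `pd.qcut` is used here as one simple way to split users into three equally sized segments, not as the only way to segment.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

rng = np.random.default_rng(seed=3)

# Hypothetical customers with a continuous activity score and a revenue figure (synthetic).
customers = pd.DataFrame({"activity": rng.exponential(scale=5.0, size=600)})
customers["revenue"] = 20 + 2 * customers["activity"] + rng.normal(0, 10, size=600)

# Split into three equally sized engagement segments by activity tercile.
customers["segment"] = pd.qcut(
    customers["activity"], q=3, labels=["low", "medium", "high"]
)
print(customers.groupby("segment", observed=True)["revenue"].mean())

# One-way ANOVA: is mean revenue the same across all three segments?
revenue_by_segment = [
    customers.loc[customers["segment"] == name, "revenue"]
    for name in ["low", "medium", "high"]
]
f_seg, p_seg = f_oneway(*revenue_by_segment)
print(f"F-Statistic: {f_seg:.4f}, P-Value: {p_seg:.4g}")
```

A significant F-test only says that at least one segment mean differs; it does not say which one, so pairwise comparisons are a common follow-up.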
  
  
To calculate the ANOVA F-statistic across the three device types:

In [None]:
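%%bigquery df2
SELECT
    -- `group` is a reserved word, so the alias is quoted
    device.deviceCategory AS `group`,
    CASE
        WHEN totals.pageviews > 2 THEN 0
        ELSE 1 END AS metric
FROM `johannesvc.johannesvc.johannesvc`
-- filter on the source column; adjust the literals if your export stores them in lowercase
WHERE device.deviceCategory IN ('Mobile', 'Desktop', 'Tablet')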

In [None]:
# get the metric (binary 1-0) for each group
group_a = df2[df2['group'] == 'Mobile']['metric']
group_b = df2[df2['group'] == 'Desktop']['metric']
group_c = df2[df2['group'] == 'Tablet']['metric']

In [None]:
from scipy.stats import f_oneway

# assuming 3 groups
f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F-Statistic: {f_stat}, P-Value: {p_value}")