# Statistical methods for A/B testing, segmentation and cohort analysis in Python

## 1. A/B testing

As an example of how to compare metrics such as conversion rate or click-through rate, I connected my website johannes.vc to BigQuery via Google Analytics, and BigQuery to this analytics notebook.

**Objective:**
- Compare click-through traffic on mobile and web (desktop).
- Is the difference significant (5% level) or highly significant (1% level)?

### Some EDA

Scroll down for the A/B test.

In [2]:
%load_ext google.cloud.bigquery
import pandas as pd
import numpy as np

In [7]:
%%bigquery
SELECT COUNT(DISTINCT user_pseudo_id) as unique_users
FROM `johannesvc.analytics_413581908.events_*`
-- WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240217'

Unnamed: 0,unique_users
0,4


In [3]:
%%bigquery inspect
SELECT
  event_name, 
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location
FROM
  `johannesvc.analytics_413581908.events_*`
WHERE
  event_name = 'page_view'

In [4]:
inspect

Unnamed: 0,event_name,page_location
0,page_view,https://johannes.vc/
1,page_view,https://johannes.vc/
2,page_view,https://johannes.vc/
3,page_view,https://johannes.vc/
4,page_view,https://johannes.vc/
...,...,...
164,page_view,https://johannes.vc/
165,page_view,https://johannes.vc/
166,page_view,https://johannes.vc/
167,page_view,https://johannes.vc/


In [100]:
%%bigquery
WITH PageViewsWithLead AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    event_name,
    LAG(event_timestamp) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS prev_event_timestamp,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location
  FROM
    `johannesvc.analytics_413581908.events_20240215`
  WHERE
    event_name = 'page_view'
)

SELECT
  user_pseudo_id,
  page_location,
--   event_timestamp,
--   prev_event_timestamp,
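  -- Seconds since this user's previous page_view, i.e. time spent on the previous page rather than on page_location
  TIMESTAMP_DIFF(TIMESTAMP_MICROS(event_timestamp), TIMESTAMP_MICROS(prev_event_timestamp), SECOND) AS time_spent_seconds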
FROM
  PageViewsWithLead
WHERE
  prev_event_timestamp IS NOT NULL
ORDER BY
  user_pseudo_id, event_timestamp

Unnamed: 0,user_pseudo_id,page_location,time_spent_seconds
0,1416806056.1698413,https://johannes.vc/services/,6
1,1416806056.1698413,https://johannes.vc/writings/,18
2,1416806056.1698413,https://johannes.vc/,1
3,1416806056.1698413,https://johannes.vc/,4907


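Two ways of calculating engagement time: the `engagement_time_msec` event parameter reported by Google Analytics, and the time difference between consecutive events of the same user (`time_spent_seconds`).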

In [11]:
%%bigquery
SELECT
  user_pseudo_id,
  event_name,
  -- event_timestamp,
  device.category,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'session_engaged') AS session_engaged,
  (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time_msec,
  TIMESTAMP_DIFF(
    TIMESTAMP_MICROS(event_timestamp), TIMESTAMP_MICROS(
      LAG(event_timestamp) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp)
      ), SECOND) AS time_spent_seconds,
FROM
  `johannesvc.analytics_413581908.events_*`

Unnamed: 0,user_pseudo_id,event_name,category,page_location,session_engaged,engagement_time_msec,time_spent_seconds
0,1416806056.1698413,session_start,mobile,https://johannes.vc/,1.0,,
1,1416806056.1698413,page_view,mobile,https://johannes.vc/,,,0.0
2,1416806056.1698413,scroll,mobile,https://johannes.vc/,,5155.0,6.0
3,1416806056.1698413,user_engagement,mobile,https://johannes.vc/,,1060.0,0.0
4,1416806056.1698413,page_view,mobile,https://johannes.vc/services/,,,0.0
5,1416806056.1698413,scroll,mobile,https://johannes.vc/services/,,5.0,5.0
6,1416806056.1698413,user_engagement,mobile,https://johannes.vc/services/,,18591.0,13.0
7,1416806056.1698413,page_view,mobile,https://johannes.vc/writings/,,,0.0
8,1416806056.1698413,scroll,mobile,https://johannes.vc/writings/,,8.0,1.0
9,1416806056.1698413,user_engagement,mobile,https://johannes.vc/writings/,,1759.0,0.0


### Simple grouping for A/B test

In [14]:
%%bigquery df
SELECT
    user_pseudo_id,
    COUNT(*) AS pageviews,
    device.category AS metric
  FROM
    `johannesvc.analytics_413581908.events_*`
  WHERE
    event_name = 'page_view'
  GROUP BY
    device.category,
    user_pseudo_id

In [15]:
# page views per user for each device category (the query aliases device.category as 'metric')
group_a = df[df['metric'] == 'mobile']['pageviews'].astype('float64')
group_b = df[df['metric'] == 'desktop']['pageviews'].astype('float64')

In [16]:
group_a, group_b

(0    5.0
 2    1.0
 3    2.0
 Name: pageviews, dtype: float64,
 1    1.0
 Name: pageviews, dtype: float64)

In [17]:
from scipy.stats import ttest_ind

# Perform independent two-sample t-test (Student's t-test; scipy assumes equal variances by default)
t_stat, p_value = ttest_ind(group_a, group_b)

print(f"T-statistic: {t_stat}, P-value: {p_value}")

alpha = 0.05  # significance level
if p_value < alpha:
    print("There is a statistically significant difference between the two groups.")
else:
    print("There is no statistically significant difference between the two groups.")

T-statistic: 0.6933752452815364, P-value: 0.5597745468371882
There is no statistically significant difference between the two groups.


In [41]:
ttest_ind(group_a, group_b)

TtestResult(statistic=0.6933752452815364, pvalue=0.5597745468371882, df=2.0)

## Explanation
In the context of a t-test, the `t_stat` (t-statistic) and `p_value` (p-value) are key outcomes that provide statistical measures to help evaluate the significance of the difference between two groups. Here's what they represent:

### T-Statistic (`t_stat`)

- The t-statistic is a ratio that compares the difference between the means of two groups to the variability of the groups. It's calculated using the formula:

  $t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$
  

  where $\bar{x}_1$ and $\bar{x}_2$ are the sample means of the two groups, and $s_{\bar{x}_1 - \bar{x}_2}$ is the standard error of the difference between the two means.

- The t-statistic measures how many standard errors the difference between the two means is away from 0. A larger absolute value of the t-statistic indicates a greater difference between the groups relative to the variability within the groups.
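
As a rough check of the formula, the t-statistic can be recomputed by hand from the two groups printed earlier in this notebook (mobile: 5, 1 and 2 page views; desktop: 1). This sketch assumes the pooled, equal-variance form of the standard error, which is what `ttest_ind` uses by default; the variable names are illustrative.

```python
import numpy as np

# Page views per user, copied from the notebook output.
mobile = np.array([5.0, 1.0, 2.0])
desktop = np.array([1.0])

n1, n2 = len(mobile), len(desktop)
mean_diff = mobile.mean() - desktop.mean()

# Pooled variance: within-group sums of squares over n1 + n2 - 2 degrees of freedom.
ss1 = ((mobile - mobile.mean()) ** 2).sum()
ss2 = ((desktop - desktop.mean()) ** 2).sum()
dof = n1 + n2 - 2
pooled_var = (ss1 + ss2) / dof

# Standard error of the difference between the two means.
std_err = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_manual = mean_diff / std_err

print(f"t = {t_manual:.4f}, degrees of freedom = {dof}")
```

The result should agree with the `T-statistic` and `df` reported by `ttest_ind` (about 0.693 with 2 degrees of freedom).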

### P-Value (`p_value`)

- The p-value is a probability that measures the evidence against the null hypothesis. It tells you how likely it is to observe a test statistic as extreme as, or more extreme than, the one observed, assuming that the null hypothesis is true.

- In the context of an A/B test, the null hypothesis typically states that there is no difference between the means of the two groups. A small p-value (typically ≤ 0.05) indicates that the observed data would be very unlikely under the null hypothesis, leading to the rejection of the null hypothesis. This suggests that there is a statistically significant difference between the two groups.

- The significance level (α) is a threshold chosen before the test is conducted, against which the p-value is compared. Commonly, α is set to 0.05. If the p-value is less than or equal to α, the difference between the groups is considered statistically significant.
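
The two-sided p-value can be recovered from the t-statistic and its degrees of freedom with the survival function of Student's t-distribution. The sketch below plugs in the values reported by `ttest_ind` in this notebook and checks the result against both thresholds from the objective (5% and 1%). The helper name `significance_label` is made up for illustration.

```python
from scipy.stats import t as t_dist


def significance_label(p_value: float) -> str:
    """Map a p-value to the two significance levels named in the objective."""
    if p_value < 0.01:
        return "highly significant (1% level)"
    if p_value < 0.05:
        return "significant (5% level)"
    return "not significant"


# Values taken from the ttest_ind output in this notebook.
t_stat = 0.6933752452815364
dof = 2

# Two-sided p-value: probability mass in both tails beyond |t|.
p_two_sided = 2 * t_dist.sf(abs(t_stat), dof)

print(f"p = {p_two_sided:.4f}: {significance_label(p_two_sided)}")
```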

Together, the t-statistic and p-value provide a framework for making decisions in hypothesis testing:

- The **t-statistic** gives a measure of the magnitude of difference between groups, normalized by the variability of the data.
- The **p-value** gives the probability of observing such a difference (or more extreme) if the null hypothesis were true.

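The combination of these outcomes helps researchers and data analysts determine whether the differences observed in their data (such as conversion rates or other metrics in A/B tests) are statistically significant and not just due to random chance. With only four users in this dataset (three on mobile, one on desktop), the test has very little power, so the non-significant result above says little about whether a real difference exists.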

## 2. Cohort analysis and segmentation
In cohort analysis and segmentation, these principles still apply, but the approach and context differ. Cohort analysis and segmentation involve breaking down data into specific groups (cohorts) based on shared characteristics or behaviors over time, and then analyzing these groups to understand trends, behaviors, and outcomes.

### 2.1 Cohort Analysis

- **Purpose**: Cohort analysis tracks the behavior of groups of users over time to identify trends or patterns. These cohorts could be based on the users' first purchase date, sign-up date, or any other event that marks their entry point into the dataset.

- **Application of Statistical Tests**: You can apply statistical tests to compare metrics across different cohorts. For example, if you're looking at monthly cohorts based on sign-up date and you want to compare the 6-month retention rate across cohorts, you can test whether the difference between any two cohorts is statistically significant. Because a retention rate is a proportion, a two-proportion z-test or a chi-squared test is usually a better fit than a t-test, which is designed for comparing means of continuous metrics.
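
Since a retention rate is a proportion (retained users divided by cohort size), one common way to compare two cohorts is a two-proportion z-test. The following is a minimal sketch using only the standard library. The cohort counts are invented for illustration, and `two_proportion_ztest` is a hypothetical helper, not a library function.

```python
import math


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: both proportions are equal."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Under H0 both cohorts share one rate, estimated from the pooled counts.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical cohorts: users still active six months after sign-up.
z, p_value = two_proportion_ztest(success_a=120, n_a=400, success_b=150, n_b=420)

print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The normal approximation behind this test needs reasonably large cohorts; for very small counts an exact test such as Fisher's is the safer choice.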

### 2.2 Segmentation

- **Purpose**: Segmentation involves dividing a broader market or customer base into smaller segments based on certain criteria like demographics, behavior, or psychographics. The goal is to understand differences between these segments to tailor products, services, or marketing strategies.

- **Application of Statistical Tests**: Similar to cohort analysis, you can use statistical tests to compare key metrics across different segments. For instance, if you have segmented your users into high, medium, and low engagement based on their activity levels, you might use t-tests or ANOVA (Analysis of Variance) to compare average revenue across these segments. ANOVA is particularly useful for comparing more than two groups. 
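
The engagement example can be sketched as follows: users are split into three equally sized segments by activity with `pd.qcut`, and average revenue is then compared across segments with a one-way ANOVA. All numbers and column names (`events`, `revenue`, `segment`) are hypothetical and only serve to show the mechanics.

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical per-user data: activity level (event count) and revenue.
users = pd.DataFrame({
    "events": [2, 3, 5, 8, 10, 12, 20, 25, 40],
    "revenue": [0.0, 5.0, 3.0, 10.0, 12.0, 8.0, 30.0, 25.0, 40.0],
})

# Split users into three equally sized engagement segments by event count.
users["segment"] = pd.qcut(users["events"], q=3, labels=["low", "medium", "high"])

# One array of revenue values per segment, in low / medium / high order.
revenue_by_segment = [
    grp["revenue"].to_numpy()
    for _, grp in users.groupby("segment", observed=True)
]

# One-way ANOVA: H0 is that all segments share the same mean revenue.
result = f_oneway(*revenue_by_segment)

print(users.groupby("segment", observed=True)["revenue"].mean())
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

A significant F-test only says that at least one segment mean differs; pairwise follow-up tests are needed to find out which ones.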
  
  
To calculate the ANOVA F-statistic across the device types:

In [19]:
%%bigquery df2 
SELECT 
    device.category AS metric,
    COUNT(*) AS pageviews
FROM `johannesvc.analytics_413581908.events_*`
WHERE
    event_name = 'page_view'
GROUP BY
    device.category,
    user_pseudo_id

In [24]:
df2

Unnamed: 0,metric,pageviews
0,mobile,1
1,mobile,2
2,mobile,5
3,desktop,1


In [38]:
# page views per user for each device category
group_a = df2[df2['metric'] == 'mobile']['pageviews'].astype('float64')
group_b = df2[df2['metric'] == 'desktop']['pageviews'].astype('float64')
# df2 contains no tablet rows, so hard-coded placeholder values stand in for a third group
group_c = [5.0, 1.0] # df2[df2['metric'] == 'tablet']['pageviews'].astype('float64')

In [40]:
from scipy.stats import f_oneway

f_oneway(group_a, group_b, group_c)

F_onewayResult(statistic=0.255, pvalue=0.7901712196940587)
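
To see where the F-statistic comes from, the sketch below recomputes it by hand as the ratio of between-group to within-group variance, using the same three groups as above (the third group is the hard-coded placeholder, not real tablet data). Variable names are illustrative.

```python
import numpy as np
from scipy.stats import f as f_dist

groups = [
    np.array([1.0, 2.0, 5.0]),  # mobile
    np.array([1.0]),            # desktop
    np.array([5.0, 1.0]),       # placeholder third group from the notebook
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n = len(groups), len(all_values)

# Between-group sum of squares: spread of group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of values around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between, df_within = k - 1, n - k
f_manual = (ss_between / df_between) / (ss_within / df_within)

# Upper-tail probability of the F-distribution gives the p-value.
p_manual = f_dist.sf(f_manual, df_between, df_within)

print(f"F = {f_manual:.3f}, p = {p_manual:.4f}")
```

Both values should match the `F_onewayResult` shown above (F = 0.255, p of roughly 0.79).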