<a id="table-of-contents"></a>
# 📖 Table of Contents

[🗂️ Data Setup](#data-setup)     
- [🔍 Generate Data](#generate-data)      
- [⚙️ Define Test Configuration](#define-test-config)

[🛠️ Test Setup](#test-setup)    
- [📋 Print Config Summary](#print-config)

[📈 Inference](#inference)  
- [🔍 Infer Distribution From Data](#infer-distribution)      
- [📏 Infer Variance Equality](#infer-variance)    
- [📊 Infer Parametric Flag](#infer-parametric)    

[🧪 Hypothesis Testing](#hypothesis-testing)    
- [🧭 Determine Test To Run](#determine-test)    
- [🧠 Print Hypothesis Statement](#print-hypothesis)    
- [🧪 Run Hypothesis Test](#run-test)    

[📊 Test Summary](#test-summary)    

[🚀 Full Pipeline](#full-pipeline)    


<details><summary><strong>📖 Hypothesis Testing - Assumptions & Methods (Click to Expand)</strong></summary>

<table>
  <thead>
    <tr>
      <th>Test Type</th>
      <th>Use Case</th>
      <th>Parametric?</th>
      <th>Assumptions</th>
      <th>Non-Parametric Alternative</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>One-Sample t-test</td>
      <td>Compare sample mean vs. known value</td>
      <td>✅</td>
      <td>Normality of sample</td>
      <td>Wilcoxon signed-rank (or sign test)</td>
    </tr>
    <tr>
      <td>Two-Sample t-test</td>
      <td>Compare means of two independent groups</td>
      <td>✅</td>
      <td>- Normality (both groups)<br>- Equal variance (if pooled)<br>- Independence</td>
      <td>Mann-Whitney U</td>
    </tr>
    <tr>
      <td>Paired t-test</td>
      <td>Compare means of two related samples (before-after, matched)</td>
      <td>✅</td>
      <td>- Normality of <em>differences</em><br>- No extreme outliers</td>
      <td>Wilcoxon signed-rank</td>
    </tr>
    <tr>
      <td>Proportions z-test</td>
      <td>Compare binary rates (e.g., CTR in A vs B)</td>
      <td>✅</td>
      <td>- np > 5, nq > 5 (sample size rule)<br>- Independence</td>
      <td>Fisher’s exact</td>
    </tr>
    <tr>
      <td>Chi-Square Test</td>
      <td>Categorical association (e.g., device type vs. AR adoption)</td>
      <td>✅</td>
      <td>- Expected count ≥ 5 in ≥ 80% of cells<br>- Independence</td>
      <td>Fisher’s exact</td>
    </tr>
    <tr>
      <td>ANOVA</td>
      <td>Compare means across 3+ groups</td>
      <td>✅</td>
      <td>- Normality<br>- Equal variance<br>- Independence</td>
      <td>Kruskal-Wallis</td>
    </tr>
    <tr>
      <td>Mann-Whitney U</td>
      <td>Compare medians/ranks of two independent groups</td>
      <td>❌</td>
      <td>- Same shape distribution (ideally)<br>- Ordinal or continuous data</td>
      <td>N/A</td>
    </tr>
    <tr>
      <td>Wilcoxon Signed-Rank</td>
      <td>Paired version of Mann-Whitney (for related samples)</td>
      <td>❌</td>
      <td>- Symmetry in differences<br>- Ordinal or continuous</td>
      <td>Sign test</td>
    </tr>
  </tbody>
</table>

</details>
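
The expected-count rule for the chi-square test in the table above can be checked programmatically. The sketch below is illustrative only: `association_test_2x2` is a hypothetical helper name, it handles 2×2 tables only, and it applies the stricter "every expected count ≥ 5" variant of the rule before falling back to Fisher's exact test.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

def association_test_2x2(table):
    """Hypothetical helper: chi-square if expected counts allow, else Fisher's exact."""
    table = np.asarray(table)
    # chi2_contingency returns the expected counts under independence
    stat, p, dof, expected = chi2_contingency(table)
    if (expected < 5).any():
        # Small expected counts: the chi-square approximation is unreliable
        _, p_exact = fisher_exact(table)
        return 'fisher_exact', p_exact
    return 'chi_square', p

print(association_test_2x2([[40, 60], [55, 45]]))  # large counts
print(association_test_2x2([[3, 7], [8, 2]]))      # small counts
```

The second table has expected counts of 5.5 and 4.5, so it takes the Fisher branch.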


<details><summary><strong>📖 Hypothesis Test Selection Matrix (Click to Expand)</strong></summary>

<table>
  <thead>
    <tr>
      <th>Outcome Type</th>
      <th>Group Relationship</th>
      <th>Group Count</th>
      <th>Outcome Distribution</th>
      <th>Business Problem</th>
      <th>Recommended Test</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>continuous</td>
      <td>–</td>
      <td>one-sample</td>
      <td>normal</td>
      <td>Is average order value different from $50?</td>
      <td>One-sample t-test</td>
    </tr>
    <tr>
      <td>binary</td>
      <td>–</td>
      <td>one-sample</td>
      <td>–</td>
      <td>Is conversion rate different from 10%?</td>
      <td>One-proportion z-test</td>
    </tr>
    <tr>
      <td>continuous</td>
      <td>independent</td>
      <td>two-sample</td>
      <td>normal</td>
      <td>Do users who saw the new recommendation engine spend more time on site?</td>
      <td>Two-sample t-test</td>
    </tr>
    <tr>
      <td>continuous</td>
      <td>independent</td>
      <td>two-sample</td>
      <td>non-normal</td>
      <td>Is there a difference in revenue between users who got coupon A vs B?</td>
      <td>Mann-Whitney U test</td>
    </tr>
    <tr>
      <td>continuous</td>
      <td>paired</td>
      <td>two-sample</td>
      <td>normal</td>
      <td>Did users spend more after homepage redesign?</td>
      <td>Paired t-test</td>
    </tr>
    <tr>
      <td>continuous</td>
      <td>paired</td>
      <td>two-sample</td>
      <td>non-normal</td>
      <td>Did time on site change after layout update (skewed)?</td>
      <td>Wilcoxon signed-rank test</td>
    </tr>
    <tr>
      <td>binary</td>
      <td>independent</td>
      <td>two-sample</td>
      <td>–</td>
      <td>Does a new CTA improve conversions?</td>
      <td>Proportions z-test</td>
    </tr>
    <tr>
      <td>binary</td>
      <td>paired</td>
      <td>two-sample</td>
      <td>–</td>
      <td>Do users convert more after adding trust badges?</td>
      <td>McNemar’s test</td>
    </tr>
    <tr>
      <td>categorical</td>
      <td>independent</td>
      <td>multi-sample</td>
      <td>–</td>
      <td>Do plan choices differ between layout A/B/C?</td>
      <td>Chi-square test</td>
    </tr>
  </tbody>
</table>

</details>


<details><summary><strong>📖 Extended Hypothesis Test Selection Matrix with Notes (Click to Expand)</strong></summary>

<table>
  <thead>
    <tr>
      <th>Outcome Type</th>
      <th>Group Relationship</th>
      <th>Group Count</th>
      <th>Outcome Distribution</th>
      <th>Business Problem</th>
      <th>Recommended Test</th>
      <th>Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>continuous</td>
      <td>–</td>
      <td>one-sample</td>
      <td>normal</td>
      <td>Is average order value different from $50?</td>
      <td>One-sample t-test</td>
      <td>Use Wilcoxon signed-rank if not normal</td>
    </tr>
    <tr>
      <td>binary</td>
      <td>–</td>
      <td>one-sample</td>
      <td>–</td>
      <td>Is conversion rate different from 10%?</td>
      <td>One-proportion z-test</td>
      <td>Use binomial exact test if n is small</td>
    </tr>
    <tr>
      <td>continuous</td>
      <td>independent</td>
      <td>two-sample</td>
      <td>normal</td>
      <td>Do users who saw recs spend more time on site?</td>
      <td>Two-sample t-test</td>
      <td>Use Welch’s t-test if variances unequal</td>
    </tr>
    <tr>
      <td>continuous</td>
      <td>independent</td>
      <td>two-sample</td>
      <td>non-normal</td>
      <td>Is revenue different between coupon A vs B?</td>
      <td>Mann-Whitney U test</td>
      <td>Non-parametric; compares rank distributions (medians only if both groups share the same shape)</td>
    </tr>
    <tr>
      <td>continuous</td>
      <td>paired</td>
      <td>two-sample</td>
      <td>normal</td>
      <td>Did users spend more after redesign?</td>
      <td>Paired t-test</td>
      <td>Assumes differences are normal</td>
    </tr>
    <tr>
      <td>continuous</td>
      <td>paired</td>
      <td>two-sample</td>
      <td>non-normal</td>
      <td>Did time on site change (skewed)?</td>
      <td>Wilcoxon signed-rank test</td>
      <td>Use for non-normal paired diffs</td>
    </tr>
    <tr>
      <td>binary</td>
      <td>independent</td>
      <td>two-sample</td>
      <td>–</td>
      <td>Does new CTA improve conversion?</td>
      <td>Proportions z-test</td>
      <td>Chi-square for raw counts</td>
    </tr>
    <tr>
      <td>binary</td>
      <td>paired</td>
      <td>two-sample</td>
      <td>–</td>
      <td>Do users convert more after badges?</td>
      <td>McNemar’s test</td>
      <td>Use for paired binary outcomes</td>
    </tr>
    <tr>
      <td>categorical</td>
      <td>independent</td>
      <td>multi-sample</td>
      <td>–</td>
      <td>Do plan choices differ across layouts?</td>
      <td>Chi-square test</td>
      <td>Expected counts ≥ 5 in at least 80% of cells</td>
    </tr>
    <tr>
      <td>count</td>
      <td>independent</td>
      <td>two-sample</td>
      <td>Poisson</td>
      <td>Do users add more items to cart?</td>
      <td>Poisson / NB test</td>
      <td>Use NB if overdispersion (variance > mean)</td>
    </tr>
    <tr>
      <td>continuous</td>
      <td>independent</td>
      <td>multi-sample</td>
      <td>normal</td>
      <td>Does time spent differ across A/B/C?</td>
      <td>ANOVA</td>
      <td>Welch ANOVA if variances differ</td>
    </tr>
    <tr>
      <td>continuous</td>
      <td>independent</td>
      <td>multi-sample</td>
      <td>non-normal</td>
      <td>Does spend differ across segments?</td>
      <td>Kruskal-Wallis test</td>
      <td>Non-parametric alternative to ANOVA</td>
    </tr>
    <tr>
      <td>any</td>
      <td>any</td>
      <td>any</td>
      <td>–</td>
      <td>Is effect still significant after adjusting for device & region?</td>
      <td>Regression (linear / logistic)</td>
      <td>Use when controlling for covariates</td>
    </tr>
    <tr>
      <td>any</td>
      <td>any</td>
      <td>two-sample</td>
      <td>–</td>
      <td>What’s the probability that B beats A?</td>
      <td>Bayesian A/B test</td>
      <td>Reports posterior probability instead of p-value</td>
    </tr>
    <tr>
      <td>any</td>
      <td>any</td>
      <td>two-sample</td>
      <td>–</td>
      <td>Is observed lift statistically rare?</td>
      <td>Permutation / Bootstrap</td>
      <td>Use when assumptions are violated</td>
    </tr>
  </tbody>
</table>

</details>
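
The "Permutation / Bootstrap" row is not implemented later in this notebook. As a minimal sketch of the idea, assuming a difference in means as the test statistic, the following shuffles group labels and counts how often the shuffled difference is at least as extreme as the observed one. The function name `permutation_test_mean_diff` and its parameters are illustrative, not part of the notebook.

```python
import numpy as np

def permutation_test_mean_diff(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = b.mean() - a.mean()
    pooled = np.concatenate([a, b])
    n_a = len(a)
    count = 0
    for _ in range(n_permutations):
        # Shuffle labels: under H0 group membership carries no information
        shuffled = rng.permutation(pooled)
        diff = shuffled[n_a:].mean() - shuffled[:n_a].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value

rng = np.random.default_rng(42)
group_a = rng.normal(5.0, 1.0, size=100)
group_b = rng.normal(5.5, 1.0, size=100)
observed, p_value = permutation_test_mean_diff(group_a, group_b)
print(f"observed diff = {observed:.3f}, permutation p = {p_value:.4f}")
```

No distributional assumption is needed here, which is why the matrix lists this approach for cases where the other tests' assumptions are violated.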


___

<a id="data-setup"></a>
# 🗂️ Data Setup

In [1]:
# Display Settings
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display, HTML
import warnings

# Data Transformation Libraries
import numpy as np
import pandas as pd

# Stats Libraries
# binomtest replaces binom_test, which was removed in SciPy 1.12
from scipy.stats import (
    ttest_1samp, ttest_rel, ttest_ind, wilcoxon, mannwhitneyu,
    shapiro, chi2_contingency, f_oneway, kruskal, binomtest, fisher_exact
)
from statsmodels.stats.proportion import proportions_ztest


<a id="generate-data"></a>
#### 🧪 Generate Data from Config


In [2]:
def generate_data_from_config(config, seed=42):
    np.random.seed(seed)
    
    outcome = config['outcome_type']
    group_count = config['group_count']
    relationship = config['group_relationship']
    size = config['sample_size']
    effect = config['effect_size']
    pop_mean = config.get('population_mean', 0)

    # 1️⃣ One-sample case
    if group_count == 'one-sample':
        if outcome == 'continuous':
            values = np.random.normal(loc=pop_mean + effect, scale=1.0, size=size)
            df = pd.DataFrame({'value': values})
        elif outcome == 'binary':
            prob = pop_mean + effect
            values = np.random.binomial(1, prob, size=size)
            df = pd.DataFrame({'value': values})
        else:
            raise NotImplementedError("One-sample generation only supports continuous/binary for now.")

    # 2️⃣ Two-sample case
    elif group_count == 'two-sample':
        if relationship == 'independent':
            if outcome == 'continuous':
                A = np.random.normal(loc=5.0, scale=1.0, size=size)
                B = np.random.normal(loc=5.0 + effect, scale=1.0, size=size)
            elif outcome == 'binary':
                A = np.random.binomial(1, 0.4, size=size)
                B = np.random.binomial(1, 0.4 + effect, size=size)
            else:
                raise NotImplementedError("Independent two-sample generation only supports continuous/binary for now.")
            df = pd.DataFrame({
                'group': ['A'] * size + ['B'] * size,
                'value': np.concatenate([A, B])
            })
        
        elif relationship == 'paired':
            if outcome == 'continuous':
                before = np.random.normal(loc=5.0, scale=1.0, size=size)
                after = before + effect + np.random.normal(0, 0.5, size=size)
            elif outcome == 'binary':
                before = np.random.binomial(1, 0.4, size=size)
                after = np.random.binomial(1, 0.4 + effect, size=size)
            else:
                raise NotImplementedError("Paired generation only supports continuous/binary for now.")
            df = pd.DataFrame({
                'user_id': np.arange(size),
                'group_A': before,
                'group_B': after
            })
        else:
            raise ValueError("Missing or invalid group relationship.")

    else:
        raise NotImplementedError("Multi-sample not supported yet.")

    return df


<a id="define-test-config"></a>
#### ⚙️ Define Test Configuration

In [3]:
config = {
    'outcome_type': 'continuous',        # continuous, binary, categorical, count
    'group_relationship': 'independent', # independent or paired
    'group_count': 'two-sample',         # one-sample, two-sample, multi-sample
    'distribution': None,                # normal or non-normal → to be inferred
    'variance_equal': None,              # equal or unequal → to be inferred
    'tail_type': 'two-tailed',           # or 'one-tailed'
    'parametric': None,                  # True or False → to be inferred
    'alpha': 0.05,                       # significance level
    'sample_size': 100,                  # per group
    'effect_size': 0.5,                  # for generating synthetic difference
    # 'generated_data': None               # placeholder for attached df
}
config


{'outcome_type': 'continuous',
 'group_relationship': 'independent',
 'group_count': 'two-sample',
 'distribution': None,
 'variance_equal': None,
 'tail_type': 'two-tailed',
 'parametric': None,
 'alpha': 0.05,
 'sample_size': 100,
 'effect_size': 0.5}

In [4]:
df = generate_data_from_config(config)
df

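|     | group | value    |
|-----|-------|----------|
| 0   | A     | 5.496714 |
| 1   | A     | 4.861736 |
| 2   | A     | 5.647689 |
| 3   | A     | 6.523030 |
| 4   | A     | 4.765847 |
| ... | ...   | ...      |
| 195 | B     | 5.885317 |
| 196 | B     | 4.616143 |
| 197 | B     | 5.653725 |
| 198 | B     | 5.558209 |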


[Back to the top](#table-of-contents)
___


<a id="test-setup"></a>

# 🛠️ Test Setup


<details><summary><strong>📖 Test Settings Explanation (Click to Expand)</strong></summary>

### 📊 **Test Type (group_count, group_relationship, outcome_type)**
The test type is not set directly. It is derived from three config keys:

- **group_count**: `one-sample` compares a sample against a known value (e.g., a population mean); `two-sample` compares two groups (e.g., A vs B); `multi-sample` compares three or more groups.
- **group_relationship**: `independent` means each group contains different units; `paired` means the same units are measured twice (before vs after).
- **outcome_type**: `continuous` leads to tests on means or ranks; `binary` leads to tests on proportions (e.g., the conversion rates of two groups).

**Example**: You might want to test whether the mean age of two groups of people (Group A and Group B) differs, or whether the proportion of people who converted in each group is different.

### 📏 **Tail Type (tail_type)**
This setting determines whether you are performing a one-tailed or two-tailed test.

- **one-tailed**: You are testing whether the value is greater than (or less than) the reference value (directional).
- **two-tailed**: You are testing whether the value differs from the reference value in either direction (non-directional).

As written, `run_hypothesis_test` always runs the two-sided version of each test; `tail_type` only changes the hypothesis statement that is printed.

**Example**:  
- **One-tailed**: Testing if new treatment increases sales (you only care if it's greater).  
- **Two-tailed**: Testing if there is any difference in sales between two treatments (it could be either an increase or decrease).

### 🧮 **Parametric (parametric)**
This setting indicates whether the test is **parametric** or **non-parametric**.

- **True (Parametric)**: This means we assume that the data follows a certain distribution, often a **normal distribution**. The most common parametric tests are **t-tests** and **z-tests**. Parametric tests are generally more powerful if the assumptions are met.
  
- **False (Non-Parametric)**: Non-parametric tests don’t assume any specific distribution. These are used when the data doesn’t follow a normal distribution or when the sample size is small. Examples include **Mann-Whitney U** (alternative to the t-test) and **Wilcoxon Signed-Rank** (alternative to paired t-test).

**Why does this matter?**  
Parametric tests tend to be more powerful when their distributional assumptions (e.g., normality) actually hold. Non-parametric tests are more flexible and can be used when these assumptions are not met, but they may have less power to detect a real effect.

### 📊 **Equal Variance (variance_equal)**
This setting applies only to **two-sample t-tests** on independent groups.

- **equal**: Assumes that the two groups have **equal variances** (i.e., the spread of data is the same in both groups). This selects the **pooled t-test**.
  
- **unequal**: Assumes the two groups have **different variances**. This selects the **Welch t-test**, which is more robust when the assumption of equal variances is violated.

**Why is this important?**  
If the variances are not equal, using a pooled t-test (which assumes equal variance) can lead to incorrect conclusions. The Welch t-test is safer when in doubt about the equality of variances.

### 🔑 **Significance Level (alpha)**
The **alpha** level is your **threshold for statistical significance**.

- Commonly set at **0.05**, this means that you are willing to accept a 5% chance of wrongly rejecting the null hypothesis (i.e., a 5% chance of a Type I error).
  
- If the **p-value** (calculated from your test) is less than **alpha**, you reject the null hypothesis. If it's greater than alpha, you fail to reject the null hypothesis.

**Example**:  
- **alpha = 0.05** means there’s a 5% risk of concluding that a treatment has an effect when it actually doesn’t.

### 🎯 **Putting It All Together**

For instance, suppose you are testing whether a new feature (Group A) changes user engagement compared to the existing feature (Group B). Here is how the config keys work together:

- **group_count** = `'two-sample'` and **group_relationship** = `'independent'`: You're comparing two independent groups (A vs B).
- **tail_type** = `'two-tailed'`: You’re testing if there’s any difference (increase or decrease) in engagement.
- **parametric** = `True`: You assume the data is normally distributed, so a t-test will be appropriate.
- **variance_equal** = `'equal'`: You assume the two groups have equal variance, so you’ll use a pooled t-test.
- **alpha** = `0.05`: You’re using a 5% significance level for your hypothesis test.

</details>
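
To make the equal-variance and tail settings concrete, here is a small sketch using `scipy.stats.ttest_ind` directly. The group sizes, means, and spreads are made up for illustration, and the `alternative` argument assumes SciPy 1.6 or later.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Illustrative data: unequal sizes and unequal spread
group_a = rng.normal(loc=5.0, scale=1.0, size=150)
group_b = rng.normal(loc=5.5, scale=2.0, size=50)

# Pooled t-test: assumes both groups share one variance
pooled = ttest_ind(group_b, group_a, equal_var=True)
# Welch t-test: estimates each group's variance separately
welch = ttest_ind(group_b, group_a, equal_var=False)
# One-tailed Welch test: H1 is "mean of B is greater than mean of A"
one_sided = ttest_ind(group_b, group_a, equal_var=False, alternative='greater')

print(f"pooled : t = {pooled.statistic:.3f}, p = {pooled.pvalue:.4f}")
print(f"welch  : t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
print(f"greater: t = {one_sided.statistic:.3f}, p = {one_sided.pvalue:.4f}")
```

When the observed difference points in the hypothesized direction, the one-tailed p-value is half the two-tailed one, which is why the direction should be fixed before looking at the data.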


<a id="print-config"></a>
#### 📋 Print Config Summary

In [5]:
def print_config_summary(config):
    print("📋 Hypothesis Test Configuration Summary\n")

    print(f"🔸 Outcome Type           : {config['outcome_type']}")
    print(f"🔸 Group Relationship      : {config['group_relationship']}")
    print(f"🔸 Group Count             : {config['group_count']}")
    print(f"🔸 Distribution of Outcome : {config['distribution']}")
    print(f"🔸 Equal Variance          : {config['variance_equal']}")
    print(f"🔸 Parametric Test         : {config['parametric']}")
    print(f"🔸 Tail Type               : {config['tail_type']}")
    print(f"🔸 Significance Level α    : {config['alpha']}")

    print("\n🧠 Inference Summary:")
    if config['group_count'] == 'one-sample':
        print("→ This is a one-sample test comparing a sample to a known value.")
    elif config['group_count'] == 'two-sample':
        if config['group_relationship'] == 'independent':
            print("→ Comparing two independent groups (A vs B).")
        elif config['group_relationship'] == 'paired':
            print("→ Comparing paired measurements (before vs after, same users).")
    
    # print("\n🧪 Selected Test:")
    # print(f"✅ {determine_test_to_run(config)}")


In [6]:
config
print_config_summary(config)

{'outcome_type': 'continuous',
 'group_relationship': 'independent',
 'group_count': 'two-sample',
 'distribution': None,
 'variance_equal': None,
 'tail_type': 'two-tailed',
 'parametric': None,
 'alpha': 0.05,
 'sample_size': 100,
 'effect_size': 0.5}

📋 Hypothesis Test Configuration Summary

🔸 Outcome Type           : continuous
🔸 Group Relationship      : independent
🔸 Group Count             : two-sample
🔸 Distribution of Outcome : None
🔸 Equal Variance          : None
🔸 Parametric Test         : None
🔸 Tail Type               : two-tailed
🔸 Significance Level α    : 0.05

🧠 Inference Summary:
→ Comparing two independent groups (A vs B).


[Back to the top](#table-of-contents)
___


<a id="inference"></a>

# 📈 Inference

<a id="infer-distribution"></a>
#### 🔍 Infer Distribution


In [7]:
from scipy.stats import shapiro

def infer_distribution_from_data(config, df):
    group_count = config['group_count']
    relationship = config['group_relationship']
    outcome = config['outcome_type']

    if outcome != 'continuous':
        config['distribution'] = 'NA'
        return config

    if group_count == 'one-sample':
        stat, p = shapiro(df['value'])
        config['distribution'] = 'normal' if p > 0.05 else 'non-normal'
        return config
    
    elif group_count == 'two-sample':
        if relationship == 'independent':
            a = df[df['group'] == 'A']['value']
            b = df[df['group'] == 'B']['value']
            # Independent groups: both samples must look normal
            is_normal = shapiro(a)[1] > 0.05 and shapiro(b)[1] > 0.05
        elif relationship == 'paired':
            # Paired design: the paired t-test assumes normality of the differences
            diffs = df['group_B'] - df['group_A']
            is_normal = shapiro(diffs)[1] > 0.05
        else:
            config['distribution'] = 'NA'
            return config

        config['distribution'] = 'normal' if is_normal else 'non-normal'
        return config
    
    else:
        config['distribution'] = 'NA'
        return config

<a id="infer-variance"></a>
#### 📏 Infer Variance


In [8]:
from scipy.stats import levene

def infer_variance_equality(config, df):
    if config['group_count'] != 'two-sample' or config['group_relationship'] != 'independent':
        config['variance_equal'] = 'NA'
        return config

    a = df[df['group'] == 'A']['value']
    b = df[df['group'] == 'B']['value']
    stat, p = levene(a, b)
    config['variance_equal'] = 'equal' if p > 0.05 else 'unequal'
    return config

<a id="infer-parametric"></a>
#### 📏 Infer Parametric Flag

In [9]:
def infer_parametric_flag(config):
    if config['outcome_type'] != 'continuous':
        config['parametric'] = 'NA'
        return config

    # The parametric flag depends on normality only. Unequal variances are
    # handled by Welch's t-test, which determine_test_to_run selects from
    # config['variance_equal'], so they must not force a non-parametric test.
    config['parametric'] = config['distribution'] == 'normal'
    return config
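
To see what these checks return on data that violates normality, the standalone sketch below applies Shapiro-Wilk and Levene to one symmetric and one skewed sample. `infer_flags` is a hypothetical condensed helper written for this illustration; it is not one of the notebook's functions.

```python
import numpy as np
from scipy.stats import shapiro, levene

def infer_flags(a, b, alpha=0.05):
    """Hypothetical helper: normality per group (Shapiro-Wilk) and variance equality (Levene)."""
    normal = shapiro(a)[1] > alpha and shapiro(b)[1] > alpha
    equal_var = levene(a, b)[1] > alpha
    return {
        'distribution': 'normal' if normal else 'non-normal',
        'variance_equal': 'equal' if equal_var else 'unequal',
    }

rng = np.random.default_rng(1)
symmetric = rng.normal(5.0, 1.0, size=200)      # roughly bell-shaped
skewed = rng.exponential(scale=1.0, size=200)   # strongly right-skewed
flags = infer_flags(symmetric, skewed)
print(flags)
```

A single skewed group is enough to flag the pair as non-normal, which steers test selection toward a rank-based test.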

In [10]:
config
print_config_summary(config)

{'outcome_type': 'continuous',
 'group_relationship': 'independent',
 'group_count': 'two-sample',
 'distribution': None,
 'variance_equal': None,
 'tail_type': 'two-tailed',
 'parametric': None,
 'alpha': 0.05,
 'sample_size': 100,
 'effect_size': 0.5}

📋 Hypothesis Test Configuration Summary

🔸 Outcome Type           : continuous
🔸 Group Relationship      : independent
🔸 Group Count             : two-sample
🔸 Distribution of Outcome : None
🔸 Equal Variance          : None
🔸 Parametric Test         : None
🔸 Tail Type               : two-tailed
🔸 Significance Level α    : 0.05

🧠 Inference Summary:
→ Comparing two independent groups (A vs B).


[Back to the top](#table-of-contents)
___


<a id="hypothesis-testing"></a>
# 🧪 Hypothesis Testing

<a id="determine-test"></a>
#### 🧭 Determine Test

In [11]:
def determine_test_to_run(config):
    outcome = config['outcome_type']
    group_rel = config['group_relationship']
    group_count = config['group_count']
    dist = config['distribution']
    equal_var = config['variance_equal']
    parametric = config['parametric']

    # 1️⃣ One-sample cases
    if group_count == 'one-sample':
        if outcome == 'continuous':
            return 'one_sample_ttest' if dist == 'normal' else 'one_sample_wilcoxon'
        elif outcome == 'binary':
            return 'one_proportion_ztest'
    
    # 2️⃣ Two-sample independent
    if group_count == 'two-sample' and group_rel == 'independent':
        if outcome == 'continuous':
            if parametric:
                return 'two_sample_ttest_pooled' if equal_var == 'equal' else 'two_sample_ttest_welch'
            else:
                return 'mann_whitney_u'
        elif outcome == 'binary':
            return 'two_proportion_ztest'
        elif outcome == 'categorical':
            return 'chi_square'

    # 3️⃣ Two-sample paired
    if group_count == 'two-sample' and group_rel == 'paired':
        if outcome == 'continuous':
            return 'paired_ttest' if parametric else 'wilcoxon_signed_rank'
        elif outcome == 'binary':
            return 'mcnemar'

    # 4️⃣ Multi-group continuous
    if group_count == 'multi-sample' and outcome == 'continuous':
        return 'anova' if dist == 'normal' else 'kruskal_wallis'

    # 5️⃣ Multi-group categorical
    if group_count == 'multi-sample' and outcome == 'categorical':
        return 'chi_square'

    # 6️⃣ Count data
    if outcome == 'count':
        return 'poisson_test'

    # 7️⃣ Adjusted models or Bayesian
    # Could be added later

    return 'test_not_found'
determine_test_to_run(config)


'mann_whitney_u'

<a id="print-hypothesis"></a>
#### 🧠 Print Hypothesis

In [12]:
def print_hypothesis_statement(config):
    test = determine_test_to_run(config)
    outcome = config['outcome_type']
    tail = config['tail_type']
    group_count = config['group_count']
    rel = config['group_relationship']

    print("🧪 Hypothesis Statement\n")

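    # One-sample (mean, median, or proportion)
    if test == 'one_sample_ttest':
        print("H₀: The average outcome equals the benchmark value.")
        print("H₁: The average outcome is different from the benchmark." if tail == 'two-tailed' else "H₁: The average outcome is greater/less than the benchmark.")

    elif test == 'one_sample_wilcoxon':
        print("H₀: The median outcome equals the benchmark value.")
        print("H₁: The median outcome is different from the benchmark." if tail == 'two-tailed' else "H₁: The median outcome is greater/less than the benchmark.")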
    
    elif test == 'one_proportion_ztest':
        print("H₀: The proportion equals the expected baseline.")
        print("H₁: The proportion is different from the baseline." if tail == 'two-tailed' else "H₁: The proportion is greater/less than the baseline.")

    # Two-sample, independent
    elif test in ['two_sample_ttest_pooled', 'two_sample_ttest_welch', 'mann_whitney_u', 'two_proportion_ztest']:
        print("H₀: The average (or proportion) is the same across groups A and B.")
        print("H₁: The average (or proportion) differs between groups." if tail == 'two-tailed' else "H₁: Group B is greater/less than group A.")
    
    # Paired
    elif test in ['paired_ttest', 'wilcoxon_signed_rank']:
        print("H₀: There is no average difference between the paired values (before vs after).")
        print("H₁: There is an average difference in the paired values." if tail == 'two-tailed' else "H₁: After is greater/less than before.")

    elif test == 'mcnemar':
        print("H₀: Proportion of success before = after.")
        print("H₁: Proportion of success changed after treatment.")

    # Multi-group
    elif test in ['anova', 'kruskal_wallis']:
        print("H₀: All group means (or distributions) are equal.")
        print("H₁: At least one group differs.")

    elif test == 'chi_square':
        print("H₀: Distribution of categories is the same across groups.")
        print("H₁: At least one category is distributed differently across groups.")

    # Count data
    elif test == 'poisson_test':
        print("H₀: Count rate (λ) is equal across groups.")
        print("H₁: Count rate differs between groups.")

    # Bayesian / permutation
    elif test == 'bayesian_ab':
        print("Posterior probability that Group B is better than Group A.")
    elif test == 'permutation_test':
        print("H₀: The observed difference is due to chance.")
        print("H₁: The observed difference is rare under random shuffling.")

    else:
        print("❓ Could not determine hypothesis statements for test:", test)
print_hypothesis_statement(config)

🧪 Hypothesis Statement

H₀: The average (or proportion) is the same across groups A and B.
H₁: The average (or proportion) differs between groups.


<a id="run-test"></a>
#### 🧪 Run Hypothesis Test

In [13]:
def run_hypothesis_test(config, df):
    test_name = determine_test_to_run(config)
    alpha = config.get('alpha', 0.05)

    result = {
        'test': test_name,
        'statistic': None,
        'p_value': None,
        'significant': None,
        'alpha': alpha
    }

    try:
        if test_name == 'one_sample_ttest':
            stat, p = ttest_1samp(df['value'], config['population_mean'])

        elif test_name == 'one_sample_wilcoxon':
            stat, p = wilcoxon(df['value'] - config['population_mean'])

        elif test_name == 'one_proportion_ztest':
            x = np.sum(df['value'])
            n = len(df)
            stat, p = proportions_ztest(x, n, value=config['population_mean'])

        elif test_name == 'two_sample_ttest_pooled':
            a = df[df['group'] == 'A']['value']
            b = df[df['group'] == 'B']['value']
            stat, p = ttest_ind(a, b, equal_var=True)

        elif test_name == 'two_sample_ttest_welch':
            a = df[df['group'] == 'A']['value']
            b = df[df['group'] == 'B']['value']
            stat, p = ttest_ind(a, b, equal_var=False)

        elif test_name == 'mann_whitney_u':
            a = df[df['group'] == 'A']['value']
            b = df[df['group'] == 'B']['value']
            stat, p = mannwhitneyu(a, b, alternative='two-sided')

        elif test_name == 'paired_ttest':
            stat, p = ttest_rel(df['group_A'], df['group_B'])

        elif test_name == 'wilcoxon_signed_rank':
            stat, p = wilcoxon(df['group_A'], df['group_B'])

        elif test_name == 'two_proportion_ztest':
            a = df[df['group'] == 'A']['value']
            b = df[df['group'] == 'B']['value']
            counts = [np.sum(a), np.sum(b)]
            nobs = [len(a), len(b)]
            stat, p = proportions_ztest(count=counts, nobs=nobs)

        elif test_name == 'mcnemar':
            # McNemar's test uses only the discordant pairs (off-diagonal cells);
            # a chi-square test of independence on this table would answer a different question.
            from statsmodels.stats.contingency_tables import mcnemar
            both = np.sum((df['group_A'] == 1) & (df['group_B'] == 1))
            before_only = np.sum((df['group_A'] == 1) & (df['group_B'] == 0))
            after_only = np.sum((df['group_A'] == 0) & (df['group_B'] == 1))
            neither = np.sum((df['group_A'] == 0) & (df['group_B'] == 0))
            # Rows: before = 1 / 0, columns: after = 1 / 0
            table = np.array([[both, before_only], [after_only, neither]])
            res = mcnemar(table, exact=False, correction=True)
            stat, p = res.statistic, res.pvalue

        elif test_name == 'anova':
            groups = [g['value'].values for _, g in df.groupby('group')]
            stat, p = f_oneway(*groups)

        elif test_name == 'kruskal_wallis':
            groups = [g['value'].values for _, g in df.groupby('group')]
            stat, p = kruskal(*groups)

        elif test_name == 'chi_square':
            contingency = pd.crosstab(df['group'], df['value'])
            stat, p, _, _ = chi2_contingency(contingency)

        else:
            warnings.warn(f"Test not implemented: {test_name}")
            return result

        result['statistic'] = stat
        result['p_value'] = p
        result['significant'] = p < alpha
        return result

    except Exception as e:
        warnings.warn(f"Error running test: {e}")
        return result


In [14]:
run_hypothesis_test(config, df)

{'test': 'mann_whitney_u',
 'statistic': 3300.0,
 'p_value': 3.288061008165521e-05,
 'significant': True,
 'alpha': 0.05}
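
For paired binary outcomes, McNemar's test depends only on the discordant pairs: users who converted before but not after, and users who converted after but not before. The sketch below computes the continuity-corrected statistic by hand on made-up data so the arithmetic is visible; `mcnemar_from_pairs` is an illustrative name, not a library function.

```python
import numpy as np
from scipy.stats import chi2

def mcnemar_from_pairs(before, after):
    """McNemar chi-square with continuity correction (illustrative helper)."""
    before = np.asarray(before)
    after = np.asarray(after)
    b = int(np.sum((before == 1) & (after == 0)))  # converted before only
    c = int(np.sum((before == 0) & (after == 1)))  # converted after only
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of change
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = chi2.sf(stat, df=1)
    return stat, p_value

# Made-up paired data: 10 convert both times, 30 only after, 5 only before, 55 never
before = np.array([1] * 10 + [0] * 30 + [1] * 5 + [0] * 55)
after = np.array([1] * 10 + [1] * 30 + [0] * 5 + [0] * 55)
stat, p = mcnemar_from_pairs(before, after)
print(f"statistic = {stat:.3f}, p = {p:.6f}")
```

Here b = 5 and c = 30, so the statistic is (|5 − 30| − 1)² / 35 = 576 / 35; the 65 concordant pairs do not enter the calculation at all.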

[Back to the top](#table-of-contents)
___


<a id="test-summary"></a>
# 📊 Test Summary


[Back to the top](#table-of-contents)
___


<a id="full-pipeline"></a>
# 🚀 Full Pipeline


[Back to the top](#table-of-contents)
___
