# Python T-Test and Confidence Interval Estimation Demo

This notebook demonstrates the usage of the Python T-Test library for statistical analysis.

## Installation

First, make sure you have the required dependencies installed:

```bash
uv sync
```

In [None]:
import random
import numpy as np
import matplotlib.pyplot as plt
import sys
sys.path.append('.')
from ttest import calculate_confidence_interval, perform_t_test

# Set random seed for reproducibility
random.seed(1234)
np.random.seed(1234)

## Basic Usage Example

Let's start with a simple example comparing two groups of data.

In [None]:
# Generate sample data
points1 = random.sample(range(10, 30), 10)
points2 = random.sample(range(15, 35), 10)

print("Group 1 data:", points1)
print("Group 2 data:", points2)
print("Group 1 mean:", np.mean(points1))
print("Group 2 mean:", np.mean(points2))

### Performing a T-Test

The t-test helps us determine if there's a statistically significant difference between the means of two groups.

In [None]:
# Perform two-sided t-test
p_value = perform_t_test(points1, points2)
print(f"P-value for two-sided t-test: {p_value:.6f}")

# Interpret the result
alpha = 0.05
if p_value < alpha:
    print(f"Result: Statistically significant difference (p < {alpha})")
else:
    print(f"Result: No statistically significant difference (p >= {alpha})")
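
For intuition about what a two-sample t-test measures, the statistic can be computed by hand. The sketch below uses Welch's unequal-variance form with only the standard library. It is an illustration, not the implementation behind `perform_t_test`: whether that function uses Welch's or the pooled-variance variant is not stated in this notebook, and `welch_t` and the example data are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each mean (sample variance uses n - 1)
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t_stat = mean_diff / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Illustrative data, unrelated to points1 and points2
example_a = [12, 15, 11, 18, 14]
example_b = [20, 22, 19, 24, 21]
t_stat, df = welch_t(example_a, example_b)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The p-value is then the tail area of a t-distribution with `df` degrees of freedom beyond `|t|`, which is the part a statistics library normally handles.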

In [None]:
# Perform one-sided t-test
p_value_one_sided = perform_t_test(points1, points2, two_sided=False)
print(f"P-value for one-sided t-test: {p_value_one_sided:.6f}")
print("One-sided test checks if Group 1 mean > Group 2 mean")
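
The one-sided and two-sided p-values of a t-test are linked through the sign of the test statistic. The sketch below shows that relationship for a symmetric test statistic. The helper `one_sided_from_two_sided` is hypothetical and is not part of the `ttest` module; which direction `two_sided=False` actually tests depends on that module's implementation.

```python
def one_sided_from_two_sided(p_two_sided, t_stat, alternative="greater"):
    """Convert a two-sided p-value to a one-sided one.

    Valid when the null distribution of the statistic is symmetric
    around zero, as it is for the t-distribution.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # Does the observed statistic point in the hypothesized direction?
    in_direction = t_stat > 0 if alternative == "greater" else t_stat < 0
    return p_two_sided / 2 if in_direction else 1 - p_two_sided / 2


# Illustrative values: the same two-sided p-value, opposite signs of t
print(one_sided_from_two_sided(0.04, 2.1))
print(one_sided_from_two_sided(0.04, -2.1))
```

A one-sided test only gains power when the observed difference points in the direction chosen before looking at the data.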

### Calculating Confidence Intervals

Confidence intervals give us a range of plausible values for the population mean.

In [None]:
# Calculate 95% confidence interval for Group 1
ci_95 = calculate_confidence_interval(points1, 0.95)
print(f"95% Confidence Interval for Group 1: [{ci_95[0]:.2f}, {ci_95[1]:.2f}]")

# Calculate 90% confidence interval for Group 1
ci_90 = calculate_confidence_interval(points1, 0.90)
print(f"90% Confidence Interval for Group 1: [{ci_90[0]:.2f}, {ci_90[1]:.2f}]")

print(f"\nGroup 1 sample mean: {np.mean(points1):.2f}")
print("Notice how the 90% CI is narrower than the 95% CI")
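
A t-based confidence interval for a mean has the form mean ± t* × s / √n, where t* is the critical value for n − 1 degrees of freedom. The sketch below applies that formula with the standard library only. It assumes the critical values are looked up in a t-table (here for 9 degrees of freedom). `mean_confidence_interval` and the sample data are hypothetical, and `calculate_confidence_interval` may be implemented differently.

```python
import math
import statistics


def mean_confidence_interval(data, t_critical):
    """Return (lower, upper) for the mean, given the t critical value for n - 1 df."""
    n = len(data)
    mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)  # s / sqrt(n)
    margin = t_critical * standard_error
    return mean - margin, mean + margin


# Two-sided critical values for 9 degrees of freedom, from a standard t-table
T_CRIT_95_DF9 = 2.262
T_CRIT_90_DF9 = 1.833

sample = [12, 27, 19, 23, 15, 21, 18, 25, 14, 16]  # illustrative data, n = 10
ci95 = mean_confidence_interval(sample, T_CRIT_95_DF9)
ci90 = mean_confidence_interval(sample, T_CRIT_90_DF9)
print(f"95% CI: [{ci95[0]:.2f}, {ci95[1]:.2f}]")
print(f"90% CI: [{ci90[0]:.2f}, {ci90[1]:.2f}]")
```

Because the 90% critical value is smaller than the 95% one, the 90% interval is narrower for the same data.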

## Visualization

Let's visualize our data and the confidence intervals.

In [None]:
# Create visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Plot 1: Box plots of the two groups
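# Set tick labels separately: the `labels` keyword of boxplot is deprecated in recent Matplotlib releases
ax1.boxplot([points1, points2])
ax1.set_xticklabels(['Group 1', 'Group 2'])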
ax1.set_title('Distribution Comparison')
ax1.set_ylabel('Values')
ax1.grid(True, alpha=0.3)

# Plot 2: Confidence intervals
groups = ['Group 1', 'Group 2']
means = [np.mean(points1), np.mean(points2)]
ci_95_group1 = calculate_confidence_interval(points1, 0.95)
ci_95_group2 = calculate_confidence_interval(points2, 0.95)

errors = [
    [means[0] - ci_95_group1[0], means[1] - ci_95_group2[0]],
    [ci_95_group1[1] - means[0], ci_95_group2[1] - means[1]]
]

ax2.errorbar(groups, means, yerr=errors, fmt='o', capsize=5, capthick=2)
ax2.set_title('95% Confidence Intervals')
ax2.set_ylabel('Mean Values')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"T-test p-value: {p_value:.6f}")
print(f"Interpretation: {'Significant difference' if p_value < 0.05 else 'No significant difference'} between groups")

## Real-World Example: A/B Testing

Let's simulate an A/B test scenario where we're comparing conversion times (how long users take to convert) between two website designs.

In [None]:
# Simulate A/B test data
# Design A: baseline conversion times (in seconds)
design_a_times = np.random.normal(45, 8, 50)  # mean=45s, std=8s, n=50
# Design B: improved conversion times
design_b_times = np.random.normal(42, 7, 50)  # mean=42s, std=7s, n=50

print("A/B Test: Website Conversion Times")
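# ddof=1 gives the sample standard deviation (np.std defaults to the population formula)
print(f"Design A - Mean: {np.mean(design_a_times):.2f}s, Std: {np.std(design_a_times, ddof=1):.2f}s")
print(f"Design B - Mean: {np.mean(design_b_times):.2f}s, Std: {np.std(design_b_times, ddof=1):.2f}s")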

# Perform statistical test
ab_p_value = perform_t_test(design_a_times, design_b_times)
print(f"\nT-test p-value: {ab_p_value:.6f}")

# Calculate confidence intervals
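# Pass the confidence level explicitly so the "95% CI" labels below do not rely on a default
ci_a = calculate_confidence_interval(design_a_times, 0.95)
ci_b = calculate_confidence_interval(design_b_times, 0.95)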

print(f"\nDesign A 95% CI: [{ci_a[0]:.2f}, {ci_a[1]:.2f}] seconds")
print(f"Design B 95% CI: [{ci_b[0]:.2f}, {ci_b[1]:.2f}] seconds")

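# Business interpretation: require significance AND a change in the right direction
improvement = np.mean(design_a_times) - np.mean(design_b_times)  # positive means B is faster
if ab_p_value < 0.05 and improvement > 0:
    print("\n✅ RECOMMENDATION: Deploy Design B")
    print(f"   Statistically significant improvement of {improvement:.2f} seconds")
elif ab_p_value < 0.05:
    print("\n❌ RECOMMENDATION: Keep current design")
    print(f"   Design B is significantly slower by {-improvement:.2f} seconds")
else:
    print("\n❌ RECOMMENDATION: Keep current design")
    print("   No statistically significant difference detected")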

## Summary

This notebook demonstrated:

1. **T-tests**: Compare means between two groups
   - Two-sided: tests if means are different
   - One-sided: tests if one mean is greater than another

2. **Confidence Intervals**: Estimate range of plausible values for population mean
   - Higher confidence levels → wider intervals
   - Larger sample sizes → narrower intervals

3. **Practical Applications**: A/B testing, quality control, research studies

4. **Statistical Power**: Larger samples help detect smaller differences
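
The claims about interval width in the summary above can be checked numerically. The sketch below is a rough illustration that uses a normal approximation (a z critical value instead of a t critical value) so that it needs only the standard library. `approx_half_width` is a hypothetical helper, and the standard deviation of 8.0 is an arbitrary illustrative value.

```python
import math
from statistics import NormalDist


def approx_half_width(sample_std, n, confidence=0.95):
    """Normal-approximation half-width of a confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    return z * sample_std / math.sqrt(n)


# Quadrupling the sample size halves the half-width
for n in (10, 40, 160):
    print(f"n = {n:3d}: half-width = {approx_half_width(8.0, n):.2f}")

# A higher confidence level widens the interval for the same n
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: half-width = {approx_half_width(8.0, 50, level):.2f}")
```

The width shrinks with the square root of the sample size, so halving an interval requires roughly four times as much data.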

### Key Takeaways:
- Always visualize your data before testing
- Consider practical significance alongside statistical significance
- Larger sample sizes increase statistical power
- Confidence intervals provide more information than just p-values
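
One common way to quantify practical significance is a standardized effect size such as Cohen's d, which expresses the difference in means in units of the pooled standard deviation. The sketch below is one possible implementation using only the standard library. `cohens_d` is a hypothetical helper that is not part of the `ttest` module, and the data are illustrative.

```python
import math
import statistics


def cohens_d(sample_a, sample_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / (n_a + n_b - 2)
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return mean_diff / math.sqrt(pooled_variance)


# Illustrative data: group_b is group_a shifted up by one unit
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
d = cohens_d(group_a, group_b)
print(f"Cohen's d = {d:.3f}")
```

A very small p-value can accompany a tiny effect when samples are large, so reporting an effect size next to the p-value helps readers judge whether a difference matters in practice.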