# Interactive t test demonstration

Run the following cell to see a histogram plot. Each colour on the histogram represents a different group, since we are using an independent-samples t test. Let's pretend these groups are people who cycle to work and people who take the train. The dependent variable could be mood on arrival (from negative to positive).


Play with the sliders to change the sample size, the mean difference and the within-group variance. These are the core components we use to decide whether there is a significant difference between two groups. Notice how distinguishable the two groups are on the histogram, and how the p value changes.
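Because each slider move draws fresh random data, a single p value can jump around even when the settings barely change. The sketch below, which assumes the same normal data-generating model as the demo, repeats the experiment many times and reports the share of runs that reach p < 0.05 (the empirical power). The function `share_significant` and its `reps`, `alpha` and `seed` parameters are hypothetical helpers for illustration, not part of the demo.

```python
import numpy as np
from scipy.stats import ttest_ind

def share_significant(mean_diff, within_group_var, n, reps=500, alpha=0.05, seed=0):
    """Simulate `reps` experiments like the demo; return the fraction with p < alpha."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(within_group_var)  # normal() takes a standard deviation, not a variance
    # Each row is one simulated experiment with n people per group.
    group1 = rng.normal(0.0, sd, size=(reps, n))        # train commuters
    group2 = rng.normal(mean_diff, sd, size=(reps, n))  # cyclists
    p_values = ttest_ind(group1, group2, axis=1).pvalue  # one t test per row
    return float(np.mean(p_values < alpha))

# Vary one parameter at a time, holding the other two fixed.
for n in (10, 50, 200):
    print(f"n = {n:3d} per group: {share_significant(0.5, 1.0, n):.2f} of runs reach p < 0.05")
for var in (0.5, 2.0, 5.0):
    print(f"variance = {var}: {share_significant(0.5, var, 50):.2f} of runs reach p < 0.05")
for diff in (0.0, 0.5, 1.0):
    print(f"mean difference = {diff}: {share_significant(diff, 1.0, 50):.2f} of runs reach p < 0.05")
```

The share rises with sample size and mean difference and falls as within-group variance grows. With a mean difference of 0, roughly 5% of runs still reach p < 0.05, which is exactly the false-positive rate that the 0.05 threshold allows.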


We can be more **confident** that there is a difference between groups when the sample size is larger and the differences between people within each group (the within-group variance) are small relative to the difference between the groups (the mean difference). In performing the statistical test, we are estimating how much of the overall variance in the sample (the differences in measured mood) is due to group membership, and how much is due to all other factors, such as random noise, measurement error, or variables that we didn't include in the analysis.
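To make the signal-versus-noise idea concrete, here is a minimal sketch that computes Student's t statistic by hand, $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(1/n_1 + 1/n_2\right)}}$, where $s_p^2$ is the pooled within-group variance, and checks it against `scipy.stats.ttest_ind`. It also splits the total variation in mood into a between-group part, giving the share of variance explained by group membership (eta squared). The data are simulated under the demo's assumptions; the group means of 0 and 0.5 are arbitrary choices.

```python
import numpy as np
from scipy.stats import ttest_ind, t as t_dist

rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, 100)  # simulated mood scores, train commuters
cycle = rng.normal(0.5, 1.0, 100)  # simulated mood scores, cyclists
n1, n2 = len(train), len(cycle)
df = n1 + n2 - 2

# Pooled within-group variance: the "noise" the mean difference is judged against.
pooled_var = ((n1 - 1) * train.var(ddof=1) + (n2 - 1) * cycle.var(ddof=1)) / df
std_err = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t = signal (difference in means) / noise (standard error of that difference).
t_manual = (train.mean() - cycle.mean()) / std_err
p_manual = 2 * t_dist.sf(abs(t_manual), df)  # two-sided p value

# Split the total variation in mood into a between-group component.
everyone = np.concatenate([train, cycle])
ss_total = np.sum((everyone - everyone.mean()) ** 2)
ss_between = (n1 * (train.mean() - everyone.mean()) ** 2
              + n2 * (cycle.mean() - everyone.mean()) ** 2)
eta_sq = ss_between / ss_total  # share of variance explained by group membership

result = ttest_ind(train, cycle)  # Student's t test, equal variances assumed
print(f"t = {t_manual:.3f} (scipy: {result.statistic:.3f})")
print(f"p = {p_manual:.4f} (scipy: {result.pvalue:.4f})")
print(f"share of variance due to group: {eta_sq:.1%}")
```

For two groups this share equals $t^2 / (t^2 + df)$, so a larger t (a bigger mean difference, smaller within-group variance, or larger sample) means group membership accounts for more of the variance.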

### Go ahead and play!

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind
import ipywidgets as widgets
from IPython.display import display

def generate_data(mean_diff=0, within_group_var=1, n=100):
    """
    Generate two groups of data with specified mean difference and within-group variance.

    Parameters:
    - mean_diff: Difference in means between the two groups.
    - within_group_var: Variance within each group.
    - n: Number of data points in each group.

    Returns:
    - group1, group2: Two numpy arrays containing the data for the two groups.
    """
    # np.random.normal expects a standard deviation, so take the square root of the variance
    group1 = np.random.normal(0, np.sqrt(within_group_var), n)
    group2 = np.random.normal(mean_diff, np.sqrt(within_group_var), n)
    return group1, group2

def plot_and_compute_p(mean_diff, within_group_var, n):
    # Generate fresh data for the current slider values
    group1, group2 = generate_data(mean_diff, within_group_var, n)

    # Perform an independent-samples t test (equal variances assumed by default)
    _, p_value = ttest_ind(group1, group2)

    # Highlight the p value in green when it is below 0.05; avoid printing "0.000"
    p_color = 'green' if p_value < 0.05 else 'black'
    p_text = 'p-value: < 0.001' if p_value < 0.001 else f'p-value: {p_value:.3f}'

    # Plot overlapping histograms of the two groups
    plt.figure(figsize=(10, 6))
    plt.hist(group1, bins=30, alpha=0.5, label='Group 1 (train)')
    plt.hist(group2, bins=30, alpha=0.5, label='Group 2 (cycle)')
    plt.legend()
    plt.title(f'Distribution of Two Groups with Mean Difference of {mean_diff:.1f}\n'
              f'and sample size {n} per group', fontsize=14)
    # Add the p value in the top-left corner (axes coordinates)
    plt.text(0.05, 0.95, p_text, horizontalalignment='left', verticalalignment='top',
             transform=plt.gca().transAxes, color=p_color, fontsize=14)
    plt.xlabel('Value (mood)')
    plt.ylabel('Frequency')
    plt.show()

# Create interactive widgets

# Widget layout configuration (adjust 'width' as needed);
# 'description_width' stops the slider labels from being truncated
layout = widgets.Layout(width='500px', display='flex', justify_content='center')
style = {'description_width': 'initial'}

mean_diff_slider = widgets.FloatSlider(value=2, min=0, max=5, step=0.1, description='Mean Difference:', layout=layout, style=style)
within_group_var_slider = widgets.FloatSlider(value=1, min=0.1, max=5, step=0.1, description='Variance:', layout=layout, style=style)
sample_size_slider = widgets.IntSlider(value=100, min=10, max=500, step=10, description='Sample Size:', layout=layout, style=style)

# Stack widgets vertically and center them
ui = widgets.VBox([mean_diff_slider, within_group_var_slider, sample_size_slider],
                  layout=widgets.Layout(display='flex', flex_flow='column', align_items='center', width='100%'))
out = widgets.interactive_output(plot_and_compute_p, {'mean_diff': mean_diff_slider, 'within_group_var': within_group_var_slider, 'n': sample_size_slider})

display(ui, out)
```


You have now built an intuitive understanding of statistical testing that can be applied to more complex scenarios. Consider how these parameters would affect a test with three groups, or one with additional variables, such as whether it was a rainy day in this example.
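As a first step towards more than two groups, the sketch below uses `scipy.stats.f_oneway` (a one-way ANOVA) on three simulated commuter groups. The third "bus" group and all of the group means are hypothetical, added only for illustration. The second half shows why the intuition carries over: with exactly two groups, the ANOVA F statistic equals the square of Student's t, and the two tests give the same p value.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

rng = np.random.default_rng(2)
sd, n = 1.0, 100
train = rng.normal(0.0, sd, n)
cycle = rng.normal(0.5, sd, n)
bus = rng.normal(0.2, sd, n)  # hypothetical third group, not part of the demo

# One-way ANOVA: between-group variance relative to within-group variance, across all groups.
three = f_oneway(train, cycle, bus)
print(f"three groups: F = {three.statistic:.2f}, p = {three.pvalue:.4f}")

# With only two groups, F is exactly the square of Student's t.
two = f_oneway(train, cycle)
t_result = ttest_ind(train, cycle)
print(f"two groups: F = {two.statistic:.3f}, t^2 = {t_result.statistic ** 2:.3f}")
```

The same three levers (group differences, within-group variance and sample size) still drive the result; ANOVA simply pools the between-group differences across more than two groups.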

Stay tuned for upcoming demonstrations of other statistical methods, including ANOVA, regression and clustering analysis. We will also explore how statistical assumptions and their violations affect tests, e.g., homogeneity of variance, imbalanced samples and non-normal distributions.
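As a small preview of the homogeneity-of-variance assumption, the sketch below simulates an imbalanced scenario in which cyclists' moods vary far more than train commuters'. It uses two real SciPy tools: `scipy.stats.levene`, which tests whether group variances differ, and the `equal_var=False` option of `ttest_ind`, which runs Welch's t test instead of Student's. The group sizes, means and spreads are hypothetical.

```python
import numpy as np
from scipy.stats import levene, ttest_ind

rng = np.random.default_rng(3)
# Hypothetical imbalanced scenario: few, consistent train commuters and
# many cyclists whose moods vary much more.
train = rng.normal(0.0, 1.0, 40)
cycle = rng.normal(0.5, 3.0, 200)

# Levene's test: a small p value suggests the group variances differ.
levene_p = levene(train, cycle).pvalue
print(f"Levene p = {levene_p:.4f}")

student = ttest_ind(train, cycle)                 # assumes equal variances
welch = ttest_ind(train, cycle, equal_var=False)  # Welch's t test, does not
print(f"Student p = {student.pvalue:.4f}, Welch p = {welch.pvalue:.4f}")
```

When variances and sample sizes both differ, Student's pooled standard error can be noticeably off, so the two p values can disagree.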