# Why Python? A Hands-On Comparison with JASP

You just learned how to run statistical tests in JASP. Now let's do the **exact same tests** in Python and see what we gain.

## Why bother with code when JASP exists?

JASP is excellent for learning statistical concepts and running quick analyses. Python offers additional capabilities that become valuable as your research grows:

| | JASP (GUI) | Python (Code) |
|---|---|---|
| **Reproducibility** | Need to remember which buttons you clicked | Save the code and rerun it anytime; same data and package versions give the same result |
| **Automation** | Analyze 50 brain regions? Click 50 times | Write a loop — done in seconds |
| **Flexibility** | Limited to built-in options | Create any analysis you can describe |
| **Transparency** | Results appear; steps are hidden | You control (and can inspect) each step |
| **Sharing** | Colleague needs JASP installed | Share a .py file or notebook that anyone with Python can run |
| **Scale** | Great for one dataset at a time | Process thousands of files overnight |

**Transferable skills:** Learning Python for statistics opens doors to machine learning, neuroimaging analysis, bioinformatics, finance, web development, and more. The syntax you learn today applies across all these fields.

**Bottom line:** JASP is great for learning concepts and quick checks. Python gives you power and flexibility for real research.

Let's prove it. We'll replicate everything you just did in JASP — and then go further.

---

## Part 1: Setup and Loading Data

First, let's load the Python libraries we need. Think of libraries as toolboxes — each one gives us specific capabilities.

In [None]:
# Import the libraries we'll use throughout this lab
# Each line loads a different "toolbox" of functions

import pandas as pd              # pandas: for loading and working with data tables
import numpy as np               # numpy: for numerical calculations
import matplotlib.pyplot as plt  # matplotlib: for creating basic plots
import seaborn as sns            # seaborn: for prettier statistical plots
import scipy.stats as stats      # scipy.stats: for statistical tests

# --- Colab Setup: Download data files from GitHub ---
# This cell automatically downloads the data files if running in Google Colab
import os
if 'google.colab' in str(get_ipython()) or not os.path.exists('class_data_undergrad.xlsx'):
    import urllib.request
    base_url = 'https://raw.githubusercontent.com/cmahlen/python-stats-demo/main/'
    files = ['class_data_undergrad.xlsx', 'ttest_data.xlsx', 'class_data_longitudinal.xlsx']
    for f in files:
        if not os.path.exists(f):
            print(f"Downloading {f}...")
            urllib.request.urlretrieve(base_url + f, f)
    print("Data files ready!")
else:
    print("Using local data files")

print("Libraries loaded successfully!")

### The Study

A pharmacology experiment examining how dopaminergic drugs affect locomotor activity in rodents.

**Dependent Variable**: Number of squares entered in an open-field test (measure of locomotion)

**Independent Variables** (8 drug treatment groups):
- **Control**: No drug (baseline)
- **Amph**: Amphetamine (dopamine releaser — increases locomotion)
- **Res only**: Reserpine (depletes dopamine — decreases locomotion)
- **Res+Amph**: Can amphetamine overcome reserpine's effects?
- **Res+MT**: Reserpine + alpha-methyltyrosine (blocks dopamine synthesis)
- **Res+MT+Amph**: Triple combination
- **Res+MT+DOPA**: L-DOPA (dopamine precursor) to restore function?
- **Res+MT+Amph+DOPA**: Maximum restoration attempt

### Loading Data: The Peek-Then-Use Pattern

**Good habit:** Always look at your data before analyzing it! This helps you catch problems early.

We use `pd.read_excel()` to load Excel files. The `pd.` part means "use the pandas library."

In [None]:
# Load the ANOVA dataset from an Excel file
# pd.read_excel() reads the file and stores it in a variable called anova_df
# "df" is short for "DataFrame" — pandas' name for a data table

anova_df = pd.read_excel('class_data_undergrad.xlsx')

# .head() shows the first 5 rows — a quick peek at your data
# This is your first sanity check: do the columns look right?
anova_df.head()

In [None]:
# You can also look at the last five rows with .tail()
anova_df.tail()

In [None]:
# More quality checks — always do these after loading data!

print("DATA QUALITY CHECK")
print("=" * 40)

# len() counts how many rows
print(f"Number of rows: {len(anova_df)}")

# .columns gives us the column names
print(f"Column names: {list(anova_df.columns)}")

# .isna().sum().sum() counts ALL missing values in the entire dataset
print(f"Missing values: {anova_df.isna().sum().sum()}")

# .unique() shows all the different values in a column
print(f"Groups in the data: {anova_df['Group'].unique()}")
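
The checks above can be bundled into one reusable function so that every new file gets the same inspection. The sketch below is only an illustration: `peek` is a hypothetical helper name, and `toy_df` is a tiny made-up table, not the class data.

```python
import pandas as pd

def peek(df, group_col=None):
    """Collect basic sanity checks for a DataFrame into one dictionary."""
    summary = {
        "n_rows": len(df),                        # how many rows
        "columns": list(df.columns),              # column names
        "n_missing": int(df.isna().sum().sum()),  # missing cells in the whole table
    }
    if group_col is not None:
        # Distinct group labels, ignoring missing labels
        summary["groups"] = sorted(df[group_col].dropna().unique())
    return summary

# Made-up example table with one missing measurement
toy_df = pd.DataFrame({
    "Group": ["Control", "Control", "Amph", "Amph"],
    "Squares entered": [52, 48, None, 110],
})

report = peek(toy_df, group_col="Group")
print(report)
```

Returning a dictionary instead of printing makes it easy to loop over many files and collect the reports in one place.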

### Visualize Before Analyzing!

**Good habit:** Always plot your data before running statistics. Visualizations help you:
- Spot outliers or data entry errors
- See the pattern before confirming it with numbers
- Choose the right statistical test

We'll use a **swarm plot**, which shows every individual data point. For small samples this is often more informative than a box plot because you see the *actual data*, not just a summary.

In [None]:
# Create a swarm plot showing every data point
# sns.swarmplot() is from seaborn — it plots each observation as a dot

plt.figure(figsize=(12, 5))  # figsize sets the width and height in inches

sns.swarmplot(data=anova_df,           # which dataset to use
              x='Group',                # what goes on the x-axis
              y='Squares entered',      # what goes on the y-axis
              color='steelblue',        # color of the dots (try 'red', 'green', etc.!)
              size=6)                   # size of each dot (try 4, 8, 10!)

# Try experimenting with rotation= to see what it does. 
plt.xticks(rotation=45, ha='right')  # rotate x labels so they don't overlap
plt.title('Locomotor Activity by Treatment Group')
plt.ylabel('Squares Entered')
plt.xlabel('Treatment Group')
plt.tight_layout()  # prevents labels from getting cut off
plt.show()

print("What patterns do you notice? Which groups look different from Control?")
print("\nTry changing the color to 'coral' or 'forestgreen' and re-run the cell!")
print("\nYou might notice there is a warning telling you to use stripplot.")
print("\nTry it! Replace swarmplot with stripplot.")

Matplotlib recognizes a long list of named colors, and it also accepts hex codes such as `'#4682B4'`. For the named colors, see:

https://matplotlib.org/stable/gallery/color/named_colors.html

In [None]:
# Calculate descriptive statistics by group
# This is what JASP calls "Descriptives"

# .groupby('Group') splits the data by treatment group
# ['Squares entered'] selects just that column
# .agg() calculates multiple statistics at once

descriptives = anova_df.groupby('Group')['Squares entered'].agg(['count', 'mean', 'std'])

# Rename columns to be clearer
descriptives.columns = ['N', 'Mean', 'SD']

# Round to 2 decimal places for readability
descriptives = descriptives.round(2)

# Reorder groups logically (not alphabetically)
group_order = ['Control', 'Amph', 'Res only', 'Res+Amph', 'Res+MT', 'Res+MT+Amph', 'Res+MT+DOPA', 'Res+MT+Amph+DOPA']
descriptives = descriptives.reindex(group_order)

print("Descriptive Statistics by Treatment Group")
print("=" * 50)
print(descriptives.to_string())
print("\nCompare with your JASP output!")

---

## Part 2: T-Tests

Now let's load a different dataset for t-tests. This one compares a "New" treatment to an "Old" treatment.

In [None]:
# Load the t-test data
# This file has two columns: 'New' and 'Old'

ttest_df = pd.read_excel('ttest_data.xlsx')

# Always peek at your data first!
print("First few rows:")
print(ttest_df.head())

print(f"\nNumber of rows: {len(ttest_df)}")
print(f"Columns: {list(ttest_df.columns)}")

In [None]:
# Extract the two groups as separate arrays
# .dropna() removes any missing values (NaN)
# .values converts from pandas Series to numpy array

new_data = ttest_df['New'].dropna().values
old_data = ttest_df['Old'].dropna().values

# Check that we got the data
print(f"New group: n={len(new_data)}, mean={new_data.mean():.2f}, SD={new_data.std(ddof=1):.2f}")
print(f"Old group: n={len(old_data)}, mean={old_data.mean():.2f}, SD={old_data.std(ddof=1):.2f}")
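
Why `ddof=1` in the cell above? NumPy's `.std()` divides by n by default (`ddof=0`), while the sample standard deviation usually reported in statistics divides by n - 1. pandas' `.std()` already defaults to `ddof=1`, which is why the descriptives table earlier needed no extra argument. A small sketch with made-up numbers:

```python
import numpy as np

scores = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # made-up values, mean = 5

population_sd = scores.std()      # ddof=0: divide the squared deviations by n
sample_sd = scores.std(ddof=1)    # ddof=1: divide by n - 1 (sample SD)

print(f"ddof=0 (NumPy default): {population_sd:.3f}")
print(f"ddof=1 (sample SD):     {sample_sd:.3f}")
```

The gap between the two shrinks as n grows, but with the small groups typical of animal studies it is large enough to matter.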

### Visualize Before Testing

Before running any statistical test, we should look at the data. Let's create a histogram to see the distribution of each group.

In [None]:
# Create overlapping histograms to compare distributions
# plt.hist() creates a histogram

plt.figure(figsize=(8, 4))

# Plot both groups on the same axes
# alpha controls transparency (0=invisible, 1=solid) — try different values!
# bins controls how many bars — try 5, 10, 15, 20!
plt.hist(new_data, bins=8, alpha=0.6, label='New', color='steelblue', edgecolor='black')
plt.hist(old_data, bins=8, alpha=0.6, label='Old', color='coral', edgecolor='black')

plt.xlabel('Score')
plt.ylabel('Count')
plt.title('Comparing New vs Old Treatment Groups')
plt.legend()  # shows which color is which group
plt.show()

print("Try changing bins=8 to bins=5 or bins=15 and see how the plot changes!")
print("Try changing alpha=0.6 to alpha=0.3 or alpha=0.9!")

### Independent Samples T-Test

**Question**: Is there a difference between the New and Old groups?

In JASP: T-Tests → Independent Samples T-Test → drag variables → click options...

In Python: One line of code!

In [None]:
# Run an independent samples t-test
# stats.ttest_ind() compares two independent groups
# It returns two values: the t-statistic and the p-value

t_stat, p_value = stats.ttest_ind(new_data, old_data)

# Print the results
print("INDEPENDENT SAMPLES T-TEST")
print("=" * 40)
print(f"New group mean: {new_data.mean():.2f} (n={len(new_data)})")
print(f"Old group mean: {old_data.mean():.2f} (n={len(old_data)})")
print(f"\nt = {t_stat:.3f}")
print(f"p = {p_value:.4f}")

# Interpret the result
if p_value < 0.05:
    print("\nSignificant at alpha = 0.05? Yes")
else:
    print("\nSignificant at alpha = 0.05? No")

---

### Your Turn: Create a Different Visualization

We just made a histogram. Now try creating a **swarm plot** comparing the New and Old groups.

You'll need to use `sns.swarmplot()` like we did earlier. But the t-test data is in a different format (two separate columns instead of a 'Group' column), so we need to restructure it first.

The code below restructures the data for you. Your job is to fill in the `sns.swarmplot()` call.

<details>
<summary>Click to reveal answer</summary>

```python
sns.swarmplot(data=ttest_long, x='Group', y='Score', color='purple', size=8)
plt.title('New vs Old Treatment')
plt.show()
```
Yours doesn't need to be exactly like this, but this will work :) 
</details>

In [None]:
# First, restructure the data into "long format" for seaborn
# (This code is provided for you — just run it)
ttest_long = pd.DataFrame({
    'Group': ['New']*len(new_data) + ['Old']*len(old_data),
    'Score': list(new_data) + list(old_data)
})

In [None]:
# How do you check what is in ttest_long? Do that here.


In [None]:
# now run swarmplot:

sns.swarmplot(
    data=...,
    x=...,
    y=...,
    #anything else you want!
    
)

### Teaching Moment: Independent vs. Paired T-Tests

What happens if we run the same data as a **paired** t-test? The results will differ!

- **Independent t-test**: Assumes the groups are unrelated (different subjects)
- **Paired t-test**: Assumes each observation in one group is matched to one in the other
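
One way to see what "paired" means computationally: a paired t-test is a one-sample t-test on the within-subject differences. The sketch below uses simulated before/after scores (not the lab data; the variable names and numbers are made up for illustration).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data: 12 subjects measured twice, each improving by about 3 points
before = rng.normal(50, 10, size=12)
after = before + rng.normal(3, 2, size=12)

# Paired test on the two columns
t_rel, p_rel = stats.ttest_rel(after, before)

# One-sample test asking whether the mean difference is zero
t_one, p_one = stats.ttest_1samp(after - before, 0)

# Independent test on the same numbers ignores the pairing
t_ind, p_ind = stats.ttest_ind(after, before)

print(f"Paired:                 t = {t_rel:.3f}, p = {p_rel:.4f}")
print(f"One-sample on diffs:    t = {t_one:.3f}, p = {p_one:.4f}")
print(f"Independent (unpaired): t = {t_ind:.3f}, p = {p_ind:.4f}")
```

The first two lines of output match exactly. The third differs because the independent test has to treat subject-to-subject variability as noise.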

---

**Important Warning:** The following is a "what happens if we use the wrong test" demonstration. A paired t-test requires **true subject-level pairing** (e.g., the same person measured before and after treatment). We're artificially pairing unrelated observations here to show how results change — **never do this with real data!**

In [None]:
# Compare independent vs paired t-tests
# For paired t-test, we need equal sample sizes, so we use the smaller n

n_min = min(len(new_data), len(old_data))

# Take only the first n_min values from each group
# WARNING: This is artificial pairing — just for demonstration!
new_paired = new_data[:n_min]
old_paired = old_data[:n_min]

# Independent t-test (what we should use for this data)
t_ind, p_ind = stats.ttest_ind(new_paired, old_paired)

# Paired t-test (ttest_rel = "related" samples)
t_paired, p_paired = stats.ttest_rel(new_paired, old_paired)

print("COMPARISON: Independent vs. Paired T-Tests")
print("=" * 50)
print(f"Using first {n_min} observations from each group\n")
print(f"Independent t-test: t = {t_ind:.3f}, p = {p_ind:.4f}")
print(f"Paired t-test:      t = {t_paired:.3f}, p = {p_paired:.4f}")
print("\nKey insight: The p-values differ!")
print("\nWhen to use each:")
print("  - Independent: Different subjects in each group (e.g., treatment vs control)")
print("  - Paired: Same subjects measured twice (e.g., before vs after)")

---

### Checking Assumptions: Equal Variances

The standard t-test assumes both groups have similar variances (spread). If they don't, we should use **Welch's t-test** instead.
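
To make the difference concrete, here is a sketch that computes Welch's t statistic and its adjusted degrees of freedom (the Welch-Satterthwaite approximation) by hand and checks them against SciPy. The two samples are simulated for illustration and are not the lab data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated groups with very different spread
wide = rng.normal(50, 20, size=15)
tight = rng.normal(45, 3, size=20)

# Each group's variance divided by its own n (no pooling)
v1 = wide.var(ddof=1) / len(wide)
v2 = tight.var(ddof=1) / len(tight)

# Welch's t: mean difference over the unpooled standard error
t_manual = (wide.mean() - tight.mean()) / np.sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (usually not a whole number)
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (len(wide) - 1) + v2 ** 2 / (len(tight) - 1))

# Two-sided p-value from the t distribution
p_manual = 2 * stats.t.sf(abs(t_manual), df_welch)

t_scipy, p_scipy = stats.ttest_ind(wide, tight, equal_var=False)

print(f"By hand: t = {t_manual:.3f}, df = {df_welch:.1f}, p = {p_manual:.4f}")
print(f"SciPy:   t = {t_scipy:.3f}, p = {p_scipy:.4f}")
```

The degrees of freedom fall below n1 + n2 - 2, which is how Welch's test pays for not assuming equal variances.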

Let's load a second dataset where the variances are very different, then visualize and test for it.

In [None]:
# Load the second sheet from ttest_data.xlsx — this one has unequal variances
# sheet_name=1 means the second sheet (Python counts from 0)

ttest_df2 = pd.read_excel('ttest_data.xlsx', sheet_name=1)

# Peek at the data
print("Data Set 2 (Unequal Variances):")
print(ttest_df2.head())

# Extract the two groups
new_data2 = ttest_df2['New'].dropna().values
old_data2 = ttest_df2['Old'].dropna().values

print(f"\nNew group: mean={new_data2.mean():.1f}, SD={new_data2.std(ddof=1):.1f}")
print(f"Old group: mean={old_data2.mean():.1f}, SD={old_data2.std(ddof=1):.1f}")
print(f"\nVariance ratio: {new_data2.var(ddof=1) / old_data2.var(ddof=1):.1f}x difference!")

In [None]:
# Visualize the variance difference — much clearer than numbers!
# Notice how the New group is much more spread out

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Left plot: overlapping histograms
axes[0].hist(new_data2, bins=8, alpha=0.6, label='New (spread)', color='steelblue', edgecolor='black')
axes[0].hist(old_data2, bins=8, alpha=0.6, label='Old (tight)', color='coral', edgecolor='black')
axes[0].set_xlabel('Score')
axes[0].set_ylabel('Count')
axes[0].set_title('Histogram: Different Variances')
axes[0].legend()

# Right plot: swarm plot
# First restructure data for seaborn
variance_df = pd.DataFrame({
    'Group': ['New (spread)']*len(new_data2) + ['Old (tight)']*len(old_data2),
    'Score': list(new_data2) + list(old_data2)
})
sns.swarmplot(data=variance_df, x='Group', y='Score', ax=axes[1], palette=['steelblue', 'coral'])
axes[1].set_title('Swarm Plot: Different Variances')

plt.tight_layout()
plt.show()

print("See how the New group's dots are much more spread out? That's unequal variance!")

In [None]:
# Levene's test formally checks if variances are equal
# If p < 0.05, variances are significantly different — use Welch's t-test

lev_stat, lev_p = stats.levene(new_data2, old_data2)

print("LEVENE'S TEST FOR EQUALITY OF VARIANCES")
print("=" * 45)
print(f"Test statistic: {lev_stat:.3f}")
print(f"p-value: {lev_p:.4f}")

if lev_p < 0.05:
    print("\nResult: Variances are UNEQUAL (p < 0.05)")
    print("Recommendation: Use Welch's t-test")
else:
    print("\nResult: Variances are similar (p >= 0.05)")
    print("Recommendation: Standard t-test is fine")
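
What is Levene's test doing under the hood? With SciPy's default setting (`center='median'`), it amounts to a one-way ANOVA on each value's absolute distance from its own group median. The sketch below checks this on simulated data (made-up numbers, not the lab data).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated groups: same mean, very different spread
a = rng.normal(50, 20, size=15)
b = rng.normal(50, 3, size=15)

# Step 1: absolute deviation of each value from its group's median
dev_a = np.abs(a - np.median(a))
dev_b = np.abs(b - np.median(b))

# Step 2: one-way ANOVA on those deviations
f_manual, p_manual = stats.f_oneway(dev_a, dev_b)

# SciPy's Levene test (center='median' is the default)
w_stat, p_levene = stats.levene(a, b)

print(f"ANOVA on deviations: F = {f_manual:.3f}, p = {p_manual:.4f}")
print(f"stats.levene:        W = {w_stat:.3f}, p = {p_levene:.4f}")
```

If one group's deviations are larger on average, that group is more spread out, which is exactly what the test detects.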

In [None]:
# Compare standard vs Welch's t-test
# The only difference in code is equal_var=True vs equal_var=False

# Standard t-test (assumes equal variances)
t_standard, p_standard = stats.ttest_ind(new_data2, old_data2, equal_var=True)

# Welch's t-test (does NOT assume equal variances)
t_welch, p_welch = stats.ttest_ind(new_data2, old_data2, equal_var=False)

print("T-TEST COMPARISON (Data Set 2)")
print("=" * 50)
print(f"\nStandard t-test (equal_var=True):  t = {t_standard:.3f}, p = {p_standard:.4f}")
print(f"Welch's t-test  (equal_var=False): t = {t_welch:.3f}, p = {p_welch:.4f}")
print("\nWhen variances are unequal, Welch's t-test is more accurate!")
print("Many statisticians recommend ALWAYS using Welch's t-test.")

---

### Your Turn: Run Levene's Test

Run Levene's test on our original t-test data (`new_data` and `old_data`) to check if those groups have equal variances.

Use `stats.levene()` just like we did above.

<details>
<summary>Click to reveal answer</summary>

```python
lev_stat, lev_p = stats.levene(new_data, old_data)
print(f"Levene's test: F = {lev_stat:.3f}, p = {lev_p:.4f}")
if lev_p < 0.05:
    print("Variances are unequal — use Welch's t-test")
else:
    print("Variances are similar — standard t-test is fine")
```

</details>

In [None]:
# Test whether new_data and old_data have equal variances
# Use stats.levene()

# YOUR CODE HERE:



---

## Part 3: One-Way ANOVA

**Question**: Does locomotor activity differ across the 8 drug treatment groups?

In JASP: ANOVA → drag Dependent Variable → drag Fixed Factors → check Post Hoc...

In Python: First we need to separate our data by group, then run the test.

In [None]:
# Separate the data by treatment group
# We'll store each group's data in a dictionary

# A dictionary uses curly braces {} and stores key:value pairs
groups = {}

# Loop through each group name and extract that group's data
for group_name in group_order:
    # anova_df[anova_df['Group'] == group_name] filters to just that group
    # ['Squares entered'] selects the column we want
    # .values converts to a numpy array
    groups[group_name] = anova_df[anova_df['Group'] == group_name]['Squares entered'].values

# Check that it worked by printing one group
print(f"Control group data: {groups['Control']}")
print(f"Control group n = {len(groups['Control'])}")

In [None]:
# Before running the ANOVA, let's visualize all groups
# This swarm plot shows every data point

plt.figure(figsize=(12, 5))

# Use order= to control the group order on the x-axis
sns.swarmplot(data=anova_df, 
              x='Group', 
              y='Squares entered',
              order=group_order,
              color='darkblue',
              size=5,
              alpha=0.7)  # slight transparency

plt.xticks(rotation=45, ha='right')
plt.title('Locomotor Activity by Treatment Group')
plt.ylabel('Squares Entered')
plt.xlabel('Treatment Group')
plt.tight_layout()
plt.show()

print("Which groups have high locomotion? Which have almost none?")

In [None]:
# Run the one-way ANOVA
# stats.f_oneway() takes multiple groups as arguments
# The * unpacks our dictionary values into separate arguments

F_stat, p_value = stats.f_oneway(*groups.values())

# Calculate degrees of freedom for reporting
k = len(groups)       # number of groups
N = sum(len(g) for g in groups.values())  # total sample size across the groups actually tested
df_between = k - 1    # degrees of freedom between groups
df_within = N - k     # degrees of freedom within groups

# Format p-value properly (show "p < .001" if very small)
if p_value < 0.001:
    p_str = "p < .001"
else:
    p_str = f"p = {p_value:.4f}"

print("ONE-WAY ANOVA: Locomotor Activity by Treatment Group")
print("=" * 55)
print(f"\nF({df_between}, {df_within}) = {F_stat:.2f}, {p_str}")
print("\nCompare with JASP: F(7, 102) = 82.00, p < .001")

print("\nGroup means:")
for group_name in group_order:
    mean = groups[group_name].mean()
    print(f"  {group_name:20s}: {mean:6.2f}")
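
Where does the F value come from? It is the ratio of between-group variability to within-group variability, each divided by its degrees of freedom. The sketch below rebuilds F by hand on three simulated groups (made-up data, unequal group sizes) and compares it with `stats.f_oneway()`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Three simulated groups with different means and sizes
samples = [rng.normal(mu, 10, size=n) for mu, n in [(50, 10), (80, 12), (20, 9)]]

all_values = np.concatenate(samples)
grand_mean = all_values.mean()
k = len(samples)       # number of groups
N = len(all_values)    # total sample size

# Between-group sum of squares: group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in samples)

# Within-group sum of squares: values around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in samples)

df_between = k - 1
df_within = N - k

F_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(F_manual, df_between, df_within)

F_scipy, p_scipy = stats.f_oneway(*samples)

print(f"By hand: F({df_between}, {df_within}) = {F_manual:.2f}")
print(f"SciPy:   F = {F_scipy:.2f}")
```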

---

## Part 4: Post-Hoc Comparisons (Tukey HSD)

The ANOVA tells us groups differ, but **which** groups differ from each other?

With 8 groups, we have 8×7/2 = **28 pairwise comparisons**.

### Why Not Just Run 28 T-Tests?

You might think: "If I want to compare 8 groups, I'll just run all pairwise t-tests!"

**The problem:** With 28 comparisons at alpha = 0.05, you would expect about 28 × 0.05 = 1.4 false positives by chance alone, even if no groups truly differed. This is called the **multiple comparisons problem**.

**Solutions:**
1. **Tukey HSD** (what we'll use) — adjusts p-values to control family-wise error rate
2. **Bonferroni correction** — multiply each p-value by the number of tests (more conservative)
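
The arithmetic behind the problem fits in a few lines. This is a rough sketch: the family-wise error rate formula below assumes the 28 tests are independent, which pairwise comparisons that share groups are not, so treat that number as an approximation.

```python
alpha = 0.05
k = 8                         # number of groups
n_tests = k * (k - 1) // 2    # number of pairwise comparisons

# Expected number of false positives if no groups truly differ
expected_false_positives = n_tests * alpha

# Chance of at least one false positive (approximation: independent tests)
fwer = 1 - (1 - alpha) ** n_tests

# Bonferroni: the per-test threshold that keeps the family-wise rate at alpha
bonferroni_alpha = alpha / n_tests

print(f"Pairwise comparisons:        {n_tests}")
print(f"Expected false positives:    {expected_false_positives:.1f}")
print(f"P(at least one false pos.):  {fwer:.2f}")
print(f"Bonferroni per-test alpha:   {bonferroni_alpha:.4f}")
```

Comparing each p-value with alpha / 28 is the same as multiplying each p-value by 28 and comparing with alpha.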

In [None]:
# Import tukey_hsd from scipy.stats
from scipy.stats import tukey_hsd

# Run Tukey HSD on all groups
# We pass each group's data as a separate argument
result = tukey_hsd(*[groups[g] for g in group_order])

print("TUKEY HSD POST-HOC COMPARISONS")
print("=" * 70)
print(f"{'Comparison':<45} {'Mean Diff':>10} {'p-value':>10}")
print("-" * 70)

# Count significant comparisons
n_significant = 0

# Loop through all pairs of groups
for i in range(len(group_order)):
    for j in range(i+1, len(group_order)):
        g1, g2 = group_order[i], group_order[j]
        
        # Calculate mean difference
        mean_diff = groups[g1].mean() - groups[g2].mean()
        
        # Get p-value from the result matrix
        p_val = result.pvalue[i, j]
        
        # Add significance stars
        if p_val < 0.001:
            sig_marker = "***"
            n_significant += 1
        elif p_val < 0.01:
            sig_marker = "**"
            n_significant += 1
        elif p_val < 0.05:
            sig_marker = "*"
            n_significant += 1
        else:
            sig_marker = ""
        
        comparison = f"{g1} vs {g2}"
        print(f"{comparison:<45} {mean_diff:>10.2f} {p_val:>10.3f} {sig_marker}")

print("-" * 70)
print(f"Significant comparisons: {n_significant} of 28")
print("\n*** p < .001, ** p < .01, * p < .05")

In [None]:
# Highlight key pharmacological findings
# We extract specific p-values from the Tukey result matrix

# Helper function to format p-values nicely
def format_p(p):
    if p < 0.001:
        return "p < .001"
    else:
        return f"p = {p:.3f}"

print("KEY PHARMACOLOGICAL FINDINGS")
print("=" * 60)

# Get p-values from Tukey result matrix
# Indices match group_order: Control=0, Amph=1, Res only=2, etc.
p_amph_ctrl = result.pvalue[0, 1]      # Control vs Amph
p_res_ctrl = result.pvalue[0, 2]       # Control vs Res only
p_resamph_ctrl = result.pvalue[0, 3]   # Control vs Res+Amph
p_final_ctrl = result.pvalue[0, 7]     # Control vs Res+MT+Amph+DOPA

print("\n1. Amphetamine INCREASES locomotion:")
diff = groups['Amph'].mean() - groups['Control'].mean()
print(f"   Amph vs Control: {diff:+.1f} squares ({format_p(p_amph_ctrl)})")

print("\n2. Reserpine ABOLISHES locomotion:")
diff = groups['Res only'].mean() - groups['Control'].mean()
print(f"   Res only vs Control: {diff:.1f} squares ({format_p(p_res_ctrl)})")

print("\n3. Amphetamine PARTIALLY restores function after reserpine:")
print(f"   Res+Amph mean: {groups['Res+Amph'].mean():.1f} (vs Control: {format_p(p_resamph_ctrl)})")

print("\n4. MT blocks the amphetamine restoration:")
print(f"   Res+MT+Amph mean: {groups['Res+MT+Amph'].mean():.1f} (essentially zero)")

print("\n5. L-DOPA + Amph can restore function even with MT:")
print(f"   Res+MT+Amph+DOPA mean: {groups['Res+MT+Amph+DOPA'].mean():.1f} (vs Control: {format_p(p_final_ctrl)})")

print("\nThese findings are consistent with the dopamine hypothesis of locomotion!")

---

### Your Turn: Run a T-Test Between Two Groups

Run an independent samples t-test comparing the `Control` group to the `Amph` group.

Use `stats.ttest_ind()` with the data from the `groups` dictionary.

Hint: Access Control data with `groups['Control']` and Amph data with `groups['Amph']`

<details>
<summary>Click to reveal answer</summary>

```python
t_stat, p_val = stats.ttest_ind(groups['Control'], groups['Amph'])
print(f"t = {t_stat:.3f}, p = {p_val:.4f}")
print(f"Control mean: {groups['Control'].mean():.2f}")
print(f"Amph mean: {groups['Amph'].mean():.2f}")
```

</details>

In [None]:
# Compare Control to Amph using stats.ttest_ind()

# YOUR CODE HERE:



---

## Part 5: Two-Way Mixed ANOVA (Repeated Measures)

So far we've analyzed data measured at a single time point. But what if we measure the same subjects at multiple times?

**The longitudinal study:** The same animals were tested again 1 week after the initial drug treatment (after the drugs wore off).

**Research questions:**
- Does locomotor activity change over time?
- Does the change depend on which drug group the animal was in?
- Is there a Drug × Time interaction?

This requires a **mixed ANOVA**:
- **Between-subjects factor:** Drug treatment (different animals in each group)
- **Within-subjects factor:** Time (same animals measured twice)

In [None]:
# The mixed (repeated measures) ANOVA below uses the pingouin library
# This cell installs it (only needed once per session in Colab)
!pip install pingouin -q
import pingouin as pg

In [None]:
# Load the longitudinal data
long_df = pd.read_excel('class_data_longitudinal.xlsx')

print("Longitudinal Data (same animals, two time points):")
print(long_df.head(10))
print(f"\nShape: {long_df.shape}")
print(f"Columns: {list(long_df.columns)}")

In [None]:
# Clean: Remove rows with missing data (marked as '.')
# Convert 1-week column to numeric
long_df['Week1'] = pd.to_numeric(long_df['Squares entered 1 week'], errors='coerce')
long_df_clean = long_df.dropna(subset=['Week1'])
print(f"Removed {len(long_df) - len(long_df_clean)} rows with missing data")

# Add subject ID (needed for repeated measures)
long_df_clean = long_df_clean.reset_index(drop=True)
long_df_clean['Subject'] = range(len(long_df_clean))

# Reshape to "long format" for pingouin
# Each row = one observation (subject × time combination)
df_time1 = long_df_clean[['Subject', 'Group', 'Squares entered']].copy()
df_time1['Time'] = 'Initial'
df_time1 = df_time1.rename(columns={'Squares entered': 'Squares'})

df_time2 = long_df_clean[['Subject', 'Group', 'Week1']].copy()
df_time2['Time'] = '1 Week'
df_time2 = df_time2.rename(columns={'Week1': 'Squares'})

rm_df = pd.concat([df_time1, df_time2], ignore_index=True)
print("\nReshaped data (long format):")
print(rm_df.head(10))
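
The reshape above builds the long table in two halves and stacks them. pandas also offers `melt()`, which does the same kind of wide-to-long conversion in one call. The sketch below uses a tiny made-up table (`wide_df`) whose column names are chosen for illustration only.

```python
import pandas as pd

# Made-up wide table: one row per animal, one column per time point
wide_df = pd.DataFrame({
    "Subject": [0, 1, 2],
    "Group": ["Control", "Control", "Amph"],
    "Initial": [50, 55, 120],
    "1 Week": [48, 57, 60],
})

# melt() keeps id_vars as they are and stacks value_vars into two new columns
long_toy = wide_df.melt(
    id_vars=["Subject", "Group"],      # columns that identify each animal
    value_vars=["Initial", "1 Week"],  # columns to stack
    var_name="Time",                   # new column holding the old column names
    value_name="Squares",              # new column holding the measurements
)

print(long_toy)
```

Each animal now appears once per time point, which is the layout repeated-measures functions generally expect.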

In [None]:
# Visualize: Does the pattern differ by group over time?
plt.figure(figsize=(12, 5))

# Calculate means for each Group × Time combination
means = rm_df.groupby(['Group', 'Time'])['Squares'].mean().unstack()
means = means.reindex(group_order)[['Initial', '1 Week']]  # standard group order; 'Initial' bar first (unstack sorts columns alphabetically)

# Plot
means.plot(kind='bar', ax=plt.gca(), color=['steelblue', 'coral'], edgecolor='black')
plt.title('Locomotor Activity: Initial vs 1 Week Later')
plt.ylabel('Squares Entered (mean)')
plt.xlabel('Treatment Group')
plt.legend(title='Time')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

print("Notice how Amph drops dramatically — the drug effect wore off!")
print("And the Res groups recovered somewhat over the week.")

In [None]:
# Run the mixed ANOVA using pingouin
# - dv: dependent variable (what we're measuring)
# - within: within-subjects factor (Time)
# - between: between-subjects factor (Group)
# - subject: identifies each subject

aov = pg.mixed_anova(data=rm_df,
                     dv='Squares',
                     within='Time',
                     between='Group',
                     subject='Subject')

print("TWO-WAY MIXED ANOVA RESULTS")
print("=" * 60)
print(aov.round(4).to_string())

print("\n--- Interpretation ---")
for idx, row in aov.iterrows():
    effect = row['Source']
    f_val = row['F']
    p_val = row['p-unc']
    p_str = "p < .001" if p_val < 0.001 else f"p = {p_val:.4f}"
    sig = "**SIGNIFICANT**" if p_val < 0.05 else "not significant"
    print(f"{effect}: F = {f_val:.2f}, {p_str} — {sig}")

### What do the results mean?

- **Group effect:** Do drug groups differ overall (averaging across time)?
- **Time effect:** Does locomotion change from initial to 1 week (averaging across groups)?
- **Group × Time interaction:** Does the TIME effect depend on which GROUP you're in?

The **interaction** is often the most interesting! It tells us whether drug effects persist or fade over time differently for different treatments.
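
A simple way to build intuition for the interaction: with only two time points, asking "does the change from Initial to 1 Week differ between groups?" addresses the same question as the Group × Time interaction. The sketch below computes each animal's change score and runs a one-way ANOVA on those changes. All values are simulated for illustration (three made-up groups of 10 animals), and this is a sketch of the idea, not a replacement for the mixed ANOVA above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Simulated initial scores: groups start at very different levels
initial = {
    "Control": rng.normal(50, 8, size=10),
    "Amph": rng.normal(110, 8, size=10),
    "Res only": rng.normal(5, 3, size=10),
}

# Simulated follow-up: every group ends up near the control level
week1 = {g: rng.normal(50, 8, size=10) for g in initial}

# Change score for each animal (follow-up minus initial)
change = {g: week1[g] - initial[g] for g in initial}

# One-way ANOVA on the change scores
F_change, p_change = stats.f_oneway(*change.values())

for g, values in change.items():
    print(f"{g:10s} mean change: {values.mean():+6.1f}")
print(f"\nANOVA on change scores: F = {F_change:.2f}, p = {p_change:.4g}")
```

If the groups changed by different amounts, this F is large, and so is the interaction term in the mixed ANOVA.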

---

## Part 6: Things Python Can Do That JASP Can't

### Advantage 1: Automation

What if you needed to run pairwise comparisons for multiple outcome measures? In JASP, you'd click through menus dozens of times. In Python, it's a loop.

**Warning:** Running many t-tests without correction inflates your false positive rate! This is just to demonstrate automation — always use Tukey HSD or Bonferroni correction for real analyses.

In [None]:
# Run all 28 pairwise t-tests in a loop
# This demonstrates automation — NOT recommended without correction!

print("AUTOMATED PAIRWISE T-TESTS (28 comparisons)")
print("=" * 55)

# Store results in a list
results_list = []

# Nested loops go through all pairs
for i in range(len(group_order)):
    for j in range(i+1, len(group_order)):  # j > i avoids duplicates
        g1, g2 = group_order[i], group_order[j]
        
        # Run t-test
        t, p = stats.ttest_ind(groups[g1], groups[g2])
        
        # Store in our results list
        results_list.append({
            'Group 1': g1,
            'Group 2': g2,
            't': t,
            'p': p,
            'significant': p < 0.05
        })

# Convert to DataFrame for easy viewing
results_df = pd.DataFrame(results_list)

print(f"\nRan {len(results_df)} t-tests in a loop.")
print(f"Significant results (uncorrected): {results_df['significant'].sum()}")

# Apply Bonferroni correction
# Multiply each p-value by the number of tests
results_df['p_bonferroni'] = (results_df['p'] * len(results_df)).clip(upper=1.0)
results_df['sig_corrected'] = results_df['p_bonferroni'] < 0.05

print(f"Significant after Bonferroni correction: {results_df['sig_corrected'].sum()}")

print("\nIn JASP, you'd click through 28 separate comparisons.")
print("In Python, 10 lines of code.")
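
The nested `i`/`j` loops above are one way to visit every pair once. Python's standard library has a shortcut, `itertools.combinations`, which yields the same pairs in the same order. A minimal sketch (the group names are retyped here so the example runs on its own):

```python
from itertools import combinations

group_names = ['Control', 'Amph', 'Res only', 'Res+Amph',
               'Res+MT', 'Res+MT+Amph', 'Res+MT+DOPA', 'Res+MT+Amph+DOPA']

# Every unordered pair of groups, each pair exactly once
pairs = list(combinations(group_names, 2))

print(f"Number of pairs: {len(pairs)}")
print(f"First pair: {pairs[0]}")
print(f"Last pair:  {pairs[-1]}")
```

Inside the loop you would then write `for g1, g2 in combinations(group_order, 2):` and run the same t-test as before.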

### Advantage 2: Custom Visualizations

JASP gives you canned plots. Python gives you full control over every aspect of your figures.

In [None]:
# Create a publication-quality bar plot with error bars

fig, ax = plt.subplots(figsize=(12, 6))

# Calculate means and standard errors for each group
means = [groups[g].mean() for g in group_order]
sems = [groups[g].std(ddof=1) / np.sqrt(len(groups[g])) for g in group_order]  # ddof=1 gives the sample SD

# Color code by drug effect
# Try changing these colors! Options: 'red', 'blue', 'green', 'orange', 'purple', etc.
colors = ['green',      # Control
          'red',        # Amph (stimulant effect)
          'royalblue',  # Res only (depleted)
          'orange',     # Res+Amph (partial restoration)
          'royalblue',  # Res+MT
          'royalblue',  # Res+MT+Amph
          'royalblue',  # Res+MT+DOPA
          'orange']     # Res+MT+Amph+DOPA (restoration)

# Create the bar plot
# yerr adds error bars, capsize sets the width of the error bar caps
bars = ax.bar(range(len(group_order)), means, yerr=sems, 
              color=colors, edgecolor='black', capsize=4, alpha=0.8)

# Customize the plot
ax.set_xticks(range(len(group_order)))
ax.set_xticklabels(group_order, rotation=45, ha='right', fontsize=10)
ax.set_ylabel('Squares Entered (mean +/- SEM)', fontsize=12)
ax.set_xlabel('Treatment Group', fontsize=12)

# Add title with our computed statistics
if p_value < 0.001:
    title_p = "p < .001"
else:
    title_p = f"p = {p_value:.3f}"
ax.set_title(f'Locomotor Activity by Treatment\nF({df_between}, {df_within}) = {F_stat:.2f}, {title_p}', fontsize=14)

# Add a horizontal reference line at Control mean
ax.axhline(y=groups['Control'].mean(), color='gray', linestyle='--', alpha=0.5, label='Control baseline')
ax.legend()

plt.tight_layout()
plt.show()

print("Color key: Green=Control, Red=Stimulant, Blue=Depleted, Orange=Restored")
print("\nTry changing the colors list above and re-running!")

### Advantage 3: Bootstrap Confidence Intervals

JASP's menus offer bootstrapping for only some analyses. In Python, you can bootstrap any statistic you can compute.

### Why Bootstrap?

Many traditional tests assume your data follow a specific distribution (usually normal). But what if they don't? Or what if you have a small sample?

**Bootstrapping** lets you estimate uncertainty without those assumptions:
1. Resample your data (with replacement) thousands of times
2. Calculate your statistic each time
3. The middle 95% of those results gives you a 95% confidence interval

This is especially useful when:
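- Sample sizes are modest, so normality is hard to check (with only a handful of observations, the bootstrap becomes unreliable too)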
- Data is skewed or has outliers
- You want to verify your results aren't dependent on distributional assumptions
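
The three steps above can be sketched as a small reusable function. This is a minimal illustration, not the lab analysis: `bootstrap_ci` and `toy_sample` are hypothetical names, the numbers are made up, and the statistic is the median only to show that the same recipe works for statistics other than the mean.

```python
import numpy as np

def bootstrap_ci(sample, statistic=np.mean, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap CI for one sample (illustrative helper, not a library function)."""
    rng = np.random.default_rng(seed)          # seeded generator so the result is repeatable
    sample = np.asarray(sample, dtype=float)

    # Steps 1 and 2: resample with replacement, then compute the statistic each time
    boot_stats = np.array([
        statistic(rng.choice(sample, size=sample.size, replace=True))
        for _ in range(n_resamples)
    ])

    # Step 3: the middle `level` share of the bootstrap statistics is the interval
    tail = (1 - level) / 2 * 100
    return np.percentile(boot_stats, [tail, 100 - tail])

toy_sample = [12, 15, 9, 22, 17, 14, 30, 11]   # made-up values, not the lab data
low, high = bootstrap_ci(toy_sample, statistic=np.median)
print(f"95% bootstrap CI for the median: [{low:.1f}, {high:.1f}]")
```

Swapping `statistic=np.median` for `np.mean` or `np.std` changes what is estimated without touching the resampling logic.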

In [None]:
# Bootstrap confidence interval for Amph vs Control difference

amph = groups['Amph']
control = groups['Control']

# Number of bootstrap samples — more = more precise but slower
n_bootstrap = 10000

# Store bootstrapped differences
boot_diffs = []

# Set random seed for reproducibility
np.random.seed(42)

# Bootstrap loop
for _ in range(n_bootstrap):
    # Resample WITH replacement (some values will be picked multiple times)
    boot_amph = np.random.choice(amph, size=len(amph), replace=True)
    boot_ctrl = np.random.choice(control, size=len(control), replace=True)
    
    # Calculate mean difference for this resample
    boot_diffs.append(boot_amph.mean() - boot_ctrl.mean())

boot_diffs = np.array(boot_diffs)

# Calculate confidence interval using percentiles
ci_lower = np.percentile(boot_diffs, 2.5)
ci_upper = np.percentile(boot_diffs, 97.5)

print(f"BOOTSTRAP ANALYSIS ({n_bootstrap:,} resamples)")
print("=" * 45)
print("\nAmphetamine vs Control difference:")
print(f"  Observed difference: {amph.mean() - control.mean():.1f} squares")
print(f"  Bootstrap mean:      {np.mean(boot_diffs):.1f} squares")
print(f"  95% CI: [{ci_lower:.1f}, {ci_upper:.1f}]")
print(f"\nProportion where Amph > Control: {(boot_diffs > 0).mean():.1%}")

In [None]:
# Visualize the bootstrap distribution

plt.figure(figsize=(8, 4))

# Histogram of bootstrap differences
plt.hist(boot_diffs, bins=50, edgecolor='black', alpha=0.7, color='indianred')

# Add vertical lines for reference
plt.axvline(0, color='black', linewidth=2, linestyle='--', label='No difference')
plt.axvline(ci_lower, color='navy', linewidth=2, linestyle=':', label='95% CI')
plt.axvline(ci_upper, color='navy', linewidth=2, linestyle=':')

plt.xlabel('Amphetamine - Control (squares entered)')
plt.ylabel('Count')
plt.title('Bootstrap Distribution of Mean Difference')
plt.legend()
plt.tight_layout()
plt.show()

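# Interpret from the computed interval instead of assuming the outcome
if ci_lower > 0:
    print("The 95% CI lies entirely above zero: strong evidence that Amph > Control!")
elif ci_upper < 0:
    print("The 95% CI lies entirely below zero: evidence that Amph < Control.")
else:
    print("The 95% CI includes zero: no clear difference between Amph and Control.")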

### Advantage 4: Reproducibility

If someone asks *"How did you get that result?"*, you can hand them this notebook. Every step is documented, every analysis is re-runnable.

In JASP, you'd have to write out: *"I clicked ANOVA, then dragged Squares entered into the dependent variable box, then I dragged Group into the grouping variable, then I checked the Tukey option under Post Hoc..."*

In Python, the code **is** the documentation.
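
Re-running code reproduces the same numbers only when the random seed and the library versions also match, so it helps to record them alongside the results. The sketch below is one way to do that with the standard library; `environment_report` is a hypothetical helper name, and the package list is an assumption based on the libraries used in this lab.

```python
import sys
from importlib import metadata

def environment_report(packages=("numpy", "pandas", "scipy",
                                 "matplotlib", "seaborn", "pingouin")):
    """Return one line per component: the Python version, then each package version."""
    lines = [f"Python {sys.version.split()[0]}"]
    for name in packages:
        try:
            lines.append(f"{name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Report missing packages instead of crashing
            lines.append(f"{name} (not installed)")
    return lines

report = environment_report()
print("\n".join(report))
```

Pasting this output at the top or bottom of a notebook tells a colleague exactly which setup produced your numbers.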

---

## Common Errors and What They Mean

When you see an error, don't panic! Here's what common errors mean:

| Error | What it means | How to fix |
|-------|---------------|------------|
| `FileNotFoundError` | Python can't find the file | Check the filename spelling and make sure the file is in the same folder as the notebook |
| `KeyError: 'column_name'` | That column doesn't exist in your data | Check spelling with `df.columns` to see all column names |
| `ValueError: could not convert string to float` | There's text in a column that should be numbers | Check for header rows or non-numeric data with `df.head()` |
| `NameError: name 'x' is not defined` | You're using a variable before creating it | Make sure you ran the cell that creates that variable first |
| `IndentationError` | Python code isn't lined up correctly | Check that spaces/tabs are consistent |

**Pro tip:** Read error messages from the bottom up — the last line usually tells you what went wrong!
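
To see the tip in action, the sketch below triggers one of the errors from the table on purpose and prints only the final line of the traceback. The helper name `last_traceback_line` and the bad value are made up for illustration.

```python
import traceback

def last_traceback_line(func):
    """Run func() and return the final line of its traceback, or a note if it succeeds."""
    try:
        func()
    except Exception:
        # format_exc() returns the full traceback text; the last line names the error
        return traceback.format_exc().strip().splitlines()[-1]
    return "No error raised"

def convert_bad_value():
    # Text that looks like a number but is not: the kind of cell that breaks a numeric column
    return float("12 squares")

message = last_traceback_line(convert_bad_value)
print(message)
```

The printed line names the error type (`ValueError`) and the offending value, which is usually enough to find the matching row in the table above.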

---

## Bonus: More Repeated Measures Options

In Part 5, we used `pingouin` for the mixed ANOVA. The library also supports:

**Within-subjects only ANOVA** (all factors are repeated measures):
```python
pg.rm_anova(data=df, dv='Score', within='Time', subject='Subject')
```

**Two within-subjects factors** (the most `rm_anova` accepts):
```python
pg.rm_anova(data=df, dv='Score', within=['Time', 'Condition'], subject='Subject')
```

See the full documentation at: [pingouin-stats.org](https://pingouin-stats.org/)
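
Both `rm_anova` calls above are written for long-format data: one row per observation, with separate columns for the subject, the condition, and the score. If your spreadsheet is in wide format (one column per time point), pandas can reshape it. The sketch below uses made-up values and assumed column names (`Subject`, `Pre`, `Post`) chosen to match the calls above.

```python
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per time point
wide = pd.DataFrame({
    'Subject': [1, 2, 3],
    'Pre':     [10.0, 12.0, 9.0],
    'Post':    [14.0, 15.0, 13.0],
})

# Reshape to long format: each row becomes one (Subject, Time, Score) observation
long = wide.melt(id_vars='Subject', value_vars=['Pre', 'Post'],
                 var_name='Time', value_name='Score')
print(long)
```

The resulting `long` table has the `Subject`, `Time`, and `Score` columns that the `dv`, `within`, and `subject` arguments refer to.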

---

## Checkpoint: JASP vs Python

You just replicated **every test** from the JASP lab in Python. Let's count the lines of code:

| Test | Python Code | JASP |
|------|-------------|------|
| Independent t-test | `stats.ttest_ind(a, b)` — **1 line** | 4-5 clicks, drag variables |
| Paired t-test | `stats.ttest_rel(a, b)` — **1 line** | Reshape data, 4-5 clicks |
| Welch's t-test | `stats.ttest_ind(a, b, equal_var=False)` — **1 line** | Buried in options menu |
| Levene's test | `stats.levene(a, b)` — **1 line** | Separate analysis |
| One-way ANOVA | `stats.f_oneway(g1, g2, ...)` — **1 line** | 3-4 clicks, check boxes |
| Tukey HSD | `tukey_hsd(g1, g2, ...)` — **1 line** | Check post-hoc options |
| 28 pairwise tests | Loop: 10 lines total | Click 28x through menus |

The tests are equally simple. **But Python can do things JASP cannot.** We demonstrated:
- Automation (loop through all comparisons)
- Custom publication-quality visualizations
- Bootstrap confidence intervals
- Complete reproducibility
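
The 28 in the table is the number of distinct pairs among the eight treatment groups. A quick way to check that arithmetic and to generate the pairs for a loop is sketched below. The group labels are assumptions taken from the color comments in the bar plot cell, so your `group_order` may spell them differently.

```python
from itertools import combinations

# Assumed labels for the eight treatment groups
group_names = ['Control', 'Amph', 'Res', 'Res+Amph',
               'Res+MT', 'Res+MT+Amph', 'Res+MT+DOPA', 'Res+MT+Amph+DOPA']

# Every unordered pair of groups, each listed once
pairs = list(combinations(group_names, 2))

print(f"{len(group_names)} groups -> {len(pairs)} pairwise comparisons")
print("First pair:", pairs[0])
```

Looping over `pairs` and running one test per pair is what replaces the 28 trips through the menus.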

---

## Summary: The Python Cheat Sheet

| What you want to do | Python code |
|---|---|
| Load Excel data | `df = pd.read_excel('file.xlsx')` |
| Load CSV data | `df = pd.read_csv('file.csv')` |
| View first rows | `df.head()` |
| Check data structure | `df.info()` |
| Check for missing values | `df.isna().sum()` |
| Group means | `df.groupby('Group')['DV'].mean()` |
| Descriptive stats | `df.groupby('Group')['DV'].agg(['count', 'mean', 'std'])` |
| Independent t-test | `stats.ttest_ind(group1, group2)` |
| Welch's t-test | `stats.ttest_ind(group1, group2, equal_var=False)` |
| Paired t-test | `stats.ttest_rel(pre, post)` |
| Levene's test | `stats.levene(group1, group2)` |
| One-way ANOVA | `stats.f_oneway(g1, g2, g3, ...)` |
| Tukey HSD | `tukey_hsd(g1, g2, g3, ...)` |
| Mixed ANOVA (repeated measures) | `pg.mixed_anova(data=df, dv='Score', within='Time', between='Group', subject='Subject')` |
| Histogram | `plt.hist(x, bins=10, color='blue')` |
| Swarm plot | `sns.swarmplot(data=df, x='Group', y='Value')` |
| Bar plot | `plt.bar(x, heights, yerr=errors)` |

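**That's it.** With these functions (plus `from scipy import stats`, `from scipy.stats import tukey_hsd`, and `import pingouin as pg` at the top of your notebook), you can do everything we did in JASP, and much more.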

You don't have to memorize them. There is no test. The more you use them, the more you will remember.

**Remember:** The skills you learned today transfer directly to machine learning, neuroimaging, bioinformatics, and countless other fields!

For more, check out this great talk on why data science (and statistics in general) is better when done with code instead of with a GUI:

https://www.youtube.com/watch?v=cpbtcsGE0OA

It's a long video, but you can find a summary of the key points here:

https://asterisk.dynevor.org/you-cant-do-data-science-in-a-gui.html