# How are freshman backpacks coming off the C1?

## Question:
Does perceived sex (Male vs Female) predict which shoulder (Left, Right, Both, or Neither) a student uses to carry their backpack when getting off the C1 bus?

## Rationale:
Observing how students carry backpacks offers a simple behavioral dataset for testing categorical prediction. While trivial on its face, it demonstrates how to structure a behavioral observation into a formal contingency table suitable for hypothesis testing.

## Operational Notes:
- Students seated on the bus may hold the backpack on their lap rather than wear it.
- “Getting off the bus” is the key observation moment: students must reorient or lift the backpack, revealing shoulder choice.
- Rain (especially during the past week) likely affected how students carried their backpacks.
- Perceived sex is coded visually (M/F) and may not correspond perfectly to actual sex or gender identity. This should be noted as a limitation.

## Power Analysis for the Chi-Square Test of Independence

The chi-square test of independence is appropriate because:
- Both variables (perceived sex and shoulder choice, or lack thereof) are categorical and nominal.
- We are testing whether the distribution of shoulder choices differs by perceived sex.
- It makes no assumption about ordering or directionality, only association.
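As a minimal sketch of how the test would be run once counts are collected, the block below applies `scipy.stats.chi2_contingency` to a 2 x 4 table. The counts are hypothetical placeholders chosen only to show the table layout; they are not C1 observations.

```python
import numpy as np
from scipy.stats import chi2_contingency

# HYPOTHETICAL counts for illustration only; NOT observed C1 data.
# Rows: perceived sex (M, F). Columns: Left, Right, Both, Neither.
observed = np.array([
    [12, 30, 50, 8],
    [10, 25, 60, 5],
])

# Returns the test statistic, p-value, degrees of freedom,
# and the expected counts under independence.
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

print(f"chi2 = {chi2_stat:.3f}, df = {dof}, p = {p_value:.4f}")
print("smallest expected count:", expected.min())
```

For a 2 x 4 table the degrees of freedom are (2 - 1)(4 - 1) = 3, which is the `df` used in the sample-size calculation further down. Checking the smallest expected count is a common sanity check before trusting the chi-square approximation.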

The power calculation used here rests on the noncentrality parameter ($\lambda$), which grows with both effect size ($w$) and sample size ($n$):

$$
\lambda = n w^2
$$

Power increases with $\lambda$, so an association is more likely to be detected with:
- larger effects (greater $w$), or
- larger samples (greater $n$).
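To make $w$ concrete, the sketch below computes Cohen's $w$ for a table of cell probabilities, taking the null to be the independence table built from the margins, and then evaluates $\lambda = n w^2$. The helper names `cohens_w` and `noncentrality` and the probabilities in `p_alt` are assumptions made up for illustration, not estimates from C1 data.

```python
import numpy as np

def cohens_w(p_alt):
    """Cohen's w for a table of cell probabilities under the alternative."""
    p_alt = np.asarray(p_alt, dtype=float)
    p_alt = p_alt / p_alt.sum()  # normalise so the cells sum to 1
    # Null hypothesis: independence, i.e. the outer product of the margins.
    p_null = np.outer(p_alt.sum(axis=1), p_alt.sum(axis=0))
    return np.sqrt(((p_alt - p_null) ** 2 / p_null).sum())

def noncentrality(n, w):
    """Noncentrality parameter: lambda = n * w^2."""
    return n * w ** 2

# HYPOTHETICAL cell probabilities (rows: M, F; columns: Left, Right, Both, Neither).
p_alt = np.array([
    [0.08, 0.15, 0.22, 0.05],
    [0.04, 0.10, 0.32, 0.04],
])

w = cohens_w(p_alt)
print(f"w = {w:.3f}, lambda at n = 200: {noncentrality(200, w):.2f}")
```

A table whose cells already equal the product of its margins gives $w = 0$, so $\lambda = 0$ and the power collapses to $\alpha$ regardless of $n$.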

In practical terms, a small effect (e.g. $w = 0.1$) requires roughly 1,100 observations for 80% power at α = 0.05, while a medium effect (e.g. $w = 0.3$) could be detected with fewer than 150 observations (see below).

This defines the observational effort needed before moving to live C1 data collection.


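```python
from scipy.stats import chi2, ncx2

def n_for_power(w, df=3, alpha=0.05, power=0.80):
    """Smallest n at which the chi-square test reaches `power` for effect size w."""
    # Critical value of the central chi-square distribution under H0.
    crit = chi2.ppf(1 - alpha, df)
    # Bisect on n: power is monotonically increasing in n.
    lo, hi = 1, 1_000_000
    while lo < hi:
        mid = (lo + hi) // 2
        nc = mid * w**2             # noncentrality: lambda = n * w^2
        p = ncx2.sf(crit, df, nc)   # power = P(X > crit) under H1
        if p >= power:
            hi = mid
        else:
            lo = mid + 1
    return lo

# df = (2 - 1) * (4 - 1) = 3 for the 2 x 4 table (perceived sex x shoulder choice).
effect_sizes = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3]

for w in effect_sizes:
    n_needed = n_for_power(w, df=3, alpha=0.05, power=0.8)
    print(f"w = {w},  required n : {n_needed}")
```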


```text
w = 0.05,  required n : 4362
w = 0.1,  required n : 1091
w = 0.15,  required n : 485
w = 0.2,  required n : 273
w = 0.25,  required n : 175
w = 0.3,  required n : 122
```
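Because $\lambda = n w^2$, the required $\lambda$ depends only on `df`, `alpha`, and `power`, so it can be solved for once and then divided by $w^2$. The sketch below is an alternative cross-check on the bisection above, not the method used to produce those numbers; `required_lambda` and `n_closed_form` are hypothetical helper names.

```python
from math import ceil

from scipy.optimize import brentq
from scipy.stats import chi2, ncx2

def required_lambda(df=3, alpha=0.05, power=0.80):
    """Noncentrality parameter at which the test reaches the target power."""
    crit = chi2.ppf(1 - alpha, df)
    # Power is increasing in the noncentrality, so a root-finder on a
    # bracketing interval locates the unique crossing point.
    return brentq(lambda nc: ncx2.sf(crit, df, nc) - power, 1e-6, 200.0)

def n_closed_form(w, df=3, alpha=0.05, power=0.80):
    """Smallest integer n with n * w^2 >= the required lambda."""
    return ceil(required_lambda(df, alpha, power) / w**2)

for w in [0.05, 0.1, 0.15, 0.2, 0.25, 0.3]:
    print(f"w = {w},  required n : {n_closed_form(w)}")
```

This should agree with the bisection results, and it makes the scaling explicit: halving $w$ roughly quadruples the required sample size.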
