# Notebook 06 - Inferential Statistics: Air vs Ground and Certificates

This notebook uses simple inferential statistics to compare medication-error patterns:

- Air vs Ground (two-sample t-test)
- Across certificates (ANOVA based on the Source field)

The goal is to assess whether the observed differences between groups are larger than would be expected from chance alone.

In [None]:
import pandas as pd
from scipy import stats

file_path = "../data/Krista_240726_Final.xlsx"
med_df = pd.read_excel(file_path, sheet_name="Medication")
med_df.head()

## Create a simple binary error flag for testing

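Here we define a binary variable to use as the outcome in the tests. As a simple example, dose_error_flag is 1 when the Pattern Specifics field mentions "dose" (case-insensitive) and 0 otherwise, including rows where the field is missing.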

In [None]:
med_df['dose_error_flag'] = med_df['Pattern Specifics'].str.contains('dose', case=False, na=False).astype(int)
med_df['dose_error_flag'].value_counts(dropna=False)
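The flag relies on substring matching, so it helps to see how that behaves on a few values. The sketch below uses a small hypothetical DataFrame (the text values are invented for illustration and are not from the dataset) to show that matching is case-insensitive, that a missing value becomes 0 because of `na=False`, and that a word such as "Overdose" also matches.

```python
import pandas as pd

# Hypothetical toy values; the real 'Pattern Specifics' column may look different.
toy = pd.DataFrame({
    "Pattern Specifics": [
        "Wrong dose given",  # matches
        "DOSE omitted",      # matches because case=False
        "Wrong route",       # no match
        None,                # missing value, treated as no match (na=False)
        "Overdose",          # matches: 'dose' is a substring
    ]
})

toy["dose_error_flag"] = (
    toy["Pattern Specifics"].str.contains("dose", case=False, na=False).astype(int)
)

flag_values = toy["dose_error_flag"].tolist()
print(flag_values)
```

If substring hits like "Overdose" should be counted separately, the pattern would need to be tightened; that choice depends on how the field is actually written.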

## Two-sample t-test: Air vs Ground

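We compare the mean of dose_error_flag between Air and Ground records. Because the flag is 0 or 1, each group mean is the proportion of records flagged as dose-related.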

In [None]:
# Identify Air rows. Rows with a missing Branch are excluded from both groups
# rather than silently counted as Ground.
branch = med_df['Branch']
is_air = branch.str.contains('Air', case=False, na=False)
air_group = med_df.loc[is_air, 'dose_error_flag']
ground_group = med_df.loc[branch.notna() & ~is_air, 'dose_error_flag']

t_stat, p_val = stats.ttest_ind(air_group, ground_group, equal_var=True)
print('t-statistic:', t_stat)
print('p-value:', p_val)
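Since the outcome is a 0/1 flag, the comparison above is really a comparison of two proportions. As a cross-check, one option is a chi-square test on the 2x2 table of Branch by flag, alongside Welch's t-test, which does not assume equal variances. The sketch below is a minimal illustration on hypothetical toy data (the counts are invented and are not from the dataset), not a replacement for the analysis above.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data: 6 Air and 6 Ground records with an invented 0/1 flag.
toy = pd.DataFrame({
    "Branch": ["Air"] * 6 + ["Ground"] * 6,
    "dose_error_flag": [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})

# Proportion of flagged records in each branch.
rates = toy.groupby("Branch")["dose_error_flag"].mean()
print(rates)

# 2x2 contingency table (Branch x flag) and chi-square test of independence.
table = pd.crosstab(toy["Branch"], toy["dose_error_flag"])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
print("chi-square:", chi2, "p-value:", p_chi2, "dof:", dof)

# Welch's t-test: same comparison of means without the equal-variance assumption.
air = toy.loc[toy["Branch"] == "Air", "dose_error_flag"]
ground = toy.loc[toy["Branch"] == "Ground", "dose_error_flag"]
t_welch, p_welch = stats.ttest_ind(air, ground, equal_var=False)
print("Welch t:", t_welch, "p-value:", p_welch)
```

With counts as small as in this toy table the chi-square approximation is rough; on the real data the group sizes determine which test is reasonable.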

## ANOVA: Differences across certificates (Source)

We examine dose_error_flag across different certificates using a one-way ANOVA.

In [None]:
groups = []
labels = []

for cert in med_df['Source'].dropna().unique():
    group = med_df.loc[med_df['Source'] == cert, 'dose_error_flag'].dropna()
    if len(group) > 0:
        groups.append(group)
        labels.append(cert)

f_stat, p_val_anova = stats.f_oneway(*groups)
print('Certificates:', labels)
print('F-statistic:', f_stat)
print('p-value:', p_val_anova)
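A one-way ANOVA only indicates whether at least one group mean differs; it does not say which certificates differ or how large each group is. A per-group summary next to the F-test makes the result easier to read. The sketch below uses hypothetical toy data with invented Source labels "A", "B", and "C" (not values from the dataset).

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data: three invented certificate labels, four records each.
toy = pd.DataFrame({
    "Source": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "dose_error_flag": [1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
})

# Group size and flagged proportion per certificate.
summary = toy.groupby("Source")["dose_error_flag"].agg(n="size", dose_error_rate="mean")
print(summary)

# One-way ANOVA across the same groups; it needs at least two groups.
groups = [g for _, g in toy.groupby("Source")["dose_error_flag"]]
if len(groups) >= 2:
    f_stat, p_val_anova = stats.f_oneway(*groups)
    print("F-statistic:", f_stat)
    print("p-value:", p_val_anova)
```

Certificates with very few records contribute unstable rates, so the `n` column is worth checking before interpreting the p-value.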