# Step 5 — Pattern Specifics by Branch (Air vs Ground)

This notebook examines how detailed medication error patterns
(`Pattern Specifics`) are distributed across **Branch** (Air vs Ground).

Goals:

- Identify the most frequent `Pattern Specifics` categories.
- Compare how often these high-volume patterns occur in **Air** vs **Ground** branches.
- Visualize the relationship using a **heatmap** of counts so that
  high-volume branch–pattern combinations are immediately visible.

This mirrors the bivariate EDA step from the loan project, but instead
of comparing financial variables against `Personal_Loan`, we compare
**clinical error patterns against Branch** to build an early
branch-specific risk profile.
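
Raw counts can mislead when one branch simply logs more records than the other, so it may help to read the count crosstab alongside within-branch shares. The sketch below is a minimal illustration on made-up data: the `toy` frame and the pattern labels "Wrong dose" and "Omission" are hypothetical and do not come from the Medication sheet. It relies only on the `normalize="columns"` option of `pd.crosstab`.

```python
import pandas as pd

# Hypothetical toy data (NOT from the Medication sheet):
# 4 Air records and 12 Ground records with two made-up pattern labels.
toy = pd.DataFrame({
    "Branch": ["Air"] * 4 + ["Ground"] * 12,
    "Pattern Specifics": (
        ["Wrong dose"] * 3 + ["Omission"] * 1      # Air
        + ["Wrong dose"] * 6 + ["Omission"] * 6    # Ground
    ),
})

# Raw counts: rows are patterns, columns are branches.
counts = pd.crosstab(toy["Pattern Specifics"], toy["Branch"])

# Within-branch shares: each column sums to 1.
shares = pd.crosstab(
    toy["Pattern Specifics"],
    toy["Branch"],
    normalize="columns",
)

print(counts)
print(shares.round(2))
```

In this toy example Ground has more "Wrong dose" records in absolute terms (6 vs 3), yet the pattern makes up a larger share of Air records (0.75 vs 0.50). The same call could be applied to `med_top` if branch volumes in the real data turn out to be unbalanced.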

In [None]:
# Step 5 — Pattern Specifics by Branch (Air vs Ground)

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display  # explicit import so display() also works outside a notebook

sns.set_style("whitegrid")

# ---------------------------------------------------------
# 1. Load the Medication data
# ---------------------------------------------------------
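# Prefer the Excel workbook; fall back to the CSV export of the same sheet
# if the workbook is not in the working directory.
try:
    med = pd.read_excel('Krista 240726 Final.xlsx', sheet_name='Medication')
except FileNotFoundError:
    med = pd.read_csv('Krista 240726 Final.xlsx - Medication.csv')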

print("Medication sheet shape:", med.shape)

# ---------------------------------------------------------
# 2. Drop missing Pattern Specifics and find the most frequent patterns
# ---------------------------------------------------------
pattern_counts = (
    med["Pattern Specifics"]
    .dropna()
    .value_counts()
)

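# value_counts() sorts by frequency (descending), so head(n_top) gives the
# most frequent patterns; ties at the cutoff are not expanded.
n_top = 10
top_patterns = pattern_counts.head(n_top).index.tolist()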

print(f"Top {n_top} Pattern Specifics in the Medication dataset:")
print(pattern_counts.head(n_top))

# Filter Medication data to only those top patterns
med_top = med[med["Pattern Specifics"].isin(top_patterns)].copy()

# ---------------------------------------------------------
# 3. Create crosstab: Pattern Specifics (rows) x Branch (columns)
# ---------------------------------------------------------
ct_branch_pattern = pd.crosstab(
    med_top["Pattern Specifics"],
    med_top["Branch"]
)

print("\nCrosstab of top Pattern Specifics by Branch (Air vs Ground):")
display(ct_branch_pattern)

# ---------------------------------------------------------
# 4. Plot heatmap of top patterns by Branch
# ---------------------------------------------------------
plt.figure(figsize=(10, 6))
sns.heatmap(
    ct_branch_pattern,
    annot=True,
    fmt="d",
    cmap="YlOrRd"
)
plt.title(f"Top {n_top} Medication Error Patterns by Branch (Air vs Ground)")
plt.xlabel("Branch")
plt.ylabel("Pattern Specifics")
plt.tight_layout()
plt.show()
