# Step 4 – Feature Engineering: Structured Pattern Flags

This notebook converts the free-text **`Pattern Specifics`** field into
simple, structured clinical flags that can be used in later analysis
and modeling.
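The small sketch below shows how the keyword-flag technique behaves before it is applied to the real sheet. It runs the same three regular expressions used in the code cell below against a few invented strings, so these strings are hypothetical and do not come from the Medication sheet. `str.contains(..., case=False, na=False)` matches substrings anywhere in the text and ignores case. A missing value becomes 0 instead of raising an error.

```python
import numpy as np
import pandas as pd

# Hypothetical free-text entries, invented for illustration only.
demo = pd.DataFrame({
    "Pattern Specifics": [
        "Max. dose exceeded for PRN analgesic",
        "Gave drug A instead of drug B",
        "Checklist not completed before administration",
        np.nan,  # missing text should become 0, not an error
    ]
})

# Same regexes as the notebook's feature-engineering cell.
patterns = {
    "Flag_Dosing_Error": r"dosing|max dose|max\.\s*dose|volume|overdose|underdose",
    "Flag_Wrong_Med": r"wrong med|wrong medication|instead of|incorrect medication",
    "Flag_Protocol_Error": r"protocol|checklist|policy|procedure",
}

for flag, pattern in patterns.items():
    # case=False -> case-insensitive; na=False -> missing text counts as "no match"
    demo[flag] = demo["Pattern Specifics"].str.contains(
        pattern, case=False, na=False
    ).astype(int)

print(demo[list(patterns)])
```

Each invented row trips exactly one flag, and the missing row trips none.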

Goals:

- Create binary flags (1 = present, 0 = absent) for major error themes that recur in the data:
  - Dosing-related issues (max dose, volume, overdose/underdose)
  - Wrong medication events ("instead of" the intended drug)
  - Protocol / checklist / policy issues
- Quantify how often each error type occurs.
- Prepare the dataset for later risk modeling and decision-tree work.
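The three themes are not mutually exclusive. One event can be flagged as both a dosing error and a wrong-medication error, so the per-flag counts can add up to more than the number of events. The sketch below reports each flag's share of all events and also counts events with several flags or none. The helper `summarize_flags` and the 0/1 values are hypothetical, chosen only to illustrate the calculation.

```python
import pandas as pd

def summarize_flags(df: pd.DataFrame, flag_cols: list[str]) -> pd.DataFrame:
    """Count and percentage of rows where each 0/1 flag is set (hypothetical helper)."""
    counts = df[flag_cols].sum()
    return pd.DataFrame({
        "events": counts,
        "pct_of_all_events": (100 * counts / len(df)).round(1),
    })

# Hypothetical 0/1 flags for five events, for illustration only.
flags = pd.DataFrame({
    "Flag_Dosing_Error":   [1, 0, 1, 0, 0],
    "Flag_Wrong_Med":      [0, 1, 1, 0, 0],
    "Flag_Protocol_Error": [0, 0, 0, 1, 0],
})
cols = list(flags.columns)

summary = summarize_flags(flags, cols)
print(summary)

# Number of flags set per event: 2+ means overlapping themes, 0 means unclassified.
n_flags = flags[cols].sum(axis=1)
print("Events with 2+ flags:", int((n_flags >= 2).sum()))
print("Events with no flag:", int((n_flags == 0).sum()))
```

Reporting shares as well as raw counts keeps the flag frequencies comparable across branches or time periods that have different numbers of events.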

This mirrors the **feature engineering** step from the loan assignment,
but here the features are **clinical risk patterns** instead of financial variables.

In [None]:
import pandas as pd
import numpy as np

# ---------------------------------------------------------
# 1. Load the Medication data
# ---------------------------------------------------------
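# Prefer the Excel workbook; if it is not in the working directory,
# fall back to a CSV export of the Medication sheet.
try:
    med = pd.read_excel('Krista 240726 Final.xlsx', sheet_name='Medication')
except FileNotFoundError:
    med = pd.read_csv('Krista 240726 Final.xlsx - Medication.csv')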

print("Medication sheet shape:", med.shape)

# Safety check: make sure the column exists
if "Pattern Specifics" not in med.columns:
    raise KeyError("The 'Pattern Specifics' column is not present in the Medication sheet.")

# ---------------------------------------------------------
# 2. Create pattern flags (1 = pattern present, 0 = not present)
# ---------------------------------------------------------

# Dosing Errors
# Case-insensitive substring match on 'dosing', 'max dose' / 'max. dose', 'volume',
# 'overdose', 'underdose' (so longer forms such as 'underdosing' also match)
med["Flag_Dosing_Error"] = med["Pattern Specifics"].str.contains(
    r"dosing|max dose|max\.\s*dose|volume|overdose|underdose",
    case=False, na=False
).astype(int)

# Wrong Medication
# Catches 'wrong med', 'wrong medication', 'instead of', 'incorrect medication'
med["Flag_Wrong_Med"] = med["Pattern Specifics"].str.contains(
    r"wrong med|wrong medication|instead of|incorrect medication",
    case=False, na=False
).astype(int)

# Protocol / Compliance Errors
# Catches 'protocol', 'checklist', 'policy', 'procedure'
med["Flag_Protocol_Error"] = med["Pattern Specifics"].str.contains(
    r"protocol|checklist|policy|procedure",
    case=False, na=False
).astype(int)

# ---------------------------------------------------------
# 3. Quick frequency check for each flag
# ---------------------------------------------------------
flag_cols = ["Flag_Dosing_Error", "Flag_Wrong_Med", "Flag_Protocol_Error"]

print("New pattern-flag columns added:")
print(flag_cols)

print("\nEvent counts by flag (1 = pattern present):")
for col in flag_cols:
    count = med[col].sum()
    print(f"{col}: {count} events")

# ---------------------------------------------------------
# 4. Preview the first 10 rows with the new flags
# ---------------------------------------------------------
cols_to_display = [
    "Report ID", "Branch", "Primary Risk", "Risk Event",
    "Medication 1", "Pattern Specifics"
] + flag_cols

# display() is provided by Jupyter/IPython; use print() when running as a plain script.
display(med[cols_to_display].head(10))

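Keyword flags only catch the phrasings listed in the regexes. A simple coverage check is to list events that have free text but no flag set and then read a sample of them by hand. The sketch below assumes a hypothetical helper, `unflagged_rows`, and uses invented rows whose flags are set by hand rather than computed. It also adds an optional combined `Flag_Any` indicator, which may be useful for the later modeling step.

```python
import numpy as np
import pandas as pd

def unflagged_rows(df: pd.DataFrame, text_col: str, flag_cols: list[str]) -> pd.DataFrame:
    """Rows with non-blank free text but none of the given 0/1 flags set (hypothetical helper)."""
    has_text = df[text_col].notna() & (df[text_col].astype(str).str.strip() != "")
    no_flag = df[flag_cols].sum(axis=1) == 0
    return df.loc[has_text & no_flag]

# Hypothetical rows with hand-set flags, invented for illustration only.
demo = pd.DataFrame({
    "Pattern Specifics": [
        "Overdose of insulin",
        "Patient ID band not checked",
        np.nan,
        "   ",
    ],
    "Flag_Dosing_Error":   [1, 0, 0, 0],
    "Flag_Wrong_Med":      [0, 0, 0, 0],
    "Flag_Protocol_Error": [0, 0, 0, 0],
})
flag_cols = ["Flag_Dosing_Error", "Flag_Wrong_Med", "Flag_Protocol_Error"]

# Optional combined indicator: 1 if any theme was detected.
demo["Flag_Any"] = (demo[flag_cols].sum(axis=1) > 0).astype(int)

review = unflagged_rows(demo, "Pattern Specifics", flag_cols)
print(review["Pattern Specifics"].tolist())
```

In the notebook, the equivalent call would be `unflagged_rows(med, "Pattern Specifics", flag_cols)`. Reviewing those rows before the flags feed a decision tree can reveal wording the patterns miss, such as "checked" when the protocol regex only looks for "checklist".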