# Module 0: Jupyter + Python Bootcamp for Medical Students
## Learn the workflow before learning the models

**Goal:** Become comfortable running and editing notebooks with basic Python and medical data.

### How to use this notebook
- Run cells from top to bottom.
- If something breaks, use `Kernel -> Restart Kernel and Run All Cells`.
- Edit values in the exercise cells and re-run to see what changes.
- This is beginner-friendly and intentionally simple.

### Learning objectives
1. Run and troubleshoot Jupyter notebook cells.
2. Use core Python basics: variables, lists, conditions, loops, and functions.
3. Load and inspect a small synthetic patient dataset with pandas.
4. Create basic plots and interpret them clinically.
5. Build a tiny risk score without training a model.

## Section 0: Why this matters in clinic
You receive a notebook from a colleague that flags high-risk patients.
Before trusting outputs, you need to know how to run, inspect, and sanity-check code.

### Clinical vignette
A patient has diabetes, high blood pressure, and multiple prior admissions.
A notebook says the patient is low risk.
Your job is not to become a software engineer. Your job is to catch obvious mistakes fast.

In [None]:
import sys

print('Python version:', sys.version.split()[0])
print('Notebook kernel is running correctly.')

## Section 1: Jupyter survival guide
A notebook has two cell types:
- **Markdown**: explanation text
- **Code**: executable Python

### Quick actions
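- Run current cell: `Shift + Enter`
- Enter command mode (needed for the single-key shortcuts below): `Esc`
- Insert new cell below: `B` (in command mode)
- Change cell to markdown: `M` (in command mode)
- Change cell to code: `Y` (in command mode)
- Restart when confused: `Kernel -> Restart`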

In [1]:
# Example of a common error: NameError
try:
    print(last_hba1c)
except NameError as err:
    print('You got:', err)
    print('Fix: define the variable before using it.')

You got: name 'last_hba1c' is not defined
Fix: define the variable before using it.


In [None]:
last_hba1c = 7.8
print('Last HbA1c is now defined:', last_hba1c)

In [None]:
# Basic arithmetic with clinical values
systolic_bp = 148
diastolic_bp = 88
mean_arterial_pressure = (systolic_bp + 2 * diastolic_bp) / 3
print('Mean arterial pressure:', round(mean_arterial_pressure, 1))
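
A quick way to sanity-check a formula is to compute it a second way. The sketch below re-derives the same approximation as diastolic pressure plus one third of the pulse pressure, which is algebraically identical to `(systolic_bp + 2 * diastolic_bp) / 3`. The names `pulse_pressure` and `map_alternative` are illustrative and do not appear elsewhere in this notebook.

```python
systolic_bp = 148
diastolic_bp = 88

# Form used in the cell above
mean_arterial_pressure = (systolic_bp + 2 * diastolic_bp) / 3

# Equivalent form: diastolic pressure plus one third of the pulse pressure
pulse_pressure = systolic_bp - diastolic_bp
map_alternative = diastolic_bp + pulse_pressure / 3

print('Pulse pressure:', pulse_pressure)
print('MAP (first form):', round(mean_arterial_pressure, 1))
print('MAP (second form):', round(map_alternative, 1))
```

If the two forms ever disagree, one of them has a typo. The same cross-check habit applies to any calculation you inherit from a colleague.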

## Section 2: Python essentials with medical examples

In [None]:
age = 67
smoker = True
ldl_mg_dl = 162

print('age ->', age, type(age))
print('smoker ->', smoker, type(smoker))
print('ldl_mg_dl ->', ldl_mg_dl, type(ldl_mg_dl))

In [None]:
patients = ['P001', 'P002', 'P003', 'P004']
print('First patient:', patients[0])
print('Total patients in this list:', len(patients))

In [None]:
def simple_triage(systolic_bp, has_chest_pain):
    # Educational toy rule. Not for clinical use.
    if systolic_bp >= 160 or has_chest_pain:
        return 'Urgent review'
    return 'Routine follow-up'

print(simple_triage(170, False))
print(simple_triage(135, True))
print(simple_triage(128, False))

In [None]:
mini_panel = [
    {'patient_id': 'P010', 'systolic_bp': 130},
    {'patient_id': 'P011', 'systolic_bp': 170},
    {'patient_id': 'P012', 'systolic_bp': 136},
]

for row in mini_panel:
    if row['systolic_bp'] >= 140:
        print(row['patient_id'], '-> elevated BP')
    else:
        print(row['patient_id'], '-> not elevated BP')

In [None]:
def calculate_bmi(weight_kg, height_m):
    return weight_kg / (height_m ** 2)

bmi_value = calculate_bmi(88, 1.72)
print('BMI:', round(bmi_value, 1))

### Try it yourself
1. Change the blood pressure values and re-run.
2. Change the `simple_triage` threshold from `160` to `150` and compare the outputs.
3. Compute BMI for a different patient.
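
One possible approach to exercise 2 is to make the threshold a parameter, so both versions of the rule can be compared side by side. This is a minimal sketch: `simple_triage_with_threshold`, `bp_threshold`, and the test values are hypothetical additions, and the rule is still a toy that is not for clinical use.

```python
def simple_triage_with_threshold(systolic_bp, has_chest_pain, bp_threshold=160):
    # Educational toy rule. Not for clinical use.
    if systolic_bp >= bp_threshold or has_chest_pain:
        return 'Urgent review'
    return 'Routine follow-up'

# Hypothetical test values: (systolic_bp, has_chest_pain)
test_cases = [(170, False), (155, False), (135, True), (128, False)]

for bp, pain in test_cases:
    at_160 = simple_triage_with_threshold(bp, pain, bp_threshold=160)
    at_150 = simple_triage_with_threshold(bp, pain, bp_threshold=150)
    status = 'changed' if at_160 != at_150 else 'same'
    print(bp, pain, '| 160:', at_160, '| 150:', at_150, '|', status)
```

Only the patient with a systolic BP between the two thresholds changes category, which is the kind of effect to look for whenever a cutoff is edited.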

## Section 3: Tables are your stethoscope (pandas basics)

In [None]:
import pandas as pd

df = pd.read_csv('../data/chapter_00_patients.csv')
df.head()

In [None]:
print('Rows, columns:', df.shape)
print('Columns:', list(df.columns))
print('\nData types:')
print(df.dtypes)
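
Before going further, it can help to confirm that the table has the columns the later cells rely on. The sketch below uses a small stand-in DataFrame with made-up rows so that it runs on its own; in the notebook you would check `df` instead. The column list is taken from the cells in this notebook, while `demo_df`, `expected_columns`, and `missing_columns` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in rows; in the notebook, df comes from pd.read_csv(...)
demo_df = pd.DataFrame({
    'patient_id': ['P001', 'P002'],
    'age': [67, 54],
    'systolic_bp': [148, 132],
    'diabetes': [1, 0],
    'smoker': [1, 0],
    'bmi': [29.7, None],
    'ldl_mg_dl': [162, 118],
})

# Columns that later cells in this notebook use
expected_columns = [
    'patient_id', 'age', 'systolic_bp', 'diabetes',
    'smoker', 'bmi', 'ldl_mg_dl', 'hba1c',
]

missing_columns = [col for col in expected_columns if col not in demo_df.columns]
if missing_columns:
    print('Missing columns:', missing_columns)
else:
    print('All expected columns are present.')
```

The stand-in table deliberately leaves out `hba1c`, so the check reports it as missing.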

In [None]:
df.describe(numeric_only=True).T

In [None]:
high_bp = df[df['systolic_bp'] >= 150]
print('Patients with systolic BP >= 150:', len(high_bp))
high_bp[['patient_id', 'age', 'systolic_bp', 'diabetes']].head(10)

In [None]:
print('Missing values per column:')
print(df.isna().sum())

df_clean = df.copy()
df_clean['bmi'] = df_clean['bmi'].fillna(df_clean['bmi'].median())
df_clean['ldl_mg_dl'] = df_clean['ldl_mg_dl'].fillna(df_clean['ldl_mg_dl'].median())
df_clean['hba1c'] = df_clean['hba1c'].fillna(df_clean['hba1c'].median())

print('\nAfter fill:')
print(df_clean.isna().sum())
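
To see what the median fill does, here is a minimal sketch on five made-up HbA1c values. It also records which entries were imputed, since a filled value is an estimate and not a measurement. The values and the names `was_missing` and `hba1c_filled` are illustrative.

```python
import pandas as pd

# Hypothetical values; None marks a missing measurement
hba1c = pd.Series([6.1, None, 7.8, 9.0, None], name='hba1c')

was_missing = hba1c.isna()        # remember which rows had no measurement
median_value = hba1c.median()     # median() skips missing values by default
hba1c_filled = hba1c.fillna(median_value)

print('Median used for fill:', median_value)
print('Values imputed:', int(was_missing.sum()))
print('After fill:', hba1c_filled.tolist())
```

Keeping a flag such as `was_missing` makes it possible to check later whether a conclusion depends on imputed values.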

In [None]:
patients_with_score = df_clean.copy()
patients_with_score['risk_score'] = (
    (patients_with_score['age'] >= 65).astype(int)
    + (patients_with_score['systolic_bp'] >= 150).astype(int)
    + (patients_with_score['ldl_mg_dl'] >= 160).astype(int)
    + patients_with_score['diabetes']
    + patients_with_score['smoker']
)

patients_with_score[['patient_id', 'risk_score']].head(10)
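
The score above adds one point per criterion met. The sketch below applies the same rule to three made-up patients so each total can be checked by hand. It assumes `diabetes` and `smoker` are stored as 0/1, as the sum in the cell above implies; the `toy` table and its values are hypothetical.

```python
import pandas as pd

# Hypothetical patients chosen so the totals are easy to verify by hand
toy = pd.DataFrame({
    'patient_id': ['T1', 'T2', 'T3'],
    'age': [45, 70, 82],
    'systolic_bp': [120, 155, 168],
    'ldl_mg_dl': [110, 140, 175],
    'diabetes': [0, 1, 1],   # assumed 0/1 coding
    'smoker': [0, 0, 1],     # assumed 0/1 coding
})

toy['risk_score'] = (
    (toy['age'] >= 65).astype(int)            # 1 point if age 65 or older
    + (toy['systolic_bp'] >= 150).astype(int)  # 1 point if systolic BP is 150 or higher
    + (toy['ldl_mg_dl'] >= 160).astype(int)    # 1 point if LDL is 160 or higher
    + toy['diabetes']                          # 1 point if diabetic
    + toy['smoker']                            # 1 point if smoker
)

print(toy[['patient_id', 'risk_score']])
print('Lowest score:', toy['risk_score'].min())
print('Highest score:', toy['risk_score'].max())
```

With five yes/no criteria the score can only take values from 0 to 5, which matters when choosing a threshold.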

## Section 4: Visual first - quick clinical plots

In [None]:
import matplotlib.pyplot as plt

plt.style.use('ggplot')

In [None]:
fig, ax = plt.subplots(figsize=(7, 4))
ax.hist(patients_with_score['age'], bins=8, color='#2a9d8f', edgecolor='black')
ax.set_title('Age Distribution (Synthetic Cohort)')
ax.set_xlabel('Age (years)')
ax.set_ylabel('Count')
plt.show()

In [None]:
fig, ax = plt.subplots(figsize=(7, 4))
counts = patients_with_score['diabetes'].value_counts().sort_index()
labels = ['No Diabetes', 'Diabetes']
ax.bar(labels, counts.values, color=['#8ecae6', '#e76f51'])
ax.set_title('Diabetes Counts in Cohort')
ax.set_ylabel('Number of patients')
plt.show()

In [None]:
fig, ax = plt.subplots(figsize=(7, 4))
scatter = ax.scatter(
    patients_with_score['bmi'],
    patients_with_score['systolic_bp'],
    c=patients_with_score['risk_score'],
    cmap='viridis',
    s=80
)
ax.set_title('BMI vs Systolic BP (color = risk score)')
ax.set_xlabel('BMI')
ax.set_ylabel('Systolic BP')
fig.colorbar(scatter, ax=ax, label='Risk score')
plt.show()

## Section 5: Mini challenge - define who gets follow-up first
Use the toy `risk_score` to flag a shortlist.
The score adds up five yes/no criteria, so it ranges from 0 to 5.
Then adjust the threshold and see how many patients are flagged.

In [None]:
threshold = 3  # risk_score is a sum of five 0/1 criteria, so useful thresholds are 0 to 5
flagged = patients_with_score[patients_with_score['risk_score'] >= threshold]
print('Threshold:', threshold)
print('Flagged:', len(flagged), 'of', len(patients_with_score))
flagged[['patient_id', 'age', 'systolic_bp', 'ldl_mg_dl', 'risk_score']].head(10)

In [None]:
from IPython.display import display
from ipywidgets import IntSlider, interact

def review_threshold(threshold=3):
    flagged_local = patients_with_score[patients_with_score['risk_score'] >= threshold]
    print('Threshold:', threshold)
    print('Flagged patients:', len(flagged_local), 'out of', len(patients_with_score))
    display(flagged_local[['patient_id', 'age', 'systolic_bp', 'ldl_mg_dl', 'risk_score']].head(10))

# risk_score ranges from 0 to 5, so the slider covers that range
interact(review_threshold, threshold=IntSlider(value=3, min=0, max=5, step=1));
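
If `ipywidgets` is not available in your environment, a plain loop gives the same information as the slider. This is a minimal sketch on made-up scores. In the notebook you would use `patients_with_score['risk_score']` in place of `scores`, and `flagged_counts` is an illustrative name.

```python
import pandas as pd

# Hypothetical scores; each is a sum of five 0/1 criteria, so the maximum is 5
scores = pd.Series([0, 1, 1, 2, 3, 3, 4, 5], name='risk_score')

flagged_counts = {}
for threshold in range(0, 7):
    # Count patients whose score meets or exceeds the threshold
    flagged_counts[threshold] = int((scores >= threshold).sum())
    print('Threshold', threshold, '->', flagged_counts[threshold], 'of', len(scores), 'flagged')
```

A threshold above the highest possible score flags nobody. When a shortlist comes back empty, check whether the threshold is even reachable before concluding that no patient is high risk.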

## Section 6: Notebook habits checklist
### Reproducibility checklist
- Restart kernel and run all before sharing results.
- Keep file paths relative (for example `../data/...`).
- Do not trust one chart; verify with raw rows.

### Safety checklist before acting on output
- Is this real patient data or synthetic demo data?
- Are there missing values or wrong units?
- Does the output match clinical common sense?
- Is this model or score validated for your population?
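
The missing-values and wrong-units questions can be partly automated with a range check. The sketch below counts missing entries and lists patients whose values fall outside loose plausibility limits. The table and the limits in `plausible_ranges` are hypothetical and chosen only for demonstration; they are not clinical reference ranges.

```python
import pandas as pd

# Hypothetical rows; the third BMI looks like a data-entry or unit error
demo = pd.DataFrame({
    'patient_id': ['D1', 'D2', 'D3'],
    'bmi': [24.5, 31.2, 2450.0],
    'systolic_bp': [128, None, 151],
})

# Illustrative limits only, not clinical reference ranges
plausible_ranges = {'bmi': (10, 80), 'systolic_bp': (50, 300)}

report = {}
for column, (low, high) in plausible_ranges.items():
    values = demo[column]
    n_missing = int(values.isna().sum())
    # Comparisons with a missing value are False, so missing rows are not listed here
    out_of_range_ids = list(demo.loc[(values < low) | (values > high), 'patient_id'])
    report[column] = {'missing': n_missing, 'out_of_range': out_of_range_ids}
    print(column, '-> missing:', n_missing, '| out of range:', out_of_range_ids)
```

A check like this only catches gross errors. A value can sit inside the limits and still be wrong, so the clinical common-sense question above still applies.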

## Wrap-up
You now have the core skills needed for the rest of Medical AI 101:
- run notebooks safely
- read and edit basic Python
- inspect medical tables
- generate quick visual checks

Next module: `01_history_of_medical_ai.ipynb`.