# Python 103 – Applied AI Readiness for Healthcare


## Course Overview
This notebook builds on your skills from Python 101 and 102 to introduce essential tools for building basic AI pipelines. You will learn how to preprocess clinical data, work with machine learning models using `scikit-learn`, and retrieve data using APIs. The notebook concludes with a mini-project to structure a small data pipeline or model prototype.

## Learning Objectives
- Understand the basics of machine learning with `scikit-learn`
- Preprocess patient data (normalization, encoding, scaling)
- Build a simple machine learning pipeline
- Retrieve data using public APIs
- Organize code and results in Jupyter Notebooks
- Complete a prototype mini-project using real-world clinical data

## 1. Setup and Data Access
We will continue using the `patjs/patient1` dataset from Hugging Face.

In [None]:
!pip install -q datasets scikit-learn pandas matplotlib requests

In [None]:
from datasets import load_dataset
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt

In [None]:
data = load_dataset('patjs/patient1')
patients_df = data['patients'].to_pandas()
encounters_df = data['encounters'].to_pandas()

In [None]:
print(patients_df.head())
print(encounters_df.head())

## 2. Preprocessing Data for Machine Learning
We will use age and gender to predict whether a patient has had more than one encounter.

In [None]:
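# Feature engineering: approximate age as a difference in calendar years.
# REFERENCE_YEAR is fixed so results are reproducible; update it as needed.
REFERENCE_YEAR = 2024
patients_df['BIRTHDATE'] = pd.to_datetime(patients_df['BIRTHDATE'])
patients_df['AGE'] = REFERENCE_YEAR - patients_df['BIRTHDATE'].dt.year
# Count encounters per patient
encounter_counts = encounters_df.groupby('PATIENT').size().reset_index(name='NUM_ENCOUNTERS')
# Merge data (an inner join keeps only patients with at least one encounter)
merged_df = pd.merge(patients_df, encounter_counts, left_on='Id', right_on='PATIENT', how='inner')
# Create binary target: 1 if the patient has more than one encounter, else 0
merged_df['MULTI_VISIT'] = (merged_df['NUM_ENCOUNTERS'] > 1).astype(int)
print(merged_df[['AGE', 'GENDER', 'MULTI_VISIT']].head())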
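
The inner merge above keeps only patients who appear in `encounters_df`. The following is a minimal sketch on made-up toy frames (the IDs are invented for illustration and are not taken from the dataset) of how a left merge with `fillna(0)` would retain patients with no encounters. Whether that is the right choice depends on how you want to define the cohort.

```python
import pandas as pd

# Toy data, invented for illustration only
patients = pd.DataFrame({
    "Id": ["p1", "p2", "p3"],
    "GENDER": ["F", "M", "F"],
})
encounters = pd.DataFrame({"PATIENT": ["p1", "p1", "p2"]})

# Count encounters per patient
counts = encounters.groupby("PATIENT").size().reset_index(name="NUM_ENCOUNTERS")

# A left merge keeps p3, who has no encounters; the missing count becomes 0
toy = patients.merge(counts, left_on="Id", right_on="PATIENT", how="left")
toy["NUM_ENCOUNTERS"] = toy["NUM_ENCOUNTERS"].fillna(0).astype(int)
toy["MULTI_VISIT"] = (toy["NUM_ENCOUNTERS"] > 1).astype(int)

print(toy[["Id", "NUM_ENCOUNTERS", "MULTI_VISIT"]])
```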

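### Encode categorical variables and scale numerical inputs
The cell below encodes `GENDER` as integers. Scaling is applied later by the `StandardScaler` step of the pipeline in Section 3, so the scaler is fit on the training split only.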

In [None]:
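# Encode GENDER as integer codes (categories are numbered in sorted order)
le = LabelEncoder()
merged_df['GENDER_ENC'] = le.fit_transform(merged_df['GENDER'])
# Feature matrix and target vector
X = merged_df[['AGE', 'GENDER_ENC']]
y = merged_df['MULTI_VISIT']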
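
The scikit-learn documentation describes `LabelEncoder` as intended for target labels. For input features, `OneHotEncoder` combined with `ColumnTransformer` is a common alternative. The sketch below shows that pattern on an invented toy table, not the patient dataset. The name `toy_X` and the step names `num` and `cat` are placeholders chosen for this example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy feature table, invented for illustration only
toy_X = pd.DataFrame({
    "AGE": [34, 71, 52, 19],
    "GENDER": ["F", "M", "F", "M"],
})

# Scale the numeric column and one-hot encode the categorical one.
# drop="if_binary" keeps a single 0/1 column for a two-category feature.
# sparse_threshold=0 makes the combined output a dense NumPy array.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["AGE"]),
        ("cat", OneHotEncoder(drop="if_binary"), ["GENDER"]),
    ],
    sparse_threshold=0,
)

out = preprocess.fit_transform(toy_X)
print(out)  # column 0: scaled AGE, column 1: 1.0 where GENDER is "M"
```

A `ColumnTransformer` like this could replace the `scaler` step as the first step of a `Pipeline`, so that encoding and scaling are both fit on the training split only.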

## 3. Building a scikit-learn Pipeline

In [None]:
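# Chain scaling and the classifier so both are fit on the training data only
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression())
])
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pipeline.fit(X_train, y_train)
# score() reports accuracy on the held-out test set
print("Model Accuracy:", pipeline.score(X_test, y_test))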
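
Accuracy alone can be misleading when one class dominates, so it may help to compare against a trivial baseline and look at per-class metrics. The sketch below does this on synthetic data generated purely for illustration. `X_demo`, `y_demo`, and the rule used to generate the labels are assumptions, not properties of the patient dataset.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X and y (invented data, not the patient dataset)
rng = np.random.default_rng(42)
X_demo = np.column_stack([
    rng.integers(18, 90, size=200),  # age-like column
    rng.integers(0, 2, size=200),    # binary encoded column
])
y_demo = (X_demo[:, 0] + rng.normal(0, 15, size=200) > 50).astype(int)

# stratify keeps the class balance similar in the train and test splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

model = Pipeline([("scaler", StandardScaler()), ("model", LogisticRegression())])
model.fit(X_tr, y_tr)

# Baseline that always predicts the most frequent training class
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)

print("Baseline accuracy:", baseline.score(X_te, y_te))
print("Model accuracy:   ", model.score(X_te, y_te))
print(classification_report(y_te, model.predict(X_te)))
```

A model that does not clearly beat the baseline has learned little from its features, whatever its raw accuracy.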

## 4. Calling APIs for External Data
APIs let you augment a dataset with external data or interact with cloud services.
The cell below uses the GitHub REST API as a placeholder; the same request pattern applies when querying a public health API.

In [None]:
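import requests
# Placeholder endpoint: repository metadata from the GitHub REST API
response = requests.get('https://api.github.com/repos/huggingface/datasets', timeout=10)
if response.status_code == 200:
    print(response.json()['description'])
else:
    print('Request failed with status', response.status_code)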
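
For repeated calls, the request logic can be wrapped in a small helper that handles timeouts and HTTP errors in one place. The function below is a sketch only: `fetch_json` and its parameters are hypothetical names introduced for this example, not part of `requests`.

```python
import requests


def fetch_json(url, params=None, timeout=10, session=None):
    """Return the decoded JSON body of a GET request, or None on failure.

    fetch_json is a hypothetical helper for this notebook, not a requests API.
    """
    # A requests.Session can be passed in to reuse connections
    http = session if session is not None else requests
    try:
        response = http.get(url, params=params, timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx responses
        return response.json()
    except (requests.RequestException, ValueError) as exc:
        # Covers network errors, HTTP error statuses, and invalid JSON bodies
        print(f"Request to {url} failed: {exc}")
        return None
```

Calling `fetch_json('https://api.github.com/repos/huggingface/datasets')` would return the same dictionary that the cell above reads `description` from, or `None` if the request fails.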

## 5. Organizing Work in Jupyter Notebooks
- Use Markdown cells to document your process
- Use meaningful headings and comments
- Store intermediate outputs and versions
- Export notebooks using File > Download As

## 6. Mini-Project: Build a Simple Risk Model
Using the same `merged_df`:
- Create a new feature based on `AGE` and `NUM_ENCOUNTERS`
- Train a decision tree classifier using `sklearn`
- Evaluate its performance and document your process
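
One possible starting point for these steps is sketched below. Everything in it is an assumption made so the code runs on its own: `demo_df` is a synthetic stand-in for `merged_df`, `ENCOUNTERS_PER_YEAR` is one hypothetical engineered feature, and `HIGH_RISK` is a randomly generated placeholder target. Note that if you predict `MULTI_VISIT`, any feature derived from `NUM_ENCOUNTERS` leaks the label, so choose the target and features with that in mind.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for merged_df (invented values, not the patient dataset)
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "AGE": rng.integers(18, 90, size=300),
    "NUM_ENCOUNTERS": rng.integers(1, 40, size=300),
})

# Hypothetical engineered feature: encounters per year of age
demo_df["ENCOUNTERS_PER_YEAR"] = demo_df["NUM_ENCOUNTERS"] / demo_df["AGE"].clip(lower=1)

# Placeholder target, generated at random purely so the code runs;
# replace it with a clinically meaningful label from the real data
demo_df["HIGH_RISK"] = rng.integers(0, 2, size=300)

features = demo_df[["AGE", "ENCOUNTERS_PER_YEAR"]]
target = demo_df["HIGH_RISK"]
X_tr, X_te, y_tr, y_te = train_test_split(
    features, target, test_size=0.2, random_state=42, stratify=target
)

# A shallow tree is easier to inspect and less prone to overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_tr, y_tr)
print("Test accuracy:", tree.score(X_te, y_te))
```

Because the placeholder target is random, accuracy here should hover around chance; the point of the sketch is the workflow, not the score.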

## Summary
- Learned how to structure ML pipelines
- Applied encoding, scaling, and basic modeling
- Used APIs to retrieve external data
- Practiced organizing an AI development notebook

You're now ready to start working on real-world AI applications using Python.