# Python 101 – Foundations of Python Programming for Clinical Data

## Course Overview
This notebook introduces core Python concepts and syntax. By the end, you will be able to write basic scripts, apply foundational programming principles, install and use libraries, and perform simple data manipulations.

## Learning Objectives
- Set up a Python environment in Google Colab, Hugging Face Spaces, or locally with VS Code and Jupyter
- Install and import libraries
- Understand Python syntax including variables, types, and control structures
- Write basic functions and loops
- Handle basic errors
- Perform foundational data manipulations

## 1. Environment Setup
For this course, we recommend using **Google Colab** or **Hugging Face Spaces Notebooks** to avoid local setup complications. These platforms allow you to run Python code directly in your browser and come pre-installed with common data science libraries.

### Google Colab
- Visit: https://colab.research.google.com
- Upload this notebook or open from GitHub/Hugging Face
- You can run code cells interactively in the browser

### Hugging Face Notebooks (Spaces)
- Visit: https://huggingface.co/spaces
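- Create a new Space with a notebook-capable option (at the time of writing, this is the JupyterLab template under the Docker SDK rather than a dedicated `notebook` SDK)
- You can load datasets from the 🤗 Hugging Face Hub with the `datasets` library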

### Optional: Local Setup with VS Code
If you prefer working locally:
- Install [Python](https://www.python.org/downloads/) and [VS Code](https://code.visualstudio.com/)
- Install Jupyter: `pip install notebook`
- Launch: `jupyter notebook`

Continue below to install packages (if needed) and begin exploring data.

## 2. Installing and Importing Libraries
Install Python packages with pip. From a terminal, run the command below; inside a notebook cell, prefix it with `!` (for example, `!pip install pandas`):
```
pip install pandas numpy matplotlib
```
To use a package in your code, import it. By convention, these libraries are imported under short aliases:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## 3. Python Syntax and Concepts
### Variable Assignment and Types
Python is dynamically typed: you do not declare a variable's type, because it is taken from the value you assign. Indentation, not braces, defines code blocks.

In [None]:
name = "Alex"
age = 45
smoker = False
print(f"{name} is {age} years old. Smoker: {smoker}")
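To make dynamic typing concrete, here is a minimal sketch that inspects a variable's type before and after reassignment. The variable `age_next_year` is introduced only for this illustration.

```python
# A variable's type comes from the value it currently holds.
age = 45
print(type(age).__name__)   # int

# Rebinding the same name to a value of a different type is allowed.
age = "45"
print(type(age).__name__)   # str

# Text must be converted explicitly before doing arithmetic with it.
age_next_year = int(age) + 1
print(age_next_year)        # 46

# isinstance() checks whether a value has a given type.
smoker = False
print(isinstance(smoker, bool))  # True
```

Explicit conversion matters with clinical data because values read from files or forms often arrive as text.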

### String Formatting
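Use f-strings to embed values in text. A format specifier after the colon controls how the value is shown; for example, `:.1f` rounds a number to one decimal place.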

In [None]:
bmi = 27.5
print(f"Patient BMI is {bmi:.1f}")
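A few other format specifiers come up often when printing clinical values. The sketch below uses made-up values; the variable names are only for illustration.

```python
bmi = 27.456
heart_rate = 72
name = "Alex"

one_decimal = f"{bmi:.1f}"       # round to one decimal place
padded = f"{heart_rate:>5d}"     # right-align an integer in a field of width 5
percent = f"{0.125:.1%}"         # show a fraction as a percentage
labelled = f"{name:<10}|"        # left-align text in a field of width 10

print(one_decimal)   # 27.5
print(padded)        #    72
print(percent)       # 12.5%
print(labelled)      # Alex      |
```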

## 4. Control Flow
Use conditional statements to direct code execution. Python checks the branches from top to bottom and runs only the first one whose condition is true.

In [None]:
age = 70
if age > 65:
    print("High risk age group")
elif age > 45:
    print("Moderate risk")
else:
    print("Low risk")
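Boundary values are a common source of mistakes in threshold logic. As a sketch, the same conditions can be wrapped in a function (the name `age_risk_group` is made up for this example) and tried at the edges:

```python
def age_risk_group(age):
    """Return the risk label for an age, using the same thresholds as above."""
    if age > 65:
        return "High risk age group"
    elif age > 45:
        return "Moderate risk"
    else:
        return "Low risk"

# Try values on and around each threshold.
for age in [30, 45, 46, 65, 66]:
    print(age, age_risk_group(age))
```

Because the comparisons use `>`, an age of exactly 65 falls in the moderate group and exactly 45 in the low group. Use `>=` if the thresholds should be inclusive.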

## 5. Loops
Use loops to process lists or repeat tasks.

In [None]:
conditions = ["hypertension", "diabetes"]
for condition in conditions:
    print(f"Condition: {condition}")
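Two related patterns are worth knowing once the basic `for` loop is familiar: `enumerate()` for numbering items, and list comprehensions for building a new list. A minimal sketch with an extended example list:

```python
conditions = ["hypertension", "diabetes", "asthma"]

# enumerate() yields (position, item) pairs; start=1 numbers from 1 instead of 0.
numbered = []
for index, condition in enumerate(conditions, start=1):
    numbered.append(f"{index}. {condition}")
print(numbered)

# A list comprehension builds a new list in a single expression.
upper = [condition.upper() for condition in conditions]
print(upper)

# An optional 'if' clause keeps only the items that match.
starts_with_d = [c for c in conditions if c.startswith("d")]
print(starts_with_d)
```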

## 6. Functions
Functions help you reuse logic and write clean code.

In [None]:
def calculate_bmi(weight, height):
    """Return BMI from weight in kilograms and height in metres."""
    return weight / (height ** 2)

bmi = calculate_bmi(70, 1.75)
print(f"Calculated BMI: {bmi:.2f}")
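Functions are also a good place to check inputs. The sketch below is one possible variant, under the hypothetical name `calculate_bmi_checked`, that rejects non-positive values instead of failing with a division error or returning a meaningless result.

```python
def calculate_bmi_checked(weight_kg, height_m):
    """Return BMI (kg / m^2), raising ValueError for non-positive inputs."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / (height_m ** 2)

# Keyword arguments make the units explicit at the call site.
bmi = calculate_bmi_checked(weight_kg=70, height_m=1.75)
print(f"Calculated BMI: {bmi:.2f}")  # Calculated BMI: 22.86
```

Naming the parameters with their units makes it harder to pass height in centimetres by mistake.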

## 7. Error Handling
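Wrap code that may fail in a try/except block so the program can respond to the error instead of stopping. In the example below, `int()` raises `ValueError` when the input is not a whole number.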

In [None]:
try:
    age = int(input("Enter patient age: "))
    print(f"Age entered: {age}")
except ValueError:
    print("Invalid input. Please enter a number.")
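The same pattern can be moved into a reusable function that does not depend on keyboard input. This is a minimal sketch; the function name `parse_age` and the choice to return `None` for invalid values are assumptions made for the example.

```python
def parse_age(text):
    """Convert text to a non-negative integer age, or return None if invalid."""
    try:
        age = int(text)
    except ValueError:
        # int() raises ValueError for text such as "abc" or "4.5".
        return None
    else:
        # The else branch runs only when no exception was raised.
        if age < 0:
            return None
        return age

for raw in ["63", "abc", "4.5", "-5"]:
    print(repr(raw), "->", parse_age(raw))
```

Catching only `ValueError`, rather than every exception, keeps unrelated bugs visible.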

## 8. Practical Exercise
Create a simple patient record using a dictionary and print summary information.

In [None]:
patient = {
    "name": "Jordan",
    "age": 63,
    "conditions": ["asthma", "arthritis"],
    "smoker": True
}

print(f"{patient['name']} is {patient['age']} years old.")
print(f"Conditions: {', '.join(patient['conditions'])}")
print(f"Smoker status: {patient['smoker']}")
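Real datasets contain many records, so a natural next step is a list of dictionaries. The sketch below uses made-up patients and shows `.get()` for keys that may be missing.

```python
patients = [
    {"name": "Jordan", "age": 63, "conditions": ["asthma", "arthritis"], "smoker": True},
    {"name": "Alex", "age": 45, "conditions": [], "smoker": False},
]

# .get() returns a default instead of raising KeyError when a key is absent.
for patient in patients:
    allergies = patient.get("allergies", [])
    print(f"{patient['name']}: {len(allergies)} recorded allergies")

# Filter records with a list comprehension.
smokers = [p["name"] for p in patients if p["smoker"]]

# Summarise records with a dictionary comprehension.
condition_counts = {p["name"]: len(p["conditions"]) for p in patients}

print(smokers)
print(condition_counts)
```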

## Summary
- Python syntax is clean and readable
- You learned how to define variables, functions, loops, and handle errors
- You created and interacted with clinical-style data in Python

This foundational knowledge is essential for working with healthcare data and preparing for more advanced topics in Python 102.

## Loading Data from Hugging Face Datasets
In this course, we use publicly available synthetic healthcare data hosted on Hugging Face. This ensures reproducibility and avoids the need to download large files manually.

The dataset is hosted at: https://huggingface.co/datasets/patjs/patient1

Using the `datasets` library, we can load the data directly into pandas DataFrames for analysis.

In [None]:
!pip install -q datasets

from datasets import load_dataset
import pandas as pd

In [None]:
# Load the dataset from Hugging Face Hub
data = load_dataset('patjs/patient1')

# Convert to pandas DataFrames
patients_df = data['patients'].to_pandas()
encounters_df = data['encounters'].to_pandas()

In [None]:
# Preview the datasets
print("Patients Sample:")
print(patients_df.head())

print("Encounters Sample:")
print(encounters_df.head())

## Previewing Synthetic Healthcare Data
To practice working with healthcare data, we will use small sample datasets that resemble those generated by [Synthea](https://synthetichealth.github.io/synthea/), a tool for generating synthetic patient records.

These files include patient demographics and clinical encounters.

### Sample Files
- `patients.csv`: Basic patient information
- `encounters.csv`: Clinical encounter records for each patient

These are simplified, synthetic files meant for educational use.

In [None]:
import pandas as pd

# Load sample patient data
patients_df = pd.read_csv('patients.csv')
print("Patients:")
print(patients_df.head())

In [None]:
# Load sample encounter data
encounters_df = pd.read_csv('encounters.csv')
print("Encounters:")
print(encounters_df.head())

### Column Descriptions
**patients.csv**:
- `Id`: Unique patient identifier (anonymized)
- `BIRTHDATE`: Patient date of birth
- `GENDER`: Biological sex
- `RACE`, `ETHNICITY`: Standard demographic markers
- `DEATHDATE`: Date of death; blank for living patients

**encounters.csv**:
- `Id`: Unique encounter ID
- `START`, `STOP`: Start and end timestamps of the clinical visit
- `PATIENT`: Links to `patients.csv` by patient ID
- `REASONDESCRIPTION`: Reason for the encounter (visit purpose)
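Because `PATIENT` in the encounters table refers to `Id` in the patients table, the two can be joined. The sketch below uses a few made-up rows with the column names described above (not real Synthea output) so that it runs without the CSV files.

```python
import pandas as pd

# Made-up rows for illustration only.
patients_df = pd.DataFrame({
    "Id": ["p1", "p2"],
    "BIRTHDATE": ["1960-03-14", "1985-11-02"],
    "GENDER": ["F", "M"],
})
encounters_df = pd.DataFrame({
    "Id": ["e1", "e2", "e3"],
    "START": ["2021-01-05T09:00:00Z", "2021-02-10T14:30:00Z", "2021-03-01T08:15:00Z"],
    "STOP": ["2021-01-05T09:30:00Z", "2021-02-10T15:00:00Z", "2021-03-01T08:45:00Z"],
    "PATIENT": ["p1", "p1", "p2"],
    "REASONDESCRIPTION": ["Asthma", None, "Hypertension"],
})

# Both tables have an "Id" column, so suffixes keep them apart after the join.
merged = encounters_df.merge(
    patients_df,
    left_on="PATIENT",
    right_on="Id",
    how="left",
    suffixes=("_encounter", "_patient"),
)

# Count encounters per patient and encounters with no recorded reason.
encounters_per_patient = merged.groupby("PATIENT")["Id_encounter"].count()
missing_reasons = merged["REASONDESCRIPTION"].isna().sum()

print(encounters_per_patient)
print(f"Encounters without a reason: {missing_reasons}")
```

A left join keeps every encounter even if its patient is missing from the patients table, which makes broken links easy to spot.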

## 9. Saving Processed Data to Files
In real-world healthcare data projects, it's important to export your cleaned or transformed data for downstream analysis or sharing.
We will demonstrate how to save a DataFrame to CSV and JSON formats, and explain each step involved.

### Required Setup
The cells below call helper functions that this notebook does not define. Make sure the following are defined or imported before you run them:

- A function to load CSVs into pandas DataFrames
- A function to normalize encounter data
- Functions to convert time columns and remap IDs
- A defined path containing clinical CSV files (e.g., 'synthea')
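The course's own helper implementations are not shown here. As a rough sketch only, the helpers could look like the following. The function names and argument order are taken from the calls later in this section, but every function body is an assumption, including the guess that normalization renames `Id` and `PATIENT` to `ENCOUNTER_ID` and `PATIENT_ID`.

```python
import os
import pandas as pd

def load_csvs_to_pandas(path, filenames):
    """Read each CSV under `path` into a dict keyed by file name without '.csv'."""
    return {
        os.path.splitext(name)[0]: pd.read_csv(os.path.join(path, name))
        for name in filenames
    }

def create_normalized_encounter(df_dict):
    """Assumed behaviour: one row per encounter with explicit ID column names."""
    encounters = df_dict["encounters"].rename(
        columns={"Id": "ENCOUNTER_ID", "PATIENT": "PATIENT_ID"}
    )
    return encounters[["ENCOUNTER_ID", "PATIENT_ID", "START", "STOP"]]

def convert_time_to_datetime(df, column):
    """Return a copy of df with `column` parsed as UTC datetimes."""
    out = df.copy()
    out[column] = pd.to_datetime(out[column], utc=True)
    return out

def create_id_map(df, column):
    """Map each distinct value in `column` to an integer, in order of appearance."""
    return {value: i for i, value in enumerate(df[column].unique())}

def replace_complex_ids(df, id_map, column):
    """Return a copy of df with the values in `column` replaced via id_map."""
    out = df.copy()
    out[column] = out[column].map(id_map)
    return out

# Small in-memory demo with made-up IDs, so no CSV files are needed.
demo_dict = {
    "patients": pd.DataFrame({"Id": ["a-1", "b-2"]}),
    "encounters": pd.DataFrame({
        "Id": ["x-9", "y-8"],
        "PATIENT": ["b-2", "a-1"],
        "START": ["2021-01-05T09:00:00Z", "2021-02-10T14:30:00Z"],
        "STOP": ["2021-01-05T09:30:00Z", "2021-02-10T15:00:00Z"],
    }),
}
demo = create_normalized_encounter(demo_dict)
demo = convert_time_to_datetime(demo, "START")
demo = replace_complex_ids(demo, create_id_map(demo_dict["patients"], "Id"), "PATIENT_ID")
print(demo)
```

Each helper returns a copy rather than modifying its input, so cells can be re-run without changing `df_dict`.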

In [None]:
# Sample setup
path_small = 'synthea'
filenames = ['patients.csv', 'encounters.csv', 'observations.csv', 'conditions.csv',
             'medications.csv', 'procedures.csv']

In [None]:
# Load CSVs into a dictionary of DataFrames
df_dict = load_csvs_to_pandas(path_small, filenames)

In [None]:
# Normalize encounter information
normalized_encounter_df = create_normalized_encounter(df_dict)

In [None]:
# Convert 'START' and 'STOP' time columns to datetime
normalized_encounter_df = convert_time_to_datetime(normalized_encounter_df, 'START')
normalized_encounter_df = convert_time_to_datetime(normalized_encounter_df, 'STOP')

In [None]:
# Create ID maps for patients and encounters
patient_id_map = create_id_map(df_dict['patients'], 'Id')
encounter_id_map = create_id_map(df_dict['encounters'], 'Id')

In [None]:
# Replace GUIDs with integer IDs
final_df = replace_complex_ids(normalized_encounter_df, patient_id_map, 'PATIENT_ID')
final_df = replace_complex_ids(final_df, encounter_id_map, 'ENCOUNTER_ID')

### Saving the Final DataFrame
Use pandas to save the DataFrame as CSV and as line-delimited JSON (one record per line):

In [None]:
# Save to CSV
final_df.to_csv('final.csv', index=False)

# Save to line-delimited JSON with ISO-formatted dates
final_df.to_json('final.json', orient='records', lines=True, date_format='iso')
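After exporting, it is worth reading the data back to confirm nothing was lost. The sketch below does this with in-memory buffers and a small made-up DataFrame in place of `final_df`, so it runs without writing files.

```python
import io
import pandas as pd

# Stand-in for final_df, with made-up values.
sample_df = pd.DataFrame({
    "ENCOUNTER_ID": [0, 1],
    "PATIENT_ID": [1, 0],
    "START": pd.to_datetime(
        ["2021-01-05T09:00:00Z", "2021-02-10T14:30:00Z"], utc=True
    ),
})

# CSV round trip: write to a text buffer, then read it back.
csv_buffer = io.StringIO()
sample_df.to_csv(csv_buffer, index=False)
csv_buffer.seek(0)
from_csv = pd.read_csv(csv_buffer, parse_dates=["START"])

# JSON Lines round trip: to_json returns a string when no path is given.
json_text = sample_df.to_json(orient="records", lines=True, date_format="iso")
from_json = pd.read_json(io.StringIO(json_text), lines=True)

print(from_csv.shape, from_json.shape)
print(json_text)
```

CSV stores dates as plain text, so `parse_dates` is needed when reading the file back in.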

### (Optional) Upload to a Cloud Service or Dataset Repository
You can also upload the file to Hugging Face, S3, or another file storage platform if needed.
The example below uses `save_file` as a placeholder; it is not a real library function, so replace it with the upload call for your chosen service:

In [None]:
with open('final.csv', 'rb') as f:
    save_file(f, 'final.csv')  # Replace with your cloud-saving logic