# Getting Started with Python (for Medical Students)


**How to use this notebook**
- A notebook is made of **cells**.
- **Markdown cells** (like this one) contain formatted notes.
- **Code cells** contain Python code.
- To run a code cell, click it and press **Shift + Enter**.

Throughout this notebook, you’ll see a few new phrases:
- **Program**: written instructions
- **Execution**: the computer running those instructions
- **Trace execution**: follow what the program does step‑by‑step
- **Debugging**: finding and fixing errors


## Session Objectives

By the end, you should be able to:

1. Explain what a **program** is and what **execution** means.
2. Use **print statements** to view information and **debug**.
3. Create and use **variables**.
4. Do **basic math** in Python (e.g., `2 + 2`).
5. Recognize common **data types** (string, list/array‑like, object).
6. **Load data** into Python.
7. Use code to analyze a data table and compute basic numerical summaries.


## 1) Programs and Execution

A **program** is a set of instructions written for a computer.

**Execution** means the computer runs those instructions **top → bottom**, one line at a time.


In [1]:
# A tiny program: one instruction
print("Hello, world!")

Hello, world!


In [None]:
# Execution happens top to bottom:
print("Step 1")
print("Step 2")
print("Step 3")

## 2) Print Statements (your window into the program)

`print()` is how Python **talks back to you**.

Use it to:
- **Display information** (show values on the screen)
- **Trace the execution of a program** (see what happens step‑by‑step)
- **Debug** (check where things go wrong)



In [2]:
# Print can show text (a "string")
print("This is a message from Python.")

This is a message from Python.


In [3]:
# Print can show multiple things at once
patient = "Alex"
age = 52
print("Patient:", patient, "| Age:", age)

Patient: Alex | Age: 52


### Debugging example (intentional error)

We’ll create an error on purpose and use `print()` to show the values just before the failing line.

> **Debugging** = identifying the source of the error and fixing it.


In [4]:
a = 10
b = 0

print("a =", a)
print("b =", b)

# This line will cause an error (division by zero)
print("a / b =", a / b)

a = 10
b = 0


ZeroDivisionError: division by zero
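
One possible fix is sketched below. The printed values show that `b` is 0, so this version checks `b` before dividing. It assumes a zero denominator should be reported rather than divided by, and it uses `if`/`else`, which this notebook does not otherwise cover. The name `result` is introduced only for illustration.

```python
a = 10
b = 0

# The print statements showed b = 0, so guard the division.
if b == 0:
    result = None  # no valid answer when dividing by zero
    print("Cannot divide: b is 0")
else:
    result = a / b
    print("a / b =", result)
```

With this check in place the cell runs to the end instead of stopping with `ZeroDivisionError`.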

## 3) Variables 

A **variable** is a named container that stores a value.

Why variables matter:
- They let you reuse information
- They help you track the **state** of the program at a particular point in **execution**


In [5]:
diagnosis = "Hypertension"   # text
systolic_bp = 148           # a number

print("Diagnosis:", diagnosis)
print("Systolic BP:", systolic_bp)

Diagnosis: Hypertension
Systolic BP: 148


In [6]:
# Variables can change during execution (trace the execution!)
age = 60
print("Starting age:", age)

age = age + 1
print("After a birthday:", age)

Starting age: 60
After a birthday: 61


## 4) Python as a Calculator (basic math)

Python can evaluate expressions like a calculator:

- `+` add
- `-` subtract
- `*` multiply
- `/` divide
- `**` exponent (power)
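
As a small illustration, the sketch below uses each operator once and then combines `/` and `**` in a body mass index calculation. The weight and height are made-up values, and the variable names are chosen only for this example.

```python
weight_kg = 70      # made-up example value
height_m = 1.75     # made-up example value

total = weight_kg + 5            # add
difference = weight_kg - 5       # subtract
doubled = weight_kg * 2          # multiply
half = weight_kg / 2             # divide (the result is a float)
height_squared = height_m ** 2   # exponent (power)

# BMI = weight (kg) divided by height (m) squared.
# ** is evaluated before /, so no parentheses are needed here.
bmi = weight_kg / height_m ** 2
print("BMI:", round(bmi, 1))
```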



In [None]:
2 + 2

In [None]:
# Dose calculation example
weight_kg = 70
dose_per_kg = 0.5

dose_mg = weight_kg * dose_per_kg
print("Calculated dose (mg):", dose_mg)

## 5) Data Types (what *kind* of value is this?)

A **data type** tells Python what kind of thing a value is.

Common types you’ll see today:
- **int**: whole number (e.g., `5`)
- **float**: decimal number (e.g., `3.14`)
- **str** (string): text (e.g., `"cancer"`)
- **bool**: True/False
- **list**: a collection of values (array‑like in Python)
- **object**: a more complex “thing” (for example, a spreadsheet‑style table)

You can ask Python for the type using `type(...)`.


In [7]:
x = 5
y = 3.14
label = "Troponin"
flag = True

print("x:", x, "| type:", type(x))
print("y:", y, "| type:", type(y))
print("label:", label, "| type:", type(label))
print("flag:", flag, "| type:", type(flag))

x: 5 | type: <class 'int'>
y: 3.14 | type: <class 'float'>
label: Troponin | type: <class 'str'>
flag: True | type: <class 'bool'>


## 6) Lists and Arrays (array‑like data)

In many contexts, people say **array** to mean:
> “a bunch of values stored together”

In **Python**, the most common beginner-friendly “array‑like” structure is a **list**.

Example: multiple lab values stored together:


In [None]:
labs = [120, 130, 140]
print(labs)
print("Type:", type(labs))

### Indexing (getting one value)

Python counts positions starting at **0**:

- `labs[0]` is the first value  
- `labs[1]` is the second value  


In [None]:
print("First lab value:", labs[0])
print("Second lab value:", labs[1])
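
Because counting starts at 0, the last position in a list is one less than the number of values. The sketch below shows this with the built-in `len()` function, which is not otherwise introduced in this notebook. The name `last_position` is used only for illustration.

```python
labs = [120, 130, 140]

# len() gives the number of values in the list.
number_of_values = len(labs)
print("Number of values:", number_of_values)

# Positions run from 0 up to len(labs) - 1.
last_position = number_of_values - 1
print("Last position:", last_position)
print("Last lab value:", labs[last_position])
```

Asking for a position past the end, such as `labs[3]` here, raises an `IndexError`.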

## 7) Loading Data 

A common workflow in medicine/research is:
1. **Load data** (often from a CSV file: basically a spreadsheet saved as text)
2. Inspect the first few rows
3. Analyze

Below, we’ll load a CSV file from disk with `pandas`. The file path in the cell is specific to the author’s computer; change it to wherever the file is saved on yours.


In [5]:
import pandas as pd

df = pd.read_csv("/Users/aarthimuthukumar/Documents/Medical School/Research Papers/Coding Bootamp/hiv_diagnoses_2022.csv")

print("Data loaded! df is a DataFrame (a spreadsheet-like object).")
print("Type:", type(df))

Data loaded! df is a DataFrame (a spreadsheet-like object).
Type: <class 'pandas.DataFrame'>
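
If the file is not on your computer, `pd.read_csv` can also read CSV text held in a Python string. The sketch below is a minimal example under that assumption: the three data rows are copied by hand from the preview of this dataset in this section, and the names `csv_text` and `small_df` are chosen only for illustration.

```python
import io
import pandas as pd

# A few rows of the dataset, typed out as CSV text.
# Quotes keep "Acadia Parish, LA" together as one value despite the comma.
csv_text = """Indicator,Year,Geography,FIPS,Cases,Rate per 100000
HIV diagnoses,2022,"Acadia Parish, LA",22001,12,25.8
HIV diagnoses,2022,"Ada County, ID",16001,15,3.4
HIV diagnoses,2022,"Adair County, IA",19001,0,0
"""

# io.StringIO makes the string behave like an open file.
small_df = pd.read_csv(io.StringIO(csv_text))

print("Rows, Columns:", small_df.shape)
print("Type:", type(small_df))
```

A different variable name is used so that the full table in `df` is not overwritten.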


In [6]:
# Peek at the first rows
df.head()

| | Indicator | Year | Geography | FIPS | Cases | Rate per 100000 |
|---|---|---|---|---|---|---|
| 0 | HIV diagnoses | 2022 | Abbeville County, SC | 45001 | Data suppressed | Data suppressed |
| 1 | HIV diagnoses | 2022 | Acadia Parish, LA | 22001 | 12 | 25.8 |
| 2 | HIV diagnoses | 2022 | Accomack County, VA | 51001 | Data suppressed | Data suppressed |
| 3 | HIV diagnoses | 2022 | Ada County, ID | 16001 | 15 | 3.4 |
| 4 | HIV diagnoses | 2022 | Adair County, IA | 19001 | 0 | 0 |


In [7]:
# Peek at the first 10 rows
df.head(10)

| | Indicator | Year | Geography | FIPS | Cases | Rate per 100000 |
|---|---|---|---|---|---|---|
| 0 | HIV diagnoses | 2022 | Abbeville County, SC | 45001 | Data suppressed | Data suppressed |
| 1 | HIV diagnoses | 2022 | Acadia Parish, LA | 22001 | 12 | 25.8 |
| 2 | HIV diagnoses | 2022 | Accomack County, VA | 51001 | Data suppressed | Data suppressed |
| 3 | HIV diagnoses | 2022 | Ada County, ID | 16001 | 15 | 3.4 |
| 4 | HIV diagnoses | 2022 | Adair County, IA | 19001 | 0 | 0 |
| 5 | HIV diagnoses | 2022 | Adair County, KY | 21001 | 0 | 0 |
| 6 | HIV diagnoses | 2022 | Adair County, MO | 29001 | Data suppressed | Data suppressed |
| 7 | HIV diagnoses | 2022 | Adair County, OK | 40001 | 0 | 0 |
| 8 | HIV diagnoses | 2022 | Adams County, CO | 8001 | 52 | 11.9 |
| 9 | HIV diagnoses | 2022 | Adams County, IA | 19003 | 0 | 0 |


In [9]:
# Basic info you often check first
print("Rows, Columns:", df.shape)
print("Column names:", list(df.columns))

Rows, Columns: (3231, 6)
Column names: ['Indicator', 'Year', 'Geography', 'FIPS', 'Cases', 'Rate per 100000']
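
The session objectives mention basic numerical summaries, so here is a minimal sketch of one way to compute them. The `Cases` column mixes numbers with the text "Data suppressed", so the sketch first converts it to numbers with `pd.to_numeric`. It runs on a five-row sample copied by hand from the preview of this dataset, not on the full table, and the names `sample` and `cases` are chosen only for illustration.

```python
import io
import pandas as pd

# Five rows copied from the preview (only two columns kept for brevity).
csv_text = """Geography,Cases
"Abbeville County, SC",Data suppressed
"Acadia Parish, LA",12
"Ada County, ID",15
"Adair County, IA",0
"Adams County, CO",52
"""
sample = pd.read_csv(io.StringIO(csv_text))

# errors="coerce" turns anything that is not a number
# (here, "Data suppressed") into a missing value (NaN).
cases = pd.to_numeric(sample["Cases"], errors="coerce")

# These summaries skip missing values by default.
print("Counties with a reported count:", cases.count())
print("Total cases:", cases.sum())
print("Mean cases:", cases.mean())
print("Largest count:", cases.max())
```

The same steps should apply to the full table by starting from `df["Cases"]` instead of `sample["Cases"]`.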


## Quick practice (2 minutes)

1. Make a variable called `favorite_food` and print it.
2. Make a list called `heart_rates` with 3 numbers and print the first one.
3. Change a variable (e.g., `age`) and use `print()` to **trace** how it changed.

If you get an error:
- Read the message
- Add `print()` statements above the failing line
- Try to **identify the source of the error**
