# Introduction to Python Programming
### BMI 511: Advanced Topics in Biomedical Informatics

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/duttaprat/BMI_511/blob/main/2025/02_05_2025/python_programming_tutorial.ipynb)

**Instructor:** Dr. Pratik Dutta  
**Department:** Biomedical Informatics, Stony Brook University  
**Date:** February 5, 2025

---

## Learning Objectives

By the end of this tutorial, you will be able to:
1. Understand Python basics: variables, data types, and operators
2. Work with control flow: conditionals and loops
3. Use functions and understand scope
4. Manipulate data structures: lists, dictionaries, sets, and tuples
5. Handle files and perform basic I/O operations
6. Apply these concepts to biomedical informatics problems

---
## Table of Contents

1. [Getting Started with Python](#section1)
2. [Variables and Data Types](#section2)
3. [Operators](#section3)
4. [Control Flow: Conditionals](#section4)
5. [Control Flow: Loops](#section5)
6. [Functions](#section6)
7. [Data Structures](#section7)
8. [File I/O](#section8)
9. [Biomedical Applications](#section9)
10. [Practice Exercises](#section10)

---
<a id='section1'></a>
## 1. Getting Started with Python

Python is a high-level, interpreted programming language that's widely used in biomedical informatics for:
- Data analysis and visualization
- Machine learning and AI
- Genomics and bioinformatics
- Clinical data processing

### Why Google Colab?
- **No installation required** - runs in your browser
- **Free GPU/TPU access** - for computational tasks
- **Easy sharing** - collaborate with colleagues
- **Pre-installed libraries** - common packages ready to use

In [None]:
# Your first Python program!
print("Hello, Biomedical Informatics!")

# Check Python version
import sys
print(f"Python version: {sys.version}")

---
<a id='section2'></a>
## 2. Variables and Data Types

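Variables are names that refer to values. Python is dynamically typed: you do not declare a variable's type, because the type belongs to the value the name currently refers to. Python has several built-in data types.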

In [None]:
# Numeric types
age = 25                    # Integer
temperature = 98.6          # Float
complex_num = 3 + 4j        # Complex number

print(f"Age: {age}, Type: {type(age)}")
print(f"Temperature: {temperature}, Type: {type(temperature)}")
print(f"Complex: {complex_num}, Type: {type(complex_num)}")
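Two properties of the numeric types are worth seeing early: integers never overflow, and floats are binary approximations, so comparing computed floats with `==` can give surprising results. A minimal sketch using the standard-library `math.isclose`:

```python
import math

# Python integers have arbitrary precision, so large values do not overflow
big = 2 ** 100
print(big)  # 1267650600228229401496703205376

# Floats are IEEE 754 double-precision approximations of decimal values
total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: compares within a relative tolerance
```

For lab values computed by arithmetic, comparing with a tolerance is usually safer than exact equality.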

In [None]:
# String type
gene_name = "BRCA1"
patient_id = 'P12345'
dna_sequence = "ATCGATCG"

print(f"Gene: {gene_name}")
print(f"Patient ID: {patient_id}")
print(f"DNA Sequence: {dna_sequence}")
print(f"Length of sequence: {len(dna_sequence)}")
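Strings are sequences, so they support indexing and slicing, and string methods return new strings rather than modifying the original. A short sketch with a mixed-case sequence (the lowercase input is made up for illustration):

```python
dna_sequence = "atcgATCG"

# String methods return NEW strings; the original is left unchanged
normalized = dna_sequence.upper()
print(normalized)        # ATCGATCG

# Indexing starts at 0; negative indices count from the end
print(normalized[0])     # A
print(normalized[-1])    # G

# Slicing [start:stop] includes start and excludes stop
print(normalized[3:6])   # GAT
print(normalized[::-1])  # GCTAGCTA (reversed)

# Strings are immutable: item assignment raises TypeError
try:
    normalized[0] = "G"
except TypeError as err:
    print(f"TypeError: {err}")
```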

In [None]:
# Boolean type
is_healthy = True
has_mutation = False

print(f"Is healthy: {is_healthy}, Type: {type(is_healthy)}")
print(f"Has mutation: {has_mutation}")
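Every Python value can be used where a Boolean is expected, and `bool` is a subclass of `int`. This sketch shows which values count as false and a common trick for counting `True` flags (the flag list is illustrative data):

```python
# Empty containers, zero, and None are "falsy"; most other values are "truthy"
print(bool(""), bool(0), bool([]), bool(None))  # False False False False
print(bool("BRCA1"), bool(42), bool([0]))       # True True True

# bool is a subclass of int (True == 1), so summing flags counts the True values
mutation_flags = [True, False, True, True]
n_mutated = sum(mutation_flags)
print(f"Samples with a mutation: {n_mutated}")  # Samples with a mutation: 3
```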

In [None]:
# Type conversion
x = "123"
y = int(x)        # String to integer
z = float(x)      # String to float

print(f"Original: {x} (type: {type(x)})")
print(f"As integer: {y} (type: {type(y)})")
print(f"As float: {z} (type: {type(z)})")
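Conversions can fail: `int()` raises `ValueError` for text that is not a valid integer, which matters when parsing values from files or forms. A minimal sketch; `to_int_or_none` is a hypothetical helper name, not a built-in:

```python
def to_int_or_none(text):
    """Convert text to an int, or return None if it is not a valid integer string."""
    try:
        return int(text)
    except ValueError:
        return None

print(to_int_or_none("123"))   # 123
print(to_int_or_none(" 42 "))  # 42: surrounding whitespace is allowed
print(to_int_or_none("98.6"))  # None: int() does not parse decimal strings
print(int(float("98.6")))      # 98: int() truncates toward zero
print(round(98.6))             # 99: round() rounds to the nearest integer
```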

---
<a id='section3'></a>
## 3. Operators

Operators are used to perform operations on variables and values.

In [None]:
# Arithmetic operators
a = 10
b = 3

print(f"Addition: {a} + {b} = {a + b}")
print(f"Subtraction: {a} - {b} = {a - b}")
print(f"Multiplication: {a} * {b} = {a * b}")
print(f"Division: {a} / {b} = {a / b}")
print(f"Floor Division: {a} // {b} = {a // b}")
print(f"Modulus: {a} % {b} = {a % b}")
print(f"Exponentiation: {a} ** {b} = {a ** b}")
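Floor division and modulus are often used together, and `divmod()` computes both at once. The sketch below converts a procedure duration (an illustrative value) and shows two behaviors that commonly surprise beginners: rounding with negative numbers and operator precedence.

```python
# divmod() returns (quotient, remainder) in one call
procedure_minutes = 135
hours, minutes = divmod(procedure_minutes, 60)
print(f"{hours} h {minutes} min")  # 2 h 15 min

# // rounds toward negative infinity, and % takes the sign of the divisor
print(-7 // 2, -7 % 2)  # -4 1

# ** binds more tightly than unary minus
print(-2 ** 2, (-2) ** 2)  # -4 4
```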

In [None]:
# Comparison operators
x = 5
y = 10

print(f"{x} == {y}: {x == y}")
print(f"{x} != {y}: {x != y}")
print(f"{x} < {y}: {x < y}")
print(f"{x} > {y}: {x > y}")
print(f"{x} <= {y}: {x <= y}")
print(f"{x} >= {y}: {x >= y}")

In [None]:
# Logical operators
is_diabetic = True
is_hypertensive = False

print(f"AND: {is_diabetic and is_hypertensive}")
print(f"OR: {is_diabetic or is_hypertensive}")
print(f"NOT: {not is_diabetic}")
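`and` and `or` evaluate left to right and stop as soon as the result is known (short-circuit evaluation), and comparisons can be chained. A sketch with illustrative values; the empty `readings` list stands in for a patient with no recorded measurements:

```python
readings = []  # e.g. a patient with no recorded glucose values yet

# 'and' stops at the first falsy operand, so the division is never evaluated here
has_high_mean = len(readings) > 0 and sum(readings) / len(readings) > 126
print(has_high_mean)  # False (no ZeroDivisionError)

# 'or' returns the first truthy operand, a common idiom for default values
gene_symbol = "" or "UNKNOWN"
print(gene_symbol)  # UNKNOWN

# Comparisons can be chained: same as 60 <= heart_rate and heart_rate <= 100
heart_rate = 75
print(60 <= heart_rate <= 100)  # True
```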

In [None]:
# String operators
seq1 = "ATCG"
seq2 = "GCTA"

# Concatenation
combined = seq1 + seq2
print(f"Combined sequence: {combined}")

# Repetition
repeated = seq1 * 3
print(f"Repeated sequence: {repeated}")

# Membership
print(f"'AT' in seq1: {'AT' in seq1}")
print(f"'GG' in seq1: {'GG' in seq1}")

---
<a id='section4'></a>
## 4. Control Flow: Conditionals

Conditionals allow you to execute different code based on conditions.

In [None]:
# Simple if statement
fasting_glucose = 150  # mg/dL

if fasting_glucose >= 126:  # ADA diagnostic threshold for fasting plasma glucose
    print("High fasting blood glucose - potential diabetes")
    print("Recommend further testing")

In [None]:
# if-elif-else statement
heart_rate = 75  # bpm

if heart_rate < 60:
    print("Bradycardia - heart rate too low")
elif heart_rate > 100:
    print("Tachycardia - heart rate too high")
else:
    print("Normal heart rate")
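When an `if`/`else` only chooses between two values, a conditional expression fits on one line. A small sketch using the 100.4°F fever threshold from the loop examples; the temperature value is illustrative:

```python
temperature_f = 101.2

# Conditional expression: value_if_true if condition else value_if_false
status = "Fever" if temperature_f > 100.4 else "No fever"
print(status)  # Fever
```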

In [None]:
# Nested conditionals
bmi = 28.5
has_comorbidity = True

if bmi < 18.5:
    status = "Underweight"
elif bmi < 25:
    status = "Normal weight"
elif bmi < 30:
    status = "Overweight"
    if has_comorbidity:
        print("Warning: Overweight with comorbidities")
else:
    status = "Obese"

print(f"BMI Status: {status}")

---
<a id='section5'></a>
## 5. Control Flow: Loops

Loops allow you to repeat code multiple times.

In [None]:
# For loop - iterating over a sequence
nucleotides = ['A', 'T', 'C', 'G']

print("DNA Nucleotides:")
for nucleotide in nucleotides:
    print(f"- {nucleotide}")
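Two built-ins make `for` loops more expressive: `enumerate()` supplies a counter, and `zip()` iterates over several sequences in parallel. A sketch pairing each nucleotide with its name:

```python
nucleotides = ['A', 'T', 'C', 'G']
names = ['Adenine', 'Thymine', 'Cytosine', 'Guanine']

# enumerate() yields (index, item) pairs; start=1 gives 1-based numbering
for number, base in enumerate(nucleotides, start=1):
    print(f"{number}. {base}")

# zip() walks several sequences in parallel and stops at the shortest one
for base, name in zip(nucleotides, names):
    print(f"{base} = {name}")

# zip() also makes it easy to build a lookup dictionary
base_names = dict(zip(nucleotides, names))
print(base_names['G'])  # Guanine
```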

In [None]:
# For loop with range
print("Patient IDs:")
for i in range(1, 6):
    patient_id = f"P{i:04d}"  # Format with leading zeros
    print(patient_id)
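`range(start, stop, step)` excludes `stop`, which is why `range(1, 6)` produces 1 through 5. A step of 3 is a natural way to walk a DNA sequence codon by codon; this sketch uses a made-up 13-base sequence with one leftover base:

```python
sequence = "ATGGCCATTGTAA"  # 13 bases: 4 complete codons plus 1 leftover base

print(list(range(1, 6)))  # [1, 2, 3, 4, 5]: stop is excluded

# Stopping at len(sequence) - 2 ensures every slice is a full 3-base codon
codons = []
for start in range(0, len(sequence) - 2, 3):
    codons.append(sequence[start:start + 3])
print(codons)  # ['ATG', 'GCC', 'ATT', 'GTA']
```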

In [None]:
# While loop
dna_sequence = "ATCGATCG"
position = 0

print("Finding 'G' nucleotides:")
while position < len(dna_sequence):
    if dna_sequence[position] == 'G':
        print(f"Found 'G' at position {position}")
    position += 1
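A `while` loop also suits searches that jump ahead by a variable amount. Here `str.find()` returns the next match position (or -1), which lets the loop count motif occurrences that overlap, something `str.count()` does not do. `count_overlapping` is a hypothetical helper name:

```python
def count_overlapping(sequence, motif):
    """Count occurrences of motif in sequence, including overlapping ones."""
    count = 0
    position = sequence.find(motif)  # -1 means "not found"
    while position != -1:
        count += 1
        position = sequence.find(motif, position + 1)  # resume one base later
    return count

repeat = "AAAA"
print(repeat.count("AA"))               # 2: str.count skips overlaps
print(count_overlapping(repeat, "AA"))  # 3: matches at positions 0, 1 and 2
```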

In [None]:
# Loop control: break and continue
temperatures = [98.6, 97.8, 99.5, 101.2, 98.9]

print("Checking temperatures (stop at fever):")
for i, temp in enumerate(temperatures, start=1):
    if temp < 98:
        print(f"Reading {i}: Low temperature, skipping...")
        continue  # skip the rest of this iteration and move to the next reading
    if temp > 100.4:  # Fever threshold
        print(f"Fever detected at reading {i}: {temp}°F")
        break  # stop the loop entirely; later readings are not checked
    print(f"Reading {i}: Normal - {temp}°F")
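Python loops can also take an `else` clause, which runs only if the loop finished without hitting `break`. It is a compact way to report "nothing found". A sketch with illustrative readings that contain no fever:

```python
temperatures = [98.6, 99.1, 98.9]
fever_reading = None

for i, temp in enumerate(temperatures, start=1):
    if temp > 100.4:
        fever_reading = i
        print(f"Fever detected at reading {i}: {temp}°F")
        break
else:
    # Runs only when the loop finishes without executing 'break'
    print("No fever detected in any reading")
```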

---
<a id='section6'></a>
## 6. Functions

Functions are reusable blocks of code that perform specific tasks.

In [None]:
# Simple function
def calculate_bmi(weight_kg, height_m):
    """Calculate Body Mass Index."""
    bmi = weight_kg / (height_m ** 2)
    return bmi

# Using the function
weight = 70  # kg
height = 1.75  # meters
result = calculate_bmi(weight, height)
print(f"BMI: {result:.2f}")
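Arguments can also be passed by name, and a function can reject invalid input by raising an exception instead of returning a meaningless number. A sketch of a validating variant of `calculate_bmi` (the validation rule is an added assumption, not part of the original function):

```python
def calculate_bmi(weight_kg, height_m):
    """Calculate Body Mass Index (kg/m^2), rejecting non-positive inputs."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight_kg and height_m must both be positive")
    return weight_kg / height_m ** 2

# Keyword arguments can be given in any order and document intent at the call site
print(f"BMI: {calculate_bmi(height_m=1.75, weight_kg=70):.2f}")  # BMI: 22.86

try:
    calculate_bmi(70, 0)
except ValueError as err:
    print(f"Invalid input: {err}")
```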

In [None]:
# Function with default parameters
def analyze_sequence(sequence, nucleotide='A'):
    """Count occurrences of a nucleotide in a DNA sequence."""
    count = sequence.count(nucleotide)
    percentage = (count / len(sequence)) * 100
    return count, percentage

dna = "ATCGATCGATCG"
count, pct = analyze_sequence(dna)
print(f"Adenine (A): {count} occurrences ({pct:.1f}%)")

count, pct = analyze_sequence(dna, 'G')
print(f"Guanine (G): {count} occurrences ({pct:.1f}%)")
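Default values are evaluated once, when the function is defined. This is harmless for strings like `'A'`, but a mutable default such as a list is shared between calls. A sketch of the pitfall and the usual `None` idiom; both function names are hypothetical:

```python
def add_gene_buggy(gene, gene_list=[]):
    # The default list is created ONCE, when the function is defined
    gene_list.append(gene)
    return gene_list

print(add_gene_buggy("TP53"))  # ['TP53']
print(add_gene_buggy("KRAS"))  # ['TP53', 'KRAS'] <- the same list was reused

def add_gene(gene, gene_list=None):
    # Use None as the default and create a fresh list on each call
    if gene_list is None:
        gene_list = []
    gene_list.append(gene)
    return gene_list

print(add_gene("TP53"))  # ['TP53']
print(add_gene("KRAS"))  # ['KRAS']
```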

In [None]:
# Function with multiple return values
def calculate_gc_content(sequence):
    """Calculate GC content of a DNA sequence."""
    sequence = sequence.upper()
    g_count = sequence.count('G')
    c_count = sequence.count('C')
    total = len(sequence)
    gc_content = ((g_count + c_count) / total) * 100
    
    return g_count, c_count, gc_content

dna = "ATCGATCGGGCC"
g, c, gc_pct = calculate_gc_content(dna)
print(f"G: {g}, C: {c}, GC Content: {gc_pct:.2f}%")
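Scope determines which variable a name refers to inside a function. Reading a module-level name works, but assigning to that name inside a function creates a separate local variable. A minimal sketch (the names and the 126 mg/dL threshold are illustrative):

```python
threshold = 126  # global (module-level) name

def is_high_glucose(value):
    # 'threshold' is not assigned here, so Python looks it up in the global scope
    return value >= threshold

def shadow_threshold():
    # Assigning inside a function creates a NEW local name; the global is unchanged
    threshold = 100
    return threshold

print(is_high_glucose(150))  # True
print(shadow_threshold())    # 100
print(threshold)             # 126
```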

In [None]:
# Lambda functions (anonymous functions)
complement = lambda nucleotide: {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}[nucleotide]

print(f"Complement of A: {complement('A')}")
print(f"Complement of G: {complement('G')}")

# Using lambda with map
sequence = "ATCG"
complement_seq = ''.join(map(complement, sequence))
print(f"Sequence: {sequence}")
print(f"Complement: {complement_seq}")
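For character-by-character replacement, the built-in `str.maketrans()`/`str.translate()` pair is a common alternative to `map` with a lambda. Lambdas tend to be most useful as short `key` functions, as in this sketch that sorts made-up sequences by G+C count:

```python
# str.maketrans() builds a character mapping that str.translate() applies in one pass
COMPLEMENT_TABLE = str.maketrans("ATCG", "TAGC")
print("ATCG".translate(COMPLEMENT_TABLE))  # TAGC

# Lambdas are most natural as short 'key' functions, e.g. sorting by G+C count
sequences = ["ATAT", "GCGC", "ATGC"]
by_gc = sorted(sequences, key=lambda s: s.count("G") + s.count("C"))
print(by_gc)  # ['ATAT', 'ATGC', 'GCGC']
```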

---
<a id='section7'></a>
## 7. Data Structures

Python provides several built-in data structures for organizing data.

### 7.1 Lists
Lists are ordered, mutable collections.

In [None]:
# Creating and manipulating lists
genes = ['BRCA1', 'TP53', 'EGFR', 'KRAS']

print(f"Genes: {genes}")
print(f"First gene: {genes[0]}")
print(f"Last gene: {genes[-1]}")

# Adding elements
genes.append('MYC')
print(f"After append: {genes}")

# Removing elements
genes.remove('EGFR')
print(f"After remove: {genes}")

# Slicing
print(f"First two genes: {genes[:2]}")
print(f"Last two genes: {genes[-2:]}")
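Because lists are mutable, assigning a list to a second name does not copy it: both names refer to the same object. This sketch shows the difference between an alias and a copy, and between in-place `sort()` and `sorted()`:

```python
genes = ['BRCA1', 'TP53', 'KRAS']
alias = genes               # a second name for the SAME list object
independent = genes.copy()  # a new list with the same elements (shallow copy)

genes.append('MYC')
print(alias)        # ['BRCA1', 'TP53', 'KRAS', 'MYC']
print(independent)  # ['BRCA1', 'TP53', 'KRAS']

# sort() reorders in place and returns None; sorted() returns a new list
genes.sort()
print(genes)                              # ['BRCA1', 'KRAS', 'MYC', 'TP53']
print(sorted(independent, reverse=True))  # ['TP53', 'KRAS', 'BRCA1']
```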

In [None]:
# List comprehension
temperatures = [98.6, 99.5, 100.2, 97.8, 101.1]

# Convert to Celsius
celsius = [(temp - 32) * 5/9 for temp in temperatures]
print(f"Fahrenheit: {temperatures}")
print(f"Celsius: {[f'{c:.1f}' for c in celsius]}")

# Filter fevers (>100.4°F)
fevers = [temp for temp in temperatures if temp > 100.4]
print(f"Fever readings: {fevers}")
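Where the `if` goes changes what a comprehension does: after the `for` it filters items, while a conditional expression before the `for` transforms every item. A sketch labeling the same temperatures, with the equivalent explicit loop for comparison:

```python
temperatures = [98.6, 99.5, 100.2, 97.8, 101.1]

# 'x if cond else y' BEFORE the for clause transforms each item (nothing is dropped)
labels = ["Fever" if temp > 100.4 else "Normal" for temp in temperatures]
print(labels)  # ['Normal', 'Normal', 'Normal', 'Normal', 'Fever']

# The equivalent explicit loop
labels_loop = []
for temp in temperatures:
    labels_loop.append("Fever" if temp > 100.4 else "Normal")
```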

### 7.2 Dictionaries
Dictionaries store key-value pairs.

In [None]:
# Patient record as a dictionary
patient = {
    'id': 'P12345',
    'name': 'John Doe',
    'age': 45,
    'diagnoses': ['Hypertension', 'Type 2 Diabetes'],
    'blood_type': 'O+'
}

print(f"Patient ID: {patient['id']}")
print(f"Age: {patient['age']}")
print(f"Diagnoses: {', '.join(patient['diagnoses'])}")

# Adding new key-value pair
patient['last_visit'] = '2024-01-15'
print("\nUpdated patient record:")
for key, value in patient.items():
    print(f"  {key}: {value}")
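Dictionaries are a natural fit for counting. `dict.get(key, default)` avoids a `KeyError` the first time a key appears, and the standard-library `collections.Counter` packages the same pattern. A sketch on a made-up sequence:

```python
from collections import Counter

sequence = "ATCGATCGAA"

# dict.get(key, default) returns the default when the key is not present yet
counts = {}
for base in sequence:
    counts[base] = counts.get(base, 0) + 1
print(counts)  # {'A': 4, 'T': 2, 'C': 2, 'G': 2}

# collections.Counter does the same counting in one call
base_counts = Counter(sequence)
print(base_counts.most_common(1))  # [('A', 4)]
```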

In [None]:
# Genetic code dictionary
genetic_code = {
    'AUG': 'Methionine',
    'UUU': 'Phenylalanine',
    'UUC': 'Phenylalanine',
    'UUA': 'Leucine',
    'UAA': 'STOP',
    'UAG': 'STOP',
    'UGA': 'STOP'
}

codon = 'AUG'
amino_acid = genetic_code.get(codon, 'Unknown')
print(f"Codon {codon} codes for: {amino_acid}")

# Check if codon is a stop codon
if genetic_code.get('UAA') == 'STOP':
    print("UAA is a stop codon")
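A dictionary maps keys to values in one direction. To look up in the other direction (every codon for a given amino acid), you can group entries with `collections.defaultdict`. A sketch reusing the small codon table above:

```python
from collections import defaultdict

genetic_code = {
    'AUG': 'Methionine', 'UUU': 'Phenylalanine', 'UUC': 'Phenylalanine',
    'UUA': 'Leucine', 'UAA': 'STOP', 'UAG': 'STOP', 'UGA': 'STOP',
}

# defaultdict(list) creates an empty list the first time a key is accessed
codons_by_amino_acid = defaultdict(list)
for codon, amino_acid in genetic_code.items():
    codons_by_amino_acid[amino_acid].append(codon)

print(codons_by_amino_acid['STOP'])           # ['UAA', 'UAG', 'UGA']
print(codons_by_amino_acid['Phenylalanine'])  # ['UUU', 'UUC']
```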

### 7.3 Sets
Sets are unordered collections of unique elements.

In [None]:
# Gene sets for pathway analysis
pathway_a = {'BRCA1', 'TP53', 'EGFR', 'MYC'}
pathway_b = {'TP53', 'KRAS', 'MYC', 'PTEN'}

print(f"Pathway A: {pathway_a}")
print(f"Pathway B: {pathway_b}")

# Set operations
common = pathway_a & pathway_b  # Intersection
print(f"\nCommon genes: {common}")

unique_to_a = pathway_a - pathway_b  # Difference
print(f"Unique to Pathway A: {unique_to_a}")

all_genes = pathway_a | pathway_b  # Union
print(f"All genes: {all_genes}")
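Set operations combine into simple overlap measures. The sketch below adds the symmetric difference and computes the Jaccard similarity, |A ∩ B| / |A ∪ B|, one common way to score how much two gene sets overlap; it also shows using `set()` to deduplicate a made-up list of hits:

```python
pathway_a = {'BRCA1', 'TP53', 'EGFR', 'MYC'}
pathway_b = {'TP53', 'KRAS', 'MYC', 'PTEN'}

# Symmetric difference: genes in exactly one of the two pathways
print(sorted(pathway_a ^ pathway_b))  # ['BRCA1', 'EGFR', 'KRAS', 'PTEN']

# Jaccard similarity = |A & B| / |A | B|, a 0-to-1 overlap score
jaccard = len(pathway_a & pathway_b) / len(pathway_a | pathway_b)
print(f"Jaccard similarity: {jaccard:.2f}")  # Jaccard similarity: 0.33

# Converting a list to a set removes duplicates (order is not preserved)
hits = ['TP53', 'KRAS', 'TP53', 'MYC', 'KRAS']
print(sorted(set(hits)))  # ['KRAS', 'MYC', 'TP53']
```

Sets are printed in arbitrary order, so `sorted()` is used here to make the output reproducible.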

### 7.4 Tuples
Tuples are ordered, immutable collections.

In [None]:
# Genomic coordinates (chromosome, start, end)
region = ('chr1', 1000000, 1005000)

chromosome, start, end = region  # Tuple unpacking
print(f"Chromosome: {chromosome}")
print(f"Start: {start:,}")
print(f"End: {end:,}")
print(f"Length: {end - start:,} bp")

# Multiple genomic regions
regions = [
    ('chr1', 1000000, 1005000),
    ('chr2', 2000000, 2010000),
    ('chrX', 5000000, 5008000)
]

print("\nGenomic Regions:")
for chrom, start, end in regions:
    print(f"  {chrom}: {start:,} - {end:,}")
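`collections.namedtuple` gives tuple fields readable names while keeping tuples' immutability. The sketch below defines a hypothetical `Region` type and an interval-overlap check; it assumes half-open `[start, end)` coordinates (the convention used by BED files), which the original regions do not specify.

```python
from collections import namedtuple

# Hypothetical record type; fields are also accessible by position (region[0])
Region = namedtuple("Region", ["chrom", "start", "end"])

def overlaps(a, b):
    """True if two half-open intervals [start, end) on the same chromosome overlap."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

r1 = Region("chr1", 1000000, 1005000)
r2 = Region("chr1", 1004000, 1010000)
r3 = Region("chr2", 2000000, 2010000)

print(r1.chrom, r1.end - r1.start)  # chr1 5000
print(overlaps(r1, r2))             # True
print(overlaps(r1, r3))             # False

# Like plain tuples, namedtuples are immutable
try:
    r1.start = 0
except AttributeError:
    print("Tuples (including namedtuples) are immutable")
```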

---
<a id='section8'></a>
## 8. File I/O

Reading and writing files is essential for working with biomedical data.

In [None]:
# Writing to a file
dna_sequence = "ATCGATCGATCG"

with open('sequence.txt', 'w') as file:
    file.write(f"DNA Sequence: {dna_sequence}\n")
    file.write(f"Length: {len(dna_sequence)} bp\n")

print("File 'sequence.txt' created successfully!")

In [None]:
# Reading from a file
with open('sequence.txt', 'r') as file:
    content = file.read()
    print("File contents:")
    print(content)
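`file.read()` loads the whole file into memory. For large data files it is usually better to iterate over the file object, which yields one line at a time. This sketch also shows append mode; the file name `notes.txt` and its contents are hypothetical:

```python
# 'w' creates or overwrites; 'a' appends to the end of an existing file
with open('notes.txt', 'w') as file:
    file.write("Sample: S001\n")
with open('notes.txt', 'a') as file:
    file.write("Status: sequenced\n")

# Iterating over a file object reads one line at a time
lines = []
with open('notes.txt') as file:
    for line_number, line in enumerate(file, start=1):
        line = line.rstrip("\n")  # drop the trailing newline
        lines.append(line)
        print(f"{line_number}: {line}")
```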

In [None]:
# Writing FASTA format
fasta_data = [
    ('>Gene1', 'ATCGATCGATCG'),
    ('>Gene2', 'GCTAGCTAGCTA'),
    ('>Gene3', 'TTAATTAATTAA')
]

with open('sequences.fasta', 'w') as file:
    for header, sequence in fasta_data:
        file.write(f"{header}\n")
        file.write(f"{sequence}\n")

print("FASTA file created!")

In [None]:
# Reading FASTA format
sequences = {}

with open('sequences.fasta', 'r') as file:
    current_header = None
    for line in file:
        line = line.strip()
        if line.startswith('>'):
            current_header = line[1:]  # Remove '>'
            sequences[current_header] = ''
        elif current_header is not None:  # skip any lines before the first header
            sequences[current_header] += line

print("Parsed sequences:")
for header, seq in sequences.items():
    print(f"{header}: {seq}")
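FASTA files often wrap long sequences across several lines (60 or 70 bases per line is common), so a reader must join continuation lines back together. A sketch of a hypothetical `write_fasta`/`read_fasta` pair that round-trips a wrapped record; the 60-base width is an assumption:

```python
def write_fasta(path, records, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping sequences at `width` bases."""
    with open(path, 'w') as handle:
        for header, sequence in records:
            handle.write(f">{header}\n")
            for i in range(0, len(sequence), width):
                handle.write(sequence[i:i + width] + "\n")

def read_fasta(path):
    """Return {header: sequence}, joining sequence lines that were wrapped."""
    chunks_by_header = {}
    header = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith('>'):
                header = line[1:]
                chunks_by_header[header] = []
            elif header is not None and line:
                chunks_by_header[header].append(line)
    # Join each record's chunks once instead of using repeated string +=
    return {h: ''.join(chunks) for h, chunks in chunks_by_header.items()}

long_seq = "ATCG" * 40  # 160 bases -> 3 lines at width 60
write_fasta('wrapped.fasta', [('Gene1', long_seq), ('Gene2', 'GCTA')])
parsed = read_fasta('wrapped.fasta')
print(len(parsed['Gene1']), parsed['Gene2'])  # 160 GCTA
```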

---
<a id='section9'></a>
## 9. Biomedical Applications

Let's apply what we've learned to real biomedical informatics problems.

### 9.1 DNA Sequence Analysis

In [None]:
def reverse_complement(sequence):
    """Generate the reverse complement of a DNA sequence."""
    complement_map = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    complement = ''.join([complement_map[base] for base in sequence])
    return complement[::-1]

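def translate_dna(sequence):
    """Translate DNA to protein, reading from the first base and stopping at the first stop codon."""
    # Convert DNA to RNA (uppercase first so lowercase input is handled)
    rna = sequence.upper().replace('T', 'U')
    
    # Standard genetic code (all 64 codons); '*' marks a stop codon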
    codon_table = {
        'AUG': 'M', 'UUU': 'F', 'UUC': 'F', 'UUA': 'L', 'UUG': 'L',
        'UCU': 'S', 'UCC': 'S', 'UCA': 'S', 'UCG': 'S',
        'UAU': 'Y', 'UAC': 'Y', 'UAA': '*', 'UAG': '*', 'UGA': '*',
        'UGU': 'C', 'UGC': 'C', 'UGG': 'W',
        'CUU': 'L', 'CUC': 'L', 'CUA': 'L', 'CUG': 'L',
        'CCU': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
        'CAU': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
        'CGU': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
        'AUU': 'I', 'AUC': 'I', 'AUA': 'I',
        'ACU': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',
        'AAU': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',
        'AGU': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
        'GUU': 'V', 'GUC': 'V', 'GUA': 'V', 'GUG': 'V',
        'GCU': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
        'GAU': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
        'GGU': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'
    }
    
    protein = ''
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i+3]
        amino_acid = codon_table.get(codon, 'X')
        if amino_acid == '*':  # Stop codon
            break
        protein += amino_acid
    
    return protein

# Test the functions
dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"

print(f"Original DNA: {dna}")
print(f"Reverse Complement: {reverse_complement(dna)}")
print(f"Protein: {translate_dna(dna)}")
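`translate_dna` reads only one frame of one strand. A sequence can encode protein in any of six reading frames (three offsets on each strand), so a natural next step is six-frame translation. The sketch below works on DNA directly (T, not U) and builds the standard codon table from a compact 64-character string in TCAG order; the frame labels `+1`...`-3` and the `'X'` for unrecognized codons are conventions chosen here:

```python
from itertools import product

# Standard genetic code packed into one string, in TCAG x TCAG x TCAG codon order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {''.join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate_frame(dna, offset):
    """Translate every complete codon starting at `offset`, keeping stops as '*'."""
    return ''.join(CODON_TABLE.get(dna[i:i + 3], 'X')
                   for i in range(offset, len(dna) - 2, 3))

def six_frame_translation(dna):
    """Translate the three forward and three reverse-complement reading frames."""
    dna = dna.upper()
    rev_comp = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(dna, offset)
        frames[f"-{offset + 1}"] = translate_frame(rev_comp, offset)
    return frames

dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
for frame, protein in six_frame_translation(dna).items():
    print(f"{frame}: {protein}")
```

Frame `+1` up to its first `*` matches what `translate_dna` returns for the same sequence.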

### 9.2 Clinical Data Processing

In [None]:
# Patient vital signs monitoring
def categorize_blood_pressure(systolic, diastolic):
    """Categorize blood pressure according to the 2017 ACC/AHA guideline.

    Each category requires BOTH readings to be below its cutoffs, so whichever
    reading falls in the more severe category determines the result.
    """
    if systolic < 120 and diastolic < 80:
        return "Normal"
    elif systolic < 130 and diastolic < 80:
        return "Elevated"
    elif systolic < 140 and diastolic < 90:
        return "Hypertension Stage 1"
    elif systolic < 180 and diastolic < 120:
        return "Hypertension Stage 2"
    else:
        return "Hypertensive Crisis"

# Sample patient data
patients = [
    {'id': 'P001', 'systolic': 118, 'diastolic': 78},
    {'id': 'P002', 'systolic': 128, 'diastolic': 79},
    {'id': 'P003', 'systolic': 145, 'diastolic': 92},
    {'id': 'P004', 'systolic': 185, 'diastolic': 125}
]

print("Blood Pressure Analysis:")
print("-" * 60)
for patient in patients:
    category = categorize_blood_pressure(patient['systolic'], patient['diastolic'])
    print(f"Patient {patient['id']}: {patient['systolic']}/{patient['diastolic']} mmHg - {category}")
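Clinical data often records blood pressure as a single string such as `"118/78"`. This sketch parses that format and computes mean arterial pressure using the common bedside approximation MAP ≈ DBP + (SBP − DBP) / 3. Both helper names are hypothetical, and the formula is an estimate, not a measured MAP.

```python
def parse_bp(reading):
    """Split a reading such as '118/78' into (systolic, diastolic) integers."""
    systolic_text, diastolic_text = reading.strip().split('/')
    return int(systolic_text), int(diastolic_text)

def mean_arterial_pressure(systolic, diastolic):
    """Approximate MAP as diastolic + one third of the pulse pressure."""
    return diastolic + (systolic - diastolic) / 3

for reading in ["118/78", "145/92"]:
    sbp, dbp = parse_bp(reading)
    print(f"{reading} mmHg -> MAP ≈ {mean_arterial_pressure(sbp, dbp):.1f} mmHg")
```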

### 9.3 Variant Analysis

In [None]:
def parse_variant(variant_string):
    """Parse a variant written as 'chromosome:position:ref>alt'.
    
    Example: 'chr17:41276045:C>T'
    This is a simple colon-delimited notation, not full HGVS nomenclature.
    """
    parts = variant_string.split(':')
    chromosome = parts[0]
    position = int(parts[1])
    ref, alt = parts[2].split('>')
    
    return {
        'chromosome': chromosome,
        'position': position,
        'reference': ref,
        'alternate': alt
    }

def classify_variant_type(ref, alt):
    """Classify a variant as SNV, insertion, deletion, or complex by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    elif len(ref) < len(alt):
        return "Insertion"
    elif len(ref) > len(alt):
        return "Deletion"
    else:
        return "Complex"

# Example variants
variants = [
    'chr17:41276045:C>T',  # BRCA1 SNV
    'chr13:32936732:G>A',  # BRCA2 SNV
    'chr7:55249071:G>GT',  # EGFR insertion
]

print("Variant Analysis:")
print("-" * 70)
for var_string in variants:
    var = parse_variant(var_string)
    var_type = classify_variant_type(var['reference'], var['alternate'])
    
    print(f"Variant: {var_string}")
    print(f"  Type: {var_type}")
    print(f"  Location: {var['chromosome']}:{var['position']}")
    print(f"  Change: {var['reference']} → {var['alternate']}")
    print()
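SNVs can be further split into transitions (purine to purine, A↔G, or pyrimidine to pyrimidine, C↔T) and transversions (purine ↔ pyrimidine); the transition/transversion (Ts/Tv) ratio is a common sanity check on variant calls. A sketch with a hypothetical `substitution_class` helper: C>T and G>A come from the example list above, while G>T is a made-up transversion.

```python
PURINES = {'A', 'G'}
PYRIMIDINES = {'C', 'T'}

def substitution_class(ref, alt):
    """Label a single-base change as a transition or transversion."""
    if len(ref) != 1 or len(alt) != 1 or ref.upper() == alt.upper():
        raise ValueError("expected a single-base substitution")
    alleles = {ref.upper(), alt.upper()}
    # Transitions stay within purines (A<->G) or within pyrimidines (C<->T)
    if alleles <= PURINES or alleles <= PYRIMIDINES:
        return "Transition"
    return "Transversion"

snvs = [('C', 'T'), ('G', 'A'), ('G', 'T')]
classes = [substitution_class(ref, alt) for ref, alt in snvs]
print(classes)  # ['Transition', 'Transition', 'Transversion']

# Ts/Tv ratio: number of transitions divided by number of transversions
ts_tv = classes.count("Transition") / classes.count("Transversion")
print(f"Ts/Tv = {ts_tv:.1f}")  # Ts/Tv = 2.0
```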

---
<a id='section10'></a>
## 10. Practice Exercises

Try these exercises to reinforce your learning!

### Exercise 1: GC Content Calculator
Write a function that calculates the GC content of a DNA sequence and classifies it as Low (<40%), Medium (40-60%), or High (>60%).

In [None]:
# Your code here
def analyze_gc_content(sequence):
    """
    Calculate GC content and classify it.
    
    Args:
        sequence: DNA sequence string
    
    Returns:
        tuple: (gc_percentage, classification)
    """
    # TODO: Implement this function
    pass

# Test your function
# test_sequence = "ATCGATCGGGCCATCG"
# gc_pct, category = analyze_gc_content(test_sequence)
# print(f"GC Content: {gc_pct:.2f}% - {category}")

### Exercise 2: Patient Age Calculator
Write a function that takes a birth date and calculates the patient's current age in years.

In [None]:
# Your code here
from datetime import datetime

def calculate_age(birth_date_string):
    """
    Calculate age from birth date.
    
    Args:
        birth_date_string: Date in format 'YYYY-MM-DD'
    
    Returns:
        int: Age in years
    """
    # TODO: Implement this function
    pass

# Test your function
# age = calculate_age('1980-05-15')
# print(f"Patient age: {age} years")

### Exercise 3: Codon Usage Analysis
Write a function that counts the frequency of each codon in a DNA sequence.

In [None]:
# Your code here
def count_codons(sequence):
    """
    Count frequency of codons in a DNA sequence.
    
    Args:
        sequence: DNA sequence string (length should be multiple of 3)
    
    Returns:
        dict: Dictionary with codon counts
    """
    # TODO: Implement this function
    pass

# Test your function
# test_seq = "ATGATGATGAAACCCGGGTTT"
# codon_counts = count_codons(test_seq)
# for codon, count in codon_counts.items():
#     print(f"{codon}: {count}")

### Exercise 4: BMI Category Tracker
Write a program that tracks BMI measurements over time and identifies trends.

In [None]:
# Your code here
def analyze_bmi_trend(bmi_measurements):
    """
    Analyze BMI trend over time.
    
    Args:
        bmi_measurements: List of BMI values in chronological order
    
    Returns:
        str: Trend description ('Increasing', 'Decreasing', 'Stable')
    """
    # TODO: Implement this function
    pass

# Test your function
# measurements = [28.5, 27.8, 27.2, 26.9, 26.5]
# trend = analyze_bmi_trend(measurements)
# print(f"BMI Trend: {trend}")

---
## Summary

Congratulations! You've learned:

✅ Python fundamentals: variables, data types, operators  
✅ Control flow: conditionals and loops  
✅ Functions and code organization  
✅ Data structures: lists, dictionaries, sets, tuples  
✅ File I/O operations  
✅ Biomedical applications of Python

### Next Steps

1. Practice with the exercises above
2. Explore Python libraries for bioinformatics:
   - **Biopython**: Sequence analysis, file parsing
   - **Pandas**: Data manipulation and analysis
   - **NumPy**: Numerical computing
   - **Matplotlib/Seaborn**: Data visualization
3. Work on real biomedical datasets
4. Build your own bioinformatics tools

### Resources

- Python Documentation: https://docs.python.org/
- Biopython Tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html
- Real Python: https://realpython.com/
- Rosalind (Bioinformatics problems): http://rosalind.info/

---
## Questions?

Feel free to reach out if you have any questions about this material!

**Dr. Pratik Dutta**  
Department of Biomedical Informatics  
Stony Brook University