# Lecture 3: Functions - Writing Reusable Code
## Python for Biology
**Learning Objectives:**
- Understand what functions are and why they're useful
- Write simple functions with and without parameters
- Return values from functions
- Use docstrings to document functions
- Refactor messy 'spaghetti code' into clean, organized functions

---

## Section 1: What Are Functions?

### Guided Example 1.1: The Problem - Repeated Code

Imagine you need to calculate GC content multiple times:

In [None]:
# Calculate GC content for sequence 1
seq1 = 'ATCGATCGGCGC'
gc_count1 = seq1.count('G') + seq1.count('C')
gc_percent1 = (gc_count1 / len(seq1)) * 100
print(f"Sequence 1 GC%: {gc_percent1:.1f}%")

# Calculate GC content for sequence 2
seq2 = 'AAATTTGGGCCC'
gc_count2 = seq2.count('G') + seq2.count('C')
gc_percent2 = (gc_count2 / len(seq2)) * 100
print(f"Sequence 2 GC%: {gc_percent2:.1f}%")

# Calculate GC content for sequence 3
seq3 = 'GCGCGCGC'
gc_count3 = seq3.count('G') + seq3.count('C')
gc_percent3 = (gc_count3 / len(seq3)) * 100
print(f"Sequence 3 GC%: {gc_percent3:.1f}%")

**The problem:**
- We're writing the same code 3 times!
- If we make a mistake, we have to fix it in 3 places
- If we want to change how we calculate GC%, we have to update 3 places
- This is hard to read and maintain

**Functions solve this problem!**

### Guided Example 1.2: The Solution - Using a Function

Let's write a function to do this calculation once, then use it many times:

In [None]:
# Define a function to calculate GC content
def calculate_gc_content(sequence):
    """Calculate the GC content percentage of a DNA sequence."""
    gc_count = sequence.count('G') + sequence.count('C')
    gc_percent = (gc_count / len(sequence)) * 100
    return gc_percent

# Now use it for all three sequences
seq1 = 'ATCGATCGGCGC'
seq2 = 'AAATTTGGGCCC'
seq3 = 'GCGCGCGC'

print(f"Sequence 1 GC%: {calculate_gc_content(seq1):.1f}%")
print(f"Sequence 2 GC%: {calculate_gc_content(seq2):.1f}%")
print(f"Sequence 3 GC%: {calculate_gc_content(seq3):.1f}%")

**Much better!**
- The calculation code is written **once**
- We **reuse** it by calling `calculate_gc_content()`
- Easier to read, test, and maintain
- If we need to fix something, we fix it in **one place**
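
To make the "fix it in one place" point concrete, here is a minimal sketch of a hardened variant. The name `calculate_gc_content_safe` and the choice to return `0.0` for an empty sequence are assumptions made for illustration; they are not part of the function defined above.

```python
def calculate_gc_content_safe(sequence):
    """Calculate GC% while tolerating lowercase input and empty sequences."""
    sequence = sequence.upper()      # treat 'atcg' the same as 'ATCG'
    if len(sequence) == 0:           # avoid ZeroDivisionError on ''
        return 0.0                   # assumed convention for empty input
    gc_count = sequence.count('G') + sequence.count('C')
    return (gc_count / len(sequence)) * 100

# One fix inside the function now covers every call
for seq in ['ATCGATCGGCGC', 'aaatttgggccc', '']:
    print(f"{seq!r}: {calculate_gc_content_safe(seq):.1f}%")
```

Because the calculation lives in a single function, both changes were made once rather than in three copies of the same code.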

---

## Section 2: Function Anatomy

Let's break down the parts of a function:

In [None]:
def calculate_gc_content(sequence):
    """Calculate the GC content percentage of a DNA sequence."""
    gc_count = sequence.count('G') + sequence.count('C')
    gc_percent = (gc_count / len(sequence)) * 100
    return gc_percent

**Parts of a function:**
1. **`def`**: Keyword that starts a function definition
2. **`calculate_gc_content`**: Function name (use descriptive names!)
3. **`(sequence)`**: Parameter - the input the function needs
4. **`:`**: Colon starts the function body
5. **Docstring** (in triple quotes): Describes what the function does
6. **Function body**: The code that does the work (indented!)
7. **`return`**: Sends a value back to whoever called the function

**Calling a function:**
```python
result = calculate_gc_content('ATCG')  # 'ATCG' is the argument
```

---

## Section 3: Simple Functions

### Guided Example 3.1: Function with no parameters

Some functions don't need any input - they just do something every time:

In [None]:
def greet_scientist():
    """Print a greeting message."""
    print("Welcome to the Biology Lab!")
    print("Ready to analyze some genes?")

# Call the function (note the empty parentheses)
greet_scientist()

**Key points:**
- No parameters needed: `def greet_scientist():`
- Still need parentheses when calling: `greet_scientist()`
- No `return` statement - it just prints and finishes

### Practice Example 3.1: Write a simple greeting function

Write a function called `print_dna_message()` that prints:
- "DNA is the blueprint of life!"
- "It contains the instructions for building proteins."

In [None]:
# YOUR CODE HERE: Define the function


# Test your function:
# print_dna_message()

### Guided Example 3.2: Function with one parameter

Functions become powerful when they can work with different inputs:

In [None]:
def complement_base(base):
    """Return the DNA complement of a single base."""
    if base == 'A':
        return 'T'
    elif base == 'T':
        return 'A'
    elif base == 'G':
        return 'C'
    elif base == 'C':
        return 'G'
    else:
        return 'N'  # Unknown

# Test with different inputs
print(complement_base('A'))  # Should print: T
print(complement_base('G'))  # Should print: C
print(complement_base('T'))  # Should print: A

**Key points:**
- `base` is a parameter - a placeholder for the input
- When we call `complement_base('A')`, `'A'` is the argument
- The function can return different values based on the input
- Same function, different results!
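
The `else` branch means any unrecognized input, including a lowercase letter, comes back as `'N'`. The sketch below shows that behavior and one possible way to accept lowercase input. The wrapper name `complement_base_any_case` is hypothetical.

```python
def complement_base(base):
    """Return the DNA complement of a single base ('N' if unrecognized)."""
    if base == 'A':
        return 'T'
    elif base == 'T':
        return 'A'
    elif base == 'G':
        return 'C'
    elif base == 'C':
        return 'G'
    else:
        return 'N'  # Unknown


def complement_base_any_case(base):
    """Hypothetical wrapper: uppercase the input before looking it up."""
    return complement_base(base.upper())


print(complement_base('a'))           # N (lowercase 'a' is not matched)
print(complement_base_any_case('a'))  # T
```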

### Practice Example 3.2: Write a function with one parameter

Write a function called `calculate_protein_mass()` that:
- Takes one parameter: `num_amino_acids` (a number)
- Multiplies it by 110 (the approximate average mass of an amino acid residue, in Daltons)
- Returns the estimated protein mass

In [None]:
# YOUR CODE HERE: Define the function


# Test your function:
# print(calculate_protein_mass(100))  # Should print: 11000
# print(calculate_protein_mass(250))  # Should print: 27500

### Guided Example 3.3: Function with multiple parameters

Functions can take multiple inputs:

In [None]:
def calculate_concentration(mass, volume):
    """Calculate concentration in mg/mL.
    
    Parameters:
        mass: mass in mg
        volume: volume in mL
    
    Returns:
        concentration in mg/mL
    """
    concentration = mass / volume
    return concentration

# Test with different values
print(f"Concentration: {calculate_concentration(50, 10):.1f} mg/mL")  # 5.0
print(f"Concentration: {calculate_concentration(100, 25):.1f} mg/mL")  # 4.0

**Key points:**
- Multiple parameters separated by commas: `def func(param1, param2):`
- When calling, provide positional arguments in the same order as the parameters
- Notice the detailed docstring - it explains each parameter!
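
A short sketch of why argument order matters. Swapping the two positional arguments silently gives a different answer, while naming the arguments makes the order irrelevant. The values are illustrative only.

```python
def calculate_concentration(mass, volume):
    """Calculate concentration in mg/mL from mass (mg) and volume (mL)."""
    return mass / volume


# Positional arguments are matched to parameters by position
print(calculate_concentration(50, 10))   # 5.0  (mass=50, volume=10)
print(calculate_concentration(10, 50))   # 0.2  (mass=10, volume=50)

# Named (keyword) arguments are matched by name, so order does not matter
print(calculate_concentration(volume=10, mass=50))  # 5.0
```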

### Practice Example 3.3: Function with two parameters

Write a function called `calculate_dilution()` that:
- Takes two parameters: `initial_conc` and `dilution_factor`
- Calculates final concentration: `initial_conc / dilution_factor`
- Returns the final concentration

In [None]:
# YOUR CODE HERE: Define the function


# Test your function:
# print(calculate_dilution(100, 10))  # Should print: 10.0
# print(calculate_dilution(50, 5))    # Should print: 10.0

---

## Section 4: Understanding Return Values

### Guided Example 4.1: Printing vs Returning

**Important:** `print` and `return` are NOT the same!

In [None]:
# This function PRINTS but doesn't return anything
def count_genes_print(genes):
    """Count genes and print the result."""
    count = len(genes)
    print(f"Number of genes: {count}")
    # No return statement!

# This function RETURNS the count
def count_genes_return(genes):
    """Count genes and return the result."""
    count = len(genes)
    return count

# Test both
genes = ['BRCA1', 'TP53', 'MYC']

result1 = count_genes_print(genes)  # Prints, but result1 is None
print(f"Result 1: {result1}")  # None

result2 = count_genes_return(genes)  # Returns a value
print(f"Result 2: {result2}")  # 3

# We can do math with returned values!
double = result2 * 2
print(f"Double: {double}")  # 6

**Key differences:**
- **`print`**: Shows something on the screen, but the function still returns `None`, so the caller cannot store or reuse the value
- **`return`**: Sends a value back that you can store and use
- **Rule of thumb**: Use `return` to hand a result back to the calling code; use `print` only to display something to the user

**When to use each:**
- Use `return` when you want to use the value later (calculations, comparisons, etc.)
- Use `print` when you just want to show information to the user
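
As a small illustration of using a returned value later, the sketch below feeds the result of one call into a subtraction and a comparison. The two gene lists are made-up example data.

```python
def count_genes_return(genes):
    """Count genes and return the result."""
    return len(genes)


# Hypothetical gene lists for two samples
control = ['BRCA1', 'TP53', 'MYC']
treated = ['BRCA1', 'TP53', 'MYC', 'EGFR', 'KRAS']

# Returned values can be combined in calculations...
difference = count_genes_return(treated) - count_genes_return(control)

# ...and used in comparisons
if count_genes_return(treated) > count_genes_return(control):
    print(f"Treated sample has {difference} more genes")
```

None of this would work with a print-only version, because each call would evaluate to `None`.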

### Practice Example 4.1: Return vs Print

Write TWO versions of a function that counts 'A' nucleotides:
1. `count_a_print(sequence)` - prints the count
2. `count_a_return(sequence)` - returns the count

Then test both and see the difference!

In [None]:
# YOUR CODE HERE: Write both functions


# Test them:
# seq = 'ATCGATCGATCG'
# result1 = count_a_print(seq)
# print(f"Stored result 1: {result1}")  # Should be None

# result2 = count_a_return(seq)
# print(f"Stored result 2: {result2}")  # Should be the count
# print(f"Double: {result2 * 2}")  # Can do math with it!

---

## Section 5: Docstrings - Documenting Your Functions

### Guided Example 5.1: Why docstrings matter

Docstrings help you and others understand what a function does:

In [None]:
# Bad: No documentation
def calc(s):
    return (s.count('G') + s.count('C')) / len(s) * 100

# Good: Clear docstring
def calculate_gc_content(sequence):
    """Calculate the GC content percentage of a DNA sequence.
    
    Parameters:
        sequence (str): A DNA sequence containing only A, T, G, C
    
    Returns:
        float: GC content as a percentage (0-100)
    
    Example:
        >>> calculate_gc_content('ATCG')
        50.0
    """
    gc_count = sequence.count('G') + sequence.count('C')
    return (gc_count / len(sequence)) * 100

# You can view the docstring with help()
help(calculate_gc_content)

**Good docstrings include:**
1. **Brief description**: What does the function do?
2. **Parameters**: What inputs does it need? What type?
3. **Returns**: What does it give back? What type?
4. **Example** (optional): Show how to use it

**Format:**
- Use triple quotes: `"""..."""`
- Put it right after the `def` line
- First line should be a brief summary
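
Besides `help()`, Python stores the docstring on the function as `__doc__`, and the standard-library `doctest` module can run a `>>>` example written inside it. The following is a minimal sketch of both, using a shortened version of the docstring above.

```python
import doctest


def calculate_gc_content(sequence):
    """Calculate the GC content percentage of a DNA sequence.

    Example:
        >>> calculate_gc_content('ATCG')
        50.0
    """
    gc_count = sequence.count('G') + sequence.count('C')
    return (gc_count / len(sequence)) * 100


# The docstring is stored on the function object itself
print(calculate_gc_content.__doc__)

# Run the >>> example from the docstring; nothing is printed if it passes
doctest.run_docstring_examples(
    calculate_gc_content,
    {'calculate_gc_content': calculate_gc_content},
)
```

Keeping a runnable example in the docstring means the documentation can be checked against the code instead of drifting out of date.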

### Practice Example 5.1: Write a function with a good docstring

Write a function called `reverse_complement()` that:
- Takes a DNA sequence
- Returns the reverse complement
- **Include a detailed docstring!**

Hints for the function:
- First get the complement
- Then reverse it with `[::-1]`

In [None]:
# YOUR CODE HERE: Write the function with a good docstring


# Test it:
# print(reverse_complement('ATCG'))  # Should be: CGAT
# help(reverse_complement)  # Should show your docstring

---

## Section 6: Refactoring Spaghetti Code

"Spaghetti code" is messy, hard-to-read code that does everything in one big block. Functions help organize it!

### Guided Example 6.1: Messy Code → Clean Functions

**Before: Spaghetti code**

In [None]:
# Analyze three DNA sequences - MESSY VERSION
seq1 = 'ATCGATCG'
seq2 = 'GCGCGCGC'
seq3 = 'AAAATTTT'

# Sequence 1 analysis
gc1 = (seq1.count('G') + seq1.count('C')) / len(seq1) * 100
at1 = (seq1.count('A') + seq1.count('T')) / len(seq1) * 100
print(f"Seq1 - Length: {len(seq1)}, GC%: {gc1:.1f}, AT%: {at1:.1f}")

# Sequence 2 analysis
gc2 = (seq2.count('G') + seq2.count('C')) / len(seq2) * 100
at2 = (seq2.count('A') + seq2.count('T')) / len(seq2) * 100
print(f"Seq2 - Length: {len(seq2)}, GC%: {gc2:.1f}, AT%: {at2:.1f}")

# Sequence 3 analysis
gc3 = (seq3.count('G') + seq3.count('C')) / len(seq3) * 100
at3 = (seq3.count('A') + seq3.count('T')) / len(seq3) * 100
print(f"Seq3 - Length: {len(seq3)}, GC%: {gc3:.1f}, AT%: {at3:.1f}")

**After: Clean version with functions**

In [None]:
def calculate_gc_content(sequence):
    """Calculate GC content percentage."""
    gc_count = sequence.count('G') + sequence.count('C')
    return (gc_count / len(sequence)) * 100

def calculate_at_content(sequence):
    """Calculate AT content percentage."""
    at_count = sequence.count('A') + sequence.count('T')
    return (at_count / len(sequence)) * 100

def analyze_sequence(sequence, name):
    """Analyze and print statistics for a DNA sequence."""
    length = len(sequence)
    gc = calculate_gc_content(sequence)
    at = calculate_at_content(sequence)
    print(f"{name} - Length: {length}, GC%: {gc:.1f}, AT%: {at:.1f}")

# Now the main code is clean and readable!
sequences = [
    ('ATCGATCG', 'Seq1'),
    ('GCGCGCGC', 'Seq2'),
    ('AAAATTTT', 'Seq3')
]

for seq, name in sequences:
    analyze_sequence(seq, name)

**Benefits of the clean version:**
- ✅ No repeated code
- ✅ Easy to understand what each function does
- ✅ Easy to test each function separately
- ✅ Easy to add new sequences
- ✅ If we fix a bug, we fix it once
- ✅ Can reuse functions in other projects!
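
"Easy to test each function separately" can be made concrete with a few `assert` statements. This is only a sketch of quick sanity checks on the sequences used above, not a full test suite.

```python
def calculate_gc_content(sequence):
    """Calculate GC content percentage."""
    gc_count = sequence.count('G') + sequence.count('C')
    return (gc_count / len(sequence)) * 100


def calculate_at_content(sequence):
    """Calculate AT content percentage."""
    at_count = sequence.count('A') + sequence.count('T')
    return (at_count / len(sequence)) * 100


# Each assert stops with an AssertionError if the check fails
assert calculate_gc_content('GCGCGCGC') == 100.0
assert calculate_at_content('AAAATTTT') == 100.0
assert calculate_gc_content('ATCGATCG') == 50.0

# For a sequence made only of A, T, G, C the two percentages sum to 100
assert calculate_gc_content('ATCGATCG') + calculate_at_content('ATCGATCG') == 100.0

print("All checks passed")
```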

### Practice Example 6.1: Refactor This Spaghetti Code!

**Your turn!** This code calculates protein properties, but it's messy. Refactor it using functions!

**Messy code:**

In [None]:
# Calculate properties for three proteins - MESSY!
protein1 = "MKLAVTGAGA"
protein2 = "AGCFHILMNP"
protein3 = "QRSTVWY"

# Protein 1
mass1 = len(protein1) * 110
charge1 = protein1.count('K') + protein1.count('R') - protein1.count('D') - protein1.count('E')
hydrophobic1 = protein1.count('A') + protein1.count('V') + protein1.count('L') + protein1.count('I')
print(f"Protein1: Mass={mass1}, Charge={charge1}, Hydrophobic={hydrophobic1}")

# Protein 2
mass2 = len(protein2) * 110
charge2 = protein2.count('K') + protein2.count('R') - protein2.count('D') - protein2.count('E')
hydrophobic2 = protein2.count('A') + protein2.count('V') + protein2.count('L') + protein2.count('I')
print(f"Protein2: Mass={mass2}, Charge={charge2}, Hydrophobic={hydrophobic2}")

# Protein 3
mass3 = len(protein3) * 110
charge3 = protein3.count('K') + protein3.count('R') - protein3.count('D') - protein3.count('E')
hydrophobic3 = protein3.count('A') + protein3.count('V') + protein3.count('L') + protein3.count('I')
print(f"Protein3: Mass={mass3}, Charge={charge3}, Hydrophobic={hydrophobic3}")

**YOUR TASK:**

Refactor the code above by creating these functions:
1. `calculate_mass(protein)` - returns estimated mass
2. `calculate_charge(protein)` - returns net charge
3. `count_hydrophobic(protein)` - returns count of hydrophobic amino acids
4. `analyze_protein(protein, name)` - uses the above functions and prints results

Then use a loop to analyze all three proteins cleanly!

In [None]:
# YOUR CODE HERE: Write the functions





# YOUR CODE HERE: Use a loop to analyze all proteins


### Practice Example 6.2: Bigger Refactoring Challenge

This code processes experimental data, but it's a mess! Refactor it with functions.

**Messy code:**

In [None]:
# Process three samples - VERY MESSY!
sample1_od = [0.1, 0.15, 0.22, 0.35, 0.5]
sample2_od = [0.08, 0.12, 0.18, 0.28, 0.42]
sample3_od = [0.12, 0.18, 0.26, 0.40, 0.58]

# Sample 1
avg1 = sum(sample1_od) / len(sample1_od)
growth1 = sample1_od[-1] - sample1_od[0]
doubled1 = all(sample1_od[i+1] / sample1_od[i] >= 1.3 for i in range(len(sample1_od)-1))
print(f"Sample1: Avg OD={avg1:.3f}, Growth={growth1:.2f}, Doubling={'Yes' if doubled1 else 'No'}")

# Sample 2
avg2 = sum(sample2_od) / len(sample2_od)
growth2 = sample2_od[-1] - sample2_od[0]
doubled2 = all(sample2_od[i+1] / sample2_od[i] >= 1.3 for i in range(len(sample2_od)-1))
print(f"Sample2: Avg OD={avg2:.3f}, Growth={growth2:.2f}, Doubling={'Yes' if doubled2 else 'No'}")

# Sample 3
avg3 = sum(sample3_od) / len(sample3_od)
growth3 = sample3_od[-1] - sample3_od[0]
doubled3 = all(sample3_od[i+1] / sample3_od[i] >= 1.3 for i in range(len(sample3_od)-1))
print(f"Sample3: Avg OD={avg3:.3f}, Growth={growth3:.2f}, Doubling={'Yes' if doubled3 else 'No'}")

**YOUR TASK:**

Create these functions:
1. `calculate_average(measurements)` - returns average OD
2. `calculate_growth(measurements)` - returns total growth (last - first)
3. `is_doubling(measurements)` - returns True if every measurement is at least 1.3x the previous one (the "Doubling" check in the messy code)
4. `analyze_sample(measurements, name)` - prints analysis using above functions

**Bonus:** Add docstrings to all functions!

In [None]:
# YOUR CODE HERE: Write the functions with docstrings





# YOUR CODE HERE: Analyze all samples


---

## Section 7: Default Parameters

### Guided Example 7.1: Functions with default values

Sometimes parameters have a common value. You can set defaults!

In [None]:
def calculate_dilution(initial_conc, dilution_factor=10):
    """Calculate diluted concentration.
    
    Parameters:
        initial_conc: Starting concentration
        dilution_factor: How much to dilute (default: 10)
    
    Returns:
        Final concentration after dilution
    """
    return initial_conc / dilution_factor

# Use the default (10x dilution)
print(calculate_dilution(100))  # 10.0

# Override the default
print(calculate_dilution(100, 5))  # 20.0
print(calculate_dilution(100, dilution_factor=20))  # 5.0 (named argument)

**Key points:**
- Default parameter: `dilution_factor=10`
- If you don't provide it, it uses 10
- You can override by providing a different value
- Can use named arguments for clarity: `dilution_factor=20`
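
One more rule worth knowing: in a `def` line, parameters with defaults must come after parameters without them. The sketch below repeats the function above, calls it with named arguments in both orders, and shows an invalid definition as a comment. The name `broken` is hypothetical.

```python
def calculate_dilution(initial_conc, dilution_factor=10):
    """Calculate diluted concentration (default: 10x dilution)."""
    return initial_conc / dilution_factor


# Named arguments can be given in any order
print(calculate_dilution(initial_conc=100))                     # 10.0
print(calculate_dilution(dilution_factor=4, initial_conc=100))  # 25.0

# This definition would be a SyntaxError, because a parameter without
# a default follows one that has a default:
# def broken(dilution_factor=10, initial_conc):
#     return initial_conc / dilution_factor
```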

### Practice Example 7.1: Function with defaults

Write a function `calculate_pcr_yield()` that:
- Takes `initial_copies` (required)
- Takes `cycles` (default: 30)
- Assumes each cycle doubles the DNA: `initial_copies * (2 ** cycles)`
- Returns the final number of copies

In [None]:
# YOUR CODE HERE: Write the function


# Test it:
# print(calculate_pcr_yield(100))  # Should use 30 cycles
# print(calculate_pcr_yield(100, 20))  # Should use 20 cycles

---

## Section 8: Challenge Problems

### Challenge 1: DNA Translation Function (Medium)

Write a function that translates DNA to amino acids using this simple codon table:
- ATG → M (Start)
- TGG → W
- TAA, TAG, TGA → * (Stop)
- Everything else → X (Unknown)

The function should:
- Take a DNA sequence
- Split it into codons (groups of 3)
- Translate each codon
- Return the amino acid sequence as a string
- Include a good docstring!

In [None]:
# YOUR CODE HERE


# Test it:
# print(translate_dna('ATGTGGTAG'))  # Should print: MW*

### Challenge 2: Quality Control Suite (Hard)

Create a complete quality control suite for DNA sequences. Write these functions:

1. `is_valid_dna(sequence)` - returns True if the sequence contains only A, T, G, C
2. `has_ambiguous_bases(sequence)` - returns True if the sequence contains N or any other ambiguous base
3. `check_length(sequence, min_length=50, max_length=1000)` - returns True if length is in range
4. `gc_in_range(sequence, min_gc=40, max_gc=60)` - returns True if GC% is in range
5. `quality_control(sequence)` - uses ALL above functions and prints a report

**Bonus:** Add detailed docstrings to all functions!

In [None]:
# YOUR CODE HERE: Write all 5 functions





# Test with different sequences:
# quality_control('ATCGATCG')  # Too short
# quality_control('ATCGNTCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG')  # Has N
# quality_control('ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG')  # Should pass all

### Challenge 3: Refactor This Nightmare! (Hard)

This code calculates enzyme kinetics for multiple enzymes, but it's terrible! Refactor it completely.

**The nightmare:**

In [None]:
# Enzyme kinetics analysis - NIGHTMARE CODE!
e1_name = "Amylase"
e1_vmax = 100
e1_km = 5.0
e1_substrate = 10.0
e1_velocity = (e1_vmax * e1_substrate) / (e1_km + e1_substrate)
e1_efficiency = e1_vmax / e1_km
e1_saturation = (e1_substrate / (e1_km + e1_substrate)) * 100  # percent of Vmax reached
print(f"{e1_name}: V={e1_velocity:.2f}, Efficiency={e1_efficiency:.2f}, Saturation={e1_saturation:.1f}%")

e2_name = "Lipase"
e2_vmax = 150
e2_km = 8.0
e2_substrate = 20.0
e2_velocity = (e2_vmax * e2_substrate) / (e2_km + e2_substrate)
e2_efficiency = e2_vmax / e2_km
e2_saturation = (e2_substrate / (e2_km + e2_substrate)) * 100  # percent of Vmax reached
print(f"{e2_name}: V={e2_velocity:.2f}, Efficiency={e2_efficiency:.2f}, Saturation={e2_saturation:.1f}%")

e3_name = "Protease"
e3_vmax = 80
e3_km = 3.0
e3_substrate = 15.0
e3_velocity = (e3_vmax * e3_substrate) / (e3_km + e3_substrate)
e3_efficiency = e3_vmax / e3_km
e3_saturation = (e3_substrate / (e3_km + e3_substrate)) * 100  # percent of Vmax reached
print(f"{e3_name}: V={e3_velocity:.2f}, Efficiency={e3_efficiency:.2f}, Saturation={e3_saturation:.1f}%")

**YOUR TASK:**

Refactor this into clean functions. Suggested functions:
1. `michaelis_menten(vmax, km, substrate)` - calculates velocity
2. `calculate_efficiency(vmax, km)` - calculates catalytic efficiency
3. `calculate_saturation(substrate, km)` - calculates % saturation
4. `analyze_enzyme(name, vmax, km, substrate)` - complete analysis

Then store enzyme data in a list of tuples and loop through them!

**Bonus:** Add detailed docstrings with the actual formulas!

In [None]:
# YOUR CODE HERE: Refactor completely!





# YOUR CODE HERE: Create clean loop to analyze all enzymes


---

## Summary

Congratulations! You've learned how to write clean, reusable code with functions!

**Key concepts:**
- ✅ Functions eliminate repeated code
- ✅ Use `def` to define functions
- ✅ Parameters are the names in the function definition; arguments are the actual values passed in a call
- ✅ `return` sends values back (different from `print`!)
- ✅ Docstrings document what functions do
- ✅ Refactoring turns spaghetti code into clean, organized functions
- ✅ Default parameters make functions more flexible

**Good function habits:**
- 📝 Use descriptive names: `calculate_gc_content` not `calc`
- 📝 One function = one clear purpose
- 📝 Always include docstrings
- 📝 Return values instead of printing (usually)
- 📝 Keep functions short and focused

**Next steps:**
- Practice refactoring your own messy code
- Write functions for common tasks you do often
- Build a library of reusable biological functions!

Remember: **Good programmers are lazy** - they write functions once and reuse them everywhere! 🚀