# Lecture 2, Notebook 4: Debugging Biological Code

**Learning Objectives:**
- Identify and fix common syntax errors in biological code
- Debug type errors when working with biological data
- Resolve name errors and undefined variables
- Fix logic errors in biological calculations
- Use debugging strategies to solve problems systematically
- Apply debugging skills to lists, loops, conditionals, and dictionaries

## 🐛 Debug Challenge 1: DNA Base Counter (Syntax Error)

**Biology Context:** Count the frequency of each DNA base in a sequence.

**The Problem:** This code has a syntax error. Can you spot and fix it?

In [None]:
# Fix the syntax error in this DNA base counter
dna = "ATGGCATCGATC"
base_count = {'A': 0, 'T': 0, 'G': 0, 'C': 0}
for base in dna
    base_count[base] += 1
print(base_count)

## 🐛 Debug Challenge 2: Plant Height Measurements (Type Error)

**Biology Context:** Calculate average plant height from mixed data types.

**The Problem:** One of the measurements is stored as a string, and Python cannot add a string to a number.

In [None]:
# Fix the type error in this plant height calculator
plant_heights = [15.2, 12.8, "14.5", 16.1, 13.7]
total_height = 0
for height in plant_heights:
    total_height += height
average = total_height / len(plant_heights)
print(f"Average height: {average} cm")

## 🐛 Debug Challenge 3: Enzyme Activity Calculator (Name Error)

**Biology Context:** Calculate enzyme activity rate from substrate concentration.

**The Problem:** This code references an undefined variable.

In [None]:
# Fix the name error in this enzyme calculator
substrate_conc = 0.5  # mM
max_velocity = 10.0   # μmol/min
activity = (max_velocity * substrate_conc) / (km + substrate_conc)
print(f"Enzyme activity: {activity} μmol/min")

## 🐛 Debug Challenge 4: Gene Expression Analysis (Index Error)

**Biology Context:** Analyze gene expression levels across different conditions.

**The Problem:** This code tries to access a list element that doesn't exist.

In [None]:
# Fix the index error in this gene expression analyzer
gene_expression = [2.1, 3.5, 1.8, 4.2]
conditions = ["control", "heat_stress", "drought", "cold"]
for i in range(len(conditions) + 1):
    print(f"{conditions[i]}: {gene_expression[i]} fold change")

## 🐛 Debug Challenge 5: Cell Division Counter (Logic Error)

**Biology Context:** Count bacterial cell divisions over time.

**The Problem:** The code runs without raising an error, but the result is wrong: bacteria divide exponentially, not linearly!

In [None]:
# Fix the logic error in this cell division counter
initial_cells = 100
divisions = 3
# Each division doubles the cell count
final_cells = initial_cells + (divisions * 2)
print(f"After {divisions} divisions: {final_cells} cells")

## 🐛 Debug Challenge 6: Protein Molecular Weight (Dictionary Error)

**Biology Context:** Calculate protein molecular weight from amino acid composition.

**The Problem:** This code tries to access a dictionary key that doesn't exist.

In [None]:
# Fix the dictionary error in this molecular weight calculator
amino_weights = {'A': 89, 'R': 174, 'N': 132, 'D': 133}
protein_sequence = "ARNDX"  # X is an unknown amino acid
total_weight = 0
for amino_acid in protein_sequence:
    total_weight += amino_weights[amino_acid]
print(f"Protein weight: {total_weight} Da")

## 🐛 Debug Challenge 7: Phylogenetic Distance (Division by Zero)

**Biology Context:** Calculate genetic distance between species.

**The Problem:** This code divides by zero when sequences are identical.

In [None]:
# Fix the division by zero error in this distance calculator
species_a = "ATGC"
species_b = "ATGC"  # Same sequence!
differences = sum(a != b for a, b in zip(species_a, species_b))
distance = differences / differences  # Should be differences/total_positions
print(f"Genetic distance: {distance}")

## 🐛 Debug Challenge 8: Population Growth Model (Indentation Error)

**Biology Context:** Model exponential population growth in ecology.

**The Problem:** Python indentation is incorrect.

In [None]:
# Fix the indentation error in this population model
population = 50
growth_rate = 0.1
years = 5
for year in range(years):
population *= (1 + growth_rate)
    print(f"Year {year + 1}: {population:.0f} individuals")

## 🐛 Debug Challenge 9: Codon Translation (Loop Logic Error)

**Biology Context:** Translate DNA codons to amino acids.

**The Problem:** The loop doesn't process codons correctly.

In [None]:
# Fix the loop logic error in this codon translator
genetic_code = {'ATG': 'M', 'TGG': 'W', 'TTT': 'F', 'TAA': '*'}
dna_sequence = "ATGTGGTTTTAA"
protein = ""
for i in range(len(dna_sequence)):
    codon = dna_sequence[i:i+3]
    if codon in genetic_code:
        protein += genetic_code[codon]
print(f"Protein: {protein}")

## 🐛 Debug Challenge 10: Biodiversity Index (Conditional Logic Error)

**Biology Context:** Calculate Simpson's diversity index for species abundance.

**The Problem:** The conditional logic is backwards!

In [None]:
# Fix the conditional logic error in this diversity calculator
species_counts = [15, 8, 12, 5, 20]
total_organisms = sum(species_counts)
diversity_sum = 0
for count in species_counts:
    if count == 0:  # Only count species that are absent
        proportion = count / total_organisms
        diversity_sum += proportion ** 2
simpson_index = 1 - diversity_sum
print(f"Simpson's diversity index: {simpson_index:.3f}")

## 💡 Debugging Tips Summary

**When debugging biological code:**

1. **Read error messages carefully** - they tell you the line number and error type
2. **Check your syntax** - missing colons, parentheses, or quotes are common
3. **Verify data types** - make sure you're not mixing strings and numbers
4. **Watch for typos** - variable and function names must be spelled exactly right
5. **Test with simple data** - use small examples to verify your logic
6. **Print intermediate values** - see what your variables contain at each step
7. **Think about edge cases** - what happens with empty lists or zero values?

**Remember:** Every bioinformatics expert has spent countless hours debugging code. It's a normal part of programming!
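
Tips 5, 6, and 7 can be combined in a few lines. The sketch below is only an illustration: `gc_content` is a hypothetical helper that does not appear in the challenges. It prints an intermediate value and is checked against inputs small enough to verify by hand.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence (hypothetical helper)."""
    if len(sequence) == 0:  # Edge case (tip 7): avoid dividing by zero
        return 0.0
    gc_count = 0
    for base in sequence.upper():
        if base in "GC":
            gc_count += 1
    # Tip 6: print an intermediate value to see what the loop produced
    print(f"{sequence!r}: gc_count={gc_count}, length={len(sequence)}")
    return gc_count / len(sequence)

# Tip 5: small inputs whose answers are easy to work out by hand
assert gc_content("GGCC") == 1.0
assert gc_content("ATAT") == 0.0
assert gc_content("ATGC") == 0.5
assert gc_content("") == 0.0
```

If an `assert` fails, Python raises an `AssertionError` that points at the failing line, which narrows down where to look.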

## 🏆 Debugging Challenge Solutions

Try to solve the challenges above before looking at these solutions!

In [None]:
# Solution 1: Add missing colon after for loop
dna = "ATGGCATCGATC"
base_count = {'A': 0, 'T': 0, 'G': 0, 'C': 0}
for base in dna:  # Missing colon was here
    base_count[base] += 1
print(base_count)
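
Once the syntax error is fixed, the same count can also be obtained with `collections.Counter` from the standard library. This is an alternative sketch, not a replacement for the loop above. One difference worth noting: a character outside A, T, G, C (such as `N`) would raise a `KeyError` in the dictionary version, while `Counter` simply counts it.

```python
from collections import Counter

dna = "ATGGCATCGATC"
base_count = Counter(dna)  # Counts every distinct character in the string
print(dict(base_count))
```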

In [None]:
# Solution 2: Convert string to float
plant_heights = [15.2, 12.8, "14.5", 16.1, 13.7]
total_height = 0
for height in plant_heights:
    total_height += float(height)  # Convert to float
average = total_height / len(plant_heights)
print(f"Average height: {average} cm")
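
`float()` only works on strings that look like numbers. The sketch below assumes the data might also contain a placeholder such as `"N/A"` (added here purely for illustration) and skips entries that cannot be converted, so the average is taken over the valid measurements only.

```python
# "N/A" is a made-up entry to show what happens with non-numeric text
plant_heights = [15.2, 12.8, "14.5", "N/A", 16.1, 13.7]

valid_heights = []
for height in plant_heights:
    try:
        valid_heights.append(float(height))
    except ValueError:
        # float("N/A") raises ValueError, so report the entry and move on
        print(f"Skipping non-numeric entry: {height!r}")

average = sum(valid_heights) / len(valid_heights)
print(f"Average height: {average:.2f} cm")
```

Whether to skip bad entries or stop with an error depends on the experiment; skipping silently would hide data-entry problems, which is why the sketch prints each skipped value.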

In [None]:
# Solution 3: Define missing variable km
substrate_conc = 0.5  # mM
max_velocity = 10.0   # μmol/min
km = 1.0  # Missing Michaelis constant
activity = (max_velocity * substrate_conc) / (km + substrate_conc)
print(f"Enzyme activity: {activity} μmol/min")

In [None]:
# Solution 4: Fix range to not go beyond list length
gene_expression = [2.1, 3.5, 1.8, 4.2]
conditions = ["control", "heat_stress", "drought", "cold"]
for i in range(len(conditions)):  # Remove + 1
    print(f"{conditions[i]}: {gene_expression[i]} fold change")
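
An index error like this one can often be avoided altogether by not using indices. As a sketch of that alternative, `zip` pairs the two lists element by element:

```python
gene_expression = [2.1, 3.5, 1.8, 4.2]
conditions = ["control", "heat_stress", "drought", "cold"]

# zip yields (condition, level) pairs, so no index can run past the end
pairs = list(zip(conditions, gene_expression))
for condition, level in pairs:
    print(f"{condition}: {level} fold change")
```

Note that `zip` stops at the shorter list without complaint, so a missing measurement would go unnoticed. In Python 3.10 and later, `zip(conditions, gene_expression, strict=True)` raises a `ValueError` when the lengths differ.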

In [None]:
# Solution 5: Use exponential growth (multiply, not add)
initial_cells = 100
divisions = 3
final_cells = initial_cells * (2 ** divisions)  # 2 ** 3 = 8, so 100 cells become 800
print(f"After {divisions} divisions: {final_cells} cells")
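
Logic errors do not produce error messages, so it helps to check the formula a second way. The sketch below doubles the count one division at a time and compares the result with the closed-form expression used in the solution.

```python
initial_cells = 100
divisions = 3

cells = initial_cells
for division in range(1, divisions + 1):
    cells *= 2  # Each division doubles the population
    print(f"After division {division}: {cells} cells")

# The step-by-step result should match the formula from Solution 5
assert cells == initial_cells * (2 ** divisions)
```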

In [None]:
# Solution 6: Use .get() for safe dictionary access
amino_weights = {'A': 89, 'R': 174, 'N': 132, 'D': 133}
protein_sequence = "ARNDX"
total_weight = 0
for amino_acid in protein_sequence:
    weight = amino_weights.get(amino_acid, 110)  # Use average weight for unknown
    total_weight += weight
print(f"Protein weight: {total_weight} Da")
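
Substituting a default weight keeps the code running, but it also hides the fact that the sequence contained an unknown residue. An alternative sketch, assuming you would rather report unknowns than guess their weight, records each one with its position:

```python
amino_weights = {'A': 89, 'R': 174, 'N': 132, 'D': 133}
protein_sequence = "ARNDX"

total_weight = 0
unknown = []  # (position, symbol) for residues missing from the table
for position, amino_acid in enumerate(protein_sequence, start=1):
    if amino_acid in amino_weights:
        total_weight += amino_weights[amino_acid]
    else:
        unknown.append((position, amino_acid))

print(f"Weight of known residues: {total_weight} Da")
if unknown:
    print(f"Unknown residues (not included): {unknown}")
```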

In [None]:
# Solution 7: Divide by the number of positions; the guard covers empty sequences
species_a = "ATGC"
species_b = "ATGC"
differences = sum(a != b for a, b in zip(species_a, species_b))
total_positions = len(species_a)
distance = differences / total_positions if total_positions > 0 else 0
print(f"Genetic distance: {distance}")
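
One more edge case is worth checking here: `zip` silently ignores the extra bases when the sequences have different lengths. The sketch below wraps the calculation in a function (the name `p_distance` is an illustrative choice) and, as an assumption about the desired behavior, rejects unequal lengths instead of comparing only the overlapping part.

```python
def p_distance(seq_a, seq_b):
    """Proportion of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must have the same length")
    if len(seq_a) == 0:  # Nothing to compare, so avoid dividing by zero
        return 0.0
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)

print(f"Identical sequences: {p_distance('ATGC', 'ATGC')}")
print(f"One mismatch in four: {p_distance('ATGC', 'ATGA')}")
```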

In [None]:
# Solution 8: Fix indentation
population = 50
growth_rate = 0.1
years = 5
for year in range(years):
    population *= (1 + growth_rate)  # Proper indentation
    print(f"Year {year + 1}: {population:.0f} individuals")

In [None]:
# Solution 9: Step by 3 to read codons properly
genetic_code = {'ATG': 'M', 'TGG': 'W', 'TTT': 'F', 'TAA': '*'}
dna_sequence = "ATGTGGTTTTAA"
protein = ""
for i in range(0, len(dna_sequence), 3):  # Step by 3
    codon = dna_sequence[i:i+3]
    if len(codon) == 3 and codon in genetic_code:
        protein += genetic_code[codon]
print(f"Protein: {protein}")
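
Solution 9 silently skips codons that are missing from `genetic_code` and keeps reading past the stop codon. The following sketch makes two assumptions that are not part of the original exercise: unknown codons are marked with `X`, and translation ends at the first stop codon. The function name `translate` is also an illustrative choice.

```python
genetic_code = {'ATG': 'M', 'TGG': 'W', 'TTT': 'F', 'TAA': '*'}

def translate(dna_sequence):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = ""
    # The range ends early enough to exclude an incomplete trailing codon
    for i in range(0, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i+3]
        amino_acid = genetic_code.get(codon, 'X')  # 'X' marks an unknown codon
        if amino_acid == '*':
            break  # Stop codon: end of the protein
        protein += amino_acid
    return protein

print(f"Protein: {translate('ATGTGGTTTTAA')}")
```

With the four-codon table above, most real sequences would be full of `X` characters; a complete table of 64 codons would be needed for actual use.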

In [None]:
# Solution 10: Fix conditional logic (we want species that ARE present)
species_counts = [15, 8, 12, 5, 20]
total_organisms = sum(species_counts)
diversity_sum = 0
for count in species_counts:
    if count > 0:  # Count species that ARE present
        proportion = count / total_organisms
        diversity_sum += proportion ** 2
simpson_index = 1 - diversity_sum
print(f"Simpson's diversity index: {simpson_index:.3f}")
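
The calculation is easier to test once it is wrapped in a function. In the sketch below (the function name and the error on an empty sample are assumptions), absent species need no special handling, because a count of zero contributes zero to the sum.

```python
def simpson_index(species_counts):
    """Return 1 minus the sum of squared proportions of each species."""
    total = sum(species_counts)
    if total == 0:
        raise ValueError("No organisms were counted")
    return 1 - sum((count / total) ** 2 for count in species_counts)

print(f"Simpson's diversity index: {simpson_index([15, 8, 12, 5, 20]):.3f}")

# Hand-checkable cases: one species only, and two equally common species
assert simpson_index([10, 0]) == 0.0
assert simpson_index([5, 5]) == 0.5
```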