# 🧬 Conditionals for Biological Decision Making

## Teaching Python to Make Choices

In biology, we constantly make decisions based on data:
- **Is this a start codon?** ATG = Yes, others = No
- **What type of mutation?** SNP, deletion, insertion
- **Is the sample contaminated?** Based on quality scores
- **Which treatment group?** Control vs. experimental
- **Pass quality control?** Based on multiple criteria

Python conditionals (`if`, `elif`, `else`) let your code make these decisions automatically!

## 🎯 Why Conditionals Matter in Biology

Conditional statements help you:
- **Filter data**: Keep only high-quality samples
- **Classify sequences**: Identify genes, regulatory regions, repeats
- **Make decisions**: Choose analysis parameters based on data
- **Handle errors**: Validate input and handle edge cases
- **Automate logic**: Replace manual decision-making with code

## 1️⃣ Basic if Statements - Single Decisions

The simplest conditional: do something only if a condition is True.

In [None]:
# Check if sequence has a start codon
dna_sequence = "ATGCGTAAATAG"

if "ATG" in dna_sequence:
    print("✓ Start codon found!")

if len(dna_sequence) > 10:
    print("✓ Sequence is long enough for analysis")

if dna_sequence.startswith("ATG"):
    print("✓ Sequence starts with ATG")

In [None]:
# Quality control for sequencing reads
read_length = 150
quality_score = 35
gc_content = 45.2

print("Quality Control Check:")
print("-" * 20)

if read_length >= 100:
    print("✓ Read length acceptable")

if quality_score > 30:
    print("✓ High quality score")

if 40 <= gc_content <= 60:
    print("✓ GC content in normal range")

## 2️⃣ if-else Statements - Binary Decisions

When you need to handle both True and False cases.

In [None]:
# Determine whether a codon is the start codon
codon = "ATG"

if codon == "ATG":
    codon_type = "Start codon"
    amino_acid = "Methionine (M)"
else:
    codon_type = "Not a start codon"
    amino_acid = "Unknown"

print(f"Codon: {codon}")
print(f"Type: {codon_type}")
print(f"Amino acid: {amino_acid}")

In [None]:
# Classify a temperature reading (°C)
temperature = 37.8

if temperature > 37.5:
    status = "FEVER - Check sample"
    urgent = True
else:
    status = "Normal temperature"
    urgent = False

print(f"Temperature: {temperature}°C")
print(f"Status: {status}")
print(f"Urgent: {urgent}")
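
When an `if`-`else` only chooses between two values for one variable, Python's conditional expression can express the same decision on a single line. The sketch below reuses the temperature threshold from the cell above; it is an optional shorthand, not a replacement for the full `if`-`else` block when several variables are set at once.

```python
temperature = 37.8

# value_if_true if condition else value_if_false
status = "FEVER - Check sample" if temperature > 37.5 else "Normal temperature"
urgent = temperature > 37.5  # a comparison already evaluates to True or False

print(f"Status: {status}")
print(f"Urgent: {urgent}")
```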

## 3️⃣ elif Chains - Multiple Choices

When you have more than two options to choose from.

In [None]:
# Classify stop codons
codon = "TAG"

if codon == "TAA":
    stop_type = "Ochre stop codon"
elif codon == "TAG":
    stop_type = "Amber stop codon"
elif codon == "TGA":
    stop_type = "Opal stop codon"
else:
    stop_type = "Not a stop codon"

print(f"Codon: {codon}")
print(f"Classification: {stop_type}")

In [None]:
# pH classification for biological systems
ph_value = 6.8

if ph_value < 6.5:
    ph_category = "Acidic"
    recommendation = "May affect enzyme activity"
elif ph_value <= 7.5:
    ph_category = "Physiological"
    recommendation = "Optimal for most biological processes"
elif ph_value <= 8.5:
    ph_category = "Slightly alkaline"
    recommendation = "Check buffer system"
else:
    ph_category = "Highly alkaline"
    recommendation = "Likely experimental error"

print(f"pH: {ph_value}")
print(f"Category: {ph_category}")
print(f"Recommendation: {recommendation}")
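
Python checks an `if`/`elif` chain from top to bottom and runs only the first branch whose condition is true. That is why `elif ph_value <= 7.5` does not need to repeat `ph_value >= 6.5`. The sketch below wraps the same thresholds in a hypothetical helper, `ph_category_for`, to show how values at the boundaries are handled.

```python
def ph_category_for(ph_value):
    """Return the pH category using the same thresholds as the cell above."""
    if ph_value < 6.5:
        return "Acidic"
    elif ph_value <= 7.5:
        # Reached only when ph_value >= 6.5, so no lower bound is needed here
        return "Physiological"
    elif ph_value <= 8.5:
        return "Slightly alkaline"
    else:
        return "Highly alkaline"

# Try values on and around each threshold
for value in [6.4, 6.5, 7.5, 7.6, 9.0]:
    print(f"pH {value}: {ph_category_for(value)}")
```

Reordering the branches would change the result, so put the most restrictive condition first.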

## 4️⃣ Comparison Operators

Essential tools for building conditions.

In [None]:
# All comparison operators
concentration = 25.0
target = 20.0

print("Comparison Operators:")
print(f"concentration = {concentration}")
print(f"target = {target}")
print()

print(f"Equal to (==):           {concentration == target}")
print(f"Not equal to (!=):       {concentration != target}")
print(f"Greater than (>):        {concentration > target}")
print(f"Greater than/equal (>=): {concentration >= target}")
print(f"Less than (<):           {concentration < target}")
print(f"Less than/equal (<=):    {concentration <= target}")
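
One caution with `==` on measured values: floating-point numbers are stored approximately, so two values that should be equal may differ in the last digits. A minimal sketch using the standard library's `math.isclose` follows; the example values and the tolerance of 0.5 are arbitrary assumptions for illustration.

```python
import math

measured = 0.1 + 0.2   # e.g. two volumes added together
expected = 0.3

print(measured == expected)              # exact comparison fails due to rounding
print(math.isclose(measured, expected))  # comparison within a small relative tolerance

# For lab data, an explicit absolute tolerance is often easier to reason about
concentration = 20.04
target = 20.0
within_tolerance = math.isclose(concentration, target, abs_tol=0.5)
print(f"Within tolerance: {within_tolerance}")
```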

In [None]:
# String comparisons
sequence1 = "ATCG"
sequence2 = "ATCG"
sequence3 = "atcg"

print("String Comparisons:")
print(f"'{sequence1}' == '{sequence2}': {sequence1 == sequence2}")
print(f"'{sequence1}' == '{sequence3}': {sequence1 == sequence3}")
print(f"'{sequence1}' == '{sequence3.upper()}': {sequence1 == sequence3.upper()}")

# Membership testing
print(f"\n'ATG' in '{sequence1}': {'ATG' in sequence1}")
print(f"'{sequence1}' startswith 'AT': {sequence1.startswith('AT')}")
print(f"'{sequence1}' endswith 'CG': {sequence1.endswith('CG')}")

## 5️⃣ Logical Operators - Combining Conditions

Use `and`, `or`, and `not` to combine multiple conditions.

In [None]:
# Quality control with multiple criteria
read_length = 150
quality_score = 35
gc_content = 45.0

# AND: All conditions must be True
high_quality = (read_length >= 100 and 
                quality_score > 30 and 
                40 <= gc_content <= 60)

print("Quality Control Results:")
print(f"Read length >= 100: {read_length >= 100}")
print(f"Quality score > 30: {quality_score > 30}")
print(f"GC content 40-60%: {40 <= gc_content <= 60}")
print(f"Overall high quality: {high_quality}")

if high_quality:
    print("✓ Sample passes quality control")
else:
    print("✗ Sample fails quality control")
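
`and` evaluates its conditions from left to right and stops at the first one that is False (short-circuit evaluation). This makes it possible to put a safety check first. The sketch below uses a hypothetical helper, `is_gc_rich`, with an assumed default threshold of 0.5, to show how the length check protects the division that follows it.

```python
def is_gc_rich(sequence, threshold=0.5):
    """Hypothetical helper: True when the GC fraction exceeds threshold."""
    # If the sequence is empty, the first condition is False and the
    # division on the right-hand side is never evaluated.
    return (len(sequence) > 0 and
            (sequence.count("G") + sequence.count("C")) / len(sequence) > threshold)

print(is_gc_rich("GCGCAT"))  # 4 of 6 bases are G or C
print(is_gc_rich(""))        # no ZeroDivisionError for an empty sequence
```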

In [None]:
# OR: At least one condition must be True
sequence = "TAGCGTAAA"

# Check if sequence has any stop codon
has_stop_codon = ("TAA" in sequence or 
                  "TAG" in sequence or 
                  "TGA" in sequence)

print(f"Sequence: {sequence}")
print(f"Contains TAA: {'TAA' in sequence}")
print(f"Contains TAG: {'TAG' in sequence}")
print(f"Contains TGA: {'TGA' in sequence}")
print(f"Has stop codon: {has_stop_codon}")

if has_stop_codon:
    print("✓ Translation would stop")
else:
    print("→ Translation continues")
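
When the same test is repeated for several values, the built-in `any()` can replace a long chain of `or`. The sketch below is equivalent to the `or` version above; the name `STOP_CODONS` is introduced here for illustration. Like the original, it finds a stop codon anywhere in the string and does not consider the reading frame.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")

sequence = "TAGCGTAAA"

# any() returns True as soon as one of the membership tests is True
has_stop_codon = any(stop in sequence for stop in STOP_CODONS)
print(f"Has stop codon: {has_stop_codon}")

# all() is the counterpart for chains of `and`
only_gc = all(base in "GC" for base in "GCGCGCGC")
print(f"Only G and C: {only_gc}")
```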

In [None]:
# NOT: Reverse the condition
sequence = "GCGCGCGC"

has_start_codon = "ATG" in sequence
lacks_start_codon = not has_start_codon

print(f"Sequence: {sequence}")
print(f"Has start codon: {has_start_codon}")
print(f"Lacks start codon: {lacks_start_codon}")

if not has_start_codon:
    print("⚠️ No ATG found - unlikely to contain a complete protein-coding sequence")

## 6️⃣ Nested Conditionals - Complex Decision Trees

Sometimes you need conditionals inside other conditionals.

In [None]:
# Sequence analysis with nested conditions
sequence = "ATGCGTAAATAG"
min_length = 10

print(f"Analyzing sequence: {sequence}")
print(f"Length: {len(sequence)} bp")

if len(sequence) >= min_length:
    print("✓ Sequence meets minimum length")
    
    if sequence.startswith("ATG"):
        print("✓ Starts with ATG (start codon)")
        
        # Note: `in` finds a stop codon anywhere in the string, not only in frame with the ATG
        if "TAA" in sequence or "TAG" in sequence or "TGA" in sequence:
            print("✓ Contains stop codon")
            print("→ This could be a complete ORF!")
        else:
            print("⚠️ No stop codon found")
            print("→ Incomplete ORF or very long gene")
    else:
        print("✗ Does not start with ATG")
        print("→ Not a standard protein-coding sequence")
else:
    print(f"✗ Sequence too short (< {min_length} bp)")
    print("→ Cannot analyze for ORFs")

## 7️⃣ Real-World Example: Codon Analysis

Comprehensive analysis of codons using conditionals.

In [None]:
def analyze_codon(codon):
    """Analyze a codon and return its properties"""
    
    # Validate input
    if len(codon) != 3:
        return "Error: Codon must be exactly 3 bases"
    
    # Convert to uppercase for comparison
    codon = codon.upper()
    
    # Check for invalid bases
    valid_bases = set('ATCG')
    if not all(base in valid_bases for base in codon):
        return "Error: Codon contains invalid bases"
    
    # Analyze codon type
    if codon == "ATG":
        result = {
            'codon': codon,
            'type': 'Start codon',
            'amino_acid': 'Methionine (M)',
            'function': 'Initiates protein synthesis'
        }
    elif codon in ['TAA', 'TAG', 'TGA']:
        stop_names = {'TAA': 'Ochre', 'TAG': 'Amber', 'TGA': 'Opal'}
        result = {
            'codon': codon,
            'type': f'{stop_names[codon]} stop codon',
            'amino_acid': 'Stop (*)',
            'function': 'Terminates protein synthesis'
        }
    else:
        # Simplified amino acid assignment
        result = {
            'codon': codon,
            'type': 'Sense codon',
            'amino_acid': 'Various (see genetic code)',
            'function': 'Codes for amino acid'
        }
    
    return result

# Test different codons
test_codons = ['ATG', 'TAG', 'TGA', 'AAA', 'GCG', 'XYZ', 'AT']

for codon in test_codons:
    result = analyze_codon(codon)
    print(f"\nCodon: {codon}")
    
    if isinstance(result, dict):
        for key, value in result.items():
            print(f"  {key.title()}: {value}")
    else:
        print(f"  {result}")

## 8️⃣ Practical Example: Sequence Quality Filter

Filter sequences based on multiple quality criteria.

In [None]:
def quality_filter(sequence, min_length=50, max_n_percent=5, gc_range=(30, 70)):
    """Filter sequences based on quality criteria"""
    
    print(f"Analyzing: {sequence[:20]}{'...' if len(sequence) > 20 else ''}")
    print(f"Length: {len(sequence)} bp")
    
    # Initialize results
    passed = True
    reasons = []
    
    # Check length
    if len(sequence) < min_length:
        passed = False
        reasons.append(f"Too short (< {min_length} bp)")
    else:
        print("✓ Length check passed")
    
    # Check N content
    n_count = sequence.upper().count('N')
    n_percent = (n_count / len(sequence)) * 100 if sequence else 0
    
    if n_percent > max_n_percent:
        passed = False
        reasons.append(f"Too many Ns ({n_percent:.1f}% > {max_n_percent}%)")
    else:
        print(f"✓ N content check passed ({n_percent:.1f}%)")
    
    # Check GC content
    sequence_upper = sequence.upper()
    gc_count = sequence_upper.count('G') + sequence_upper.count('C')
    valid_bases = len(sequence) - n_count  # Exclude Ns from calculation
    
    if valid_bases > 0:
        gc_percent = (gc_count / valid_bases) * 100
        
        if not (gc_range[0] <= gc_percent <= gc_range[1]):
            passed = False
            reasons.append(f"GC content out of range ({gc_percent:.1f}% not in {gc_range[0]}-{gc_range[1]}%)")
        else:
            print(f"✓ GC content check passed ({gc_percent:.1f}%)")
    
    # Final result
    if passed:
        print("🎉 PASS: Sequence meets all quality criteria")
        return True
    else:
        print("❌ FAIL: " + "; ".join(reasons))
        return False

# Test sequences
test_sequences = [
    "ATCGATCGTAGCTAGCTAGCTAGCTAGCTAGCTAGCATCGATCGTAGCTAGC",  # Good
    "ATCGATCG",  # Too short
    "ATCGATCGNNNNNNNNNNTAGCTAGCTAGCTAGCTAGCTAGCTAGCATCG",  # Too many Ns
    "GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG",  # High GC
    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"   # Low GC
]

for i, seq in enumerate(test_sequences, 1):
    print(f"\n{'='*60}")
    print(f"Test Sequence {i}:")
    quality_filter(seq)
    print()


## 🎯 Practice Exercises

### Exercise 1: Start Codon Detector

Create a function that identifies start codons and their positions.

In [None]:
def find_start_codons(sequence):
    """
    Find all start codons (ATG) in a sequence
    
    Args:
        sequence: DNA sequence to search
    
    Returns:
        List of positions where ATG occurs
    """
    positions = []
    
    # TODO: Implement start codon finding
    # Hint: Use a loop and check each position
    # Remember to check if there are enough bases left for a codon
    
    return positions

# Test cases
test_seq = "ATGCGTAAGATGTAG"
# positions = find_start_codons(test_seq)
# print(f"Sequence: {test_seq}")
# print(f"Start codons found at positions: {positions}")

### Exercise 2: pH Category Classifier

Classify pH values for biological systems.

In [None]:
def classify_ph(ph_value):
    """
    Classify pH for biological systems
    
    Categories:
    - Very acidic: pH < 3
    - Acidic: 3 <= pH < 6.5
    - Neutral/Physiological: 6.5 <= pH <= 7.5
    - Basic: 7.5 < pH <= 9
    - Very basic: pH > 9
    
    Returns:
        String describing the pH category
    """
    # TODO: Implement pH classification
    # Use if-elif-else chain
    
    pass

# Test values
test_ph_values = [2.5, 5.0, 7.0, 7.4, 8.5, 10.0]

# for ph in test_ph_values:
#     category = classify_ph(ph)
#     print(f"pH {ph}: {category}")

### Exercise 3: Sequence Validator

Validate that sequences contain only appropriate characters.

In [None]:
def validate_dna_sequence(sequence):
    """
    Validate a DNA sequence
    
    Checks:
    1. Not empty
    2. Contains only A, T, C, G, N
    3. Length is multiple of 3 (for coding sequences)
    
    Returns:
        Dictionary with validation results
    """
    results = {
        'valid': True,
        'errors': []
    }
    
    # TODO: Implement validation checks
    # Check each criterion and add errors to the list
    # Set valid to False if any errors found
    
    return results

# Test sequences
test_sequences = [
    "ATCGATCG",      # Valid characters, but length not a multiple of 3
    "ATCGATCGTAGC",  # Valid, multiple of 3
    "ATCGATCX",      # Invalid character
    "",              # Empty
    "ATCGATCGN"      # Contains N
]

# for seq in test_sequences:
#     result = validate_dna_sequence(seq)
#     print(f"Sequence: '{seq}'")
#     print(f"Valid: {result['valid']}")
#     if result['errors']:
#         print(f"Errors: {', '.join(result['errors'])}")
#     print()

### Exercise 4: ORF Classifier

Classify potential Open Reading Frames based on their characteristics.

In [None]:
def classify_orf(sequence):
    """
    Classify a potential ORF
    
    Classifications:
    - Complete ORF: Starts with ATG, ends with stop codon, length >= 300 bp
    - Partial ORF: Starts with ATG but no stop, or has stop but no ATG start
    - Short ORF: Complete but < 300 bp (might be peptide)
    - Not an ORF: Doesn't meet ORF criteria
    
    Returns:
        Dictionary with classification and details
    """
    result = {
        'sequence': sequence,
        'length': len(sequence),
        'classification': '',
        'details': {}
    }
    
    # TODO: Implement ORF classification
    # Check for:
    # - Starts with ATG
    # - Ends with an in-frame stop codon (TAA, TAG, TGA)
    # - Length considerations
    # Use nested conditionals to determine classification
    
    return result

# Test sequences
test_orfs = [
    "ATGCGTAAATAG",  # Short complete ORF
    "ATGCGTAAACGT" * 25 + "TAG",  # Long complete ORF (303 bp)
    "ATGCGTAAACGT" * 10,  # No in-frame stop codon
    "CGTAAACGTTAG",  # No start codon
    "CGTAAACGTCGT"   # Neither start nor in-frame stop codon
]

# for orf in test_orfs:
#     result = classify_orf(orf)
#     print(f"Sequence: {orf[:20]}{'...' if len(orf) > 20 else ''}")
#     print(f"Length: {result['length']} bp")
#     print(f"Classification: {result['classification']}")
#     print()

### Exercise 5: Sample Quality Control

Implement a comprehensive quality control system for biological samples.

In [None]:
def sample_qc(sample_data):
    """
    Perform quality control on biological sample data
    
    sample_data should contain:
    - 'concentration': DNA/RNA concentration (ng/μL)
    - 'purity_260_280': A260/A280 ratio
    - 'purity_260_230': A260/A230 ratio
    - 'volume': sample volume (μL)
    
    Quality criteria:
    - Concentration: >= 10 ng/μL
    - A260/A280: 1.8-2.2 (DNA) or 1.9-2.1 (RNA)
    - A260/A230: >= 2.0
    - Volume: >= 5 μL
    
    Returns:
        Dictionary with QC results and recommendations
    """
    qc_result = {
        'sample_id': sample_data.get('id', 'Unknown'),
        'passed': True,
        'warnings': [],
        'failures': [],
        'recommendations': []
    }
    
    # TODO: Implement QC logic
    # Check each parameter
    # Add warnings and failures as appropriate
    # Provide specific recommendations
    # Set passed to False if any failures
    
    return qc_result

# Test samples
samples = [
    {
        'id': 'Sample_001',
        'concentration': 25.5,
        'purity_260_280': 1.85,
        'purity_260_230': 2.15,
        'volume': 20
    },
    {
        'id': 'Sample_002', 
        'concentration': 8.2,
        'purity_260_280': 1.65,
        'purity_260_230': 1.85,
        'volume': 3
    }
]

# for sample in samples:
#     result = sample_qc(sample)
#     print(f"Sample: {result['sample_id']}")
#     print(f"Status: {'PASS' if result['passed'] else 'FAIL'}")
#     if result['warnings']:
#         print(f"Warnings: {'; '.join(result['warnings'])}")
#     if result['failures']:
#         print(f"Failures: {'; '.join(result['failures'])}")
#     if result['recommendations']:
#         print(f"Recommendations: {'; '.join(result['recommendations'])}")
#     print()

### Challenge Exercise: Comprehensive ORF Analyzer

Build a complete ORF analysis system using all conditional concepts.

In [None]:
def comprehensive_orf_analysis(sequence):
    """
    Perform comprehensive ORF analysis on a DNA sequence
    
    Analysis includes:
    1. Find all potential ORFs (ATG to stop codon)
    2. Classify each ORF by length
    3. Check for overlapping ORFs
    4. Identify the most likely coding sequence
    5. Calculate basic statistics
    
    Returns:
        Comprehensive analysis report
    """
    
    analysis = {
        'sequence_length': len(sequence),
        'total_orfs_found': 0,
        'orfs': [],
        'best_orf': None,
        'statistics': {},
        'warnings': []
    }
    
    # TODO: Implement comprehensive ORF analysis
    # This is a complex challenge that combines:
    # - Loops for finding ORFs
    # - Conditionals for classification
    # - Logic for determining the "best" ORF
    # - Statistical calculations
    
    # Hints:
    # 1. Find all ATG positions
    # 2. For each ATG, find the next stop codon in frame
    # 3. Classify ORFs by length (short <300, medium 300-1000, long >1000)
    # 4. Identify overlapping ORFs
    # 5. Choose the longest non-overlapping ORF as "best"
    
    return analysis

# Test with a complex sequence
complex_sequence = "ATGCGTAAATCGATGAAACCCTGATAGGGATGCCCAAATAG" * 5

# result = comprehensive_orf_analysis(complex_sequence)
# print(f"Sequence length: {result['sequence_length']} bp")
# print(f"ORFs found: {result['total_orfs_found']}")
# # Print detailed results...

## 🎉 Summary

You've mastered Python conditionals for biological decision making!

### Core Concepts Covered
✅ **Basic if statements**: Single condition decisions  
✅ **if-else**: Binary choice handling  
✅ **elif chains**: Multiple option selection  
✅ **Comparison operators**: ==, !=, <, >, <=, >=  
✅ **Logical operators**: and, or, not  
✅ **Nested conditionals**: Complex decision trees  
✅ **Input validation**: Error checking and handling  

### Biological Applications
✅ Sequence validation and quality control  
✅ Codon classification and analysis  
✅ pH and environmental parameter assessment  
✅ ORF identification and classification  
✅ Sample quality control systems  
✅ Multi-criteria decision making  

### 🔑 Key Takeaways

1. **Start simple**: Use basic if statements for single decisions
2. **Chain logically**: Use elif for multiple related conditions
3. **Combine conditions**: Use and/or to create complex criteria
4. **Validate input**: Always check data before processing
5. **Handle edge cases**: Think about unusual inputs
6. **Keep it readable**: Use clear variable names and comments

### 🚀 Next Steps

With conditionals mastered, you're ready to:
- Build more sophisticated **sequence analysis tools**
- Implement **quality control pipelines**
- Create **automated decision systems**
- Combine with **loops** for processing multiple sequences
- Use **functions** to organize your conditional logic

**Keep practicing!** Conditionals are the foundation of intelligent biological data processing. 🧬💻