# Getting Started with Bioinformatics Tools

This tutorial introduces the shared bioinformatics utilities in this project.

## Contents
1. Peptide Property Analysis
2. Compare Multiple Peptides
3. Hemolysis Prediction
4. Primer Design
5. ML Features for Predictions
6. Using the Logging Framework

In [None]:
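# Set up the import path
import sys
from pathlib import Path

# Add the deliverables directory to sys.path so that `shared` can be imported.
# Path.cwd().parent assumes the notebook is run from a subdirectory of that directory.
deliverables_path = Path.cwd().parent
sys.path.insert(0, str(deliverables_path))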

# Verify imports work
from shared import (
    compute_peptide_properties,
    compute_ml_features,
    validate_sequence,
    HemolysisPredictor,
    PrimerDesigner,
)
print("Imports successful!")

## 1. Peptide Property Analysis

Compute biophysical properties of peptide sequences.

In [None]:
from shared import compute_peptide_properties, validate_sequence

# Example: Magainin 2 (well-known antimicrobial peptide)
magainin = "GIGKFLHSAKKFGKAFVGEIMNS"

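# Validate the sequence; `error` describes the problem when validation fails
is_valid, error = validate_sequence(magainin)
print(f"Sequence valid: {is_valid}")
if not is_valid:
    print(f"  Reason: {error}")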

# Compute properties
props = compute_peptide_properties(magainin)
print(f"\nMagainin 2 properties:")
print(f"  Length: {props['length']} amino acids")
print(f"  Net charge: {props['net_charge']:+.1f}")
print(f"  Hydrophobicity: {props['hydrophobicity']:.2f}")
print(f"  Hydrophobic ratio: {props['hydrophobic_ratio']:.1%}")
print(f"  Cationic ratio: {props['cationic_ratio']:.1%}")
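To make the validation step more concrete, here is a minimal sketch of what a validator with the same `(is_valid, error)` return shape might look like. It assumes that "valid" means non-empty and built only from the 20 standard one-letter amino acid codes; the real `validate_sequence` in `shared` may apply different or additional rules. The name `simple_validate` is hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for shared.validate_sequence; the real rules may differ.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def simple_validate(sequence: str) -> Tuple[bool, Optional[str]]:
    """Return (is_valid, error_message) for a one-letter peptide string."""
    if not sequence:
        return False, "sequence is empty"
    # Collect every character that is not a standard amino acid code.
    unknown = sorted(set(sequence.upper()) - STANDARD_AMINO_ACIDS)
    if unknown:
        return False, f"unknown residues: {', '.join(unknown)}"
    return True, None


print(simple_validate("GIGKFLHSAKKFGKAFVGEIMNS"))  # (True, None)
print(simple_validate("GIGKXZ"))  # (False, 'unknown residues: X, Z')
```

Returning the error message alongside the flag lets the caller decide whether to skip the sequence, log the reason, or stop.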

In [None]:
from shared import compute_physicochemical_descriptors

# Extended properties
ext_props = compute_physicochemical_descriptors(magainin)
print("Extended properties:")
print(f"  Aromaticity: {ext_props['aromaticity']:.2%}")
print(f"  Aliphatic index: {ext_props['aliphatic_index']:.1f}")
print(f"  Polar ratio: {ext_props['polar_ratio']:.2%}")
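The properties above can be approximated by simple residue counting, which may help when sanity-checking the output. The sketch below is an illustration only: the residue groupings and the name `simple_properties` are assumptions, and `compute_peptide_properties` may use different definitions (for example a pH-dependent charge model or a hydrophobicity scale).

```python
# Residue groupings are illustrative assumptions, not the `shared` definitions.
CATIONIC = set("KR")        # His is left out: it is mostly neutral near pH 7
ANIONIC = set("DE")
HYDROPHOBIC = set("AILMFVW")


def simple_properties(sequence: str) -> dict:
    """Counting-based estimates of net charge and residue-class ratios."""
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must not be empty")
    positive = sum(aa in CATIONIC for aa in sequence)
    negative = sum(aa in ANIONIC for aa in sequence)
    hydrophobic = sum(aa in HYDROPHOBIC for aa in sequence)
    return {
        "length": n,
        "net_charge": positive - negative,  # ignores termini and His
        "cationic_ratio": positive / n,
        "hydrophobic_ratio": hydrophobic / n,
    }


props_sketch = simple_properties("GIGKFLHSAKKFGKAFVGEIMNS")
print(props_sketch["length"], props_sketch["net_charge"])  # 23 3
```

For Magainin 2 this counts four lysines and one glutamate, giving an estimated net charge of +3 over 23 residues.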

## 2. Compare Multiple Peptides

In [None]:
peptides = {
    "Magainin 2": "GIGKFLHSAKKFGKAFVGEIMNS",
    "Melittin": "GIGAVLKVLTTGLPALISWIKRKRQQ",
    "LL-37": "LLGDFFRKSKEKIGKEFKRIVQRIKDFLRNLVPRTES",
    "Indolicidin": "ILPWKWPWWPWRR",
}

print(f"{'Peptide':<15} {'Length':<8} {'Charge':<8} {'Hydro':<8} {'Cationic%':<10}")
print("-" * 55)

for name, seq in peptides.items():
    props = compute_peptide_properties(seq)
    print(f"{name:<15} {props['length']:<8} {props['net_charge']:<+8.1f} {props['hydrophobicity']:<8.2f} {props['cationic_ratio']:<10.1%}")

## 3. Hemolysis Prediction

Predict hemolytic activity (toxicity to red blood cells) of peptides.

In [None]:
from shared import HemolysisPredictor

predictor = HemolysisPredictor()

print(f"{'Peptide':<15} {'HC50 (uM)':<12} {'Risk':<10} {'Probability':<12}")
print("-" * 55)

for name, seq in peptides.items():
    result = predictor.predict(seq)
    print(f"{name:<15} {result['hc50_predicted']:<12.1f} {result['risk_category']:<10} {result['hemolytic_probability']:<12.2f}")

In [None]:
# Compute therapeutic index for a peptide
magainin_ti = predictor.compute_therapeutic_index(
    "GIGKFLHSAKKFGKAFVGEIMNS",
    mic_value=10.0  # MIC against E. coli in uM
)

print("Therapeutic Index Analysis for Magainin 2:")
print(f"  HC50 (toxicity): {magainin_ti['hc50']:.1f} uM")
print(f"  MIC (activity): {magainin_ti['mic']:.1f} uM")
print(f"  Therapeutic Index: {magainin_ti['therapeutic_index']:.1f}")
print(f"  Interpretation: {magainin_ti['interpretation']}")
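The therapeutic index for antimicrobial peptides is commonly defined as the ratio HC50 / MIC, so a higher value means the peptide kills bacteria at concentrations well below those that lyse red blood cells. The sketch below assumes that definition; how `compute_therapeutic_index` derives its value and its `interpretation` string is not shown in this tutorial. The function name and the numbers are illustrative.

```python
def therapeutic_index(hc50_um: float, mic_um: float) -> float:
    """Ratio of hemolytic to inhibitory concentration (both in the same units)."""
    if hc50_um <= 0 or mic_um <= 0:
        raise ValueError("HC50 and MIC must be positive")
    return hc50_um / mic_um


# Illustrative numbers only, not measured or predicted values.
ti = therapeutic_index(hc50_um=100.0, mic_um=10.0)
print(f"Therapeutic index: {ti:.1f}")  # Therapeutic index: 10.0
```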

## 4. Primer Design

Design PCR primers for cloning peptide sequences.

In [None]:
from shared import PrimerDesigner

designer = PrimerDesigner()

# Design primers for a peptide
peptide = "GIGKFLHSAKKFGKAFVGEIMNS"
primers = designer.design_for_peptide(
    peptide,
    codon_optimization="ecoli",
    add_start_codon=True,
    add_stop_codon=True,
)

print(f"Primers for expressing {peptide}:")
print(f"\nForward primer: 5'-{primers.forward}-3'")
print(f"  Tm: {primers.forward_tm:.1f}C")
print(f"  GC: {primers.forward_gc:.1f}%")
print(f"\nReverse primer: 5'-{primers.reverse}-3'")
print(f"  Tm: {primers.reverse_tm:.1f}C")
print(f"  GC: {primers.reverse_gc:.1f}%")
print(f"\nExpected product size: {primers.product_size} bp")

In [None]:
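# Show the coding DNA sequence. The ATG start codon and TAA stop codon are added
# here by hand, which is why the reported length includes 6 extra bases.
dna = designer.peptide_to_dna(peptide, codon_optimization="ecoli")
print("\nDNA sequence (E. coli optimized):")
print(f"5'-ATG{dna}TAA-3'")
print(f"\nLength: {len(dna) + 6} bp (including start/stop codons)")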
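The Tm and GC values reported for each primer can be roughly checked by hand. The sketch below computes GC content, the reverse complement, and a melting temperature from the Wallace rule, Tm = 2(A+T) + 4(G+C), which is a rough estimate intended for short oligonucleotides. This is an assumption for illustration: `PrimerDesigner` may use a different Tm model, so its numbers need not match. All function names here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the strand."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def gc_percent(dna: str) -> float:
    """Percentage of bases that are G or C."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def wallace_tm(dna: str) -> float:
    """Wallace-rule melting temperature in degrees Celsius (short oligos only)."""
    dna = dna.upper()
    at = dna.count("A") + dna.count("T")
    gc = dna.count("G") + dna.count("C")
    return 2.0 * at + 4.0 * gc


oligo = "ATGGGCATTGGCAAATTT"  # arbitrary 18-nt example
print(reverse_complement(oligo))                  # AAATTTGCCAATGCCCAT
print(f"GC: {gc_percent(oligo):.1f}%")            # GC: 38.9%
print(f"Tm (Wallace): {wallace_tm(oligo):.0f} C")  # Tm (Wallace): 50 C
```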

## 5. ML Features for Predictions

In [None]:
from shared import compute_ml_features, compute_amino_acid_composition
import numpy as np

# Get ML feature vector
features = compute_ml_features(magainin)
print(f"ML feature vector shape: {features.shape}")
print(f"\nFirst 5 features (length, charge, hydro, hydro_ratio, cationic_ratio):")
print(f"  {features[:5].round(3)}")

# Amino acid composition
aa_comp = compute_amino_acid_composition(magainin)
print(f"\nAmino acid composition (top 5):")
aa_order = "ACDEFGHIKLMNPQRSTVWY"
sorted_idx = np.argsort(aa_comp)[::-1]
for i in sorted_idx[:5]:
    if aa_comp[i] > 0:
        print(f"  {aa_order[i]}: {aa_comp[i]:.1%}")
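An amino acid composition vector of this kind can be reproduced with the standard library alone. The sketch below assumes the composition is the fraction of each of the 20 standard residues, in the same `ACDEFGHIKLMNPQRSTVWY` order used above; whether `compute_amino_acid_composition` normalizes in exactly this way is an assumption. The name `simple_composition` is hypothetical.

```python
from collections import Counter

AA_ORDER = "ACDEFGHIKLMNPQRSTVWY"  # same ordering as in the cell above


def simple_composition(sequence: str) -> list:
    """Fraction of each standard amino acid, as a 20-element list."""
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    return [counts[aa] / n for aa in AA_ORDER]


comp = simple_composition("GIGKFLHSAKKFGKAFVGEIMNS")
# Sort residues by fraction, highest first; ties keep their AA_ORDER position.
top = sorted(zip(AA_ORDER, comp), key=lambda pair: pair[1], reverse=True)[:3]
for aa, frac in top:
    print(f"  {aa}: {frac:.1%}")
```

Because every residue in Magainin 2 is one of the 20 standard amino acids, the fractions sum to 1.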

## 6. Using the Logging Framework

In [None]:
from shared import get_logger, setup_logging

# Setup logging
setup_logging(level="INFO", use_colors=False)

# Get logger for this notebook
logger = get_logger("tutorial")

# Log analysis results
logger.info("Starting peptide analysis")
logger.prediction("HC50", 105.4, confidence=0.85, peptide="Magainin 2")
logger.model_metrics("activity_predictor", {"rmse": 0.35, "r": 0.85})
logger.info("Analysis complete")

## Summary

This tutorial covered:

1. **Peptide Properties**: Computing charge, hydrophobicity, and other biophysical properties
2. **Peptide Comparison**: Tabulating properties for several antimicrobial peptides side by side
3. **Hemolysis Prediction**: Predicting HC50 (toxicity) and computing the therapeutic index
4. **Primer Design**: Designing PCR primers for peptide cloning
5. **ML Features**: Generating feature vectors for machine learning
6. **Logging**: Using the standardized logging framework

For more advanced usage, see:
- `02_antimicrobial_activity.ipynb` - Training activity predictors
- `03_protein_stability.ipynb` - DDG prediction
- `04_vae_integration.ipynb` - Using the VAE for sequence generation