# Looping and Appending Data: Building Collections

When working with biological data, you often need to:
- Process multiple samples one by one
- Build up results lists gradually
- Collect data from multiple sources
- Accumulate measurements over time

This notebook teaches you how to combine **for loops** with **data appending** to build collections of results, an essential step in preparing data to be written to files.

## Review: Lists and For Loops

Let's quickly review what we know:

In [None]:
# Review: Basic lists and loops
sample_names = ["Control", "Treatment_A", "Treatment_B", "Treatment_C"]
concentrations = [0, 5, 10, 25]

print("Sample processing:")
for i in range(len(sample_names)):
    name = sample_names[i]
    conc = concentrations[i]
    print(f"Processing {name} at {conc} μM")
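
Indexing with `range(len(...))` works, but Python's built-in `zip()` can walk two parallel lists side by side without any index bookkeeping. A minimal sketch using the same sample lists:

```python
sample_names = ["Control", "Treatment_A", "Treatment_B", "Treatment_C"]
concentrations = [0, 5, 10, 25]

# zip() pairs the first name with the first concentration, and so on
messages = []
for name, conc in zip(sample_names, concentrations):
    messages.append(f"Processing {name} at {conc} μM")

for message in messages:
    print(message)
```

One caveat: `zip()` stops quietly at the end of the shorter list, whereas indexing with `range(len(sample_names))` raises an `IndexError` if `concentrations` is the shorter one. On Python 3.10 and later, `zip(..., strict=True)` raises an error on a length mismatch instead.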

## The Power of Appending: Building Results

Instead of just printing results, let's **collect them** for later use:

In [None]:
# Start with empty lists to collect results
processed_samples = []
calculated_values = []
status_messages = []

# Process samples and collect results
sample_names = ["Control", "Treatment_A", "Treatment_B", "Treatment_C"]
raw_values = [100, 87, 65, 42]

for i in range(len(sample_names)):
    name = sample_names[i]
    value = raw_values[i]
    
    # Calculate percentage relative to the control (the first sample)
    percentage = (value / raw_values[0]) * 100
    
    # Determine status
    if percentage > 80:
        status = "No effect"
    elif percentage > 50:
        status = "Moderate effect"
    else:
        status = "Strong effect"
    
    # Append results to our collection lists
    processed_samples.append(name)
    calculated_values.append(percentage)
    status_messages.append(status)
    
    print(f"{name}: {value} → {percentage:.1f}% ({status})")

print(f"\nCollected {len(processed_samples)} results!")
print(f"Samples: {processed_samples}")
print(f"Values: {calculated_values}")
print(f"Status: {status_messages}")
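
A common slip is creating the collection list *inside* the loop, which resets it on every pass so only the last result survives. This sketch contrasts the two placements (the variable names are illustrative only):

```python
raw_values = [100, 87, 65, 42]

# Wrong: the list is re-created on every iteration
for value in raw_values:
    results_wrong = []
    results_wrong.append(value)

# Right: create the list once, before the loop
results_right = []
for value in raw_values:
    results_right.append(value)

print(f"Created inside the loop: {results_wrong}")   # only the last value
print(f"Created before the loop: {results_right}")   # all four values
```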

## Appending Complex Data: Lists of Lists

For more complex data, append entire rows of information:

In [None]:
# Collect complete experimental results
experiment_results = []  # This will hold lists of [sample, concentration, response, category]

# Sample data
samples = ["Control", "Drug_A", "Drug_A", "Drug_A", "Drug_B", "Drug_B", "Drug_B"]
concentrations = [0, 1, 5, 10, 1, 5, 10]
responses = [100, 95, 78, 45, 88, 69, 52]

print("Building experiment results table:")
print("-" * 50)

for i in range(len(samples)):
    sample = samples[i]
    conc = concentrations[i]
    response = responses[i]
    
    # Categorize response
    if response >= 80:
        category = "Minimal_inhibition"
    elif response >= 60:
        category = "Moderate_inhibition"
    else:
        category = "Strong_inhibition"
    
    # Create a complete result row
    result_row = [sample, conc, response, category]
    
    # Append the entire row
    experiment_results.append(result_row)
    
    print(f"{sample:10} {conc:2d}μM: {response:3d}% → {category}")

print(f"\nCollected {len(experiment_results)} complete results:")
for i, result in enumerate(experiment_results):
    print(f"Row {i+1}: {result}")
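
Because every row has the same layout, you can unpack each row straight into named variables when reading the collection back, and pull a single column out with the familiar loop-and-append. A small sketch using three of the rows built above:

```python
experiment_results = [
    ["Control", 0, 100, "Minimal_inhibition"],
    ["Drug_A", 10, 45, "Strong_inhibition"],
    ["Drug_B", 10, 52, "Strong_inhibition"],
]

# Unpack each row into named variables (order must match the row layout)
for sample, conc, response, category in experiment_results:
    print(f"{sample} at {conc} μM: {response}% ({category})")

# Pull one column back out: all responses
all_responses = []
for row in experiment_results:
    all_responses.append(row[2])  # index 2 = response

print(f"Responses: {all_responses}")
```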

## Exercise 1: Building Gene Expression Results

Practice appending data by analyzing gene expression changes:

In [None]:
# Gene expression data
genes = ["BRCA1", "TP53", "MYC", "EGFR", "PTEN", "RB1"]
control_expression = [45.2, 78.9, 156.3, 67.8, 234.5, 123.4]
treatment_expression = [123.7, 234.1, 89.4, 145.2, 89.7, 267.8]

# YOUR TASK: Create lists to collect results
gene_names = []  # Collect gene names
fold_changes = []  # Collect calculated fold changes
regulation_status = []  # Collect up/down/unchanged status
complete_results = []  # Collect complete result rows

print("Gene Expression Analysis:")
print("=" * 40)

# YOUR CODE HERE: Loop through genes and:
# 1. Calculate fold change (treatment/control)
# 2. Determine regulation status:
#    - >2.0 = "Upregulated"
#    - <0.5 = "Downregulated" 
#    - else = "Unchanged"
# 3. Append results to your collection lists
# 4. Also append complete result row: [gene, control, treatment, fold_change, status]

for i in range(len(genes)):
    # YOUR CODE HERE
    pass

# Print summary
print(f"\nProcessed {len(gene_names)} genes")
print(f"Complete results collected: {len(complete_results)} rows")
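
To check your loop against a hand calculation, here is the arithmetic for a single gene (BRCA1) using the thresholds given in the task. It is a sketch of one iteration, not the full exercise:

```python
gene = "BRCA1"
control = 45.2
treatment = 123.7

fold_change = treatment / control  # about 2.74

if fold_change > 2.0:
    status = "Upregulated"
elif fold_change < 0.5:
    status = "Downregulated"
else:
    status = "Unchanged"

result_row = [gene, control, treatment, round(fold_change, 2), status]
print(result_row)
```

MYC makes a useful second check: 89.4 / 156.3 is about 0.57, just above the 0.5 cutoff, so it should come out "Unchanged".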

## Using Functions with Looping and Appending

Combine functions with data collection for cleaner code:

In [None]:
# Our stock solution function from earlier lectures
def calculate_volume(molecular_weight, mass_mg, concentration_mM):
    """Calculate solvent volume (mL) needed for a stock solution.

    mg / (g/mol) gives mmol; mmol / mM (mmol/L) gives litres,
    so multiply by 1000 to convert to mL.
    """
    return mass_mg / (molecular_weight * concentration_mM) * 1000

def analyze_reagent(name, mw, mass, target_conc):
    """Analyze a reagent and return complete results."""
    volume = calculate_volume(mw, mass, target_conc)
    
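    # Determine solvent: simple inorganic salts dissolve in water,
    # most organic compounds go into DMSO
    lower_name = name.lower()
    if lower_name in ["nacl", "kcl"] or "salt" in lower_name or "water" in lower_name:
        solvent = "Water"
    else:
        solvent = "DMSO"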
    
    # Check if volume is reasonable
    if volume < 0.1:
        warning = "Very concentrated - check calculation"
    elif volume > 100:
        warning = "Very dilute - consider higher concentration"
    else:
        warning = "OK"
    
    return {
        'name': name,
        'mw': mw,
        'mass': mass,
        'volume': round(volume, 2),
        'solvent': solvent,
        'warning': warning
    }

# Reagent data
reagent_data = [
    ["MG132", 475.6, 89.5],
    ["Rapamycin", 914.2, 125.3],
    ["Cycloheximide", 281.4, 45.8],
    ["NaCl", 58.4, 15.0],
    ["Staurosporine", 466.5, 78.2]
]

# Collect results using functions
all_results = []
warnings_found = []

print("Stock Solution Analysis:")
print("=" * 60)
print(f"{'Reagent':<15} {'Vol (mL)':<8} {'Solvent':<8} {'Warning':<30}")
print("-" * 60)

for reagent in reagent_data:
    name, mw, mass = reagent
    
    # Analyze this reagent
    result = analyze_reagent(name, mw, mass, 10)  # 10mM target
    
    # Collect results
    all_results.append(result)
    
    # Collect warnings separately
    if result['warning'] != "OK":
        warnings_found.append([name, result['warning']])
    
    # Display results
    print(f"{result['name']:<15} {result['volume']:<8} {result['solvent']:<8} {result['warning']:<30}")

print(f"\nSummary:")
print(f"Total reagents processed: {len(all_results)}")
print(f"Warnings issued: {len(warnings_found)}")

if warnings_found:
    print("\nReagents requiring attention:")
    for name, warning in warnings_found:
        print(f"  {name}: {warning}")
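
It is worth checking the units of `calculate_volume` by hand. Mass in mg divided by molecular weight in g/mol (equivalently mg/mmol) gives mmol; dividing that by a concentration in mM (mmol/L) gives litres, so the result must be multiplied by 1000 to read in mL. A step-by-step sketch for MG132 at 10 mM:

```python
mass_mg = 89.5
molecular_weight = 475.6   # g/mol, i.e. mg/mmol
concentration_mM = 10      # mmol/L

amount_mmol = mass_mg / molecular_weight        # about 0.188 mmol
volume_L = amount_mmol / concentration_mM       # about 0.0188 L
volume_mL = volume_L * 1000                     # about 18.82 mL

print(f"{amount_mmol:.3f} mmol -> {volume_L:.4f} L -> {volume_mL:.2f} mL")
```

Roughly 19 mL is plausible for about 90 mg of a compound near 476 g/mol at 10 mM; a value like 0.019 means the litre-to-millilitre conversion was missed.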

## Exercise 2: PCR Analysis with Data Collection

Create a complete PCR analysis system:

In [None]:
def analyze_pcr_reaction(sample_name, ct_value, control_ct):
    """Compare one sample's Ct to the control Ct and return a results dict."""
    # YOUR TASK: Complete this function
    # Calculate delta_ct = ct_value - control_ct
    # Calculate fold_change = 2 ** (-delta_ct)
    # Determine if significantly changed (>2-fold up or down)
    
    # YOUR CODE HERE
    delta_ct = 0  # Replace with calculation
    fold_change = 1  # Replace with calculation
    
    if fold_change > 2.0:
        significance = "Upregulated"
    elif fold_change < 0.5:
        significance = "Downregulated"
    else:
        significance = "No change"
    
    return {
        'sample': sample_name,
        'ct': ct_value,
        'delta_ct': round(delta_ct, 2),
        'fold_change': round(fold_change, 2),
        'significance': significance
    }

# PCR data: [sample_name, ct_value]
pcr_data = [
    ["Control", 25.4],
    ["Treatment_1h", 23.8],
    ["Treatment_2h", 22.1],
    ["Treatment_4h", 21.5],
    ["Treatment_8h", 27.2],
    ["Treatment_24h", 28.9]
]

control_ct = pcr_data[0][1]  # Ct of the "Control" row (25.4), used as the reference

# YOUR TASK: Collect results
pcr_results = []  # Complete analysis results
significant_changes = []  # Only samples with significant changes
upregulated_samples = []  # Only upregulated samples
downregulated_samples = []  # Only downregulated samples

print("PCR Analysis Results:")
print("-" * 70)
print(f"{'Sample':<15} {'Ct':<6} {'ΔCt':<6} {'Fold Change':<10} {'Significance'}")
print("-" * 70)

# YOUR CODE HERE: Loop through PCR data and:
# 1. Analyze each sample
# 2. Append to pcr_results
# 3. If significant, append to significant_changes
# 4. Append to upregulated_samples or downregulated_samples as appropriate

for sample_data in pcr_data:
    # YOUR CODE HERE
    pass

# Print summary statistics
print(f"\nSummary:")
print(f"Total samples: {len(pcr_results)}")
print(f"Significant changes: {len(significant_changes)}")
print(f"Upregulated: {len(upregulated_samples)}")
print(f"Downregulated: {len(downregulated_samples)}")
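
For a hand check of the formula in `analyze_pcr_reaction`, take Treatment_2h: its Ct is lower than the control's, so ΔCt is negative and the fold change is above 1. This sketch assumes the simplified 2^(−ΔCt) model used in the exercise (perfect doubling every cycle, no housekeeping-gene normalization):

```python
control_ct = 25.4
ct_value = 22.1  # Treatment_2h

delta_ct = ct_value - control_ct     # about -3.3
fold_change = 2 ** (-delta_ct)       # 2 ** 3.3, about 9.85

print(f"ΔCt = {delta_ct:.2f}, fold change = {fold_change:.2f}")
```

Because each cycle roughly doubles the product, a sample that crosses the threshold 3.3 cycles earlier started with about 2^3.3, or roughly 10 times, more template.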

## Accumulating Numerical Data

Sometimes you need to accumulate totals, averages, or running calculations:

In [None]:
# Cell counting data from multiple fields
field_counts = [45, 52, 38, 61, 47, 55, 42, 58, 49, 53]

# Initialize accumulators
total_cells = 0
count_fields = 0
high_count_fields = []  # Fields with >50 cells
all_counts = []  # Store all counts for further analysis

print("Cell Counting Analysis:")
print("-" * 30)

for i, count in enumerate(field_counts):
    field_number = i + 1
    
    # Accumulate totals
    total_cells += count
    count_fields += 1
    
    # Collect all counts
    all_counts.append(count)
    
    # Track high-count fields
    if count > 50:
        high_count_fields.append([field_number, count])
    
    print(f"Field {field_number:2d}: {count:2d} cells")

# Calculate statistics
average_count = total_cells / count_fields
min_count = min(all_counts)
max_count = max(all_counts)

print(f"\nStatistics:")
print(f"Total cells counted: {total_cells}")
print(f"Average per field: {average_count:.1f}")
print(f"Range: {min_count} - {max_count} cells")
print(f"High-count fields (>50): {len(high_count_fields)}")

if high_count_fields:
    print("\nHigh-count field details:")
    for field_num, count in high_count_fields:
        print(f"  Field {field_num}: {count} cells")
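
The manual accumulators above make each step visible. For everyday work, the built-ins `sum()`, `len()`, `min()` and `max()`, together with the standard-library `statistics` module, give the same numbers (plus a standard deviation) in a few lines. A sketch on the same counts:

```python
import statistics

field_counts = [45, 52, 38, 61, 47, 55, 42, 58, 49, 53]

total_cells = sum(field_counts)
average_count = total_cells / len(field_counts)
mean_count = statistics.mean(field_counts)      # same value as average_count
sd_count = statistics.stdev(field_counts)       # sample standard deviation

print(f"Total: {total_cells}, mean: {mean_count:.1f}, SD: {sd_count:.1f}")
print(f"Range: {min(field_counts)} - {max(field_counts)}")
```

`statistics.stdev` uses the sample (n − 1) formula; `statistics.pstdev` gives the population version.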

## Exercise 3: Drug Dose Response Analysis

Create a complete dose-response analysis with data collection:

In [None]:
# Drug testing data: [drug_name, concentration_uM, viability_percent]
drug_data = [
    ["Control", 0, 100],
    ["Drug_A", 0.1, 98],
    ["Drug_A", 1.0, 87],
    ["Drug_A", 10.0, 65],
    ["Drug_A", 100.0, 23],
    ["Drug_B", 0.1, 95],
    ["Drug_B", 1.0, 82],
    ["Drug_B", 10.0, 71],
    ["Drug_B", 100.0, 58],
    ["Drug_C", 0.1, 89],
    ["Drug_C", 1.0, 76],
    ["Drug_C", 10.0, 45],
    ["Drug_C", 100.0, 12]
]

# YOUR TASK: Create collection lists for different analyses
complete_results = []  # All results with toxicity category
drug_summaries = {}    # Summary for each drug (we'll use this later)
toxic_concentrations = []  # Concentrations with <50% viability
effective_drugs = []   # Drugs that show significant effect

print("Drug Dose-Response Analysis:")
print("=" * 60)
print(f"{'Drug':<8} {'Conc (μM)':<10} {'Viability':<10} {'Toxicity'}")
print("-" * 60)

# YOUR CODE HERE: Process each drug test:
# 1. Determine toxicity level:
#    - >80%: "Non-toxic"
#    - 50-80% (both ends included): "Moderately toxic"
#    - <50%: "Highly toxic"
# 2. Append complete result: [drug, concentration, viability, toxicity]
# 3. If viability <50%, append to toxic_concentrations
# 4. Track which drugs show effect (any concentration <80% viability)

for drug_test in drug_data:
    drug, concentration, viability = drug_test
    
    # YOUR CODE HERE
    # Determine toxicity level
    # Append to appropriate collections
    
    # Example code structure:
    # if viability > 80:
    #     toxicity = "Non-toxic"
    # elif viability >= 50:
    #     toxicity = "Moderately toxic"
    # else:
    #     toxicity = "Highly toxic"
    
    pass

# Print summary statistics
print(f"\nSummary:")
print(f"Total tests performed: {len(complete_results)}")
print(f"Highly toxic concentrations: {len(toxic_concentrations)}")
print(f"Drugs showing effect: {len(set(effective_drugs))}")
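
Threshold lists leave the boundary values open to interpretation. Wrapping the example `if/elif/else` from the loop in a small helper (the name `classify_toxicity` is hypothetical, not part of the exercise) makes it easy to test the edge cases at 80% and 50% directly:

```python
def classify_toxicity(viability):
    """Classify viability (%) using the exercise thresholds."""
    if viability > 80:
        return "Non-toxic"
    elif viability >= 50:
        return "Moderately toxic"
    else:
        return "Highly toxic"

# Boundary values: 80 and 50 both count as "Moderately toxic" here
for viability in [81, 80, 50, 49]:
    print(f"{viability}% -> {classify_toxicity(viability)}")
```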

## Building Data for File Output

The ultimate goal: prepare data in the format needed for writing to files:

In [None]:
# Example: Building data table for CSV export
def prepare_stock_solution_data():
    """Prepare complete stock solution data for file export."""
    
    # Input data
    reagents = [
        ["MG132", 475.6, 89.5],
        ["Rapamycin", 914.2, 125.3],
        ["Cycloheximide", 281.4, 45.8],
        ["Staurosporine", 466.5, 78.2],
        ["Wortmannin", 428.4, 65.0]
    ]
    
    # Prepare data for file writing
    # We'll create separate lists for each column
    names = []
    molecular_weights = []
    masses = []
    volumes = []
    solvents = []
    
    # Also create complete rows for another format
    complete_rows = []
    
    print("Preparing stock solution data:")
    print("-" * 50)
    
    for reagent in reagents:
        name, mw, mass = reagent
        
        # Calculate volume (10 mM stock)
        volume = calculate_volume(mw, mass, 10)
        
        # Determine solvent
        solvent = "DMSO"  # Default for organic compounds
        
        # Append to column lists
        names.append(name)
        molecular_weights.append(mw)
        masses.append(mass)
        volumes.append(round(volume, 2))
        solvents.append(solvent)
        
        # Append complete row
        complete_rows.append([name, mw, mass, round(volume, 2), solvent])
        
        print(f"{name:15}: {volume:6.2f} mL {solvent}")
    
    # Return both formats
    column_data = {
        'names': names,
        'molecular_weights': molecular_weights,
        'masses': masses,
        'volumes': volumes,
        'solvents': solvents
    }
    
    return column_data, complete_rows

# Prepare the data
columns, rows = prepare_stock_solution_data()

print(f"\nData prepared:")
print(f"Column format: {len(columns['names'])} entries per column")
print(f"Row format: {len(rows)} complete rows")

print(f"\nFirst few rows:")
for i, row in enumerate(rows[:3]):
    print(f"Row {i+1}: {row}")

print(f"\nColumn data sample:")
print(f"Names: {columns['names'][:3]}...")
print(f"Volumes: {columns['volumes'][:3]}...")
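
The column and row formats hold exactly the same data, and `zip()` converts one into the other. A sketch, assuming column lists shaped like the ones returned above (shortened to two reagents; volumes in mL, worked out by hand for a 10 mM stock):

```python
columns = {
    'names': ["MG132", "Rapamycin"],
    'volumes': [18.82, 13.71],
    'solvents': ["DMSO", "DMSO"],
}

# zip() takes the i-th item from each column to build row i
rows = []
for name, volume, solvent in zip(columns['names'], columns['volumes'], columns['solvents']):
    rows.append([name, volume, solvent])

print(rows)
```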

## Exercise 4: Complete Laboratory Data Pipeline

Create a complete data processing pipeline that prepares data for file export:

In [None]:
def process_microscopy_data(image_data):
    """Process microscopy cell counting data for export."""
    # YOUR TASK: Complete this function to:
    # 1. Process each image's cell count data
    # 2. Calculate summary statistics (mean; standard deviation as an optional challenge)
    # 3. Prepare data in both column and row formats
    # 4. Return prepared data
    
    # Initialize collection lists
    image_names = []
    cell_counts = []
    cell_densities = []  # cells per µm² (cell_count / area_um2)
    classifications = []  # low/medium/high density
    complete_rows = []
    
    # YOUR CODE HERE
    for image_info in image_data:
        # Extract: [image_name, cell_count, area_um2]
        # Calculate density = cell_count / area_um2
        # Classify: <0.1 = "Low", 0.1-0.3 = "Medium", >0.3 = "High"
        # Append to collections
        pass
    
    # Return data in format ready for file writing
    column_format = {
        'names': image_names,
        'counts': cell_counts,
        'densities': cell_densities,
        'classifications': classifications
    }
    
    return column_format, complete_rows

# Test data: [image_name, cell_count, area_um2]
microscopy_data = [
    ["Control_1.jpg", 45, 200],
    ["Control_2.jpg", 52, 200],
    ["Control_3.jpg", 38, 200],
    ["Treatment_1.jpg", 78, 200],
    ["Treatment_2.jpg", 92, 200],
    ["Treatment_3.jpg", 67, 200]
]

# Process the data
processed_columns, processed_rows = process_microscopy_data(microscopy_data)

print("Microscopy data processing complete:")
print(f"Processed {len(processed_rows)} images")
print("\nData ready for file export!")

## Common Patterns in Biological Data Processing

Here are some common patterns you'll use repeatedly:

In [None]:
# Pattern 1: Filter and collect
def filter_and_collect_pattern():
    all_samples = ["Sample_A", "Control_1", "Sample_B", "Control_2", "Sample_C"]
    
    # Sort samples into experimental and control groups
    experimental_samples = []
    control_samples = []
    
    for sample in all_samples:
        if "Control" in sample:
            control_samples.append(sample)
        else:
            experimental_samples.append(sample)
    
    return experimental_samples, control_samples

# Pattern 2: Transform and accumulate
def transform_and_accumulate_pattern():
    raw_values = [100, 87, 65, 42, 78, 55, 91]
    
    # Transform to percentages and collect statistics
    percentages = []
    total = 0
    count = 0
    
    for value in raw_values:
        # Values are already % of a 100-unit control; replace 100 with
        # your own reference value when working with raw data
        percentage = (value / 100) * 100
        percentages.append(percentage)
        total += percentage
        count += 1
    
    average = total / count
    return percentages, average

# Pattern 3: Group and summarize
def group_and_summarize_pattern():
    # Data: [group, value]
    grouped_data = [
        ["Group_A", 45], ["Group_A", 52], ["Group_A", 38],
        ["Group_B", 67], ["Group_B", 71], ["Group_B", 58],
        ["Group_C", 82], ["Group_C", 89], ["Group_C", 76]
    ]
    
    # Group by category
    group_a_values = []
    group_b_values = []
    group_c_values = []
    
    for group, value in grouped_data:
        if group == "Group_A":
            group_a_values.append(value)
        elif group == "Group_B":
            group_b_values.append(value)
        elif group == "Group_C":
            group_c_values.append(value)
    
    # Calculate summaries
    summaries = {
        'Group_A': sum(group_a_values) / len(group_a_values),
        'Group_B': sum(group_b_values) / len(group_b_values),
        'Group_C': sum(group_c_values) / len(group_c_values)
    }
    
    return summaries

# Test the patterns
exp_samples, ctrl_samples = filter_and_collect_pattern()
print(f"Filter pattern: {len(exp_samples)} experimental, {len(ctrl_samples)} control")

percentages, avg = transform_and_accumulate_pattern()
print(f"Transform pattern: {len(percentages)} values, average = {avg:.1f}")

group_summary = group_and_summarize_pattern()
print(f"Group pattern: {len(group_summary)} groups summarized")
for group, average in group_summary.items():
    print(f"  {group}: {average:.1f}")
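
Pattern 3 needs a new list and a new `elif` for every group. A dictionary whose keys are group names and whose values are lists removes that limit. A sketch on the same data:

```python
grouped_data = [
    ["Group_A", 45], ["Group_A", 52], ["Group_A", 38],
    ["Group_B", 67], ["Group_B", 71], ["Group_B", 58],
    ["Group_C", 82], ["Group_C", 89], ["Group_C", 76],
]

# Build {group_name: [values...]} without knowing the groups in advance
values_by_group = {}
for group, value in grouped_data:
    if group not in values_by_group:
        values_by_group[group] = []   # first time we see this group
    values_by_group[group].append(value)

# Summarize each group
summaries = {}
for group, values in values_by_group.items():
    summaries[group] = sum(values) / len(values)

for group, average in summaries.items():
    print(f"  {group}: {average:.1f}")
```

The same approach fits the `drug_summaries = {}` dictionary in Exercise 3.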

## Summary: Key Concepts

### Data Collection Patterns
- **Simple appending**: `results.append(value)`
- **Row appending**: `results.append([col1, col2, col3])`
- **Conditional collection**: Only append if condition is met
- **Multiple collections**: Separate lists for different categories
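
Simple and conditional collection are common enough that Python also offers a compact form, the list comprehension. The loop below and the one-line comprehension after it build the same list; the loop form is still perfectly fine and easier to extend:

```python
field_counts = [45, 52, 38, 61, 47, 55, 42, 58, 49, 53]

# Loop form of conditional collection
high_counts = []
for count in field_counts:
    if count > 50:
        high_counts.append(count)

# Equivalent list comprehension
high_counts_compact = [count for count in field_counts if count > 50]

print(high_counts)
print(high_counts_compact)
```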

### Essential Techniques
- **Initialize empty lists** before your loop
- **Process inside loop** - calculate, transform, categorize
- **Append results** to appropriate collections
- **Summarize after loop** - totals, averages, counts

### Data Formats for File Writing
- **Column format**: Separate lists for each data column
- **Row format**: List of complete data rows
- **Mixed approach**: Collect both formats for flexibility

### Biological Applications
- **Experimental results**: Process measurements, categorize outcomes
- **Quality control**: Filter data, flag issues
- **Statistical analysis**: Calculate means, ranges, classifications
- **Data preparation**: Format for Excel/CSV export

## Next Up: Reading and Writing Files

Now that you can:
- ✅ **Find files** with paths and file system knowledge
- ✅ **Process data** with loops and build collections

You're ready to learn:
- **Reading data from CSV files** into your Python lists
- **Writing your results back to files** for sharing and storage
- **Complete automation** of laboratory data workflows

The pieces are all coming together!