# Reading and Writing Files: Connecting to Real Lab Data

This is where everything comes together! You'll learn to:
- **Read data from CSV files** (your lab instrument exports)
- **Process that data** using functions and loops
- **Write results back to files** for sharing and analysis
- **Automate your entire lab workflow**

No more manual data entry or copy-pasting results!

## Python's CSV Module

CSV (Comma Separated Values) files are the universal language of lab data:
- Excel can export to CSV
- Lab instruments often output CSV
- Python can easily read and write CSV
- Readable by most analysis software

In [None]:
import csv
import os

# First, let's create a sample CSV file to work with
sample_data = """Reagent,MW,Mass_mg
MG132,475.6,89.5
Rapamycin,914.2,125.3
Cycloheximide,281.4,45.8
Staurosporine,466.5,78.2
Wortmannin,428.4,65.0"""

# Write this to a file
with open('reagents.csv', 'w') as file:
    file.write(sample_data)

print("Created sample file: reagents.csv")
print(f"File exists: {os.path.exists('reagents.csv')}")
print(f"File size: {os.path.getsize('reagents.csv')} bytes")

## Reading CSV Files: The Basics

Let's read our CSV file and see what's inside:

In [None]:
# Method 1: Read as raw text first (to see what we're working with)
print("Raw file contents:")
print("-" * 20)
with open('reagents.csv', 'r') as file:
    content = file.read()
    print(content)

print("\n" + "=" * 40 + "\n")

# Method 2: Read with CSV module
print("Reading with csv.reader():")
print("-" * 25)

with open('reagents.csv', 'r') as file:
    csv_reader = csv.reader(file)
    
    for row_number, row in enumerate(csv_reader):
        print(f"Row {row_number}: {row}")

## Reading CSV into Lists (Our Standard Method)

Convert CSV data into the list format we've been using:

In [None]:
# Read CSV data into separate lists (like in our lecture)
def read_reagent_data(filename):
    """Read reagent data from CSV file into separate lists."""
    names = []
    molecular_weights = []
    masses = []
    
    with open(filename, 'r') as file:
        csv_reader = csv.reader(file)
        next(csv_reader)  # Skip header row
        
        for row in csv_reader:
            # Each row is a list: ['MG132', '475.6', '89.5']
            name = row[0]
            mw = float(row[1])    # Convert string to number
            mass = float(row[2])  # Convert string to number
            
            names.append(name)
            molecular_weights.append(mw)
            masses.append(mass)
    
    return names, molecular_weights, masses

# Test our function
reagent_names, reagent_mws, reagent_masses = read_reagent_data('reagents.csv')

print(f"Loaded {len(reagent_names)} reagents:")
print(f"Names: {reagent_names}")
print(f"MWs: {reagent_mws}")
print(f"Masses: {reagent_masses}")

# Verify the data
print("\nFirst reagent details:")
print(f"Name: {reagent_names[0]}")
print(f"MW: {reagent_mws[0]} g/mol")
print(f"Mass: {reagent_masses[0]} mg")
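
Why the `float()` calls matter: `csv.reader` hands back every value as a string, so arithmetic on unconverted values does not do what you might expect. A small illustration using the first data row (the variable names here are just for this example):

```python
# One row exactly as csv.reader returns it: every item is a string
row = ['MG132', '475.6', '89.5']

# Adding strings glues them together instead of doing arithmetic
joined_text = row[1] + row[2]
print(joined_text)  # 475.689.5

# Converting first gives real numbers we can calculate with
numeric_sum = float(row[1]) + float(row[2])
print(f"{numeric_sum:.1f}")  # 565.1
```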

## Processing the Data (Just Like Step 3!)

Now we can use our familiar functions and loops:

In [None]:
# Our stock solution function from previous lectures
def calculate_volume(molecular_weight, mass_mg, concentration_mM):
    """Calculate solvent volume (in mL) needed for stock solution."""
    # mg / (g/mol) = mmol; mmol / mM = L; multiply by 1000 to get mL
    return mass_mg / (molecular_weight * concentration_mM) * 1000

# Process all reagents loaded from file
print("Stock Solution Preparation (10 mM):")
print("=" * 40)

# Lists to collect results
calculated_volumes = []
solvents = []

for i in range(len(reagent_names)):
    name = reagent_names[i]
    mw = reagent_mws[i]
    mass = reagent_masses[i]
    
    # Calculate volume
    volume = calculate_volume(mw, mass, 10)  # 10 mM stock
    
    # Determine solvent (simple logic)
    solvent = "DMSO"  # Assumed for every reagent here; check each compound's datasheet
    
    # Store results
    calculated_volumes.append(round(volume, 2))
    solvents.append(solvent)
    
    print(f"{name:15}: Add {volume:6.2f} mL {solvent}")

print(f"\nProcessed {len(reagent_names)} reagents from file!")
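
A quick unit check on the volume formula may help here. Mass in mg divided by molecular weight in g/mol gives mmol, and mmol divided by a concentration in mM (mmol/L) gives liters, so reporting mL needs a factor of 1000. The sketch below walks through that arithmetic for MG132; the helper name `volume_ml_for_stock` is illustrative and not part of the lecture code.

```python
def volume_ml_for_stock(molecular_weight, mass_mg, concentration_mM):
    """Return solvent volume in mL for a stock of the given concentration."""
    millimoles = mass_mg / molecular_weight   # mg / (g/mol) = mmol
    liters = millimoles / concentration_mM    # mmol / (mmol/L) = L
    return liters * 1000                      # L -> mL

# MG132 from reagents.csv: 89.5 mg, MW 475.6 g/mol, 10 mM stock
mg132_volume = volume_ml_for_stock(475.6, 89.5, 10)
print(f"MG132: add {mg132_volume:.2f} mL")  # MG132: add 18.82 mL
```

Writing the units next to each step makes it easy to spot a missing conversion before it reaches the bench.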

## Writing Results to CSV Files

Save your calculated results back to a new CSV file:

In [None]:
# Write results to a new CSV file
def write_stock_solutions(names, mws, masses, volumes, solvents, filename):
    """Write stock solution results to CSV file."""
    
    with open(filename, 'w', newline='') as file:
        csv_writer = csv.writer(file)
        
        # Write header row
        header = ['Reagent', 'MW_g_per_mol', 'Mass_mg', 'Volume_mL', 'Solvent']
        csv_writer.writerow(header)
        
        # Write data rows
        for i in range(len(names)):
            row = [names[i], mws[i], masses[i], volumes[i], solvents[i]]
            csv_writer.writerow(row)
    
    print(f"Results written to {filename}")

# Write our results
output_filename = 'stock_solutions.csv'
write_stock_solutions(
    reagent_names, 
    reagent_mws, 
    reagent_masses, 
    calculated_volumes, 
    solvents, 
    output_filename
)

# Verify the file was created
print(f"File created: {os.path.exists(output_filename)}")
print(f"File size: {os.path.getsize(output_filename)} bytes")

# Let's read it back to see what we created
print("\nContents of output file:")
print("-" * 25)
with open(output_filename, 'r') as file:
    content = file.read()
    print(content)

## Exercise 1: Complete File Processing Pipeline

Create your own complete data processing pipeline:

In [None]:
# First, create a sample experimental data file
experiment_data = """Sample,Treatment,Concentration_uM,Viability_percent
Control_1,Control,0,100
Control_2,Control,0,98
Control_3,Control,0,102
Drug_A_1,Drug_A,1,87
Drug_A_2,Drug_A,1,89
Drug_A_3,Drug_A,1,85
Drug_A_4,Drug_A,10,65
Drug_A_5,Drug_A,10,67
Drug_A_6,Drug_A,10,63
Drug_B_1,Drug_B,1,92
Drug_B_2,Drug_B,1,94
Drug_B_3,Drug_B,1,90
Drug_B_4,Drug_B,10,78
Drug_B_5,Drug_B,10,76
Drug_B_6,Drug_B,10,80"""

with open('experiment_data.csv', 'w') as file:
    file.write(experiment_data)

print("Created experiment_data.csv")

# YOUR TASK: Complete the pipeline
def process_experiment_data(input_filename, output_filename):
    """Read experiment data, analyze it, and write results."""
    
    # Step 1: Read data from CSV
    samples = []
    treatments = []
    concentrations = []
    viabilities = []
    
    # YOUR CODE HERE: Read the CSV file
    # Hint: Use csv.reader, skip header, extract each column
    
    with open(input_filename, 'r') as file:
        # YOUR CODE HERE
        pass
    
    # Step 2: Process and analyze data
    analyzed_samples = []
    analyzed_treatments = []
    analyzed_concentrations = []
    analyzed_viabilities = []
    toxicity_categories = []
    
    # YOUR CODE HERE: Loop through data and categorize toxicity
    # Toxicity: >80% = "Non-toxic", 60-80% (inclusive) = "Moderate", <60% = "Toxic"
    
    for i in range(len(samples)):
        # YOUR CODE HERE
        pass
    
    # Step 3: Write results to new CSV
    # YOUR CODE HERE: Write analyzed data with toxicity categories
    
    with open(output_filename, 'w', newline='') as file:
        # YOUR CODE HERE
        pass
    
    return len(samples)  # Return number of samples processed

# Test your pipeline
num_processed = process_experiment_data('experiment_data.csv', 'analyzed_results.csv')
print(f"\nProcessed {num_processed} samples")

# Check if output file was created
if os.path.exists('analyzed_results.csv'):
    print("✓ Analysis complete - results saved!")
else:
    print("✗ Something went wrong - check your code")
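
If the threshold logic in Step 2 is the sticking point, one possible way to express it is a small helper function. This is only a sketch: the name `categorize_toxicity` is hypothetical, and it assumes that viabilities of exactly 60% and 80% count as "Moderate".

```python
def categorize_toxicity(viability_percent):
    """Map a viability percentage to a toxicity label."""
    if viability_percent > 80:
        return "Non-toxic"
    elif viability_percent >= 60:   # Assumption: 60 and 80 fall in "Moderate"
        return "Moderate"
    else:
        return "Toxic"

# Try a few values, including the boundaries
for viability in [100, 80, 65, 59]:
    print(f"{viability}% -> {categorize_toxicity(viability)}")
```

Checking the boundary values on purpose is a good habit, because that is where category logic most often goes wrong.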

## Working with Multiple Files

In real labs, you often have multiple data files to process:

In [None]:
# Create multiple sample files
def create_sample_files():
    """Create multiple sample data files."""
    
    # File 1: Monday's experiment
    monday_data = """Sample,Count
Control_1,45
Control_2,52
Treatment_1,78
Treatment_2,82"""
    
    # File 2: Tuesday's experiment  
    tuesday_data = """Sample,Count
Control_1,48
Control_2,49
Treatment_1,85
Treatment_2,89"""
    
    # File 3: Wednesday's experiment
    wednesday_data = """Sample,Count
Control_1,46
Control_2,51
Treatment_1,92
Treatment_2,88"""
    
    files = {
        'monday_cells.csv': monday_data,
        'tuesday_cells.csv': tuesday_data,
        'wednesday_cells.csv': wednesday_data
    }
    
    for filename, data in files.items():
        with open(filename, 'w') as file:
            file.write(data)
    
    return list(files.keys())

# Create the files
data_files = create_sample_files()
print(f"Created {len(data_files)} data files:")
for filename in data_files:
    print(f"  {filename}")

# Process all files and combine results
def process_multiple_files(file_list):
    """Process multiple data files and combine results."""
    
    all_samples = []
    all_counts = []
    all_days = []
    
    for filename in file_list:
        # Extract day from filename
        day = filename.split('_')[0]  # 'monday' from 'monday_cells.csv'
        
        print(f"\nProcessing {filename} ({day})...")
        
        with open(filename, 'r') as file:
            csv_reader = csv.reader(file)
            next(csv_reader)  # Skip header
            
            for row in csv_reader:
                sample = row[0]
                count = int(row[1])
                
                all_samples.append(sample)
                all_counts.append(count)
                all_days.append(day)
                
                print(f"  {sample}: {count} cells")
    
    return all_samples, all_counts, all_days

# Process all files
samples, counts, days = process_multiple_files(data_files)

print(f"\nCombined data: {len(samples)} total measurements")
print(f"Days: {set(days)}")
print(f"Sample types: {set(samples)}")
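
Rather than typing every filename by hand, the file list could also be built by pattern matching. Here is a minimal sketch using the standard-library `glob` module, assuming your files follow the `<day>_cells.csv` naming used above. It works inside a temporary folder so it does not touch your data.

```python
import glob
import os
import tempfile

# Work in a throwaway folder so the sketch does not touch real data
with tempfile.TemporaryDirectory() as folder:
    for day in ["monday", "tuesday", "wednesday"]:
        path = os.path.join(folder, f"{day}_cells.csv")
        with open(path, "w") as file:
            file.write("Sample,Count\nControl_1,45\n")

    # Find every file ending in _cells.csv; sorted() gives a repeatable order
    pattern = os.path.join(folder, "*_cells.csv")
    found_files = sorted(glob.glob(pattern))

    # os.path.basename strips the folder, so split('_') sees only the filename
    found_days = []
    for path in found_files:
        found_days.append(os.path.basename(path).split("_")[0])

print(f"Found {len(found_files)} files for days: {found_days}")
```

Note that `sorted()` orders names alphabetically, which is not the same as chronologically for weekday names. Date-stamped filenames avoid that problem.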

## Exercise 2: Multi-File Analysis

Write the combined results and create a summary analysis:

In [None]:
# YOUR TASK: Analyze the combined data and write comprehensive results

def analyze_and_write_combined_data(samples, counts, days, output_file):
    """Analyze combined data and write comprehensive results."""
    
    # Worked solution, in three steps:
    # 1. Write detailed results with all data
    # 2. Calculate averages for each sample type
    # 3. Write a summary file with averages
    
    # Write detailed results
    with open(output_file, 'w', newline='') as file:
        csv_writer = csv.writer(file)
        
        # Write header
        csv_writer.writerow(['Day', 'Sample', 'Count', 'Sample_Type'])
        
        # Write all data with sample type classification
        for i in range(len(samples)):
            sample = samples[i]
            count = counts[i]
            day = days[i]
            
            # Classify sample type
            if 'Control' in sample:
                sample_type = 'Control'
            else:
                sample_type = 'Treatment'
            
            csv_writer.writerow([day.title(), sample, count, sample_type])
    
    # Calculate and write summary
    # Calculate averages for Control vs Treatment
    control_counts = []
    treatment_counts = []
    
    for i in range(len(samples)):
        if 'Control' in samples[i]:
            control_counts.append(counts[i])
        else:
            treatment_counts.append(counts[i])
    
    control_avg = sum(control_counts) / len(control_counts)
    treatment_avg = sum(treatment_counts) / len(treatment_counts)
    fold_change = treatment_avg / control_avg
    
    # Write summary file
    summary_file = 'experiment_summary.csv'
    with open(summary_file, 'w', newline='') as file:
        csv_writer = csv.writer(file)
        csv_writer.writerow(['Metric', 'Value'])
        csv_writer.writerow(['Total_samples', len(samples)])
        csv_writer.writerow(['Control_average', round(control_avg, 1)])
        csv_writer.writerow(['Treatment_average', round(treatment_avg, 1)])
        csv_writer.writerow(['Fold_change', round(fold_change, 2)])
        csv_writer.writerow(['Days_analyzed', len(set(days))])
    
    return control_avg, treatment_avg, fold_change

# Run the analysis
ctrl_avg, treat_avg, fold = analyze_and_write_combined_data(
    samples, counts, days, 'combined_results.csv'
)

print("Analysis Results:")
print(f"Control average: {ctrl_avg:.1f} cells")
print(f"Treatment average: {treat_avg:.1f} cells")
print(f"Fold change: {fold:.2f}x")

print("\nFiles created:")
if os.path.exists('combined_results.csv'):
    print("✓ combined_results.csv")
if os.path.exists('experiment_summary.csv'):
    print("✓ experiment_summary.csv")

## Error Handling: Safe File Operations

Real lab work requires robust error handling:

In [None]:
def safe_read_csv(filename):
    """Safely read CSV file with error handling."""
    
    # Check if file exists
    if not os.path.exists(filename):
        print(f"Error: File '{filename}' not found!")
        return None, None, None
    
    # Check if file is empty
    if os.path.getsize(filename) == 0:
        print(f"Error: File '{filename}' is empty!")
        return None, None, None
    
    try:
        names = []
        values1 = []
        values2 = []
        
        with open(filename, 'r') as file:
            csv_reader = csv.reader(file)
            
            # Skip header
            header = next(csv_reader)
            print(f"Reading file with columns: {header}")
            
            row_count = 0
            # enumerate counts every data row, so warnings cite the true row number
            for row_number, row in enumerate(csv_reader, start=1):
                if len(row) < 3:  # Ensure we have enough columns
                    print(f"Warning: Row {row_number} has insufficient data: {row}")
                    continue
                
                try:
                    name = row[0]
                    val1 = float(row[1])  # This might fail if not a number
                    val2 = float(row[2])  # This might fail if not a number
                    
                    names.append(name)
                    values1.append(val1)
                    values2.append(val2)
                    row_count += 1
                    
                except ValueError as e:
                    print(f"Warning: Could not convert data in row {row_number}: {row}")
                    print(f"  Error: {e}")
                    continue
            
            print(f"Successfully read {row_count} rows")
            return names, values1, values2
            
    except Exception as e:
        print(f"Error reading file '{filename}': {e}")
        return None, None, None

# Test with our good file
print("Testing with good file:")
names, mws, masses = safe_read_csv('reagents.csv')
if names:
    print(f"✓ Successfully loaded {len(names)} reagents")

print("\nTesting with non-existent file:")
result = safe_read_csv('nonexistent.csv')
if result[0] is None:
    print("✓ Error handling worked correctly")

# Create a file with bad data to test error handling
bad_data = """Name,Value1,Value2
Good_row,123.4,567.8
Bad_row,not_a_number,567.8
Another_good,999.9,111.1
Incomplete_row,123.4
Final_good,456.7,890.1"""

with open('bad_data.csv', 'w') as file:
    file.write(bad_data)

print("\nTesting with file containing errors:")
names, vals1, vals2 = safe_read_csv('bad_data.csv')
if names:
    print(f"✓ Recovered {len(names)} good rows despite errors")
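
An alternative to checking `os.path.exists()` first is to simply try opening the file and catch the specific exception Python raises when it is missing. A minimal sketch, with `count_data_rows` as a hypothetical helper name:

```python
import csv

def count_data_rows(filename):
    """Return the number of data rows in a CSV file, or None if it is missing."""
    try:
        with open(filename, 'r', newline='') as file:
            csv_reader = csv.reader(file)
            next(csv_reader, None)  # Skip header; None avoids an error on an empty file
            row_total = 0
            for row in csv_reader:
                row_total += 1
            return row_total
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found!")
        return None

# Assumes no file with this name exists in the current folder
missing_result = count_data_rows('this_file_does_not_exist.csv')
print(missing_result)
```

Catching `FileNotFoundError` by name keeps other, unexpected problems visible instead of hiding them behind a generic message.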

## Exercise 3: Complete Laboratory Automation System

Build a complete system that could actually be used in your lab:

In [None]:
def lab_automation_system():
    """Complete laboratory data processing system."""
    
    print("🧪 Laboratory Data Processing System")
    print("=" * 40)
    
    # Step 1: Create sample lab data
    print("Step 1: Creating sample lab data...")
    
    # Protein quantification data
    protein_data = """Sample_ID,A595,Dilution_factor
Standard_0,0.000,1
Standard_25,0.125,1
Standard_50,0.250,1
Standard_100,0.485,1
Unknown_1,0.342,10
Unknown_2,0.198,10
Unknown_3,0.567,5
Unknown_4,0.423,10
Unknown_5,0.289,10"""
    
    with open('protein_assay.csv', 'w') as file:
        file.write(protein_data)
    
    # Step 2: Read and process data
    print("Step 2: Reading and processing data...")
    
    # YOUR TASK: Complete this analysis
    # 1. Read the protein assay data
    # 2. Calculate protein concentrations using standard curve
    # 3. Account for dilution factors
    # 4. Write results with analysis
    
    sample_ids = []
    absorbances = []
    dilution_factors = []
    
    # YOUR CODE HERE: Read the data
    with open('protein_assay.csv', 'r') as file:
        # YOUR CODE HERE
        pass
    
    # Process and calculate concentrations
    # Simple linear standard curve: Concentration = A595 * 206.2 (μg/mL)
    # This is a simplified calculation - real curves are more complex!
    
    calculated_concentrations = []
    final_concentrations = []
    sample_types = []
    
    # YOUR CODE HERE: Process each sample
    for i in range(len(sample_ids)):
        # YOUR CODE HERE:
        # 1. Calculate concentration from absorbance
        # 2. Apply dilution factor
        # 3. Classify as Standard or Unknown
        pass
    
    # Step 3: Write comprehensive results
    print("Step 3: Writing results...")
    
    # YOUR CODE HERE: Write detailed results file
    
    # Step 4: Generate summary report
    print("Step 4: Generating summary...")
    
    # YOUR CODE HERE: Create summary statistics
    
    print("\n✅ Laboratory automation complete!")
    print("Files generated:")
    print("  • protein_assay_results.csv (detailed results)")
    print("  • protein_assay_summary.csv (summary statistics)")
    
    return len(sample_ids)

# Run the complete system
# samples_processed = lab_automation_system()
# print(f"\nTotal samples processed: {samples_processed}")

# For now, just show the structure
print("Complete laboratory automation system ready!")
print("Uncomment the lines above to run the full analysis.")
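
Before filling in the loop, it can help to work through the calculation for a single sample. The sketch below uses the simplified curve from the exercise and assumes the original concentration is the measured value multiplied by the dilution factor. The names `CURVE_FACTOR` and `protein_concentration` are made up for this illustration.

```python
# Simplified curve from the exercise: concentration = A595 * 206.2 (μg/mL)
CURVE_FACTOR = 206.2

def protein_concentration(a595, dilution_factor):
    """Return (measured, original) concentrations in μg/mL."""
    measured = a595 * CURVE_FACTOR          # concentration in the diluted sample
    original = measured * dilution_factor   # concentration before dilution
    return measured, original

# Unknown_1 from protein_assay.csv: A595 = 0.342, diluted 10x
measured, original = protein_concentration(0.342, 10)
print(f"Measured: {measured:.1f} μg/mL, original: {original:.1f} μg/mL")
```

The same two lines of arithmetic go inside your loop, once per sample.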

## File Organization Best Practices

For real lab work, organize your files properly:

In [None]:
import datetime

def organize_lab_files():
    """Demonstrate good file organization practices."""
    
    # Create organized directory structure
    today = datetime.date.today()
    date_str = today.strftime("%Y_%m_%d")
    
    # Create directories
    directories = [
        f"Lab_Data_{date_str}",
        f"Lab_Data_{date_str}/Raw_Data",
        f"Lab_Data_{date_str}/Processed_Data",
        f"Lab_Data_{date_str}/Results",
        f"Lab_Data_{date_str}/Reports"
    ]
    
    for directory in directories:
        os.makedirs(directory, exist_ok=True)  # No error if it already exists
    
    print(f"Created organized structure for {date_str}:")
    for directory in directories:
        print(f"  📁 {directory}")
    
    # Example of organized file naming
    file_examples = [
        f"Lab_Data_{date_str}/Raw_Data/protein_assay_raw_{date_str}.csv",
        f"Lab_Data_{date_str}/Processed_Data/protein_concentrations_{date_str}.csv",
        f"Lab_Data_{date_str}/Results/analysis_summary_{date_str}.csv",
        f"Lab_Data_{date_str}/Reports/experiment_report_{date_str}.txt"
    ]
    
    print("\nOrganized file naming examples:")
    for example in file_examples:
        print(f"  📄 {example}")
    
    return directories

# Demonstrate organization
created_dirs = organize_lab_files()
print(f"\n✅ Created {len(created_dirs)} organized directories")

# Best practices summary
print("\n📋 File Organization Best Practices:")
practices = [
    "Use dates in YYYY_MM_DD format for chronological sorting",
    "Separate raw data, processed data, and results",
    "Use descriptive filenames: 'protein_assay_2024_01_15.csv'",
    "Avoid spaces in filenames - use underscores instead",
    "Keep a consistent naming convention across projects",
    "Back up important data files regularly"
]

for i, practice in enumerate(practices, 1):
    print(f"  {i}. {practice}")
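
The naming convention above can be wrapped in a small helper so every file gets the same pattern. This is a sketch only: `dated_path` is a hypothetical name, and it uses `os.path.join` so the folder separator is correct on any operating system.

```python
import datetime
import os

def dated_path(base_folder, subfolder, label, day):
    """Build a path like base_folder/subfolder/label_YYYY_MM_DD.csv."""
    date_str = day.strftime("%Y_%m_%d")
    filename = f"{label}_{date_str}.csv"
    return os.path.join(base_folder, subfolder, filename)

# A fixed example date keeps the output predictable
example_day = datetime.date(2024, 1, 15)
example_path = dated_path("Lab_Data_2024_01_15", "Raw_Data", "protein_assay", example_day)
print(example_path)
```

For today's files, pass `datetime.date.today()` as the last argument.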

## Cleanup: Remove Practice Files

Clean up the files we created for practice:

In [None]:
import shutil

# List of files to clean up
practice_files = [
    'reagents.csv',
    'stock_solutions.csv',
    'experiment_data.csv',
    'analyzed_results.csv',
    'monday_cells.csv',
    'tuesday_cells.csv', 
    'wednesday_cells.csv',
    'combined_results.csv',
    'experiment_summary.csv',
    'bad_data.csv',
    'protein_assay.csv'
]

print("Cleaning up practice files...")
cleaned_files = 0

for filename in practice_files:
    if os.path.exists(filename):
        os.remove(filename)
        cleaned_files += 1
        print(f"  ✓ Removed {filename}")

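# Clean up directories (be careful: this deletes EVERY folder matching Lab_Data_*,
# including ones from earlier days that may hold real data!)
import glob
lab_dirs = glob.glob('Lab_Data_*')
for directory in lab_dirs:
    if os.path.isdir(directory):  # Only remove folders, never a matching file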
        shutil.rmtree(directory)
        cleaned_files += 1
        print(f"  ✓ Removed directory {directory}")

print(f"\n🧹 Cleaned up {cleaned_files} items")
print("Workspace is clean and ready for your real lab data!")

## Summary: Your Complete Lab Automation Toolkit

### Reading CSV Files
- **`csv.reader(file)`**: Read CSV row by row
- **`next(csv_reader)`**: Skip header row
- **Convert data types**: `float(row[1])` for numerical data
- **Error handling**: Check file exists, handle bad data

### Processing Data
- **Use familiar patterns**: Functions, loops, and lists
- **Collect results**: Build lists for each output column
- **Calculate and categorize**: Apply your analysis logic
- **Handle edge cases**: Missing data, invalid values

### Writing CSV Files
- **`csv.writer(file)`**: Create CSV writer
- **`writerow([data])`**: Write single row
- **`writerows([rows])`**: Write multiple rows at once
- **Always include headers**: Make files self-documenting
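
Since `writerows()` is listed here but not used in the cells above, here is a minimal sketch of it. The reagent names and volumes are illustrative values, and the file is written to a temporary folder so nothing is left behind.

```python
import csv
import os
import tempfile

names = ['MG132', 'Rapamycin']
volumes = [18.82, 13.71]  # Illustrative values in mL

# Build one list per row, then write them all with a single call
rows = []
for i in range(len(names)):
    rows.append([names[i], volumes[i]])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'volumes.csv')
    with open(path, 'w', newline='') as file:
        csv_writer = csv.writer(file)
        csv_writer.writerow(['Reagent', 'Volume_mL'])  # Header row
        csv_writer.writerows(rows)                     # All data rows at once

    # Read the file back to confirm what was written
    with open(path, 'r', newline='') as file:
        written_rows = list(csv.reader(file))

print(written_rows)
```

Notice that the numbers come back as strings when the file is read, which is why the reading code converts them with `float()`.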

### File Organization
- **Descriptive names**: Include dates and experiment details
- **Logical structure**: Raw data → Processed data → Results
- **Consistent naming**: Develop and stick to conventions
- **Error handling**: Always check files exist before reading

## Real-World Applications

You can now automate:
- **Stock solution calculations** from reagent inventories
- **Protein assay analysis** from plate reader exports
- **Cell counting data** from microscopy sessions
- **qPCR analysis** from instrument output files
- **Drug screening results** from multiple experiments
- **Any repetitive lab calculation** that involves data files

## The Complete Workflow

```python
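# The function names here are placeholders; substitute your own,
# such as read_reagent_data() and write_stock_solutions() from above.

# 1. Read data from CSV
data = read_csv_file('lab_data.csv')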

# 2. Process with functions and loops
results = process_data(data)

# 3. Write results back to CSV
write_csv_file(results, 'analysis_results.csv')

# 4. Your lab workflow is now automated!
```

## Next Steps

You've completed the foundation of laboratory programming:
- ✅ **Variables and calculations**
- ✅ **Functions for reusable code**
- ✅ **Lists and loops for multiple data**
- ✅ **File paths and system navigation**
- ✅ **Data collection and organization**
- ✅ **Reading and writing CSV files**

**You're ready to automate your real lab work!**

In future lectures, you'll learn:
- **Advanced data structures** (dictionaries, sets)
- **Data analysis libraries** (pandas, numpy)
- **Visualization** (matplotlib, seaborn)
- **Statistical analysis** and machine learning

But with what you know now, you can already save hours of manual work every week!