# Episode 6: Analyzing Data from Multiple Files

Real research projects rarely keep all of their data in a single file. In this notebook, we'll learn to read and write files, process multiple datasets, and build automated analysis workflows for inflammation data.

## Learning Objectives
- Read and write files in Python
- Process multiple data files automatically
- Use glob patterns to find files
- Handle different file formats (CSV, TSV, JSON, text)
- Build file processing workflows
- Organize and save analysis results

## Introduction

In our inflammation study, we have data from multiple patients stored in separate files. Instead of analyzing each file manually, we'll learn to automate the process and analyze all files systematically.

## 1. Reading Files

Let's start with basic file operations:

In [None]:
# Basic file reading
import os

# Create a sample data file for demonstration
sample_data = """# Inflammation data for Patient P001
# Day, Inflammation Level
1, 0.0
2, 1.5
3, 3.2
4, 4.1
5, 2.8
6, 1.9
7, 0.8
8, 0.0
"""

# Write sample file
with open('sample_patient.txt', 'w') as f:
    f.write(sample_data)

print("Sample file created: sample_patient.txt")
print("File exists:", os.path.exists('sample_patient.txt'))
print("File size:", os.path.getsize('sample_patient.txt'), "bytes")

In [None]:
# Reading the entire file
with open('sample_patient.txt', 'r') as f:
    content = f.read()
    
print("File contents:")
print(content)
print(f"Content type: {type(content)}")
print(f"Content length: {len(content)} characters")

In [None]:
# Reading line by line
print("Reading line by line:")
with open('sample_patient.txt', 'r') as f:
    for line_number, line in enumerate(f, 1):
        print(f"Line {line_number}: {line.strip()}")

In [None]:
# Reading all lines into a list
with open('sample_patient.txt', 'r') as f:
    lines = f.readlines()

print(f"Read {len(lines)} lines:")
for i, line in enumerate(lines):
    print(f"  {i}: {repr(line)}")

# Clean up lines (remove newlines and whitespace)
clean_lines = [line.strip() for line in lines]
print(f"\nCleaned lines:")
for i, line in enumerate(clean_lines):
    print(f"  {i}: '{line}'")
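
When only the data rows matter, the line-by-line loop above can be wrapped in a small generator that skips blank lines and `#` comments. The following is a minimal sketch of that idea; the helper name `iter_data_lines` and the temporary demo file are illustrative assumptions, not part of the lesson's own code.

```python
import os
import tempfile


def iter_data_lines(path):
    """Yield stripped, non-empty, non-comment lines from a text file."""
    with open(path, 'r') as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                yield line


# Demonstrate on a small temporary file so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo_patient.txt')
    with open(demo_path, 'w') as f:
        f.write("# Day, Inflammation Level\n1, 0.0\n\n2, 1.5\n")

    data_lines = list(iter_data_lines(demo_path))

print(data_lines)
```

Because the generator reads one line at a time, it should also work for files that are too large to load with `read()` or `readlines()`.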

## 2. Parsing Data Files

Extract meaningful data from files:

In [None]:
def parse_inflammation_file(filename):
    """Parse an inflammation data file and extract numeric data."""
    inflammation_data = []
    metadata = {}
    
    with open(filename, 'r') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            
            # Skip empty lines
            if not line:
                continue
            
            # Handle comment lines (metadata)
            if line.startswith('#'):
                if 'Patient' in line:
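                    # Extract patient ID (e.g. "P001") from the comment.
                    # Require digits after "P" so the word "Patient" is not matched.
                    parts = line.split()
                    for part in parts:
                        if part.startswith('P') and part[1:].isdigit():
                            metadata['patient_id'] = part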
                continue
            
            # Parse data lines
            try:
                if ',' in line:
                    # CSV format: day, inflammation
                    day_str, inflammation_str = line.split(',')
                    day = int(day_str.strip())
                    inflammation = float(inflammation_str.strip())
                    inflammation_data.append((day, inflammation))
                else:
                    # Single value per line
                    inflammation = float(line)
                    inflammation_data.append(inflammation)
            except ValueError as e:
                print(f"Warning: Could not parse line {line_num}: '{line}' ({e})")
                continue
    
    return inflammation_data, metadata

# Test the parser
data, meta = parse_inflammation_file('sample_patient.txt')
print(f"Parsed data: {len(data)} readings")
print(f"Metadata: {meta}")
print(f"First few readings: {data[:5]}")

# Extract just the inflammation values
if data and isinstance(data[0], tuple):
    # Data contains (day, inflammation) tuples
    inflammation_values = [reading[1] for reading in data]
else:
    # Data contains just inflammation values
    inflammation_values = data

print(f"Inflammation values: {inflammation_values}")
print(f"Average inflammation: {sum(inflammation_values) / len(inflammation_values):.2f}")
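
As an alternative to splitting on commas by hand, Python's standard `csv` module can take care of the field splitting. The sketch below shows one possible approach under that assumption; `parse_with_csv_module` is a hypothetical name, and the input is an in-memory string rather than the sample file.

```python
import csv
import io


def parse_with_csv_module(lines):
    """Parse 'day, inflammation' rows from an iterable of text lines."""
    # Drop comments and blank lines before handing rows to csv.reader
    data_lines = (line for line in lines
                  if line.strip() and not line.lstrip().startswith('#'))
    readings = []
    # skipinitialspace=True ignores the space that follows each comma
    for row in csv.reader(data_lines, skipinitialspace=True):
        if len(row) != 2:
            continue  # skip malformed rows
        try:
            readings.append((int(row[0]), float(row[1])))
        except ValueError:
            continue  # skip rows with non-numeric fields
    return readings


sample = io.StringIO("# Inflammation data for Patient P001\n1, 0.0\n2, 1.5\n3, 3.2\n")
readings = parse_with_csv_module(sample)
print(readings)
```

Unlike the hand-written parser, this version silently skips bad rows; printing a warning, as `parse_inflammation_file` does, may be preferable when data quality matters.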

## 3. Working with Multiple Files

Create and process multiple patient files:

In [None]:
# Create multiple sample patient files
import random
random.seed(42)  # For reproducible results

# Create a data directory
os.makedirs('inflammation_data', exist_ok=True)

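def generate_inflammation_pattern(days=10, patient_id="P000"):
    """Generate a simulated inflammation pattern: rise, peak, then recovery.

    Note: patient_id is currently unused; it is kept so callers can pass it.
    """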
    inflammation_data = []
    
    for day in range(1, days + 1):
        # Simulate inflammation pattern: starts low, peaks, then decreases
        if day <= days // 3:
            # Early phase: increasing
            base_level = day * 1.5
        elif day <= 2 * days // 3:
            # Peak phase: high and variable
            base_level = 4.0 + random.uniform(-1.0, 2.0)
        else:
            # Recovery phase: decreasing
            base_level = max(0.5, 5.0 - (day - 2 * days // 3) * 0.8)
        
        # Add some random variation
        inflammation = max(0.0, base_level + random.uniform(-0.5, 0.5))
        inflammation_data.append(round(inflammation, 1))
    
    return inflammation_data

# Generate files for multiple patients
patients = ['P001', 'P002', 'P003', 'P004', 'P005']
file_info = []

for patient_id in patients:
    # Generate data
    days = random.randint(8, 12)
    inflammation_data = generate_inflammation_pattern(days, patient_id)
    
    # Create filename
    filename = f'inflammation_data/inflammation_{patient_id.lower()}.csv'
    
    # Write file
    with open(filename, 'w') as f:
        f.write("# Inflammation study data\n")
        f.write(f"# Patient: {patient_id}\n")
        f.write(f"# Days monitored: {days}\n")
        f.write("# Format: Day, Inflammation Level\n")
        f.write("Day,Inflammation\n")  # Header
        
        for day, inflammation in enumerate(inflammation_data, 1):
            f.write(f"{day},{inflammation}\n")
    
    file_info.append({
        'patient_id': patient_id,
        'filename': filename,
        'days': days,
        'average': sum(inflammation_data) / len(inflammation_data)
    })
    
    print(f"Created {filename} for {patient_id} ({days} days, avg: {sum(inflammation_data) / len(inflammation_data):.2f})")

print(f"\nCreated {len(file_info)} patient files in inflammation_data/")

In [None]:
# List files in the directory
import os

data_dir = 'inflammation_data'
files = os.listdir(data_dir)

print(f"Files in {data_dir}:")
for filename in sorted(files):
    filepath = os.path.join(data_dir, filename)
    size = os.path.getsize(filepath)
    print(f"  {filename:<25} ({size} bytes)")
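
The same listing can also be written with the standard `pathlib` module, which represents paths as objects instead of strings. This is a sketch only, run against a temporary directory with two made-up files so that it does not depend on `inflammation_data/` existing; the names `demo_dir` and `listing` are illustrative.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_dir = Path(tmp_dir)
    (demo_dir / 'inflammation_p002.csv').write_text("Day,Inflammation\n1,1.5\n")
    (demo_dir / 'inflammation_p001.csv').write_text("Day,Inflammation\n1,0.5\n2,1.0\n")

    # iterdir() yields Path objects in arbitrary order, so sort them
    listing = [(p.name, p.stat().st_size) for p in sorted(demo_dir.iterdir())]

for name, size in listing:
    print(f"  {name:<25} ({size} bytes)")
```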

## 4. Using Glob to Find Files

Find files using patterns:

In [None]:
import glob

# Find all CSV files
csv_files = glob.glob('inflammation_data/*.csv')
print(f"Found {len(csv_files)} CSV files:")
for file in sorted(csv_files):
    print(f"  {file}")

# Find files with specific patterns
p00_files = glob.glob('inflammation_data/*p00*.csv')
print(f"\nFiles matching pattern '*p00*': {len(p00_files)}")
for file in sorted(p00_files):
    print(f"  {file}")

# More specific patterns
all_inflammation_files = glob.glob('inflammation_data/inflammation_*.csv')
print(f"\nInflammation files: {len(all_inflammation_files)}")
for file in sorted(all_inflammation_files):
    print(f"  {os.path.basename(file)}")
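
Besides `*`, glob patterns support `?` (exactly one character) and `[...]` (one character from a set). The sketch below illustrates all three on a temporary directory of empty placeholder files; the file names and the helper `match_names` are assumptions made for the demonstration.

```python
import glob
import os
import tempfile


def match_names(directory, pattern):
    """Return sorted base names of files in directory matching a glob pattern."""
    # glob.escape protects any special characters in the directory path itself
    paths = glob.glob(os.path.join(glob.escape(directory), pattern))
    return sorted(os.path.basename(p) for p in paths)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Create empty placeholder files to match against
    for name in ['inflammation_p001.csv', 'inflammation_p002.csv',
                 'inflammation_p010.csv', 'notes.txt']:
        open(os.path.join(tmp_dir, name), 'w').close()

    any_csv = match_names(tmp_dir, '*.csv')                       # any run of characters
    one_char = match_names(tmp_dir, 'inflammation_p00?.csv')      # exactly one character
    char_set = match_names(tmp_dir, 'inflammation_p0[1-9]0.csv')  # one character from 1-9

print(any_csv)
print(one_char)
print(char_set)
```

`glob.glob` makes no promise about the order of its results, which is why both this sketch and the cells above sort the matches before using them.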

## 5. Processing Multiple Files

Automated analysis of all patient files:

In [None]:
def parse_csv_inflammation_file(filename):
    """Parse CSV inflammation file with header."""
    inflammation_data = []
    metadata = {'filename': filename}
    
    with open(filename, 'r') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            
            if not line:
                continue
            
            # Handle comment lines
            if line.startswith('#'):
                if 'Patient:' in line:
                    metadata['patient_id'] = line.split(':')[1].strip()
                elif 'Days monitored:' in line:
                    metadata['days_monitored'] = int(line.split(':')[1].strip())
                continue
            
            # Skip header row
            if 'Day' in line and 'Inflammation' in line:
                continue
            
            # Parse data
            try:
                day_str, inflammation_str = line.split(',')
                day = int(day_str.strip())
                inflammation = float(inflammation_str.strip())
                inflammation_data.append(inflammation)
            except ValueError as e:
                print(f"Warning in {filename} line {line_num}: {e}")
                continue
    
    return inflammation_data, metadata

def analyze_inflammation_data(data):
    """Calculate statistics for inflammation data."""
    if not data:
        return None
    
    stats = {
        'count': len(data),
        'sum': sum(data),
        'average': sum(data) / len(data),
        'minimum': min(data),
        'maximum': max(data),
    }
    
    # Add sample standard deviation (n - 1 denominator);
    # it is undefined for a single reading, so report 0.0 in that case
    if len(data) > 1:
        mean = stats['average']
        variance = sum((x - mean)**2 for x in data) / (len(data) - 1)
        stats['std_dev'] = variance ** 0.5
    else:
        stats['std_dev'] = 0.0
    
    return stats

# Process all inflammation files
def process_all_inflammation_files(pattern='inflammation_data/*.csv'):
    """Process all inflammation files and return results."""
    files = glob.glob(pattern)
    results = []
    
    print(f"Processing {len(files)} files...")
    print("=" * 60)
    
    for filename in sorted(files):
        try:
            # Parse file
            data, metadata = parse_csv_inflammation_file(filename)
            
            # Analyze data
            stats = analyze_inflammation_data(data)
            
            if stats:
                result = {
                    'filename': os.path.basename(filename),
                    'patient_id': metadata.get('patient_id', 'Unknown'),
                    'data': data,
                    'statistics': stats,
                    'status': 'success'
                }
                
                # Print summary
                print(f"✅ {result['patient_id']:<6} | "
                      f"Days: {stats['count']:2d} | "
                      f"Avg: {stats['average']:5.2f} | "
                      f"Range: {stats['minimum']:4.1f}-{stats['maximum']:4.1f} | "
                      f"SD: {stats['std_dev']:4.2f}")
            else:
                result = {
                    'filename': os.path.basename(filename),
                    'patient_id': metadata.get('patient_id', 'Unknown'),
                    'status': 'no_data'
                }
                print(f"⚠️  {result['patient_id']:<6} | No valid data found")
            
            results.append(result)
            
        except Exception as e:
            result = {
                'filename': os.path.basename(filename),
                'patient_id': 'Unknown',
                'status': 'error',
                'error': str(e)
            }
            results.append(result)
            print(f"❌ {os.path.basename(filename):<20} | Error: {e}")
    
    print("=" * 60)
    return results

# Process all files
analysis_results = process_all_inflammation_files()
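
The hand-written standard deviation in `analyze_inflammation_data` divides by `n - 1`, which is the sample standard deviation. As a sanity check, the sketch below compares that formula with `statistics.stdev` from the standard library, using the eight sample readings from Section 1; the variable names are illustrative.

```python
import math
import statistics

readings = [0.0, 1.5, 3.2, 4.1, 2.8, 1.9, 0.8, 0.0]  # sample values from Section 1

mean = sum(readings) / len(readings)
manual_sd = (sum((x - mean) ** 2 for x in readings) / (len(readings) - 1)) ** 0.5

# statistics.stdev also uses the n - 1 (sample) denominator
library_sd = statistics.stdev(readings)

print(f"Manual SD:  {manual_sd:.4f}")
print(f"Library SD: {library_sd:.4f}")
print("Match:", math.isclose(manual_sd, library_sd))
```

If the population standard deviation (dividing by `n`) were wanted instead, `statistics.pstdev` would be the corresponding library function.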

### Exercise 6.1
Create functions to:
1. Find the patient with the highest average inflammation
2. Identify patients with inflammation above a threshold
3. Calculate overall study statistics

In [None]:
# Exercise 6.1 - Your analysis functions
def find_highest_inflammation_patient(results):
    """Find patient with highest average inflammation."""
    # Your implementation here
    pass

def find_patients_above_threshold(results, threshold=3.0):
    """Find patients with average inflammation above threshold."""
    # Your implementation here
    pass

def calculate_study_statistics(results):
    """Calculate overall study statistics."""
    # Your implementation here
    pass

# Test your functions
# Add your test code here

## 6. Writing Analysis Results

Save analysis results to files:

In [None]:
from datetime import datetime


def write_summary_report(results, output_filename='analysis_summary.txt'):
    """Write a comprehensive summary report."""
    
    successful_results = [r for r in results if r['status'] == 'success']
    
    if not successful_results:
        print("No successful analyses to report")
        return
    
    with open(output_filename, 'w') as f:
        # Header
        f.write("INFLAMMATION STUDY ANALYSIS REPORT\n")
        f.write("=" * 50 + "\n")
        f.write(f"Generated on: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"Total patients analyzed: {len(successful_results)}\n\n")
        
        # Individual patient summaries
        f.write("INDIVIDUAL PATIENT SUMMARIES\n")
        f.write("-" * 30 + "\n")
        
        for result in successful_results:
            stats = result['statistics']
            f.write(f"\nPatient: {result['patient_id']}\n")
            f.write(f"  File: {result['filename']}\n")
            f.write(f"  Days monitored: {stats['count']}\n")
            f.write(f"  Average inflammation: {stats['average']:.3f}\n")
            f.write(f"  Range: {stats['minimum']:.1f} - {stats['maximum']:.1f}\n")
            f.write(f"  Standard deviation: {stats['std_dev']:.3f}\n")
            
            # Classification
            if stats['average'] > 4.0:
                classification = "High risk"
            elif stats['average'] > 2.5:
                classification = "Moderate risk"
            else:
                classification = "Low risk"
            f.write(f"  Risk level: {classification}\n")
        
        # Overall statistics
        all_averages = [r['statistics']['average'] for r in successful_results]
        overall_avg = sum(all_averages) / len(all_averages)
        overall_min = min(all_averages)
        overall_max = max(all_averages)
        
        f.write(f"\n\nOVERALL STUDY STATISTICS\n")
        f.write("-" * 25 + "\n")
        f.write(f"Study-wide average inflammation: {overall_avg:.3f}\n")
        f.write(f"Patient average range: {overall_min:.3f} - {overall_max:.3f}\n")
        
        # Risk distribution
        high_risk = sum(1 for avg in all_averages if avg > 4.0)
        moderate_risk = sum(1 for avg in all_averages if 2.5 < avg <= 4.0)
        low_risk = sum(1 for avg in all_averages if avg <= 2.5)
        
        f.write(f"\nRisk distribution:\n")
        f.write(f"  High risk (>4.0): {high_risk} patients ({high_risk/len(all_averages)*100:.1f}%)\n")
        f.write(f"  Moderate risk (>2.5 to 4.0): {moderate_risk} patients ({moderate_risk/len(all_averages)*100:.1f}%)\n")
        f.write(f"  Low risk (<=2.5): {low_risk} patients ({low_risk/len(all_averages)*100:.1f}%)\n")
        
        # Recommendations
        f.write(f"\nRECOMMENDATIONS\n")
        f.write("-" * 15 + "\n")
        if high_risk > 0:
            f.write(f"• {high_risk} patients require immediate follow-up\n")
        if moderate_risk > 0:
            f.write(f"• {moderate_risk} patients need continued monitoring\n")
        if overall_avg > 3.0:
            f.write(f"• Study-wide inflammation levels are elevated - investigate environmental factors\n")
        
        f.write(f"\nEnd of report.\n")
    
    print(f"Summary report written to: {output_filename}")
    return output_filename

# Generate summary report
report_file = write_summary_report(analysis_results)

# Read and display the report
print("\n" + "="*50)
print("GENERATED REPORT PREVIEW:")
print("="*50)
with open(report_file, 'r') as f:
    print(f.read())
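
The same risk thresholds (4.0 and 2.5) appear in several places in this notebook, so changing them means editing each copy. One way to avoid that, sketched here as an assumption rather than as part of the lesson's code, is a single helper; the name `classify_risk` and its keyword arguments are hypothetical.

```python
def classify_risk(average, high=4.0, moderate=2.5):
    """Map an average inflammation value to a risk label.

    The default thresholds mirror those used in the report code;
    the function name and signature are illustrative assumptions.
    """
    if average > high:
        return "High"
    if average > moderate:
        return "Moderate"
    return "Low"


labels = [classify_risk(avg) for avg in (4.5, 4.0, 2.6, 2.5, 0.0)]
print(labels)
```

Note that the boundaries are exclusive on the lower side: an average of exactly 4.0 is "Moderate" and exactly 2.5 is "Low", matching the comparisons used in the report.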

In [None]:
# Write detailed CSV results for further analysis
def write_csv_results(results, output_filename='detailed_results.csv'):
    """Write detailed results in CSV format."""
    
    successful_results = [r for r in results if r['status'] == 'success']
    
    with open(output_filename, 'w') as f:
        # Write header
        f.write("Patient_ID,Filename,Days_Monitored,Average_Inflammation,"
               "Min_Inflammation,Max_Inflammation,Std_Dev,Risk_Level\n")
        
        # Write data
        for result in successful_results:
            stats = result['statistics']
            
            # Determine risk level
            if stats['average'] > 4.0:
                risk_level = "High"
            elif stats['average'] > 2.5:
                risk_level = "Moderate"
            else:
                risk_level = "Low"
            
            f.write(f"{result['patient_id']},"
                   f"{result['filename']},"
                   f"{stats['count']},"
                   f"{stats['average']:.3f},"
                   f"{stats['minimum']:.3f},"
                   f"{stats['maximum']:.3f},"
                   f"{stats['std_dev']:.3f},"
                   f"{risk_level}\n")
    
    print(f"Detailed CSV results written to: {output_filename}")
    return output_filename

csv_file = write_csv_results(analysis_results)

# Preview the CSV file
print("\nCSV file preview:")
with open(csv_file, 'r') as f:
    lines = f.readlines()
    for i, line in enumerate(lines):
        print(f"{i+1:2d}: {line.strip()}")
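
Building CSV rows with f-strings works while no field contains a comma or a quote. The standard `csv` module handles quoting automatically, and `csv.DictWriter` keeps the columns aligned with the header. The sketch below uses hypothetical rows and a temporary file; it is not a drop-in replacement for `write_csv_results`.

```python
import csv
import os
import tempfile

# Hypothetical rows shaped like the per-patient statistics above
rows = [
    {'Patient_ID': 'P001', 'Days_Monitored': 10, 'Average_Inflammation': 3.12},
    {'Patient_ID': 'P002', 'Days_Monitored': 8, 'Average_Inflammation': 2.4},
]
fieldnames = ['Patient_ID', 'Days_Monitored', 'Average_Inflammation']

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'results_sketch.csv')

    # newline='' lets the csv module control line endings itself
    with open(out_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            # Format the average to three decimals, as in the hand-written version
            formatted = f"{row['Average_Inflammation']:.3f}"
            writer.writerow({**row, 'Average_Inflammation': formatted})

    # Read the file back to confirm the round trip
    with open(out_path, 'r', newline='') as f:
        read_back = list(csv.DictReader(f))

print(read_back)
```

Everything read back from a CSV file is a string, so numeric columns need to be converted again with `int()` or `float()` before further analysis.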

## 7. File Organization and Batch Processing

Organize files and create processing workflows:

In [None]:
# Create organized directory structure
import shutil
from datetime import datetime

def organize_analysis_files():
    """Organize analysis files into a structured directory."""
    
    # Create organized structure
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    analysis_dir = f"analysis_{timestamp}"
    
    directories = {
        'data': f"{analysis_dir}/data",
        'results': f"{analysis_dir}/results",
        'reports': f"{analysis_dir}/reports",
        'plots': f"{analysis_dir}/plots"
    }
    
    # Create directories
    for name, path in directories.items():
        os.makedirs(path, exist_ok=True)
        print(f"Created directory: {path}")
    
    # Copy data files
    data_files = glob.glob('inflammation_data/*.csv')
    for file in data_files:
        dest = os.path.join(directories['data'], os.path.basename(file))
        shutil.copy2(file, dest)
    print(f"Copied {len(data_files)} data files")
    
    # Move result files
    result_files = ['analysis_summary.txt', 'detailed_results.csv']
    for file in result_files:
        if os.path.exists(file):
            dest = os.path.join(directories['reports'], file)
            shutil.move(file, dest)
            print(f"Moved {file} to reports/")
    
    # Create analysis log
    log_file = os.path.join(directories['results'], 'analysis_log.txt')
    with open(log_file, 'w') as f:
        f.write(f"Inflammation Analysis Log\n")
        f.write(f"Analysis started: {datetime.now()}\n")
        f.write(f"Files processed: {len(data_files)}\n")
        f.write(f"Analysis directory: {analysis_dir}\n")
        f.write(f"\nFile structure:\n")
        for name, path in directories.items():
            f.write(f"  {name}: {path}\n")
    
    print(f"Created analysis log: {log_file}")
    return analysis_dir, directories

# Organize files
analysis_dir, dirs = organize_analysis_files()

print(f"\nAnalysis organized in: {analysis_dir}")
print("Directory structure:")
for name, path in dirs.items():
    files = os.listdir(path)
    print(f"  {name}: {len(files)} files")
    for file in files[:3]:  # Show first 3 files
        print(f"    - {file}")
    if len(files) > 3:
        print(f"    ... and {len(files) - 3} more")

In [None]:
# Complete batch processing workflow
def complete_inflammation_analysis_workflow(data_pattern, output_dir=None):
    """Complete workflow for inflammation data analysis."""
    
    print("🔬 INFLAMMATION DATA ANALYSIS WORKFLOW")
    print("=" * 50)
    
    # Step 1: Find files
    print("Step 1: Finding data files...")
    files = glob.glob(data_pattern)
    print(f"  Found {len(files)} files matching pattern: {data_pattern}")
    
    if not files:
        print("  ❌ No files found! Check the pattern.")
        return None
    
    # Step 2: Process files
    print("\nStep 2: Processing files...")
    results = process_all_inflammation_files(data_pattern)
    successful = [r for r in results if r['status'] == 'success']
    print(f"  ✅ Successfully processed: {len(successful)} files")
    print(f"  ❌ Failed: {len(results) - len(successful)} files")
    
    if not successful:
        print("  ❌ No successful analyses!")
        return None
    
    # Step 3: Generate reports
    print("\nStep 3: Generating reports...")
    summary_file = write_summary_report(results, 'workflow_summary.txt')
    csv_file = write_csv_results(results, 'workflow_results.csv')
    print(f"  📄 Summary report: {summary_file}")
    print(f"  📊 CSV results: {csv_file}")
    
    # Step 4: Calculate key insights
    print("\nStep 4: Key insights...")
    averages = [r['statistics']['average'] for r in successful]
    high_risk_count = sum(1 for avg in averages if avg > 4.0)
    study_avg = sum(averages) / len(averages)
    
    print(f"  📈 Study average inflammation: {study_avg:.2f}")
    print(f"  ⚠️  High-risk patients: {high_risk_count}/{len(successful)} ({high_risk_count/len(successful)*100:.1f}%)")
    print(f"  📊 Patient range: {min(averages):.2f} - {max(averages):.2f}")
    
    # Step 5: Organize results
    if output_dir:
        print(f"\nStep 5: Organizing results in {output_dir}...")
        # Move files to organized structure
        # Implementation would go here
    
    print("\n🎉 Workflow completed successfully!")
    print("=" * 50)
    
    return {
        'results': results,
        'successful_count': len(successful),
        'study_average': study_avg,
        'high_risk_count': high_risk_count,
        'summary_file': summary_file,
        'csv_file': csv_file
    }

# Run complete workflow
workflow_results = complete_inflammation_analysis_workflow(
    data_pattern=f"{analysis_dir}/data/*.csv"
)
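
Step 5 of the workflow above is left as a placeholder. One possible way to fill it in, shown here as a sketch under stated assumptions, is a helper that moves the generated files into the requested directory. The function name `move_outputs` and the demo file names are hypothetical.

```python
import os
import shutil
import tempfile


def move_outputs(output_files, output_dir):
    """Move generated output files into output_dir and return their new paths."""
    os.makedirs(output_dir, exist_ok=True)
    moved = []
    for path in output_files:
        if not os.path.exists(path):
            continue  # skip outputs that were never written
        dest = os.path.join(output_dir, os.path.basename(path))
        shutil.move(path, dest)
        moved.append(dest)
    return moved


with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for a report produced by the workflow
    report = os.path.join(tmp_dir, 'workflow_summary.txt')
    with open(report, 'w') as f:
        f.write("demo report\n")

    missing = os.path.join(tmp_dir, 'not_written.csv')
    target = os.path.join(tmp_dir, 'organized', 'reports')
    moved = move_outputs([report, missing], target)

    moved_names = [os.path.basename(p) for p in moved]
    source_still_exists = os.path.exists(report)

print(moved_names, source_still_exists)
```

Inside the workflow, such a helper could be called with the summary and CSV file names when `output_dir` is given; whether to move or copy the files is a design choice that depends on whether the originals are still needed.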

### Exercise 6.2
Create a monitoring system that:
1. Watches for new data files
2. Automatically processes them
3. Updates summary statistics
4. Alerts on high-risk patients

In [None]:
# Exercise 6.2 - Your monitoring system
class InflammationMonitor:
    """Monitor for new inflammation data files and process them automatically."""
    
    def __init__(self, watch_directory, alert_threshold=4.0):
        # Your initialization here
        pass
    
    def scan_for_new_files(self):
        """Scan for files that haven't been processed yet."""
        # Your implementation here
        pass
    
    def process_new_file(self, filename):
        """Process a newly detected file."""
        # Your implementation here
        pass
    
    def check_for_alerts(self, result):
        """Check if patient needs immediate attention."""
        # Your implementation here
        pass
    
    def update_summary(self):
        """Update overall summary statistics."""
        # Your implementation here
        pass
    
    def run_monitoring_cycle(self):
        """Run one cycle of monitoring."""
        # Your implementation here
        pass

# Test your monitoring system
# Add test code here

## 8. Working with Different File Formats

Handle various data formats:

In [None]:
# Create sample files in different formats

# 1. Tab-separated values
with open('sample_data.tsv', 'w') as f:
    f.write("Day\tInflammation\tTemperature\n")
    f.write("1\t1.5\t36.8\n")
    f.write("2\t2.3\t37.2\n")
    f.write("3\t1.8\t36.9\n")

# 2. JSON format
import json
json_data = {
    "patient_id": "P999",
    "study_date": "2024-01-15",
    "measurements": [
        {"day": 1, "inflammation": 1.5, "temperature": 36.8},
        {"day": 2, "inflammation": 2.3, "temperature": 37.2},
        {"day": 3, "inflammation": 1.8, "temperature": 36.9}
    ]
}

with open('sample_data.json', 'w') as f:
    json.dump(json_data, f, indent=2)

# 3. Fixed-width format
with open('sample_data.txt', 'w') as f:
    f.write("Day Inflammation Temperature\n")
    f.write("  1        1.5       36.8\n")
    f.write("  2        2.3       37.2\n")
    f.write("  3        1.8       36.9\n")

print("Created sample files in different formats:")
for filename in ['sample_data.tsv', 'sample_data.json', 'sample_data.txt']:
    print(f"  {filename} ({os.path.getsize(filename)} bytes)")

In [None]:
# Universal file parser
def parse_any_format(filename):
    """Parse inflammation data from various file formats."""
    
    _, ext = os.path.splitext(filename.lower())
    
    try:
        if ext == '.json':
            return parse_json_format(filename)
        elif ext == '.tsv':
            return parse_tsv_format(filename)
        elif ext in ['.txt', '.dat']:
            return parse_text_format(filename)
        elif ext == '.csv':
            return parse_csv_inflammation_file(filename)
        else:
            # Try to auto-detect format
            return auto_detect_and_parse(filename)
    
    except Exception as e:
        print(f"Error parsing {filename}: {e}")
        return None, None

def parse_json_format(filename):
    """Parse JSON format inflammation data."""
    with open(filename, 'r') as f:
        data = json.load(f)
    
    measurements = data.get('measurements', [])
    inflammation_data = [m['inflammation'] for m in measurements]
    
    metadata = {
        'patient_id': data.get('patient_id', 'Unknown'),
        'study_date': data.get('study_date', 'Unknown'),
        'filename': filename
    }
    
    return inflammation_data, metadata

def parse_tsv_format(filename):
    """Parse tab-separated values format."""
    inflammation_data = []
    metadata = {'filename': filename}
    
    with open(filename, 'r') as f:
        lines = f.readlines()
        
        # Skip header if present
        # (guard against an empty file, where lines[0] would raise IndexError)
        start_line = 1 if lines and 'Day' in lines[0] else 0
        
        for line in lines[start_line:]:
            parts = line.strip().split('\t')
            if len(parts) >= 2:
                try:
                    inflammation = float(parts[1])
                    inflammation_data.append(inflammation)
                except ValueError:
                    continue
    
    return inflammation_data, metadata

def parse_text_format(filename):
    """Parse space-aligned text, treating runs of whitespace as column separators."""
    inflammation_data = []
    metadata = {'filename': filename}
    
    with open(filename, 'r') as f:
        lines = f.readlines()
        
        # Skip header if present
        # (guard against an empty file, where lines[0] would raise IndexError)
        start_line = 1 if lines and any(word in lines[0].lower() for word in ['day', 'inflammation']) else 0
        
        for line in lines[start_line:]:
            # Split by whitespace and try to find inflammation value
            parts = line.strip().split()
            if len(parts) >= 2:
                try:
                    inflammation = float(parts[1])  # Assume inflammation is second column
                    inflammation_data.append(inflammation)
                except ValueError:
                    continue
    
    return inflammation_data, metadata

def auto_detect_and_parse(filename):
    """Auto-detect format and parse accordingly."""
    with open(filename, 'r') as f:
        first_line = f.readline().strip()
    
    if first_line.startswith('{'):
        return parse_json_format(filename)
    elif '\t' in first_line:
        return parse_tsv_format(filename)
    elif ',' in first_line:
        return parse_csv_inflammation_file(filename)
    else:
        return parse_text_format(filename)

# Test universal parser
test_files = ['sample_data.json', 'sample_data.tsv', 'sample_data.txt']

print("Testing universal parser:")
print("=" * 40)

for filename in test_files:
    data, metadata = parse_any_format(filename)
    if data:
        avg = sum(data) / len(data)  # data is non-empty inside this branch
        print(f"✅ {filename:<18} | {len(data):2d} readings | avg: {avg:.2f}")
        print(f"   Data: {data}")
    else:
        print(f"❌ {filename:<18} | Failed to parse")
    print()
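
The parsers above pick the inflammation column by position (`parts[1]`), which breaks if the column order changes. For delimited files with a header row, `csv.DictReader` allows lookup by column name instead. The sketch below applies it to the same tab-separated content as `sample_data.tsv`, held in an in-memory string so that it runs on its own.

```python
import csv
import io

tsv_text = "Day\tInflammation\tTemperature\n1\t1.5\t36.8\n2\t2.3\t37.2\n3\t1.8\t36.9\n"

# DictReader uses the header row for keys, so columns are looked up by name
reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
inflammation = [float(row['Inflammation']) for row in reader]
print(inflammation)
```

The same call with `delimiter=','` would cover plain CSV files, provided that comment lines starting with `#` are filtered out first.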

## Summary

In this episode, we learned:
- **File operations**: Reading and writing files in Python
- **File parsing**: Extracting data from structured files
- **Multiple files**: Processing batches of files automatically
- **Glob patterns**: Finding files using wildcards
- **Data organization**: Creating structured analysis workflows
- **Different formats**: Handling CSV, JSON, TSV, and text files
- **Error handling**: Robust file processing with error recovery
- **Automation**: Building complete analysis pipelines

Working with files is essential for real-world data analysis projects!