# Episode 8: Creating Functions

Functions are reusable blocks of code that make programs more organized, readable, and maintainable. In this notebook, we'll learn how to create and use functions for inflammation data analysis.

## Learning Objectives
- Define functions with `def`
- Understand parameters and arguments
- Use return values effectively
- Work with default parameters
- Understand variable scope
- Create modular, reusable code
- Apply functions to inflammation data analysis

## Introduction

Instead of copying and pasting code for each patient's analysis, we can create functions to encapsulate our logic and reuse it efficiently.
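
As a minimal sketch of that idea, compare writing the same arithmetic out twice with wrapping it in a function once. The patient lists and the name `mean_reading` are made up for illustration and are not used elsewhere in this notebook.

```python
# Hypothetical readings for two patients (illustrative values only)
patient_a = [1.0, 2.0, 3.0]
patient_b = [2.0, 4.0, 6.0]

# Copy-and-paste style: the same formula is written out twice
mean_a = sum(patient_a) / len(patient_a)
mean_b = sum(patient_b) / len(patient_b)

# Function style: the formula lives in one place and is reused
def mean_reading(readings):
    """Return the arithmetic mean of a non-empty list of readings."""
    return sum(readings) / len(readings)

print(mean_reading(patient_a))  # 2.0
print(mean_reading(patient_b))  # 4.0
```

If the formula ever needs to change, the function version only has to be edited in one place.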

## 1. Defining Basic Functions

Start with simple functions:

In [None]:
# Simple function with no parameters
def greet_researcher():
    print("Welcome to the Inflammation Analysis Lab!")
    print("Ready to analyze patient data.")

# Call the function
greet_researcher()

In [None]:
# Function with parameters
def analyze_inflammation(patient_id, reading):
    print(f"Patient {patient_id}: Inflammation level = {reading:.2f}")
    
    if reading < 1.0:
        status = "Low"
    elif reading < 3.0:
        status = "Moderate"
    elif reading < 6.0:
        status = "High"
    else:
        status = "Critical"
    
    print(f"Status: {status}")
    return status

# Test the function
result1 = analyze_inflammation("P001", 2.5)
result2 = analyze_inflammation("P002", 7.2)
print(f"\nResults: {result1}, {result2}")
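
Note that `analyze_inflammation` both prints and returns. The two are different: printing only displays text, while `return` hands a value back to the caller. The sketch below uses two hypothetical helpers, `show_status` and `get_status`, with an arbitrary cut-off of 3.0 chosen only for illustration. A function with no `return` statement gives back `None`.

```python
def show_status(reading):
    """Print a status but return nothing (implicitly returns None)."""
    print("High" if reading >= 3.0 else "Normal")

def get_status(reading):
    """Return the status so the caller can store or reuse it."""
    return "High" if reading >= 3.0 else "Normal"

printed = show_status(4.2)   # displays "High", but printed is None
returned = get_status(4.2)   # returned is the string "High"

print(printed)   # None
print(returned)  # High
```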

## 2. Functions with Return Values

Functions that calculate and return results:

In [None]:
# Calculate basic statistics
def calculate_average(data):
    """Calculate the average of a list of numbers."""
    if not data:  # An empty list has no meaningful average
        return None
    return sum(data) / len(data)

def calculate_max(data):
    """Find the maximum value in a list."""
    if not data:
        return None
    return max(data)

def calculate_min(data):
    """Find the minimum value in a list."""
    if not data:
        return None
    return min(data)

# Test with inflammation data
patient_readings = [0.0, 1.5, 3.2, 4.1, 2.8, 1.9, 0.8, 0.0]

avg = calculate_average(patient_readings)
maximum = calculate_max(patient_readings)
minimum = calculate_min(patient_readings)

print("Patient inflammation statistics:")
print(f"Average: {avg:.2f}")
print(f"Maximum: {maximum:.2f}")
print(f"Minimum: {minimum:.2f}")

In [None]:
# More complex function returning multiple values
def inflammation_summary(readings):
    """Calculate comprehensive inflammation statistics."""
    if not readings:
        return None, None, None, None, None
    
    avg = sum(readings) / len(readings)
    maximum = max(readings)
    minimum = min(readings)
    total_days = len(readings)
    
    # Count days above average
    above_avg_days = sum(1 for reading in readings if reading > avg)
    
    return avg, maximum, minimum, total_days, above_avg_days

# Use the function
stats = inflammation_summary(patient_readings)
avg, max_val, min_val, days, above_avg = stats

print("Comprehensive Analysis:")
print(f"  Average: {avg:.2f}")
print(f"  Range: {min_val:.2f} - {max_val:.2f}")
print(f"  Days monitored: {days}")
print(f"  Days above average: {above_avg}")

# Alternative unpacking
avg2, max2, min2, days2, above2 = inflammation_summary(patient_readings)
print(f"\nAlternative unpacking: avg={avg2:.2f}, max={max2:.2f}")
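
Unpacking five values by position is easy to get wrong if the order ever changes. One possible alternative, sketched here with `typing.NamedTuple` from the standard library, is to give each returned value a name. The class `Summary` and the function `summarize` are illustrative names, not part of the lesson's code.

```python
from typing import NamedTuple

class Summary(NamedTuple):
    """Named container for a few summary statistics."""
    average: float
    maximum: float
    minimum: float

def summarize(readings):
    """Return a Summary for a non-empty list of readings."""
    return Summary(
        average=sum(readings) / len(readings),
        maximum=max(readings),
        minimum=min(readings),
    )

result = summarize([1.0, 2.0, 6.0])

# Fields can be read by name...
print(result.average)  # 3.0
# ...and the result still unpacks like an ordinary tuple
avg, high, low = result
print(high, low)  # 6.0 1.0
```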

### Exercise 8.1
Create functions to:
1. Calculate the standard deviation of inflammation readings
2. Find the day with peak inflammation
3. Count consecutive days above a threshold

In [None]:
# Exercise 8.1 - Your code here
import math

def calculate_std_dev(data):
    """Calculate standard deviation."""
    # Your implementation here
    pass

def find_peak_day(data):
    """Find the day (index) with maximum inflammation."""
    # Your implementation here
    pass

def count_consecutive_above_threshold(data, threshold):
    """Count longest streak of consecutive days above threshold."""
    # Your implementation here
    pass

# Test your functions
test_data = [1.0, 2.5, 4.2, 5.1, 4.8, 3.2, 1.8, 0.5]
# Add your test calls here

## 3. Default Parameters

Functions with optional parameters:

In [None]:
# Function with default parameters
def categorize_inflammation(reading, low_threshold=1.0, high_threshold=4.0):
    """Categorize inflammation level with customizable thresholds."""
    if reading < low_threshold:
        return "Low"
    elif reading < high_threshold:
        return "Moderate"
    else:
        return "High"

# Test with different parameter combinations
readings = [0.5, 2.3, 5.1, 1.8]

print("Using default thresholds (1.0, 4.0):")
for reading in readings:
    category = categorize_inflammation(reading)
    print(f"  {reading:.1f} → {category}")

print("\nUsing custom thresholds (2.0, 6.0):")
for reading in readings:
    category = categorize_inflammation(reading, low_threshold=2.0, high_threshold=6.0)
    print(f"  {reading:.1f} → {category}")

print("\nUsing positional arguments (1.5, 3.5):")
for reading in readings:
    category = categorize_inflammation(reading, 1.5, 3.5)
    print(f"  {reading:.1f} → {category}")
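
A case the calls above do not show is overriding just one default. The sketch below repeats `categorize_inflammation` so that it runs on its own; the threshold values passed in are arbitrary.

```python
def categorize_inflammation(reading, low_threshold=1.0, high_threshold=4.0):
    """Same logic as the function above, repeated so this cell runs alone."""
    if reading < low_threshold:
        return "Low"
    elif reading < high_threshold:
        return "Moderate"
    else:
        return "High"

# Override only the second default; low_threshold keeps its value of 1.0
print(categorize_inflammation(4.5, high_threshold=5.0))  # Moderate

# Keyword arguments may be given in any order
print(categorize_inflammation(4.5, high_threshold=5.0, low_threshold=4.8))  # Low
```

Passing options by keyword also makes the call easier to read than a bare `categorize_inflammation(4.5, 4.8, 5.0)`.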

In [None]:
# More complex function with multiple defaults
def generate_report(patient_data, patient_id="Unknown", include_daily=True, 
                   precision=2, show_warnings=True):
    """Generate a comprehensive patient report."""
    if not patient_data:
        return "No data available"
    
    # Calculate statistics
    avg = sum(patient_data) / len(patient_data)
    maximum = max(patient_data)
    minimum = min(patient_data)
    
    # Build report
    report = []
    report.append(f"Patient Report: {patient_id}")
    report.append("=" * (16 + len(patient_id)))  # "Patient Report: " is 16 characters
    report.append(f"Average inflammation: {avg:.{precision}f}")
    report.append(f"Range: {minimum:.{precision}f} - {maximum:.{precision}f}")
    
    if include_daily:
        report.append("\nDaily readings:")
        for day, reading in enumerate(patient_data, 1):
            report.append(f"  Day {day:2d}: {reading:.{precision}f}")
    
    if show_warnings and maximum > 5.0:
        report.append("\n⚠️  WARNING: High inflammation detected!")
    
    return "\n".join(report)

# Test the function with different options
sample_data = [1.2, 2.8, 4.5, 6.2, 3.1, 1.9, 0.8]

print("Default report:")
print(generate_report(sample_data, "P001"))

print("\n" + "="*50)
print("Minimal report:")
print(generate_report(sample_data, "P002", include_daily=False, 
                     show_warnings=False, precision=1))
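
One caveat that the examples above do not run into: a default value is evaluated once, when the function is defined, so a mutable default such as a list is shared between calls. The sketch below shows the pitfall and the common `None` idiom; both function names are hypothetical.

```python
def add_reading_shared(reading, history=[]):
    """Pitfall: the default list is created once and shared between calls."""
    history.append(reading)
    return history

def add_reading_safe(reading, history=None):
    """Common idiom: use None as the default and build a new list per call."""
    if history is None:
        history = []
    history.append(reading)
    return history

first = add_reading_shared(1.0)
second = add_reading_shared(2.0)
print(second)           # [1.0, 2.0]  (the first call's value leaked in)
print(first is second)  # True

print(add_reading_safe(1.0))  # [1.0]
print(add_reading_safe(2.0))  # [2.0]
```

Immutable defaults such as the numbers, strings, and booleans used in `generate_report` are not affected.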

## 4. Variable Scope

Understanding local vs global variables:

In [None]:
# Global variables
study_name = "Inflammation Research Study 2024"
total_patients = 0

def process_patient(patient_data):
    """Process a single patient's data."""
    global total_patients  # Modify global variable
    
    # Local variables
    patient_average = sum(patient_data) / len(patient_data)
    local_status = "Processed"
    
    total_patients += 1  # Modify global counter
    
    print(f"Processing patient #{total_patients} in {study_name}")
    print(f"  Average inflammation: {patient_average:.2f}")
    print(f"  Status: {local_status}")
    
    return patient_average

def show_study_summary():
    """Display study summary using global variables."""
    print(f"\n{study_name}")
    print(f"Total patients processed: {total_patients}")

# Test scope
patient1_data = [1.5, 2.3, 1.8, 2.1]
patient2_data = [2.8, 3.5, 4.1, 3.2]

avg1 = process_patient(patient1_data)
avg2 = process_patient(patient2_data)

show_study_summary()

# Try to access local variable (this will fail)
try:
    print(local_status)  # This variable doesn't exist outside the function
except NameError as e:
    print(f"\nError accessing local variable: {e}")
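
The `global` statement matters only when a function assigns to a global name. The sketch below, which uses a made-up `counter` variable, shows what happens without it: reading works, plain assignment quietly creates a local variable, and an in-place update fails.

```python
counter = 0  # module-level (global) variable

def read_counter():
    # Reading a global name needs no declaration
    return counter

def shadow_counter():
    # Assignment creates a new local name; the global stays at 0
    counter = 10
    return counter

def broken_increment():
    # "counter += 1" assigns to counter, so Python treats it as local,
    # and reading it before assignment raises UnboundLocalError
    counter += 1

print(read_counter())    # 0
print(shadow_counter())  # 10
print(counter)           # 0

try:
    broken_increment()
except UnboundLocalError as error:
    print(f"Raised: {type(error).__name__}")  # Raised: UnboundLocalError
```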

In [None]:
# Better approach: avoid global variables when possible
def create_study_processor():
    """Create a study processor with encapsulated state."""
    study_data = {
        'name': 'Inflammation Research Study 2024',
        'patients_processed': 0,
        'total_inflammation': 0.0,
        'patient_averages': []
    }
    
    def process_patient(patient_data, patient_id):
        avg = sum(patient_data) / len(patient_data)
        study_data['patients_processed'] += 1
        study_data['total_inflammation'] += sum(patient_data)
        study_data['patient_averages'].append(avg)
        
        print(f"{patient_id}: Average = {avg:.2f}")
        return avg
    
    def get_summary():
        if study_data['patients_processed'] == 0:
            return "No patients processed yet"
        
        overall_avg = sum(study_data['patient_averages']) / len(study_data['patient_averages'])
        return {
            'study_name': study_data['name'],
            'patients': study_data['patients_processed'],
            'overall_average': overall_avg,
            'patient_averages': study_data['patient_averages'].copy()
        }
    
    return process_patient, get_summary

# Use the encapsulated approach
processor, get_summary = create_study_processor()

# Process patients
processor([1.5, 2.3, 1.8], "P001")
processor([2.8, 3.5, 4.1], "P002")
processor([0.8, 1.2, 1.5], "P003")

# Get summary
summary = get_summary()
print("\nStudy Summary:")
print(f"  Patients: {summary['patients']}")
print(f"  Overall average: {summary['overall_average']:.2f}")
print(f"  Individual averages: {[f'{avg:.2f}' for avg in summary['patient_averages']]}")
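
The inner functions above can update `study_data` without any declaration because they mutate the dictionary rather than rebind the name. If the enclosed state were a plain number, rebinding it would need `nonlocal`. A minimal sketch, with `make_counter` as an illustrative name:

```python
def make_counter():
    """Return a function that counts how many times it has been called."""
    count = 0  # lives in the enclosing function's scope

    def increment():
        nonlocal count  # rebind the enclosing variable, not a new local
        count += 1
        return count

    return increment

count_patient = make_counter()
print(count_patient())  # 1
print(count_patient())  # 2

# Each call to make_counter() gets its own independent state
other_counter = make_counter()
print(other_counter())  # 1
```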

## 5. Advanced Function Features

More sophisticated function patterns:

In [None]:
# Variable number of arguments
def analyze_multiple_patients(*patient_datasets):
    """Analyze multiple patients' data simultaneously."""
    if not patient_datasets:
        return "No patient data provided"
    
    results = []
    for i, data in enumerate(patient_datasets, 1):
        if data:  # Check if data is not empty
            avg = sum(data) / len(data)
            max_val = max(data)
            results.append({
                'patient': f'P{i:03d}',
                'average': avg,
                'maximum': max_val,
                'days': len(data)
            })
    
    return results

# Test with multiple patients
patient1 = [1.2, 2.3, 1.8, 2.5]
patient2 = [2.8, 3.5, 4.1, 3.2, 2.9]
patient3 = [0.8, 1.5, 2.1]

results = analyze_multiple_patients(patient1, patient2, patient3)

print("Multi-patient analysis:")
for result in results:
    print(f"{result['patient']}: avg={result['average']:.2f}, "
          f"max={result['maximum']:.2f}, days={result['days']}")
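
Inside the function, `patient_datasets` is a tuple of everything passed positionally. When the datasets already sit in a list, the same `*` at the call site unpacks them into separate arguments. The sketch below uses a small stand-in function, `count_datasets`, and an invented `cohort` list.

```python
def count_datasets(*patient_datasets):
    """Return how many datasets were passed and their lengths."""
    return len(patient_datasets), [len(data) for data in patient_datasets]

# Hypothetical cohort stored as a list of lists
cohort = [[1.2, 2.3], [2.8, 3.5, 4.1], [0.8]]

# The star spreads the list into three separate positional arguments
print(count_datasets(*cohort))  # (3, [2, 3, 1])

# Without the star, the whole list arrives as a single argument
print(count_datasets(cohort))   # (1, [3])
```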

In [None]:
# Keyword arguments and validation
def advanced_inflammation_analysis(data, **kwargs):
    """Advanced analysis with configurable options."""
    # Default configuration
    config = {
        'smoothing': False,
        'outlier_detection': True,
        'outlier_threshold': 2.0,
        'trend_analysis': True,
        'report_format': 'detailed'
    }
    
    # Update with user options
    config.update(kwargs)
    
    # Validate data
    if not data or not isinstance(data, list):
        return "Invalid data: must be a non-empty list"
    
    if not all(isinstance(x, (int, float)) and x >= 0 for x in data):
        return "Invalid data: all values must be non-negative numbers"
    
    # Basic statistics
    avg = sum(data) / len(data)
    maximum = max(data)
    minimum = min(data)
    
    results = {
        'basic_stats': {
            'average': avg,
            'minimum': minimum,
            'maximum': maximum,
            'range': maximum - minimum
        }
    }
    
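    # Optional: outlier detection. A value is flagged when its distance from
    # the mean exceeds outlier_threshold times the mean (not the standard deviation).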
    if config['outlier_detection']:
        threshold = config['outlier_threshold']
        outliers = [x for x in data if abs(x - avg) > threshold * avg]
        results['outliers'] = outliers
    
    # Optional: trend analysis
    if config['trend_analysis'] and len(data) > 1:
        # Simple trend: compare first half vs second half
        mid = len(data) // 2
        first_half_avg = sum(data[:mid]) / mid if mid > 0 else 0
        second_half_avg = sum(data[mid:]) / (len(data) - mid)
        
        if second_half_avg > first_half_avg * 1.1:
            trend = "Increasing"
        elif second_half_avg < first_half_avg * 0.9:
            trend = "Decreasing"
        else:
            trend = "Stable"
        
        results['trend'] = {
            'direction': trend,
            'first_half_avg': first_half_avg,
            'second_half_avg': second_half_avg
        }
    
    return results

# Test advanced analysis
test_data = [1.0, 1.5, 2.2, 3.8, 4.2, 3.5, 2.8, 2.1, 1.8, 1.2]

# Basic analysis
result1 = advanced_inflammation_analysis(test_data)
print("Basic analysis:")
print(f"  Average: {result1['basic_stats']['average']:.2f}")
print(f"  Range: {result1['basic_stats']['range']:.2f}")
print(f"  Outliers: {result1.get('outliers', [])}")
print(f"  Trend: {result1.get('trend', {}).get('direction', 'N/A')}")

# Custom analysis
result2 = advanced_inflammation_analysis(
    test_data, 
    outlier_threshold=1.5, 
    trend_analysis=False
)
print("\nCustom analysis:")
print(f"  Outliers (stricter): {result2.get('outliers', [])}")
print(f"  Trend analysis: {'disabled' if 'trend' not in result2 else 'enabled'}")
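
Because `config.update(kwargs)` accepts any key, a misspelled option such as `outlier_treshold` would be stored silently and never used. One way to catch this, sketched with a hypothetical helper `configure_analysis` and a shortened set of defaults, is to compare the supplied names against the known ones before merging.

```python
def configure_analysis(**kwargs):
    """Merge user options into defaults, rejecting unrecognized names."""
    config = {
        'outlier_detection': True,
        'outlier_threshold': 2.0,
        'trend_analysis': True,
    }

    # Any key not present in the defaults is treated as a mistake
    unknown = set(kwargs) - set(config)
    if unknown:
        raise TypeError(f"Unknown option(s): {sorted(unknown)}")

    config.update(kwargs)
    return config

print(configure_analysis(outlier_threshold=1.5)['outlier_threshold'])  # 1.5

try:
    configure_analysis(outlier_treshold=1.5)  # misspelled on purpose
except TypeError as error:
    print(error)  # Unknown option(s): ['outlier_treshold']
```

Raising `TypeError` mirrors what Python itself does for an unexpected keyword argument, though returning an error message would also fit the style of the function above.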

### Exercise 8.2
Create a comprehensive function that:
1. Takes multiple parameters with defaults
2. Validates input data
3. Returns multiple types of analysis
4. Handles edge cases gracefully

In [None]:
# Exercise 8.2 - Your comprehensive function
def comprehensive_patient_analysis(patient_data, patient_id="Unknown", 
                                 normal_range=(0, 3), critical_threshold=5.0,
                                 include_predictions=False, **options):
    """
    Comprehensive analysis of patient inflammation data.
    
    Parameters:
    - patient_data: list of daily inflammation readings
    - patient_id: identifier for the patient
    - normal_range: tuple of (min, max) for normal inflammation
    - critical_threshold: level above which inflammation is critical
    - include_predictions: whether to include future trend predictions
    - **options: additional analysis options
    
    Returns:
    Dictionary with comprehensive analysis results
    """
    # Your implementation here
    pass

# Test your function with various scenarios

## 6. Function Documentation and Testing

Best practices for documenting and testing functions:

In [None]:
def calculate_inflammation_statistics(readings):
    """
    Calculate comprehensive statistics for inflammation readings.
    
    This function computes various statistical measures including central tendency,
    dispersion, and distribution characteristics of inflammation data.
    
    Parameters:
    ----------
    readings : list of float
        Daily inflammation readings. Must contain at least one value.
        All values should be non-negative.
    
    Returns:
    -------
    dict
        Dictionary containing:
        - 'mean': arithmetic mean of readings
        - 'median': middle value when sorted
        - 'std_dev': standard deviation
        - 'min': minimum value
        - 'max': maximum value
        - 'count': number of readings
        - 'sum': total of all readings
    
    Raises:
    ------
    ValueError
        If readings is empty or contains negative values
    TypeError
        If readings is not a list or contains non-numeric values
    
    Examples:
    --------
    >>> data = [1.0, 2.5, 3.0, 2.0, 1.5]
    >>> stats = calculate_inflammation_statistics(data)
    >>> print(f"Mean: {stats['mean']:.2f}")
    Mean: 2.00
    
    >>> calculate_inflammation_statistics([5.0])['mean']
    5.0
    """
    # Input validation
    if not isinstance(readings, list):
        raise TypeError("readings must be a list")
    
    if not readings:
        raise ValueError("readings cannot be empty")
    
    # Check for numeric values and non-negative
    for i, reading in enumerate(readings):
        if not isinstance(reading, (int, float)):
            raise TypeError(f"reading at index {i} must be numeric, got {type(reading).__name__}")
        if reading < 0:
            raise ValueError(f"reading at index {i} cannot be negative: {reading}")
    
    # Calculate statistics
    n = len(readings)
    total = sum(readings)
    mean = total / n
    
    # Median
    sorted_readings = sorted(readings)
    if n % 2 == 0:
        median = (sorted_readings[n//2 - 1] + sorted_readings[n//2]) / 2
    else:
        median = sorted_readings[n//2]
    
    # Standard deviation
    if n > 1:
        variance = sum((x - mean)**2 for x in readings) / (n - 1)
        std_dev = variance ** 0.5
    else:
        std_dev = 0.0
    
    return {
        'mean': mean,
        'median': median,
        'std_dev': std_dev,
        'min': min(readings),
        'max': max(readings),
        'count': n,
        'sum': total
    }

# Test the documented function
test_data = [1.0, 2.5, 3.0, 2.0, 1.5]
stats = calculate_inflammation_statistics(test_data)

print("Inflammation Statistics:")
for key, value in stats.items():
    print(f"  {key}: {value:.3f}")

# Test error handling
try:
    calculate_inflammation_statistics([])
except ValueError as e:
    print(f"\nCaught expected error: {e}")

try:
    calculate_inflammation_statistics([1.0, -2.0, 3.0])
except ValueError as e:
    print(f"Caught expected error: {e}")
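
The `>>>` examples in a docstring are more than documentation: the standard-library `doctest` module can execute them and compare the actual output with the text that follows each prompt. The sketch below does this for a small stand-in function, `mean_reading`, which is not part of the lesson's code.

```python
import doctest

def mean_reading(readings):
    """
    Return the arithmetic mean of a non-empty list of readings.

    >>> mean_reading([1.0, 2.0, 3.0])
    2.0
    >>> mean_reading([5.0])
    5.0
    """
    return sum(readings) / len(readings)

# Collect the examples from the docstring and run them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
tests = finder.find(
    mean_reading,
    name="mean_reading",
    globs={"mean_reading": mean_reading},
)
for test in tests:
    runner.run(test)

results = runner.summarize(verbose=False)
print(f"Examples run: {results.attempted}, failures: {results.failed}")
```

In a script or module, calling `doctest.testmod()` is a shorter way to run every docstring example at once.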

## 7. Practical Application: Complete Analysis Pipeline

Combining functions into a complete analysis system:

In [None]:
# Complete inflammation analysis pipeline
def load_patient_data(patient_id):
    """Simulate loading patient data from a database."""
    # In real scenario, this would read from a file or database
    import random
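    # Seed with the ID string itself: hash() of a str is salted per interpreter
    # session, so hash-based seeds would give different data on every run
    random.seed(patient_id)  # Reproducible "random" data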
    
    days = random.randint(5, 15)
    data = []
    
    for day in range(days):
        # Simulate inflammation pattern
        base = 2.0 + 2.0 * abs(day - days//2) / (days//2)
        noise = random.uniform(-0.5, 0.5)
        reading = max(0.0, base + noise)
        data.append(round(reading, 2))
    
    return data

def validate_data(data, patient_id):
    """Validate patient data quality."""
    issues = []
    
    if not data:
        issues.append("No data available")
        return False, issues
    
    if len(data) < 3:
        issues.append(f"Insufficient data: only {len(data)} readings")
    
    negative_count = sum(1 for x in data if x < 0)
    if negative_count > 0:
        issues.append(f"{negative_count} negative readings found")
    
    extreme_count = sum(1 for x in data if x > 10)
    if extreme_count > 0:
        issues.append(f"{extreme_count} extremely high readings (>10)")
    
    return len(issues) == 0, issues

def classify_patient_status(stats):
    """Classify patient based on inflammation statistics."""
    avg = stats['mean']
    maximum = stats['max']
    std = stats['std_dev']
    
    if maximum > 8.0:
        return "Critical", "Immediate medical attention required"
    elif avg > 4.0:
        return "High Risk", "Close monitoring recommended"
    elif avg > 2.0 or std > 1.5:
        return "Moderate Risk", "Regular monitoring suggested"
    else:
        return "Low Risk", "Continue standard care"

def generate_treatment_recommendation(status, stats):
    """Generate treatment recommendations based on analysis."""
    recommendations = []
    
    if status == "Critical":
        recommendations.extend([
            "Immediate anti-inflammatory medication",
            "Daily monitoring for next 7 days",
            "Consider hospitalization if no improvement"
        ])
    elif status == "High Risk":
        recommendations.extend([
            "Prescribe moderate anti-inflammatory treatment",
            "Monitor every 2-3 days",
            "Lifestyle modifications recommended"
        ])
    elif status == "Moderate Risk":
        recommendations.extend([
            "Mild anti-inflammatory as needed",
            "Weekly monitoring",
            "Dietary and exercise recommendations"
        ])
    else:
        recommendations.extend([
            "Continue current management",
            "Monthly check-ups sufficient",
            "Maintain healthy lifestyle"
        ])
    
    # Add specific recommendations based on statistics
    if stats['std_dev'] > 2.0:
        recommendations.append("High variability detected - investigate triggers")
    
    if stats['max'] > stats['mean'] * 3:
        recommendations.append("Investigate cause of peak inflammation")
    
    return recommendations

def process_patient(patient_id):
    """Complete patient analysis pipeline."""
    print(f"Processing Patient {patient_id}")
    print("=" * 40)
    
    # Step 1: Load data
    try:
        data = load_patient_data(patient_id)
        print(f"✓ Data loaded: {len(data)} readings")
    except Exception as e:
        print(f"✗ Failed to load data: {e}")
        return None
    
    # Step 2: Validate data
    is_valid, issues = validate_data(data, patient_id)
    if not is_valid:
        print("✗ Data validation failed:")
        for issue in issues:
            print(f"  - {issue}")
        return None
    print("✓ Data validation passed")
    
    # Step 3: Calculate statistics
    try:
        stats = calculate_inflammation_statistics(data)
        print("✓ Statistics calculated")
    except Exception as e:
        print(f"✗ Statistical analysis failed: {e}")
        return None
    
    # Step 4: Classify patient
    status, description = classify_patient_status(stats)
    print(f"✓ Patient classified: {status}")
    
    # Step 5: Generate recommendations
    recommendations = generate_treatment_recommendation(status, stats)
    print("✓ Recommendations generated")
    
    # Compile results
    result = {
        'patient_id': patient_id,
        'data': data,
        'statistics': stats,
        'status': status,
        'description': description,
        'recommendations': recommendations
    }
    
    return result

def print_patient_report(result):
    """Print a formatted patient report."""
    if not result:
        print("No results to display")
        return
    
    print(f"\n📋 PATIENT REPORT: {result['patient_id']}")
    print("=" * 50)
    
    # Basic statistics
    stats = result['statistics']
    print(f"📊 Statistics ({stats['count']} readings):")
    print(f"   Average: {stats['mean']:.2f}")
    print(f"   Range: {stats['min']:.2f} - {stats['max']:.2f}")
    print(f"   Std Dev: {stats['std_dev']:.2f}")
    
    # Status
    status_emoji = {"Critical": "🚨", "High Risk": "⚠️", "Moderate Risk": "📋", "Low Risk": "✅"}
    emoji = status_emoji.get(result['status'], "📋")
    print(f"\n{emoji} Status: {result['status']}")
    print(f"   {result['description']}")
    
    # Recommendations
    print("\n💡 Recommendations:")
    for i, rec in enumerate(result['recommendations'], 1):
        print(f"   {i}. {rec}")
    
    print("\n" + "=" * 50)

# Test the complete pipeline
patients = ["P001", "P042", "P123"]

for patient_id in patients:
    result = process_patient(patient_id)
    print_patient_report(result)
    print()
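
The simulated loader relies on seeding so that each patient ID yields the same data on every run. A sketch of one way to get that property without touching the shared module-level generator is a dedicated `random.Random` instance seeded with the ID string. The `random` module derives the seed from a string deterministically, whereas the built-in `hash()` of a string changes between interpreter sessions unless `PYTHONHASHSEED` is fixed. The function name and the 0.0 to 5.0 range are assumptions for illustration.

```python
import random

def simulated_readings(patient_id, days=5):
    """Return reproducible pseudo-random readings for one patient ID."""
    # A private generator seeded with the ID string; the module-level
    # random functions used elsewhere are left untouched
    rng = random.Random(patient_id)
    return [round(rng.uniform(0.0, 5.0), 2) for _ in range(days)]

first_run = simulated_readings("P001")
second_run = simulated_readings("P001")
print(first_run == second_run)  # True
```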

### Final Exercise 8.3
Design and implement a complete function-based system for:
1. Batch processing multiple patients
2. Generating summary statistics across patients
3. Identifying patients needing immediate attention
4. Creating different report formats (brief, detailed, summary)

In [None]:
# Final Exercise 8.3 - Your complete system
def batch_process_patients(patient_list, report_format="brief"):
    """Process multiple patients and generate batch reports."""
    # Your implementation here
    pass

def generate_study_summary(processed_results):
    """Generate summary statistics across all patients."""
    # Your implementation here
    pass

def identify_priority_patients(processed_results, criteria="status"):
    """Identify patients needing immediate attention."""
    # Your implementation here
    pass

# Test your system
test_patients = ["P001", "P002", "P003", "P004", "P005"]
# Add your test code here

## Summary

In this episode, we learned:
- **Function definition**: Using `def` to create reusable code blocks
- **Parameters and arguments**: Passing data to functions
- **Return values**: Getting results back from functions
- **Default parameters**: Making functions more flexible
- **Variable scope**: Understanding local vs global variables
- **Advanced features**: `*args`, `**kwargs`, and validation
- **Documentation**: Writing clear docstrings and handling errors
- **Practical application**: Building complete analysis pipelines

Functions are essential for creating maintainable, reusable, and organized code!