# 🔧 CLIF Configuration Loading Demo

This notebook demonstrates the **three ways** to load CLIF data using the new configuration file feature:

1. **Direct Parameters** - Traditional explicit parameter method (backward compatible)
2. **Config File Path** - Specify path to a configuration JSON file
3. **Auto-detect** - Automatically find `config.json` in the current directory

The configuration feature makes it easier to:
- 🎯 Maintain consistent settings across projects
- 🔄 Switch between different environments (dev/prod)
- 📝 Reduce boilerplate code
- ⚙️ Share configurations with team members
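
For orientation, a configuration file is just a small JSON document. The sketch below writes one with the standard library and reads it back. The four keys mirror the arguments passed to `create_example_config` later in this notebook; the exact schema clifpy expects is an assumption, and the file name `example_config.json`, the `./output` value, and the temporary directory are purely illustrative.

```python
import json
import tempfile
from pathlib import Path

# Keys mirror the create_example_config arguments used later in this notebook.
# The exact schema expected by clifpy is an assumption here.
example_settings = {
    "data_directory": "../clifpy/data/clif_demo",
    "filetype": "parquet",
    "timezone": "US/Eastern",
    "output_directory": "./output",  # illustrative value
}

# Write to a temporary directory so the sketch does not touch the project tree.
example_dir = Path(tempfile.mkdtemp())
example_path = example_dir / "example_config.json"  # hypothetical file name
example_path.write_text(json.dumps(example_settings, indent=2))

# Read the file back to confirm the round trip.
loaded_settings = json.loads(example_path.read_text())
for key, value in loaded_settings.items():
    print(f"{key}: {value}")
```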

## 📋 Setup & Imports

In [None]:
# Environment setup
import sys
import json
from pathlib import Path
import pandas as pd

# Add repository root to sys.path for importing clifpy
project_root = Path.cwd().parent  # assuming we're in examples/ directory
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# Import CLIF classes
from clifpy import Patient, Hospitalization, Labs, Vitals
from clifpy import ClifOrchestrator
from clifpy.utils.config import create_example_config, load_config

print("✅ Imports successful!")
print(f"📁 Project root: {project_root}")

In [None]:
# Helper function to show configuration details
def show_config_info(table_obj, method_name):
    """Display configuration details for a loaded table."""
    print(f"\n🔍 {method_name} Results:")
    print(f"   📊 Rows loaded: {len(table_obj.df)}")
    print(f"   📂 Data directory: {table_obj.data_directory}")
    print(f"   📄 File type: {table_obj.filetype}")
    print(f"   🕐 Timezone: {table_obj.timezone}")
    print(f"   📤 Output directory: {table_obj.output_directory}")
    return table_obj.get_summary()

## 📋 Check Available Configuration

Let's see what configuration files are available in our project.

In [None]:
# Check if config.json exists in project root
config_path = project_root / "config.json"
print(f"🔍 Looking for config at: {config_path}")

if config_path.exists():
    print("✅ Config file found!")
    config = load_config(str(config_path))
    print("📋 Current configuration:")
    for key, value in config.items():
        print(f"   {key}: {value}")
else:
    print("❌ No config file found. We'll create examples below.")

---
# Method 1: 📝 Direct Parameters (Traditional)

This is the **original way** of loading CLIF data - by providing all parameters explicitly. This method remains **fully backward compatible**.

In [None]:
# Method 1: Load Patient table with explicit parameters
print("🔄 Loading Patient table with direct parameters...")

patient_direct = Patient.from_file(
    data_directory="../clifpy/data/clif_demo",
    filetype="parquet",
    timezone="US/Eastern",
    sample_size=5
)

summary1 = show_config_info(patient_direct, "Method 1 - Direct Parameters")
print(f"\n📊 Sample of data:")
patient_direct.df.head(2)

In [None]:
# Load another table with different parameters
print("🔄 Loading Hospitalization table with different timezone...")

hosp_direct = Hospitalization.from_file(
    data_directory="../clifpy/data/clif_demo",
    filetype="parquet",
    timezone="UTC",  # Different timezone
    sample_size=3
)

show_config_info(hosp_direct, "Hospitalization - Direct Parameters")

---
# Method 2: 📄 Config File Path

Specify the path to a **configuration JSON file**. This method allows you to:
- Store configuration separately from code
- Use different configs for different environments
- Override config values with parameters when needed
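
The override behaviour in the last bullet can be pictured as a simple dictionary merge in which explicitly passed arguments win over values read from the file. The sketch below illustrates that precedence rule only. `resolve_settings` is a hypothetical helper, not clifpy's implementation, and treating `None` as "not provided" is an assumption.

```python
def resolve_settings(config, **overrides):
    """Merge config-file values with explicit arguments (hypothetical helper).

    Explicit arguments take precedence; arguments left as None are treated
    as "not provided" and fall back to the config value.
    """
    resolved = dict(config)  # copy so the caller's dict is not mutated
    resolved.update({k: v for k, v in overrides.items() if v is not None})
    return resolved


config_values = {
    "data_directory": "./data",
    "filetype": "parquet",
    "timezone": "US/Central",
}

# timezone is overridden; filetype=None falls back to the config value.
resolved = resolve_settings(config_values, timezone="UTC", filetype=None)
print(resolved)
```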

In [None]:
# Create a custom configuration file for demo
custom_config_path = "./demo_config.json"

create_example_config(
    data_directory="../clifpy/data/clif_demo",
    filetype="parquet",
    timezone="US/Central",
    output_directory="./custom_output",
    config_path=custom_config_path
)

print(f"✅ Created custom config at: {custom_config_path}")

# Display the config contents
with open(custom_config_path, 'r') as f:
    config_contents = json.load(f)
print("📋 Custom config contents:")
for key, value in config_contents.items():
    print(f"   {key}: {value}")

In [None]:
# Method 2: Load Patient table using config file path
print("🔄 Loading Patient table from custom config file...")

patient_config = Patient.from_file(
    config_path=custom_config_path,
    sample_size=4
)

show_config_info(patient_config, "Method 2 - Config File Path")

In [None]:
# Method 2b: Use config file BUT override specific parameters
print("🔄 Loading Labs table with config + parameter overrides...")

labs_override = Labs.from_file(
    config_path=custom_config_path,
    timezone="UTC",  # Override timezone from config
    sample_size=6,
    columns=["hospitalization_id", "lab_category", "lab_value_numeric"]  # Select specific columns
)

show_config_info(labs_override, "Method 2b - Config + Overrides")
print(f"\n📊 Selected columns: {list(labs_override.df.columns)}")

---
# Method 3: 🔍 Auto-detect Config

The **simplest method** - automatically finds `config.json` in the current working directory. No parameters needed!
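
Conceptually, auto-detection amounts to checking whether a file named `config.json` exists in the working directory before falling back to an error. The sketch below shows that lookup in isolation. `find_config` is a hypothetical helper written for illustration, and clifpy's actual search logic may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def find_config(start_dir: Optional[Path] = None, name: str = "config.json") -> Optional[Path]:
    """Return the path to `name` in `start_dir` (default: cwd), or None if absent.

    Hypothetical helper; not part of clifpy.
    """
    directory = Path(start_dir) if start_dir is not None else Path.cwd()
    candidate = directory / name
    return candidate if candidate.is_file() else None


# Demonstrate in a temporary directory so no real files are touched.
demo_dir = Path(tempfile.mkdtemp())
missing = find_config(demo_dir)  # nothing written yet
(demo_dir / "config.json").write_text(json.dumps({"timezone": "UTC"}))
found = find_config(demo_dir)  # now the file exists

print(f"Before writing: {missing}")
print(f"After writing:  {found}")
```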

In [None]:
# Method 3: Load Vitals table with auto-detection
print("🔄 Loading Vitals table with auto-detected config...")
print("   (This will look for 'config.json' in the current working directory)")

try:
    vitals_auto = Vitals.from_file(
        sample_size=5
    )
    
    show_config_info(vitals_auto, "Method 3 - Auto-detect")
    
except Exception as e:
    print(f"⚠️  Auto-detection failed: {e}")
    print("   This might happen if config.json doesn't exist in the current working directory.")

In [None]:
# Method 3b: Auto-detect with additional parameters
print("🔄 Loading Patient table with auto-detect + filters...")

try:
    patient_auto_filtered = Patient.from_file(
        sample_size=10,
        filters={"gender": "M"}  # Only male patients
    )
    
    show_config_info(patient_auto_filtered, "Method 3b - Auto-detect + Filters")
    print(f"\n📊 Gender distribution:")
    print(patient_auto_filtered.df['gender'].value_counts())
    
except Exception as e:
    print(f"⚠️  Loading failed: {e}")

---
# 🎯 ClifOrchestrator with Configuration

The `ClifOrchestrator` also supports all three configuration methods and provides convenient ways to manage multiple tables.

In [None]:
# Orchestrator Method 1: Using config_path parameter
print("🔄 Creating ClifOrchestrator with config path...")

orch_config = ClifOrchestrator(config_path=custom_config_path)
print(f"✅ Orchestrator created with:")
print(f"   📂 Data directory: {orch_config.data_directory}")
print(f"   📄 File type: {orch_config.filetype}")
print(f"   🕐 Timezone: {orch_config.timezone}")

In [None]:
# Orchestrator Method 2: Using from_config() class method
print("🔄 Creating ClifOrchestrator with from_config() method...")

orch_from_config = ClifOrchestrator.from_config(custom_config_path)
print(f"✅ Orchestrator created successfully!")

In [None]:
# Orchestrator Method 3: Auto-detect configuration
print("🔄 Creating ClifOrchestrator with auto-detect...")

try:
    orch_auto = ClifOrchestrator()  # Will look for config.json
    print(f"✅ Orchestrator auto-detected config successfully!")
    print(f"   📂 Data directory: {orch_auto.data_directory}")
    
except Exception as e:
    print(f"⚠️  Auto-detection failed: {e}")
    print("   Using the config path method instead...")
    orch_auto = orch_config  # Fallback to previous orchestrator

In [None]:
# Load multiple tables through the orchestrator
print("🔄 Loading multiple tables through orchestrator...")

# Load patient table
orch_config.load_table('patient', sample_size=3)
print(f"✅ Patient table: {len(orch_config.patient.df)} rows")

# Load hospitalization table
orch_config.load_table('hospitalization', 
                      sample_size=5)
print(f"✅ Hospitalization table: {len(orch_config.hospitalization.df)} rows")

# Load labs table with specific columns
orch_config.load_table('labs', 
                      sample_size=10,
                      columns=['lab_category', 'lab_value_numeric'])
print(f"✅ Labs table: {len(orch_config.labs.df)} rows, {len(orch_config.labs.df.columns)} columns")

---
# 🚀 Advanced Usage Examples

Here are some advanced patterns for using the configuration system effectively.

In [None]:
# Advanced: Different configs for different environments
print("🔄 Creating environment-specific configurations...")

# Development config
create_example_config(
    data_directory="../clifpy/data/clif_demo",
    filetype="parquet",
    timezone="US/Eastern",
    output_directory="./dev_output",
    config_path="./config_dev.json"
)

# Production config (different paths, timezone)
create_example_config(
    data_directory="/prod/data/clif",  # Different path for production
    filetype="parquet",
    timezone="UTC",  # UTC for production
    output_directory="/prod/output",
    config_path="./config_prod.json"
)

print("✅ Created config_dev.json and config_prod.json")
print("   Switch between environments by changing the config_path parameter!")

In [None]:
# Advanced: Configuration with complex filters and column selection
print("🔄 Advanced usage: Config + complex parameters...")

try:
    # Load labs data with complex filtering
    labs_advanced = Labs.from_file(
        config_path="./config_dev.json",
        sample_size=20,
        columns=[
            "patient_id", 
            "hospitalization_id", 
            "lab_category", 
            "lab_value_numeric",
            "lab_collection_dttm",
            "lab_units"
        ],
        filters={
            "lab_category": ["chemistry", "hematology"]  # Multiple categories
        }
    )
    
    print(f"✅ Advanced labs loading successful!")
    print(f"   📊 Rows: {len(labs_advanced.df)}")
    print(f"   📋 Categories: {labs_advanced.df['lab_category'].unique()}")
    print(f"   📅 Date range: {labs_advanced.df['lab_collection_dttm'].min()} to {labs_advanced.df['lab_collection_dttm'].max()}")
    
except Exception as e:
    print(f"⚠️  Advanced loading failed: {e}")
    print("   This might happen if the data directory in config doesn't exist.")

---
# ⚠️ Error Handling & Troubleshooting

Let's demonstrate what happens when things go wrong and how to handle errors gracefully.
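
As a reference point for the cells that follow, here is a minimal sketch of the kind of checks a config loader might perform: reject malformed JSON, then report any missing keys. `parse_config_text` and `REQUIRED_KEYS` are hypothetical names, and the set of required keys is an assumption based on the parameters used throughout this notebook; clifpy's `load_config` may behave differently.

```python
import json

# Assumed required keys, based on the parameters used in this notebook.
REQUIRED_KEYS = ("data_directory", "filetype", "timezone")


def parse_config_text(text):
    """Parse config JSON and verify required keys (hypothetical helper)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Config is not valid JSON: {exc.msg} (line {exc.lineno})") from exc
    if not isinstance(config, dict):
        raise ValueError("Config must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Config is missing required keys: {', '.join(missing)}")
    return config


samples = {
    "valid": '{"data_directory": "./data", "filetype": "parquet", "timezone": "UTC"}',
    "missing keys": '{"data_directory": "./data"}',
    "broken JSON": '{"data_directory": "./data" "filetype": "parquet"}',
}

for label, text in samples.items():
    try:
        parse_config_text(text)
        print(f"{label}: ok")
    except ValueError as exc:
        print(f"{label}: {exc}")
```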

In [None]:
# Error handling: Missing config file
print("🔍 Testing error handling for missing config file...")

try:
    patient_missing_config = Patient.from_file(config_path="./nonexistent_config.json")
except FileNotFoundError as e:
    print(f"✅ Correctly caught FileNotFoundError:")
    print(f"   {str(e)[:100]}...")

In [None]:
# Error handling: Incomplete parameters without config
print("🔍 Testing error handling for incomplete parameters...")

try:
    # Only data_directory is supplied; unless a config.json is auto-detected,
    # filetype and timezone remain unresolved and loading should fail
    patient_incomplete = Patient.from_file(
        data_directory="../clifpy/data/clif_demo",
        # Missing filetype and timezone
    )
except (ValueError, FileNotFoundError) as e:
    print(f"✅ Correctly caught error for incomplete parameters:")
    print(f"   {str(e)[:150]}...")

In [None]:
# Create an invalid config file to test JSON error handling
print("🔍 Testing error handling for invalid JSON...")

# Create invalid JSON file
with open("./invalid_config.json", "w") as f:
    f.write('{ "data_directory": "./data", "filetype": "parquet" missing_comma "timezone": "UTC" }')

try:
    invalid_config = load_config("./invalid_config.json")
except json.JSONDecodeError as e:
    print(f"✅ Correctly caught JSONDecodeError:")
    print(f"   {str(e)[:100]}...")
finally:
    # Clean up the temporary file whether or not the load failed
    Path("./invalid_config.json").unlink(missing_ok=True)

---
# 📝 Best Practices & Tips

## ✅ Configuration Best Practices

1. **Version Control**: Commit shared config files to version control, and keep sensitive or machine-specific paths in separate environment-specific configs

2. **Environment Separation**: Use different config files for development, staging, and production:
   ```python
   # Development
   orch = ClifOrchestrator(config_path="./config_dev.json")
   
   # Production  
   orch = ClifOrchestrator(config_path="./config_prod.json")
   ```

3. **Parameter Overrides**: Use config for defaults, override with parameters for specific needs:
   ```python
   # Use config defaults but override timezone
   table = Patient.from_file(config_path="./config.json", timezone="UTC")
   ```

4. **Auto-detection**: Place `config.json` in the directory you run your code from (typically the project root) for the simplest experience:
   ```python
   # No parameters needed!
   table = Patient.from_file()
   ```
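
One way to combine environment separation with a single code path is to pick the config file from an environment variable. The sketch below assumes a hypothetical `CLIF_ENV` variable and a hypothetical `config_path_for` helper; neither is part of clifpy, and the file names simply reuse those from the environment example in this notebook.

```python
import os

# Hypothetical mapping from environment name to config file.
CONFIG_BY_ENV = {
    "dev": "./config_dev.json",
    "prod": "./config_prod.json",
}


def config_path_for(env=None):
    """Return the config path for `env`, defaulting to $CLIF_ENV or "dev"."""
    env = env or os.environ.get("CLIF_ENV", "dev")  # CLIF_ENV is an assumed name
    if env not in CONFIG_BY_ENV:
        known = ", ".join(sorted(CONFIG_BY_ENV))
        raise ValueError(f"Unknown environment {env!r}; expected one of: {known}")
    return CONFIG_BY_ENV[env]


print(config_path_for("dev"))
print(config_path_for("prod"))
```

The returned path can then be passed as `config_path` to `Patient.from_file` or `ClifOrchestrator`, as in the examples above.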

## 🔄 Migration from Old Code

**Your existing code continues to work unchanged!** Gradually migrate by:

1. Create a `config.json` with your common settings
2. Remove redundant parameters from your code
3. Use parameter overrides only when needed

In [None]:
# Cleanup: Remove temporary config files created during demo
import os

files_to_cleanup = [
    "./demo_config.json",
    "./config_dev.json", 
    "./config_prod.json"
]

print("🧹 Cleaning up temporary config files...")
for file_path in files_to_cleanup:
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"   ✅ Removed {file_path}")

print("\n🎉 Demo completed successfully!")
print("\n📚 Summary of the 3 loading methods:")
print("   1️⃣  Direct Parameters: Patient.from_file(data_directory='...', filetype='...', timezone='...')")
print("   2️⃣  Config File Path:  Patient.from_file(config_path='./my_config.json')")
print("   3️⃣  Auto-detect:       Patient.from_file()  # Finds config.json automatically")
print("\n💡 Mix and match these methods based on your needs!")