# pyCLIF Demo Data - Getting Started

Welcome to pyCLIF! This notebook demonstrates how to use the built-in demo datasets to quickly get started with CLIF.

## What is pyCLIF?

pyCLIF is a Python library for working with healthcare data in the Common Longitudinal ICU data Format (CLIF). It provides:
- **Structured data loading** for ICU longitudinal datasets
- **Data validation** against CLIF 2.0.0 specifications
- **Easy data manipulation** and analysis tools
- **Wide dataset creation** for machine learning workflows
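
A "wide" dataset turns CLIF's long tables (one measurement per row) into one row per hospitalization and time point, with one column per measurement. The sketch below illustrates the idea with a plain pandas pivot on a few hypothetical vitals rows; the column names are assumed to follow the CLIF 2.0.0 vitals table, and this is not pyCLIF's own `create_wide_dataset` implementation, which combines many tables and options.

```python
import pandas as pd

# Hypothetical long-format vitals rows (CLIF-style column names assumed)
vitals_long = pd.DataFrame({
    "hospitalization_id": ["H1", "H1", "H1", "H1"],
    "recorded_dttm": pd.to_datetime([
        "2024-01-01 08:00", "2024-01-01 08:00",
        "2024-01-01 09:00", "2024-01-01 09:00",
    ]),
    "vital_category": ["heart_rate", "spo2", "heart_rate", "spo2"],
    "vital_value": [88.0, 97.0, 92.0, 95.0],
})

# Pivot: one row per (hospitalization, timestamp), one column per vital
vitals_wide = (
    vitals_long
    .pivot_table(
        index=["hospitalization_id", "recorded_dttm"],
        columns="vital_category",
        values="vital_value",
        aggfunc="mean",  # average any duplicates recorded at the same timestamp
    )
    .reset_index()
)
vitals_wide.columns.name = None  # drop the leftover "vital_category" label

print(vitals_wide)
```

Averaging duplicates is only one possible choice; a real pipeline might keep the first, last, or worst value per window instead.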

## Demo Data Overview

The CLIF demo data is constructed from the [MIMIC-IV Clinical Database Demo](https://physionet.org/content/mimic-iv-demo/2.2/), an openly available subset of MIMIC-IV containing 100 patients.

The demo datasets included with pyCLIF contain **de-identified ICU data** for this small cohort. The data follow the CLIF 2.0.0 standard and include all major table types:

- **Patient** - Demographics and static patient information
- **Hospitalization** - Hospital encounter details
- **Labs** - Laboratory results (blood work, chemistry panels, etc.)
- **Vitals** - Vital signs (heart rate, blood pressure, temperature, etc.)
- **Respiratory Support** - Ventilator settings and measurements
- **Position** - Patient positioning (especially prone positioning)
- **ADT** - Admission, Discharge, Transfer events
- **Medication Admin Continuous** - Continuous medication infusions
- **Patient Assessments** - Clinical assessments (GCS, sedation scales, etc.)
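
In CLIF, these tables are linked by identifier columns: `patient_id` connects a patient to their hospitalizations, and `hospitalization_id` connects each hospitalization to event tables such as labs and vitals. The sketch below joins three hypothetical mini-tables with pandas to attach each lab result to its patient; the column names are assumed from the CLIF 2.0.0 schema and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical mini-tables with CLIF-style key columns (values invented)
patient = pd.DataFrame({
    "patient_id": ["P1", "P2"],
    "sex_category": ["Female", "Male"],
})
hospitalization = pd.DataFrame({
    "patient_id": ["P1", "P1", "P2"],
    "hospitalization_id": ["H1", "H2", "H3"],
})
labs = pd.DataFrame({
    "hospitalization_id": ["H1", "H3", "H3"],
    "lab_category": ["creatinine", "creatinine", "lactate"],
    "lab_value_numeric": [1.1, 0.9, 2.4],
})

# labs -> hospitalization on hospitalization_id, then -> patient on patient_id.
# validate= raises if a key unexpectedly repeats on the "one" side.
labs_with_patient = (
    labs
    .merge(hospitalization, on="hospitalization_id", how="left", validate="many_to_one")
    .merge(patient, on="patient_id", how="left", validate="many_to_one")
)
print(labs_with_patient)
```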


## Quick Start Examples

```python
import pandas as pd

# Import the pyCLIF library
from pyclif import CLIF
from pyclif.data import load_demo_clif
```

## Explore Demo Dataset

```python
from pyclif.data import get_demo_summary

# Print a nice summary of all demo datasets
get_demo_summary()
```

```
🏥 pyCLIF Demo Datasets Summary
patient                        |    100 rows | 11 cols |   8.3 KB
hospitalization                |    275 rows | 17 cols |  19.8 KB
labs                           | 43,419 rows | 14 cols | 399.3 KB
vitals                         | 89,085 rows |  6 cols | 474.9 KB
respiratory_support            |  3,232 rows | 25 cols |  68.8 KB
position                       |  4,742 rows |  4 cols |  46.2 KB
adt                            |  1,136 rows |  8 cols |  29.7 KB
medication_admin_continuous    |  6,810 rows | 12 cols | 121.2 KB
patient_assessments            | 30,803 rows |  8 cols | 119.8 KB
Total records                  | 179,602 rows

📖 Usage examples:
  from pyclif.data import load_demo_clif, load_demo_patient
  clif_demo = load_demo_clif()  # Load all tables
  patient_data = load_demo_patient()  # Load single table
  raw_df = load_demo_labs(return_raw=True)  # Get raw DataFrame
```
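
The "Total records" line is the sum of the per-table row counts, and the per-table sizes add up to roughly 1.3 MB. A quick check using the numbers printed by `get_demo_summary()` above (whether "KB" means 1,000 or 1,024 bytes is an assumption; either convention rounds to about 1.3 MB):

```python
# Row counts and sizes (KB) copied from the get_demo_summary() output above
summary = {
    "patient": (100, 8.3),
    "hospitalization": (275, 19.8),
    "labs": (43_419, 399.3),
    "vitals": (89_085, 474.9),
    "respiratory_support": (3_232, 68.8),
    "position": (4_742, 46.2),
    "adt": (1_136, 29.7),
    "medication_admin_continuous": (6_810, 121.2),
    "patient_assessments": (30_803, 119.8),
}

total_rows = sum(rows for rows, _ in summary.values())
total_kb = sum(kb for _, kb in summary.values())

print(f"Total rows: {total_rows:,}")  # Total rows: 179,602
print(f"Total size: {total_kb / 1000:.2f} MB (treating 1 MB as 1000 KB)")
```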


## Load a Complete CLIF Object with All Demo Tables

```python
# Load complete CLIF object with all demo tables
clif_demo = load_demo_clif()

# Access individual tables through their `.df` DataFrames
print("📊 Demo dataset contains:")
print(f"   Patients: {clif_demo.patient.df['patient_id'].nunique()} unique patients")
```

```
📊 Demo dataset loaded successfully!
   Tables: patient, hospitalization, labs, vitals, respiratory_support, position, adt, medication_admin_continuous, patient_assessments
   Patients: 100 unique patients
📊 Demo dataset contains:
   Patients: 100 unique patients
```
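
The summary reports 275 hospitalizations for 100 patients (2.75 per patient on average), so some patients have more than one encounter. The sketch below shows how such a count might be computed from the hospitalization table; it uses a small hypothetical stand-in for `clif_demo.hospitalization.df` (column names assumed from the CLIF 2.0.0 schema) so that it runs without the library.

```python
import pandas as pd

# Hypothetical stand-in for clif_demo.hospitalization.df
hosp_df = pd.DataFrame({
    "patient_id": ["P1", "P1", "P1", "P2", "P3", "P3"],
    "hospitalization_id": ["H1", "H2", "H3", "H4", "H5", "H6"],
})

# Number of distinct hospitalizations per patient
encounters = hosp_df.groupby("patient_id")["hospitalization_id"].nunique()

print(encounters.to_dict())
print(f"Mean encounters per patient: {encounters.mean():.2f}")  # 2.00
print(f"Patients with >1 encounter: {(encounters > 1).sum()}")  # 2
```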


## Load Individual Tables

```python
from pyclif.data import load_demo_patient, load_demo_labs

# Load specific table objects
patient_data = load_demo_patient()
labs_data = load_demo_labs()

# Or get raw DataFrames for direct analysis
patient_df = load_demo_patient(return_raw=True)
labs_df = load_demo_labs(return_raw=True)
```

```
Validation completed with 2 error(s). See `errors` attribute.
Validation completed with 19 error(s).
  - 3 schema validation error(s)
  - 16 reference unit error(s)
See `errors` and `unit_validation_errors` attributes for details.
```
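
The validation messages above are produced when the tables load; details are stored on the table objects in the `errors` and `unit_validation_errors` attributes. To illustrate what a "reference unit error" can mean, the sketch below flags lab rows whose `reference_unit` differs from an expected unit for their `lab_category`. The `EXPECTED_UNITS` mapping, the `find_unit_mismatches` helper, and the sample rows are all hypothetical and do not reproduce pyCLIF's actual validation rules.

```python
import pandas as pd

# Hypothetical expected units per lab category (not pyCLIF's actual mapping)
EXPECTED_UNITS = {
    "creatinine": "mg/dL",
    "sodium": "mmol/L",
    "lactate": "mmol/L",
}

# Hypothetical lab rows with CLIF-style column names
labs_df = pd.DataFrame({
    "lab_category": ["creatinine", "sodium", "creatinine", "lactate"],
    "reference_unit": ["mg/dL", "mmol/L", "umol/L", "mmol/L"],
})

def find_unit_mismatches(df: pd.DataFrame, expected: dict) -> pd.DataFrame:
    """Return distinct (lab_category, reference_unit) pairs that differ from the expected unit."""
    expected_unit = df["lab_category"].map(expected)
    # Only check categories that have an expected unit defined
    mismatched = expected_unit.notna() & (df["reference_unit"] != expected_unit)
    return (
        df.loc[mismatched, ["lab_category", "reference_unit"]]
        .drop_duplicates()
        .reset_index(drop=True)
    )

mismatches = find_unit_mismatches(labs_df, EXPECTED_UNITS)
print(mismatches)
```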


```python
# Example 1: Load complete CLIF demo data
from pyclif.data import load_demo_clif
clif_demo = load_demo_clif()
print(f"Demo patients: {len(clif_demo.patient.df)}")

# Example 2: Load specific tables only
clif_subset = load_demo_clif(tables=['patient', 'labs', 'vitals'])

# Example 3: Load individual table objects
from pyclif.data import load_demo_patient, load_demo_labs
patient_data = load_demo_patient()
labs_data = load_demo_labs()

# Example 4: Get raw DataFrames
patient_df = load_demo_patient(return_raw=True)
labs_df = load_demo_labs(return_raw=True)

# Example 5: Explore available datasets
from pyclif.data import get_demo_summary, list_demo_datasets
get_demo_summary()  # Pretty print summary
datasets_info = list_demo_datasets()

# Example 6: Quick analysis
from pyclif.data import load_demo_clif
demo = load_demo_clif()
demo.create_wide_dataset()  # Test wide dataset creation
```

## 🔍 Why Use Demo Data?

**Perfect for:**
- 🎓 **Learning pyCLIF** - Understand the API without needing access to a full CLIF dataset
- 🧪 **Testing workflows** - Validate your analysis pipelines
- 📚 **Documentation examples** - Reproducible examples in tutorials
- 🚀 **Quick prototyping** - Test ideas before working with full datasets
- 🎯 **Method development** - Develop new analysis methods

**Demo data characteristics:**
- ✅ **Realistic structure** - Follows actual CLIF data patterns
- ✅ **Complete coverage** - All major CLIF tables represented
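- ✅ **Schema-checked** - Validated against CLIF 2.0.0 on load; remaining schema or unit issues are reported in the `errors` and `unit_validation_errors` attributes
- ✅ **De-identified & open** - Derived from the openly available MIMIC-IV demo
- ✅ **Lightweight** - About 1.3 MB and roughly 180,000 rows in total, so it loads quickly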