# pyCLIF Basic Usage

This notebook demonstrates the basic usage of the pyCLIF library for loading and working with CLIF (Common Longitudinal ICU data Format) tables.

## Overview

The pyCLIF library provides two main ways to work with CLIF data:
1. **Main CLIF class** - Initialize once and load multiple tables
2. **Individual table classes** - Load specific tables independently

This notebook focuses on the main CLIF class approach.

## Setup and Imports

In [None]:
import sys
import os
import pandas as pd

# Import the pyCLIF library
from pyclif import CLIF

print("pyCLIF imported successfully!")
print(f"Python version: {sys.version}")
print(f"Pandas version: {pd.__version__}")

## Initialize CLIF Object

The CLIF class is the main entry point for working with CLIF data. It requires:
- `data_dir`: Path to your CLIF data directory
- `filetype`: Format of your data files ('csv' or 'parquet')
- `timezone`: Timezone for datetime conversion (e.g., 'US/Eastern', 'UTC')

In [None]:
# Set your data directory path - update this to your CLIF data location
DATA_DIR = "/Users/vaishvik/downloads/CLIF_MIMIC"

# Initialize CLIF object
clif = CLIF(
    data_dir=DATA_DIR,
    filetype='parquet',  # Your data is in parquet format
    timezone='US/Eastern'  # Your site timezone
)

print("CLIF object initialized successfully!")
print(f"Data directory: {clif.data_dir}")
print(f"File type: {clif.filetype}")
print(f"Timezone: {clif.timezone}")

## Loading Tables

Use the `initialize()` method to load specific tables. You can load one or multiple tables at once.

### Available Tables:
- `patient` - Patient demographics and basic information
- `hospitalization` - Hospital admission details
- `vitals` - Vital signs measurements
- `labs` - Laboratory results
- `adt` - Admission, Discharge, Transfer events
- `respiratory_support` - Respiratory support data
- `medication_admin_continuous` - Continuous medication administration
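
Before calling `initialize()`, it can help to confirm that a file exists for each table you plan to load. The sketch below is not part of pyCLIF: `find_table_files` is a hypothetical helper, and the `clif_<table>.<filetype>` file naming is an assumption about how your data directory is laid out. Adjust the pattern if your files are named differently.

```python
from pathlib import Path
import tempfile


def find_table_files(data_dir, tables, filetype="parquet", prefix="clif_"):
    """Map each table name to its file path, or None if the file is missing.

    Hypothetical helper, not a pyCLIF function. Assumes files are named
    '<prefix><table>.<filetype>', e.g. 'clif_patient.parquet'.
    """
    root = Path(data_dir)
    found = {}
    for table in tables:
        candidate = root / f"{prefix}{table}.{filetype}"
        found[table] = candidate if candidate.is_file() else None
    return found


# Demonstration on a throwaway directory containing two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("clif_patient.parquet", "clif_vitals.parquet"):
        (Path(tmp) / name).touch()

    result = find_table_files(tmp, ["patient", "vitals", "labs"])
    missing = [table for table, path in result.items() if path is None]
    print(f"Missing tables: {missing}")
```

With real data, pass `DATA_DIR` and the list of tables you intend to load, and only call `initialize()` for tables whose files were found.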

### Load Patient Table

In [None]:
# Load just the patient table
clif.initialize(tables=['patient'])

print("Patient table loaded!")
print(f"Patient data shape: {clif.patient.df.shape}")
print(f"Patient columns: {list(clif.patient.df.columns)}")

In [None]:
clif.patient.errors

In [None]:
# Display first few rows of patient data
print("First 5 patient records:")
clif.patient.df.head()

### Load Multiple Tables

In [None]:
# Load multiple tables at once
tables_to_load = ['patient', 'hospitalization', 'vitals']
clif.initialize(tables=tables_to_load)

print("Multiple tables loaded!")
print(f"Patient data: {clif.patient.df.shape if clif.patient else 'Not loaded'}")
print(f"Hospitalization data: {clif.hospitalization.df.shape if clif.hospitalization else 'Not loaded'}")
print(f"Vitals data: {clif.vitals.df.shape if clif.vitals else 'Not loaded'}")
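
The three conditional expressions above follow one pattern, so they can be folded into a loop. The sketch below assumes, as the cell above does, that a table that was not loaded is `None` and that a loaded table exposes a `.df` attribute with a `.shape`. `table_shapes` is a hypothetical helper, and the `SimpleNamespace` objects stand in for a real `clif` object so the example runs without data.

```python
from types import SimpleNamespace


def table_shapes(clif_obj, table_names):
    """Return {table: shape or None} for each requested table attribute."""
    shapes = {}
    for name in table_names:
        table = getattr(clif_obj, name, None)
        df = getattr(table, "df", None) if table is not None else None
        shapes[name] = df.shape if df is not None else None
    return shapes


# Stand-in for a CLIF object: two loaded tables and one that was never loaded.
fake_clif = SimpleNamespace(
    patient=SimpleNamespace(df=SimpleNamespace(shape=(100, 9))),
    hospitalization=SimpleNamespace(df=SimpleNamespace(shape=(250, 17))),
    vitals=None,
)

shapes = table_shapes(fake_clif, ["patient", "hospitalization", "vitals"])
for name, shape in shapes.items():
    print(f"{name}: {shape if shape is not None else 'Not loaded'}")
```

With a real object, the call would be `table_shapes(clif, tables_to_load)`.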

## Data Validation

Each table is validated against the CLIF schema specification when it is loaded.

In [None]:
# Check validation status for patient table
if clif.patient:
    print(f"Patient table is valid: {clif.patient.isvalid()}")
    if not clif.patient.isvalid():
        print(f"Validation errors: {len(clif.patient.errors)}")
        for error in clif.patient.errors[:3]:  # Show first 3 errors
            print(f"  - {error}")

# Check validation for vitals table
if clif.vitals:
    print(f"\nVitals table is valid: {clif.vitals.isvalid()}")
    if not clif.vitals.isvalid():
        print(f"Schema validation errors: {len(clif.vitals.errors)}")
        print(f"Range validation errors: {len(clif.vitals.range_validation_errors)}")
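
The per-table checks above can be collected into one report. The sketch below uses only the members that appear in this notebook (`isvalid()`, `errors`, and `range_validation_errors`), and it treats `range_validation_errors` as optional because only the vitals cell references it. `validation_report` and the `FakeTable` stand-in are hypothetical and exist only to make the example runnable without data.

```python
class FakeTable:
    """Minimal stand-in mimicking the table attributes used in this notebook."""

    def __init__(self, errors, range_validation_errors=None):
        self.errors = errors
        if range_validation_errors is not None:
            self.range_validation_errors = range_validation_errors

    def isvalid(self):
        # Assumption for this stand-in only: valid means no recorded errors.
        return not self.errors and not getattr(self, "range_validation_errors", [])


def validation_report(tables):
    """Summarize validity and error counts for a {name: table} mapping."""
    report = {}
    for name, table in tables.items():
        if table is None:
            report[name] = None  # Table was not loaded.
            continue
        report[name] = {
            "valid": table.isvalid(),
            "schema_errors": len(table.errors),
            "range_errors": len(getattr(table, "range_validation_errors", [])),
        }
    return report


report = validation_report({
    "patient": FakeTable(errors=[]),
    "vitals": FakeTable(
        errors=["missing column"],
        range_validation_errors=["hr > 300", "spo2 < 0"],
    ),
    "labs": None,
})
for name, summary in report.items():
    print(name, summary)
```

With real data, the mapping could be built as `{"patient": clif.patient, "vitals": clif.vitals}`.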

## Basic Data Exploration

In [None]:
# Patient table summary
if clif.patient and clif.patient.df is not None:
    print("=== PATIENT TABLE SUMMARY ===")
    print(f"Total patients: {len(clif.patient.df)}")
    print(f"Unique patient IDs: {clif.patient.df['patient_id'].nunique() if 'patient_id' in clif.patient.df.columns else 'N/A'}")
    
    # Show column info
    print("\nColumn information:")
    clif.patient.df.info()  # info() prints its report directly and returns None

In [None]:
# Vitals table summary
if clif.vitals and clif.vitals.df is not None:
    print("=== VITALS TABLE SUMMARY ===")
    print(f"Total vital measurements: {len(clif.vitals.df)}")
    
    # Get vital categories
    vital_categories = clif.vitals.get_vital_categories()
    print(f"Vital categories available: {vital_categories}")
    
    # Get summary statistics
    summary_stats = clif.vitals.get_summary_stats()
    print("\nSummary statistics:")
    for key, value in summary_stats.items():
        if key != 'vital_value_stats':  # Skip detailed stats for now
            print(f"  {key}: {value}")
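
If `get_summary_stats()` does not provide the breakdown you need, a similar summary can be computed directly with pandas. The sketch below builds a small made-up DataFrame. The column names `vital_category` and `vital_value` are assumptions about the vitals table layout, so check them against `clif.vitals.df.columns` before applying this to real data.

```python
import pandas as pd

# Made-up measurements for illustration only; not real patient data.
vitals_df = pd.DataFrame({
    "vital_category": ["heart_rate", "heart_rate", "spo2", "spo2", "spo2"],
    "vital_value": [80.0, 100.0, 95.0, 97.0, 99.0],
})

# Count, mean, min, and max of the measured value within each category.
per_category = (
    vitals_df.groupby("vital_category")["vital_value"]
    .agg(["count", "mean", "min", "max"])
)
print(per_category)
```

On loaded data, replace `vitals_df` with `clif.vitals.df`.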

## Timezone Handling

pyCLIF automatically converts datetime columns to the configured timezone when loading data.

In [None]:
# Check datetime columns and their timezones
if clif.vitals and clif.vitals.df is not None:
    datetime_cols = [col for col in clif.vitals.df.columns if 'dttm' in col]
    print(f"DateTime columns in vitals: {datetime_cols}")
    
    for col in datetime_cols:
        non_null = clif.vitals.df[col].dropna()
        if not non_null.empty:
            sample_datetime = non_null.iloc[0]
            # A naive Timestamp has tz=None, so fall back to 'naive' explicitly
            tz = getattr(sample_datetime, 'tz', None)
            tz_label = tz if tz is not None else 'naive'
            print(f"  {col}: {sample_datetime} (timezone: {tz_label})")
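
One caveat when inspecting timestamps: a timezone-naive value has its `tzinfo` attribute set to `None` rather than lacking the attribute. The sketch below shows a small check using only the standard library. `describe_tz` is a hypothetical helper. It also works for `pandas.Timestamp` values, since `pandas.Timestamp` subclasses `datetime.datetime`. The fixed UTC-5 offset is used purely to keep the example independent of the timezone database.

```python
from datetime import datetime, timedelta, timezone


def describe_tz(value):
    """Return 'naive' for timezone-naive datetimes, else the UTC offset as text."""
    if value.tzinfo is None or value.utcoffset() is None:
        return "naive"
    return value.strftime("%z")


naive = datetime(2024, 1, 15, 8, 30)
utc = datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc)
# Fixed offset for illustration; real zones such as 'US/Eastern' shift with DST.
eastern_standard = utc.astimezone(timezone(timedelta(hours=-5)))

for label, value in [("naive", naive), ("utc", utc), ("utc-5", eastern_standard)]:
    print(f"{label}: {value.isoformat()} -> {describe_tz(value)}")
```

Applied to a loaded table, `describe_tz(clif.vitals.df[col].dropna().iloc[0])` would report whether a sampled value carries timezone information.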

## Next Steps

This notebook covered the basics of:
- Initializing the CLIF object
- Loading single and multiple tables
- Basic data validation
- Simple data exploration
- Timezone handling

### Explore Other Notebooks:
- `02_individual_tables.ipynb` - Working with individual table classes
- `03_data_validation.ipynb` - Advanced validation and error handling
- `04_vitals_analysis.ipynb` - Detailed vitals analysis
- `05_timezone_handling.ipynb` - Advanced timezone operations
- `06_data_filtering.ipynb` - Filtering and querying data