# pyCLIF Basic Usage

This notebook demonstrates basic usage of the pyCLIF library for loading and working with CLIF (Common Longitudinal ICU data Format) tables.

## Overview

The pyCLIF library provides two main ways to work with CLIF data:
1. **Main CLIF class** - Initialize once and load multiple tables
2. **Individual table classes** - Load specific tables independently

This notebook focuses on the main CLIF class approach.

## Setup and Imports

In [1]:
import sys
import os
import pandas as pd

# Import the pyCLIF library
from pyclif import CLIF

print("pyCLIF imported successfully!")
print(f"Python version: {sys.version}")
print(f"Pandas version: {pd.__version__}")

pyCLIF imported successfully!
Python version: 3.10.9 (main, Mar  1 2023, 12:20:14) [Clang 14.0.6 ]
Pandas version: 2.3.0


## Initialize CLIF Object

The CLIF class is the main entry point for working with CLIF data. It requires:
- `data_dir`: Path to your CLIF data directory
- `filetype`: Format of your data files ('csv' or 'parquet')
- `timezone`: Timezone for datetime conversion (e.g., 'US/Eastern', 'UTC')
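
Before constructing the object, it can help to sanity-check these three arguments. The sketch below is not part of pyCLIF: `check_clif_config` is a hypothetical helper built only on the standard library, and it assumes the timezone should resolve as an IANA key on the local system.

```python
from pathlib import Path
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

SUPPORTED_FILETYPES = {"csv", "parquet"}  # the two formats listed above


def check_clif_config(data_dir, filetype, timezone):
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper for illustration; not part of pyCLIF.
    """
    problems = []
    if not Path(data_dir).is_dir():
        problems.append(f"data_dir is not a directory: {data_dir}")
    if filetype not in SUPPORTED_FILETYPES:
        problems.append(f"unsupported filetype: {filetype!r}")
    try:
        ZoneInfo(timezone)  # raises if the key cannot be resolved locally
    except (ZoneInfoNotFoundError, ValueError, OSError):
        problems.append(f"unknown timezone: {timezone!r}")
    return problems


# Report anything suspicious about the settings used in this notebook.
for problem in check_clif_config("../src/pyclif/data/clif_demo/", "parquet", "US/Eastern"):
    print(problem)
```

An empty result means only that the arguments look plausible; it does not confirm that the directory contains CLIF tables.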

In [6]:
# Set your data directory path - update this to your CLIF data location
DATA_DIR = "../src/pyclif/data/clif_demo/"

# Initialize CLIF object
clif = CLIF(
    data_dir=DATA_DIR,
    filetype='parquet',  # Your data is in parquet format
    timezone='US/Eastern'  # Your site timezone
)

print("CLIF object initialized successfully!")
print(f"Data directory: {clif.data_dir}")
print(f"File type: {clif.filetype}")
print(f"Timezone: {clif.timezone}")

CLIF Object Initialized.
CLIF object initialized successfully!
Data directory: ../src/pyclif/data/clif_demo/
File type: parquet
Timezone: US/Eastern


## Loading Tables

Use the `initialize()` method to load specific tables. You can load one or multiple tables at once.

### Available Tables:
- `patient` - Patient demographics and basic information
- `hospitalization` - Hospital admission details
- `vitals` - Vital signs measurements
- `labs` - Laboratory results
- `adt` - Admission, Discharge, Transfer events
- `respiratory_support` - Respiratory support data
- `medication_admin_continuous` - Continuous medication administration
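
The load messages in this notebook name files such as `clif_patient.parquet`, which suggests a `clif_<table>.<filetype>` naming pattern. Assuming that pattern holds for every table (an assumption, not something confirmed here), a small standard-library sketch can report which files are missing before any loading is attempted. Both helper names are hypothetical.

```python
from pathlib import Path


def expected_table_files(data_dir, tables, filetype):
    """Map each table name to the path implied by the assumed naming pattern."""
    return {name: Path(data_dir) / f"clif_{name}.{filetype}" for name in tables}


def missing_tables(data_dir, tables, filetype):
    """Return the table names whose expected file is absent."""
    files = expected_table_files(data_dir, tables, filetype)
    return [name for name, path in files.items() if not path.is_file()]


wanted = ["patient", "hospitalization", "vitals"]
print(missing_tables("../src/pyclif/data/clif_demo/", wanted, "parquet"))
```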

### Load Patient Table

In [7]:
# Load just the patient table
clif.initialize(tables=['patient'])

print("Patient table loaded!")
print(f"Patient data shape: {clif.patient.df.shape}")
print(f"Patient columns: {list(clif.patient.df.columns)}")

Loading clif_patient.parquet
Data loaded successfully from clif_patient.parquet
Validation completed with 2 error(s). See `errors` attribute.
Patient table loaded!
Patient data shape: (100, 11)
Patient columns: ['patient_id', 'race_name', 'race_category', 'ethnicity_name', 'ethnicity_category', 'sex_name', 'sex_category', 'birth_date', 'death_dttm', 'language_name', 'language_category']


In [8]:
clif.patient.errors

[{'type': 'null_values', 'column': 'birth_date', 'count': 100},
 {'type': 'null_values', 'column': 'death_dttm', 'count': 85}]
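
Raw counts are easier to judge as a share of the table. The sketch below copies the two records shown above and divides by the 100 rows reported for the demo patient table; `null_fractions` is a hypothetical helper, not a pyCLIF function.

```python
# Error records copied from the `clif.patient.errors` output above.
errors = [
    {"type": "null_values", "column": "birth_date", "count": 100},
    {"type": "null_values", "column": "death_dttm", "count": 85},
]
n_rows = 100  # rows in the demo patient table, from the reported shape (100, 11)


def null_fractions(errors, n_rows):
    """Return {column: fraction of rows that are null} for null_values records."""
    return {
        e["column"]: e["count"] / n_rows
        for e in errors
        if e.get("type") == "null_values"
    }


for column, fraction in null_fractions(errors, n_rows).items():
    print(f"{column}: {fraction:.0%} null")
```

Here `birth_date` is empty for every row, while `death_dttm` is empty for 85 of 100 patients. The second case may simply reflect patients with no recorded death rather than a data-quality problem.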

In [9]:
# Display first few rows of patient data
print("First 5 patient records:")
clif.patient.df.head()

First 5 patient records:


| | patient_id | race_name | race_category | ethnicity_name | ethnicity_category | sex_name | sex_category | birth_date | death_dttm | language_name | language_category |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10002495 | UNKNOWN | Unknown | UNKNOWN | Unknown | M | Male | NaT | NaT | ENGLISH | Unknown or NA |
| 1 | 10012552 | UNKNOWN | Unknown | UNKNOWN | Unknown | M | Male | NaT | NaT | ENGLISH | Unknown or NA |
| 2 | 10015272 | WHITE | White | WHITE | Non-Hispanic | F | Female | NaT | NaT | ENGLISH | Unknown or NA |
| 3 | 10016810 | UNKNOWN | Unknown | UNKNOWN | Unknown | F | Female | NaT | NaT | ENGLISH | Unknown or NA |
| 4 | 10026406 | WHITE | White | WHITE | Non-Hispanic | M | Male | NaT | NaT | ENGLISH | Unknown or NA |


### Load Multiple Tables

In [10]:
# Load multiple tables at once
tables_to_load = ['patient', 'hospitalization', 'vitals']
clif.initialize(tables=tables_to_load)

print("Multiple tables loaded!")
print(f"Patient data: {clif.patient.df.shape if clif.patient else 'Not loaded'}")
print(f"Hospitalization data: {clif.hospitalization.df.shape if clif.hospitalization else 'Not loaded'}")
print(f"Vitals data: {clif.vitals.df.shape if clif.vitals else 'Not loaded'}")

Loading clif_patient.parquet
Data loaded successfully from clif_patient.parquet
Validation completed with 2 error(s). See `errors` attribute.
Loading clif_hospitalization.parquet
Data loaded successfully from clif_hospitalization.parquet
Validation completed successfully.
Loading clif_vitals.parquet
Data loaded successfully from clif_vitals.parquet
Validation completed with 5 error(s).
  - 5 range validation error(s)
See `errors` and `range_validation_errors` attributes for details.
Multiple tables loaded!
Patient data: (100, 11)
Hospitalization data: (275, 17)
Vitals data: (89085, 6)


## Data Validation

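Each table is validated against the CLIF schema specification as it loads. The result is available through `isvalid()` and the `errors` attribute, and the vitals table also exposes `range_validation_errors`.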

In [11]:
# Check validation status for patient table
if clif.patient:
    print(f"Patient table is valid: {clif.patient.isvalid()}")
    if not clif.patient.isvalid():
        print(f"Validation errors: {len(clif.patient.errors)}")
        for error in clif.patient.errors[:3]:  # Show first 3 errors
            print(f"  - {error}")

# Check validation for vitals table
if clif.vitals:
    print(f"\nVitals table is valid: {clif.vitals.isvalid()}")
    if not clif.vitals.isvalid():
        print(f"Schema validation errors: {len(clif.vitals.errors)}")
        print(f"Range validation errors: {len(clif.vitals.range_validation_errors)}")

Patient table is valid: False
Validation errors: 2
  - {'type': 'null_values', 'column': 'birth_date', 'count': 100}
  - {'type': 'null_values', 'column': 'death_dttm', 'count': 85}

Vitals table is valid: False
Schema validation errors: 0
Range validation errors: 5
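
The same checks can be collected into one summary for several tables. The sketch below relies only on the members used in this notebook (`isvalid()`, `errors`, and `range_validation_errors` where present). `validation_report` and `FakeTable` are hypothetical names; the stand-in class exists only so the example runs without CLIF data, and its `isvalid()` rule is an assumption rather than pyCLIF's actual logic.

```python
def validation_report(tables):
    """Summarize validation state for objects exposing `isvalid()` and `errors`.

    `tables` maps a table name to a loaded table object, or None if not loaded.
    """
    report = {}
    for name, table in tables.items():
        if table is None:
            report[name] = "not loaded"
            continue
        n_schema = len(table.errors)
        # Treat range errors as optional: this notebook only reads them on vitals.
        n_range = len(getattr(table, "range_validation_errors", []))
        status = "valid" if table.isvalid() else "invalid"
        report[name] = f"{status} ({n_schema} schema, {n_range} range)"
    return report


class FakeTable:
    """Hypothetical stand-in for a loaded table object."""

    def __init__(self, errors, range_validation_errors=()):
        self.errors = list(errors)
        self.range_validation_errors = list(range_validation_errors)

    def isvalid(self):
        # Assumed rule: valid only when both error lists are empty.
        return not self.errors and not self.range_validation_errors


demo = {
    "patient": FakeTable(errors=[{"type": "null_values", "column": "birth_date", "count": 100}]),
    "hospitalization": FakeTable(errors=[]),
    "labs": None,
}
for name, line in validation_report(demo).items():
    print(f"{name}: {line}")
```

With real data, the same function could be called as `validation_report({"patient": clif.patient, "vitals": clif.vitals})`.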


## Basic Data Exploration

In [12]:
# Patient table summary
if clif.patient and clif.patient.df is not None:
    print("=== PATIENT TABLE SUMMARY ===")
    print(f"Total patients: {len(clif.patient.df)}")
    print(f"Unique patient IDs: {clif.patient.df['patient_id'].nunique() if 'patient_id' in clif.patient.df.columns else 'N/A'}")
    
    # Show column info
    print("\nColumn information:")
    clif.patient.df.info()  # info() prints its report directly and returns None

=== PATIENT TABLE SUMMARY ===
Total patients: 100
Unique patient IDs: 100

Column information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype              
---  ------              --------------  -----              
 0   patient_id          100 non-null    string             
 1   race_name           100 non-null    object             
 2   race_category       100 non-null    object             
 3   ethnicity_name      100 non-null    object             
 4   ethnicity_category  100 non-null    object             
 5   sex_name            100 non-null    object             
 6   sex_category        100 non-null    object             
 7   birth_date          0 non-null      datetime64[us]     
 8   death_dttm          15 non-null     datetime64[us, UTC]
 9   language_name       100 non-null    object             
 10  language_category   100 non-null    object             
dtypes: 

In [13]:
# Vitals table summary
if clif.vitals and clif.vitals.df is not None:
    print("=== VITALS TABLE SUMMARY ===")
    print(f"Total vital measurements: {len(clif.vitals.df)}")
    
    # Get vital categories
    vital_categories = clif.vitals.get_vital_categories()
    print(f"Vital categories available: {vital_categories}")
    
    # Get summary statistics
    summary_stats = clif.vitals.get_summary_stats()
    print(f"\nSummary statistics:")
    for key, value in summary_stats.items():
        if key != 'vital_value_stats':  # Skip detailed stats for now
            print(f"  {key}: {value}")

=== VITALS TABLE SUMMARY ===
Total vital measurements: 89085
Vital categories available: ['spo2', 'map', 'sbp', 'heart_rate', 'dbp', 'respiratory_rate', 'weight_kg', 'height_cm', 'temp_c']

Summary statistics:
  total_records: 89085
  unique_hospitalizations: 128
  vital_category_counts: {'map': 14368, 'sbp': 14356, 'dbp': 14351, 'heart_rate': 13913, 'respiratory_rate': 13913, 'spo2': 13540, 'temp_c': 3767, 'weight_kg': 806, 'height_cm': 71}
  date_range: {'earliest': Timestamp('2110-04-11 20:52:00+0000', tz='UTC'), 'latest': Timestamp('2201-12-13 23:00:00+0000', tz='UTC')}
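
A quick consistency check on these figures: if each record carries exactly one vital category (an assumption about the table layout), the per-category counts should add up to `total_records`. The sketch below copies the printed values and also shows each category's share of all measurements.

```python
# Values copied from the summary statistics printed above.
total_records = 89085
vital_category_counts = {
    "map": 14368, "sbp": 14356, "dbp": 14351,
    "heart_rate": 13913, "respiratory_rate": 13913, "spo2": 13540,
    "temp_c": 3767, "weight_kg": 806, "height_cm": 71,
}

# Compare the sum of the category counts with the reported total.
counted = sum(vital_category_counts.values())
print(f"sum of category counts: {counted} (total_records: {total_records})")

# Share of all measurements contributed by each category, largest first.
for name, count in sorted(vital_category_counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:>17}: {count / total_records:6.1%}")
```

For the demo data the two totals agree, so no records fall outside the nine listed categories.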


## Timezone Handling

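pyCLIF handles timezone conversion for datetime columns automatically when it loads data. The cell below lists the datetime columns in the vitals table and reports the timezone attached to a sample value from each.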

In [14]:
# Check datetime columns and their timezones
if clif.vitals and clif.vitals.df is not None:
    datetime_cols = [col for col in clif.vitals.df.columns if 'dttm' in col]
    print(f"DateTime columns in vitals: {datetime_cols}")
    
    for col in datetime_cols:
        non_null = clif.vitals.df[col].dropna()
        if not non_null.empty:
            sample_datetime = non_null.iloc[0]
            # A tz-naive Timestamp has tz=None, so fall back to 'naive' explicitly
            tz = getattr(sample_datetime, 'tz', None) or 'naive'
            print(f"  {col}: {sample_datetime} (timezone: {tz})")

DateTime columns in vitals: ['recorded_dttm']
  recorded_dttm: 2137-08-25 14:00:00+00:00 (timezone: UTC)
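
The difference between naive and timezone-aware values, and the fact that conversion changes the wall-clock reading but not the instant, can be shown with the standard library alone. This is a sketch, not pyCLIF's conversion code: `describe_tz` is a hypothetical helper, and the fixed UTC-05:00 offset is chosen purely for illustration, since a real zone such as US/Eastern also applies daylight-saving rules.

```python
from datetime import datetime, timedelta, timezone


def describe_tz(dt):
    """Return 'naive' for datetimes without an offset, else the tzinfo name."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        return "naive"
    return str(dt.tzinfo)


# The sample value printed above, rebuilt as a timezone-aware datetime.
recorded_utc = datetime(2137, 8, 25, 14, 0, tzinfo=timezone.utc)

# Illustrative fixed offset only; real zones also have DST transitions.
fixed_minus_5 = timezone(timedelta(hours=-5))
recorded_local = recorded_utc.astimezone(fixed_minus_5)

print(describe_tz(datetime(2137, 8, 25, 14, 0)))  # naive
print(describe_tz(recorded_utc))                   # UTC
print(recorded_local.isoformat())                  # 2137-08-25T09:00:00-05:00
print(recorded_local == recorded_utc)              # True: same instant
```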


## Next Steps

This notebook covered the basics of:
- Initializing the CLIF object
- Loading single and multiple tables
- Basic data validation
- Simple data exploration
- Timezone handling

### Explore Other Notebooks:
- `02_individual_tables.ipynb` - Working with individual table classes
- `03_data_validation.ipynb` - Advanced validation and error handling
- `04_vitals_analysis.ipynb` - Detailed vitals analysis
- `05_timezone_handling.ipynb` - Advanced timezone operations
- `06_data_filtering.ipynb` - Filtering and querying data