# MIMIC Patient Selection and Community Care Integration (Option C)

This notebook implements **Option C: Concurrent Care**, simulating veterans who receive care from both VA and community providers simultaneously.

## Concurrent Care Approach (Option C)

**Clinical Rationale**: Most realistic scenario for veterans who:
- Are dual-eligible for Medicare (age 65+) and use both VA and private providers
- Live in rural areas and use VA MISSION Act community care
- Seek emergency care at non-VA facilities

**Data Model**: VA and community medications active in overlapping timeframes (2025)
- **VA medications**: Throughout 2025 (from CDWWork database)
- **Community medications**: Throughout 2025 (from MIMIC-IV, dates shifted; patient-specific offsets can push some late-year orders into early 2026)
- **Result**: Concurrent polypharmacy with "Blind Spot" DDI scenarios
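The "Blind Spot" idea can be sketched in plain Python: a drug pair is a blind-spot candidate when one drug comes from VA, the other from community care, the two are known to interact, and their active periods overlap. This is a minimal sketch, not the notebook's analysis code; the `INTERACTING_PAIRS` table and the medication dict layout are hypothetical stand-ins for a curated DDI source and the real medication schema.

```python
from datetime import date

# Hypothetical interaction table for illustration; a real check would use a
# curated DDI knowledge base. Warfarin + ibuprofen mirrors the NSAID +
# anticoagulant scenario listed for PatientSID 1014.
INTERACTING_PAIRS = {frozenset({"warfarin", "ibuprofen"})}


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed date intervals share at least one day."""
    return a_start <= b_end and b_start <= a_end


def blind_spot_pairs(va_meds, community_meds, pairs=INTERACTING_PAIRS):
    """Return (va_drug, community_drug) pairs that interact and overlap in time.

    Each medication is a dict with 'drug', 'start' and 'end' (datetime.date).
    """
    hits = []
    for va in va_meds:
        for cc in community_meds:
            pair = frozenset({va["drug"].lower(), cc["drug"].lower()})
            if pair in pairs and overlaps(va["start"], va["end"], cc["start"], cc["end"]):
                hits.append((va["drug"], cc["drug"]))
    return hits


va_meds = [{"drug": "Warfarin", "start": date(2025, 1, 1), "end": date(2025, 12, 31)}]
community_meds = [
    {"drug": "Ibuprofen", "start": date(2025, 4, 10), "end": date(2025, 4, 20)},
    {"drug": "Lisinopril", "start": date(2025, 4, 10), "end": date(2025, 6, 1)},
]
print(blind_spot_pairs(va_meds, community_meds))  # [('Warfarin', 'Ibuprofen')]
```

Neither source alone shows the pair: the VA record has only warfarin and the community record has only ibuprofen, which is what makes it a blind spot.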

## Process

1. Load MIMIC-IV Parquet data from v1_raw/mimic/
2. Select 10 VA patients for concurrent community care
3. Map to MIMIC patients with similar demographics
4. Transform MIMIC dates to 2025 with temporal overlap
5. Create community care medication records (Sta3n=999)
6. Merge with existing VA medications
7. Validate concurrent care overlap
8. Write combined dataset to v1_raw/medications/

**Source**: `med-data/v1_raw/mimic/*.parquet`  
**Destination**: `med-data/v1_raw/medications/medications_combined.parquet` (updated)

In [1]:
# Import dependencies

import logging
import time
import numpy as np
import pandas as pd
import s3fs
from datetime import datetime, timedelta, date
from config import *

In [2]:
# Set up logging

# Clear any existing handlers to avoid duplicate logs
for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler)

# Configure logging with timestamp and level
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s"
)

logging.info("=" * 80)
logging.info("MIMIC PATIENT SELECTION - OPTION C: CONCURRENT CARE")
logging.info("=" * 80)
logging.info("Approach: Veterans receiving care from both VA and community simultaneously")
logging.info("Timeline: VA and community medications both active throughout 2025")

2025-12-02 17:26:54,652 INFO MIMIC PATIENT SELECTION - OPTION C: CONCURRENT CARE
2025-12-02 17:26:54,653 INFO Approach: Veterans receiving care from both VA and community simultaneously
2025-12-02 17:26:54,653 INFO Timeline: VA and community medications both active throughout 2025


In [3]:
# Create S3FileSystem for MinIO access

logging.info(f"\nInitializing S3FileSystem for MinIO at {MINIO_ENDPOINT}")
s3 = s3fs.S3FileSystem(
    anon=False,
    key=MINIO_ACCESS_KEY,
    secret=MINIO_SECRET_KEY,
    client_kwargs={
        'endpoint_url': f"http://{MINIO_ENDPOINT}"
    }
)
logging.info("S3FileSystem created successfully")

2025-12-02 17:27:05,069 INFO 
Initializing S3FileSystem for MinIO at localhost:9000
2025-12-02 17:27:05,071 INFO S3FileSystem created successfully


## Load MIMIC-IV Data from v1_raw/mimic/

Load MIMIC Parquet files previously created by 01d_dataprep_mimic.ipynb.

In [4]:
# Load MIMIC Parquet files from MinIO

logging.info("\nLoading MIMIC data from v1_raw/mimic/...")
start_time = time.time()

# Load prescriptions (medication orders)
with s3.open(f's3://{DEST_BUCKET}/{V1_RAW_MIMIC_PREFIX}prescriptions.parquet', 'rb') as f:
    df_mimic_rx = pd.read_parquet(f)
logging.info(f"  ✓ Loaded {len(df_mimic_rx):,} prescriptions")

# Load patients (demographics)
with s3.open(f's3://{DEST_BUCKET}/{V1_RAW_MIMIC_PREFIX}patients.parquet', 'rb') as f:
    df_mimic_patients = pd.read_parquet(f)
logging.info(f"  ✓ Loaded {len(df_mimic_patients):,} patients")

# Load admissions (hospital stays)
with s3.open(f's3://{DEST_BUCKET}/{V1_RAW_MIMIC_PREFIX}admissions.parquet', 'rb') as f:
    df_mimic_admissions = pd.read_parquet(f)
logging.info(f"  ✓ Loaded {len(df_mimic_admissions):,} admissions")

elapsed = time.time() - start_time
logging.info(f"MIMIC data loaded in {elapsed:.2f}s")

# Display summary
unique_patients = df_mimic_rx['subject_id'].nunique()
logging.info(f"\nMIMIC Dataset Summary:")
logging.info(f"  Unique patients with prescriptions: {unique_patients}")
logging.info(f"  Total prescriptions: {len(df_mimic_rx):,}")
logging.info(f"  Avg prescriptions per patient: {len(df_mimic_rx) / unique_patients:.1f}")

2025-12-02 17:27:32,040 INFO 
Loading MIMIC data from v1_raw/mimic/...
2025-12-02 17:27:32,161 INFO   ✓ Loaded 18,087 prescriptions
2025-12-02 17:27:32,166 INFO   ✓ Loaded 100 patients
2025-12-02 17:27:32,173 INFO   ✓ Loaded 275 admissions
2025-12-02 17:27:32,173 INFO MIMIC data loaded in 0.13s
2025-12-02 17:27:32,174 INFO 
MIMIC Dataset Summary:
2025-12-02 17:27:32,174 INFO   Unique patients with prescriptions: 100
2025-12-02 17:27:32,174 INFO   Total prescriptions: 18,087
2025-12-02 17:27:32,174 INFO   Avg prescriptions per patient: 180.9


In [5]:
# Display sample MIMIC prescription data

print("\n" + "=" * 80)
print("SAMPLE MIMIC PRESCRIPTION DATA")
print("=" * 80)
display(df_mimic_rx[['subject_id', 'drug', 'starttime', 'stoptime', 'route']].head(10))


SAMPLE MIMIC PRESCRIPTION DATA


| | subject_id | drug | starttime | stoptime | route |
|---|---|---|---|---|---|
| 0 | 10027602 | Fentanyl Citrate | 2201-10-30 12:00:00 | | |
| 1 | 10027602 | Fentanyl Citrate | 2201-10-30 12:00:00 | | |
| 2 | 10027602 | Lorazepam | 2201-10-31 12:00:00 | | |
| 3 | 10027602 | Midazolam | 2201-10-30 12:00:00 | | |
| 4 | 10027602 | Midazolam | 2201-10-30 12:00:00 | | |
| 5 | 10023239 | Insulin Pump (Self Administering Medication) | 2137-06-21 17:00:00 | 2137-06-22 19:00:00 | SC |
| 6 | 10023239 | Insulin Pump (Self Administering Medication) | 2140-10-04 14:00:00 | 2140-10-06 16:00:00 | SC |
| 7 | 10023239 | Insulin Pump (Self Administering Medication) | 2140-10-06 17:00:00 | 2140-10-08 19:00:00 | SC |
| 8 | 10027602 | Propofol | 2201-10-30 13:00:00 | | |
| 9 | 10020740 | Insulin | 2150-04-07 11:00:00 | 2150-04-08 22:00:00 | SC |


## Define VA to MIMIC Patient Mapping

Select 10 VA patients for concurrent community care and map to MIMIC patients.

### Selected VA Patients (Focus on Dual-Eligible and High-Risk)

| PatientSID | Age | Gender | Conditions | Rationale |
|------------|-----|--------|------------|----------|
| 1011 | 72 | M | AFib, CHF, HTN | Elderly, complex cardiac, high DDI risk |
| 1012 | 68 | F | Warfarin therapy | Bleeding risk, anticoagulation |
| 1013 | 77 | M | Polypharmacy | Highest med count, multiple conditions |
| 1014 | 70 | F | NSAID + anticoag | DDI scenario, dual-eligible |
| 1015 | 82 | M | Renal concerns | Oldest patient, medication sensitivity |
| 1016 | 28 | M | PTSD, Depression | Mental health, younger veteran |
| 1019 | 48 | F | Depression, HTN | Mid-age, chronic conditions |
| 1021 | 58 | F | Diabetes, CHF | Chronic disease management |
| 1023 | 67 | F | Depression, Diabetes | Multiple comorbidities |
| 1024 | 74 | M | Complex cardiac | Elderly, multiple medications |

**⚠️ NOTE**: The MIMICSubjectID values below are already populated. If the MIMIC extract changes, re-check each one against its VA profile (age, gender, medication complexity) using the MIMIC patient data.
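One way to make the matching repeatable is a greedy pass over the VA profiles that picks the closest same-gender MIMIC candidate by age. The sketch below assumes the MIMIC `patients` extract keeps the MIMIC-IV `gender` and `anchor_age` columns and that per-subject prescription counts come from a groupby like the one in the exploration cell; the function name and dict layout are hypothetical, and the greedy order (by PatientSID) is an arbitrary choice, not an optimal assignment.

```python
def match_mimic_candidates(va_profiles, mimic_candidates):
    """Greedily pair each VA profile with the closest unused MIMIC candidate.

    va_profiles: dicts with 'PatientSID', 'age', 'gender'
    mimic_candidates: dicts with 'subject_id', 'anchor_age', 'gender', 'rx_count'
    Returns {PatientSID: subject_id or None}. Candidates must share gender;
    among equal age gaps, the candidate with more prescriptions wins.
    """
    used = set()
    mapping = {}
    for va in sorted(va_profiles, key=lambda p: p["PatientSID"]):  # deterministic order
        options = [
            c for c in mimic_candidates
            if c["gender"] == va["gender"] and c["subject_id"] not in used
        ]
        if not options:
            mapping[va["PatientSID"]] = None  # no same-gender candidate left
            continue
        best = min(options, key=lambda c: (abs(c["anchor_age"] - va["age"]), -c["rx_count"]))
        used.add(best["subject_id"])
        mapping[va["PatientSID"]] = best["subject_id"]
    return mapping


# Hypothetical candidates; real ones would come from df_mimic_patients joined
# with per-subject prescription counts
va_profiles = [
    {"PatientSID": 1011, "age": 72, "gender": "M"},
    {"PatientSID": 1012, "age": 68, "gender": "F"},
    {"PatientSID": 1016, "age": 28, "gender": "M"},
]
candidates = [
    {"subject_id": 1, "anchor_age": 70, "gender": "M", "rx_count": 120},
    {"subject_id": 2, "anchor_age": 31, "gender": "M", "rx_count": 80},
    {"subject_id": 3, "anchor_age": 66, "gender": "F", "rx_count": 95},
    {"subject_id": 4, "anchor_age": 74, "gender": "M", "rx_count": 60},
]
mapping = match_mimic_candidates(va_profiles, candidates)
```

Here PatientSID 1011 is equally close in age to candidates 1 and 4, and the tie goes to candidate 1 because it has more prescriptions.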

In [6]:
# Explore MIMIC patients to find suitable matches

# Calculate prescription counts per patient
mimic_rx_counts = df_mimic_rx.groupby('subject_id').size().reset_index(name='prescription_count')
mimic_rx_counts = mimic_rx_counts.sort_values('prescription_count', ascending=False)

print("\n" + "=" * 80)
print("MIMIC PATIENTS WITH MOST PRESCRIPTIONS (Candidates for Mapping)")
print("=" * 80)
print(f"{'Subject ID':<12} {'Prescriptions':<15}")
print("-" * 80)

# Display top 20 patients with most prescriptions
for idx, row in mimic_rx_counts.head(20).iterrows():
    print(f"{row['subject_id']:<12} {row['prescription_count']:<15}")

print("\n💡 Suggestion: Pick MIMIC subject_ids whose gender and age (patients table)")
print("   match each VA profile. MIMIC inpatient order counts run far higher than VA")
print("   outpatient medication counts, so treat prescription count as a rough signal.")


MIMIC PATIENTS WITH MOST PRESCRIPTIONS (Candidates for Mapping)
Subject ID   Prescriptions  
--------------------------------------------------------------------------------
10014354     1180           
10035631     786            
10040025     715            
10003400     675            
10015860     644            
10039708     615            
10023117     523            
10019003     483            
10021487     478            
10018081     435            
10005817     435            
10037928     429            
10020740     417            
10005866     386            
10015931     356            
10007818     320            
10002428     320            
10007795     272            
10037861     265            
10032725     258            

💡 Suggestion: Pick MIMIC subject_ids whose gender and age (patients table)
   match each VA profile. MIMIC inpatient order counts run far higher than VA
   outpatient medication counts, so treat prescription count as a rough signal.


In [7]:
# Define patient mapping (VA PatientSID → MIMIC subject_id)
#
# MIMICSubjectID values are integer MIMIC subject_ids chosen with the
# exploration cell above. If the MIMIC extract changes, re-check each one
# against its VA profile (age, gender, medication complexity). A value left as
# the string '[TBD]' makes the next cell fall back to demonstration subjects.

patient_mapping = pd.DataFrame([
    {
        'VAPatientSID': 1011,
        'MIMICSubjectID': 10020944,  # Updated with actual MIMIC subject_id (integer)
        'Rationale': '72yo male, CHF, AFib, complex cardiac, ~8 VA meds'
    },
    {
        'VAPatientSID': 1012,
        'MIMICSubjectID': 10039997,
        'Rationale': '68yo female, anticoagulation therapy, ~7 VA meds'
    },
    {
        'VAPatientSID': 1013,
        'MIMICSubjectID': 10012552,
        'Rationale': '77yo male, polypharmacy, multiple cardiac, ~9 VA meds'
    },
    {
        'VAPatientSID': 1014,
        'MIMICSubjectID': 10027602,
        'Rationale': '70yo female, NSAID + anticoagulation, ~8 VA meds'
    },
    {
        'VAPatientSID': 1015,
        'MIMICSubjectID': 10025612,
        'Rationale': '82yo male, highest med count, renal concerns, ~10 VA meds'
    },
    {
        'VAPatientSID': 1016,
        'MIMICSubjectID': 10009035,
        'Rationale': '28yo male, mental health, PTSD, ~3 VA meds'
    },
    {
        'VAPatientSID': 1019,
        'MIMICSubjectID': 10002930,
        'Rationale': '48yo female, depression + HTN, ~5 VA meds'
    },
    {
        'VAPatientSID': 1021,
        'MIMICSubjectID': 10007928,
        'Rationale': '58yo female, diabetes, CHF, ~6 VA meds'
    },
    {
        'VAPatientSID': 1023,
        'MIMICSubjectID': 10031757,
        'Rationale': '67yo female, depression, diabetes, ~7 VA meds'
    },
    {
        'VAPatientSID': 1024,
        'MIMICSubjectID': 10005348,
        'Rationale': '74yo male, complex cardiac, CKD, ~8 VA meds'
    }
])

logging.info(f"\nPatient mapping defined for {len(patient_mapping)} VA patients")
logging.info("⚠️  Verify each MIMICSubjectID against its VA profile (age, gender, medication complexity)")
logging.info("   Use the exploration cell above to review candidate MIMIC patients.")

# Display mapping
print("\n" + "=" * 80)
print("VA TO MIMIC PATIENT MAPPING")
print("=" * 80)
display(patient_mapping)

2025-12-02 17:44:40,756 INFO 
Patient mapping defined for 10 VA patients
2025-12-02 17:44:40,757 INFO ⚠️  Verify each MIMICSubjectID against its VA profile (age, gender, medication complexity)
2025-12-02 17:44:40,757 INFO    Use the exploration cell above to review candidate MIMIC patients.



VA TO MIMIC PATIENT MAPPING


| | VAPatientSID | MIMICSubjectID | Rationale |
|---|---|---|---|
| 0 | 1011 | 10020944 | 72yo male, CHF, AFib, complex cardiac, ~8 VA meds |
| 1 | 1012 | 10039997 | 68yo female, anticoagulation therapy, ~7 VA meds |
| 2 | 1013 | 10012552 | 77yo male, polypharmacy, multiple cardiac, ~9 VA meds |
| 3 | 1014 | 10027602 | 70yo female, NSAID + anticoagulation, ~8 VA meds |
| 4 | 1015 | 10025612 | 82yo male, highest med count, renal concerns, ~10 VA meds |
| 5 | 1016 | 10009035 | 28yo male, mental health, PTSD, ~3 VA meds |
| 6 | 1019 | 10002930 | 48yo female, depression + HTN, ~5 VA meds |
| 7 | 1021 | 10007928 | 58yo female, diabetes, CHF, ~6 VA meds |
| 8 | 1023 | 10031757 | 67yo female, depression, diabetes, ~7 VA meds |
| 9 | 1024 | 10005348 | 74yo male, complex cardiac, CKD, ~8 VA meds |


## Extract and Transform MIMIC Medications (Option C: Concurrent)

Select medications for mapped patients and transform dates to 2025 with **concurrent overlap**.

### Date Transformation Strategy (Option C)

- **MIMIC source dates**: 2100-2200 range (privacy-shifted)
- **Target dates**: 2025 (same year as VA medications)
- **Patient staggering**: Start community care at different times throughout 2025
- **Duration**: target of 3-6 months of community care per patient (not enforced by the code below; order start/stop times come from MIMIC)
- **Result**: Temporal overlap with VA medications → Concurrent care
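A year swap plus offset has two edge cases worth seeing in isolation: Feb 29 exists in many shifted MIMIC years (2140, for example) but not in 2025, and an order that crosses New Year's Eve flips its stop before its start if both ends are year-swapped independently. The sketch below uses only `datetime`; the helper names are hypothetical, and re-applying the original duration to the shifted start is one possible fix, not necessarily the only reasonable one.

```python
from datetime import datetime, timedelta


def swap_year_to_2025(dt):
    """Move a timestamp to 2025, keeping month, day and time of day.

    2025 is not a leap year, so Feb 29 is clamped to Feb 28 instead of
    letting replace() raise ValueError.
    """
    if dt.month == 2 and dt.day == 29:
        dt = dt.replace(day=28)
    return dt.replace(year=2025)


def shift_order(start, stop, offset_days):
    """Shift one order's start to 2025 plus a patient offset.

    The stop is rebuilt from the order's original duration rather than
    year-swapped on its own.
    """
    new_start = swap_year_to_2025(start) + timedelta(days=offset_days)
    if stop is None:
        return new_start, None
    return new_start, new_start + (stop - start)


# An order that crosses New Year's Eve in MIMIC time
start, stop = datetime(2137, 12, 30, 8), datetime(2138, 1, 2, 8)
naive_stop = swap_year_to_2025(stop)  # 2025-01-02 08:00, i.e. before the swapped start
new_start, new_stop = shift_order(start, stop, offset_days=0)
# new_start is 2025-12-30 08:00 and new_stop is 2026-01-02 08:00: still 3 days apart
```

The example also shows how shifted orders can land in 2026: a late-December order keeps its month and day, so its duration, or any positive patient offset, carries it past year end.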

In [8]:
# Date transformation function for Option C (Concurrent Care)

def shift_date_to_2025_concurrent(date_str, patient_offset_days=0):
    """
    Shift MIMIC dates (2100s) to 2025 for concurrent care simulation.
    
    Args:
        date_str: Original MIMIC date string or timestamp
        patient_offset_days: Offset (days) added after the year swap to stagger
                             community care periods by patient
    
    Returns:
        pd.Timestamp: Date in 2025 plus the patient-specific offset (late-year
        dates can roll into 2026), or None if date_str is missing
    """
    if pd.isna(date_str):
        return None
    
    # Parse original MIMIC date
    dt = pd.to_datetime(date_str)
    
    # 2025 is not a leap year: clamp Feb 29 (valid in shifted MIMIC years such
    # as 2140) to Feb 28, otherwise replace(year=2025) raises ValueError
    if dt.month == 2 and dt.day == 29:
        dt = dt.replace(day=28)
    
    # Replace year with 2025, keep month/day for seasonal distribution
    new_date = dt.replace(year=2025)
    
    # Add patient-specific offset to stagger community care periods: each
    # patient's orders move later in the year by their offset, so patients
    # differ in when their community care overlaps VA care
    new_date = new_date + timedelta(days=patient_offset_days)
    
    return new_date

# Define patient-specific offsets (days) to stagger community care periods.
# Each offset is added to the order's MIMIC month/day after the year swap, so it
# delays a patient's orders relative to their MIMIC timing rather than fixing a
# calendar start month.
patient_offsets = {
    1011: 0,
    1012: 30,
    1013: 60,
    1014: 90,
    1015: 120,
    1016: 0,
    1019: 45,
    1021: 75,
    1023: 105,
    1024: 135
}

logging.info("\nDate transformation function created (Option C: Concurrent Care)")
logging.info("Strategy: Shift MIMIC dates to 2025 with patient-specific offsets")
logging.info("Result: Community care overlaps temporally with VA care throughout 2025")

2025-12-02 17:45:36,491 INFO 
Date transformation function created (Option C: Concurrent Care)
2025-12-02 17:45:36,493 INFO Strategy: Shift MIMIC dates to 2025 with patient-specific offsets
2025-12-02 17:45:36,493 INFO Result: Community care overlaps temporally with VA care throughout 2025


In [9]:
# Extract medications for mapped MIMIC patients

# Get list of MIMIC subject_ids from mapping
mimic_subjects = patient_mapping['MIMICSubjectID'].tolist()

# Check if placeholder values still present
if '[TBD]' in mimic_subjects:
    logging.warning("\n‚ö†Ô∏è  WARNING: Patient mapping contains placeholder [TBD] values!")
    logging.warning("   Update MIMICSubjectID values before proceeding.")
    logging.warning("   For now, will demonstrate with sample data...\n")
    
    # Use top MIMIC patients as demonstration
    demo_subjects = mimic_rx_counts.head(10)['subject_id'].tolist()
    logging.info(f"Using demonstration MIMIC subjects: {demo_subjects}")
    df_selected = df_mimic_rx[df_mimic_rx['subject_id'].isin(demo_subjects)].copy()
else:
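    # Use actual mapped subjects
    df_selected = df_mimic_rx[df_mimic_rx['subject_id'].isin(mimic_subjects)].copy()
    logging.info(f"Selected prescriptions for {len(mimic_subjects)} mapped patients")
    # Mapped subjects with no prescriptions would leave their VA patients
    # without community care records, so flag them
    missing = sorted(set(mimic_subjects) - set(df_selected['subject_id']))
    if missing:
        logging.warning(f"  Mapped MIMIC subjects with no prescriptions: {missing}")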

logging.info(f"  ✓ Extracted {len(df_selected):,} prescriptions for community care integration")

# Display summary by patient
patient_rx_counts = df_selected.groupby('subject_id').size()
logging.info(f"\nPrescriptions per patient:")
logging.info(f"  Min: {patient_rx_counts.min()}")
logging.info(f"  Max: {patient_rx_counts.max()}")
logging.info(f"  Mean: {patient_rx_counts.mean():.1f}")

2025-12-02 17:46:04,986 INFO Selected prescriptions for 10 mapped patients
2025-12-02 17:46:04,986 INFO   ✓ Extracted 1,102 prescriptions for community care integration
2025-12-02 17:46:04,988 INFO 
Prescriptions per patient:
2025-12-02 17:46:04,988 INFO   Min: 59
2025-12-02 17:46:04,988 INFO   Max: 181
2025-12-02 17:46:04,989 INFO   Mean: 110.2


In [10]:
# Transform MIMIC dates to 2025 for concurrent care

logging.info("\nTransforming MIMIC dates to 2025 (concurrent care model)...")

# Create mapping of MIMIC subject_id to VA PatientSID
if '[TBD]' not in mimic_subjects:
    # Use actual mapping
    subject_to_patient = dict(zip(
        patient_mapping['MIMICSubjectID'],
        patient_mapping['VAPatientSID']
    ))
else:
    # Create demo mapping
    demo_subjects = df_selected['subject_id'].unique()[:10]
    va_patients = patient_mapping['VAPatientSID'].tolist()
    subject_to_patient = dict(zip(demo_subjects, va_patients))

# Map MIMIC subject_id to VA PatientSID
df_selected['VAPatientSID'] = df_selected['subject_id'].map(subject_to_patient)

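# Remove any rows without mapping, then restore integer IDs (any unmapped rows
# would have made the column float)
df_selected = df_selected[df_selected['VAPatientSID'].notna()].copy()
df_selected['VAPatientSID'] = df_selected['VAPatientSID'].astype(int)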

# Transform dates using patient-specific offsets (for concurrent care)
df_selected['starttime_2025'] = df_selected.apply(
    lambda row: shift_date_to_2025_concurrent(
        row['starttime'],
        patient_offsets.get(row['VAPatientSID'], 0)
    ),
    axis=1
)

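# Shift each stoptime by the same amount as its starttime so every order keeps
# its original duration (swapping the stop year on its own would put the stop
# before the start for orders that cross New Year's Eve)
df_selected['stoptime_2025'] = pd.to_datetime(df_selected['starttime_2025']) + (
    pd.to_datetime(df_selected['stoptime']) - pd.to_datetime(df_selected['starttime'])
)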

logging.info(f"  ✓ Transformed {len(df_selected):,} medication records to 2025")

# Display date range summary
if len(df_selected) > 0:
    start_min = df_selected['starttime_2025'].min()
    start_max = df_selected['starttime_2025'].max()
    logging.info(f"\nCommunity care date range (after shift):")
    logging.info(f"  Earliest start: {start_min}")
    logging.info(f"  Latest start: {start_max}")
    logging.info(f"  Spans: {(start_max - start_min).days} days")

2025-12-02 17:46:23,229 INFO 
Transforming MIMIC dates to 2025 (concurrent care model)...
2025-12-02 17:46:23,475 INFO   ✓ Transformed 1,102 medication records to 2025
2025-12-02 17:46:23,475 INFO 
Community care date range (after shift):
2025-12-02 17:46:23,475 INFO   Earliest start: 2025-02-27 18:00:00
2025-12-02 17:46:23,476 INFO   Latest start: 2026-03-17 16:00:00
2025-12-02 17:46:23,476 INFO   Spans: 382 days


## Create Community Care Medication Records

Transform MIMIC prescriptions to match VA medication schema with community care identifiers.
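`DrugNameWithoutDose` drives DDI matching, so it helps to see what the cell's regex keeps. The sketch below reproduces its character class with Python's `re` and compares it to a hypothetical variant (`COMBO_PATTERN`) that also keeps hyphens and slashes, so combination products are not truncated to their first ingredient. Which variant suits the downstream DDI lookup is an assumption to check against that lookup's drug naming.

```python
import re

# Same character class as the notebook's str.extract call: first run of letters/whitespace
NOTEBOOK_PATTERN = re.compile(r"[A-Za-z\s]+")
# Hypothetical variant: start at a letter and also allow hyphens and slashes
COMBO_PATTERN = re.compile(r"[A-Za-z][A-Za-z\s\-/]*")


def drug_name_without_dose(drug, pattern=NOTEBOOK_PATTERN):
    """Return the first name-like run in a MIMIC drug string, stripped."""
    match = pattern.search(drug)
    return match.group(0).strip() if match else drug.strip()


for name in ["Aspirin 81mg", "0.9% Sodium Chloride", "Acetaminophen-Codeine"]:
    print(f"{name!r}: {drug_name_without_dose(name)!r} vs "
          f"{drug_name_without_dose(name, COMBO_PATTERN)!r}")
```

Both patterns reduce "Aspirin 81mg" to "Aspirin" and skip the leading "0.9%" in "0.9% Sodium Chloride", but the notebook's pattern turns "Acetaminophen-Codeine" into "Acetaminophen", which would hide any interaction that involves only the codeine component.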

In [None]:
# Create community care medication records in VA schema format

logging.info("\nCreating community care medication records...")

# Extract drug name without dose (for DDI matching)
# Many MIMIC drugs include dose in name (e.g., "Aspirin 81mg")
df_selected['DrugNameWithoutDose'] = df_selected['drug'].str.extract(r'([A-Za-z\s]+)')[0].str.strip()

# Build community care medication DataFrame matching VA schema
# Note: Some fields use defaults because MIMIC schema differs from VA
df_community_care = pd.DataFrame({
    # Patient identification
    'PatientSID': df_selected['VAPatientSID'],
    
    # Facility identification (999 = community care indicator)
    'Sta3n': COMMUNITY_CARE_STA3N,
    
    # Medication names
    'DrugNameWithoutDose': df_selected['DrugNameWithoutDose'],
    'DrugNameWithDose': df_selected['drug'],
    
    # Source system identifier
    'SourceSystem': COMMUNITY_CARE_SOURCE,
    
    # Date/time fields (using transformed 2025 dates for concurrent care)
    'MedicationDateTime': df_selected['starttime_2025'],
    'StartDate': df_selected['starttime_2025'],
    'EndDate': df_selected['stoptime_2025'],
    
    # Status
    'Status': 'ACTIVE',
    
    # Administration details (using available MIMIC columns or defaults);
    # missing or blank values fall back to the same defaults
    'Route': df_selected['route'].fillna('Unknown').replace('', 'Unknown') if 'route' in df_selected.columns else 'Unknown',
    'DosageOrdered': df_selected['dose_val_rx'].fillna('See orders').astype(str) if 'dose_val_rx' in df_selected.columns else 'See orders',
    'Frequency': df_selected['frequency'].fillna('As directed') if 'frequency' in df_selected.columns else 'As directed',
    
    # Prescription tracking: subject_id alone repeats for every order of a
    # patient, so append the row index to give each record a distinct number
    'PrescriptionNumber': 'CC-' + df_selected['subject_id'].astype(str) + '-' + df_selected.index.to_series().astype(str),
    
    # Provider information
    'PharmacyName': 'Community Pharmacy',
    'ProviderType': 'Community Provider'
})

# Remove any rows with null critical fields
df_community_care = df_community_care.dropna(subset=['PatientSID', 'DrugNameWithDose', 'MedicationDateTime'])

logging.info(f"  ✓ Created {len(df_community_care):,} community care medication records")

# Display date range
if len(df_community_care) > 0:
    start_min = df_community_care['StartDate'].min()
    start_max = df_community_care['StartDate'].max()
    logging.info(f"\nCommunity care medication date range:")
    logging.info(f"  {start_min.strftime('%Y-%m-%d')} to {start_max.strftime('%Y-%m-%d')}")
    # Count (rather than assume) records whose start falls outside 2025;
    # patient offsets can push late-year orders into 2026
    outside_2025 = int((pd.to_datetime(df_community_care['StartDate']).dt.year != 2025).sum())
    logging.info(f"  Records starting outside 2025: {outside_2025:,}")

# Display sample records
print("\n" + "=" * 80)
print("SAMPLE COMMUNITY CARE MEDICATION RECORDS")
print("=" * 80)
display(df_community_care[[
    'PatientSID', 'SourceSystem', 'DrugNameWithDose',
    'StartDate', 'EndDate', 'Route'
]].head(10))

## Load Existing VA Medications

Load VA medication data (RxOut + BCMA) to merge with community care.

In [None]:
# Load existing VA medications from v1_raw/medications/

logging.info("\nLoading existing VA medications...")
start_time = time.time()

va_meds_path = f's3://{DEST_BUCKET}/{V1_RAW_MEDICATIONS_PREFIX}medications_combined.parquet'
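with s3.open(va_meds_path, 'rb') as f:
    df_va_meds = pd.read_parquet(f)

# This same file is overwritten with VA + community records at the end of the
# notebook, so drop community care rows left by an earlier run to keep re-runs
# from duplicating them
prior_cc = int((df_va_meds['SourceSystem'] == COMMUNITY_CARE_SOURCE).sum())
df_va_meds = df_va_meds[df_va_meds['SourceSystem'] != COMMUNITY_CARE_SOURCE].copy()

elapsed = time.time() - start_time
logging.info(f"  ✓ Loaded {len(df_va_meds):,} VA medication records in {elapsed:.2f}s")
if prior_cc:
    logging.info(f"  Dropped {prior_cc:,} community care records from a previous run")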

# Display VA medication summary
logging.info(f"\nVA Medication Summary:")
logging.info(f"  Total patients: {df_va_meds['PatientSID'].nunique()}")
logging.info(f"  Source systems:")
for source, count in df_va_meds['SourceSystem'].value_counts().items():
    logging.info(f"    {source}: {count:,} records")

# Display date range
if 'MedicationDateTime' in df_va_meds.columns:
    va_start = df_va_meds['MedicationDateTime'].min()
    va_end = df_va_meds['MedicationDateTime'].max()
    logging.info(f"  Date range: {va_start} to {va_end}")

## Merge VA and Community Care Medications (Option C)

Combine VA and community medications into a unified dataset with concurrent care.
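The merge cell treats every patient with a community record as a concurrent-care patient. A stricter split puts each patient into "both", "VA only", or "community only" by set arithmetic. This is a plain-Python sketch over record dicts; `"Community"` is a hypothetical stand-in for `COMMUNITY_CARE_SOURCE` from `config.py`, and `RxOut`/`BCMA` are the VA source names the validation cell filters on.

```python
def patients_by_care_pattern(records, community_source, va_sources=("RxOut", "BCMA")):
    """Split patients by which sources their medication records come from."""
    va_patients, community_patients = set(), set()
    for record in records:
        if record["SourceSystem"] == community_source:
            community_patients.add(record["PatientSID"])
        elif record["SourceSystem"] in va_sources:
            va_patients.add(record["PatientSID"])
    return {
        "both": sorted(va_patients & community_patients),
        "va_only": sorted(va_patients - community_patients),
        "community_only": sorted(community_patients - va_patients),
    }


COMMUNITY_SOURCE = "Community"  # hypothetical; the notebook uses COMMUNITY_CARE_SOURCE
records = [
    {"PatientSID": 1011, "SourceSystem": "RxOut"},
    {"PatientSID": 1011, "SourceSystem": COMMUNITY_SOURCE},
    {"PatientSID": 1012, "SourceSystem": "BCMA"},
    {"PatientSID": 1016, "SourceSystem": COMMUNITY_SOURCE},
]
groups = patients_by_care_pattern(records, COMMUNITY_SOURCE)
```

Only patients in `groups["both"]` can show cross-source interactions; a non-empty `community_only` list would point at a mapped VA patient with no VA medication records.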

In [None]:
# Merge VA and community care medications

logging.info("\nMerging VA and community care medications (Option C: Concurrent)...")

# Combine DataFrames
df_combined = pd.concat([df_va_meds, df_community_care], ignore_index=True)

# Sort by patient and date
df_combined = df_combined.sort_values(['PatientSID', 'MedicationDateTime']).reset_index(drop=True)

logging.info(f"  ✓ Combined dataset created")

# Display combined dataset summary
print("\n" + "=" * 80)
print("COMBINED DATASET SUMMARY (VA + COMMUNITY CARE)")
print("=" * 80)
print(f"Total medication records: {len(df_combined):,}")
print(f"Total patients: {df_combined['PatientSID'].nunique()}")
print(f"\nSource System Distribution:")
print(df_combined['SourceSystem'].value_counts())
print("=" * 80)

# Identify patients with community care records (temporal overlap with VA care
# is checked in the validation step)
community_patients = df_combined[
    df_combined['SourceSystem'] == COMMUNITY_CARE_SOURCE
]['PatientSID'].unique()

logging.info(f"\nPatients with concurrent community care: {len(community_patients)}")
logging.info(f"Patient IDs: {sorted(int(p) for p in community_patients)}")

## Validate Concurrent Care Integration (Option C)

Verify that community care medications overlap temporally with VA medications.
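The validation cell compares patient-level date ranges, which reports overlap even if only one community order touches the VA period. An order-level view counts what share of community orders overlap the VA span. The sketch below assumes, as the validation cell does, that VA records carry only a `MedicationDateTime`, and treats a missing community stop time as a point-in-time order; the function name is hypothetical.

```python
from datetime import datetime


def overlapping_order_share(community_orders, va_times):
    """Share of community orders whose [start, stop] window overlaps the span
    of a patient's VA medication timestamps.

    community_orders: list of (start, stop) datetimes; stop may be None
    va_times: list of VA MedicationDateTime values
    """
    if not community_orders or not va_times:
        return 0.0
    va_start, va_end = min(va_times), max(va_times)
    hits = 0
    for start, stop in community_orders:
        stop = stop or start  # missing stop -> point-in-time order
        if start <= va_end and va_start <= stop:
            hits += 1
    return hits / len(community_orders)


va_times = [datetime(2025, 1, 15), datetime(2025, 6, 30)]
orders = [
    (datetime(2025, 3, 1), datetime(2025, 3, 5)),    # inside the VA span
    (datetime(2025, 12, 20), None),                  # no stop, after the VA span
    (datetime(2026, 1, 10), datetime(2026, 1, 12)),  # rolled into 2026
    (datetime(2025, 6, 29), datetime(2025, 7, 10)),  # straddles the VA end
]
share = overlapping_order_share(orders, va_times)  # 2 of 4 orders overlap
```

A patient-level check would call this patient concurrent, while the order-level share shows that half of the community orders fall outside the VA period.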

In [None]:
# Validate concurrent care overlap

print("\n" + "=" * 80)
print("CONCURRENT CARE VALIDATION (OPTION C)")
print("=" * 80)
print("\nChecking for temporal overlap between VA and community medications...\n")

validation_passed = True
patients_with_overlap = 0

for patient in sorted(community_patients):
    # Get patient medications by source
    patient_meds = df_combined[df_combined['PatientSID'] == patient]
    community = patient_meds[patient_meds['SourceSystem'] == COMMUNITY_CARE_SOURCE]
    va = patient_meds[patient_meds['SourceSystem'].isin(['RxOut', 'BCMA'])]
    
    if len(community) > 0 and len(va) > 0:
        # Get date ranges; orders without a stop time count as ending on their
        # start date (a NaT end would make every comparison False and break strftime)
        comm_start = community['StartDate'].min()
        comm_end = community['EndDate'].fillna(community['StartDate']).max()
        va_start = va['MedicationDateTime'].min()
        va_end = va['MedicationDateTime'].max()
        
        # Check for temporal overlap
        # Overlap exists if NOT (comm_end < va_start OR va_end < comm_start)
        has_overlap = not (comm_end < va_start or va_end < comm_start)
        
        if has_overlap:
            patients_with_overlap += 1
            print(f"✓ Patient {patient}: CONCURRENT CARE DETECTED")
            print(f"  Community: {comm_start.strftime('%Y-%m-%d')} to {comm_end.strftime('%Y-%m-%d')} ({len(community)} meds)")
            print(f"  VA:        {va_start.strftime('%Y-%m-%d')} to {va_end.strftime('%Y-%m-%d')} ({len(va)} meds)")
            print(f"  → Community and VA medication date ranges overlap")
        else:
            validation_passed = False
            print(f"⚠️  Patient {patient}: NO OVERLAP (Sequential, not concurrent)")
            print(f"  Community: {comm_start.strftime('%Y-%m-%d')} to {comm_end.strftime('%Y-%m-%d')}")
            print(f"  VA:        {va_start.strftime('%Y-%m-%d')} to {va_end.strftime('%Y-%m-%d')}")
        print()

# Validation summary
print("=" * 80)
print("VALIDATION SUMMARY")
print("=" * 80)
print(f"Patients with concurrent care: {patients_with_overlap} / {len(community_patients)}")

if validation_passed and patients_with_overlap == len(community_patients):
    print("\n✅ VALIDATION PASSED: Option C (Concurrent Care) successfully implemented")
    print("   All patients have overlapping VA and community medication date ranges")
    logging.info("\n✅ Concurrent care validation passed")
else:
    print("\n⚠️  VALIDATION WARNING: Some patients don't have concurrent overlap")
    print("   Review date transformation logic if this is unexpected")
    logging.warning("\n⚠️  Concurrent care validation: not all patients have overlap")

## Write Combined Dataset to v1_raw/medications/

Save merged VA + community care dataset, replacing existing VA-only data.

In [None]:
# Write combined dataset to MinIO

logging.info("\nWriting combined dataset to v1_raw/medications/...")

dest_path = f's3://{DEST_BUCKET}/{V1_RAW_MEDICATIONS_PREFIX}medications_combined.parquet'
logging.info(f"Destination: {dest_path}")

start_time = time.time()

with s3.open(dest_path, 'wb') as f:
    df_combined.to_parquet(f, engine='pyarrow', index=False)

elapsed = time.time() - start_time
logging.info(f"  ✓ Wrote {len(df_combined):,} medication records in {elapsed:.2f}s")
logging.info("\n✅ Community care integration complete!")

## Final Summary

Community care integration (Option C: Concurrent Care) complete.

In [None]:
# Display final summary

print("\n" + "=" * 80)
print("COMMUNITY CARE INTEGRATION SUMMARY - OPTION C: CONCURRENT CARE")
print("=" * 80)
print()
print("Approach:     Concurrent care (VA + community active simultaneously)")
print("Timeline:     VA and community medication periods overlap in 2025")
print()
print(f"Total medication records: {len(df_combined):,}")
print(f"Total patients:           {df_combined['PatientSID'].nunique()}")
print()
print("Source Distribution:")
for source, count in df_combined['SourceSystem'].value_counts().items():
    pct = (count / len(df_combined)) * 100
    print(f"  {source:<20} {count:>6,} records ({pct:>5.1f}%)")
print()
print(f"Patients with concurrent care: {patients_with_overlap} / {len(community_patients)}")
print(f"Patient IDs: {sorted(int(p) for p in community_patients)}")
print()
print(f"Data written to: s3://{DEST_BUCKET}/{V1_RAW_MEDICATIONS_PREFIX}medications_combined.parquet")
print()
print("Status: ✅ Complete")
print("=" * 80)
print()
print("📋 Next Steps:")
print("   1. Re-run notebooks 02-06 to analyze concurrent care data")
print("   2. In 02_explore.ipynb: Verify that 3 source systems are present")
print("   3. In 06_analysis.ipynb: Focus on concurrent DDI detection across sources")
print("   4. Look for 'Blind Spot' scenarios where community + VA meds interact")
print()
print("🎯 Key Analysis Focus:")
print("   - Identify DDIs spanning VA and community care")
print("   - Analyze concurrent polypharmacy patterns")
print("   - Detect medication reconciliation opportunities")