# Unity Catalog Initial Setup (P2T2)

> **Project:** Medical AI CDSS — Azure Databricks Pipeline
> **Purpose:** Defines the schema and table DDL inside the P2T2 catalog.
>
> **Catalog:** `P2T2`
> **Schemas:** `bronze`, `silver`, `gold`, `ai_results`

## 1. Catalog & Schema Creation

In [None]:

# Select the catalog (it already exists)
spark.sql("USE CATALOG P2T2")

# Create the schemas
spark.sql("CREATE SCHEMA IF NOT EXISTS bronze COMMENT 'Raw data (CSV → Delta)'")
spark.sql("CREATE SCHEMA IF NOT EXISTS silver COMMENT 'Cleaned and filtered data'")
spark.sql("CREATE SCHEMA IF NOT EXISTS gold COMMENT 'Aggregated analytics data'")
spark.sql("CREATE SCHEMA IF NOT EXISTS ai_results COMMENT 'AI inference + Judge results'")

print("✅ P2T2 Catalog & Schema creation complete")
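
The four `CREATE SCHEMA` calls above differ only in name and comment, so they could also be generated from a single mapping. The following is a minimal sketch, not part of the original notebook: `SCHEMAS` and `build_schema_ddl` are hypothetical names, the English comments are illustrative, and the quote escaping assumes default Spark SQL string-literal parsing. It only builds the statements; inside Databricks each one would be passed to `spark.sql`.

```python
# Hypothetical mapping: schema names come from this notebook,
# the comment texts are illustrative.
SCHEMAS = {
    "bronze": "Raw data (CSV -> Delta)",
    "silver": "Cleaned and filtered data",
    "gold": "Aggregated analytics data",
    "ai_results": "AI inference and Judge results",
}


def build_schema_ddl(schemas):
    """Return one CREATE SCHEMA statement per (name, comment) pair."""
    statements = []
    for name, comment in schemas.items():
        # Backslash-escape single quotes so the comment stays a valid SQL literal
        # (assumes default Spark SQL string-literal parsing).
        escaped = comment.replace("'", "\\'")
        statements.append(
            f"CREATE SCHEMA IF NOT EXISTS {name} COMMENT '{escaped}'"
        )
    return statements


for stmt in build_schema_ddl(SCHEMAS):
    print(stmt)
    # In the notebook this would be: spark.sql(stmt)
```

Keeping the schema list in one place also lets the verification cell at the end iterate over the same mapping instead of a separate hard-coded list.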

## 2. Bronze Layer Tables

In [None]:

spark.sql("""
CREATE TABLE IF NOT EXISTS bronze.vital_signs (
    patient_id          STRING,
    timestamp           TIMESTAMP,
    heart_rate          DOUBLE,
    systolic_bp         DOUBLE,
    diastolic_bp        DOUBLE,
    spo2                DOUBLE,
    temperature         DOUBLE,
    respiratory_rate    DOUBLE,
    processed_at        TIMESTAMP
)
USING DELTA
COMMENT 'Raw patient vital signs (48 rows per patient)'
""")

spark.sql("""
CREATE TABLE IF NOT EXISTS bronze.dicom_metadata (
    patient_id          STRING,
    modality            STRING,
    body_part           STRING,
    patient_age         STRING,
    patient_sex         STRING,
    institution         STRING,
    referring_physician STRING,
    study_description   STRING,
    finding_labels      STRING,
    sop_instance_uid    STRING,
    processed_at        TIMESTAMP
)
USING DELTA
COMMENT 'DICOM imaging metadata (Kaggle NIH CXR)'
""")

spark.sql("""
CREATE TABLE IF NOT EXISTS bronze.emergency_data (
    patient_id          STRING,
    arrival_time        TIMESTAMP,
    hospital            STRING,
    triage_level        INT,
    chief_complaint     STRING,
    initial_diagnosis   STRING,
    transport_type      STRING,
    processed_at        TIMESTAMP
)
USING DELTA
COMMENT 'Emergency department visit data'
""")

spark.sql("""
CREATE TABLE IF NOT EXISTS bronze.medical_history (
    patient_id          STRING,
    record_date         DATE,
    hospital            STRING,
    department          STRING,
    diagnosis           STRING,
    medication          STRING,
    notes               STRING,
    processed_at        TIMESTAMP
)
USING DELTA
COMMENT 'Past treatment and medication records'
""")

print("✅ 4 Bronze tables created")

## 3. Silver Layer Tables

In [None]:

spark.sql("""
CREATE TABLE IF NOT EXISTS silver.cleaned_vital_signs (
    patient_id          STRING,
    timestamp           TIMESTAMP,
    heart_rate          DOUBLE,
    systolic_bp         DOUBLE,
    diastolic_bp        DOUBLE,
    spo2                DOUBLE,
    temperature         DOUBLE,
    respiratory_rate    DOUBLE,
    risk_score          DOUBLE,
    processed_at        TIMESTAMP
)
USING DELTA
COMMENT 'Cleaned vital signs (outliers removed + risk score)'
""")

# cleaned_dicom_metadata, cleaned_emergency_data, and cleaned_medical_history
# share the Bronze schema plus processed_at (generated automatically during the Bronze load)

print("✅ Silver tables created")
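
The notebook does not create the remaining three Silver tables explicitly. If explicit DDL were wanted, one option is to render it from the matching Bronze column list. The sketch below is an illustration under that assumption: `build_table_ddl` and `EMERGENCY_COLUMNS` are hypothetical names, the column list is copied from `bronze.emergency_data` above, and the table comment is made up for the example.

```python
def build_table_ddl(schema, table, columns, comment):
    """Render a CREATE TABLE statement for a Delta table.

    columns is a list of (column_name, sql_type) tuples.
    """
    width = max(len(name) for name, _ in columns)
    body = ",\n".join(
        f"    {name.ljust(width)}  {dtype}" for name, dtype in columns
    )
    return (
        f"CREATE TABLE IF NOT EXISTS {schema}.{table} (\n"
        f"{body}\n"
        ")\n"
        "USING DELTA\n"
        f"COMMENT '{comment}'"
    )


# Column list copied from bronze.emergency_data in section 2.
EMERGENCY_COLUMNS = [
    ("patient_id", "STRING"),
    ("arrival_time", "TIMESTAMP"),
    ("hospital", "STRING"),
    ("triage_level", "INT"),
    ("chief_complaint", "STRING"),
    ("initial_diagnosis", "STRING"),
    ("transport_type", "STRING"),
    ("processed_at", "TIMESTAMP"),
]

ddl = build_table_ddl(
    "silver",
    "cleaned_emergency_data",
    EMERGENCY_COLUMNS,
    "Cleaned emergency visit data",  # illustrative comment
)
print(ddl)
# In the notebook this would be: spark.sql(ddl)
```

The same function could be called for `cleaned_dicom_metadata` and `cleaned_medical_history` with their respective Bronze column lists.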

## 4. Gold Layer Tables

In [None]:

spark.sql("""
CREATE TABLE IF NOT EXISTS gold.patient_clinical_summary (
    patient_id              STRING,
    avg_heart_rate          DOUBLE,
    avg_systolic_bp         DOUBLE,
    avg_diastolic_bp        DOUBLE,
    avg_spo2                DOUBLE,
    avg_temperature         DOUBLE,
    avg_respiratory_rate    DOUBLE,
    max_risk_score          DOUBLE,
    avg_risk_score          DOUBLE,
    vital_count             LONG,
    history_count           LONG,
    diagnoses               STRING,
    medications             STRING,
    aggregated_at           TIMESTAMP
)
USING DELTA
COMMENT 'Per-patient integrated clinical summary (vital sign aggregates + medical history)'
""")

print("✅ Gold table created")

## 5. AI Results Tables

In [None]:

spark.sql("""
CREATE TABLE IF NOT EXISTS ai_results.biomedclip_results (
    patient_id      STRING,
    top_diagnosis   STRING,
    top_similarity  STRING,
    urgency_level   STRING,
    created_at      TIMESTAMP
)
USING DELTA
COMMENT 'BioMedCLIP image-text matching results'
""")

spark.sql("""
CREATE TABLE IF NOT EXISTS ai_results.openai_soap_notes (
    patient_id      STRING,
    soap_note       STRING,
    model_version   STRING,
    tokens_used     STRING,
    created_at      TIMESTAMP
)
USING DELTA
COMMENT 'Azure OpenAI SOAP note generation results'
""")

spark.sql("""
CREATE TABLE IF NOT EXISTS ai_results.judge_evaluation (
    patient_id      STRING,
    overall_score   STRING,
    confidence      STRING,
    pass_fail       STRING,
    evaluation_json STRING,
    judge_model     STRING,
    evaluated_at    TIMESTAMP
)
USING DELTA
COMMENT 'LLM-as-a-Judge evaluation results'
""")

print("✅ 3 AI Results tables created")
print("🎉 P2T2 Unity Catalog initial setup complete!")

## 6. Verifying the Setup

In [None]:

print("=" * 60)
print("📋 P2T2 Unity Catalog structure")
print("=" * 60)
for schema in ['bronze', 'silver', 'gold', 'ai_results']:
    tables = spark.sql(f"SHOW TABLES IN {schema}").collect()
    print(f"\n  {schema}/")
    for t in tables:
        print(f"    └── {t['tableName']}")
print("=" * 60)
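
The listing above prints whatever exists but does not flag a table that failed to be created. A small comparison against the expected layout can close that gap. This is a sketch only: `EXPECTED_TABLES` and `find_missing_tables` are hypothetical names, the expected sets list the tables this notebook creates explicitly, and the `actual` dictionary is a hand-written stand-in for the `SHOW TABLES` results.

```python
# Tables this notebook creates explicitly, grouped by schema.
EXPECTED_TABLES = {
    "bronze": {"vital_signs", "dicom_metadata", "emergency_data", "medical_history"},
    "silver": {"cleaned_vital_signs"},
    "gold": {"patient_clinical_summary"},
    "ai_results": {"biomedclip_results", "openai_soap_notes", "judge_evaluation"},
}


def find_missing_tables(expected, actual):
    """Return {schema: sorted missing table names} for schemas with gaps."""
    missing = {}
    for schema, names in expected.items():
        gap = names - set(actual.get(schema, ()))
        if gap:
            missing[schema] = sorted(gap)
    return missing


# Stand-in data for illustration. In the notebook, each set could be built
# from the rows returned by spark.sql(f"SHOW TABLES IN {schema}").collect(),
# using the tableName field as in the cell above.
actual = {
    "bronze": {"vital_signs", "dicom_metadata", "emergency_data", "medical_history"},
    "silver": {"cleaned_vital_signs"},
    "gold": set(),
    "ai_results": {"biomedclip_results", "judge_evaluation"},
}

missing = find_missing_tables(EXPECTED_TABLES, actual)
print(missing if missing else "All expected tables are present")
```

An empty result means every expected table was found; extra tables in a schema are ignored by this check.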