# MONAI Lung CT Data Ingestion

This notebook downloads paired lung CT scans from a public dataset and uploads them to Snowflake stages.

## Overview
- **Data Source**: Zenodo paired lung CT dataset (~266MB)
- **Format**: NIfTI (.nii.gz) medical imaging format
- **Destination**: Encrypted Snowflake stages

## Workflow
1. **Install Dependencies** - Install the MONAI library
2. **Initialize Session** - Connect to Snowflake
3. **Download Data** - Fetch paired CT scans from Zenodo
4. **Upload to Stages** - Store in encrypted Snowflake stages

## Step 1: Install MONAI

Install the MONAI (Medical Open Network for AI) library. This notebook uses its `download_and_extract` helper to fetch the dataset.

In [None]:
!pip install monai

## Step 2: Initialize Snowflake Session

Connect to Snowflake, set a query tag for consumption tracking, and define the database name used in later cells.

In [None]:
import os

from monai.apps import download_and_extract
from snowflake.snowpark.context import get_active_session

# Get active session (automatically available in Container Runtime notebooks)
session = get_active_session()

# Set query tag for consumption tracking
session.query_tag = '{"origin":"sf_sit-is","name":"distributed_medical_image_processing_with_monai","version":{"major":1,"minor":0},"attributes":{"is_quickstart":1,"source":"notebook"}}'

# Database name - matches setup.sql
DATABASE_NAME = "MONAI_DB"
print(f"✅ Using database: {DATABASE_NAME}")
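The query tag above is a hand-written JSON string, which is easy to break with a stray quote or comma. As a minimal sketch using only the standard library, the same tag can be built from a dictionary and serialized with `json.dumps`. The name `query_tag_fields` is illustrative and not part of the original notebook.

```python
import json

# Same fields as the hand-written tag; `query_tag_fields` is a hypothetical name.
query_tag_fields = {
    "origin": "sf_sit-is",
    "name": "distributed_medical_image_processing_with_monai",
    "version": {"major": 1, "minor": 0},
    "attributes": {"is_quickstart": 1, "source": "notebook"},
}

# Compact separators avoid the spaces json.dumps inserts by default.
query_tag = json.dumps(query_tag_fields, separators=(",", ":"))
print(query_tag)

# In the notebook, the result would then be assigned to the session:
# session.query_tag = query_tag
```

Building the tag this way keeps it valid JSON when a field such as the version number changes.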

## Step 3: Download Paired Lung CT Scans

Download the paired lung CT dataset from Zenodo. This dataset contains:
- **Scans**: CT images at inspiration and expiration phases
- **Lung Masks**: Segmentation masks for the lung regions

The download is approximately **266MB** and may take a few minutes.

In [None]:
import tempfile

# Download location. Set to None to use a fresh temporary directory instead.
directory = '/tmp'
if directory is not None:
    os.makedirs(directory, exist_ok=True)
root_dir = tempfile.mkdtemp() if directory is None else directory
print(f"📁 Working directory: {root_dir}")

In [None]:
# Zenodo archive containing the paired lung CT training set
resource = "https://zenodo.org/record/3835682/files/training.zip"

compressed_file = os.path.join(root_dir, "paired_ct_lung.zip")
data_dir = os.path.join(root_dir, "paired_ct_lung")

# Skip the download if the data was already extracted by a previous run
if not os.path.exists(data_dir):
    download_and_extract(resource, compressed_file, root_dir)
    # The archive extracts to a "training" folder; rename it to the dataset name
    os.rename(os.path.join(root_dir, "training"), data_dir)
    print(f"✅ Downloaded and extracted to: {data_dir}")
else:
    print(f"📂 Data already exists at: {data_dir}")
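Before uploading, it can help to confirm that the extracted folders contain the files the upload patterns expect. The following is a rough sketch, not part of the original notebook: the helper `count_nifti_files` and the constant `EXPECTED_PATTERNS` are hypothetical names, and the folder and pattern pairs are taken from the upload step of this notebook.

```python
import glob
import os

# Sub-folder and glob pattern pairs, mirroring the upload step of this notebook.
EXPECTED_PATTERNS = [
    ("lungMasks", "*exp.nii.gz"),
    ("lungMasks", "*insp.nii.gz"),
    ("scans", "*exp.nii.gz"),
    ("scans", "*insp.nii.gz"),
]


def count_nifti_files(data_dir):
    """Return {"subdir/pattern": number of matching files} for each expected pattern."""
    counts = {}
    for subdir, pattern in EXPECTED_PATTERNS:
        matches = glob.glob(os.path.join(data_dir, subdir, pattern))
        counts[f"{subdir}/{pattern}"] = len(matches)
    return counts


# Path used earlier in this notebook; adjust it if root_dir differs.
data_dir = os.path.join("/tmp", "paired_ct_lung")
for key, count in count_nifti_files(data_dir).items():
    print(f"{key}: {count} file(s)")
```

A count of zero for any pattern would suggest that the corresponding upload call has nothing to send.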

## Step 4: Upload to Snowflake Stages

Upload the NIfTI files to the encrypted `@monai_medical_images_stg` stage. The files are organized into four stage folders:
- `lungMasksExp/` - Expiration lung masks
- `lungMasksInsp/` - Inspiration lung masks
- `scansExp/` - Expiration CT scans
- `scansInsp/` - Inspiration CT scans

All files are stored with **Snowflake Server-Side Encryption (SSE)**.

In [None]:
session.use_database(DATABASE_NAME)
session.use_schema("UTILS")

print("📤 Uploading lung masks (expiration)...")
session.file.put(f"{data_dir}/lungMasks/*exp.nii.gz", "@monai_medical_images_stg/lungMasksExp", overwrite=True)

print("📤 Uploading lung masks (inspiration)...")
session.file.put(f"{data_dir}/lungMasks/*insp.nii.gz", "@monai_medical_images_stg/lungMasksInsp", overwrite=True)

print("📤 Uploading CT scans (expiration)...")
session.file.put(f"{data_dir}/scans/*exp.nii.gz", "@monai_medical_images_stg/scansExp", overwrite=True)

print("📤 Uploading CT scans (inspiration)...")
session.file.put(f"{data_dir}/scans/*insp.nii.gz", "@monai_medical_images_stg/scansInsp", overwrite=True)

print("✅ All files uploaded to @monai_medical_images_stg")
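The upload cell prints a success message without inspecting what each `put` call returned. As a hedged sketch, the results could be tallied by status. This assumes each returned item exposes a `status` attribute, as Snowpark's `PutResult` does. The helper `summarize_put_results`, the `FakePutResult` stand-in, and the sample file names are hypothetical and used only so the example runs without a Snowflake session.

```python
from collections import Counter, namedtuple


def summarize_put_results(results):
    """Count results per status, assuming each result has a `status` attribute."""
    return dict(Counter(r.status for r in results))


# Stand-in for Snowpark's result objects, for illustration only.
FakePutResult = namedtuple("FakePutResult", ["source", "status"])
demo_results = [
    FakePutResult("case_001_exp.nii.gz", "UPLOADED"),
    FakePutResult("case_002_exp.nii.gz", "UPLOADED"),
    FakePutResult("case_003_exp.nii.gz", "SKIPPED"),
]
print(summarize_put_results(demo_results))
```

In the notebook, the return value of each `session.file.put(...)` call could be captured in a variable and passed to this helper, so that any status other than a successful upload is visible before moving on.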

## Data Ingestion Complete!

The paired lung CT scans are now stored in Snowflake stages.

### Verify Upload
Run the cell below to list the uploaded files.

### Next Steps
Proceed to **02_model_training** to train the registration model.

In [None]:
# List uploaded files
session.sql("LIST @monai_medical_images_stg").show()
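The raw listing can be long, so a per-folder count is often easier to check. The sketch below groups staged paths by folder. It assumes each path has the form `stage_name/folder/file`; the helper `count_by_stage_folder` and the sample paths are hypothetical.

```python
from collections import Counter


def count_by_stage_folder(names):
    """Group staged file paths by the folder that directly follows the stage name."""
    counts = Counter()
    for name in names:
        parts = name.split("/")
        # parts[0] is the stage name; parts[1] is the folder when one is present.
        folder = parts[1] if len(parts) > 2 else "(stage root)"
        counts[folder] += 1
    return dict(counts)


# Illustrative paths only; real values come from the LIST output.
demo_names = [
    "monai_medical_images_stg/lungMasksExp/case_001_exp.nii.gz",
    "monai_medical_images_stg/lungMasksInsp/case_001_insp.nii.gz",
    "monai_medical_images_stg/scansExp/case_001_exp.nii.gz",
    "monai_medical_images_stg/scansExp/case_002_exp.nii.gz",
]
print(count_by_stage_folder(demo_names))
```

In the notebook, the paths could come from `session.sql("LIST @monai_medical_images_stg").collect()`, taking `row[0]` from each row on the assumption that the file path is the first column of the `LIST` output.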