> **Copyright 2026 Snowflake Inc.**  
> SPDX-License-Identifier: Apache-2.0  
>  
> Licensed under the Apache License, Version 2.0 (the "License");  
> you may not use this file except in compliance with the License.  
> You may obtain a copy of the License at [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)  
>  
> Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

# MONAI Lung CT Registration - Data Ingestion

This notebook downloads the **Learn2Reg Challenge** lung CT dataset and uploads it to a Snowflake internal stage.

## Dataset Overview
- **Source**: [Learn2Reg Challenge - CT Lung Registration](https://zenodo.org/record/3835682)
- **Data**: Paired inspiratory and expiratory lung CT scans with segmentation masks
- **Format**: NIfTI (.nii.gz) 3D medical images
- **License**: CC BY 4.0

## Workflow
1. Download training data from Zenodo
2. Extract paired CT scans and lung masks
3. Upload the files to a Snowflake internal stage, organized into folders by image type

In [None]:
!pip install monai

In [None]:
from snowflake.snowpark.context import get_active_session
from monai.apps import download_and_extract
import os

session = get_active_session()

## Step 1: Verify the Snowflake Session

The session was created in the import cell above; this query confirms that it is active.

In [None]:
session.sql("select current_user()").collect()

## Step 2: Setup Temporary Directory

In [None]:
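import tempfile

# Set to None to use a fresh temporary directory instead of a fixed path.
directory = '/tmp'
if directory is not None:
    os.makedirs(directory, exist_ok=True)
# Use the fixed directory when one is given; otherwise create a temporary one.
root_dir = tempfile.mkdtemp() if directory is None else directory
print(root_dir)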
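
The cell above hard-codes the working directory. A minimal sketch of making it configurable follows, assuming an environment variable named `MONAI_DATA_DIRECTORY` (a convention seen in MONAI tutorials, not something this notebook sets). The helper name `resolve_root_dir` is hypothetical.

```python
import os
import tempfile


def resolve_root_dir(env_var="MONAI_DATA_DIRECTORY"):
    """Return the directory named by env_var, or a new temporary directory.

    Both the helper and the default variable name are illustrative assumptions.
    """
    directory = os.environ.get(env_var)
    if directory:
        # Create the configured directory if it does not exist yet.
        os.makedirs(directory, exist_ok=True)
        return directory
    # No directory configured: fall back to a fresh temporary directory.
    return tempfile.mkdtemp()
```

With this helper, `root_dir = resolve_root_dir()` could replace the hard-coded path, so the same notebook runs unchanged in environments where `/tmp` is not the preferred location.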

## Step 3: Download Dataset from Zenodo

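This step downloads about 1.2 GB of paired lung CT scans from the Learn2Reg Challenge. The download is skipped if the extracted `paired_ct_lung` directory already exists.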

In [None]:
resource = "https://zenodo.org/record/3835682/files/training.zip"

compressed_file = os.path.join(root_dir, "paired_ct_lung.zip")
data_dir = os.path.join(root_dir, "paired_ct_lung")
# Skip the download when the extracted data is already present.
if not os.path.exists(data_dir):
    download_and_extract(resource, compressed_file, root_dir)
    # The archive extracts to "training"; rename it to a descriptive name.
    os.rename(os.path.join(root_dir, "training"), data_dir)
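
Before uploading, it may help to confirm that the extraction produced matching expiratory and inspiratory files. The sketch below is one possible check, not part of the original notebook. It relies only on the `scans` and `lungMasks` folders and the `*exp.nii.gz` / `*insp.nii.gz` suffixes used by the upload step; the helper name `summarize_pairs` is hypothetical, and it assumes that the part of the file name before the suffix identifies a case.

```python
import glob
import os


def summarize_pairs(data_dir):
    """Count files per folder and phase, and list case prefixes missing a counterpart."""
    summary = {}
    for folder in ("scans", "lungMasks"):
        cases = {}
        for phase in ("exp", "insp"):
            suffix = f"{phase}.nii.gz"
            paths = glob.glob(os.path.join(data_dir, folder, f"*{suffix}"))
            # Strip the phase suffix so both phases of a case share one key.
            cases[phase] = {os.path.basename(p)[: -len(suffix)] for p in paths}
        summary[folder] = {
            "exp": len(cases["exp"]),
            "insp": len(cases["insp"]),
            # Prefixes present in exactly one of the two phases.
            "unpaired": sorted(cases["exp"] ^ cases["insp"]),
        }
    return summary
```

Calling `summarize_pairs(data_dir)` after the download returns a small dictionary; an empty `unpaired` list for both folders suggests every case has both phases.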

## Step 4: Upload to the Snowflake Stage

This step uploads the images to four folders in the internal stage `@sf_clinical_medical_images_stg`:
- `@sf_clinical_medical_images_stg/lungMasksExp` - Expiratory lung masks
- `@sf_clinical_medical_images_stg/lungMasksInsp` - Inspiratory lung masks
- `@sf_clinical_medical_images_stg/scansExp` - Expiratory CT scans
- `@sf_clinical_medical_images_stg/scansInsp` - Inspiratory CT scans

In [None]:
# Select the database and schema that contain the stage.
session.use_database("SF_CLINICAL_DB")
session.use_schema("UTILS")

# Upload each file group to its own folder in the stage, replacing existing files.
session.file.put(f"{data_dir}/lungMasks/*exp.nii.gz", "@sf_clinical_medical_images_stg/lungMasksExp", overwrite=True)
session.file.put(f"{data_dir}/lungMasks/*insp.nii.gz", "@sf_clinical_medical_images_stg/lungMasksInsp", overwrite=True)
session.file.put(f"{data_dir}/scans/*exp.nii.gz", "@sf_clinical_medical_images_stg/scansExp", overwrite=True)
session.file.put(f"{data_dir}/scans/*insp.nii.gz", "@sf_clinical_medical_images_stg/scansInsp", overwrite=True)
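
The four `put` calls above differ only in their source pattern and stage folder, so they can also be driven from a small table. The following is a sketch of that idea rather than the notebook's own method. `UPLOADS` and `upload_all` are hypothetical names; the patterns and stage folders are copied from the calls above, and the code assumes `session.file.put` returns one result per uploaded file.

```python
import os

STAGE = "@sf_clinical_medical_images_stg"

# (local subfolder, file pattern, stage folder), taken from the four put calls.
UPLOADS = [
    ("lungMasks", "*exp.nii.gz", "lungMasksExp"),
    ("lungMasks", "*insp.nii.gz", "lungMasksInsp"),
    ("scans", "*exp.nii.gz", "scansExp"),
    ("scans", "*insp.nii.gz", "scansInsp"),
]


def upload_all(session, data_dir, stage=STAGE):
    """Run one PUT per entry in UPLOADS and return the result count per stage path."""
    counts = {}
    for subfolder, pattern, stage_folder in UPLOADS:
        local = os.path.join(data_dir, subfolder, pattern)
        target = f"{stage}/{stage_folder}"
        results = session.file.put(local, target, overwrite=True)
        # Record how many results came back for this stage path.
        counts[target] = len(results)
    return counts
```

Printing `upload_all(session, data_dir)` gives a quick per-folder count. Running `session.sql("LIST @sf_clinical_medical_images_stg").collect()` afterwards should show the staged files, assuming the current role has read access to the stage.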