# Download SCARF Demo Data and Model Files from Zenodo

This notebook demonstrates how to download example datasets and pretrained model files for **SCARF: A Single Cell ATAC-seq and RNA-seq Foundation Model**.  

We will:
1. Download and extract the **demo dataset** (`demo_hPBMC.tar.gz`) into the `data/` folder.  
2. Download and extract the **model files** (`model_files.tar.gz`), which contain:  
   - `weights/` → extracted into the existing `weights/` folder.  
   - `prior_data/` → extracted into the existing `prior_data/` folder.  


## Step 1: Setup
Create local folders for saving downloaded files and extracted contents.

In [5]:
import os

# Paths for saving and extraction
SAVE_PATH = "./downloads"
DATA_DIR = "./data"
WEIGHTS_DIR = "./weights"
PRIOR_DIR = "./prior_data"

# Create directories if they don't exist
os.makedirs(SAVE_PATH, exist_ok=True)
os.makedirs(DATA_DIR, exist_ok=True)
os.makedirs(WEIGHTS_DIR, exist_ok=True)
os.makedirs(PRIOR_DIR, exist_ok=True)

print("✅ Directories are ready.")


✅ Directories are ready.


## Step 2: Define download function
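We stream each file with `requests` and show a `tqdm` progress bar, so large archives are never held fully in memory. A file that already exists at the target path is skipped.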


In [6]:
import requests
from tqdm import tqdm
import tarfile

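def download_file(url, save_path):
    """Download a file with a progress bar; skip it if it already exists."""
    if os.path.exists(save_path):
        print(f"✅ Already exists: {save_path}")
        return

    # Write to a temporary name first so that an interrupted download is
    # never mistaken for a complete file on the next run.
    tmp_path = save_path + ".part"
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()  # fail early on HTTP errors such as 404
        total_size = int(response.headers.get("content-length", 0))

        with open(tmp_path, "wb") as f, tqdm(
            desc=f"Downloading {os.path.basename(save_path)}",
            total=total_size or None,  # unknown size if the header is missing
            unit="B",
            unit_scale=True,
            unit_divisor=1024,
        ) as bar:
            for chunk in response.iter_content(chunk_size=1024 * 1024):
                if chunk:
                    f.write(chunk)
                    bar.update(len(chunk))

    # Move the finished download into place
    os.replace(tmp_path, save_path)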
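
Because `download_file` skips any file that already exists, it can be worth confirming that a downloaded archive is intact before extracting it. Zenodo record pages typically list an MD5 checksum for each file. The sketch below is one way to compare against it. The helper names `file_md5` and `verify_md5` are hypothetical (they are not part of SCARF), and the expected checksum must be copied from the record page by hand.

```python
import hashlib

def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b""
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify_md5(path, expected_md5):
    """Return True if the file's MD5 matches the expected hex string."""
    # Normalise case and surrounding whitespace of the pasted checksum
    return file_md5(path) == expected_md5.strip().lower()
```

A call such as `verify_md5(demo_file, "<md5 copied from the Zenodo record page>")` returns `False` for a truncated or corrupted archive, in which case deleting the file and re-running the download is the simplest fix.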


## Step 3: Download and extract the demo dataset
This will download **`demo_hPBMC.tar.gz`** and extract it into the `./data/` folder.


In [None]:
URL_DEMO = "https://zenodo.org/records/16956913/files/demo_hPBMC.tar.gz"
demo_file = os.path.join(SAVE_PATH, "demo_hPBMC.tar.gz")

# Download
download_file(URL_DEMO, demo_file)

# Extract
with tarfile.open(demo_file, "r:gz") as tar_ref:
    tar_ref.extractall(DATA_DIR)
print(f"✅ Extracted demo dataset into: {DATA_DIR}")
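
`extractall` writes every archive member to disk, so for archives from a source you do not fully trust it can help to inspect member paths first. The following is a minimal sketch, assuming a check on member names alone is enough for your purposes. The function name `unsafe_members` is hypothetical, and the check does not cover symlink or hardlink targets.

```python
import os
import tarfile

def unsafe_members(tar_path, target_dir):
    """Return names of archive members whose path would land outside target_dir."""
    root = os.path.realpath(target_dir)
    bad = []
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar.getmembers():
            # Resolve where the member would be written, collapsing any ".."
            dest = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, dest]) != root:
                bad.append(member.name)
    return bad
```

An empty list from `unsafe_members(demo_file, DATA_DIR)` means no member name escapes `DATA_DIR` by an absolute path or `..` component. Recent Python versions also accept a `filter` argument to `extractall` that applies similar checks during extraction.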

## Step 4: Download and extract model files
This will download **`model_files.tar.gz`** and extract:
- `weights/` → into the existing `weights/` folder  
- `prior_data/` → into the existing `prior_data/` folder  


In [None]:
URL_MODEL = "https://zenodo.org/records/16956913/files/model_files.tar.gz"
model_file = os.path.join(SAVE_PATH, "model_files.tar.gz")

# Download
download_file(URL_MODEL, model_file)

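# Extract selectively: only members under weights/ or prior_data/.
# Members are extracted relative to ".", which matches WEIGHTS_DIR and
# PRIOR_DIR as long as those keep their default values from Step 1.
targets = {"weights": WEIGHTS_DIR, "prior_data": PRIOR_DIR}
with tarfile.open(model_file, "r:gz") as tar_ref:
    for member in tar_ref.getmembers():
        # Compare the first path component so "weights_old/" is not matched
        top_level = member.name.split("/", 1)[0]
        if top_level in targets:
            tar_ref.extract(member, ".")
            print(f"Extracted {member.name} → {targets[top_level]}")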

print("🎉 All files downloaded and extracted successfully!")
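
The final message prints even if the archive layout differs from what the selective extraction expects, so a quick check of the output folders can be useful. This is a small sketch using hypothetical helpers `count_files` and `report_contents`. It only counts files and makes no assumption about which files SCARF needs.

```python
import os

def count_files(directory):
    """Count regular files under directory, recursively (0 if it does not exist)."""
    total = 0
    # os.walk yields nothing for a missing directory, so the count stays 0
    for _root, _dirs, files in os.walk(directory):
        total += len(files)
    return total

def report_contents(directories):
    """Map each directory path to the number of files found beneath it."""
    return {d: count_files(d) for d in directories}
```

Calling `report_contents([DATA_DIR, WEIGHTS_DIR, PRIOR_DIR])` after Step 4 should show a non-zero count for each folder. A count of zero suggests the corresponding archive was not extracted where expected.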
