# Download HuggingFace Models to Local Storage

This notebook demonstrates how to download model weights from HuggingFace Hub to local storage for later use. This is useful when you want to:
- Pre-download models for offline use
- Store models in a specific location for deployment
- Cache models to avoid repeated downloads
- Load models from disk in production environments

## Setup

First, import the necessary libraries. The only third-party package needed for downloading is `huggingface_hub`; `os` and `pathlib` come from the standard library.

In [None]:
import os
from pathlib import Path
from huggingface_hub import snapshot_download

## Download Function

This function downloads all model files from a HuggingFace repository to a local directory. It uses `snapshot_download`, which:
- Downloads all files in the repository
- Preserves the directory structure
- Handles large files efficiently
- Supports authentication for gated models

In [None]:
def download_model(model_id: str, output_dir: str, use_auth_token: bool = False):
    """
    Download model weights from HuggingFace Hub.
    
    Args:
        model_id: HuggingFace model ID (e.g., 'stabilityai/stable-diffusion-3.5-medium')
        output_dir: Directory to save the model
        use_auth_token: Whether to use HuggingFace authentication token
    
    Returns:
        Path to the downloaded model directory
    """
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    
    print(f"Downloading model: {model_id}")
    print(f"Output directory: {output_path.absolute()}")
    
    try:
        # Download all model files
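        snapshot_path = snapshot_download(
            repo_id=model_id,
            # huggingface_hub >= 0.23 writes real files (not symlinks) into
            # local_dir, so the deprecated local_dir_use_symlinks flag is omitted.
            local_dir=str(output_path),
            # `token` replaces the deprecated `use_auth_token` argument.
            # True uses the locally saved token; None keeps the library default.
            token=True if use_auth_token else None,
        )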
        print(f"✓ Successfully downloaded model to {snapshot_path}")
        
        # List downloaded files
        files = list(output_path.rglob("*"))
        file_list = [f for f in files if f.is_file()]
        print(f"\nDownloaded {len(file_list)} files:")
        
        # Show first 10 files with sizes
        for f in sorted(file_list)[:10]:
            size_mb = f.stat().st_size / (1024 * 1024)
            print(f"  - {f.relative_to(output_path)} ({size_mb:.1f} MB)")
        
        if len(file_list) > 10:
            print(f"  ... and {len(file_list) - 10} more files")
        
        # Calculate total size
        total_size_gb = sum(f.stat().st_size for f in file_list) / (1024**3)
        print(f"\nTotal size: {total_size_gb:.2f} GB")
        
        return output_path
        
    except Exception as e:
        print(f"✗ Failed to download model: {e}")
        raise
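
The file listing and size total printed by `download_model` can also be computed as a standalone helper, which is handy for checking a model directory after the fact. The sketch below is a minimal illustration using only the standard library; the names `summarize_directory` and `format_size` are hypothetical and not part of `huggingface_hub`.

```python
from pathlib import Path


def summarize_directory(directory: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for all regular files under directory."""
    files = [p for p in Path(directory).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def format_size(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KB = 1024 bytes here)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        # Stop at the first unit that fits, or at GB as the largest unit.
        if size < 1024 or unit == "GB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

For example, `count, total = summarize_directory("../models/tiny-sd")` followed by `print(count, format_size(total))` reports on a directory that has already been downloaded.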

## Example 1: Download a Small Test Model

Let's start with a small model for testing. The `segmind/tiny-sd` model is a compact version of Stable Diffusion that's great for quick tests.

In [None]:
# Download a small test model
model_path = download_model(
    model_id="segmind/tiny-sd",
    output_dir="../models/tiny-sd"
)

## Example 2: Download Stable Diffusion 3.5 Medium

For production use, you might want to download larger models like Stable Diffusion 3.5. Note that some models are gated: you must accept their terms on the HuggingFace model page and authenticate with a token before downloading.

In [None]:
# Download Stable Diffusion 3.5 Medium
# Uncomment the following lines to download (requires ~10GB disk space)

# model_path = download_model(
#     model_id="stabilityai/stable-diffusion-3.5-medium",
#     output_dir="../models/sd3.5-medium",
#     use_auth_token=True  # May require authentication
# )

## Example 3: Download Fine-tuned Red Hat Dog Model

This is a custom fine-tuned Stable Diffusion 3 model trained on Red Hat dog images. It's useful for generating images with the Red Hat mascot.

In [None]:
# Download the Red Hat Dog fine-tuned SD3 model
# This model is fine-tuned for generating Red Hat mascot images

model_path = download_model(
    model_id="cfchase/redhat-dog-sd3",
    output_dir="../models/redhat-dog-sd3"
)

# You can then load this model from the download directory with:
# pipeline = DiffusionPipeline.from_pretrained("../models/redhat-dog-sd3")

## Authentication for Gated Models

Some models on HuggingFace require authentication. Here's how to set up authentication:

In [None]:
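def setup_auth():
    """
    Check whether a HuggingFace token is available.

    Returns True if a token is found in the environment or in a token file.
    """
    # Check for token in environment
    token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN")
    
    if token:
        print("✓ Found HuggingFace token in environment")
        return True
    
    # Check for a token file saved by `huggingface-cli login`.
    # The default location is ~/.cache/huggingface/token (under HF_HOME if set);
    # ~/.huggingface/token is a legacy location.
    hf_home = Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
    token_files = [hf_home / "token", Path.home() / ".huggingface" / "token"]
    if any(f.exists() for f in token_files):
        print("✓ Found HuggingFace token file")
        return True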
    
    print("⚠️ No HuggingFace token found")
    print("To authenticate:")
    print("1. Set HF_TOKEN environment variable, or")
    print("2. Run: huggingface-cli login")
    return False

# Check authentication status
has_auth = setup_auth()
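
If you need the token value itself rather than a yes/no answer, the environment lookup can be factored out as below. This is a minimal sketch: `resolve_token` is a hypothetical helper name, and the precedence (`HF_TOKEN` first, then `HUGGING_FACE_HUB_TOKEN`) simply mirrors the order used in `setup_auth`.

```python
import os
from typing import Mapping, Optional


def resolve_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty token found, preferring HF_TOKEN."""
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        # Treat missing or whitespace-only values as "not set".
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

The returned string can be passed as the `token` argument of `snapshot_download`. Taking the environment as a parameter keeps the function easy to test with a plain dictionary.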

## Loading Models from Disk

Once downloaded, you can load these models directly from disk without needing internet access. Here's how to use the downloaded models with Diffusers:

In [None]:
# Example: Loading a downloaded model with Diffusers
# Uncomment to test (requires diffusers and torch installed)

# from diffusers import DiffusionPipeline
# import torch
#
# # Load from local path instead of HuggingFace ID
# pipeline = DiffusionPipeline.from_pretrained(
#     "../models/tiny-sd",  # Local path used as output_dir in Example 1
#     torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
# )
#
# # Move to GPU if available
# if torch.cuda.is_available():
#     pipeline = pipeline.to("cuda")
#
# print("Model loaded successfully from disk!")
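
Before loading, it can help to confirm that the directory actually holds a Diffusers pipeline. Diffusers pipeline repositories keep a `model_index.json` file at the top level, so a quick check for that file catches a wrong path or an incomplete download early. The helper name `looks_like_diffusers_pipeline` is hypothetical, and the check is a heuristic sketch rather than a full validation.

```python
from pathlib import Path


def looks_like_diffusers_pipeline(model_dir: str) -> bool:
    """Return True if model_dir contains a top-level model_index.json file."""
    return (Path(model_dir) / "model_index.json").is_file()
```

For example, call `looks_like_diffusers_pipeline("../models/tiny-sd")` before `DiffusionPipeline.from_pretrained(...)`. To make sure no network request is attempted, `from_pretrained` also accepts `local_files_only=True`.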

## Batch Download Multiple Models

You can easily download multiple models in a batch:

In [None]:
# Define models to download (paths match the ../models directory used above)
models_to_download = [
    {"id": "segmind/tiny-sd", "dir": "../models/tiny-sd"},
    # Add more models here as needed
    # {"id": "runwayml/stable-diffusion-v1-5", "dir": "../models/sd-v1.5"},
    # {"id": "stabilityai/stable-diffusion-2-1", "dir": "../models/sd-v2.1"},
]

# Download all models
for model in models_to_download:
    print(f"\n{'='*50}")
    try:
        download_model(model["id"], model["dir"])
    except Exception as e:
        print(f"Failed to download {model['id']}: {e}")
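
For longer batches, it is often more useful to collect the outcome of each download than to rely on printed messages. The following sketch shows one possible way to do that: `download_all` is a hypothetical helper that takes the download function as a parameter, so it works with `download_model` above or with a stub for testing.

```python
from typing import Callable, Dict, Iterable, Optional


def download_all(
    models: Iterable[Dict[str, str]],
    download_fn: Callable[[str, str], object],
) -> Dict[str, Optional[str]]:
    """Download each model; map model ID to None on success or an error message."""
    results: Dict[str, Optional[str]] = {}
    for model in models:
        try:
            download_fn(model["id"], model["dir"])
            results[model["id"]] = None
        except Exception as exc:  # keep going so one failure does not stop the batch
            results[model["id"]] = str(exc)
    return results
```

With the list defined above, `results = download_all(models_to_download, download_model)` returns a dictionary, and `{k: v for k, v in results.items() if v is not None}` lists only the failures.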

## Clean Up Downloaded Models

If you need to remove downloaded models to free up space:

In [None]:
import shutil

def remove_model(model_dir: str):
    """
    Remove a downloaded model directory.
    """
    model_path = Path(model_dir)
    if model_path.exists():
        shutil.rmtree(model_path)
        print(f"✓ Removed {model_path}")
    else:
        print(f"⚠️ Directory not found: {model_path}")

# Example: Remove a model (uncomment to use)
# remove_model("../models/tiny-sd")
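
Because `shutil.rmtree` deletes whatever path it is given, a guard against mistyped paths can be worthwhile. The sketch below is one possible variant, not a replacement for `remove_model`: `remove_model_within` is a hypothetical helper that refuses to delete anything that does not resolve to a location strictly inside a given models root.

```python
import shutil
from pathlib import Path


def remove_model_within(model_dir: str, models_root: str) -> bool:
    """Delete model_dir only if it resolves to a location strictly inside models_root."""
    root = Path(models_root).resolve()
    target = Path(model_dir).resolve()
    # Resolving both paths first means "../" segments cannot escape the root.
    if root not in target.parents:
        raise ValueError(f"Refusing to delete {target}: not inside {root}")
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

For example, `remove_model_within("../models/tiny-sd", "../models")` removes the model, while passing the models root itself or a path outside it raises `ValueError`.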

## Tips and Best Practices

1. **Storage Location**: Choose a location with sufficient disk space. Large models can take 5-20 GB or more.

2. **Network**: Downloads can be large. Use a stable, fast internet connection.

3. **Authentication**: Some models require accepting terms of use on HuggingFace before downloading.

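4. **Caching**: When `local_dir` is set, `snapshot_download` writes files directly into that directory. Re-running the download for the same model skips files that are already present and up to date, so repeat runs are much faster.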

5. **Version Control**: Don't commit large model files to git. Add your models directory to `.gitignore`.

6. **Deployment**: For production, consider using object storage (S3, GCS) or persistent volumes to store models.
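
To act on the storage tip before starting a large download, you can check the free space on the target filesystem with the standard library. This is a minimal sketch: `has_free_space` is a hypothetical helper, and the size you pass in is your own estimate of what the model needs.

```python
import shutil
from pathlib import Path


def has_free_space(directory: str, required_gb: float) -> bool:
    """Return True if the filesystem holding directory has at least required_gb free."""
    path = Path(directory).resolve()
    # Walk up to the nearest existing ancestor, since the target may not exist yet.
    while not path.exists():
        path = path.parent
    free_gb = shutil.disk_usage(path).free / (1024**3)
    return free_gb >= required_gb
```

For example, `has_free_space("../models/sd3.5-medium", 10)` checks for the roughly 10 GB mentioned in Example 2 before the download begins.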