# Hugging Face Hub - Setup by Downloading Models

## !! This notebook is not meant to be run, just to explain the setup !!

## Teaching LLM workflow by using open source models and Hugging Face Hub

The **Hugging Face Hub** is a platform that hosts thousands of pre-trained models, datasets, and demos. It's the go-to source for downloading quantized GGUF models that can run efficiently on CPU.

This information was current when the notebook was written in December 2025; the model landscape changes rapidly.

### Why Hugging Face Hub?

- **Vast model selection**: Access to thousands of GGUF models, not limited to a curated subset
- **Standard format**: Many models are published in the GGUF format, which is compatible with llama.cpp and llama-cpp-python
- **Configurable caching**: Control where models are downloaded and stored
- **Simple Python API**: Easy-to-use `hf_hub_download` function

### Shared Filesystem

When I taught with this material, I used this notebook to download models from Hugging Face into a shared read-write folder that students could access on JupyterHub. This worked because the JupyterHub I was teaching on provided a shared folder system.

Your setup may differ. Options include:
- Shared read-write directory on JupyterHub
- Each student downloads their own models
- Download models to local machine

In [None]:
# Ensure that your python environment has huggingface_hub package installed.
try:
    from huggingface_hub import hf_hub_download
except ImportError:
    %pip install huggingface_hub
    from huggingface_hub import hf_hub_download

## Which model to download

For teaching on a CPU-only JupyterHub, I was looking for **small models**:
- ~1-2 billion parameters
- Quantized (each weight is stored in roughly 4 to 8 bits instead of the 16 or 32 bits used during training)

You can explore the world of models at:
[Hugging Face Model List](https://huggingface.co/models?search=gguf)

When searching for GGUF models, look for:
- **Q4_K_M** or **Q4_0**: 4-bit quantization, good balance of size and quality
- **Q5_K_M**: Slightly larger but better quality
- **Q8_0**: 8-bit quantization, best quality but larger file size
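
To get a feel for why quantization matters for file size, here is a rough back-of-the-envelope sketch. It assumes the block layouts used by llama.cpp for `Q4_0` and `Q8_0` (32 weights per block plus one 16-bit scale), and it ignores file metadata and any layers kept at higher precision, so real files will differ somewhat. The function name `estimate_size_bytes` is my own, not part of any library.

```python
# Bits per weight for formats whose block layout is simple to derive.
# Assumption: Q4_0 and Q8_0 store 32 weights per block plus one 16-bit scale.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": (32 * 8 + 16) / 32,  # 8.5 bits per weight
    "Q4_0": (32 * 4 + 16) / 32,  # 4.5 bits per weight
}


def estimate_size_bytes(n_params, bits_per_weight):
    """Rough size of the weight data alone (no metadata, no mixed precision)."""
    return n_params * bits_per_weight / 8


# Compare the formats for a model with 1.1 billion parameters.
for name, bpw in BITS_PER_WEIGHT.items():
    size_mb = estimate_size_bytes(1.1e9, bpw) / 1e6
    print(f"{name}: {bpw} bits/weight, about {size_mb:.0f} MB for 1.1B parameters")
```

The K-quant variants such as `Q4_K_M` mix several block types, so their effective bits per weight sit a little above the nominal 4 or 5.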

### Recommended small models for teaching:

| Model | Repo ID | Filename | Size |
|-------|---------|----------|------|
| TinyLlama 1.1B | `TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF` | `tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf` | ~670 MB |
| Qwen2 1.5B | `Qwen/Qwen2-1.5B-Instruct-GGUF` | `qwen2-1_5b-instruct-q4_0.gguf` | ~900 MB |
| Llama 3.2 1B | `bartowski/Llama-3.2-1B-Instruct-GGUF` | `Llama-3.2-1B-Instruct-Q4_K_M.gguf` | ~700 MB |
| Phi-3 Mini | `bartowski/Phi-3-mini-4k-instruct-GGUF` | `Phi-3-mini-4k-instruct-Q4_K_M.gguf` | ~2.3 GB |
| DeepSeek R1 1.5B | `bartowski/DeepSeek-R1-Distill-Qwen-1.5B-GGUF` | `DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf` | ~1 GB |

## Let's check out our local filesystem path and where we will download the files

### Approach 1 - If a Shared Hub is being used

In [None]:
# Cal-ICOR workshop Hub specific path
!ls /home/jovyan/shared

### Approach 2 - If a local machine is being used

In [None]:
# Relative path to my local shared directory
!ls ../shared/

In [None]:
# or the full path (this is on my laptop)
!ls /Users/ericvandusen/Documents/GitHub/shared/

### Set the path where the models will download

In [None]:
# Path for Shared Hub - change this to match your JupyterHub's shared directory
# Examples: /home/jovyan/shared, /home/jovyan/shared_readwrite, /home/jovyan/_shared/course-name
shared_model_path = "/home/jovyan/shared"

In [None]:
# Path for a local machine - run this cell instead of the Shared Hub cell, since both assign the same variable
shared_model_path = "/Users/ericvandusen/Documents/GitHub/shared/"
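
Since both cells assign `shared_model_path`, whichever runs last wins. One possible alternative, sketched here with a hypothetical helper `pick_model_dir`, is to list candidate directories and take the first one that exists and is writable. The candidate paths are only examples.

```python
import os


def pick_model_dir(candidates):
    """Return the first candidate that is an existing, writable directory."""
    for path in candidates:
        expanded = os.path.expanduser(path)
        if os.path.isdir(expanded) and os.access(expanded, os.W_OK):
            return expanded
    raise FileNotFoundError(f"No writable model directory among: {candidates}")


# Candidate paths are illustrative; replace them with your own.
try:
    shared_model_path = pick_model_dir(["/home/jovyan/shared", "../shared"])
    print(f"Using model directory: {shared_model_path}")
except FileNotFoundError as err:
    print(err)
```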

## Downloading Models with Hugging Face Hub

The `hf_hub_download` function downloads a specific file from a Hugging Face repository.

### Key Parameters:

| Parameter | Description |
|-----------|-------------|
| `repo_id` | The repository identifier (e.g., `"TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"`) |
| `filename` | The specific file to download (e.g., `"tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"`) |
| `local_dir` | Directory where the file will be stored (for shared access) |
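| `local_dir_use_symlinks` | Set to `False` to copy files instead of creating symlinks. Deprecated in recent `huggingface_hub` releases, where files in `local_dir` are always regular files; omit it if your installed version warns about or rejects it |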

### Default Behavior vs Shared Repository

By default, `hf_hub_download` stores files in `~/.cache/huggingface/hub/`, which is user-specific. To enable shared access for students, we use the `local_dir` parameter to specify a shared directory.
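
If you are unsure where the default cache lives on a given machine, the sketch below approximates the lookup using the `HF_HUB_CACHE` and `HF_HOME` environment variables that `huggingface_hub` documents. It is a simplification (for example, it ignores `XDG_CACHE_HOME`), and `default_hub_cache` is a name I made up for illustration.

```python
import os
from pathlib import Path


def default_hub_cache(env=None):
    """Approximate where files are cached when local_dir is not given."""
    env = os.environ if env is None else env
    # An explicit cache location takes priority.
    if env.get("HF_HUB_CACHE"):
        return Path(os.path.expanduser(env["HF_HUB_CACHE"]))
    # Otherwise the cache is the "hub" folder under HF_HOME.
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(os.path.expanduser(hf_home)) / "hub"


print(f"Default cache (approximate): {default_hub_cache()}")
```

Setting `HF_HOME` for every student is another way to share a cache, but passing `local_dir` keeps the files in a plain, browsable folder.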

### Download TinyLlama 1.1B (Recommended for teaching)

In [None]:
# Download TinyLlama model to shared directory
model_path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    local_dir=shared_model_path,
    local_dir_use_symlinks=False
)

print(f"Model downloaded to: {model_path}")

### Download Qwen2 1.5B Instruct

In [None]:
# Download Qwen2 model to shared directory
model_path = hf_hub_download(
    repo_id="Qwen/Qwen2-1.5B-Instruct-GGUF",
    filename="qwen2-1_5b-instruct-q4_0.gguf",
    local_dir=shared_model_path,
    local_dir_use_symlinks=False
)

print(f"Model downloaded to: {model_path}")

### Download Llama 3.2 1B Instruct

In [None]:
# Download Llama 3.2 model to shared directory
model_path = hf_hub_download(
    repo_id="bartowski/Llama-3.2-1B-Instruct-GGUF",
    filename="Llama-3.2-1B-Instruct-Q4_K_M.gguf",
    local_dir=shared_model_path,
    local_dir_use_symlinks=False
)

print(f"Model downloaded to: {model_path}")

### Download DeepSeek R1 Distill 1.5B

In [None]:
# Download DeepSeek model to shared directory
model_path = hf_hub_download(
    repo_id="bartowski/DeepSeek-R1-Distill-Qwen-1.5B-GGUF",
    filename="DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf",
    local_dir=shared_model_path,
    local_dir_use_symlinks=False
)

print(f"Model downloaded to: {model_path}")
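
The four download cells repeat the same call, so they could be folded into a loop. The sketch below uses a hypothetical helper, `download_missing`, that takes the download function as an argument and skips files already present in the target directory. Passing the function in keeps the helper easy to try without network access; `hf_hub_download` already avoids re-fetching files it knows are up to date, so the skip mainly saves the network round trip.

```python
import os

# (repo_id, filename) pairs copied from the download cells above.
MODELS = [
    ("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"),
    ("Qwen/Qwen2-1.5B-Instruct-GGUF", "qwen2-1_5b-instruct-q4_0.gguf"),
    ("bartowski/Llama-3.2-1B-Instruct-GGUF", "Llama-3.2-1B-Instruct-Q4_K_M.gguf"),
    ("bartowski/DeepSeek-R1-Distill-Qwen-1.5B-GGUF", "DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf"),
]


def download_missing(models, target_dir, download_fn):
    """Call download_fn only for files not already present in target_dir."""
    paths = []
    for repo_id, filename in models:
        destination = os.path.join(target_dir, filename)
        if os.path.exists(destination):
            print(f"Already present, skipping: {filename}")
        else:
            destination = download_fn(
                repo_id=repo_id, filename=filename, local_dir=target_dir
            )
        paths.append(destination)
    return paths


# In this notebook the call would look like:
# download_missing(MODELS, shared_model_path, hf_hub_download)
```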

## Let's now check which models we have

In [None]:
!ls -lh "{shared_model_path}"
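
The `!ls` command only works where a shell is available. A pure-Python equivalent, sketched here with a hypothetical helper `list_gguf_files`, lists just the `.gguf` files and their sizes.

```python
from pathlib import Path


def list_gguf_files(directory):
    """Return (filename, size in MB) for each .gguf file directly inside directory."""
    return [
        (path.name, path.stat().st_size / 1e6)
        for path in sorted(Path(directory).glob("*.gguf"))
    ]


# "." is a placeholder; in this notebook, pass shared_model_path instead.
for name, size_mb in list_gguf_files("."):
    print(f"{size_mb:10.1f} MB  {name}")
```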

## Testing the Downloaded Model with llama-cpp-python

Let's verify that our downloaded model works correctly by loading it with llama-cpp-python and generating a simple response.
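
Before loading a multi-hundred-megabyte file, a cheap sanity check can catch an obviously wrong file, such as an HTML error page saved under a `.gguf` name. GGUF files begin with the four bytes `GGUF`, so the sketch below only reads those. The helper name `looks_like_gguf` is my own, and passing this check does not guarantee the rest of the file is intact.

```python
def looks_like_gguf(path):
    """Return True if the file starts with the 4-byte GGUF magic."""
    with open(path, "rb") as handle:
        return handle.read(4) == b"GGUF"


# In this notebook the call would look like:
# looks_like_gguf(os.path.join(shared_model_path, "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"))
```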

In [None]:
# Ensure llama-cpp-python is installed
try:
    from llama_cpp import Llama
except ImportError:
    %pip install llama-cpp-python
    from llama_cpp import Llama

In [None]:
import os

# Path to our downloaded model
model_file = os.path.join(shared_model_path, "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")

# Load the model
print(f"Loading model from: {model_file}")
llm = Llama(
    model_path=model_file,
    n_ctx=2048,
    verbose=True
)

print("\n✓ Model loaded successfully!")

In [None]:
# Test generation
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Hello! Can you tell me a fun fact about llamas?"}
    ],
    max_tokens=100
)

print("Response:")
print(response["choices"][0]["message"]["content"])
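
The response is a nested dictionary in the OpenAI chat-completion style. Besides the reply text, it usually carries a `finish_reason` and a `usage` block, which help when a reply looks truncated by `max_tokens`. The sketch below uses a hand-written stand-in dictionary rather than real model output, and `summarize_response` is a hypothetical helper.

```python
def summarize_response(response):
    """Pull the reply text, finish reason, and token count out of a chat completion dict."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "completion_tokens": usage.get("completion_tokens"),
    }


# Hand-written stand-in for a real response, for illustration only.
sample = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Llamas hum."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 20, "completion_tokens": 4, "total_tokens": 24},
}

summary = summarize_response(sample)
if summary["finish_reason"] == "length":
    print("Reply was cut off by max_tokens.")
print(summary["text"])
```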

## Bonus: Searching for Models on Hugging Face

You can also use the Hugging Face Hub API to search for models programmatically.

In [None]:
from huggingface_hub import HfApi

In [None]:
# Search for GGUF models
api = HfApi()

# Find models with "gguf" in the name, sorted by downloads
models = list(api.list_models(
    search="gguf",
    sort="downloads",
    limit=20
))

print("Top 20 GGUF models by downloads:")
print("-" * 60)
for model in models:
    print(f"{model.id}")

In [None]:
# List files in a specific repository to find available quantizations
from huggingface_hub import list_repo_files

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
files = list_repo_files(repo_id)

print(f"Files in {repo_id}:")
print("-" * 60)
for f in files:
    if f.endswith(".gguf"):
        print(f)
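
Once you have a list of filenames, the quantization level is usually encoded near the end of the name. The sketch below pulls that tag out with a regular expression. The naming is a community convention rather than a rule, so the pattern is an assumption that fits the filenames used in this notebook and may miss others; `quant_tag` is a hypothetical helper.

```python
import re

# Matches a trailing tag such as ".Q4_K_M.gguf" or "-q4_0.gguf".
# Filenames that do not follow this convention yield None.
QUANT_TAG = re.compile(r"[.\-]((?:I?Q)\d+_[A-Z0-9_]+)\.gguf$", re.IGNORECASE)


def quant_tag(filename):
    """Return the upper-cased quantization tag from a GGUF filename, or None."""
    match = QUANT_TAG.search(filename)
    return match.group(1).upper() if match else None


# Filenames taken from the download cells in this notebook.
for name in [
    "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    "qwen2-1_5b-instruct-q4_0.gguf",
    "Llama-3.2-1B-Instruct-Q4_K_M.gguf",
]:
    print(f"{quant_tag(name):8} {name}")
```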

## Summary

In this notebook, you learned how to:

1. **Install** the `huggingface_hub` package
2. **Download GGUF models** using `hf_hub_download`
3. **Configure shared storage** for classroom environments
4. **Test downloaded models** with llama-cpp-python
5. **Search for models** on Hugging Face programmatically

### Key Advantages of Hugging Face Hub:

- **Huge model selection**: Thousands of GGUF models available
- **Configurable caching**: Easy to set up shared directories for classrooms
- **Automatic versioning**: Models are versioned and can be pinned to specific commits
- **Simple API**: Only two parameters are required: `repo_id` and `filename`

### Next Steps:

- See `LlamaCpp_SmallLM_Demo.ipynb` for detailed usage of downloaded models
- Explore different quantization levels (Q4, Q5, Q8) for your use case
- Try models from different families (Llama, Qwen, Phi, etc.)