# Lab 2.5.1: Hugging Face Hub Exploration

**Module:** 2.5 - Hugging Face Ecosystem  
**Time:** 2 hours  
**Difficulty:** ⭐⭐ (Intermediate)

---

## Learning Objectives

By the end of this lab, you will:
- [ ] Navigate the Hugging Face Hub to discover models
- [ ] Understand model cards and evaluate model quality
- [ ] Load and test pre-trained models locally
- [ ] Document models systematically for your projects

---

## Prerequisites

- Completed: Module 2.4 (Efficient Architectures)
- Knowledge of: PyTorch basics, transformer architecture concepts

---

## Real-World Context

**The AI Model Marketplace**: Imagine you're building an AI-powered customer service bot. You need models for:
- Understanding customer sentiment (Are they happy or frustrated?)
- Extracting key information (What product are they asking about?)
- Generating helpful responses

Instead of training these from scratch (which would take months and millions of dollars), you can find pre-trained models on **Hugging Face Hub** - think of it as the "GitHub for AI models."

Today, we'll learn how to navigate this marketplace like a pro!

---

## ELI5: The Hugging Face Hub

> **Imagine you're building with LEGO...**
>
> Instead of making every brick from scratch, you go to a store where other builders share their pre-made structures. Someone already built a perfect castle tower? Great, you can use it in your medieval village!
>
> The Hugging Face Hub is exactly like that - a store where AI researchers share their "LEGO structures" (trained models). Each comes with:
> - **Instructions** (model cards) - how to use it
> - **Reviews** (downloads/likes) - how popular it is
> - **Demo** (inference API) - try before you download
>
> **In AI terms:** It's a platform hosting 500,000+ pre-trained models for tasks like text classification, image generation, speech recognition, and more. You can download most public models with one line of code (gated models such as Llama first require accepting a license and logging in).

---

## Part 1: Setting Up

### 1.1 Install and Import Required Libraries

In [None]:
# Install required packages (if not already installed)
# !pip install transformers datasets huggingface_hub accelerate -q

import torch
from huggingface_hub import HfApi, hf_hub_download, list_models
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification
from transformers import pipeline
import time
from datetime import datetime
from dataclasses import dataclass
from typing import List, Dict, Any, Optional
import json

# Check our environment
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

### 1.2 Initialize the Hugging Face API

The `HfApi` class is your gateway for interacting with the Hub programmatically.

In [None]:
# Initialize the Hugging Face API client
api = HfApi()

# Test the connection by getting info about a popular model
model_info = api.model_info("bert-base-uncased")
print(f"Model ID: {model_info.id}")
print(f"Downloads: {model_info.downloads:,}")
print(f"Likes: {model_info.likes}")
print(f"Library: {model_info.library_name}")
print(f"\nConnection successful!")

### What Just Happened?

We just queried the Hugging Face Hub to get information about the `bert-base-uncased` model. This is one of the most downloaded models - a foundational language model from Google that's been used as a starting point for thousands of NLP applications.

---

## Part 2: Searching for Models

### 2.1 Understanding Model Categories

Models on the Hub are organized by **tasks**, shown as the `pipeline_tag` on each model page. The task names in the table are the Hub's own tags, so you can pass them directly to a task search:

| Category | Tasks (Hub tags) | Examples |
|----------|------------------|----------|
| **NLP** | text-classification, token-classification (NER), question-answering, summarization, translation | BERT, GPT, T5 |
| **Vision** | image-classification, object-detection, image-segmentation | ViT, YOLO, SAM |
| **Audio** | automatic-speech-recognition, text-to-speech | Whisper, Bark |
| **Multimodal** | image-to-text, text-to-image, zero-shot-image-classification | CLIP, Stable Diffusion |
| **Generation** | text-generation, text-to-image | Llama, Mistral, SDXL |

In [None]:
def search_models_by_task(
    task: str,
    limit: int = 10,
    sort: str = "downloads"
) -> List[Dict]:
    """
    Search for models by task type.
    
    Args:
        task: Task type (e.g., "text-classification", "text-generation")
        limit: Maximum number of results
        sort: Sort by "downloads" or "likes"
    
    Returns:
        List of model dictionaries
    """
    models = api.list_models(
        filter=task,
        sort=sort,
        direction=-1,  # Descending
        limit=limit
    )
    
    results = []
    for m in models:
        results.append({
            "id": m.id,
            "author": m.author,
            "downloads": m.downloads,
            "likes": m.likes,
            "tags": m.tags[:5] if m.tags else []
        })
    
    return results

# Search text-classification models (the task category that includes sentiment analysis)
print("Top 10 Text Classification Models:")
print("=" * 70)
models = search_models_by_task("text-classification", limit=10)

for i, m in enumerate(models, 1):
    print(f"{i:2}. {m['id'][:50]:<50} | Downloads: {m['downloads']:>12,}")

### 2.2 Searching by Keywords

Sometimes you want to search by model name or specific features:

In [None]:
def search_models_by_keyword(
    keyword: str,
    author: Optional[str] = None,
    limit: int = 10
) -> List[Dict]:
    """
    Search for models by keyword.
    
    Args:
        keyword: Search term
        author: Optional author/organization filter
        limit: Maximum results
    
    Returns:
        List of model dictionaries
    """
    kwargs = {
        "search": keyword,
        "sort": "downloads",
        "direction": -1,
        "limit": limit
    }
    
    if author:
        kwargs["author"] = author
    
    models = api.list_models(**kwargs)
    
    return [
        {
            "id": m.id,
            "downloads": m.downloads,
            "pipeline_tag": m.pipeline_tag
        }
        for m in models
    ]

# Search for Llama models from Meta
print("\nLlama models from Meta:")
print("=" * 70)
llama_models = search_models_by_keyword("llama", author="meta-llama", limit=5)

for m in llama_models:
    print(f"  {m['id']:<45} | {m['pipeline_tag'] or 'N/A':<20}")

# Search Mistral AI's models; the smaller ones (e.g. 7B) fit comfortably in DGX Spark memory
print("\nMistral AI models:")
print("=" * 70)
mistral_models = search_models_by_keyword("mistral", author="mistralai", limit=5)

for m in mistral_models:
    print(f"  {m['id']:<45} | {m['pipeline_tag'] or 'N/A':<20}")

### Try It Yourself: Find Your Own Models

Search for models related to a task you're interested in:

<details>
<summary>Hint</summary>
Try tasks like: "question-answering", "summarization", "translation", "image-classification"
</details>

In [None]:
# YOUR CODE HERE
# Search for models in a task that interests you

my_task = "question-answering"  # Change this!

# Search and print the top 5 models
my_models = search_models_by_task(my_task, limit=5)

print(f"\nTop 5 {my_task} models:")
for i, m in enumerate(my_models, 1):
    print(f"{i}. {m['id']}")

---

## Part 3: Understanding Model Cards

### 3.1 What is a Model Card?

A **Model Card** is like a nutritional label for AI models. It tells you:
- What the model does
- How it was trained
- What data was used
- Known limitations and biases
- How to use it
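
Much of this information lives in a YAML metadata block (license, language, tags, datasets) at the top of each model's `README.md`, between two `---` lines, followed by the free-form card text. The sketch below separates the two parts using only the standard library and keeps simple top-level `key: value` pairs; it deliberately ignores nested YAML and list items, so treat it as an illustration rather than a parser. For real cards, `huggingface_hub.ModelCard.load(model_id)` parses the metadata properly. The sample README is invented.

```python
from typing import Dict, Tuple


def split_model_card(readme: str) -> Tuple[Dict[str, str], str]:
    """Split a README into (simple metadata, body text).

    Simplified sketch: only unindented `key: value` lines are kept;
    nested YAML structures and list items are skipped.
    """
    lines = readme.splitlines()
    # No front matter: the whole file is body text
    if not lines or lines[0].strip() != "---":
        return {}, readme
    metadata: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the card body
            body = "\n".join(lines[i + 1:]).strip()
            return metadata, body
        # Keep only top-level "key: value" pairs with a non-empty value
        if not line.startswith((" ", "-")) and ":" in line:
            key, value = line.split(":", 1)
            if value.strip():
                metadata[key.strip()] = value.strip()
    # Opening '---' without a closing one: treat as no front matter
    return {}, readme


sample = """---
license: apache-2.0
language: en
tags:
- sentiment
---
# My Model
Fine-tuned for sentiment analysis."""

meta, body = split_model_card(sample)
print(meta)
print(body.splitlines()[0])
```

The same function can be applied to a README fetched with `hf_hub_download`, as long as the fields you care about use simple one-line values.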

Let's create a comprehensive model documentation function:

In [None]:
@dataclass
class ModelDocumentation:
    """Structured documentation for a Hugging Face model."""
    model_id: str
    author: str
    task: str
    downloads: int
    likes: int
    library: str
    tags: List[str]
    created_at: str
    last_modified: str
    model_size_gb: float = 0.0
    notes: str = ""
    tested_locally: bool = False
    local_test_result: str = ""
    
    def to_dict(self) -> Dict:
        return {
            "model_id": self.model_id,
            "author": self.author,
            "task": self.task,
            "downloads": self.downloads,
            "likes": self.likes,
            "library": self.library,
            "tags": self.tags,
            "created_at": self.created_at,
            "last_modified": self.last_modified,
            "model_size_gb": self.model_size_gb,
            "notes": self.notes,
            "tested_locally": self.tested_locally,
            "local_test_result": self.local_test_result
        }
    
    def __str__(self):
        return f"""
========================================
MODEL: {self.model_id}
========================================
Author: {self.author}
Task: {self.task}
Library: {self.library}

Popularity:
  Downloads: {self.downloads:,}
  Likes: {self.likes}

Tags: {', '.join(self.tags[:5])}

Size: {self.model_size_gb:.2f} GB (estimated)

Tested Locally: {'Yes' if self.tested_locally else 'No'}
{f'Test Result: {self.local_test_result}' if self.tested_locally else ''}

Notes: {self.notes or 'None'}
========================================
"""


def document_model(model_id: str) -> ModelDocumentation:
    """
    Create comprehensive documentation for a model.
    
    Args:
        model_id: Hugging Face model identifier
    
    Returns:
        ModelDocumentation object
    """
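    # files_metadata=True is required for sibling.size to be populated;
    # without it every size is None and the estimate stays at 0
    info = api.model_info(model_id, files_metadata=True)
    
    # Estimate model size by summing all repo files. This overestimates when a
    # repo ships the same weights in several formats (e.g. .bin and .safetensors)
    total_size = 0
    if info.siblings:
        for sibling in info.siblings:
            if sibling.size:
                total_size += sibling.size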
    
    return ModelDocumentation(
        model_id=model_id,
        author=info.author or "unknown",
        task=info.pipeline_tag or "unknown",
        downloads=info.downloads or 0,
        likes=info.likes or 0,
        library=info.library_name or "unknown",
        tags=info.tags[:10] if info.tags else [],
        created_at=str(info.created_at) if info.created_at else "unknown",
        last_modified=str(info.last_modified) if info.last_modified else "unknown",
        model_size_gb=total_size / 1e9
    )

# Document a popular model
doc = document_model("distilbert-base-uncased-finetuned-sst-2-english")
print(doc)

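Summing every file in a repository counts the weights more than once when a repo ships them in several formats (for example `pytorch_model.bin` alongside `model.safetensors`). A sketch of a less inflated estimate: given `(filename, size_in_bytes)` pairs, count only one weight format, preferring safetensors. The extension list and its preference order are assumptions, and the file sizes in the example are invented.

```python
from typing import Iterable, Tuple

# Weight-file extensions in order of preference (assumption: safetensors first)
WEIGHT_FORMATS = (".safetensors", ".bin", ".h5", ".msgpack")


def estimate_weights_gb(files: Iterable[Tuple[str, int]]) -> float:
    """Estimate weight size in GB, counting a single weight format only."""
    files = list(files)
    for ext in WEIGHT_FORMATS:
        # Sum all shards of the first format that is present
        sizes = [size for name, size in files if name.endswith(ext) and size]
        if sizes:
            return sum(sizes) / 1e9
    # No recognized weight files: fall back to the total repo size
    return sum(size for _, size in files if size) / 1e9


# Hypothetical repo listing with duplicate PyTorch, safetensors, and TF weights
repo_files = [
    ("config.json", 700),
    ("pytorch_model.bin", 268_000_000),
    ("model.safetensors", 268_000_000),
    ("tf_model.h5", 268_000_000),
]
print(f"All files:    {sum(s for _, s in repo_files) / 1e9:.2f} GB")
print(f"Weights only: {estimate_weights_gb(repo_files):.2f} GB")
```

With a real model, the input can be built as `[(s.rfilename, s.size) for s in info.siblings]` after calling `api.model_info(model_id, files_metadata=True)`.
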
### 3.2 Reading the Model README

The README (model card content) contains the most detailed information:

In [None]:
def get_model_readme(model_id: str, max_chars: int = 2000) -> str:
    """
    Download and return the README content for a model.
    
    Args:
        model_id: Model identifier
        max_chars: Maximum characters to return
    
    Returns:
        README content (truncated if needed)
    """
    try:
        readme_path = hf_hub_download(repo_id=model_id, filename="README.md")
        with open(readme_path, 'r', encoding='utf-8') as f:
            content = f.read()
        
        if len(content) > max_chars:
            return content[:max_chars] + f"\n\n... [Truncated - {len(content):,} total characters]"
        return content
    except Exception as e:
        return f"Could not fetch README: {e}"

# Get the README for DistilBERT sentiment model
readme = get_model_readme("distilbert-base-uncased-finetuned-sst-2-english")
print(readme)

---

## Part 4: Testing Models Locally

### 4.1 Loading and Testing Classification Models

Now let's actually load and test some models on our DGX Spark!

In [None]:
def test_classification_model(
    model_id: str,
    test_texts: List[str],
    device: str = "cuda" if torch.cuda.is_available() else "cpu"
) -> Dict[str, Any]:
    """
    Load and test a classification model.
    
    Args:
        model_id: Model identifier
        test_texts: List of texts to classify
        device: Device to use
    
    Returns:
        Dictionary with results and timing
    """
    result = {
        "model_id": model_id,
        "success": False,
        "load_time_seconds": 0,
        "inference_time_ms": 0,
        "memory_gb": 0,
        "predictions": [],
        "error": None
    }
    
    try:
        # Clear memory
        torch.cuda.empty_cache() if device == "cuda" else None
        initial_memory = torch.cuda.memory_allocated() / 1e9 if device == "cuda" else 0
        
        # Load model with timing
        print(f"Loading {model_id}...")
        start = time.time()
        
        # Use pipeline for simplicity
        classifier = pipeline(
            "text-classification",
            model=model_id,
            device=0 if device == "cuda" else -1,
            torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32
        )
        
        result["load_time_seconds"] = time.time() - start
        result["memory_gb"] = (torch.cuda.memory_allocated() / 1e9 - initial_memory) if device == "cuda" else 0
        
        print(f"  Loaded in {result['load_time_seconds']:.2f}s, using {result['memory_gb']:.2f}GB")
        
        # Run inference
        print("  Running inference...")
        start = time.time()
        predictions = classifier(test_texts)
        result["inference_time_ms"] = (time.time() - start) * 1000
        
        result["predictions"] = predictions
        result["success"] = True
        
        # Cleanup
        del classifier
        torch.cuda.empty_cache() if device == "cuda" else None
        
        print(f"  Inference took {result['inference_time_ms']:.1f}ms for {len(test_texts)} samples")
        
    except Exception as e:
        result["error"] = str(e)
        print(f"  ERROR: {e}")
    
    return result

# Test texts for sentiment analysis
test_texts = [
    "This product is amazing! Best purchase I've ever made!",
    "Terrible experience. Complete waste of money.",
    "It's okay, nothing special but gets the job done.",
    "I'm absolutely thrilled with this service!",
    "Disappointed. Would not recommend."
]

# Test the DistilBERT sentiment model
result = test_classification_model(
    "distilbert-base-uncased-finetuned-sst-2-english",
    test_texts
)

if result["success"]:
    print("\nResults:")
    print("-" * 60)
    for text, pred in zip(test_texts, result["predictions"]):
        sentiment = "POS" if pred["label"] == "POSITIVE" else "NEG"
        conf = pred["score"] * 100
        print(f"[{sentiment}] {conf:5.1f}% | {text[:50]}...")

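The `text-classification` pipeline returns one dictionary per input with a `label` and a `score`, which is what the loop above reads. When comparing models it helps to condense that list into a few numbers. A minimal sketch in plain Python, using made-up predictions in the same shape; the 0.9 confidence threshold is an arbitrary choice.

```python
from collections import Counter
from typing import Dict, List


def summarize_predictions(predictions: List[Dict], min_confidence: float = 0.9) -> Dict:
    """Summarize pipeline-style predictions: label counts, mean score, low-confidence items."""
    if not predictions:
        return {"label_counts": {}, "mean_score": 0.0, "uncertain": []}
    labels = Counter(p["label"] for p in predictions)
    mean_score = sum(p["score"] for p in predictions) / len(predictions)
    # Indices of predictions the model is unsure about
    uncertain = [i for i, p in enumerate(predictions) if p["score"] < min_confidence]
    return {"label_counts": dict(labels), "mean_score": mean_score, "uncertain": uncertain}


# Made-up predictions in the pipeline's output format
preds = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.98},
    {"label": "POSITIVE", "score": 0.62},
]
print(summarize_predictions(preds))
```

Low-confidence items are worth reading by hand. The SST-2 model only has POSITIVE and NEGATIVE labels, so a neutral sentence like "It's okay, nothing special" has no correct label to choose from.
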
### 4.2 Testing Text Generation Models

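Let's also test a text generation model. We start with GPT-2 (about 124M parameters) because it downloads and loads quickly; the same function works for much larger models, which is where DGX Spark's 128 GB of unified memory really shines.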

In [None]:
def test_generation_model(
    model_id: str,
    prompt: str,
    max_new_tokens: int = 50,
    device: str = "cuda" if torch.cuda.is_available() else "cpu"
) -> Dict[str, Any]:
    """
    Load and test a text generation model.
    """
    result = {
        "model_id": model_id,
        "success": False,
        "load_time_seconds": 0,
        "inference_time_ms": 0,
        "memory_gb": 0,
        "generated_text": "",
        "error": None
    }
    
    try:
        torch.cuda.empty_cache() if device == "cuda" else None
        initial_memory = torch.cuda.memory_allocated() / 1e9 if device == "cuda" else 0
        
        print(f"Loading {model_id}...")
        start = time.time()
        
        generator = pipeline(
            "text-generation",
            model=model_id,
            device=0 if device == "cuda" else -1,
            torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32
        )
        
        result["load_time_seconds"] = time.time() - start
        result["memory_gb"] = (torch.cuda.memory_allocated() / 1e9 - initial_memory) if device == "cuda" else 0
        
        print(f"  Loaded in {result['load_time_seconds']:.2f}s, using {result['memory_gb']:.2f}GB")
        
        print("  Generating...")
        start = time.time()
        
        output = generator(
            prompt,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            pad_token_id=generator.tokenizer.eos_token_id
        )
        
        result["inference_time_ms"] = (time.time() - start) * 1000
        result["generated_text"] = output[0]["generated_text"]
        result["success"] = True
        
        del generator
        torch.cuda.empty_cache() if device == "cuda" else None
        
    except Exception as e:
        result["error"] = str(e)
        print(f"  ERROR: {e}")
    
    return result

# Test with GPT-2 (small, fast, good for demonstration)
result = test_generation_model(
    "gpt2",
    "The future of artificial intelligence is",
    max_new_tokens=50
)

if result["success"]:
    print("\nGenerated Text:")
    print("-" * 60)
    print(result["generated_text"])

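Both test functions time themselves with `time.time()`, which reads the wall clock and can jump if the system clock is adjusted. Python's `time.perf_counter()` is monotonic and higher-resolution, and is the usual choice for benchmarking. A small reusable timer, sketched as a context manager (the `timer` helper is our own, not a library API):

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timer(results: Dict[str, float], key: str) -> Iterator[None]:
    """Store the elapsed time of the `with` block, in milliseconds, under results[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the timing even if the block raises
        results[key] = (time.perf_counter() - start) * 1000


timings: Dict[str, float] = {}
with timer(timings, "sleep_ms"):
    time.sleep(0.05)  # stand-in for a call such as generator(prompt, ...)
print(f"Elapsed: {timings['sleep_ms']:.1f} ms")
```

Pipeline calls return CPU-side Python results, so the GPU work has finished by the time they return. When timing raw model calls on CUDA tensors, call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels run asynchronously.
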
---

## Part 5: Documenting Your Model Selection

### 5.1 Creating a Model Catalog

For your assignment, you need to document 10 models. Here's a structured way to do it:

In [None]:
# Models to explore (mix of different tasks)
models_to_explore = [
    # Sentiment/Classification
    "distilbert-base-uncased-finetuned-sst-2-english",
    "cardiffnlp/twitter-roberta-base-sentiment-latest",
    
    # Named Entity Recognition
    "dslim/bert-base-NER",
    
    # Question Answering
    "deepset/roberta-base-squad2",
    
    # Summarization
    "facebook/bart-large-cnn",
    
    # Text Generation
    "gpt2",
    "microsoft/phi-2",  # 2.7B parameters, fits easily on DGX Spark
    
    # Embeddings
    "sentence-transformers/all-MiniLM-L6-v2",
    
    # Translation
    "Helsinki-NLP/opus-mt-en-de",
    
    # Zero-shot Classification
    "facebook/bart-large-mnli"
]

# Document all models
print(f"Documenting {len(models_to_explore)} models...")
print("=" * 70)

model_catalog = []

for model_id in models_to_explore:
    try:
        doc = document_model(model_id)
        model_catalog.append(doc)
        print(f"[OK] {model_id}")
    except Exception as e:
        print(f"[FAIL] {model_id}: {e}")

print(f"\nSuccessfully documented {len(model_catalog)} models!")

In [None]:
# Display catalog summary
print("\nMODEL CATALOG SUMMARY")
print("=" * 90)
print(f"{'Model ID':<45} | {'Task':<20} | {'Downloads':>12}")
print("-" * 90)

for doc in sorted(model_catalog, key=lambda x: x.downloads, reverse=True):
    model_short = doc.model_id[:44] if len(doc.model_id) > 44 else doc.model_id
    task_short = doc.task[:19] if len(doc.task) > 19 else doc.task
    print(f"{model_short:<45} | {task_short:<20} | {doc.downloads:>12,}")

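With several models per task, a natural next step is to shortlist one candidate per task. A sketch that groups catalog entries by `task` and keeps the most-downloaded one; it only needs objects with `model_id`, `task`, and `downloads` attributes, so it should work on `model_catalog` as-is. Download counts are a popularity signal, not a quality measure. The `Entry` class and the sample numbers below are hypothetical, for demonstration only.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass
class Entry:
    """Minimal hypothetical stand-in for ModelDocumentation."""
    model_id: str
    task: str
    downloads: int


def top_model_per_task(catalog: Iterable[Any]) -> Dict[str, str]:
    """Map each task to the model_id with the most downloads."""
    best: Dict[str, Any] = {}
    for doc in catalog:
        current = best.get(doc.task)
        if current is None or doc.downloads > current.downloads:
            best[doc.task] = doc
    return {task: doc.model_id for task, doc in best.items()}


sample_catalog = [
    Entry("model-a", "text-classification", 1_000),
    Entry("model-b", "text-classification", 5_000),
    Entry("model-c", "summarization", 300),
]
print(top_model_per_task(sample_catalog))
```

On your real catalog, call `top_model_per_task(model_catalog)`.
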
### 5.2 Testing Your Top 3 Models Locally

Now test the 3 models you find most interesting:

In [None]:
# Test 3 models locally
models_to_test = [
    ("distilbert-base-uncased-finetuned-sst-2-english", "classification"),
    ("dslim/bert-base-NER", "ner"),
    ("gpt2", "generation")
]

test_results = []

for model_id, task_type in models_to_test:
    print(f"\n{'='*60}")
    print(f"Testing: {model_id}")
    print(f"{'='*60}")
    
    if task_type == "classification":
        result = test_classification_model(
            model_id,
            ["This is wonderful!", "This is terrible."]
        )
    elif task_type == "generation":
        result = test_generation_model(
            model_id,
            "Once upon a time",
            max_new_tokens=30
        )
    elif task_type == "ner":
        # NER test
        try:
            ner = pipeline(
                "ner",
                model=model_id,
                aggregation_strategy="simple",
                device=0 if torch.cuda.is_available() else -1
            )
            output = ner("Apple CEO Tim Cook announced new products in Cupertino.")
            result = {"success": True, "output": output}
            print("  Entities found:")
            for ent in output:
                print(f"    - {ent['entity_group']}: '{ent['word']}' ({ent['score']:.2%})")
            del ner
            torch.cuda.empty_cache()
        except Exception as e:
            result = {"success": False, "error": str(e)}
    
    test_results.append({"model_id": model_id, "result": result})

print("\n" + "="*60)
print("All tests complete!")

### 5.3 Save Your Catalog

Save your model catalog for future reference:

In [None]:
import json
from pathlib import Path

# Convert catalog to JSON
catalog_data = {
    "created_at": datetime.now().isoformat(),
    "num_models": len(model_catalog),
    "models": [doc.to_dict() for doc in model_catalog]
}

# Save to file
output_path = Path("../data/model_catalog.json")
output_path.parent.mkdir(parents=True, exist_ok=True)

with open(output_path, 'w') as f:
    json.dump(catalog_data, f, indent=2)

print(f"Catalog saved to {output_path}")
print(f"Contains {len(model_catalog)} models")

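To reuse the catalog later, read the JSON back and rebuild the objects. Because `ModelDocumentation` is a dataclass, `dataclasses.asdict` produces the same dictionary as the hand-written `to_dict`, and `ModelDocumentation(**entry)` should rebuild an object from each stored entry. The sketch below shows the round trip with a cut-down hypothetical dataclass (`MiniDoc`) so it runs on its own, and writes to a temporary directory rather than `../data`; the download count is invented.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class MiniDoc:
    """Cut-down, hypothetical version of ModelDocumentation."""
    model_id: str
    task: str
    downloads: int
    tags: List[str] = field(default_factory=list)


docs = [MiniDoc("gpt2", "text-generation", 123, ["pytorch"])]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model_catalog.json"
    # Save: asdict() converts each dataclass (and nested lists) to plain dicts
    path.write_text(json.dumps({"models": [asdict(d) for d in docs]}, indent=2), encoding="utf-8")
    # Load: rebuild dataclass instances from the stored dictionaries
    data = json.loads(path.read_text(encoding="utf-8"))
    loaded = [MiniDoc(**entry) for entry in data["models"]]

print(loaded == docs)
```

For the file saved above, the equivalent would be `[ModelDocumentation(**m) for m in json.load(f)["models"]]`.
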
---

## DGX Spark Model Capacity Reference

When selecting models, keep these approximate DGX Spark limits (128 GB of unified memory) in mind:

| Scenario | Approx. Maximum Model Size (parameters) | Notes |
|----------|-----------------------------------------|-------|
| BF16 Inference | 50-55B | Native Blackwell support |
| FP8 Inference | 90-100B | Reduced precision |
| NVFP4 Inference | ~200B | Blackwell exclusive |
| Full Fine-Tuning (FP16) | 12-16B | With gradient checkpointing |
| QLoRA Fine-Tuning | 100-120B | 4-bit quantized + adapters |
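
A quick way to sanity-check these limits is weight-memory arithmetic: parameters × bytes per parameter (4 for FP32, 2 for BF16/FP16, 1 for FP8, and roughly 0.5 for 4-bit formats such as NVFP4, plus a small overhead for scaling factors that this sketch ignores). It covers weights only; activations, the KV cache, and framework overhead come on top, which is one reason the practical limits above sit below a raw division of 128 GB.

```python
# Approximate bytes per parameter for each storage format
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "nvfp4": 0.5}


def weight_memory_gb(num_params_billion: float, dtype: str) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # 1e9 params x bytes/param / 1e9 bytes per GB cancels out
    return num_params_billion * BYTES_PER_PARAM[dtype]


def fits(num_params_billion: float, dtype: str, budget_gb: float = 128.0) -> bool:
    """Weights-only check; real usage needs headroom for activations and KV cache."""
    return weight_memory_gb(num_params_billion, dtype) <= budget_gb


for dtype in ("bf16", "fp8", "nvfp4"):
    gb = weight_memory_gb(70, dtype)
    print(f"70B in {dtype:>5}: {gb:6.1f} GB, fits in 128 GB: {fits(70, dtype)}")
```

A 70B model needs about 140 GB for BF16 weights alone, more than the whole memory pool, while FP8 halves that to about 70 GB.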

---

## Common Mistakes

### Mistake 1: Not Checking Model Size Before Loading

```python
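import torch
from huggingface_hub import HfApi
from transformers import AutoModelForCausalLM

api = HfApi()

# Wrong: loading a 70B model in BF16 without checking capacity
# 70B params x 2 bytes = ~140 GB of weights; the DGX Spark BF16 limit is 50-55B!
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    torch_dtype=torch.bfloat16
)
# OOM Error!

# Right: check the size first (files_metadata=True populates file sizes)
info = api.model_info("meta-llama/Llama-2-70b-hf", files_metadata=True)
# Count only .safetensors shards; repos often also ship .bin copies of the same weights
weights_gb = sum(
    s.size for s in info.siblings if s.size and s.rfilename.endswith(".safetensors")
) / 1e9
print(f"Weights (safetensors): {weights_gb:.1f} GB")
# 70B requires FP8 or NVFP4 quantization - see Module 3.2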
```

### Mistake 2: Forgetting to Use BFloat16 on DGX Spark

```python
# Wrong: Default FP32 wastes memory
model = AutoModel.from_pretrained("bert-base-uncased")

# Right: Use BF16 for Blackwell GB10 (native support)
model = AutoModel.from_pretrained(
    "bert-base-uncased",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
```

### Mistake 3: Not Clearing GPU Memory Between Models

```python
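import gc
import torch

# load_model() is a placeholder for your own loading code

# Wrong: memory keeps accumulating
model1 = load_model("model1")
model2 = load_model("model2")  # May OOM if both models don't fit

# Right: clean up between models (standard pattern)
model1 = load_model("model1")
# ... use model1 ...
del model1                # drop the last reference to the model
gc.collect()              # reclaim unreachable objects
torch.cuda.empty_cache()  # release unused cached blocks held by PyTorch's allocator
model2 = load_model("model2")  # Works!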
```

---

## Checkpoint

You've learned:
- How to search the Hugging Face Hub by task and keyword
- How to read and understand model cards
- How to load and test models locally on DGX Spark
- How to document models systematically

---

## Challenge (Optional)

**Advanced Exercise**: Create a model comparison tool that:
1. Takes a task type as input
2. Finds the top 5 models for that task
3. Tests each model on the same inputs
4. Compares accuracy (on a small labeled test set), speed, and memory usage
5. Generates a recommendation

<details>
<summary>Hint</summary>
Combine the search and test functions we created, then add comparison logic.
</details>
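
As a starting point for the comparison step, here is a sketch that ranks result dictionaries shaped like those returned by `test_classification_model` (`model_id`, `success`, `inference_time_ms`, `memory_gb`), plus an `accuracy` field you would compute yourself against labeled inputs. The `rank_models` helper and its 70/30 weighting of accuracy versus speed are arbitrary assumptions; adjust them to your priorities. The numbers in the example are invented.

```python
from typing import Dict, List


def rank_models(results: List[Dict], speed_weight: float = 0.3) -> List[Dict]:
    """Rank successful results: higher accuracy and lower latency score better."""
    ok = [r for r in results if r.get("success")]
    if not ok:
        return []
    fastest = min(r["inference_time_ms"] for r in ok)
    scored = []
    for r in ok:
        # Relative speed in [0, 1]: 1.0 for the fastest model
        speed = fastest / r["inference_time_ms"] if r["inference_time_ms"] > 0 else 1.0
        score = (1 - speed_weight) * r["accuracy"] + speed_weight * speed
        scored.append({**r, "score": round(score, 3)})
    return sorted(scored, key=lambda r: r["score"], reverse=True)


results = [
    {"model_id": "model-a", "success": True, "inference_time_ms": 40.0, "memory_gb": 0.3, "accuracy": 0.90},
    {"model_id": "model-b", "success": True, "inference_time_ms": 20.0, "memory_gb": 0.1, "accuracy": 0.85},
    {"model_id": "model-c", "success": False, "error": "OOM"},
]
ranked = rank_models(results)
for r in ranked:
    print(f"{r['model_id']}: score={r['score']}, memory={r['memory_gb']} GB")
print(f"Recommended: {ranked[0]['model_id']}")
```

Memory is carried through for reporting but not scored here; if memory is your bottleneck, fold it into the score the same way as speed.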

In [None]:
# YOUR CHALLENGE CODE HERE
# ...


---

## Further Reading

- [Hugging Face Hub Documentation](https://huggingface.co/docs/hub)
- [Model Cards Guide](https://huggingface.co/docs/hub/model-cards)
- [Transformers Quick Tour](https://huggingface.co/docs/transformers/quicktour)
- [NVIDIA DGX Spark developer page](https://developer.nvidia.com/dgx-spark)

---

## Cleanup

In [None]:
# Clean up GPU memory
import gc

gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
    print(f"GPU memory reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")

print("\nLab complete!")