# Lab 2.4.1: Hugging Face Hub Exploration

**Module:** 2.4 - Hugging Face Ecosystem  
**Time:** 2 hours  
**Difficulty:** ⭐⭐ (Beginner-Intermediate)

---

## Learning Objectives

By the end of this notebook, you will:
- [ ] Navigate the Hugging Face Hub to discover models and datasets
- [ ] Understand model cards and how to evaluate model suitability
- [ ] Load pre-trained models using Auto classes
- [ ] Test models locally on your DGX Spark
- [ ] Document and compare different models for a task

---

## Prerequisites

- Completed: Module 8 (NLP & Transformers basics)
- Knowledge of: Basic Python, PyTorch tensors
- Setup: NGC container running, Hugging Face account created

---

## Real-World Context

Imagine you're a chef who just walked into the world's largest ingredient warehouse. You could spend months exploring every aisle, or you could learn to quickly find exactly what you need for your recipe.

The Hugging Face Hub is exactly that warehouse for AI, with more than 500,000 models and 100,000 datasets at the time of writing! Companies like Google, Meta, Microsoft, and thousands of researchers share their work there. Knowing how to navigate it efficiently is a superpower.

**Real examples:**
- A startup needs a sentiment analysis model → finds `cardiffnlp/twitter-roberta-base-sentiment`
- A researcher needs multilingual embeddings → discovers `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`
- A developer needs fast text generation → chooses between `gpt2`, `distilgpt2`, and `microsoft/phi-2`

---

## ELI5: What is the Hugging Face Hub?

> **Imagine you collect trading cards**, but instead of sports players, each card is a trained AI brain.
>
> Each card tells you:
> - What the AI is good at (playing chess, translating languages, detecting spam)
> - How big it is (small cards fit in your pocket, big ones need a backpack)
> - Who made it (Google's cards are usually really good!)
> - How to use it (simple instructions on the back)
>
> The Hugging Face Hub is like a giant trading card shop where:
> - Most cards are **FREE** to take home
> - You can try cards before taking them (online demos)
> - You can share your own cards
> - There are reviews telling you which cards work best
>
> **In AI terms:** The Hub hosts pre-trained models (the "brains") that anyone can download and use. Instead of training models from scratch (which can cost millions of dollars for the largest models), you grab a pre-trained one and fine-tune it for your specific task.

---

## Part 1: Setting Up Your Environment

In [None]:
# Install required packages
# Note: These packages are pre-installed in the NGC PyTorch container.
# Running pip install ensures you have compatible versions.
# If NOT using NGC container, ensure you have ARM64-compatible packages for DGX Spark.

!pip install -q "transformers>=4.35.0" "huggingface_hub>=0.19.0" "datasets>=2.14.0" "accelerate>=0.24.0"

print("Packages ready!")

In [None]:
# Import the libraries we'll use throughout this notebook
import torch
from transformers import (
    AutoModel, 
    AutoTokenizer, 
    AutoModelForSequenceClassification,
    AutoModelForCausalLM,
    AutoModelForQuestionAnswering
)
from huggingface_hub import HfApi
import warnings
warnings.filterwarnings('ignore')

# Check our hardware
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

### What Just Happened?

We imported the core building blocks:
- **`AutoModel`** and friends: Smart loaders that automatically pick the right model class
- **`HfApi`**: Python interface to search and interact with the Hub (use `api.list_models()` to search for models programmatically)

On your DGX Spark, you should see the Blackwell GPU with ~128GB of unified memory. This is your secret weapon!

---

## Part 2: Exploring the Hub Programmatically

While you can browse [huggingface.co](https://huggingface.co) in a web browser, real power users search programmatically. Let's learn how!

In [None]:
# Initialize the Hugging Face API
api = HfApi()

# Search for sentiment analysis models
# This is like searching "sentiment analysis" on the website
models = list(api.list_models(
    filter="text-classification",
    sort="downloads",
    direction=-1,  # Descending (most downloads first)
    limit=10
))

print("Top 10 Text Classification Models by Downloads:\n")
print(f"{'Rank':<5} {'Model Name':<50} {'Downloads':<15}")
print("=" * 70)
for i, model in enumerate(models, 1):
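    # downloads can be missing or None; format it as text so the column alignment below never fails
    downloads = getattr(model, 'downloads', None)
    downloads = 'N/A' if downloads is None else f"{downloads:,}"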
    print(f"{i:<5} {model.id:<50} {downloads:<15}")

### Understanding the Results

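Notice how the most downloaded models often come from:
- **Research labs and universities**: `cardiffnlp/`
- **Companies**: `facebook/`, `google/`, `microsoft/`
- **Community**: Individual researchers sharing their work

Some well-known model IDs, such as `distilbert-base-uncased`, have no organization prefix. These are older canonical IDs, not company namespaces.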

Downloads indicate popularity, but not necessarily quality for YOUR task. Always test!

In [None]:
# Let's search for different types of models
# This shows the variety available on the Hub

task_types = [
    "text-generation",
    "text-classification",
    "question-answering",
    "summarization",
    "translation",
    "fill-mask",
    "token-classification",  # NER
    "image-classification",
    "object-detection",
    "automatic-speech-recognition"
]

print("Most-downloaded model by task type:\n")
print(f"{'Task':<35} {'Top Model':<40}")
print("=" * 75)

for task in task_types:
    try:
        top_models = list(api.list_models(filter=task, sort="downloads", direction=-1, limit=1))
        top_model = top_models[0].id if top_models else "None found"
        print(f"{task:<35} {top_model:<40}")
    except Exception as e:
        print(f"{task:<35} Error: {str(e)[:30]}")

---

## Part 3: Reading Model Cards

Model cards are like nutrition labels for AI. They tell you:
- What the model does
- How it was trained
- Known limitations and biases
- How to use it

**Good model cards are a sign of a responsible model creator!**

In [None]:
# Let's examine a popular model's metadata
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

# Get detailed model information
info = api.model_info(model_name)

print(f"Model: {info.id}")
print(f"\nAuthor: {info.author}")
print(f"Downloads (last month): {info.downloads:,}")
print(f"Likes: {info.likes}")
print(f"\nTags: {info.tags}")
print(f"\nPipeline tag: {info.pipeline_tag}")
print(f"\nLibrary: {info.library_name}")

In [None]:
# Get the model card content (README)
from huggingface_hub import hf_hub_download

try:
    # Download just the README
    readme_path = hf_hub_download(
        repo_id=model_name,
        filename="README.md"
    )
    
    # Model cards are UTF-8 text; set the encoding so this works on any platform
    with open(readme_path, 'r', encoding='utf-8') as f:
        readme_content = f.read()
    
    # Show first 2000 characters
    print("=" * 60)
    print("MODEL CARD PREVIEW")
    print("=" * 60)
    print(readme_content[:2000])
    print("\n... [truncated] ...")
except Exception as e:
    print(f"Could not fetch README: {e}")

### What to Look for in a Model Card

**Essential sections:**
1. **Model Description**: What does it do?
2. **Intended Use**: What's it designed for?
3. **Training Data**: What was it trained on? (affects biases!)
4. **Limitations**: What does it NOT do well?
5. **How to Use**: Code examples
6. **Evaluation Results**: Benchmark scores

**Red flags:**
- No model card at all
- No information about training data
- No mention of limitations
- Very few downloads + no documentation
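The checklist above can be turned into a rough automated first pass. The sketch below scans a model card's Markdown headings for the essential sections. The keyword lists in `EXPECTED_SECTIONS` and the function name `audit_model_card` are assumptions made for this example, and real model cards name their sections freely, so treat a "missing" result as a prompt to read the card yourself.

```python
import re

# Hypothetical keyword lists; real model cards name their sections freely,
# so a miss means "go and read the card", not "the card is bad".
EXPECTED_SECTIONS = {
    "description": ["model description", "model details", "overview"],
    "intended_use": ["intended use", "uses"],
    "training_data": ["training data", "training"],
    "limitations": ["limitations", "bias", "risks"],
    "usage": ["how to use", "usage", "how to get started"],
    "evaluation": ["evaluation", "results"],
}

def audit_model_card(readme_text: str) -> dict:
    """Map each expected section to True if some Markdown heading mentions it."""
    headings = [
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}\s+(.+)$", readme_text, flags=re.MULTILINE)
    ]
    return {
        section: any(kw in h for h in headings for kw in keywords)
        for section, keywords in EXPECTED_SECTIONS.items()
    }

# Tiny made-up model card used only to exercise the function
sample_card = """# Toy Model
## Model Description
A small demo classifier.
## How to Use
See the snippet below.
"""

report = audit_model_card(sample_card)
missing = [name for name, found in report.items() if not found]
print("Missing sections:", missing)
```

To audit a real card, you could pass the `readme_content` string loaded earlier in place of `sample_card`.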

---

## Part 4: Loading and Testing Models

Now for the fun part - let's actually load and run some models!

### The Auto Classes Magic

Hugging Face provides "Auto" classes that automatically figure out:
- Which architecture to use (BERT, GPT, T5, etc.)
- Which weights to load
- How to configure the model
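Conceptually, an Auto class reads the `model_type` field from the repository's `config.json` and looks up the matching concrete class for the task you asked for. The sketch below is a toy illustration of that dispatch idea only. The registry `TOY_REGISTRY`, the function `resolve_class_name`, and the sample config dicts are hypothetical and are not the actual `transformers` implementation.

```python
# Toy illustration of Auto-class dispatch (NOT the real transformers code).
# A real config.json contains a "model_type" field; the registry below is a
# hypothetical, hand-written subset used only for this example.
TOY_REGISTRY = {
    ("distilbert", "sequence-classification"): "DistilBertForSequenceClassification",
    ("distilbert", "question-answering"): "DistilBertForQuestionAnswering",
    ("gpt2", "causal-lm"): "GPT2LMHeadModel",
}

def resolve_class_name(config: dict, task: str) -> str:
    """Return the concrete class name for a config dict and a task head."""
    key = (config["model_type"], task)
    if key not in TOY_REGISTRY:
        raise ValueError(f"No class registered for {key}")
    return TOY_REGISTRY[key]

# Minimal stand-ins for the contents of two config.json files
sst2_config = {"model_type": "distilbert", "num_labels": 2}
gpt2_config = {"model_type": "gpt2"}

print(resolve_class_name(sst2_config, "sequence-classification"))
print(resolve_class_name(gpt2_config, "causal-lm"))
```

This is why the same `from_pretrained` call works across architectures: the architecture is recorded in the repository, and you only choose the task head.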

In [None]:
# Helper function to monitor memory
def print_memory_usage(label=""):
    """Print current GPU memory usage"""
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1e9
        reserved = torch.cuda.memory_reserved() / 1e9
        print(f"[{label}] GPU Memory - Allocated: {allocated:.2f} GB, Reserved: {reserved:.2f} GB")

print_memory_usage("Before loading")

### Model 1: Sentiment Analysis (Text Classification)

In [None]:
# Load a sentiment analysis model
sentiment_model_name = "distilbert-base-uncased-finetuned-sst-2-english"

print(f"Loading {sentiment_model_name}...")

# AutoTokenizer knows how to load the right tokenizer
tokenizer = AutoTokenizer.from_pretrained(sentiment_model_name)

# AutoModelForSequenceClassification knows this is a classification model
model = AutoModelForSequenceClassification.from_pretrained(
    sentiment_model_name,
    torch_dtype=torch.bfloat16  # Use bfloat16 on DGX Spark!
)

# Move to GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()  # Set to evaluation mode

print(f"Model loaded on {device}")
print_memory_usage("After loading sentiment model")

In [None]:
# Test the sentiment model
test_texts = [
    "I absolutely loved this movie! The acting was superb.",
    "This was a complete waste of time. Terrible.",
    "It was okay, nothing special but not bad either.",
    "The DGX Spark is an amazing piece of hardware!",
    "I'm not sure how I feel about this product."
]

print("Sentiment Analysis Results:\n")
print(f"{'Text':<55} {'Sentiment':<10} {'Confidence':<10}")
print("=" * 80)

for text in test_texts:
    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    # Inference
    with torch.no_grad():
        outputs = model(**inputs)
    
    # Get prediction
    probs = torch.softmax(outputs.logits, dim=1)
    prediction = torch.argmax(probs, dim=1).item()
    confidence = probs[0][prediction].item()
    
    sentiment = "POSITIVE" if prediction == 1 else "NEGATIVE"
    
    # Truncate text for display
    display_text = text[:52] + "..." if len(text) > 55 else text
    print(f"{display_text:<55} {sentiment:<10} {confidence:.2%}")

### What Just Happened?

1. **Tokenization**: Text → numbers (tokens) that the model understands
2. **Forward Pass**: Model processes tokens through layers
3. **Logits**: Raw scores for each class (positive/negative)
4. **Softmax**: Convert logits to probabilities (sum to 1)
5. **Argmax**: Pick the class with highest probability

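Notice how the model is confident about clear sentiments but tends to be less certain about neutral or ambiguous text. Because this model has only two classes (POSITIVE and NEGATIVE), it must still assign neutral text to one of them.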
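Steps 3 to 5 are small enough to reproduce by hand. The sketch below applies softmax and argmax to made-up logits in plain Python. The numbers are illustrative only and were not produced by the model.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for illustration: index 0 = NEGATIVE, index 1 = POSITIVE
examples = {
    "clear": [-3.0, 3.5],
    "ambiguous": [0.2, 0.4],
}

for name, logits in examples.items():
    probs = softmax(logits)
    prediction = max(range(len(probs)), key=probs.__getitem__)  # argmax
    print(f"{name}: probs={[round(p, 3) for p in probs]}, predicted index={prediction}")
```

A large gap between the two logits gives a probability close to 1, while nearly equal logits give a probability close to 0.5, which is the low-confidence case.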

### Model 2: Question Answering

In [None]:
# Clear previous model to free memory
del model, tokenizer
torch.cuda.empty_cache()

print_memory_usage("After cleanup")

In [None]:
# Load a question-answering model
qa_model_name = "distilbert-base-cased-distilled-squad"

print(f"Loading {qa_model_name}...")

qa_tokenizer = AutoTokenizer.from_pretrained(qa_model_name)
qa_model = AutoModelForQuestionAnswering.from_pretrained(
    qa_model_name,
    torch_dtype=torch.bfloat16
).to(device)
qa_model.eval()

print(f"Model loaded!")
print_memory_usage("After loading QA model")

In [None]:
# Test question answering
context = """
The NVIDIA DGX Spark is a revolutionary desktop AI computer powered by the 
Blackwell GB10 Superchip. It features 128GB of unified LPDDR5X memory shared 
between the CPU and GPU, eliminating the need for data transfers. The system 
delivers up to 1 PFLOP of FP4 compute and approximately 209 TFLOPS at FP8 
precision. With 6,144 CUDA cores and 192 fifth-generation Tensor Cores, 
the DGX Spark can run 70B parameter models locally without cloud dependencies.
"""

questions = [
    "How much memory does the DGX Spark have?",
    "What chip powers the DGX Spark?",
    "How many CUDA cores does it have?",
    "What size models can it run?"
]

print("Question Answering Results:\n")
print("Context:", context[:100], "...\n")

for question in questions:
    # Tokenize question and context together
    inputs = qa_tokenizer(
        question, 
        context, 
        return_tensors="pt",
        truncation=True,
        max_length=512
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    # Get answer
    with torch.no_grad():
        outputs = qa_model(**inputs)
    
    # Find start and end positions of answer
    answer_start = torch.argmax(outputs.start_logits)
    answer_end = torch.argmax(outputs.end_logits) + 1
    
    # Decode the answer
    answer_tokens = inputs["input_ids"][0][answer_start:answer_end]
    answer = qa_tokenizer.decode(answer_tokens)
    
    print(f"Q: {question}")
    print(f"A: {answer}\n")
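The loop above takes the argmax of the start and end logits independently, which can put the end position before the start and produce an empty answer. A common refinement is to score every valid (start, end) pair jointly. The sketch below shows that idea on made-up logits in plain Python. `best_span` and `max_answer_len` are names chosen for this example, and the end index here is inclusive.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end_inclusive, score) maximizing start + end logit,
    subject to start <= end and a bounded answer length."""
    best = None
    for s, s_score in enumerate(start_logits):
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best

# Made-up logits over 6 token positions
start_logits = [0.1, 0.3, 2.0, 0.2, 4.0, 0.1]
end_logits = [0.0, 0.2, 0.1, 3.5, 0.3, 1.0]

# Independent argmax, as in the loop above: here the start lands after the end
naive = (start_logits.index(max(start_logits)), end_logits.index(max(end_logits)))
print("Independent argmax (start, end):", naive)
print("Joint search (start, end, score):", best_span(start_logits, end_logits))
```

With real model outputs you would convert the logits tensors to lists first and add 1 to the end index before slicing `input_ids`.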

### Model 3: Text Generation (The Fun One!)

In [None]:
# Clear previous model
del qa_model, qa_tokenizer
torch.cuda.empty_cache()

print_memory_usage("After cleanup")

In [None]:
# Load a text generation model
# GPT-2 is a classic, reliable choice
gen_model_name = "gpt2"

print(f"Loading {gen_model_name}...")

gen_tokenizer = AutoTokenizer.from_pretrained(gen_model_name)
gen_model = AutoModelForCausalLM.from_pretrained(
    gen_model_name,
    torch_dtype=torch.bfloat16
).to(device)
gen_model.eval()

# GPT-2 doesn't have a padding token by default
gen_tokenizer.pad_token = gen_tokenizer.eos_token

print(f"Model loaded!")
print_memory_usage("After loading GPT-2")

In [None]:
# Generate some text!
prompts = [
    "The future of artificial intelligence is",
    "Once upon a time in a land of GPUs,",
    "The best way to learn programming is"
]

print("Text Generation Results:\n")

for prompt in prompts:
    inputs = gen_tokenizer(prompt, return_tensors="pt").to(device)
    
    with torch.no_grad():
        outputs = gen_model.generate(
            **inputs,
            max_new_tokens=50,
            num_return_sequences=1,
            temperature=0.7,
            do_sample=True,
            pad_token_id=gen_tokenizer.eos_token_id
        )
    
    generated_text = gen_tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    print(f"Prompt: {prompt}")
    print(f"Generated: {generated_text}")
    print("-" * 60 + "\n")

### Generation Parameters Explained

| Parameter | What it does | Analogy |
|-----------|--------------|----------|
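| `max_new_tokens` | Maximum number of new tokens to generate (tokens are often word pieces, not whole words) | "Write at most 50 tokens" |
| `temperature` | Randomness when sampling (values near 0 are close to deterministic, values above 1 are more random; must be greater than 0) | "How wild can you get?" |
| `do_sample` | Whether to sample from the distribution or always pick the most likely token | "Roll dice vs. pick favorite" |
| `top_k` | Sample only from the K most likely tokens | "Choose from top 50 tokens" |
| `top_p` | Nucleus sampling: sample from the smallest set of tokens whose cumulative probability reaches `top_p` | "Consider tokens until 90% probability" |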
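To see how these knobs reshape the next-token distribution, here is a minimal plain-Python sketch on a made-up five-token vocabulary. It is a simplified illustration of the ideas, not the `transformers` implementation (which works on logits tensors). The helper names `top_k_filter` and `top_p_filter` are chosen for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most likely token indices and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

toy_logits = [2.0, 1.0, 0.5, 0.1, -1.0]  # made-up scores for a 5-token vocabulary

# Lower temperature sharpens the distribution; higher temperature flattens it
for t in (0.5, 1.0, 1.5):
    print(f"temperature={t}: top probability = {max(softmax(toy_logits, t)):.3f}")

probs = softmax(toy_logits)
print("top_k=2 keeps token indices:", sorted(top_k_filter(probs, 2)))
print("top_p=0.9 keeps token indices:", sorted(top_p_filter(probs, 0.9)))
```

After filtering, sampling happens only among the surviving tokens, which is why small `top_k` or `top_p` values make the output more conservative.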

---

## Part 5: Documenting Models

### Exercise: Create Your Own Model Documentation

For this task, you need to document **10 models** from the Hub. Here's a template and example.

In [None]:
# Model Documentation Template
import json
from datetime import datetime

def document_model(model_id: str) -> dict:
    """Create documentation for a Hugging Face model."""
    try:
        info = api.model_info(model_id)
        
        doc = {
            "model_id": model_id,
            "author": info.author,
            "task": info.pipeline_tag,
            "downloads": info.downloads,
            "likes": info.likes,
            "library": info.library_name,
            "tags": info.tags[:10] if info.tags else [],  # First 10 tags
            "documented_at": datetime.now().isoformat(),
            "notes": "",  # Add your own notes!
            "tested_locally": False,
            "local_test_results": ""
        }
        return doc
    except Exception as e:
        return {"model_id": model_id, "error": str(e)}

# Example: Document a model
example_doc = document_model("bert-base-uncased")
print(json.dumps(example_doc, indent=2))

### Try It Yourself: Document 10 Models

Find and document 10 models across different tasks:
- 2 text classification models
- 2 text generation models  
- 2 question answering models
- 2 named entity recognition models
- 2 of your choice!

<details>
<summary>Hint: How to find models</summary>

```python
# Search for specific tasks
text_gen_models = list(api.list_models(filter="text-generation", limit=5))
ner_models = list(api.list_models(filter="token-classification", limit=5))

# Or search by keyword
sentiment_models = list(api.list_models(search="sentiment", limit=5))
```
</details>
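Once you have a list of documentation dicts, a side-by-side table makes comparison easier. The sketch below renders such a list as a Markdown table and serializes it to JSON. The entries in `sample_docs` are placeholders with made-up numbers, shaped like the output of `document_model()`, and `docs_to_markdown` is a helper name chosen for this example.

```python
import json

def docs_to_markdown(docs):
    """Render a list of documentation dicts as a Markdown table."""
    columns = ["model_id", "task", "downloads", "likes"]
    lines = [
        "| " + " | ".join(columns) + " |",
        "|" + "|".join("---" for _ in columns) + "|",
    ]
    for doc in docs:
        if "error" in doc:  # skip entries whose Hub lookup failed
            continue
        lines.append("| " + " | ".join(str(doc.get(c, "")) for c in columns) + " |")
    return "\n".join(lines)

# Placeholder entries with made-up numbers, shaped like document_model() output
sample_docs = [
    {"model_id": "example-org/model-a", "task": "text-classification", "downloads": 1200, "likes": 8},
    {"model_id": "example-org/model-b", "task": "text-generation", "downloads": 450, "likes": 3},
    {"model_id": "example-org/missing", "error": "not found"},
]

print(docs_to_markdown(sample_docs))

# Serialize the raw records so notes and test results can be added later
serialized = json.dumps(sample_docs, indent=2)
```

In the exercise you could pass your own `my_model_docs` list instead of `sample_docs` and write `serialized` to a file.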

In [None]:
# YOUR CODE HERE: Document 10 models

my_model_docs = []

# Example models to get you started (replace with your own discoveries!)
models_to_document = [
    # Text Classification
    "distilbert-base-uncased-finetuned-sst-2-english",
    # TODO: Add 1 more text classification model
    
    # Text Generation  
    "gpt2",
    # TODO: Add 1 more text generation model
    
    # Question Answering
    "distilbert-base-cased-distilled-squad",
    # TODO: Add 1 more QA model
    
    # Named Entity Recognition
    # TODO: Add 2 NER models
    
    # Your choice!
    # TODO: Add 2 models of any type
]

for model_id in models_to_document:
    doc = document_model(model_id)
    my_model_docs.append(doc)
    print(f"Documented: {model_id}")

print(f"\nTotal models documented: {len(my_model_docs)}")

---

## Part 6: Testing Models Locally

For the deliverable, you need to **test 3 models locally**. Let's create a testing framework.

In [None]:
import time

def test_model_locally(model_id: str, task: str, test_input: str) -> dict:
    """
    Test a model locally and return results.
    
    Args:
        model_id: HuggingFace model identifier
        task: Type of task (classification, generation, qa)
        test_input: Sample input to test with
    
    Returns:
        Dictionary with test results
    """
    results = {
        "model_id": model_id,
        "task": task,
        "test_input": test_input,
        "success": False,
        "output": None,
        "load_time_seconds": 0,
        "inference_time_ms": 0,
        "memory_used_gb": 0,
        "error": None
    }
    
    try:
        # Clear memory first
        torch.cuda.empty_cache()
        initial_memory = torch.cuda.memory_allocated() / 1e9
        
        # Time model loading
        start_load = time.time()
        
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        
        # Load appropriate model class based on task
        if task == "classification":
            model = AutoModelForSequenceClassification.from_pretrained(
                model_id, torch_dtype=torch.bfloat16
            ).to(device).eval()
        elif task == "generation":
            model = AutoModelForCausalLM.from_pretrained(
                model_id, torch_dtype=torch.bfloat16
            ).to(device).eval()
            if tokenizer.pad_token is None:
                tokenizer.pad_token = tokenizer.eos_token
        elif task == "qa":
            model = AutoModelForQuestionAnswering.from_pretrained(
                model_id, torch_dtype=torch.bfloat16
            ).to(device).eval()
        else:
            raise ValueError(f"Unknown task: {task}")
        
        results["load_time_seconds"] = time.time() - start_load
        results["memory_used_gb"] = (torch.cuda.memory_allocated() / 1e9) - initial_memory
        
        # Time inference
        start_inference = time.time()
        
        inputs = tokenizer(test_input, return_tensors="pt", truncation=True, max_length=512)
        inputs = {k: v.to(device) for k, v in inputs.items()}
        
        with torch.no_grad():
            if task == "classification":
                outputs = model(**inputs)
                probs = torch.softmax(outputs.logits, dim=1)
                pred = torch.argmax(probs, dim=1).item()
                conf = probs[0][pred].item()
                results["output"] = f"Class {pred} (confidence: {conf:.2%})"
            elif task == "generation":
                outputs = model.generate(
                    **inputs, max_new_tokens=30, 
                    do_sample=True, temperature=0.7,
                    pad_token_id=tokenizer.eos_token_id
                )
                results["output"] = tokenizer.decode(outputs[0], skip_special_tokens=True)
            elif task == "qa":
                # For QA, test_input should be "question ||| context"
                if "|||" in test_input:
                    # Split only on the first separator so a context containing "|||" still works
                    q, c = test_input.split("|||", 1)
                    inputs = tokenizer(q.strip(), c.strip(), return_tensors="pt", truncation=True, max_length=512)
                    inputs = {k: v.to(device) for k, v in inputs.items()}
                outputs = model(**inputs)
                start_idx = torch.argmax(outputs.start_logits)
                end_idx = torch.argmax(outputs.end_logits) + 1
                answer = tokenizer.decode(inputs["input_ids"][0][start_idx:end_idx])
                results["output"] = answer
        
        results["inference_time_ms"] = (time.time() - start_inference) * 1000
        results["success"] = True
        
        # Cleanup
        del model, tokenizer
        torch.cuda.empty_cache()
        
    except Exception as e:
        results["error"] = str(e)
    
    return results

In [None]:
# Test 3 models locally
tests = [
    {
        "model_id": "distilbert-base-uncased-finetuned-sst-2-english",
        "task": "classification",
        "test_input": "This product exceeded all my expectations!"
    },
    {
        "model_id": "gpt2",
        "task": "generation",
        "test_input": "Machine learning is"
    },
    {
        "model_id": "distilbert-base-cased-distilled-squad",
        "task": "qa",
        "test_input": "What is the capital of France? ||| France is a country in Europe. Its capital is Paris, a beautiful city known for the Eiffel Tower."
    }
]

print("Local Model Testing Results\n")
print("=" * 70)

all_test_results = []
for test in tests:
    print(f"\nTesting: {test['model_id']}")
    print(f"Task: {test['task']}")
    
    result = test_model_locally(**test)
    all_test_results.append(result)
    
    if result["success"]:
        print(f"Status: SUCCESS")
        print(f"Load time: {result['load_time_seconds']:.2f}s")
        print(f"Inference time: {result['inference_time_ms']:.2f}ms")
        print(f"Memory used: {result['memory_used_gb']:.2f} GB")
        print(f"Output: {result['output'][:100]}..." if len(str(result['output'])) > 100 else f"Output: {result['output']}")
    else:
        print(f"Status: FAILED")
        print(f"Error: {result['error']}")
    
    print("-" * 70)
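The timings above come from a single call, so they include one-time warm-up costs. A more stable measurement uses a warm-up run and the median of several repeats. The sketch below shows that pattern with a stand-in workload. `benchmark` and `dummy_inference` are names chosen for this example. On a GPU you would also call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels run asynchronously.

```python
import statistics
import time

def benchmark(fn, warmup=1, repeats=5):
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # discard warm-up runs (caches, lazy initialization)
        fn()
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings_ms)

# Stand-in workload; replace with a function that runs your model
calls = []
def dummy_inference():
    calls.append(1)
    sum(i * i for i in range(10_000))

median_ms = benchmark(dummy_inference, warmup=2, repeats=5)
print(f"Median latency: {median_ms:.3f} ms over 5 runs ({len(calls)} calls including warm-up)")
```

The median is less sensitive than the mean to an occasional slow run, which makes it a reasonable default when comparing models.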

---

## Common Mistakes

### Mistake 1: Not Using the Right Auto Class

In [None]:
# WRONG: Using AutoModel for sequence classification
# This loads the base model without the classification head!

# model = AutoModel.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
# This won't have the classification layers!

# CORRECT: Use task-specific Auto class
# model = AutoModelForSequenceClassification.from_pretrained(
#     "distilbert-base-uncased-finetuned-sst-2-english"
# )

print("Auto class cheat sheet:")
print("- AutoModelForSequenceClassification → sentiment, topic classification")
print("- AutoModelForTokenClassification → NER, POS tagging")
print("- AutoModelForQuestionAnswering → extractive QA")
print("- AutoModelForCausalLM → text generation (GPT-style)")
print("- AutoModelForSeq2SeqLM → translation, summarization (T5-style)")
print("- AutoModelForMaskedLM → fill-in-the-blank (BERT-style)")

### Mistake 2: Forgetting to Set Evaluation Mode

In [None]:
# RISKY: Relying on whatever mode the model happens to be in
# from_pretrained() returns the model in evaluation mode by default, but a model
# that was switched to training mode (model.train(), e.g. during fine-tuning)
# stays there. In training mode dropout is active, so outputs vary between calls.

# model.train()             # e.g. left over from a fine-tuning loop
# output = model(**inputs)  # Dropout is still active!

# CORRECT:
# model = AutoModel.from_pretrained("bert-base-uncased")
# model.eval()  # Explicitly set evaluation mode
# with torch.no_grad():  # Also disable gradient computation
#     output = model(**inputs)

print("Always use model.eval() and torch.no_grad() for inference!")

### Mistake 3: Not Moving Inputs to the Same Device

In [None]:
# WRONG: Model on GPU, inputs on CPU
# model = model.to("cuda")
# inputs = tokenizer("Hello", return_tensors="pt")  # On CPU!
# output = model(**inputs)  # ERROR: tensors on different devices

# CORRECT:
# model = model.to("cuda")
# inputs = tokenizer("Hello", return_tensors="pt")
# inputs = {k: v.to("cuda") for k, v in inputs.items()}  # Move to GPU!
# output = model(**inputs)

print("Remember: model.to(device) AND inputs.to(device)!")

### Mistake 4: Using Wrong dtype

In [None]:
# On DGX Spark, bfloat16 is a good default
# float16 has a narrower exponent range and can overflow in some operations
# float32 uses 2x the memory of the 16-bit types

print("dtype recommendations for DGX Spark:")
print("- bfloat16 (preferred): Best balance of speed and stability")
print("- float16: Same memory as bfloat16, but may overflow or underflow")
print("- float32: Most stable but uses 2x memory")
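The "2x memory" figure follows directly from bytes per parameter. The sketch below does that arithmetic for the weights only. The parameter counts are approximate and used for illustration, and activations, the KV cache, and framework overhead are extra and not modeled here.

```python
# Bytes per parameter for common dtypes (weights only; activations, the KV
# cache, and framework overhead are extra and not modeled here).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate memory needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Approximate parameter counts, used only for illustration
models = {"gpt2 (~124M)": 124_000_000, "7B model": 7_000_000_000}

for name, n in models.items():
    for dtype in ("float32", "bfloat16"):
        print(f"{name:<14} {dtype:<9} {weight_memory_gb(n, dtype):6.2f} GB")
```

The same estimate is a quick way to check whether a model's weights are likely to fit in memory before you download it.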

---

## Checkpoint

You've learned:
- ✅ How to search the Hugging Face Hub programmatically
- ✅ How to read and interpret model cards
- ✅ How to load models using Auto classes
- ✅ How to test models for classification, QA, and generation
- ✅ How to document models for your own reference

---

## Challenge: Find a Hidden Gem

Find a model that:
1. Has fewer than 10,000 downloads
2. Performs a task you're interested in
3. Has a good model card
4. Actually works when you test it!

Document why you think this is an underrated model.

In [None]:
# YOUR CHALLENGE CODE HERE
# Find your hidden gem!

# Hint: Try searching for specific domains or languages
# less_popular = list(api.list_models(
#     filter="your-task",
#     sort="downloads",
#     direction=1,  # Ascending - fewer downloads first
#     limit=50
# ))


---

## Further Reading

- [Hugging Face Hub Documentation](https://huggingface.co/docs/hub)
- [Model Cards Paper](https://arxiv.org/abs/1810.03993)
- [Transformers Documentation](https://huggingface.co/docs/transformers)
- [Best Practices for Model Selection](https://huggingface.co/blog/model-selection)

---

## Optional: Using the Utility Scripts

This module includes utility scripts that provide reusable functions for common tasks.
You can import and use them in your own projects:

```python
# From the module directory, you can import the utility scripts:
from scripts.hub_utils import search_models, document_model, test_model_locally
from scripts.training_utils import create_training_args, compute_metrics_factory
from scripts.peft_utils import create_lora_config, apply_lora

# Example: Search for models
models = search_models(task="text-classification", limit=5)

# Example: Document a model with full metadata
doc = document_model("distilbert-base-uncased-finetuned-sst-2-english")
print(doc.to_json())
```

See the `scripts/` directory for full documentation of available utilities.

---

## Cleanup

In [None]:
# Clear GPU memory
import gc

# Clear any loaded models
try:
    del gen_model, gen_tokenizer
except NameError:
    pass

torch.cuda.empty_cache()
gc.collect()

print("Memory cleaned up!")
print_memory_usage("Final")

---

## Next Steps

In the next notebook, **02-pipeline-showcase.ipynb**, we'll explore the Pipeline API - an even easier way to use models without writing tokenization and post-processing code!

Great job completing Lab 2.4.1!