# Module 04 - Notebook 01: Hugging Face Hub

## Learning Objectives
- Navigate Hugging Face Hub to find models
- Understand model cards and documentation
- Compare different open source models
- Select appropriate models for tasks

## Prerequisites
- Hugging Face account (free): https://huggingface.co/join
- Hugging Face access token (create one at https://huggingface.co/settings/tokens)

---

## 1. Setup and Authentication

First, let's install the Hugging Face Hub library and set up authentication.

In [None]:
# Install required packages
!pip install -q huggingface-hub python-dotenv

In [None]:
import os
from dotenv import load_dotenv
from huggingface_hub import HfApi, list_models, model_info

# Load environment variables
load_dotenv()

# Get your HF token from .env file or set it here
HF_TOKEN = os.getenv("HUGGINGFACE_TOKEN")

if not HF_TOKEN:
    print("⚠️ Warning: HUGGINGFACE_TOKEN not found in .env file")
    print("Get your token from: https://huggingface.co/settings/tokens")
else:
    print("✓ Token loaded successfully")

# Initialize the API
api = HfApi(token=HF_TOKEN)
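
The cell above reads the token from `HUGGINGFACE_TOKEN`. As a minimal sketch, the lookup could also fall back to `HF_TOKEN`, the variable that `huggingface_hub` reads on its own, and mask the value before printing it. The helper names `resolve_token` and `mask_token` are hypothetical and not part of any library.

```python
import os

# Variable names to try, in order. HUGGINGFACE_TOKEN is the name this notebook
# uses; HF_TOKEN is the one huggingface_hub picks up by default.
TOKEN_ENV_VARS = ("HUGGINGFACE_TOKEN", "HF_TOKEN")


def resolve_token(env=None):
    """Return the first non-empty token found in env, or None."""
    env = os.environ if env is None else env
    for name in TOKEN_ENV_VARS:
        value = (env.get(name) or "").strip()
        if value:
            return value
    return None


def mask_token(token, visible=4):
    """Hide all but the last few characters so the token is safe to print."""
    if not token:
        return "<missing>"
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]


token = resolve_token()
print(f"Token: {mask_token(token)}")
```

Printing only the masked form keeps the full token out of saved notebook outputs.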

## 2. Browsing Models

Let's explore popular text generation models on Hugging Face Hub.

In [None]:
# List popular text generation models
print("Top 10 Text Generation Models by Downloads:\n")

models = list_models(
    filter="text-generation",
    sort="downloads",
    direction=-1,
    limit=10
)

for i, model in enumerate(models, 1):
    print(f"{i}. {model.id}")
    print(f"   Downloads: {model.downloads:,}")
    print(f"   Likes: {model.likes}")
    print()

## 3. Reading Model Cards

Model cards contain crucial information about a model's capabilities, limitations, and usage.

In [None]:
# Get detailed info about a specific model
model_name = "meta-llama/Llama-3.2-3B-Instruct"

try:
    info = api.model_info(model_name)  # authenticated client from section 1
    
    print(f"Model: {info.id}")
    print(f"Author: {info.author}")
    print(f"Downloads: {info.downloads:,}")
    print(f"Likes: {info.likes}")
    print(f"\nTags: {', '.join(info.tags[:10])}")
    print(f"\nLibrary: {info.library_name}")
    print(f"Pipeline Tag: {info.pipeline_tag}")
    
    if info.card_data:
        print(f"\nLicense: {info.card_data.license}")
        
except Exception as e:
    print(f"Error: {e}")
    print("Note: You may need authentication for some models")
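
On the Hub, a model card is the repository's `README.md`, and fields such as the license come from a YAML metadata block at the top of that file. The sketch below shows the idea on a made-up card. `SAMPLE_CARD` and `split_front_matter` are illustrative names, and the parser only handles flat `key: value` lines, so nested metadata would need a real YAML parser.

```python
# A made-up model card: YAML metadata between '---' lines, then markdown.
SAMPLE_CARD = """---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---
# Example Model

Intended use, limitations, and evaluation notes go here.
"""


def split_front_matter(card_text):
    """Split a model card into (metadata dict, markdown body)."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, card_text  # no metadata block
    try:
        end = next(i for i, line in enumerate(lines[1:], 1) if line.strip() == "---")
    except StopIteration:
        return {}, card_text  # unterminated block: treat as plain markdown
    metadata = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        # Keep only top-level "key: value" lines; skip nested items and comments.
        if sep and key.strip() and not line.startswith((" ", "-", "#")):
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body


metadata, body = split_front_matter(SAMPLE_CARD)
print(metadata["license"])
print(body.splitlines()[0])
```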

## 4. Comparing Models

Let's compare several popular open source LLMs.

In [None]:
import pandas as pd

# Models to compare
models_to_compare = [
    "meta-llama/Llama-3.2-3B-Instruct",
    "mistralai/Mistral-7B-Instruct-v0.3",
    "microsoft/Phi-3-mini-4k-instruct",
    "google/gemma-2b-it",
]

comparison_data = []

for model_name in models_to_compare:
    try:
        info = api.model_info(model_name)  # authenticated client from section 1
        comparison_data.append({
            "Model": info.id.split("/")[-1],
            "Organization": info.author,
            "Downloads": info.downloads,
            "Likes": info.likes,
            "License": info.card_data.license if info.card_data else "Unknown",
        })
    except Exception as e:
        print(f"Could not fetch {model_name}: {e}")

# Display comparison
df = pd.DataFrame(comparison_data)
print("\nModel Comparison:")
print(df.to_string(index=False))
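
A model's `card_data` can exist without a `license` field, in which case `info.card_data.license` is `None` rather than a string. One way to normalize this is a small helper. `license_of` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for the objects returned by `model_info`, so the sketch runs offline.

```python
from types import SimpleNamespace


def license_of(info, default="Unknown"):
    """Return the license from a model-info-like object, or a default."""
    card_data = getattr(info, "card_data", None)
    return getattr(card_data, "license", None) or default


# Stand-ins with made-up values, shaped like model_info() results.
with_license = SimpleNamespace(card_data=SimpleNamespace(license="mit"))
without_field = SimpleNamespace(card_data=SimpleNamespace(license=None))
without_card = SimpleNamespace(card_data=None)

for stub in (with_license, without_field, without_card):
    print(license_of(stub))
```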

## 5. Searching by Task

Find models for specific tasks like summarization, translation, etc.

In [None]:
# Search for summarization models
print("Top 5 Summarization Models:\n")

summarization_models = list_models(
    filter="summarization",
    sort="downloads",
    direction=-1,
    limit=5
)

for i, model in enumerate(summarization_models, 1):
    print(f"{i}. {model.id}")
    print(f"   Downloads: {model.downloads:,}\n")
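
The same query works for any task tag, so it can be wrapped in a function and looped over several tasks. The sketch below passes the same arguments to `list_models` that this notebook uses. `top_models` and `fake_lister` are hypothetical names, and the fake lister returns made-up ids and counts so the example runs without network access.

```python
from types import SimpleNamespace


def top_models(task, limit=5, lister=None):
    """Return (model id, downloads) pairs for the most-downloaded models of a task."""
    if lister is None:
        # Imported lazily so the function can be exercised without the library.
        from huggingface_hub import list_models as lister
    results = lister(filter=task, sort="downloads", direction=-1, limit=limit)
    return [(m.id, m.downloads) for m in results]


def fake_lister(filter, sort, direction, limit):
    """Offline stand-in for list_models; the ids and counts are made up."""
    rows = [
        SimpleNamespace(id=f"example-org/{filter}-{n}", downloads=1000 - n)
        for n in range(10)
    ]
    return iter(rows[:limit])


for task in ("summarization", "translation"):
    print(task, top_models(task, limit=2, lister=fake_lister))
```

Dropping the `lister` argument sends the same query to the live Hub instead.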

## 6. Model Selection Criteria

When choosing a model, consider:

### Size
- **Small (< 3B)**: Fast, low memory, good for simple tasks
- **Medium (3-13B)**: Balanced performance and resource usage
- **Large (> 13B)**: Best quality, requires significant resources
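
To make the size buckets concrete, the sketch below maps a parameter count to the categories listed above and estimates the memory needed just to hold the weights, as parameters times bytes per parameter. It assumes 2 bytes per parameter (16-bit weights) by default and ignores activations and the KV cache, so real usage will be higher. The function names are illustrative.

```python
def size_bucket(params_billion):
    """Map a parameter count (in billions) to this notebook's size categories."""
    if params_billion < 3:
        return "Small"
    if params_billion <= 13:
        return "Medium"
    return "Large"


def weight_memory_gb(params_billion, bytes_per_param=2.0):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # (billions of parameters * 1e9) * bytes each / 1e9 bytes per GB
    return params_billion * bytes_per_param


for name, size in [("3B model", 3), ("7B model", 7), ("70B model", 70)]:
    print(
        f"{name}: {size_bucket(size)}, "
        f"~{weight_memory_gb(size):.0f} GB at 16-bit, "
        f"~{weight_memory_gb(size, 0.5):.1f} GB at 4-bit"
    )
```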

### License
- **MIT/Apache 2.0**: Permissive, commercial use OK
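- **Llama License**: Commercial use allowed, but services with more than 700 million monthly active users must request a separate license from Meta
- **Gemma License**: Google's Gemma Terms of Use; commercial use is permitted, subject to Google's prohibited-use policy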
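
License identifiers from a model card can be screened automatically as a first pass. The sketch below assumes the Hub-style identifiers `mit` and `apache-2.0` for the permissive group and flags everything else, including `llama3.2` and `gemma`, for a manual read of the terms. The function name and the grouping are illustrative.

```python
# Identifiers treated as permissive in this sketch; extend as needed.
PERMISSIVE_LICENSES = {"mit", "apache-2.0"}


def needs_license_review(license_id):
    """Return True unless the license is in the known-permissive set."""
    if not license_id:
        return True  # missing license: always review
    return license_id.strip().lower() not in PERMISSIVE_LICENSES


for license_id in ("apache-2.0", "MIT", "llama3.2", "gemma", None):
    verdict = "review terms" if needs_license_review(license_id) else "permissive"
    print(f"{license_id}: {verdict}")
```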

### Performance Metrics
- Check benchmarks (MMLU, HumanEval, etc.)
- Review community feedback
- Test on your specific use case

## Exercise: Find a Model for Your Use Case

Complete the following task:

1. Choose a task (e.g., code generation, creative writing, Q&A)
2. Find 3 suitable models
3. Compare their licenses, sizes, and popularity
4. Document your choice and reasoning

In [None]:
# TODO: Complete this exercise
# Your task: "______"
# Your top 3 models:
# 1. ______
# 2. ______
# 3. ______
# Your final choice: ______
# Reasoning: ______

# Write your code here to research and compare models


## Summary

In this notebook, you learned:
- ✅ How to authenticate with Hugging Face Hub
- ✅ How to browse and search for models
- ✅ How to read model cards and compare models
- ✅ Model selection criteria

## Next Steps
- 📘 Proceed to Notebook 02: Inference SDK
- 🔗 Explore the [Hugging Face Model Hub](https://huggingface.co/models)
- 📚 Read about [model cards](https://huggingface.co/docs/hub/model-cards), which document each model's license