# Chapter 3: Setting Up Your Hugging Face Environment

This notebook contains all examples from Chapter 3, demonstrating how to set up and use the Hugging Face ecosystem.

## 1. Environment Verification

First, let's verify that all required packages are installed correctly.

In [2]:
# Verify installations
import transformers
import datasets
import accelerate
import torch
import huggingface_hub

print("Transformers:", transformers.__version__)
print("Datasets:", datasets.__version__)
print("Accelerate:", accelerate.__version__)
print("PyTorch:", torch.__version__)
print("HF Hub:", huggingface_hub.__version__)

# Check device availability
print("\nDevice Information:")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
if hasattr(torch.backends, "mps"):
    print(f"MPS available: {torch.backends.mps.is_available()}")

Transformers: 4.39.3
Datasets: 3.6.0
Accelerate: 1.8.1
PyTorch: 2.7.1
HF Hub: 0.33.2

Device Information:
CUDA available: False
MPS available: True
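
If one of the imports above fails, the cell stops at the first missing package. A minimal sketch of a more forgiving check, using only the standard library's `importlib.metadata`. The helper name `installed_versions` is an assumption for illustration, and the names passed in are assumed to be the distribution names under which the packages were installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


required = ["transformers", "datasets", "accelerate", "torch", "huggingface_hub"]
report = installed_versions(required)
for name, ver in report.items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```

Because nothing is imported, this check also runs in an environment where some of the packages are broken or absent.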


## 2. Basic Pipeline Example

Hugging Face pipelines wrap tokenization, model inference, and post-processing behind a single call for common NLP tasks.

In [3]:
from transformers import pipeline

# Create a sentiment analysis pipeline
classifier = pipeline("sentiment-analysis")

# Test it on a few sentences
texts = [
    "I love HuggingFace!",
    "This is terrible.",
    "The weather is okay today."
]
results = classifier(texts)

for text, result in zip(texts, results):
    print(f'"{text}" -> {result["label"]} (score: {result["score"]:.3f})')

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


"I love HuggingFace!" -> POSITIVE (score: 1.000)
"This is terrible." -> NEGATIVE (score: 1.000)
"The weather is okay today." -> POSITIVE (score: 1.000)


## 3. Hugging Face Hub API

Explore models available on the Hugging Face Hub.

In [None]:
from huggingface_hub import HfApi

# Create an API client
api = HfApi()

# List text-classification models. ModelFilter is not available in recent
# huggingface_hub releases, so the task tag is passed as a plain filter string.
models = api.list_models(filter="text-classification")
model_list = list(models)  # walks the full listing, so this can take a while

print(f"Found {len(model_list)} text-classification models!")
print("\nTop 5 most downloaded:")

# Show top 5 (downloads can be None, so fall back to 0)
sorted_models = sorted(model_list, key=lambda x: x.downloads or 0, reverse=True)[:5]
for i, model in enumerate(sorted_models, 1):
    print(f"{i}. {model.id} (Downloads: {model.downloads or 0:,})")
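
Sorting the full list works, but only the top five entries are needed. A sketch with `heapq.nlargest` over stand-in records, where entries with `downloads` set to `None` are treated as zero. The `FakeModel` class, its IDs, and its download counts are hypothetical; real entries would come from `api.list_models`.

```python
import heapq
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeModel:
    # Hypothetical stand-in for the objects returned by api.list_models()
    id: str
    downloads: Optional[int]


def top_downloaded(models, k=5):
    """Return the k models with the most downloads without sorting everything."""
    return heapq.nlargest(k, models, key=lambda m: m.downloads or 0)


catalog = [
    FakeModel("org/model-a", 120),
    FakeModel("org/model-b", None),   # download count missing
    FakeModel("org/model-c", 4500),
    FakeModel("org/model-d", 87),
]

for rank, m in enumerate(top_downloaded(catalog, k=2), 1):
    print(f"{rank}. {m.id} (Downloads: {m.downloads or 0:,})")
```

`heapq.nlargest` accepts any iterable, so the same helper could consume the listing lazily instead of building a list first.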

## 4. Model Download Example

Download a specific model together with its tokenizer, then run inference on it manually.

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"

# Download tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

print(f"Model loaded: {model_name}")
print(f"Model type: {model.config.model_type}")
print(f"Number of parameters: {sum(p.numel() for p in model.parameters()):,}")

In [None]:
# Use the model for inference
import torch

text = "HuggingFace makes NLP so much easier!"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    
print(f"Text: '{text}'")
print(f"Prediction: Negative={predictions[0][0]:.3f}, Positive={predictions[0][1]:.3f}")
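
The softmax step turns the two raw logits into probabilities that sum to 1. A plain-Python sketch of that step follows. The logit values are made up for illustration and are not output from the model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits in the order [NEGATIVE, POSITIVE]
example_logits = [-3.2, 3.5]
probs = softmax(example_logits)
print(f"Negative={probs[0]:.3f}, Positive={probs[1]:.3f}")
```

Because this model has only two classes, a neutral sentence still has to land on one side, which may be why "The weather is okay today." was labeled POSITIVE in section 2.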

## 5. Translation Pipeline

Demonstrate translation with batch processing.

In [None]:
# Create translation pipeline
device = 0 if torch.cuda.is_available() else -1

translator = pipeline(
    "translation_en_to_fr",
    model="Helsinki-NLP/opus-mt-en-fr",
    device=device
)

sentences = [
    "Hugging Face makes AI easy.",
    "Transformers are powerful."
]

# Translate with batch processing
translations = translator(sentences, batch_size=2)

for original, result in zip(sentences, translations):
    print(f"EN: {original}")
    print(f"FR: {result['translation_text']}\n")
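
The `device` line above only distinguishes CUDA from CPU, while the environment check in section 1 also reports Apple's MPS backend. A hedged sketch of a helper that considers all three. `pick_device` is a hypothetical name, the returned strings are intended for the `device` argument of `pipeline`, and the function falls back to CPU if PyTorch cannot be imported.

```python
def pick_device():
    """Return "cuda:0", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch available, so nothing else to choose from
    if torch.cuda.is_available():
        return "cuda:0"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


selected = pick_device()
print(f"Selected device: {selected}")
```

Not every operation is guaranteed to be supported on MPS, so falling back to CPU remains a reasonable choice if a pipeline fails there.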

## 6. Text Generation

Generate text with DistilGPT-2, a smaller distilled version of GPT-2.

In [None]:
# Text generation pipeline
generator = pipeline(
    "text-generation",
    model="distilgpt2",
    device=device
)

prompt = "The future of artificial intelligence is"

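# Generate text. Sampling is enabled explicitly so that temperature takes
# effect and two distinct sequences can be returned.
result = generator(
    prompt,
    max_length=50,
    num_return_sequences=2,
    do_sample=True,
    temperature=0.8
)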

print(f"Prompt: '{prompt}'")
print("\nGenerated continuations:")
for i, generated in enumerate(result, 1):
    print(f"\n{i}. {generated['generated_text']}")
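
The `temperature` argument rescales the model's next-token scores before sampling: values below 1 sharpen the distribution, values above 1 flatten it. A plain-Python sketch with made-up scores for three candidate tokens. The numbers are illustrative and not taken from DistilGPT-2.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


token_logits = [2.0, 1.0, 0.5]  # hypothetical scores for three tokens

for t in (0.5, 0.8, 1.5):
    probs = softmax_with_temperature(token_logits, temperature=t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At the lowest temperature the most likely token takes most of the probability mass, while at the highest temperature the less likely tokens are sampled more often.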

## 7. Zero-Shot Classification

Classify text without training on specific labels.

In [None]:
# Zero-shot classification
classifier = pipeline("zero-shot-classification", device=device)

text = "This is a tutorial about natural language processing with transformers."
candidate_labels = ["education", "politics", "entertainment", "technology", "sports"]

result = classifier(text, candidate_labels)

print(f"Text: '{text}'")
print("\nClassification scores:")
for label, score in zip(result['labels'], result['scores']):
    print(f"  {label}: {score:.3f}")
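
Zero-shot classification is commonly built on a natural language inference model: each candidate label is rephrased as a hypothesis sentence, and the model scores how strongly the input text entails it. A sketch of the hypothesis-building step only. The template "This example is {}." is assumed here to match the pipeline's default `hypothesis_template`, and `build_hypotheses` is a hypothetical helper.

```python
def build_hypotheses(candidate_labels, template="This example is {}."):
    """Create one NLI hypothesis per candidate label."""
    return [template.format(label) for label in candidate_labels]


labels = ["education", "politics", "technology"]
for label, hypothesis in zip(labels, build_hypotheses(labels)):
    print(f"{label}: {hypothesis}")
```

A more specific template, for example "This text is about {}.", can be passed to the pipeline call through its `hypothesis_template` argument.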

## 8. Question Answering

Extract answers from context using QA models.

In [None]:
# Question answering pipeline
qa_pipeline = pipeline("question-answering", device=device)

context = """
HuggingFace is a company that develops tools for building applications using machine learning.
It is most notable for its Transformers library built for natural language processing applications
and its platform that allows users to share machine learning models and datasets.
The company was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf.
"""

questions = [
    "What is HuggingFace?",
    "When was the company founded?",
    "Who founded HuggingFace?"
]

for question in questions:
    result = qa_pipeline(question=question, context=context)
    print(f"Q: {question}")
    print(f"A: {result['answer']} (score: {result['score']:.3f})\n")

## 9. Named Entity Recognition

Identify entities in text.

In [None]:
# NER pipeline
ner = pipeline("ner", aggregation_strategy="simple", device=device)

text = "Apple Inc. was founded by Steve Jobs in Cupertino, California. The company is now led by Tim Cook."

entities = ner(text)

print(f"Text: '{text}'")
print("\nEntities found:")
for entity in entities:
    print(f"  {entity['word']} -> {entity['entity_group']} (score: {entity['score']:.3f})")

## 10. Model Comparison

Compare different models for the same task.

In [None]:
# Compare sentiment analysis models
models = [
    "distilbert-base-uncased-finetuned-sst-2-english",
    "nlptown/bert-base-multilingual-uncased-sentiment"
]

text = "This product is amazing! I highly recommend it."

for model_name in models:
    try:
        classifier = pipeline("sentiment-analysis", model=model_name, device=device)
        result = classifier(text)
        print(f"Model: {model_name}")
        print(f"Result: {result[0]}\n")
    except Exception as e:
        print(f"Model: {model_name}")
        print(f"Error: {str(e)[:100]}...\n")
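
The two models use different label schemes, so their raw outputs are not directly comparable: the SST-2 model returns POSITIVE or NEGATIVE, while the nlptown model returns star ratings such as "5 stars". A sketch of one possible normalization follows. `normalize_label` is a hypothetical helper, and treating 3 stars as neutral is an arbitrary choice for illustration.

```python
def normalize_label(label):
    """Map POSITIVE/NEGATIVE or "N star(s)" labels onto positive/neutral/negative."""
    text = label.strip().lower()
    if text in {"positive", "negative"}:
        return text
    if text.endswith("star") or text.endswith("stars"):
        stars = int(text.split()[0])
        if stars >= 4:
            return "positive"
        if stars <= 2:
            return "negative"
        return "neutral"
    raise ValueError(f"Unrecognized label: {label!r}")


for raw in ["POSITIVE", "5 stars", "1 star", "3 stars"]:
    print(f"{raw} -> {normalize_label(raw)}")
```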

## 11. Batch Processing Performance

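Compare one pipeline call per text with a single batched call. Batching tends to pay off most on a GPU; on CPU, and with inputs this small, the gain may be modest or absent.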

In [None]:
import time

# Create sentiment analysis pipeline
classifier = pipeline("sentiment-analysis", device=device)

# Test texts
texts = [
    "I love this!",
    "This is terrible.",
    "Not bad at all.",
    "Could be better.",
    "Absolutely fantastic!",
    "Waste of time.",
    "Pretty good overall.",
    "Highly disappointed."
]

# Single processing: one pipeline call per text
# (perf_counter is a monotonic, high-resolution clock suited to timing)
start = time.perf_counter()
single_results = [classifier(text) for text in texts]
single_time = time.perf_counter() - start

# Batch processing: one call, texts grouped into batches of 4
start = time.perf_counter()
batch_results = classifier(texts, batch_size=4)
batch_time = time.perf_counter() - start

print(f"Single processing time: {single_time:.3f}s")
print(f"Batch processing time: {batch_time:.3f}s")
print(f"Speedup: {single_time/batch_time:.2f}x")
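
Conceptually, `batch_size=4` means the eight texts are split into two groups that each go through the model in one forward pass. A minimal sketch of that grouping step. `chunked` is a hypothetical helper, not the pipeline's internal implementation.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


sample_texts = [f"text {i}" for i in range(8)]
batches = list(chunked(sample_texts, batch_size=4))
print(f"{len(sample_texts)} texts -> {len(batches)} batches of sizes {[len(b) for b in batches]}")
```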

## 12. Cache Information

Inspect the Hugging Face cache directory, where downloaded models are stored.

In [None]:
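import os
from pathlib import Path

# Get cache directory (default location; HF_HOME or HF_HUB_CACHE can override it)
hf_home = Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
cache_dir = Path(os.environ.get("HF_HUB_CACHE", hf_home / "hub"))

if cache_dir.exists():
    # Count cached models: each model repo lives in a "models--..." folder,
    # next to dataset folders and lock files that should not be counted
    model_dirs = [d for d in cache_dir.iterdir() if d.is_dir() and d.name.startswith("models--")]

    # Calculate total size, skipping symlinks so that files linked from
    # snapshot folders are not counted twice
    total_size = sum(
        f.stat().st_size
        for f in cache_dir.rglob('*')
        if f.is_file() and not f.is_symlink()
    )

    print(f"Cache directory: {cache_dir}")
    print(f"Number of cached models: {len(model_dirs)}")
    print(f"Total cache size: {total_size / (1024**3):.2f} GB")
else:
    print("No cache directory found yet.")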
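
Folders in the hub cache follow a `<type>s--<namespace>--<name>` naming pattern, for example `models--distilbert--distilbert-base-uncased-finetuned-sst-2-english`. A sketch that turns such a folder name back into a repository type and ID. `parse_cache_folder` is a hypothetical helper, and the parsing assumes that neither the namespace nor the repository name contains `--`.

```python
def parse_cache_folder(folder_name):
    """Split e.g. "models--org--name" into ("model", "org/name")."""
    parts = folder_name.split("--")
    if len(parts) < 2 or parts[0] not in {"models", "datasets", "spaces"}:
        return None  # not a repository folder (for example ".locks")
    repo_type = parts[0][:-1]          # "models" -> "model"
    repo_id = "/".join(parts[1:])      # "org", "name" -> "org/name"
    return repo_type, repo_id


print(parse_cache_folder("models--distilbert--distilbert-base-uncased-finetuned-sst-2-english"))
print(parse_cache_folder("models--distilgpt2"))
print(parse_cache_folder(".locks"))
```

For real use, `huggingface_hub.scan_cache_dir()` reports cached repositories and their sizes without manual parsing.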

## 13. Hugging Face Spaces Example

Example code for deploying to Hugging Face Spaces with Gradio.

In [None]:
# Example Gradio app for HuggingFace Spaces
# Note: This is example code - Gradio needs to be installed separately

example_gradio_code = '''
import gradio as gr
from transformers import pipeline

# Initialize pipeline
classifier = pipeline("sentiment-analysis")

def analyze_sentiment(text):
    results = classifier(text)
    return {
        "label": results[0]["label"],
        "score": results[0]["score"]
    }

# Create Gradio interface
iface = gr.Interface(
    fn=analyze_sentiment,
    inputs=gr.Textbox(lines=3, placeholder="Enter text to analyze..."),
    outputs=gr.JSON(),
    title="Sentiment Analysis Demo",
    description="Analyze the sentiment of your text using HuggingFace Transformers",
    examples=[
        ["I love this product!"],
        ["This is terrible."],
        ["It's okay, nothing special."]
    ]
)

# Launch the app
iface.launch()
'''

print("Example Gradio app for HuggingFace Spaces:")
print("=" * 50)
print(example_gradio_code)

## Summary

This notebook covered the essential components of setting up and using the Hugging Face ecosystem:

1. **Environment Verification** - Checking installations
2. **Pipelines** - Simple API for common NLP tasks
3. **Hub API** - Discovering and exploring models
4. **Model Loading** - Downloading and using specific models
5. **Various NLP Tasks** - Translation, generation, QA, NER, etc.
6. **Performance** - Batch processing benefits
7. **Deployment** - Example for Hugging Face Spaces

These examples provide a solid foundation for working with Hugging Face Transformers!