# 🚀 Chapter 2: Getting Started with Hugging Face

This notebook will guide you through setting up your environment for working with Hugging Face, understanding GPU acceleration, and interacting programmatically with the Hugging Face Hub.

---

## 📚 What You'll Learn

| Section | Topic | Description |
|---------|-------|-------------|
| 1 | **Environment Setup** | Managing Python environments for ML projects |
| 2 | **Installing Transformers** | Setting up the core Hugging Face library |
| 3 | **GPU Acceleration** | Using CUDA/MPS for faster inference |
| 4 | **Hugging Face Hub** | Programmatic access to models and datasets |
| 5 | **Cache Management** | Managing downloaded model files |

---

## 1️⃣ Environment Setup

### Why Virtual Environments Matter

Virtual environments are **essential** for machine learning projects because:

| Problem | Solution |
|---------|----------|
| Different projects need different package versions | Isolated environments prevent conflicts |
| System Python can break with ML packages | Virtual envs protect your system |
| Results are hard to reproduce on another machine | Export and share environment specs |
| Experiments leave stray packages behind | Environments are easy to delete and recreate |

### Environment Options

```bash
# Option 1: Using Conda (Recommended for ML)
conda create -n huggingface python=3.10
conda activate huggingface

# Option 2: Using venv (Built into Python)
python -m venv huggingface_env
# Windows: huggingface_env\Scripts\activate
# macOS/Linux: source huggingface_env/bin/activate

# Option 3: Using virtualenv
pip install virtualenv
virtualenv huggingface_env
# Activate it the same way as the venv environment above
```

> 💡 **Tip**: Conda is particularly useful for ML because it can also install non-Python dependencies such as CUDA libraries.
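
Once an environment is activated, it can be worth confirming from inside Python that the interpreter really is the isolated one. The following is a minimal sketch using only the standard library; the function name `describe_environment` is hypothetical, and treating Conda's `base` environment as "not isolated" is an assumption made for illustration.

```python
import os
import sys


def describe_environment() -> dict:
    """Report which kind of isolated environment (if any) is active."""
    # Inside a venv/virtualenv, sys.prefix points at the environment while
    # sys.base_prefix still points at the interpreter it was created from.
    in_venv = sys.prefix != sys.base_prefix
    # Conda exports the active environment's name in CONDA_DEFAULT_ENV.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    return {
        "python": sys.executable,
        "in_venv": in_venv,
        "conda_env": conda_env,
        # Assumption: Conda's shared "base" environment does not count as isolated.
        "isolated": in_venv or (conda_env is not None and conda_env != "base"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `isolated` comes back `False`, packages installed in the next section would land in the shared interpreter rather than in a project-specific environment.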

---

## 2️⃣ Installing the Transformers Library

Let's install all the necessary packages for this chapter.

In [1]:
# Uncomment to install required packages
# %pip install transformers torch accelerate huggingface_hub gputil psutil -q

In [2]:
# Let's verify our installations and check library versions
import transformers
import torch
import huggingface_hub

print("📦 Installed Library Versions")
print("=" * 40)
print(f"🤗 Transformers: {transformers.__version__}")
print(f"🔥 PyTorch: {torch.__version__}")
print(f"🌐 Hugging Face Hub: {huggingface_hub.__version__}")

📦 Installed Library Versions
🤗 Transformers: 4.57.3
🔥 PyTorch: 2.9.0+cu126
🌐 Hugging Face Hub: 0.36.0
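
Importing a library just to read its version fails outright when the package is missing. As an alternative, here is a small sketch that queries installed distribution metadata through the standard library. The helper names `installed_version` and `report` are hypothetical, and the package list simply mirrors the install cell above.

```python
from importlib import metadata

# Distribution names as passed to pip in the install cell above.
REQUIRED = ["transformers", "torch", "accelerate", "huggingface_hub", "gputil", "psutil"]


def installed_version(dist_name: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(names=REQUIRED) -> dict:
    """Map each distribution name to its version (None when missing)."""
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name:<18}{version or 'NOT INSTALLED'}")
```

Because nothing is imported, this check stays fast and does not initialize PyTorch or CUDA.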


---

## 3️⃣ GPU Acceleration

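GPU acceleration can make inference substantially faster than on CPU, and the gain generally grows with model and batch size. The small benchmark in section 3.4 measures roughly a 6.6x speedup on a Tesla T4. Let's look at the different hardware options.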

### Hardware Options

| Platform | GPU Technology | Detection Method |
|----------|---------------|------------------|
| Windows/Linux | NVIDIA CUDA | `torch.cuda.is_available()` |
| macOS (Apple Silicon) | MPS (Metal) | `torch.backends.mps.is_available()` |
| Any | CPU (fallback) | Always available |

### 3.1 Detecting Available Hardware

In [3]:
import torch
import platform

def get_system_info():
    """Get comprehensive system and hardware information."""
    
    print("🖥️ System Information")
    print("=" * 50)
    print(f"  OS: {platform.system()} {platform.release()}")
    print(f"  Python: {platform.python_version()}")
    print(f"  Machine: {platform.machine()}")
    print()
    
    print("🎮 GPU Detection")
    print("=" * 50)
    
    # Check CUDA (NVIDIA)
    cuda_available = torch.cuda.is_available()
    print(f"  CUDA Available: {'✅ Yes' if cuda_available else '❌ No'}")
    
    if cuda_available:
        print(f"  CUDA Version: {torch.version.cuda}")
        print(f"  GPU Count: {torch.cuda.device_count()}")
        for i in range(torch.cuda.device_count()):
            gpu_name = torch.cuda.get_device_name(i)
            gpu_memory = torch.cuda.get_device_properties(i).total_memory / (1024**3)
            print(f"  GPU {i}: {gpu_name} ({gpu_memory:.1f} GB)")
    
    # Check MPS (Apple Silicon)
    mps_available = hasattr(torch.backends, 'mps') and torch.backends.mps.is_available()
    print(f"  MPS Available: {'✅ Yes' if mps_available else '❌ No'}")
    
    return cuda_available, mps_available

cuda_available, mps_available = get_system_info()

🖥️ System Information
  OS: Linux 6.6.105+
  Python: 3.12.12
  Machine: x86_64

🎮 GPU Detection
  CUDA Available: ✅ Yes
  CUDA Version: 12.6
  GPU Count: 1
  GPU 0: Tesla T4 (14.7 GB)
  MPS Available: ❌ No


### 3.2 Advanced GPU Monitoring

For NVIDIA GPUs, we can get detailed utilization statistics.

In [4]:
def get_gpu_stats():
    """Get detailed GPU statistics for NVIDIA cards."""
    
    try:
        import GPUtil
        gpus = GPUtil.getGPUs()
        
        if not gpus:
            print("⚠️ No NVIDIA GPUs detected via GPUtil")
            return
        
        print("📊 Detailed GPU Statistics")
        print("=" * 60)
        
        for gpu in gpus:
            print(f"\n🎮 GPU {gpu.id}: {gpu.name}")
            print(f"   ├─ Memory Total: {gpu.memoryTotal:.0f} MB")
            print(f"   ├─ Memory Used: {gpu.memoryUsed:.0f} MB ({gpu.memoryUsed/gpu.memoryTotal*100:.1f}%)")
            print(f"   ├─ Memory Free: {gpu.memoryFree:.0f} MB")
            print(f"   ├─ GPU Load: {gpu.load*100:.1f}%")
            print(f"   └─ Temperature: {gpu.temperature}°C")
            
    except ImportError:
        print("💡 Install GPUtil for detailed GPU stats: pip install gputil")
    except Exception as e:
        print(f"⚠️ Could not get GPU stats: {e}")

get_gpu_stats()

📊 Detailed GPU Statistics

🎮 GPU 0: Tesla T4
   ├─ Memory Total: 15360 MB
   ├─ Memory Used: 5938 MB (38.7%)
   ├─ Memory Free: 9156 MB
   ├─ GPU Load: 0.0%
   └─ Temperature: 47.0°C


### 3.3 Smart Device Selection

Let's create a utility function that automatically selects the best available device.

In [5]:
def get_optimal_device(prefer_gpu=True, verbose=True):
    """
    Automatically select the best available compute device.
    
    Args:
        prefer_gpu: Whether to prefer GPU over CPU
        verbose: Whether to print selection info
    
    Returns:
        device: The optimal device for computation
        device_id: Value for the `device` argument of pipeline()
            (0 for the first CUDA GPU, "mps" for Apple Silicon, -1 for CPU)
    """
    
    if prefer_gpu:
        # Priority 1: NVIDIA CUDA
        if torch.cuda.is_available():
            device = torch.device("cuda")
            device_id = 0
            device_name = torch.cuda.get_device_name(0)
            if verbose:
                print(f"🚀 Selected: CUDA GPU ({device_name})")
        
        # Priority 2: Apple MPS
        elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
            device = torch.device("mps")
            device_id = "mps"
            if verbose:
                print("🍎 Selected: Apple Metal (MPS)")
        
        # Fallback: CPU
        else:
            device = torch.device("cpu")
            device_id = -1
            if verbose:
                print("💻 Selected: CPU (no GPU available)")
    else:
        device = torch.device("cpu")
        device_id = -1
        if verbose:
            print("💻 Selected: CPU (GPU disabled)")
    
    return device, device_id

# Get the optimal device
device, device_id = get_optimal_device()

🚀 Selected: CUDA GPU (Tesla T4)
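
The priority order (CUDA, then MPS, then CPU) can be separated from the hardware probes, which makes the decision logic easy to test on a machine without any GPU. The sketch below is an illustrative refactoring, not part of the notebook's own code: `choose_device` is a hypothetical helper that takes the probe results as plain booleans and returns a device string plus the value to pass as `device` to `pipeline()`.

```python
def choose_device(cuda_available: bool, mps_available: bool, prefer_gpu: bool = True):
    """Return (device_string, pipeline_device) using CUDA > MPS > CPU priority."""
    if prefer_gpu and cuda_available:
        return "cuda", 0        # first CUDA GPU
    if prefer_gpu and mps_available:
        return "mps", "mps"     # Apple Silicon
    return "cpu", -1            # CPU fallback


print(choose_device(True, False))                    # CUDA machine
print(choose_device(False, True))                    # Apple Silicon machine
print(choose_device(True, True, prefer_gpu=False))   # GPU explicitly disabled
```

In real use, the two booleans would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, as in the function above.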


### 3.4 GPU vs CPU Performance Comparison

Let's demonstrate the performance difference between GPU and CPU!

In [6]:
import time
from transformers import pipeline

# Sample movie reviews for sentiment analysis
movie_reviews = [
    "This film is a masterpiece of modern cinema! The acting was phenomenal.",
    "Terrible waste of time. The plot made no sense whatsoever.",
    "A heartwarming story that will make you laugh and cry.",
    "The special effects were amazing but the story was lacking.",
    "One of the best movies I've ever seen. Highly recommended!",
    "Boring and predictable. I fell asleep halfway through.",
    "An absolute thrill ride from start to finish!",
    "The director has outdone themselves with this one."
] * 5  # Multiply for better timing accuracy

def benchmark_device(device_id, device_name, texts):
    """Benchmark sentiment analysis on a specific device."""
    
    try:
        # Create pipeline on specified device
        classifier = pipeline(
            "sentiment-analysis",
            model="distilbert-base-uncased-finetuned-sst-2-english",
            device=device_id
        )
        
        # Warm-up run
        _ = classifier(texts[0])
        
        # Timed run (perf_counter is a monotonic, high-resolution clock)
        start_time = time.perf_counter()
        results = classifier(texts)
        end_time = time.perf_counter()
        
        elapsed = end_time - start_time
        throughput = len(texts) / elapsed
        
        return elapsed, throughput, results
        
    except Exception as e:
        print(f"⚠️ Error on {device_name}: {e}")
        return None, None, None

print("🏎️ Performance Benchmark: GPU vs CPU")
print("=" * 50)
print(f"Processing {len(movie_reviews)} movie reviews...\n")

# Benchmark on optimal device
optimal_time, optimal_throughput, _ = benchmark_device(
    device_id, 
    "Optimal Device", 
    movie_reviews
)

if optimal_time is not None:
    print(f"⚡ Optimal Device ({device}):")
    print(f"   Time: {optimal_time:.3f} seconds")
    print(f"   Throughput: {optimal_throughput:.1f} reviews/second")

# Also benchmark CPU for comparison if we're using GPU
if device_id != -1:
    print("\n📊 Comparing with CPU...")
    cpu_time, cpu_throughput, _ = benchmark_device(-1, "CPU", movie_reviews)
    
    if cpu_time is not None:
        print(f"\n💻 CPU:")
        print(f"   Time: {cpu_time:.3f} seconds")
        print(f"   Throughput: {cpu_throughput:.1f} reviews/second")
        
        # Only compute the speedup if the accelerated run also succeeded
        if optimal_time is not None:
            speedup = cpu_time / optimal_time
            print(f"\n🚀 GPU Speedup: {speedup:.2f}x faster!")

🏎️ Performance Benchmark: GPU vs CPU
Processing 40 movie reviews...



Device set to use cuda:0

⚡ Optimal Device (cuda):
   Time: 0.259 seconds
   Throughput: 154.3 reviews/second

📊 Comparing with CPU...

Device set to use cpu

💻 CPU:
   Time: 1.718 seconds
   Throughput: 23.3 reviews/second

🚀 GPU Speedup: 6.63x faster!
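
A single timed run like the one above is sensitive to noise, so the measured speedup will vary from run to run. The sketch below shows one common way to make such measurements steadier: untimed warm-up calls followed by several timed repeats, reporting the median. The helpers `time_call` and `speedup` are hypothetical names, and the workload in the demo is a stand-in rather than a model. When timing raw PyTorch GPU operations (rather than a pipeline that returns Python objects), `torch.cuda.synchronize()` should be called before reading the clock, because CUDA kernels run asynchronously.

```python
import statistics
import time


def time_call(fn, *args, warmup: int = 1, repeats: int = 5):
    """Return the median and spread of wall-clock times for fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # untimed runs absorb one-off setup costs
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn(*args)
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


if __name__ == "__main__":
    stats = time_call(sum, range(100_000))  # stand-in workload
    print(stats)
    # Using the timings printed above: 1.718 s on CPU vs 0.259 s on GPU
    print(f"{speedup(1.718, 0.259):.2f}x")
```

To apply this to the benchmark, `fn` would be the `classifier` pipeline and `args` the list of reviews.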


---

## 4️⃣ Working with the Hugging Face Hub

The **Hugging Face Hub** is like GitHub for machine learning. You can:

- 📥 Download models, datasets, and files
- 📤 Upload your own models
- 🔐 Access private repositories
- 🤝 Collaborate with the community

### 4.1 Exploring the Hub Programmatically

In [7]:
from huggingface_hub import HfApi, list_models

# Initialize the API
api = HfApi()

# Search for the most-downloaded text-classification models
# (sentiment analysis is one of several tasks under this tag)
print("🔍 Top Text Classification Models on Hugging Face Hub")
print("=" * 60)

models = list(
    list_models(
        filter="text-classification",  # replaces the deprecated `task=` argument
        sort="downloads",
        direction=-1,  # Descending
        limit=5
    )
)

for i, model in enumerate(models, 1):
    downloads = model.downloads if hasattr(model, 'downloads') else 'N/A'
    likes = model.likes if hasattr(model, 'likes') else 'N/A'
    print(f"\n{i}. {model.id}")
    print(f"   📥 Downloads: {downloads:,}" if isinstance(downloads, int) else f"   📥 Downloads: {downloads}")
    print(f"   ❤️ Likes: {likes}")

🔍 Top Text Classification Models on Hugging Face Hub


1. cross-encoder/ms-marco-MiniLM-L6-v2
   📥 Downloads: 4,645,177
   ❤️ Likes: 178

2. facebook/bart-large-mnli
   📥 Downloads: 3,793,595
   ❤️ Likes: 1507

3. cardiffnlp/twitter-roberta-base-sentiment-latest
   📥 Downloads: 3,665,449
   ❤️ Likes: 747

4. distilbert/distilbert-base-uncased-finetuned-sst-2-english
   📥 Downloads: 2,905,900
   ❤️ Likes: 861

5. BAAI/bge-reranker-v2-m3
   📥 Downloads: 2,757,640
   ❤️ Likes: 847


### 4.2 Downloading Model Files

You can download specific files or entire model repositories.

In [8]:
from huggingface_hub import hf_hub_download, snapshot_download
import os

print("📥 Downloading Model Files")
print("=" * 50)

# Example 1: Download a specific file (config.json is small and fast)
print("\n1️⃣ Downloading a specific file...")
config_path = hf_hub_download(
    repo_id="distilbert-base-uncased-finetuned-sst-2-english",
    filename="config.json"
)
print(f"   ✅ Downloaded config to: {config_path}")
print(f"   📁 File size: {os.path.getsize(config_path):,} bytes")

# Let's peek at the config
import json
with open(config_path, 'r') as f:
    config = json.load(f)
print(f"   🏷️ Model type: {config.get('model_type', 'N/A')}")
# DistilBERT stores its hidden size under "dim"; most other architectures use "hidden_size"
hidden_size = config.get('hidden_size', config.get('dim', 'N/A'))
print(f"   📊 Hidden size: {hidden_size}")
vocab_size = config.get('vocab_size')
print(f"   🔢 Vocab size: {vocab_size:,}" if vocab_size else "   🔢 Vocab size: N/A")

📥 Downloading Model Files

1️⃣ Downloading a specific file...
   ✅ Downloaded config to: /root/.cache/huggingface/hub/models--distilbert-base-uncased-finetuned-sst-2-english/snapshots/714eb0fa89d2f80546fda750413ed43d93601a13/config.json
   📁 File size: 629 bytes
   🏷️ Model type: distilbert
   📊 Hidden size: 768
   🔢 Vocab size: 30,522


In [9]:
# Example 2: List files in a repository
print("\n2️⃣ Files in a model repository:")

# Note: per-file sizes are only populated when model_info() is called with
# files_metadata=True; without it, sibling.size is None and prints as 0.0 KB.
model_info = api.model_info("distilbert-base-uncased-finetuned-sst-2-english")

print(f"\n📦 Model: {model_info.id}")
print(f"🏷️ Tags: {', '.join(model_info.tags[:5])}..." if len(model_info.tags) > 5 else f"🏷️ Tags: {', '.join(model_info.tags)}")
print(f"\n📁 Files:")

for sibling in model_info.siblings:
    size_kb = sibling.size / 1024 if sibling.size else 0
    if size_kb > 1024:
        size_str = f"{size_kb/1024:.1f} MB"
    else:
        size_str = f"{size_kb:.1f} KB"
    print(f"   • {sibling.rfilename} ({size_str})")


2️⃣ Files in a model repository:

📦 Model: distilbert/distilbert-base-uncased-finetuned-sst-2-english
🏷️ Tags: transformers, pytorch, tf, rust, onnx...

📁 Files:
   • .gitattributes (0.0 KB)
   • README.md (0.0 KB)
   • config.json (0.0 KB)
   • map.jpeg (0.0 KB)
   • model.safetensors (0.0 KB)
   • onnx/added_tokens.json (0.0 KB)
   • onnx/config.json (0.0 KB)
   • onnx/model.onnx (0.0 KB)
   • onnx/special_tokens_map.json (0.0 KB)
   • onnx/tokenizer.json (0.0 KB)
   • onnx/tokenizer_config.json (0.0 KB)
   • onnx/vocab.txt (0.0 KB)
   • pytorch_model.bin (0.0 KB)
   • rust_model.ot (0.0 KB)
   • tf_model.h5 (0.0 KB)
   • tokenizer_config.json (0.0 KB)
   • vocab.txt (0.0 KB)
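
Every file in the listing shows `0.0 KB` because the sizes were never fetched: `model_info()` leaves `sibling.size` as `None` unless it is called with `files_metadata=True`. A size formatter that distinguishes "unknown" from "zero bytes" makes this kind of gap visible instead of hiding it. The sketch below is a hypothetical helper (`format_size` is not part of `huggingface_hub`), and the byte counts in the demo are illustrative values.

```python
def format_size(num_bytes) -> str:
    """Format a byte count using binary units; None means the size is unknown."""
    if num_bytes is None:
        return "size unknown"
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        # Stop at the first unit that keeps the number below 1024 (or at GB).
        if size < 1024 or unit == "GB":
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024


print(format_size(None))          # metadata was not requested
print(format_size(629))           # a small config file
print(format_size(268_000_000))   # a model weights file
```

With `files_metadata=True` passed to `model_info()`, each `sibling.size` could be handed straight to this helper.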


### 4.3 Authentication

For private repositories or uploading models, you need to authenticate.

In [10]:
from huggingface_hub import whoami, login

print("🔐 Authentication Status")
print("=" * 50)

try:
    user_info = whoami()
    print(f"✅ Logged in as: {user_info['name']}")
    print(f"📧 Email: {user_info.get('email', 'Not shared')}")
    # `orgs` is a list of dicts, so join the organization names rather than the dicts
    org_names = [org.get('name', '?') for org in user_info.get('orgs', [])]
    print(f"🏢 Organizations: {', '.join(org_names) or 'None'}")
except Exception:
    print("❌ Not logged in")
    print("\n💡 To login, you have several options:")
    print("\n   Option 1: Interactive login")
    print("   >>> from huggingface_hub import login")
    print("   >>> login()")
    print("\n   Option 2: CLI login")
    print("   $ huggingface-cli login")
    print("\n   Option 3: Environment variable")
    print("   $ export HF_TOKEN=your_token_here")
    print("\n🔑 Get your token at: https://huggingface.co/settings/tokens")

🔐 Authentication Status
❌ Not logged in

💡 To login, you have several options:

   Option 1: Interactive login
   >>> from huggingface_hub import login
   >>> login()

   Option 2: CLI login
   $ huggingface-cli login

   Option 3: Environment variable
   $ export HF_TOKEN=your_token_here

🔑 Get your token at: https://huggingface.co/settings/tokens
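
When a token is supplied through the environment, scripts should read it at runtime rather than hardcode it, and should avoid printing it in full. The following is a minimal sketch of that pattern using only the standard library. `HF_TOKEN` is the variable name current `huggingface_hub` releases read; `read_hf_token` and `mask_token` are hypothetical helper names introduced here for illustration.

```python
import os


def read_hf_token(env=os.environ):
    """Return the token from HF_TOKEN, or None if it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def mask_token(token: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the token is safe to log."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]


if __name__ == "__main__":
    token = read_hf_token()
    if token:
        print(f"Token found: {mask_token(token)}")
    else:
        print("HF_TOKEN is not set")
```

A token obtained this way can be passed to `login(token=...)`, which keeps the secret out of the notebook itself.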


### 4.4 Exploring Datasets

In [11]:
from huggingface_hub import list_datasets

print("📊 Popular Datasets on Hugging Face Hub")
print("=" * 60)

# Get popular datasets
datasets = list(
    list_datasets(
        sort="downloads",
        direction=-1,
        limit=10
    )
)

print(f"\n{'Rank':<6}{'Dataset ID':<40}{'Downloads':<15}")
print("-" * 60)

for i, ds in enumerate(datasets, 1):
    downloads = ds.downloads if hasattr(ds, 'downloads') else 'N/A'
    downloads_str = f"{downloads:,}" if isinstance(downloads, int) else str(downloads)
    # Truncate long names to 39 characters so the 40-wide column keeps a separating space
    name = ds.id[:37] + ".." if len(ds.id) > 39 else ds.id
    print(f"{i:<6}{name:<40}{downloads_str:<15}")

📊 Popular Datasets on Hugging Face Hub

Rank  Dataset ID                              Downloads      
------------------------------------------------------------
1     deepmind/code_contests                  2,539,159      
2     google-research-datasets/mbpp           2,383,896      
3     huggingface/documentation-images        1,744,208      
4     m-a-p/FineFineWeb                       1,198,524      
5     hf-doc-build/doc-build                  1,161,175      
6     nvidia/PhysicalAI-Robotics-GR00T-X-Em.. 826,397        
7     Salesforce/wikitext                     826,140        
8     banned-historical-archives/banned-his.. 788,066        
9     lavita/medical-qa-shared-task-v1-toy    778,513        
10    MRSAudio/MRSAudio                       653,649        


---

## 5️⃣ Managing the Model Cache

When you use Hugging Face models, they're downloaded and cached locally. This can consume significant disk space!
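
Where the cache lives depends on a few environment variables. The sketch below approximates the lookup order in plain Python so the precedence is easy to see; it is an illustration under that assumption, not the library's own code, and `resolve_hub_cache` is a hypothetical name. The authoritative value at runtime is exposed by the library itself as `huggingface_hub.constants.HF_HUB_CACHE`.

```python
import os
from pathlib import Path


def resolve_hub_cache(env=os.environ) -> Path:
    """Approximate the lookup order huggingface_hub uses for its hub cache."""
    # 1. HF_HUB_CACHE points directly at the hub cache directory.
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    # 2. HF_HOME relocates all Hugging Face data; the hub cache sits in "hub".
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    # 3. Otherwise fall back to $XDG_CACHE_HOME/huggingface/hub, or ~/.cache/...
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg).expanduser() if xdg else Path.home() / ".cache"
    return base / "huggingface" / "hub"


if __name__ == "__main__":
    print(resolve_hub_cache())
```

Setting `HF_HOME` to a larger disk before launching Python is a simple way to keep multi-gigabyte model files off a small home partition.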

### 5.1 Understanding the Cache

In [12]:
from huggingface_hub import scan_cache_dir
import os

print("💾 Hugging Face Cache Information")
print("=" * 60)

# Default cache location (can be moved with the HF_HOME environment variable)
cache_path = os.path.expanduser("~/.cache/huggingface")
if os.path.exists(cache_path):
    print(f"📁 Cache Location: {cache_path}")
else:
    print(f"📁 Default Cache Location: {cache_path} (not created yet)")

# Scan the cache
try:
    cache_info = scan_cache_dir()
    
    print(f"\n📊 Cache Statistics:")
    print(f"   • Total Size: {cache_info.size_on_disk / (1024**3):.2f} GB")
    print(f"   • Number of Repos: {len(cache_info.repos)}")
    
    if cache_info.repos:
        print(f"\n📦 Cached Repositories:")
        
        # Sort by size
        sorted_repos = sorted(
            cache_info.repos, 
            key=lambda x: x.size_on_disk, 
            reverse=True
        )
        
        for repo in sorted_repos[:10]:  # Show top 10
            size_mb = repo.size_on_disk / (1024**2)
            if size_mb > 1024:
                size_str = f"{size_mb/1024:.2f} GB"
            else:
                size_str = f"{size_mb:.1f} MB"
            print(f"   • {repo.repo_id} ({size_str})")
            
except Exception as e:
    print(f"\n⚠️ Could not scan cache: {e}")
    print("   This might happen if no models have been downloaded yet.")

💾 Hugging Face Cache Information
📁 Cache Location: /root/.cache/huggingface

📊 Cache Statistics:
   • Total Size: 6.92 GB
   • Number of Repos: 8

📦 Cached Repositories:
   • sshleifer/distilbart-cnn-12-6 (2.28 GB)
   • facebook/bart-large-mnli (1.52 GB)
   • dbmdz/bert-large-cased-finetuned-conll03-english (1.24 GB)
   • nlptown/bert-base-multilingual-uncased-sentiment (639.3 MB)
   • gpt2 (525.4 MB)
   • distilbert/distilbert-base-uncased-finetuned-sst-2-english (255.6 MB)
   • distilbert-base-uncased-finetuned-sst-2-english (255.6 MB)
   • distilbert/distilbert-base-cased-distilled-squad (249.3 MB)


### 5.2 Cache Management Commands

You can manage your cache using the Hugging Face CLI:

```bash
# View all cached models and their sizes
huggingface-cli scan-cache

# Interactive deletion of cached models
huggingface-cli delete-cache

# Text-based selection of revisions, for terminals without interactive (TUI) support
huggingface-cli delete-cache --disable-tui
```

### 5.3 Programmatic Cache Cleanup

In [13]:
def get_cache_summary():
    """
    Get a summary of the Hugging Face cache.
    
    Returns:
        dict: Cache statistics
    """
    try:
        cache_info = scan_cache_dir()
        
        total_size_gb = cache_info.size_on_disk / (1024**3)
        
        # Categorize by type
        models = [r for r in cache_info.repos if r.repo_type == "model"]
        datasets = [r for r in cache_info.repos if r.repo_type == "dataset"]
        spaces = [r for r in cache_info.repos if r.repo_type == "space"]
        
        return {
            "total_size_gb": total_size_gb,
            "total_repos": len(cache_info.repos),
            "models": len(models),
            "datasets": len(datasets),
            "spaces": len(spaces),
            "repos": cache_info.repos
        }
    except Exception as e:
        return {"error": str(e)}

summary = get_cache_summary()

if "error" not in summary:
    print("📈 Cache Summary")
    print("=" * 40)
    print(f"💾 Total Size: {summary['total_size_gb']:.2f} GB")
    print(f"📦 Total Repositories: {summary['total_repos']}")
    print(f"   • Models: {summary['models']}")
    print(f"   • Datasets: {summary['datasets']}")
    print(f"   • Spaces: {summary['spaces']}")
else:
    print(f"⚠️ No cache data available: {summary['error']}")

📈 Cache Summary
💾 Total Size: 6.92 GB
📦 Total Repositories: 8
   • Models: 8
   • Datasets: 0
   • Spaces: 0
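
The summary above only reports on the cache; actually freeing space from Python goes through `delete_revisions()` on the object returned by `scan_cache_dir()`. The sketch below is one possible way to structure that, assuming a simple size threshold as the selection rule. The helpers `revisions_to_delete` and `cleanup` are hypothetical names, and the stand-in objects in the demo carry made-up hashes and sizes. The selection logic is kept separate so it can be tried without touching any real files.

```python
from types import SimpleNamespace


def revisions_to_delete(cache_info, min_repo_bytes: int) -> list:
    """Collect commit hashes of every cached revision in repos above a size threshold."""
    hashes = []
    for repo in cache_info.repos:
        if repo.size_on_disk >= min_repo_bytes:
            hashes.extend(rev.commit_hash for rev in repo.revisions)
    return sorted(hashes)


def cleanup(min_repo_bytes: int, dry_run: bool = True):
    """Plan (and optionally execute) deletion using the real cache scan."""
    from huggingface_hub import scan_cache_dir  # imported lazily; needs the library

    cache_info = scan_cache_dir()
    hashes = revisions_to_delete(cache_info, min_repo_bytes)
    if not hashes:
        print("Nothing to delete")
        return
    strategy = cache_info.delete_revisions(*hashes)
    print(f"Would free {strategy.expected_freed_size_str}")
    if not dry_run:
        strategy.execute()  # irreversible: files are removed from disk


if __name__ == "__main__":
    # Stand-in objects (hypothetical data) that mimic the attributes used above.
    fake = SimpleNamespace(repos=[
        SimpleNamespace(size_on_disk=2_000_000_000,
                        revisions=[SimpleNamespace(commit_hash="aaa111")]),
        SimpleNamespace(size_on_disk=50_000_000,
                        revisions=[SimpleNamespace(commit_hash="bbb222")]),
    ])
    print(revisions_to_delete(fake, min_repo_bytes=1_000_000_000))
```

Calling `cleanup(1_000_000_000)` with the default `dry_run=True` only reports how much space would be reclaimed, which is a sensible first step before deleting anything.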


---

## 🎯 Practical Example: Building a Smart Text Analyzer

Let's combine everything we've learned into a practical example!

In [14]:
from transformers import pipeline
import torch
import time

class SmartTextAnalyzer:
    """
    A text analyzer that automatically uses the best available hardware.
    Demonstrates GPU detection, model loading, and multiple NLP tasks.
    """
    
    def __init__(self, use_gpu=True, verbose=True):
        """
        Initialize the analyzer with automatic device selection.
        
        Args:
            use_gpu: Whether to use GPU if available
            verbose: Whether to print status messages
        """
        self.verbose = verbose
        self.device, self.device_id = get_optimal_device(use_gpu, verbose)
        
        if verbose:
            print("\n⏳ Loading models...")
        
        start = time.time()
        
        # Load models
        self.sentiment = pipeline(
            "sentiment-analysis",
            device=self.device_id
        )
        
        self.summarizer = pipeline(
            "summarization",
            model="facebook/bart-large-cnn",
            device=self.device_id
        )
        
        self.classifier = pipeline(
            "zero-shot-classification",
            device=self.device_id
        )
        
        load_time = time.time() - start
        
        if verbose:
            print(f"✅ Models loaded in {load_time:.2f} seconds")
    
    def analyze(self, text, categories=None):
        """
        Perform comprehensive text analysis.
        
        Args:
            text: The text to analyze
            categories: Optional list of categories for classification
        
        Returns:
            dict: Analysis results
        """
        if categories is None:
            categories = ["Technology", "Business", "Science", "Sports", "Entertainment"]
        
        results = {
            "text_preview": text[:100] + "..." if len(text) > 100 else text,
            "word_count": len(text.split()),
        }
        
        # Sentiment Analysis
        sentiment = self.sentiment(text[:512])[0]  # Crude guard: keeps the first 512 characters (the model's limit is counted in tokens)
        results["sentiment"] = {
            "label": sentiment["label"],
            "confidence": f"{sentiment['score']:.2%}"
        }
        
        # Summarization (if text is long enough)
        if len(text.split()) > 50:
            summary = self.summarizer(
                text, 
                max_length=100, 
                min_length=30, 
                do_sample=False
            )[0]
            results["summary"] = summary["summary_text"]
        else:
            results["summary"] = "[Text too short for summarization]"
        
        # Zero-shot Classification
        classification = self.classifier(text[:512], categories)
        results["category"] = {
            "predicted": classification["labels"][0],
            "confidence": f"{classification['scores'][0]:.2%}"
        }
        
        return results

# Don't initialize yet - we'll do it in the next cell with a sample analysis
print("✅ SmartTextAnalyzer class defined!")
print("Run the next cell to see it in action.")

✅ SmartTextAnalyzer class defined!
Run the next cell to see it in action.


In [15]:
# Initialize the analyzer
analyzer = SmartTextAnalyzer()

# Sample article about renewable energy
sample_article = """
The renewable energy sector has experienced unprecedented growth in recent years, 
with solar and wind power installations reaching record levels globally. According 
to the International Energy Agency, renewable energy capacity is set to increase 
by 50% over the next five years, driven by falling costs and supportive government 
policies. This expansion is expected to be led by solar power, which has become 
the cheapest source of electricity in history in many regions. The transition away 
from fossil fuels is not only beneficial for the environment but is also creating 
millions of new jobs and economic opportunities worldwide. However, challenges 
remain, including the need for better energy storage solutions and grid modernization.
"""

# Analyze the text
print("\n" + "=" * 60)
print("📊 Text Analysis Results")
print("=" * 60)

start = time.time()
results = analyzer.analyze(sample_article)
analysis_time = time.time() - start

print(f"\n📝 Text Preview: {results['text_preview']}")
print(f"📏 Word Count: {results['word_count']}")
print(f"\n💭 Sentiment: {results['sentiment']['label']} ({results['sentiment']['confidence']})")
print(f"\n🏷️ Category: {results['category']['predicted']} ({results['category']['confidence']})")
print(f"\n📋 Summary:\n   {results['summary']}")
print(f"\n⏱️ Analysis completed in {analysis_time:.2f} seconds")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


🚀 Selected: CUDA GPU (Tesla T4)

⏳ Loading models...


Device set to use cuda:0


Device set to use cuda:0
No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


✅ Models loaded in 27.65 seconds

📊 Text Analysis Results

📝 Text Preview: 
The renewable energy sector has experienced unprecedented growth in recent years, 
with solar and w...
📏 Word Count: 111

💭 Sentiment: POSITIVE (98.03%)

🏷️ Category: Technology (30.88%)

📋 Summary:
   Renewable energy capacity is set to increase 50% over the next five years. This expansion is expected to be led by solar power, which has become the cheapest source of electricity in history in many regions. The transition away from fossil fuels is also creating millions of new jobs and economic opportunities worldwide.

⏱️ Analysis completed in 1.71 seconds


---

## 🎯 Summary & Key Takeaways

### What You Learned

| Concept | Key Points |
|---------|------------|
| **Environment Setup** | Use virtual environments (conda/venv) to isolate projects |
| **GPU Detection** | `torch.cuda.is_available()` for NVIDIA, `torch.backends.mps.is_available()` for Apple |
| **Device Selection** | Use `device` parameter in `pipeline()` for hardware acceleration |
| **Hub Access** | `huggingface_hub` package for programmatic access |
| **File Downloads** | `hf_hub_download()` for files, `snapshot_download()` for repos |
| **Cache Management** | `scan_cache_dir()` and CLI tools for managing disk space |

### Best Practices

1. ✅ Always check for GPU availability before running models
2. ✅ Use virtual environments for project isolation
3. ✅ Monitor cache size periodically to avoid disk space issues
4. ✅ Store your HF token securely (never commit to version control!)
5. ✅ Use the CLI tools (`huggingface-cli`) for quick operations

### Useful CLI Commands

```bash
# Authentication
huggingface-cli login
huggingface-cli whoami

# Cache management
huggingface-cli scan-cache
huggingface-cli delete-cache

# Download models
huggingface-cli download <repo_id>
```

---

## 📚 Additional Resources

- 📖 [Hugging Face Hub Documentation](https://huggingface.co/docs/huggingface_hub)
- 🔧 [Transformers Installation Guide](https://huggingface.co/docs/transformers/installation)
- 🖥️ [GPU Support Guide](https://pytorch.org/get-started/locally/)
- 💾 [Cache Management Guide](https://huggingface.co/docs/huggingface_hub/guides/manage-cache)

---

**Happy Learning! 🚀**