

## Introduction to Hugging Face

Hugging Face provides **state-of-the-art** NLP (and increasingly, multimodal) models and an easy-to-use Python library called **Transformers**. With just a few lines of code you can:

* **Load** pre-trained models for dozens of tasks
* **Run** inference via high-level “pipelines”
* **Fine-tune** on your own data

---

## 1. Setup in Google Colab

```python
# Install the core libraries (in Colab/Jupyter, a leading "!" runs the line as a shell command)
!pip install transformers datasets huggingface_hub --quiet
```

> **Tip:** Colab offers GPU runtimes (availability on the free tier varies). Go to **Runtime → Change runtime type** and select a GPU hardware accelerator for faster inference.

---

## 2. Quick-start with Pipelines

The `pipeline` API wraps tokenization, model loading, and inference in one object.
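To make those stages concrete, here is a minimal pure-Python sketch of the pattern. It is not the `transformers` implementation: the whitespace "tokenizer", the keyword lexicon standing in for a model, and the class name `ToySentimentPipeline` are all hypothetical. What it mirrors is the shape of the work (the real `Pipeline` class is organized around similar `preprocess`, `_forward` and `postprocess` steps) and the list of `{"label", "score"}` dicts that the sentiment pipeline returns.

```python
import math


class ToySentimentPipeline:
    """Hypothetical stand-in that mimics a pipeline's preprocess -> forward -> postprocess flow."""

    # Hypothetical "weights": how much each token pushes toward POSITIVE (>0) or NEGATIVE (<0).
    LEXICON = {"accessible": 2.0, "great": 2.0, "love": 2.0,
               "dislike": -2.0, "bugs": -1.0, "bad": -2.0}
    ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}

    def preprocess(self, text):
        # Stage 1 ("tokenization"): lowercase, replace punctuation with spaces, split on whitespace.
        cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
        return cleaned.split()

    def forward(self, tokens):
        # Stage 2 ("model"): turn the tokens into one logit per class.
        polarity = sum(self.LEXICON.get(tok, 0.0) for tok in tokens)
        return [-polarity, polarity]  # [NEGATIVE logit, POSITIVE logit]

    def postprocess(self, logits):
        # Stage 3: softmax over the logits, then report the most probable label and its probability.
        exps = [math.exp(x) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return {"label": self.ID2LABEL[best], "score": probs[best]}

    def __call__(self, inputs):
        # Accept one string or a list of strings; always return a list of result dicts.
        texts = [inputs] if isinstance(inputs, str) else inputs
        return [self.postprocess(self.forward(self.preprocess(t))) for t in texts]


toy = ToySentimentPipeline()
for res in toy(["Hugging Face makes NLP super accessible!", "I dislike bugs in my code..."]):
    print(f"label={res['label']}, score={res['score']:.3f}")
```

With this toy lexicon the first sentence comes out POSITIVE and the second NEGATIVE; a real checkpoint learns its "lexicon" from data and uses a subword tokenizer instead of whitespace splitting.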

### 2.1 Sentiment Analysis

```python
from transformers import pipeline

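# 1. Load a sentiment-analysis pipeline. Without `model=`, the pipeline falls back to a
#    default checkpoint and prints a warning; naming it explicitly keeps results reproducible.
sentiment = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")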

# 2. Run inference
examples = [
    "Hugging Face makes NLP super accessible!",
    "I dislike bugs in my code..."
]
results = sentiment(examples)
for text, res in zip(examples, results):
    print(f"{text!r:50} → label={res['label']}, score={res['score']:.3f}")
```

### 2.2 Text Generation

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "Once upon a time"
# max_new_tokens counts only generated tokens; max_length would also count the prompt
out = generator(prompt, max_new_tokens=30, num_return_sequences=1)
print(out[0]["generated_text"])
```
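For decoder-only models such as GPT-2, `max_length` caps the *total* sequence length, prompt tokens included, whereas `max_new_tokens` caps only the continuation. The sketch below illustrates that bookkeeping with a hypothetical greedy loop over a made-up next-token table instead of a real model; the function `greedy_generate` and the table are illustrative only, and whole words stand in for GPT-2's subword tokens.

```python
# Hypothetical next-token table standing in for a language model's greedy choice.
NEXT_TOKEN = {"Once": "upon", "upon": "a", "a": "time", "time": "there",
              "there": "was", "was": "a"}


def greedy_generate(prompt_tokens, max_length=None, max_new_tokens=None):
    """Extend the prompt one token at a time.

    max_length counts prompt + generated tokens; max_new_tokens counts only
    generated tokens (in this sketch, max_new_tokens wins if both are given).
    """
    if max_new_tokens is not None:
        budget = max_new_tokens
    elif max_length is not None:
        budget = max(0, max_length - len(prompt_tokens))
    else:
        raise ValueError("set max_length or max_new_tokens")
    tokens = list(prompt_tokens)
    for _ in range(budget):
        nxt = NEXT_TOKEN.get(tokens[-1])
        if nxt is None:  # nothing to predict: stop early, like emitting end-of-sequence
            break
        tokens.append(nxt)
    return tokens


prompt = "Once upon a time".split()
print(greedy_generate(prompt, max_length=6))      # 6 total - 4 prompt tokens = 2 new tokens
print(greedy_generate(prompt, max_new_tokens=6))  # 6 new tokens on top of the prompt
```

The practical consequence: with `max_length`, a longer prompt silently leaves less room for the continuation, which is why `max_new_tokens` is usually the easier knob for open-ended generation.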

### 2.3 Question Answering

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = (
    "Hugging Face is an AI company with the mission to democratize good machine learning. "
    "The Transformers library provides thousands of pretrained models in 100+ languages."
)

answer = qa(question="What is the mission of Hugging Face?", context=context)
print(answer["answer"])
```
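An extractive QA model such as `distilbert-base-cased-distilled-squad` does not write an answer: it scores every token as a possible answer start and as a possible answer end, and the answer is the best valid span copied out of the context. Below is a simplified sketch of that selection step, using made-up logits and a hypothetical `best_span` helper; the real pipeline also handles subword tokenization, mapping token offsets back to characters, and contexts longer than the model's input window.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(tokens, start_logits, end_logits, max_answer_len=15):
    """Return (answer_text, score) for the span maximizing P(start) * P(end),
    subject to start <= end and a span length of at most max_answer_len tokens."""
    p_start, p_end = softmax(start_logits), softmax(end_logits)
    best = (0.0, 0, 0)
    for i in range(len(tokens)):
        for j in range(i, min(i + max_answer_len, len(tokens))):
            score = p_start[i] * p_end[j]
            if score > best[0]:
                best = (score, i, j)
    score, i, j = best
    return " ".join(tokens[i:j + 1]), score


# Made-up logits: a real model produces one start logit and one end logit per token.
tokens = "the mission is to democratize good machine learning".split()
start_logits = [0.1, 0.2, 0.1, 1.0, 4.0, 0.5, 0.3, 0.2]
end_logits = [0.1, 0.1, 0.1, 0.2, 0.5, 0.4, 0.6, 4.0]
print(best_span(tokens, start_logits, end_logits))
```

The `start <= end` constraint matters: the most likely start and the most likely end can point at an impossible (backwards) span, and the search then falls back to the best span that is actually valid.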

### 2.4 Model Discovery with `huggingface_hub`

```python
from huggingface_hub import HfApi

api = HfApi()
# List the 5 most-downloaded English summarization models
models = api.list_models(task="summarization", language="en", sort="downloads", direction=-1, limit=5)
for m in models:
    print(m.id)
```

---

## 3. Assignment: Build Your Own Pipeline

**Your task:**

1. **Choose** one problem from the list below.
2. **Find** a suitable pre-trained model on [Hugging Face Models](https://huggingface.co/models).
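3. **Create** a `pipeline` in Colab to solve the problem, and demonstrate it on 2–3 examples.
4. **Experiment** with generation parameters (e.g. `max_length`, `top_k`, `temperature`) and **compare** the results. Note that `top_k` and `temperature` only take effect when sampling is enabled (`do_sample=True`).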
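The sampling parameters in step 4 act on the model's next-token probability distribution. As a rough sketch (pure Python, made-up logits, hypothetical function names; not the `transformers` implementation): `temperature` divides the logits before the softmax, so values above 1 flatten the distribution and values below 1 sharpen it, and `top_k` keeps only the k most likely tokens before drawing one.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a token index after temperature scaling and optional top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    probs = softmax_with_temperature(logits, temperature)
    # Rank token indices from most to least likely; keep only the k best if top_k is set.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    candidates = ranked[:top_k] if top_k is not None else ranked
    # random.choices treats weights as relative, so the kept probabilities need not sum to 1.
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
for t in (0.5, 1.0, 1.5):
    print(f"temperature={t}: {[round(p, 3) for p in softmax_with_temperature(logits, t)]}")

rng = random.Random(0)  # seeded so the draws are reproducible
print([sample_next_token(logits, temperature=1.5, top_k=2, rng=rng) for _ in range(10)])
```

Printing the three distributions side by side shows the effect directly: at `temperature=0.5` most of the mass sits on token 0, while at `1.5` it spreads across the vocabulary, and `top_k=2` guarantees only tokens 0 and 1 can ever be drawn.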

### 🔹 Problem Set

* **Sentiment classification** of product reviews
* **Text summarization** of news articles
* **Machine translation** (e.g., English ↔ French)
* **Named entity recognition** on a text snippet
* **Paraphrasing** a given sentence
* **Any other** pipeline-supported task you’re curious about!

---



### Example Solution: Text Summarization

```python
# First, install the required libraries
!pip install transformers datasets huggingface_hub --quiet
```

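```python
from transformers import pipeline

# Load a summarization pipeline. facebook/bart-large-cnn is BART-large fine-tuned on
# CNN/DailyMail news articles: accurate, but not small (about 1.6 GB of weights).
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
```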

```text
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
```


```text
Device set to use cuda:0
```


```python
articles = [
    """The Hubble Space Telescope has made over 1.5 million observations since its launch in 1990.
    It has contributed to many ground-breaking discoveries in astronomy, including determining the
    rate of expansion of the universe. NASA estimates Hubble's total cost at about $10 billion
    over its lifetime, making it one of the most productive scientific instruments ever built.""",

    """Artificial intelligence is transforming healthcare by enabling faster diagnosis, personalized
    treatment plans, and drug discovery. Recent advances in deep learning have allowed AI systems
    to analyze medical images with accuracy rivaling human experts. However, challenges remain in
    data privacy, algorithm bias, and integration with clinical workflows."""
]

# Baseline: deterministic decoding (do_sample=False) with explicit length limits, in tokens
print("=== Default Summaries ===")
for article in articles:
    summary = summarizer(article, max_length=130, min_length=30, do_sample=False)
    print(f"Original: {article[:150]}...")
    print(f"Summary: {summary[0]['summary_text']}\n")
```

```text
Your max_length is set to 130, but your input_length is only 84. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=42)

=== Default Summaries ===

Your max_length is set to 130, but your input_length is only 73. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=36)

Original: The Hubble Space Telescope has made over 1.5 million observations since its launch in 1990.
    It has contributed to many ground-breaking discoverie...
Summary: The Hubble Space Telescope has made over 1.5 million observations since its launch in 1990. It has contributed to many ground-breaking discoveries in astronomy. NASA estimates Hubble's total cost at about $10 billion.

Original: Artificial intelligence is transforming healthcare by enabling faster diagnosis, personalized
    treatment plans, and drug discovery. Recent advance...
Summary: Artificial intelligence is transforming healthcare by enabling faster diagnosis, personalized  treatment plans, and drug discovery. Recent advances in deep learning have allowed AI systems to analyze medical images with accuracy rivaling human experts.
```
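Two details of this run are worth unpacking. First, each warning compares `max_length` with the tokenized input length, and the suggested value is half the input length in both cases (84 → 42, 73 → 36). Second, in `transformers` the `min_length` floor is enforced by a logits processor that suppresses the end-of-sequence token until the output is long enough. The sketch below illustrates both ideas with a hypothetical token stream instead of a real model; the helper names and the `<eos>` marker are illustrative, and the halving rule is inferred from the warning text above rather than taken from the library's source.

```python
EOS = "<eos>"  # hypothetical end-of-sequence marker


def suggested_max_length(input_length):
    # Reproduces the numbers in the warnings above: 84 -> 42, 73 -> 36.
    return input_length // 2


def decode_with_length_limits(proposals, min_length, max_length):
    """Toy greedy decoder. proposals[t] lists candidate tokens for step t, best first.

    EOS is blocked while fewer than min_length tokens have been emitted, and
    decoding stops once max_length tokens have been emitted.
    """
    output = []
    for candidates in proposals:
        if len(output) >= max_length:
            break
        allowed = [tok for tok in candidates if tok != EOS or len(output) >= min_length]
        if not allowed:  # nothing usable at this step
            break
        if allowed[0] == EOS:
            break
        output.append(allowed[0])
    return output


# The toy "model" would like to stop right after the first word.
proposals = [["Hubble", EOS], [EOS, "has"], [EOS, "made"], [EOS, "observations"]]
print(decode_with_length_limits(proposals, min_length=0, max_length=10))
print(decode_with_length_limits(proposals, min_length=3, max_length=10))
print(decode_with_length_limits(proposals, min_length=3, max_length=2))
```

This is why `min_length` can force a model to keep writing past the point where it "wanted" to stop, while `max_length` only matters when it is smaller than what the model would produce anyway, as in the warnings above.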



```python
# Experiment with different parameters.
# Sampled runs (do_sample=True) can produce different summaries each time they are executed.
print("=== Parameter Experimentation ===")
params = [
    {"max_length": 100, "min_length": 20, "do_sample": True},  # lower length cap, with sampling
    {"max_length": 150, "min_length": 50, "do_sample": False},  # higher length floor, deterministic
    {"max_length": 80, "min_length": 30, "do_sample": True, "temperature": 1.5}  # sampling from a flatter distribution
]

for param_set in params:
    print(f"\nParameters: {param_set}")
    summary = summarizer(articles[0], **param_set)
    print(f"Summary: {summary[0]['summary_text']}")
```

```text
Your max_length is set to 100, but your input_length is only 84. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=42)

=== Parameter Experimentation ===

Parameters: {'max_length': 100, 'min_length': 20, 'do_sample': True}

Your max_length is set to 150, but your input_length is only 84. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=42)

Summary: The Hubble Space Telescope has made over 1.5 million observations since its launch in 1990. It has contributed to many ground-breaking discoveries in astronomy, including determining the rate of expansion of the universe.

Parameters: {'max_length': 150, 'min_length': 50, 'do_sample': False}
Summary: The Hubble Space Telescope has made over 1.5 million observations since its launch in 1990. It has contributed to many ground-breaking discoveries in astronomy, including determining the rate of expansion of the universe. NASA estimates Hubble's total cost at about $10 billion  over its lifetime.

Parameters: {'max_length': 80, 'min_length': 30, 'do_sample': True, 'temperature': 1.5}
Summary: The Hubble Space Telescope has made over 1.5 million observations since its launch in 1990. It has contributed to many ground-breaking discoveries in astronomy, including determining the rate of expansion of the universe. NASA estimates Hubble's total cost at about $10 billion.
```
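Step 4 of the assignment asks you to compare results. Reading summaries side by side works, but two simple numbers can support the comparison: the compression ratio (summary words divided by article words) and the share of summary words that also occur in the article, a rough signal of how extractive the output is. The `compare_summary` helper below is a hypothetical sketch, not a substitute for established metrics such as ROUGE; it is applied here to the Hubble article and the first default summary printed earlier.

```python
import re


def words(text):
    # Lowercased word tokens; keeps decimals like "1.5", drops punctuation ("$10" -> "10").
    return re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", text.lower())


def compare_summary(article, summary):
    """Return the word-level compression ratio and the fraction of summary words found in the article."""
    article_words, summary_words = words(article), words(summary)
    vocab = set(article_words)
    copied = sum(1 for w in summary_words if w in vocab)
    return {
        "compression": len(summary_words) / len(article_words),
        "copied_fraction": copied / len(summary_words),
    }


article = ("The Hubble Space Telescope has made over 1.5 million observations since its launch in 1990. "
           "It has contributed to many ground-breaking discoveries in astronomy, including determining the "
           "rate of expansion of the universe. NASA estimates Hubble's total cost at about $10 billion "
           "over its lifetime, making it one of the most productive scientific instruments ever built.")
summary = ("The Hubble Space Telescope has made over 1.5 million observations since its launch in 1990. "
           "It has contributed to many ground-breaking discoveries in astronomy. "
           "NASA estimates Hubble's total cost at about $10 billion.")
print(compare_summary(article, summary))
```

For this pair every summary word also appears in the article, which matches what the printed outputs suggest: `facebook/bart-large-cnn` mostly selects and trims sentences from the source rather than rephrasing them.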
