<div align="center">
<img src="https://poorit.in/image.png" alt="Poorit" width="40" style="vertical-align: middle;"> <b>AI SYSTEMS ENGINEERING 1</b>

## Unit 3: HuggingFace Platform and Pipelines

**CV Raman Global University, Bhubaneswar**  
*AI Center of Excellence*

---

</div>


### What You'll Learn

In this notebook, you will:

1. **Understand the HuggingFace ecosystem** — Hub, Transformers, Datasets
2. **Use the Pipelines API** for quick inference without complex setup
3. **Run text generation, sentiment analysis, and zero-shot classification**

**Duration:** ~45 minutes

---

## 1. Environment Setup

In [None]:
# Install required packages
!pip install -q transformers torch

In [None]:
import torch
from transformers import pipeline

# Check if GPU is available
device = 0 if torch.cuda.is_available() else -1

if torch.cuda.is_available():
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("Using CPU — pipelines will still work, just a bit slower.")

---

## 2. The HuggingFace Ecosystem

HuggingFace (officially styled "Hugging Face") is one of the largest platforms for sharing and using open AI models.

| Component | Description |
|-----------|-------------|
| **Hub** | Repository of 500k+ models and datasets |
| **Transformers** | Library for working with transformer models |
| **Datasets** | Library for loading and processing datasets |
| **Spaces** | Platform for hosting ML demos |

The **Pipelines API** is the simplest way to use models — it handles tokenization, inference, and post-processing automatically.
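
To make those three stages concrete, here is a deliberately tiny stand-in written in plain Python. It is not how Transformers implements pipelines: the vocabulary, the per-token scores, and the function names are all invented for illustration. It only shows how tokenization, inference, and post-processing chain together.

```python
# Toy stand-in for the three stages a pipeline runs; NOT the Transformers implementation.
# The vocabulary and per-token scores below are invented for illustration.
import math

VOCAB = {"<unk>": 0, "good": 1, "bad": 2, "movie": 3}  # hypothetical vocabulary
TOKEN_SCORES = {0: 0.0, 1: 2.0, 2: -2.0, 3: 0.0}       # hypothetical "model weights"
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Stage 1 (tokenization): map each lowercase word to an integer id."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]


def forward(token_ids):
    """Stage 2 (inference): produce one raw score (logit) per label."""
    total = sum(TOKEN_SCORES[i] for i in token_ids)
    return [-total, total]  # [negative logit, positive logit]


def postprocess(logits):
    """Stage 3 (post-processing): softmax the logits, return the best label and its probability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three stages, the way a single pipeline call does."""
    return postprocess(forward(preprocess(text)))


print(toy_pipeline("good movie"))
print(toy_pipeline("bad movie"))
```

A real pipeline swaps each stage for a trained tokenizer, a neural network, and task-specific decoding, but the call you write stays just as short.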

---

## 3. Text Generation Pipeline

Let's start with text generation using GPT-2. The `gpt2` checkpoint is the smallest GPT-2 variant (about 124M parameters), so it downloads quickly and runs even on CPU.

In [None]:
# Create a text generation pipeline
generator = pipeline("text-generation", model="gpt2", device=device)

In [None]:
# Generate text from a prompt
prompt = "Artificial intelligence is transforming"

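# max_new_tokens caps how many tokens are added after the prompt.
# do_sample=True samples from the token distribution, so the output differs between runs.
# temperature below 1.0 makes sampling more conservative.
result = generator(prompt, max_new_tokens=50, do_sample=True, temperature=0.7)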

print(result[0]["generated_text"])

That's it — one line to load the model, one call to generate text. The pipeline handles tokenization, model inference, and decoding automatically.
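
If you are wondering what `temperature=0.7` does in the call above, the sketch below may help. It assumes the usual definition: the model's next-token logits are divided by the temperature before the softmax. The three logits are made up, and the helper name `softmax_with_temperature` is ours, not part of Transformers.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token logits for three candidate tokens.
toy_logits = [2.0, 1.0, 0.0]

for t in (1.0, 0.7, 0.2):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures push more probability onto the highest-scoring token, so the generated text becomes more predictable. Higher temperatures flatten the distribution and make the output more varied.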

---

## 4. Sentiment Analysis Pipeline

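Sentiment analysis classifies text as positive or negative. By default the pipeline loads `distilbert-base-uncased-finetuned-sst-2-english`, a DistilBERT model fine-tuned on SST-2. Because this model has only two labels, a neutral sentence such as the third example below is still assigned either `POSITIVE` or `NEGATIVE`.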

In [None]:
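# Create a sentiment analysis pipeline (model named explicitly so the default cannot change silently)
sentiment = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", device=device)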

In [None]:
# Analyze sentiment of multiple texts
texts = [
    "I love this course! It's really helping me understand AI.",
    "The weather today is terrible and I'm stuck indoors.",
    "The food was okay, nothing special."
]

results = sentiment(texts)

for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Sentiment: {result['label']} (confidence: {result['score']:.2f})\n")
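
Each result is a dictionary with a `label` and a `score`. One simple way to handle borderline cases is to flag low-confidence predictions instead of trusting the label. The sketch below works on hand-written results in the same shape the pipeline returns. The helper `flag_uncertain`, the `UNCERTAIN` label, and the 0.9 threshold are our own assumptions, and the scores are invented.

```python
def flag_uncertain(results, threshold=0.9):
    """Relabel predictions whose score falls below the threshold as UNCERTAIN."""
    flagged = []
    for r in results:
        label = r["label"] if r["score"] >= threshold else "UNCERTAIN"
        flagged.append({"label": label, "score": r["score"]})
    return flagged


# Hand-written results in the pipeline's output shape; the scores are invented.
sample_results = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.62},
]

for item in flag_uncertain(sample_results):
    print(item)
```

This is a crude heuristic: a two-label model can still be very confident on text a human would call neutral, so check a threshold like this against your own data before relying on it.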

---

## 5. Zero-Shot Classification

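Zero-shot classification can categorize text into labels it has **never been trained on**. You provide the labels at inference time. The default model, `facebook/bart-large-mnli`, is a natural language inference (NLI) model: it scores how strongly the text entails a statement built from each label.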

In [None]:
# Create a zero-shot classifier (model named explicitly so the default cannot change silently)
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=device)

In [None]:
# Classify a customer support ticket
text = "My order hasn't arrived yet and it's been 2 weeks. I need a refund."
labels = ["shipping issue", "payment problem", "product quality", "general inquiry"]

result = classifier(text, candidate_labels=labels)

print(f"Text: {text}\n")
print("Classifications:")
for label, score in zip(result["labels"], result["scores"]):
    print(f"  {label}: {score:.2%}")

The model should rank "shipping issue" first, even though it was never trained on these specific categories. The scores sum to 100% because, by default, the pipeline treats the candidate labels as mutually exclusive.
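
How can a model score labels it has never seen? A rough sketch of the idea: each label is placed into a hypothesis sentence (the pipeline's default template is `"This example is {}."`), an NLI model scores how strongly the text entails each hypothesis, and in the default single-label mode the entailment scores are normalized with a softmax. The code below mimics only the bookkeeping. The entailment logits are invented stand-ins for real model output, and the function names are ours.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def rank_labels(labels, entailment_logits):
    """Softmax the per-label entailment logits and sort labels by score, highest first."""
    peak = max(entailment_logits)
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return {
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }


labels = ["shipping issue", "payment problem", "product quality", "general inquiry"]

# Invented entailment logits, one per hypothesis; a real NLI model would produce these.
toy_entailment_logits = [3.1, 1.2, -0.5, 0.3]

for hypothesis in build_hypotheses(labels):
    print(hypothesis)

result = rank_labels(labels, toy_entailment_logits)
for label, score in zip(result["labels"], result["scores"]):
    print(f"  {label}: {score:.2%}")
```

Because the labels only appear inside the hypothesis text, you can change them freely at inference time without retraining anything.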

---

## 6. Exercise

Try the pipelines yourself!

**Step 1:** Run sentiment analysis on 3 sentences of your choice.  
**Step 2:** Run zero-shot classification on a text with your own custom labels.

In [None]:
# Step 1: Sentiment analysis on your own sentences
# Replace the strings below with your own text

my_texts = [
    # "your first sentence here",
    # "your second sentence here",
    # "your third sentence here",
]

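# Only call the pipeline once the list above has been filled in
if my_texts:
    results = sentiment(my_texts)

    for text, result in zip(my_texts, results):
        print(f"{text}")
        print(f"  → {result['label']} ({result['score']:.2f})\n")
else:
    print("Add your sentences to my_texts above, then re-run this cell.")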

In [None]:
# Step 2: Zero-shot classification with your own text and labels
# Replace the text and labels below

my_text = ""  # your text here
my_labels = []  # e.g. ["sports", "politics", "technology", "entertainment"]

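# The classifier raises a ValueError when the label list is empty, so check before calling it
if my_text and my_labels:
    result = classifier(my_text, candidate_labels=my_labels)

    print(f"Text: {my_text}\n")
    for label, score in zip(result["labels"], result["scores"]):
        print(f"  {label}: {score:.2%}")
else:
    print("Fill in my_text and my_labels above, then re-run this cell.")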

---

## Key Takeaways

1. **Pipelines abstract complexity** — tokenization, inference, and post-processing are handled automatically

2. **Many tasks supported** — text generation, classification, summarization, QA, and more

3. **Zero-shot classification is powerful** — classify text into any categories without training

### Common Pipeline Tasks

| Task | Pipeline Name | Example Model |
|------|--------------|---------------|
| Text Generation | `text-generation` | GPT-2, Llama |
| Classification | `text-classification` (alias: `sentiment-analysis`) | DistilBERT |
| Zero-Shot | `zero-shot-classification` | bart-large-mnli |
| Summarization | `summarization` | BART, T5 |
| Question Answering | `question-answering` | RoBERTa |

### What's Next?

In the next notebook, we'll explore how **tokenization** works under the hood — the process that converts text into numbers that models can understand.

---

## Additional Resources

- [HuggingFace Hub](https://huggingface.co/models)
- [Transformers Documentation](https://huggingface.co/docs/transformers)
- [Pipeline Tutorial](https://huggingface.co/docs/transformers/pipeline_tutorial)

---

**Course Information:**
- **Institution:** CV Raman Global University, Bhubaneswar
- **Program:** AI Center of Excellence
- **Course:** AI Systems Engineering 1
- **Developed by:** [Poorit Technologies](https://poorit.in) — *Transform Graduates into Industry-Ready Professionals*

---