
## Introduction & Objectives

**Hugging Face (HF)** is a central ecosystem for modern NLP and LLM development. For an AI Engineer, HF is not only about training models but also about **discovering, evaluating, and deploying pretrained models efficiently**.

Objectives of this notebook:

- Understand the Hugging Face ecosystem (Hub, Transformers, Datasets)

- Learn how to load and run pretrained models for inference

- Compare high-level pipelines vs low-level model usage

- Prepare knowledge for downstream tasks (embeddings, RAG, APIs)

## Hugging Face Hub Overview

The Hugging Face Hub is a repository of:
- Pretrained models
- Datasets
- Spaces (demo apps)

Key concepts:
- Model Card: documentation about training data, task, limitations, and license
- Task tags: text-classification, token-classification, text-generation, feature-extraction

Practical selection criteria:
- Language support (EN / ID)
- Model size (latency vs accuracy)
- License (commercial usage)

## Transformers Pipeline (High-level API)

The `pipeline` API provides a simple, task-oriented interface for inference.

When to use pipelines:
- Rapid prototyping
- Demos and experiments
- Simple inference without custom logic

In [1]:
from transformers import pipeline



In [2]:
sentiment_pipeline = pipeline(
    task="sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)


In [3]:
sentiment_pipeline("This product works very well and is easy to use.")

[{'label': 'POSITIVE', 'score': 0.9997636675834656}]

In [4]:
sentiment_pipeline("This product not work well when i try at first")

[{'label': 'NEGATIVE', 'score': 0.9997252821922302}]

Pros:
- Minimal code
- Automatic preprocessing & postprocessing

Cons:
- Limited control
- Less efficient for large-scale systems

## Loading Model & Tokenizer (Low-level API)

For production systems, AI Engineers often need more control.

In [5]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

In [6]:
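# Checkpoint shared by the tokenizer and the model
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

# Load the tokenizer and the classification model from the same checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)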

In [7]:
text = "The service response time was slow but acceptable."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

In [8]:
outputs

SequenceClassifierOutput(loss=None, logits=tensor([[-1.5075,  1.4134]]), hidden_states=None, attentions=None)

In [9]:
logits

tensor([[-1.5075,  1.4134]])

The `logits` tensor contains raw, unnormalized scores for each class.

This approach allows:
- Custom batching
- Explicit device control (CPU / GPU)
- Integration with APIs and pipelines
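
As a rough illustration of the first two points, the sketch below processes texts in fixed-size batches on whichever device is available. The helper names (`pick_device`, `iter_batches`, `classify_in_batches`) and the default batch size of 16 are assumptions made for this example, not part of the Transformers API.

```python
import torch


def pick_device():
    """Use a GPU when one is visible, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def iter_batches(texts, batch_size):
    """Yield consecutive slices of `texts`, each with at most `batch_size` items."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def classify_in_batches(texts, tokenizer, model, batch_size=16):
    """Return a (len(texts), num_labels) tensor of class probabilities."""
    device = pick_device()
    model = model.to(device).eval()  # move weights once, switch off dropout
    chunks = []
    for batch in iter_batches(texts, batch_size):
        # Pad within the batch so every row has the same length
        inputs = tokenizer(
            batch, return_tensors="pt", padding=True, truncation=True
        ).to(device)
        with torch.no_grad():
            logits = model(**inputs).logits
        # Bring results back to the CPU so they can be concatenated cheaply
        chunks.append(torch.softmax(logits, dim=-1).cpu())
    return torch.cat(chunks)
```

With the `tokenizer` and `model` loaded above, a call such as `classify_in_batches(list_of_reviews, tokenizer, model)` would return one probability row per input text; `list_of_reviews` is a hypothetical list of strings.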

## Inference Workflow

A standard inference workflow consists of:
- Text preprocessing
- Tokenization
- Model forward pass
- Post-processing (softmax, label mapping)

In [10]:
# Example text for inference
input_text = "This movie was fantastic, I loved every moment of it!"

#### Text Preprocessing

Most Hugging Face tokenizers apply the normalization their checkpoint was trained with (for example, the tokenizer of an uncased model lowercases the text), so manual lowercasing or cleaning is usually unnecessary.

We'll use the raw input text.

#### Tokenization

Tokenization converts the raw text into the numerical inputs the model expects: input IDs and an attention mask (some models, such as BERT, also use token type IDs). Passing `return_tensors="pt"` returns them as PyTorch tensors.
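
To make the role of the attention mask concrete, here is a plain-Python sketch of what padding a batch amounts to. The `pad_batch` helper is hypothetical, the token IDs are illustrative placeholders rather than real tokenizer output, and the pad ID of 0 is an assumption.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad token ID lists to a common length and build matching masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two illustrative sequences of different lengths (placeholder IDs)
batch = pad_batch([[101, 2023, 102], [101, 2023, 3185, 2001, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

The real tokenizer does this for you when `padding=True` is set; the sketch only shows why both tensors end up with the same shape.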

In [11]:
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True)

#### Model Forward Pass

The tokenized inputs are fed into the pretrained model to obtain the raw output scores (`logits`). Wrapping the call in `torch.no_grad()` disables gradient calculation, which is not needed for inference.

In [12]:
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

#### Post-processing

Logits are raw scores, so we apply `softmax` to convert them into probabilities.

Then we find the class with the highest probability and map it to a human-readable label.
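
As a worked example of the softmax step, take the two logits printed earlier for the service-response sentence, $(-1.5075,\ 1.4134)$, and assume index 1 is the positive class as this notebook does:

$$
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad
p_{\text{POSITIVE}} = \frac{e^{1.4134}}{e^{-1.5075} + e^{1.4134}} = \frac{1}{1 + e^{-2.9209}} \approx 0.949
$$

The probabilities always sum to 1, so the negative class receives the remaining $\approx 0.051$.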

In [13]:
probabilities = torch.softmax(logits, dim=1)
predicted_class_id = torch.argmax(probabilities, dim=1).item()

# Assuming labels are ['NEGATIVE', 'POSITIVE'] based on the model's training
labels = ['NEGATIVE', 'POSITIVE']
predicted_label = labels[predicted_class_id]
predicted_score = probabilities[0, predicted_class_id].item()

In [14]:
predicted_label

'POSITIVE'

In [15]:
predicted_score

0.9998788833618164

In [16]:
probabilities

tensor([[1.2114e-04, 9.9988e-01]])
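
The label list above is hard-coded. A loaded classification model usually carries the same mapping in `model.config.id2label`, so a post-processing helper can take it as an argument instead. The sketch below is a minimal version under that assumption. The `postprocess` name is hypothetical, the logits are copied from the earlier cell output, and the mapping dictionary mirrors the label order this notebook assumes.

```python
import torch


def postprocess(logits, id2label):
    """Turn a (batch, num_labels) logits tensor into label/score dictionaries."""
    probabilities = torch.softmax(logits, dim=-1)
    scores, class_ids = probabilities.max(dim=-1)
    return [
        {"label": id2label[class_id], "score": score}
        for class_id, score in zip(class_ids.tolist(), scores.tolist())
    ]


# Logits from the earlier cell; with a loaded model, pass model.config.id2label
example_logits = torch.tensor([[-1.5075, 1.4134]])
id2label = {0: "NEGATIVE", 1: "POSITIVE"}  # assumed order, as in the cell above

results = postprocess(example_logits, id2label)
print(results)
```

The output has the same shape as a `pipeline` result, which makes it easier to swap one approach for the other.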

## Embedding Models Overview

Sentence embeddings transform text into dense vectors that capture semantic meaning.

In [17]:
from transformers import AutoTokenizer, AutoModel

In [18]:
embedding_model_name = "sentence-transformers/all-MiniLM-L6-v2"

# Note: this rebinds `tokenizer` and `model` from the classification sections
tokenizer = AutoTokenizer.from_pretrained(embedding_model_name)
model = AutoModel.from_pretrained(embedding_model_name)

sentences = [
    "AI engineering focuses on system design",
    "Machine learning builds predictive models",
]

In [19]:
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

In [20]:
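with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool over real tokens only; the attention mask zeroes out padding positions
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)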

In [21]:
embeddings.shape

torch.Size([2, 384])

Use cases:
- Semantic search
- Similarity matching
- Retrieval-Augmented Generation (RAG)
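
All three use cases rest on comparing vectors, most commonly with cosine similarity. The sketch below ranks documents against a query that way. The `rank_by_similarity` helper is hypothetical, and the 3-dimensional toy vectors stand in for the 384-dimensional embeddings produced above.

```python
import torch
import torch.nn.functional as F


def rank_by_similarity(query_embedding, doc_embeddings):
    """Return document indices sorted by cosine similarity to the query."""
    # After L2 normalization, a dot product equals cosine similarity
    query = F.normalize(query_embedding, dim=-1)
    docs = F.normalize(doc_embeddings, dim=-1)
    scores = docs @ query
    order = torch.argsort(scores, descending=True)
    return order.tolist(), scores[order].tolist()


# Toy vectors chosen so the expected ranking is easy to verify by hand
doc_embeddings = torch.tensor([
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
])
query_embedding = torch.tensor([1.0, 0.1, 0.0])

order, scores = rank_by_similarity(query_embedding, doc_embeddings)
print(order, scores)
```

In a retrieval setting, `doc_embeddings` would hold the pooled embeddings of the indexed passages and `query_embedding` the embedding of the user query, both produced by the same model.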

## Hugging Face Datasets (Brief Overview)

The Hugging Face `datasets` library provides ready-to-use datasets through a standardized API.

In [22]:
from datasets import load_dataset

In [23]:
dataset = load_dataset("ag_news")
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 120000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 7600
    })
})

In [24]:
# Inspect the first example in the train split
dataset["train"][0]

{'text': "Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are seeing green again.",
 'label': 2}

For AI Engineers, datasets are mainly used for:
- Fine-tuning
- Evaluation
- Benchmarking
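
For the evaluation use case, the core step is comparing model predictions with the `label` column of a split. The sketch below shows that comparison with made-up values. The `accuracy` helper and both lists are hypothetical; in practice the references would come from something like `dataset["test"]["label"]`.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Made-up class IDs in the 0-3 range that ag_news uses for its four labels
references = [2, 0, 1, 3, 2]
predictions = [2, 0, 3, 3, 1]

print(accuracy(predictions, references))  # 3 of the 5 predictions match
```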