## HuggingFace Transformers

# HuggingFace Transformers Tutorial Documentation

This notebook demonstrates key concepts and usage of the HuggingFace Transformers library. Here's a section-by-section breakdown:

## 1. Basic Pipeline Usage
The notebook starts with simple pipeline examples:
- Sentiment analysis using BERT
- Named Entity Recognition (NER) using a specialized BERT model
- Zero-shot classification using BART
- Text generation using OPT-1.3B



In [None]:
# 1. Basic Pipeline Usage
# Import the main pipeline interface from transformers
from transformers import pipeline

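# Create a sentiment analysis pipeline using BERT
# This will automatically download the model on first use
# Note: bert-base-uncased is a base checkpoint without a fine-tuned sentiment head,
# so its classification layer is randomly initialized and the labels it returns
# (LABEL_0 / LABEL_1) are not meaningful. A fine-tuned checkpoint such as
# "distilbert-base-uncased-finetuned-sst-2-english" (used in section 3) gives real predictions.
sentiment_classifier = pipeline(task="sentiment-analysis", model="bert-base-uncased")

# Test the sentiment classifier with a sample input
# (print is needed because only the last expression of a cell is displayed)
print(sentiment_classifier("I'm so excited to be learning about large language models"))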

# Create Named Entity Recognition pipeline using a specialized BERT model
ner = pipeline(task="ner", model="dslim/bert-base-NER")

# Set up zero-shot classification pipeline using BART
zeroshot_classifier = pipeline(task="zero-shot-classification", model="facebook/bart-large-mnli")

# Define test inputs for zero-shot classification
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']

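# Run zero-shot classification on the test inputs
print(zeroshot_classifier(sequence_to_classify, candidate_labels))

# Text generation pipeline using OPT-1.3B
# Using bfloat16 for memory efficiency and auto device mapping
# (device_map="auto" requires the accelerate package)
import torch
pipe = pipeline(task="text-generation", model="facebook/opt-1.3b", torch_dtype=torch.bfloat16, device_map="auto")
# do_sample=True with top_p=0.95 enables nucleus sampling, so the output varies between runs
output = pipe("This is a cool example!", do_sample=True, top_p=0.95)
print(output[0]['generated_text'])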
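
The `do_sample=True, top_p=0.95` arguments in the cell above turn on nucleus (top-p) sampling: at each step, only the most probable tokens whose cumulative probability reaches `top_p` stay eligible, and the next token is drawn from that reduced set. The following is a minimal sketch of the idea in plain Python. The token strings and probabilities are made up for illustration (they are not OPT outputs), and `top_p_filter` is a hypothetical helper, not a Transformers function; the library's own implementation operates on logits and differs in detail.

```python
import random


def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p, then renormalize the kept probabilities."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Made-up next-token distribution (assumption, for illustration only)
toy_probs = {" great": 0.50, " fun": 0.30, " odd": 0.15, " zebra": 0.04, " qux": 0.01}

# With top_p=0.9 the two least likely tokens are dropped
nucleus = top_p_filter(toy_probs, top_p=0.9)
print(nucleus)

# Draw one token from the filtered, renormalized distribution
random.seed(0)
tokens = list(nucleus)
sampled = random.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]
print("sampled:", sampled)
```

A lower `top_p` shrinks the candidate set and makes generations more conservative; a value close to 1.0 keeps almost the full vocabulary in play.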




## 2. Tokenizer Operations
Demonstrates tokenizer functionality using BERT and XLNet models:
- Loading pre-trained tokenizers
- Converting text to tokens
- Converting tokens to IDs
- Decoding tokens back to text
- Comparing different tokenizer behaviors



In [None]:
# 2. Tokenizer Operations
# Import AutoTokenizer for automatic tokenizer loading
from transformers import AutoTokenizer

# Initialize BERT tokenizer
model = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model)

# Example text for tokenization
sentence = "I'm so excited to be learning about large language models"

# Encode the sentence: returns a dict-like object with input_ids, token_type_ids, and attention_mask
encoded_input = tokenizer(sentence)
print(encoded_input)

# Tokenize the text and show individual tokens
tokens = tokenizer.tokenize(sentence)
print(tokens)

# Convert tokens to their numerical IDs
token_ids = tokenizer.convert_tokens_to_ids(tokens)
print(token_ids)

# Decode the IDs back to text
decoded_text = tokenizer.decode(token_ids)
print(decoded_text)

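# Compare with XLNet tokenizer (SentencePiece-based, cased)
model2 = "xlnet-base-cased"
tokenizer2 = AutoTokenizer.from_pretrained(model2)
# Tokenize the same sentence to see how XLNet splits it differently from BERT
print(tokenizer2.tokenize(sentence))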
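
BERT's tokenizer uses WordPiece: a word that is not in the vocabulary is split into subword pieces, and every piece after the first is prefixed with `##`. The sketch below illustrates the greedy longest-match-first lookup behind that behavior. `TOY_VOCAB` and `wordpiece_tokenize` are hypothetical names with a made-up vocabulary; the real tokenizer also lowercases text and splits on whitespace and punctuation before this step, and its vocabulary has about 30,000 entries.

```python
# Made-up miniature vocabulary (assumption, for illustration only)
TOY_VOCAB = {"[UNK]", "learn", "##ing", "token", "##ize", "##r", "##s", "model"}


def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into WordPiece-style pieces by repeatedly taking the
    longest vocabulary entry that matches at the current position."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches here, so the whole word maps to the unknown token
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


for word in ["learning", "tokenizers", "models", "xyz"]:
    print(word, "->", wordpiece_tokenize(word, TOY_VOCAB))
```

XLNet's tokenizer uses SentencePiece instead, which marks the start of a word with `▁` rather than marking continuations, so the same sentence yields differently shaped tokens.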




## 3. PyTorch Integration
Shows how to:
- Use models with PyTorch
- Perform inference using pre-trained models
- Handle tensors and model outputs



In [None]:
# 3. PyTorch Integration
# Import required modules for PyTorch integration
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Initialize DistilBERT tokenizer and model for sentiment analysis
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# Convert input to PyTorch tensors (reuses `sentence` defined in section 2)
input_ids_pt = tokenizer(sentence, return_tensors="pt")
print(input_ids_pt)

# Perform inference without gradient calculation
with torch.no_grad():
    logits = model(**input_ids_pt).logits

# Get the predicted class ID and map it to its human-readable label
predicted_class_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class_id])
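
The cell above reports only the winning label. To see how confident the model is, the logits can be turned into probabilities with a softmax. Below is a dependency-free sketch of that step. The logit values and the `id2label` mapping are illustrative assumptions rather than actual model output; in the notebook itself, `torch.softmax(logits, dim=-1)` performs the same computation on the real tensor.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for one sentence and a two-class label map (assumptions)
example_logits = [-4.0, 4.0]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

probs = softmax(example_logits)
predicted_id = max(range(len(probs)), key=probs.__getitem__)
print(id2label[predicted_id], probs[predicted_id])
```

Softmax preserves the ordering of the logits, so the argmax of the probabilities always matches the argmax of the raw logits; the probabilities only add a confidence estimate.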




## 4. Model Management
Demonstrates how to:
- Save models locally
- Load models from local storage
- Handle model configurations



In [None]:
# 4. Model Management
# Define directory for saving models
model_directory = "my_saved_models"

# Save both tokenizer and model to local directory
tokenizer.save_pretrained(model_directory)
model.save_pretrained(model_directory)

# Load saved model and tokenizer from local directory
my_tokenizer = AutoTokenizer.from_pretrained(model_directory)
my_model = AutoModelForSequenceClassification.from_pretrained(model_directory)
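
The save and load cell above does not show the configuration itself. As a rough sketch of how a configuration travels with a saved model, the example below builds a tiny, randomly initialized BERT classifier from a hand-written `BertConfig`, saves it to a temporary directory, and reloads it. All configuration values here are assumptions chosen only to keep the model small and to avoid any download; they do not correspond to a real checkpoint.

```python
import tempfile

import torch
from transformers import (
    AutoModelForSequenceClassification,
    BertConfig,
    BertForSequenceClassification,
)

# Hypothetical tiny configuration (illustrative values, not a real checkpoint)
tiny_config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

# Build a model with random weights directly from the configuration
tiny_model = BertForSequenceClassification(tiny_config)

with tempfile.TemporaryDirectory() as tmp_dir:
    # save_pretrained writes the weights together with config.json
    tiny_model.save_pretrained(tmp_dir)
    # from_pretrained reads config.json to rebuild the architecture, then loads the weights
    reloaded = AutoModelForSequenceClassification.from_pretrained(tmp_dir)

print(reloaded.config.id2label)
print(torch.equal(tiny_model.classifier.weight, reloaded.classifier.weight))
```

The same `config` attribute is available on `my_model` from the cell above, for example `my_model.config.id2label`.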



## Requirements
- transformers library
- PyTorch
- accelerate (needed for `device_map="auto"` in the text generation example)
- Sufficient disk space for model storage

## Usage Notes
- Models are downloaded on first use
- Some operations require significant memory
- Consider using GPU acceleration for larger models
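
To make the memory note more concrete, a rough lower bound for a model's weight memory is the parameter count multiplied by the bytes per parameter. The sketch below applies this to a nominal 1.3 billion parameters, a figure taken from the OPT-1.3B model name (the exact count differs slightly). `weight_memory_gib` is a hypothetical helper, and the estimate covers weights only, not activations or generation caches.

```python
# Bytes needed to store one parameter in each dtype
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Nominal parameter count implied by the model name (assumption)
nominal_params = 1_300_000_000

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(nominal_params, dtype):.2f} GiB")
```

Loading in bfloat16, as the text generation example does, halves the weight footprint relative to float32.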