
# 🧠 Large Language Model (LLM) Inference & Fine-Tuning

This notebook provides **code templates and checklists** for **running and fine-tuning large language models (LLMs) like GPT and BERT**.

### 🔹 What’s Covered:
- Running LLM inference with Hugging Face Transformers
- Fine-tuning a transformer model on custom data
- Using LoRA (Low-Rank Adaptation) for efficient fine-tuning
- Deploying fine-tuned models


In [None]:

# Ensure required libraries are installed (Uncomment if necessary)
# !pip install transformers datasets torch accelerate peft



## 🚀 Running LLM Inference with Hugging Face

✅ Load a **pretrained LLM** for text generation or classification.  
✅ Use the **pipeline** API for quick inference.  


In [None]:

from transformers import pipeline

# Load a GPT-2 text generation model
generator = pipeline("text-generation", model="gpt2")

# Generate up to 50 new tokens (max_length would also count the prompt tokens)
prompt = "Once upon a time,"
output = generator(prompt, max_new_tokens=50)
print(output[0]["generated_text"])
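
The same `pipeline` API also covers classification. A minimal sketch, assuming the public `distilbert-base-uncased-finetuned-sst-2-english` sentiment checkpoint, which this notebook does not otherwise use:

```python
from transformers import pipeline

# Assumption: this SST-2 sentiment checkpoint is chosen purely for illustration
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# The pipeline accepts a single string or a list of strings
results = classifier(["I loved this movie.", "The plot made no sense."])
for result in results:
    # Each result is a dict with a predicted label and a confidence score
    print(result["label"], round(result["score"], 3))
```

Any sequence-classification checkpoint from the Hub, or a local directory written by `save_pretrained`, can be passed as `model` instead.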



## 🏋️ Fine-Tuning a Transformer Model

✅ Prepare and tokenize a **custom dataset** for fine-tuning.  
✅ Use the Hugging Face **Trainer** for efficient fine-tuning.  


In [None]:

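from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

model_name = "distilbert-base-uncased"

# Load a dataset (example: IMDB sentiment analysis dataset)
dataset = load_dataset("imdb")

# Tokenize the raw text: the model needs input_ids/attention_mask, not strings
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

# Use small shuffled subsets for speed (fixed seed for reproducibility)
train_ds = dataset["train"].shuffle(seed=42).select(range(1000)).map(tokenize, batched=True)
eval_ds = dataset["test"].shuffle(seed=42).select(range(500)).map(tokenize, batched=True)

# Load a pretrained model for text classification
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # older transformers releases call this evaluation_strategy
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
)

# Create trainer; the collator pads each batch to its longest sequence
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=DataCollatorWithPadding(tokenizer),
)

# Fine-tune the model
trainer.train()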
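
As configured, the Trainer reports only the evaluation loss after each epoch. A minimal sketch of an accuracy metric that can be passed through Trainer's `compute_metrics` argument, assuming NumPy is installed and the model returns one logit per class:

```python
import numpy as np


def compute_metrics(eval_pred):
    """Turn (logits, labels) from an evaluation pass into an accuracy score."""
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit in each row
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}


# Pass it when constructing the trainer:
# trainer = Trainer(..., compute_metrics=compute_metrics)
```

With this in place, the per-epoch evaluation log should include an accuracy entry alongside the loss.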



## 🔬 Using LoRA (Low-Rank Adaptation) for Efficient Fine-Tuning

✅ Reduce **memory usage & training cost**.  
✅ Freeze the pretrained weights and train only small low-rank adapter matrices.  


In [None]:

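from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification, AutoTokenizer, DataCollatorWithPadding

# Start from a fresh base model so LoRA is not applied on top of the weights fine-tuned above
base_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
lora_tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Define LoRA config; DistilBERT names its query/value projections q_lin and v_lin
lora_config = LoraConfig(task_type=TaskType.SEQ_CLS, r=8, lora_alpha=32, target_modules=["q_lin", "v_lin"], lora_dropout=0.05)

# Apply LoRA to the model and report how many parameters remain trainable
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()

# Tokenize the same small subsets used for full fine-tuning
def lora_tokenize(batch):
    return lora_tokenizer(batch["text"], truncation=True)

lora_train = dataset["train"].shuffle(seed=42).select(range(1000)).map(lora_tokenize, batched=True)
lora_eval = dataset["test"].shuffle(seed=42).select(range(500)).map(lora_tokenize, batched=True)

# Train as usual using Trainer
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=lora_train,
    eval_dataset=lora_eval,
    data_collator=DataCollatorWithPadding(lora_tokenizer),
)

trainer.train()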
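
To see where the savings come from, it helps to count parameters for a single adapted layer. The sketch below is plain arithmetic, assuming a 768×768 projection (DistilBERT's hidden size) and the `r=8`, `lora_alpha=32` values from the config above; `lora_param_counts` is a hypothetical helper, not part of `peft`:

```python
def lora_param_counts(d_in, d_out, r):
    """Compare a full weight matrix with its rank-r LoRA adapter pair."""
    full = d_in * d_out        # weights updated by full fine-tuning
    lora = r * (d_in + d_out)  # A is (r x d_in), B is (d_out x r)
    return full, lora


full, lora = lora_param_counts(768, 768, r=8)
scaling = 32 / 8  # lora_alpha / r, the factor applied to the low-rank update

print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}  scaling: {scaling}")
# → full: 589,824  lora: 12,288  ratio: 2.08%  scaling: 4.0
```

Per adapted layer, the adapter trains roughly 2% as many weights as full fine-tuning; the base matrix stays frozen and the scaled product of the two small matrices is added to its output.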



## 🌍 Deploying Fine-Tuned Models

✅ Save the fine-tuned model for later use.  
✅ Upload to **Hugging Face Model Hub** or deploy via **FastAPI**.  


In [None]:

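from transformers import AutoTokenizer

# Save the fine-tuned model and its tokenizer to the same directory
model.save_pretrained("fine_tuned_model")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
tokenizer.save_pretrained("fine_tuned_model")

# For a LoRA model, save_pretrained on the PEFT wrapper writes only the adapter weights
# peft_model.save_pretrained("lora_adapter")  # hypothetical directory name

# Upload to Hugging Face Hub (Requires authentication)
# !huggingface-cli login
# model.push_to_hub("your-username/fine-tuned-model")
# tokenizer.push_to_hub("your-username/fine-tuned-model")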



## ✅ Best Practices & Common Pitfalls

- **Fine-tune efficiently**: Use **LoRA or quantization** to reduce memory usage.  
- **Use a small dataset first**: Avoid expensive full-scale fine-tuning on first runs.  
- **Monitor loss carefully**: Overfitting happens quickly on small datasets.  
- **Use mixed precision training**: `fp16` or `bf16` usually speeds up training and lowers memory use on supported GPUs, with little or no loss in accuracy.  
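
The "monitor loss" advice can be made concrete with a patience rule: stop once the evaluation loss has failed to improve for a set number of evaluations. The helper below is a simplified, hypothetical illustration in plain Python; for Trainer-based runs, `transformers` provides `EarlyStoppingCallback`, which takes an `early_stopping_patience` argument.

```python
def evals_since_best(eval_losses):
    """Number of evaluations that have passed since the lowest loss was seen."""
    best_index = min(range(len(eval_losses)), key=eval_losses.__getitem__)
    return len(eval_losses) - 1 - best_index


def should_stop(eval_losses, patience=2):
    """True once the loss has not improved for `patience` consecutive evaluations."""
    return bool(eval_losses) and evals_since_best(eval_losses) >= patience


# Eval loss falls, then climbs for two epochs in a row: a typical overfitting pattern
history = [0.70, 0.55, 0.58, 0.63]
print(should_stop(history[:3]), should_stop(history))
# → False True
```

A falling training loss paired with a rising evaluation loss, as in `history`, is the usual sign that a small dataset is being memorized.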
