## LLMs with Hugging Face

**Transformer Library Overview**
**Model Inference**
**Model Fine Tuning and Training**

Transformer Overview
Special Tokens, Examples
Attention
Tasks: Text Classification, Token Classification

https://huggingface.co/tasks 
https://huggingface.co/models
https://huggingface.co/datasets


**What is a Conditional Language Model?**

Conditional Language Model – also known as a Sequence-to-Sequence (Seq2Seq) model. **These models have an encoder-decoder architecture:**

The encoder processes the input sequence (e.g., a document, a question) and transforms it into a hidden representation.

The decoder then generates the output sequence (e.g., a summary, an answer), conditioned on that hidden state.

We used models such as T5 and BART, which are available as pretrained checkpoints for tasks like summarization, translation, and Q&A, and loaded them through Hugging Face’s AutoTokenizer and AutoModelForSeq2SeqLM to simplify integration.

Use case:
"In a recent project, the team was building an AI assistant that could not only answer user questions but also translate and summarize domain-specific content in real time. **The challenge was to generate output based on a specific input: the output wasn’t just a continuation of the input, as with GPT models, but had to be conditioned on the input itself.**

The task was to select and implement a model architecture that could handle such conditional tasks, where the output depends explicitly on the input, like translation (English → French), summarization, or question answering. It had to support sequence-to-sequence transformation and be fine-tunable for our domain."

In [1]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# AutoTokenizer is a factory class: from_pretrained loads the tokenizer that
# matches the pretrained checkpoint we name (here, t5-small).
tokenizer = AutoTokenizer.from_pretrained("t5-small")

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

input_text = "translate English to French: The weather is nice today."
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  




Le temps est agréable aujourd'hui.
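
The prefix at the start of `input_text` is how the original T5 checkpoints are told which task to perform. The sketch below shows one way to build such prompts. `TASK_PREFIXES` and `build_t5_prompt` are hypothetical names for this illustration and are not part of transformers; the prefix strings are the ones used by the original T5 checkpoints.

```python
# Hypothetical helper (not part of transformers): prepends the task prefix
# that the original T5 checkpoints expect at the start of the input text.
TASK_PREFIXES = {
    "en_to_fr": "translate English to French: ",
    "en_to_de": "translate English to German: ",
    "summarize": "summarize: ",
}


def build_t5_prompt(task: str, text: str) -> str:
    """Return the text with the T5 task prefix for `task` prepended."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    return TASK_PREFIXES[task] + text.strip()


prompt = build_t5_prompt("en_to_fr", "The weather is nice today.")
print(prompt)  # translate English to French: The weather is nice today.
```

The resulting string can be passed to the tokenizer exactly like `input_text` in the cell above.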


**"In short, Conditional Language Models (Seq2Seq) are used when the output depends directly on a specific input, unlike causal models, which simply continue the text they are given. They're ideal for summarization, translation, and question answering tasks where understanding and transforming input sequences is critical."**

In [2]:
from transformers import pipeline
model = pipeline(task="translation_en_to_fr")
answer = model("My name is Mike and i love machine learning")
print(answer)

No model was supplied, defaulted to google-t5/t5-base and revision a9723ea (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'translation_text': "Je m'appelle Mike et j'aime l'apprentissage par machine"}]


## Causal Language Models

### What are Causal Language Models?
"Causal Language Models are designed for autoregressive tasks, where the model generates the next word/token based only on prior context. They’re ideal for chatbots, code assistants, and story generation. Unlike seq2seq models, they don’t need a separate encoder: a decoder-only stack generates the output step by step."

### When to use them?
"In a GenAI project, we were developing an AI writing assistant that could autocomplete user input, generate stories, and provide intelligent code suggestions. The core requirement was a model that could predict the next token in a sequence given only the previous context, with no peeking ahead.

My task was to select and fine-tune a language model that could handle next-token prediction effectively. This was essential for features like autocomplete, chat generation, and coding assistance, where the model builds output token by token based only on what has already been said or written.

So I chose a Causal Language Model (CLM), specifically GPT-2, because it’s designed to predict the next token in a sequence given all previous tokens. CLMs use a unidirectional attention mechanism, meaning each token sees only earlier tokens during training and inference, which is ideal for autoregressive generation tasks."
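
To make the unidirectional attention idea concrete, here is a toy sketch of a causal mask in plain Python. It is only an illustration of the pattern: real models build this mask as a tensor and apply it to attention scores, and `causal_mask` is a hypothetical helper name.

```python
def causal_mask(seq_len):
    """Row i marks which positions token i may attend to (1) or not (0)."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


tokens = ["Once", "upon", "a", "time"]
for token, row in zip(tokens, causal_mask(len(tokens))):
    # Each token sees itself and everything before it, never later tokens.
    visible = [t for t, allowed in zip(tokens, row) if allowed]
    print(f"{token!r:8} attends to {visible}")
```

The mask is lower-triangular, which is what "no peeking ahead" means in practice.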

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
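# Pass the attention mask along with the input IDs, and set pad_token_id
# explicitly because GPT-2 has no padding token of its own.
outputs = model.generate(**inputs, max_length=50, pad_token_id=tokenizer.eos_token_id)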

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
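
Conceptually, `generate` in the cell above runs a loop: score the possible next tokens, pick one, append it, and repeat. The sketch below imitates that loop with greedy selection over a made-up lookup table. `NEXT_TOKEN_SCORES` and `generate_greedy` are hypothetical, and a real causal LM conditions on the whole prefix rather than only the last token.

```python
# Toy "model": maps the last token to candidate next tokens with scores.
# The table is made up for illustration only.
NEXT_TOKEN_SCORES = {
    "<bos>": {"once": 0.9, "the": 0.1},
    "once": {"upon": 0.8, "more": 0.2},
    "upon": {"a": 1.0},
    "a": {"time": 0.7, "hill": 0.3},
    "time": {"<eos>": 1.0},
}


def generate_greedy(prompt, max_new_tokens=10):
    """Append the highest-scoring next token until <eos> or the length limit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_SCORES.get(tokens[-1])
        if not candidates:
            break  # the toy table has no continuation for this token
        next_token = max(candidates, key=candidates.get)
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens


print(generate_greedy(["<bos>", "once"]))
# ['<bos>', 'once', 'upon', 'a', 'time']
```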


### What are Masked Language Models, and when to use them?

"Masked Language Models (MLMs) like BERT are pre-trained by masking random words in a sentence and predicting them using full bidirectional context. This makes them excellent for understanding tasks, such as classification, QA, and information extraction, where deep contextual understanding is key. Unlike causal models, MLMs are not meant for open-ended text generation; they are used to build representations of existing text."

**When to use a Masked Language Model**
"In one of our NLP pipeline projects, we needed to build a system that could understand the structure and meaning of customer feedback: classify sentiment, identify key entities, and detect intent. For this, we needed a model with a deep understanding of context, one that analyzes text rather than generating it.

The task was to select a language model that could learn bidirectional context from text. It had to understand not only the words that come before a token, but also the words that follow it. This was important for tasks like sentiment analysis and named entity recognition, where both past and future context matter.

We chose a Masked Language Model (MLM), specifically BERT (Bidirectional Encoder Representations from Transformers), because it is trained to predict missing tokens in a sentence by looking in both directions (left and right context)."
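
The masking step used in pre-training can be sketched in a few lines. This is a simplified illustration, not BERT's exact recipe: BERT selects roughly 15% of tokens and sometimes swaps in a random token or keeps the original instead of always inserting `[MASK]`. `mask_tokens` is a hypothetical helper name.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns (masked_tokens, labels). labels holds the original token at
    masked positions and None elsewhere, i.e. the positions the model
    would be trained to predict.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


sentence = "the capital of france is paris".split()
masked, labels = mask_tokens(sentence, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

Because every unmasked token on both sides stays visible, the model can use left and right context to fill each gap.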

In [4]:
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

sentence = "The capital of France is [MASK]."
inputs = tokenizer(sentence, return_tensors="pt")
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
predicted_word = tokenizer.decode(predicted_token_id)
print(f"📌 Prediction: {predicted_word}")


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


📌 Prediction: paris
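
The cell above keeps only the single best token via `argmax`. To see runner-up candidates, the logits can be turned into probabilities with a softmax and ranked. The sketch below does this in plain Python on made-up scores; the candidate words and logit values are hypothetical, and in the real cell the scores would come from `logits[0, mask_token_index]` over the whole vocabulary.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(candidates, logits, k=3):
    """Return the k most probable (candidate, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(candidates, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical scores for the [MASK] position, for illustration only.
candidates = ["paris", "lyon", "france", "london"]
logits = [9.1, 5.3, 4.8, 2.0]
for word, prob in top_k(candidates, logits):
    print(f"{word}: {prob:.3f}")
```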


### Hugging Face Libraries (most used)

Transformers: Provides pretrained models for NLP, computer vision, speech, and multimodal tasks.
Supports 1000+ models (like BERT, GPT-2, T5, BART, LLaMA, Whisper, etc.)
https://github.com/huggingface/transformers
https://huggingface.co/docs/transformers/index

Accelerate: A library that enables the same PyTorch code to run across any distributed configuration by adding just four lines of code.
https://github.com/huggingface/accelerate
https://huggingface.co/docs/accelerate/index

Evaluate: A library for easily evaluating machine learning models and datasets.
https://huggingface.co/docs/evaluate/index
https://github.com/huggingface/evaluate


### Hugging Face Transformers Library
Auto classes: AutoModel, AutoTokenizer, AutoModelForSequenceClassification, etc.

Can be used for task-specific models: summarization, translation, text generation, etc.

Supports both PyTorch and TensorFlow.

Install Python and pip, then:
pip install torch
pip install transformers


**Using Transformers Pipelines:**
Pipelines are the easiest way to start using Hugging Face Transformers for inference. They abstract away most of the code required to run a model, and you can use them for different tasks, e.g. text classification, sentiment analysis, etc.


In [None]:
from transformers import pipeline

# Without a model argument, the library downloads the current default model for
# the task (distilbert-base-uncased-finetuned-sst-2-english for sentiment-analysis)
# and warns that relying on the default is not recommended, because the default
# models for a task may change over time.
model = pipeline(task="sentiment-analysis")

# Naming the model explicitly avoids the warning and keeps results reproducible.
model = pipeline(model="distilbert-base-uncased-finetuned-sst-2-english")
answer = model("The weather is great today")
print(answer)
# The model labels the input sentence POSITIVE with a score of about 0.99.
# Several input sentences can also be passed to the model as a list.
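
For text classification, the pipeline returns a list with one dict per input, each holding a `label` and a `score`. The sketch below shows one way to post-process results of that shape. `confident_predictions`, the threshold, and the sample scores are hypothetical and only illustrate the structure.

```python
def confident_predictions(texts, results, threshold=0.9):
    """Pair each text with its label, flagging low-confidence predictions."""
    paired = []
    for text, result in zip(texts, results):
        label = result["label"] if result["score"] >= threshold else "UNCERTAIN"
        paired.append((text, label))
    return paired


texts = ["The weather is great today", "It was fine, I guess"]
# Hypothetical results in the shape the pipeline returns for a list input;
# the scores are made up for illustration.
results = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.62},
]
print(confident_predictions(texts, results))
# [('The weather is great today', 'POSITIVE'), ('It was fine, I guess', 'UNCERTAIN')]
```

With a real pipeline, `results` would simply be `model(texts)`.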

**Using Auto Classes:**
AutoTokenizer automatically loads the tokenizer for any specific model.

AutoModel automatically infers and loads the correct transformer architecture for a given model.

There are several AutoModel classes, one per task:
AutoModelForSeq2SeqLM: class used to load conditional language models
https://huggingface.co/docs/transformers/model_doc/auto#transformers.TFAutoModelForSeq2SeqLM


AutoModelForCausalLM: class used to load causal language models
https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM

AutoModelForSequenceClassification: class used to load classification models
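
One way to switch between these classes without touching the rest of the code is a small lookup from task to class name. This is a sketch under assumptions: the task labels and both helper functions are hypothetical, while the class names are the Auto classes from transformers. The loader imports transformers lazily, so only calling it requires the library and network access.

```python
# Task labels are hypothetical names for this sketch; the values are the
# names of Auto classes provided by transformers.
AUTO_CLASS_BY_TASK = {
    "seq2seq": "AutoModelForSeq2SeqLM",
    "causal": "AutoModelForCausalLM",
    "masked": "AutoModelForMaskedLM",
    "classification": "AutoModelForSequenceClassification",
}


def auto_class_name(task):
    """Return the name of the Auto class to use for a task label."""
    if task not in AUTO_CLASS_BY_TASK:
        raise ValueError(f"unsupported task: {task!r}")
    return AUTO_CLASS_BY_TASK[task]


def load_model_and_tokenizer(task, model_name):
    """Load a tokenizer and model for model_name (downloads on first use)."""
    import transformers  # imported here so the lookup works without it

    model_cls = getattr(transformers, auto_class_name(task))
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
    model = model_cls.from_pretrained(model_name)
    return tokenizer, model


print(auto_class_name("causal"))  # AutoModelForCausalLM
# Example call (needs transformers installed and network access):
# tokenizer, model = load_model_and_tokenizer("causal", "gpt2")
```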


 "I was working on a project to build a domain-aware chatbot that answers questions from a medical knowledge base. Since we needed to use a pretrained language model for encoding inputs and generating responses, the team decided to use Hugging Face Transformers due to its ease of integration and model flexibility."

 "My task was to load a Hugging Face model with minimal boilerplate, allowing dynamic switching between models like BERT, RoBERTa, or GPT-2 without changing much of the code. This also had to include proper tokenization of inputs to match the selected model."

 "To make the code model-agnostic and dynamic, I used AutoTokenizer and AutoModel from the Hugging Face transformers library. These classes automatically load the correct tokenizer and model architecture based on the model name or path."

In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Step 1: Load the tokenizer and model
# model_name = "t5-small"
model_name = "facebook/bart-large-cnn"

tokenizer = AutoTokenizer.from_pretrained(model_name)  # loads the tokenizer that matches the checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)  # loads a seq2seq model such as T5 or BART

# Step 2: Prepare the input text
# BART checkpoints take the raw text. T5 checkpoints expect a task prefix,
# so prepend "summarize: " if you switch model_name to t5-small.
input_text = "Transformers models by Hugging Face allow users to easily run state-of-the-art NLP models for tasks like summarization, translation, and question answering."

# Step 3: Tokenize the input (convert the text to model-readable token IDs)
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)

# Step 4: Perform inference (generate output token IDs from the input tokens)
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=50,
    num_beams=4,
    early_stopping=True
)

# Step 5: Decode the generated token IDs back to human-readable text
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("📌 Summary:", summary)
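
The `num_beams=4` argument above switches generation from greedy decoding to beam search, which keeps several candidate sequences alive at each step instead of one. The toy sketch below shows why that can help. The probability table and `beam_search` function are made up for illustration, and the real implementation in transformers also handles details such as length penalties and `early_stopping` that are left out here.

```python
import math

# Made-up next-token probabilities keyed by the previous token.
NEXT_PROBS = {
    "<bos>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "w": 0.1},
    "x": {"<eos>": 1.0},
    "y": {"<eos>": 1.0},
    "z": {"<eos>": 1.0},
    "w": {"<eos>": 1.0},
}


def beam_search(num_beams, max_length=5):
    """Return the best finished sequence and its log-probability."""
    beams = [(["<bos>"], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_length):
        expanded = []
        for tokens, score in beams:
            for token, prob in NEXT_PROBS[tokens[-1]].items():
                candidate = (tokens + [token], score + math.log(prob))
                if token == "<eos>":
                    finished.append(candidate)
                else:
                    expanded.append(candidate)
        if not expanded:
            break
        # Keep only the num_beams best partial sequences.
        beams = sorted(expanded, key=lambda c: c[1], reverse=True)[:num_beams]
    return max(finished, key=lambda c: c[1])


greedy_tokens, greedy_score = beam_search(num_beams=1)
beam_tokens, beam_score = beam_search(num_beams=2)
print(greedy_tokens, round(math.exp(greedy_score), 2))  # ['<bos>', 'a', 'x', '<eos>'] 0.3
print(beam_tokens, round(math.exp(beam_score), 2))      # ['<bos>', 'b', 'z', '<eos>'] 0.36
```

With one beam the search commits to "a" because it looks best locally; with two beams it keeps "b" around long enough to find the sequence with the higher overall probability.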


**AutoTokenizer**

AutoTokenizer is a class that automatically selects and loads the appropriate tokenizer for a given pre-trained model, simplifying the process of preparing text for model input.

In [7]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# Tokenize a sentence into subword tokens
text = "This is a sample sentence."
tokens = tokenizer.tokenize(text)
print(tokens)

# Convert tokens to IDs
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)

# Decode IDs back to text
decoded_text = tokenizer.decode(ids)
print(decoded_text)

['This', 'is', 'a', 'sample', 'sentence', '.']
[1188, 1110, 170, 6876, 5650, 119]
This is a sample sentence.
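
Note that `tokenize` followed by `convert_tokens_to_ids` does not add special tokens, whereas calling the tokenizer directly (`tokenizer(text)`) wraps the sequence in `[CLS]` and `[SEP]`. The sketch below imitates that wrapping by hand, assuming the usual BERT vocabulary where `[CLS]` is 101 and `[SEP]` is 102. `add_special_tokens` here is a hypothetical helper, not the library method.

```python
# IDs of [CLS] and [SEP] in the standard BERT vocabularies (an assumption for
# this sketch); check tokenizer.cls_token_id and tokenizer.sep_token_id for
# other models.
CLS_ID, SEP_ID = 101, 102


def add_special_tokens(ids):
    """Wrap a single sequence as [CLS] ... [SEP], as BERT expects."""
    return [CLS_ID] + list(ids) + [SEP_ID]


ids = [1188, 1110, 170, 6876, 5650, 119]  # IDs printed by the cell above
print(add_special_tokens(ids))
# [101, 1188, 1110, 170, 6876, 5650, 119, 102]
```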


**AutoModel**

In [9]:
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
print(model)

T5ForConditionalGeneration(
  (shared): Embedding(32128, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 512)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=512, out_features=512, bias=False)
              (k): Linear(in_features=512, out_features=512, bias=False)
              (v): Linear(in_features=512, out_features=512, bias=False)
              (o): Linear(in_features=512, out_features=512, bias=False)
              (relative_attention_bias): Embedding(32, 8)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=512, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=512, bias=False)
              (dropout): Drop

### Training and Fine-Tuning
Training a complex model on limited data can lead to overfitting, so it is usually better to fine-tune a pretrained model than to train one from scratch.

The Hugging Face team maintains example scripts to train/fine-tune models on several tasks. The scripts can be found at https://github.com/huggingface/transformers/tree/main/examples/pytorch

https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py -- used to train/fine-tune causal language models
https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py -- used to train/fine-tune masked language models
https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-classification/run_classification.py -- used to train/fine-tune text classification models

The following example fine-tunes GPT-2 on WikiText-2. We're using the raw WikiText-2 (no tokens were replaced before the tokenization). The loss here is that of causal language modeling.

python run_clm.py \
    --model_name_or_path openai-community/gpt2 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-clm


| Parameter | Description |
| --- | --- |
| `run_clm.py` | Script from the Hugging Face 🤗 Transformers examples repo used for causal language modeling (models like GPT-2). |
| `--model_name_or_path openai-community/gpt2` | Load the pretrained GPT-2 model from the Hugging Face Hub. You can also point this at a local directory if you've downloaded or fine-tuned a model. |
| `--dataset_name wikitext` | Use the wikitext dataset from the Hugging Face datasets library. |
| `--dataset_config_name wikitext-2-raw-v1` | Choose a specific configuration: wikitext-2-raw-v1 is the raw version of WikiText-2 (no `<unk>` or other token replacements). |
| `--per_device_train_batch_size 8` | Train with a batch size of 8 per GPU. With 2 GPUs, the effective batch size is 16. |
| `--per_device_eval_batch_size 8` | Evaluate with a batch size of 8 per GPU. |
| `--do_train` | Run the training phase. |
| `--do_eval` | Run evaluation on the validation set. |
| `--output_dir /tmp/test-clm` | Directory where the model checkpoints, logs, and metrics are saved. |
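
Two numbers come up often when reading these settings and the resulting metrics: the effective batch size and the perplexity. The sketch below shows the arithmetic for both. The function names are hypothetical, gradient accumulation is included as an optional factor that the command above does not use, and the loss value is made up to show the conversion.

```python
import math


def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Number of examples contributing to each optimizer step."""
    return per_device * num_devices * grad_accum_steps


def perplexity(eval_loss):
    """Perplexity is the exponential of the mean cross-entropy loss."""
    return math.exp(eval_loss)


print(effective_batch_size(per_device=8, num_devices=2))  # 16
# Hypothetical evaluation loss, only to show the conversion.
print(round(perplexity(3.0), 2))  # 20.09
```

Lower perplexity means the model assigns higher probability to the held-out text.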

### What are Tokenizers?
Imagine that you’re trying to teach a robot to understand and speak human languages. The first challenge you’d face is how to break down language into pieces the robot can digest. That’s where tokenizers come in.

Tokenizers dissect complex language into manageable pieces, transforming raw text into a structured form that AI models can easily process. This seemingly simple step is crucial, enabling machines to grasp the nuances of human communication.

Think of tokenizers as the chefs who chop ingredients before a meal is cooked. Without this step, preparing complex dishes (or understanding complex sentences) would be much harder.

Through tokenization, AI systems can recognize patterns, understand context, and generate responses that are increasingly similar to human interaction.

By breaking down the complexities of language into digestible bits, tokenizers not only enhance AI’s linguistic capabilities but also pave the way for more intuitive, efficient, and accurate machine learning models.
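
As a concrete picture of the "chopping" step, here is a toy subword tokenizer that uses greedy longest-match-first splitting, the idea behind the WordPiece scheme used by BERT. The vocabulary is made up for illustration, and `wordpiece` is a simplified sketch that leaves out the normalization and pre-tokenization a real tokenizer performs.

```python
# Tiny made-up vocabulary; "##" marks a piece that continues a word.
VOCAB = {"token", "##izer", "##s", "un", "##break", "##able", "the"}


def wordpiece(word, vocab=VOCAB, unk="[UNK]"):
    """Split one word into the longest matching vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## marker
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


print(wordpiece("tokenizers"))   # ['token', '##izer', '##s']
print(wordpiece("unbreakable"))  # ['un', '##break', '##able']
print(wordpiece("xyz"))          # ['[UNK]']
```

Splitting rare words into known pieces lets a fixed-size vocabulary cover text it has never seen as whole words.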

### Test Your Understanding

1) What is a language model, and how is it evaluated?
2) What are the primary differences between the Hugging Face Transformers, Datasets, and Tokenizers libraries, and how do they integrate to streamline NLP workflows?
3) Describe how to use Hugging Face Pipelines for end-to-end inference. What types of NLP tasks can pipelines handle, and what are the main advantages of using them?
4) How does Hugging Face's Accelerate library improve model training, and what challenges does it address in scaling NLP models across different hardware setups?
5) How does Hugging Face's transformers library facilitate transfer learning, and what are the typical steps for fine-tuning a pre-trained model on a custom dataset?