<a href="https://colab.research.google.com/github/Mahemaran/Colab-notebooks/blob/main/Transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Transformers**
* A transformer is a deep learning model that processes sequences of data (like sentences) and focuses on relevant parts of the input using a mechanism called self-attention.
* Input: Text, like sentences or paragraphs.
* Output: Predictions (e.g., next words, sentiment, answers to questions).

**Why Transformers Are Important**
* Parallel Processing: Unlike RNNs (Recurrent Neural Networks), which read a sequence one token at a time, transformers process all tokens of a sequence in parallel, which speeds up training.
* Attention Mechanism: Self-attention lets each word weigh every other word in the input, so the model can relate words regardless of how far apart they are.
* Versatility: Transformers can perform many tasks, such as:
  * Text generation
  * Text classification
  * Named Entity Recognition (NER)
  * Machine Translation
  * Question Answering
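The self-attention idea above can be made concrete with a small plain-Python sketch. It assumes the standard single-head scaled dot-product form, softmax(Q·Kᵀ / √d_k)·V, where Q, K and V are the input vectors multiplied by projection matrices. The toy vectors and identity projections are made up for illustration; real models learn these weights, use many attention heads, and run on tensor libraries such as PyTorch.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Nested-list matrix product: (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a list of token vectors.

    Returns (outputs, weights); weights[i][j] is how much token i attends to token j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # seq_len x seq_len similarity scores
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights  # each output is a weighted mix of value vectors

# Toy input: 3 tokens with 2-dimensional embeddings (illustrative values only)
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projection matrices
out, attn = self_attention(x, identity, identity, identity)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` sums to 1 and says how strongly one token attends to every token in the sequence. The rows do not depend on each other, which is what allows all positions to be computed in parallel.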

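**Transformer Architecture**
* The original transformer model has two main parts:
  * Encoder: Processes the input sequence and builds a contextual representation (features) of each token.
  * Decoder: Generates the output sequence one token at a time (used in tasks like translation and text generation).
* Many models use only one part: BERT is encoder-only, GPT-2 is decoder-only, and T5 uses both.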
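Because the decoder generates its output left to right, its self-attention is masked so that position i can only attend to positions 0 through i; otherwise it could "peek" at tokens it has not produced yet. Decoder-only models such as GPT-2 rely on this causal mask. The sketch below assumes uniform toy scores so the effect of the mask is easy to see.

```python
import math

def causal_mask(seq_len):
    # mask[i][j] is True when position i may attend to position j (only j <= i)
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def masked_softmax(scores, mask_row):
    # Disallowed (future) positions get -inf, so they receive exactly zero weight
    masked = [s if allowed else -math.inf for s, allowed in zip(scores, mask_row)]
    m = max(masked)  # finite, because every position may attend to itself
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Uniform toy scores: without a mask, every token would attend equally to all tokens
seq_len = 4
scores = [[0.0] * seq_len for _ in range(seq_len)]
mask = causal_mask(seq_len)
weights = [masked_softmax(row, mask_row) for row, mask_row in zip(scores, mask)]
for row in weights:
    print([round(w, 2) for w in row])
```

The first token can only see itself, the second splits its attention between two tokens, and so on, so the weight matrix is lower-triangular.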

**Simplified Diagram:**
```
Input Text → Embedding + Positional Encoding → Encoder → Decoder → Output

```
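Self-attention on its own ignores word order, so the positional encoding step in the diagram adds position information to each token embedding. The sketch below implements the sinusoidal encoding from the original 2017 transformer paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). This is only one option: BERT and GPT-2 learn their position embeddings instead. The tiny sizes and the constant token embedding are illustrative assumptions.

```python
import math

def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency: sin on even, cos on odd
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = sinusoidal_positional_encoding(num_positions=4, d_model=6)

# The encoding is added element-wise to the token embedding at the same position
token_embedding = [0.1] * 6  # hypothetical embedding for the token at position 2
position_2_input = [e + p for e, p in zip(token_embedding, pe[2])]
print([round(v, 3) for v in pe[0]])
print([round(v, 3) for v in position_2_input])
```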

### **Text Generation with GPT-2**

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load pre-trained GPT-2 model and tokenizer
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Input prompt (seed text)
seed_text = "The future of artificial intelligence is"

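# Tokenize the input; the result holds both input_ids and attention_mask
inputs = tokenizer(seed_text, return_tensors="pt")

# Generate text
output = model.generate(
    **inputs,             # Input IDs plus attention mask (avoids the missing-mask warning)
    max_length=50,        # Maximum total length in tokens, prompt included
    temperature=0.7,      # Control randomness (lower = more focused)
    top_p=0.9,            # Nucleus sampling: sample only from the most likely tokens covering 90% of the probability
    do_sample=True,       # Sample instead of always picking the most likely token
    pad_token_id=tokenizer.eos_token_id  # GPT-2 has no pad token, so reuse the end-of-sequence token
)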

# Decode and print the output
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated Text:")
print(generated_text)


Generated Text:
The future of artificial intelligence is now more complex than ever, and many of us are already moving beyond the capabilities of computers to become more capable. For example, the use of artificial intelligence in medicine has become more widely accepted than ever before. However,
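The `temperature` and `top_p` arguments in the call above reshape the model's next-token distribution before a token is sampled. The sketch below applies both to a single made-up distribution so their effect is visible: temperature divides the logits (values below 1 sharpen the distribution), and top-p keeps only the smallest set of most likely tokens whose probabilities add up to at least `top_p`, then renormalizes. The vocabulary and logits are hypothetical, and the library's own implementation differs in details such as tie handling and a minimum number of kept tokens.

```python
import math

def apply_temperature(logits, temperature):
    # Dividing logits by T < 1 sharpens the distribution; T > 1 flattens it
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability
    reaches top_p, then renormalize. Returns {token_index: probability}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Hypothetical next-token logits for a 5-word vocabulary
vocab = ["now", "bright", "uncertain", "here", "banana"]
logits = [3.0, 2.5, 2.0, 1.0, -2.0]

probs = apply_temperature(logits, temperature=0.7)
nucleus = top_p_filter(probs, top_p=0.9)
print({vocab[i]: round(p, 3) for i, p in nucleus.items()})
```

With these values the unlikely tokens "here" and "banana" are dropped before sampling, which is why nucleus sampling tends to avoid incoherent continuations while still allowing some variety.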


### **Fine-Tuning BERT**

In [None]:
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
import torch

# Load the tokenizer and model
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)  # Binary classification

# Load the dataset (example: IMDB sentiment analysis; labels 0 = negative, 1 = positive)
dataset = load_dataset("imdb")

# Tokenize: truncate or pad every review to exactly 128 tokens
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

# Take small shuffled subsets for quick training, then tokenize only those
train_dataset = dataset["train"].shuffle(seed=42).select(range(5000)).map(tokenize_function, batched=True)
test_dataset = dataset["test"].shuffle(seed=42).select(range(1000)).map(tokenize_function, batched=True)

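# Report accuracy in addition to the evaluation loss
def compute_metrics(eval_pred):
    predictions = eval_pred.predictions.argmax(axis=-1)  # logits -> predicted class IDs
    return {"accuracy": float((predictions == eval_pred.label_ids).mean())}

# Define Training Arguments
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # called evaluation_strategy in older transformers releases
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    learning_rate=2e-5,
)

# Define the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
)

# Train the model
trainer.train()

# Evaluate the model (reports eval_loss and eval_accuracy, plus runtime stats)
results = trainer.evaluate()
print("Evaluation Results:", results)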

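# Test the model with new input
model.eval()  # Disable dropout for inference
test_text = "The movie was fantastic! I loved it."
inputs = tokenizer(test_text, return_tensors="pt", truncation=True, padding="max_length", max_length=128)
inputs = inputs.to(model.device)  # Trainer may have moved the model to the GPU
with torch.no_grad():
    output = model(**inputs)
predicted_class = torch.argmax(output.logits, dim=1).item()

# IMDB labels: 0 = negative, 1 = positive
print("Predicted Class:", "Positive" if predicted_class == 1 else "Negative")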
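The tokenizer calls above use `truncation=True, padding="max_length", max_length=128`. The sketch below mimics what that does to one sequence of token IDs and shows where the `attention_mask` comes from. It is a simplification: the token IDs are hypothetical, the padding ID 0 is assumed (it matches `[PAD]` in `bert-base-uncased`), and the real tokenizer also inserts and preserves the `[CLS]`/`[SEP]` special tokens, which this sketch ignores.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Mimic padding="max_length" + truncation=True for one sequence (special tokens ignored).

    Returns (input_ids, attention_mask), both exactly max_length long.
    """
    ids = token_ids[:max_length]   # truncation: drop tokens beyond max_length
    mask = [1] * len(ids)          # 1 = real token the model should attend to
    padding = max_length - len(ids)
    # 0 in the mask marks padding, which attention ignores
    return ids + [pad_id] * padding, mask + [0] * padding

# Hypothetical token IDs for a short and a long input
short_ids, short_mask = pad_or_truncate([11, 12, 13], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(1, 11)), max_length=6)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Fixed-length padding makes every example the same shape so they can be batched; the mask keeps the padding from influencing the prediction.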


### **Question Answering (Using BERT)**

In [2]:
from transformers import pipeline

# Load pre-trained QA pipeline
qa_pipeline = pipeline("question-answering", model="bert-large-uncased-whole-word-masking-finetuned-squad")

# Context and question
context = """The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Gustave Eiffel, whose company designed and built the tower."""
question = "Who designed the Eiffel Tower?"

# Get the answer
answer = qa_pipeline(question=question, context=context)

print("Answer:", answer['answer'])

Some weights of the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Answer: Gustave Eiffel
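An extractive QA model like this one scores every token twice: once as a possible answer start and once as a possible answer end. The pipeline then picks a high-scoring valid span and maps it back to the context text. The sketch below shows only the span-selection step under simplifying assumptions: the tokens and logits are made up, and the real pipeline also normalizes the scores, restricts spans to context tokens, and handles long contexts.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Pick the (start, end) token span maximizing start_logits[s] + end_logits[e],
    subject to s <= e and a span length of at most max_answer_len tokens."""
    best, best_score = None, float("-inf")
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best, best_score

# Toy context tokens and hypothetical logits (a real model outputs one start and one end logit per token)
tokens = ["named", "after", "the", "engineer", "gustave", "eiffel", ","]
start_logits = [0.1, 0.0, 0.3, 1.0, 4.0, 0.5, -1.0]
end_logits = [0.0, 0.2, 0.1, 0.4, 1.5, 3.5, -0.5]

(span_start, span_end), score = best_span(start_logits, end_logits)
print(" ".join(tokens[span_start:span_end + 1]), score)
```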


### **Text Summarization (Using T5)**

In [3]:
from transformers import pipeline

# Load summarization pipeline
summarizer = pipeline("summarization", model="t5-small")

# Input text
text = """
Transformers have revolutionized natural language processing. Introduced in 2017 by Vaswani et al., the Transformer architecture uses self-attention mechanisms to process text data efficiently and in parallel.
Unlike RNNs, Transformers do not process data sequentially, making them faster to train and more versatile.
"""

# Summarize text
summary = summarizer(text, max_length=50, min_length=20, do_sample=False)
print("Summary:", summary[0]['summary_text'])

Summary: the Transformer architecture uses self-attention mechanisms to process text data efficiently and in parallel . unlike RNNs, Transformers do not process data sequentially .
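The `max_length` and `min_length` arguments bound the length of the generated summary, and `do_sample=False` makes decoding deterministic. The sketch below illustrates the two length limits with a toy greedy decoder: the end-of-sequence token is blocked until `min_length` tokens exist, and decoding stops once `max_length` is reached. The scoring function and its vocabulary are hypothetical; the real pipeline counts subword tokens rather than words, and may use beam search instead of greedy decoding depending on the model's generation settings.

```python
EOS = "</s>"

def greedy_decode(next_token_scores, min_length, max_length):
    """Toy greedy decoder; next_token_scores(prefix) returns {token: score}."""
    output = []
    while len(output) < max_length:           # max_length: hard stop
        scores = dict(next_token_scores(output))
        if len(output) < min_length:
            scores.pop(EOS, None)             # min_length: forbid ending too early
        token = max(scores, key=scores.get)   # greedy: take the highest-scoring token
        if token == EOS:
            break
        output.append(token)
    return output

# Hypothetical "model" that prefers to stop after two words
def toy_scores(prefix):
    words = ["transformers", "use", "self-attention", "in", "parallel"]
    if len(prefix) >= 2:
        return {EOS: 2.0, words[len(prefix) % len(words)]: 1.0}
    return {EOS: 0.5, words[len(prefix)]: 1.0}

print(greedy_decode(toy_scores, min_length=0, max_length=10))
print(greedy_decode(toy_scores, min_length=4, max_length=10))
```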


### **Named Entity Recognition (NER)**

In [None]:
from transformers import pipeline

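# Load NER pipeline; aggregation_strategy="simple" merges word pieces into whole entities
ner_pipeline = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english", aggregation_strategy="simple")

# Input text
text = "Elon Musk is the CEO of SpaceX, which is headquartered in California."

# Perform NER; with aggregation, each result has an 'entity_group' (e.g. PER, ORG, LOC)
entities = ner_pipeline(text)
for entity in entities:
    print(f"{entity['word']} → {entity['entity_group']} ({entity['score']:.2f})")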
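Without aggregation, the `ner` pipeline returns one prediction per word piece, with labels such as `I-PER` or `B-ORG`. The sketch below shows the general idea behind merging those into whole entities: glue `##` word pieces back together and join consecutive tokens of the same type, starting a new entity on a `B-` label or after an `O`. It is a simplified illustration, not how the library's `aggregation_strategy` options are implemented, and the predictions are made up.

```python
def group_entities(token_preds):
    """Merge per-token (word_piece, label) predictions into entity spans.

    Labels look like "B-PER", "I-ORG" or "O"; pieces starting with "##" continue
    the previous word (BERT WordPiece convention).
    """
    entities = []
    previous_type = None  # entity type of the previous token, None after an "O"
    for piece, label in token_preds:
        if label == "O":
            previous_type = None
            continue
        prefix, etype = label.split("-", 1)
        if piece.startswith("##") and entities and previous_type == etype:
            entities[-1]["word"] += piece[2:]                 # glue word piece onto previous word
        elif prefix == "I" and previous_type == etype:
            entities[-1]["word"] += " " + piece               # continue the same entity
        else:
            entities.append({"word": piece, "type": etype})   # start a new entity
        previous_type = etype
    return entities

# Hypothetical per-token output for the example sentence
preds = [("Elon", "I-PER"), ("Musk", "I-PER"), ("is", "O"), ("the", "O"), ("CEO", "O"),
         ("of", "O"), ("Space", "I-ORG"), ("##X", "I-ORG"), (",", "O"), ("which", "O"),
         ("is", "O"), ("headquartered", "O"), ("in", "O"), ("California", "I-LOC"), (".", "O")]
for ent in group_entities(preds):
    print(ent["word"], "→", ent["type"])
```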