# T5 model

In [1]:
from transformers import T5ForConditionalGeneration, T5Tokenizer, Trainer, TrainingArguments


## T5 Tokenizer

In [None]:
tokenizer = T5Tokenizer.from_pretrained("t5-small")

This class handles tokenization for T5 models. Tokenization converts raw text into sequences of integer token IDs that the model can process.
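As a minimal sketch of how the tokenizer might be applied to summarization data, the helper below encodes article/summary pairs into fixed-length ID lists. The function name `encode_examples`, the `"summarize: "` prefix, and the two maximum lengths are illustrative assumptions and are not taken from this notebook.

```python
def encode_examples(tokenizer, articles, summaries,
                    max_source_len=512, max_target_len=64):
    """Tokenize (article, summary) pairs into lists of token IDs.

    `tokenizer` is expected to behave like a Hugging Face tokenizer;
    the length limits are assumed values, not tuned ones.
    """
    # T5 selects the task through a plain-text prefix on the input.
    sources = ["summarize: " + text for text in articles]

    # Encode the inputs, padding or truncating each one to a fixed length.
    enc = tokenizer(sources, max_length=max_source_len,
                    truncation=True, padding="max_length")

    # Encode the targets; `text_target` marks them as label text.
    tgt = tokenizer(text_target=summaries, max_length=max_target_len,
                    truncation=True, padding="max_length")

    # Replace padding IDs in the labels with -100 so the loss skips them.
    pad_id = tokenizer.pad_token_id
    labels = [[tok if tok != pad_id else -100 for tok in seq]
              for seq in tgt["input_ids"]]

    return {"input_ids": enc["input_ids"],
            "attention_mask": enc["attention_mask"],
            "labels": labels}
```

With the `tokenizer` created above, `encode_examples(tokenizer, ["some article text"], ["its summary"])` would return a dictionary with `input_ids`, `attention_mask`, and `labels`.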

## Training Arguments

In [None]:
training_args = TrainingArguments(
    output_dir="D:/Data science Notes/Python/Projects/Text summarization/results",          # output directory for checkpoints
    num_train_epochs=6,              # number of training epochs
    per_device_train_batch_size=8,   # batch size per device during training
    per_device_eval_batch_size=8,    # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir="D:/Data science Notes/Python/Projects/Text summarization/logs",            # directory for storing logs
    logging_steps=50,                # how often to log training info
    save_steps=500,                  # how often to save a model checkpoint
    eval_steps=50,                   # evaluation interval in steps; only used when eval_strategy="steps"
    eval_strategy="epoch",           # run evaluation at the end of every epoch
)

Used Arguments:

output_dir:
    Purpose: Specifies the directory where the trained model, checkpoints, and other training artifacts will be saved.
    Value: "D:/Data science Notes/Python/Projects/Text summarization/results"

num_train_epochs:
    Purpose: Determines the total number of times the entire training dataset will be passed through the model during training.
    Value: 6 (meaning the model will be trained for 6 epochs)

per_device_train_batch_size:
    Purpose: Controls the number of samples processed by the model in a single training step on each device (e.g., GPU).
    Value: 8 (meaning 8 samples will be processed together in each training step)

per_device_eval_batch_size:
    Purpose: Similar to per_device_train_batch_size, but for the evaluation phase.
    Value: 8

warmup_steps:
    Purpose: Used with learning rate schedulers. During the warmup phase, the learning rate gradually increases from a very low value to its maximum. This helps stabilize training.
    Value: 500 (meaning the learning rate will increase gradually for the first 500 training steps)

weight_decay:
    Purpose: A regularization technique that penalizes large weights in the model, which helps reduce overfitting.
    Value: 0.01 (a small value for weight decay)

logging_dir:
    Purpose: Specifies the directory where training logs will be saved (e.g., for TensorBoard).
    Value: "D:/Data science Notes/Python/Projects/Text summarization/logs"

logging_steps:
    Purpose: Controls how often training information (e.g., loss, learning rate) is logged.
    Value: 50 (meaning logs will be printed every 50 training steps)

save_steps:
    Purpose: Determines how often model checkpoints are saved during training.
    Value: 500 (meaning a checkpoint will be saved every 500 training steps)

eval_steps:
    Purpose: Controls how often (in training steps) the model is evaluated on the validation set when eval_strategy="steps".
    Value: 50 (has no effect in this configuration, because eval_strategy="epoch" runs evaluation at the end of each epoch instead)

eval_strategy:
    Purpose: Controls when evaluation should occur.
    Value: "epoch" (meaning evaluation will be performed at the end of each epoch)
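To make the step-based values above concrete, here is a rough sketch of how the total number of training steps and the warmup schedule relate. It assumes a linear warmup followed by linear decay, a peak learning rate of 5e-5, a single device, and a hypothetical dataset of 10,000 examples; none of these figures come from this notebook.

```python
import math


def total_training_steps(num_examples, batch_size, num_epochs):
    """Optimizer steps on one device with no gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


def linear_warmup_lr(step, peak_lr, warmup_steps, total_steps):
    """Learning rate under linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up proportionally to the number of completed steps.
        return peak_lr * step / max(1, warmup_steps)
    # Decay proportionally to the number of remaining steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


NUM_EXAMPLES = 10_000  # hypothetical dataset size
PEAK_LR = 5e-5         # assumed peak learning rate

total = total_training_steps(NUM_EXAMPLES, batch_size=8, num_epochs=6)
checkpoints = total // 500   # checkpoints written with save_steps=500
log_events = total // 50     # log entries written with logging_steps=50

print("total steps:", total)
print("checkpoints:", checkpoints, "log events:", log_events)
for step in (0, 250, 500, 4000, total):
    lr = linear_warmup_lr(step, PEAK_LR, warmup_steps=500, total_steps=total)
    print(step, lr)
```

Under these assumptions the 500 warmup steps cover a small fraction of the run. With a much smaller dataset they could span most of training, so warmup_steps is worth checking against the actual step count.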

Other Available Arguments:

learning_rate: The initial learning rate for the optimizer.

eval_strategy: Can also be set to "steps" or "no" (older Transformers releases name this argument evaluation_strategy).

dataloader_num_workers: Number of subprocesses used for data loading.

seed: Random seed for reproducibility.

fp16: Whether to use mixed precision training.

gradient_accumulation_steps: Accumulate gradients over multiple steps before performing an optimizer step.

Many more: refer to the official Hugging Face Transformers documentation for the complete list of available arguments.
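As a small illustration of what gradient_accumulation_steps changes, the sketch below computes the effective batch size and an approximate number of optimizer steps per epoch. The helper names and the dataset size are hypothetical, and the exact rounding at epoch boundaries may differ from what the Trainer does internally.

```python
import math


def effective_batch_size(per_device_batch_size,
                         gradient_accumulation_steps=1,
                         num_devices=1):
    """Number of samples that contribute to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


def approx_optimizer_steps_per_epoch(num_examples, per_device_batch_size,
                                     gradient_accumulation_steps=1,
                                     num_devices=1):
    """Approximate optimizer steps needed to see every example once."""
    per_step = effective_batch_size(per_device_batch_size,
                                    gradient_accumulation_steps,
                                    num_devices)
    return math.ceil(num_examples / per_step)


# Batch size 8 with 4 accumulation steps behaves like a batch of 32.
print(effective_batch_size(8, gradient_accumulation_steps=4))
# Hypothetical dataset of 10,000 examples.
print(approx_optimizer_steps_per_epoch(10_000, 8, gradient_accumulation_steps=4))
```

Accumulation trades more forward/backward passes per update for a larger effective batch, which can help when memory limits the per-device batch size.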

## T5ForConditionalGeneration



Purpose: This class represents the T5 model architecture specifically designed for conditional text generation.

Key Features:

Encoder-Decoder Architecture: T5 employs a powerful encoder-decoder structure, where the encoder processes the input sequence and the decoder generates the output sequence.

Text-to-Text Framework: T5 treats all NLP tasks as text-to-text problems, such as:

Translation: "translate English to French: The quick brown fox ..." -> "Le renard brun rapide ..."

Summarization: "summarize: The article discusses ..." -> "The article is about ..."

Question Answering: "question: What is the capital of France? context: ..." -> "Paris"
  
Pre-trained Models: Hugging Face provides pre-trained T5 models (e.g., "t5-small", "t5-base", "t5-large") with varying sizes and capabilities.
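The text-to-text framing above can be sketched as a small helper that turns a task and its fields into a single input string. The function name `to_text_to_text` and its keyword arguments are hypothetical; only the prefix formats follow the examples listed above.

```python
def to_text_to_text(task, **fields):
    """Build a T5-style input string for a given task (illustrative only)."""
    if task == "translate":
        return (f"translate {fields['source_lang']} to "
                f"{fields['target_lang']}: {fields['text']}")
    if task == "summarize":
        return f"summarize: {fields['text']}"
    if task == "qa":
        return f"question: {fields['question']} context: {fields['context']}"
    raise ValueError(f"unknown task: {task!r}")


print(to_text_to_text("translate", source_lang="English",
                      target_lang="French", text="The quick brown fox"))
print(to_text_to_text("summarize", text="The article discusses ..."))
print(to_text_to_text("qa", question="What is the capital of France?",
                      context="Paris is the capital of France."))
```

Because every task becomes a string-in, string-out problem, the same model, loss, and decoding code can be reused across tasks.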

In [None]:
model = T5ForConditionalGeneration.from_pretrained("t5-small")
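Once a model and tokenizer are loaded, generating a summary might look like the sketch below. The helper name `summarize`, the 512-token input limit, and the beam-search settings are assumptions chosen for illustration, not settings used elsewhere in this notebook.

```python
def summarize(model, tokenizer, text, max_new_tokens=60, num_beams=4):
    """Generate a summary for `text` with a T5-style model and tokenizer."""
    # Prefix the input so the model knows which task to perform.
    inputs = tokenizer("summarize: " + text, return_tensors="pt",
                       truncation=True, max_length=512)

    # Decode with beam search; returns one ID sequence per input.
    output_ids = model.generate(**inputs,
                                max_new_tokens=max_new_tokens,
                                num_beams=num_beams)

    # Convert the generated IDs back to text, dropping pad/EOS tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the objects defined above this would be called as `summarize(model, tokenizer, "some long article text")`. Output quality from the untuned "t5-small" checkpoint is likely to be limited until the model has been fine-tuned.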

## Trainer

Purpose: This class simplifies the training process for various Transformer models, including T5. It handles many training-related tasks, such as:

Data Loading and Preparation: Wraps the training and evaluation datasets in data loaders and handles batching and shuffling. The datasets themselves must already yield tokenized examples.

Training Loop: Manages the training loop, including forward and backward passes, optimization, and logging.

Evaluation: Performs evaluation on the validation set during training.

Logging and TensorBoard: Logs training metrics such as loss and learning rate (plus any custom metrics returned by a compute_metrics function) and can optionally report them to TensorBoard for visualization.

Checkpointing: Saves model checkpoints at regular intervals, allowing you to resume training or fine-tune from a specific point.
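The `Trainer` call below uses `train_dataset` and `val_dataset`, which are not defined in this notebook. One way they might be built from already tokenized data is sketched here. The class `SummarizationDataset` and the helper `train_val_split` are hypothetical names, and the sketch assumes a plain class with `__len__` and `__getitem__` is accepted; subclassing `torch.utils.data.Dataset` is the more conventional choice.

```python
class SummarizationDataset:
    """Map-style dataset over a dict of equally long lists."""

    def __init__(self, encodings):
        # Expected keys: "input_ids", "attention_mask", "labels".
        self.encodings = encodings

    def __len__(self):
        return len(self.encodings["input_ids"])

    def __getitem__(self, idx):
        # Return one example as a dict, the format the Trainer batches.
        return {key: values[idx] for key, values in self.encodings.items()}


def train_val_split(encodings, val_fraction=0.1):
    """Hold out the last `val_fraction` of the examples for validation."""
    n = len(encodings["input_ids"])
    n_val = max(1, int(n * val_fraction))
    split = n - n_val
    train = {k: v[:split] for k, v in encodings.items()}
    val = {k: v[split:] for k, v in encodings.items()}
    return SummarizationDataset(train), SummarizationDataset(val)
```

The split here is positional, so the examples should be shuffled beforehand if their order carries any structure (for example, sorted by date or source).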

In [None]:
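trainer = Trainer(
    model=model,                  # the T5 model loaded above
    args=training_args,           # the TrainingArguments defined above
    train_dataset=train_dataset,  # tokenized training set (must be defined beforehand)
    eval_dataset=val_dataset,     # tokenized validation set (must be defined beforehand)
)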

# Train the model
trainer.train()