## Import Libraries

In [1]:
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

* torch: PyTorch is used as the underlying deep learning framework.
* T5Tokenizer and T5ForConditionalGeneration are classes from Hugging Face's transformers library.
    * T5Tokenizer handles the tokenization of input text. It is SentencePiece-based, so the sentencepiece package must be installed.
    * T5ForConditionalGeneration is the model class for generating sequences using T5.

## Initialize Models

In [2]:
model_name1 = "t5-small"
model_name2 = "t5-base"
model_name3 = "t5-large"

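This cell only stores the Hub names of three pretrained T5 checkpoints; the tokenizers and models themselves are loaded in the following cells:
* t5-small: The smallest of the three, with roughly 60 million parameters.
* t5-base: A medium-sized version, with roughly 220 million parameters.
* t5-large: The largest of the three, with roughly 770 million parameters.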

## Initialize Tokenizers

In [3]:
tokenizer1 = T5Tokenizer.from_pretrained(model_name1)
tokenizer2 = T5Tokenizer.from_pretrained(model_name2)
tokenizer3 = T5Tokenizer.from_pretrained(model_name3)

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


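Each checkpoint (small, base, large) gets its own tokenizer, loaded with from_pretrained. The warnings in the output concern the default 512-token limit: the library advises setting `model_max_length` or passing `max_length` explicitly rather than relying on automatic truncation.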
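
The warning's advice can be followed by making the length limit explicit at encoding time. The following is a minimal sketch, assuming a tokenizer object that follows the Hugging Face calling convention; the helper name `encode_with_limit` is hypothetical and not part of the original notebook.

```python
def encode_with_limit(tokenizer, text, max_length=512):
    """Encode `text` with an explicit length limit.

    `tokenizer` is expected to be a Hugging Face tokenizer such as
    tokenizer2 from the cell above. The helper name is hypothetical.
    """
    return tokenizer(
        text,
        max_length=max_length,  # explicit upper bound on the token count
        truncation=True,        # cut off anything beyond max_length
        return_tensors="pt",    # PyTorch tensors, as in the rest of the notebook
    )


# Example call with a tokenizer loaded above:
# encoded = encode_with_limit(tokenizer2, "some long input text")
# encoded["input_ids"] then holds at most max_length token IDs per row.
```

As the warning itself suggests, the limit can instead be fixed once by passing `model_max_length` when the tokenizer is instantiated with from_pretrained.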

## Load Models

In [4]:
model1 = T5ForConditionalGeneration.from_pretrained(model_name1)
model2 = T5ForConditionalGeneration.from_pretrained(model_name2)
model3 = T5ForConditionalGeneration.from_pretrained(model_name3)

Each checkpoint's weights are loaded with from_pretrained, which downloads them on first use. In the original run the weight files were about 242 MB (t5-small), 892 MB (t5-base), and 2.95 GB (t5-large).
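
To put a number on the size difference, the parameters of each loaded model can be counted. This is a small sketch that assumes the objects passed in are PyTorch modules (as the loaded T5 models are); `count_parameters` and `describe` are hypothetical helper names.

```python
def count_parameters(model):
    """Return the total number of parameters in a PyTorch module."""
    return sum(p.numel() for p in model.parameters())


def describe(models):
    """Map each checkpoint name to its parameter count in millions."""
    return {
        name: round(count_parameters(model) / 1e6, 1)
        for name, model in models.items()
    }


# Example with the models loaded above:
# describe({model_name1: model1, model_name2: model2, model_name3: model3})
```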

## Tokenize the Input Prompt

In [5]:
input_prompt = "Generate a sentence that describes the importance of Machine Learning."
input_ids1 = tokenizer1.encode(input_prompt, return_tensors='pt')
input_ids2 = tokenizer2.encode(input_prompt, return_tensors='pt')
input_ids3 = tokenizer3.encode(input_prompt, return_tensors='pt')

* Input Prompt: The prompt for all models is "Generate a sentence that describes the importance of Machine Learning.".
* The encode() method tokenizes the input prompt into token IDs and returns tensors (return_tensors='pt'), ready for PyTorch models.
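
To see what encode() actually produces, the token IDs can be mapped back to their subword strings. The sketch below assumes a Hugging Face tokenizer is passed in; `show_tokens` is a hypothetical helper name. With a T5 tokenizer, the last pair should be the end-of-sequence token `</s>`, which the tokenizer appends to the prompt.

```python
def show_tokens(tokenizer, prompt):
    """Return (token_id, token_string) pairs for `prompt`."""
    ids = tokenizer.encode(prompt)                 # plain list of ints
    tokens = tokenizer.convert_ids_to_tokens(ids)  # matching subword strings
    return list(zip(ids, tokens))


# Example with a tokenizer loaded above:
# for token_id, token in show_tokens(tokenizer1, input_prompt):
#     print(token_id, token)
```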

## Generate Text from Each Model

In [6]:
output1 = model1.generate(
    input_ids1, max_length=50, num_return_sequences=1, pad_token_id=tokenizer1.eos_token_id
)
output2 = model2.generate(
    input_ids2, max_length=50, num_return_sequences=1, pad_token_id=tokenizer2.eos_token_id
)
output3 = model3.generate(
    input_ids3, max_length=50, num_return_sequences=1, pad_token_id=tokenizer3.eos_token_id
)

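Text Generation: The generate() method produces output token IDs from each model:
* input_ids1, input_ids2, and input_ids3 are the tokenized prompts passed as input.
* max_length=50: The generated sequence is limited to at most 50 tokens.
* num_return_sequences=1: Only 1 sequence is returned per prompt. Because no sampling or beam-search arguments are passed, generate() falls back to its default strategy, which is normally greedy decoding.
* pad_token_id=tokenizerX.eos_token_id: The end-of-sequence (EOS) token is used as the padding token during generation. T5 tokenizers already define a dedicated pad token, so this override (an idiom common with models that have no pad token) is not strictly needed here.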
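
The same call can be wrapped in a helper that passes the attention mask and makes the decoding strategy explicit. This is a sketch rather than the notebook's configuration: the helper name `generate_text` is hypothetical, and the beam-search settings (`num_beams`, `no_repeat_ngram_size`, `early_stopping`) are illustrative choices that are not guaranteed to improve the outputs.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=50, num_beams=4):
    """Tokenize `prompt`, generate with beam search, and decode the result."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,  # cap on newly generated tokens
        num_beams=num_beams,            # beam search instead of greedy decoding
        no_repeat_ngram_size=2,         # forbid repeating any 2-gram
        early_stopping=True,            # stop once enough beams have finished
    )
    # output_ids[0] is the first (here: only) returned sequence.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example with the objects loaded above:
# print(generate_text(model2, tokenizer2, input_prompt))
```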

## Decode the Generated Text

In [7]:
generated_text1 = tokenizer1.decode(output1[0], skip_special_tokens=True)
generated_text2 = tokenizer2.decode(output2[0], skip_special_tokens=True)
generated_text3 = tokenizer3.decode(output3[0], skip_special_tokens=True)

The generated token sequences (output1, output2, output3) are decoded back into human-readable text using the tokenizer's decode() method. Indexing with [0] selects the first (and here only) sequence in each returned batch. The skip_special_tokens=True option removes special tokens such as the pad and EOS tokens from the final output.

## Print the Results

In [8]:
print("Generated Text by t5-small model:", generated_text1)

Generated Text by t5-small model: Generieren Sie eine Satz, die die Beschreibung der Bedeutung von Machine Learning.


In [9]:
print("Generated Text by t5-base model:", generated_text2)

Generated Text by t5-base model: a sentence that describes the importance of Machine Learning. Generate a sentence that describes the importance of Machine Learning.


In [10]:
print("Generated Text by t5-large model:", generated_text3)

Generated Text by t5-large model: a sentence that describes the importance of Machine Learning... Describe Machine Learning....rate a sentence that describes the importance of Machine Learning.. a sentence that describes


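The generated text from each model (small, base, large) is printed for comparison. In this run, none of the three outputs is a new sentence about Machine Learning: t5-small returns a German rendering of the prompt, while t5-base and t5-large mostly repeat fragments of the prompt.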

## Summary

This code runs three T5 checkpoints (small, base, and large) on the same input prompt, "Generate a sentence that describes the importance of Machine Learning.", and prints what each one generates. The checkpoints differ in parameter count, and therefore in download size, memory use, and inference time. The original T5 checkpoints were trained on tasks identified by fixed text prefixes (for example translation and summarization) rather than on free-form instructions, so a larger checkpoint does not by itself produce a better answer to this prompt. The effect of model size on generation quality is easier to observe when the input uses a task format the models were trained on.
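
One way to make the comparison more informative is to phrase the input as a task the original T5 checkpoints were trained on, marked by its text prefix. The sketch below builds such prompts. The prefix strings are the ones commonly documented for T5; the dictionary keys and the helper name `with_task_prefix` are hypothetical.

```python
# Prefixes the original T5 checkpoints use for a few of their training tasks.
TASK_PREFIXES = {
    "summarize": "summarize: ",
    "en_to_de": "translate English to German: ",
    "en_to_fr": "translate English to French: ",
}


def with_task_prefix(task, text):
    """Prepend the T5 task prefix for `task` to `text`."""
    if task not in TASK_PREFIXES:
        raise KeyError(f"unknown task {task!r}; expected one of {sorted(TASK_PREFIXES)}")
    return TASK_PREFIXES[task] + text


prompt = with_task_prefix("en_to_de", "Machine learning is important.")
print(prompt)  # translate English to German: Machine learning is important.
```

The resulting string can be passed through the same tokenize, generate, and decode steps as `input_prompt`, so the three checkpoints are compared on a task they were trained to perform.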