In this tutorial, we'll use a transformer model to generate text from an input prompt. This task is commonly called text generation, and models such as GPT (Generative Pre-trained Transformer) and T5 are well suited to it. We’ll use the popular `t5-base` model, a general-purpose text-to-text transformer that can perform tasks like summarization, translation, and text generation.

# Import the necessary libraries:

In [1]:
from transformers import pipeline


# Create a text generation pipeline:
We’ll initialize a text-to-text generation pipeline using the t5-base model.

In [2]:
generator = pipeline("text2text-generation", model="t5-base")







# Provide input text:
You can provide some input text to generate a response. For this tutorial, we’ll use the T5 model in its basic form. You can experiment with tasks like answering questions or summarizing content.

In [3]:
input_text = "Translate the following English text to French: Hugging Face is creating amazing tools for the NLP community."


# Generate the output:
Now call the `generator` pipeline on the input text. It returns a list of dictionaries, and the generated string is stored under the `generated_text` key.

In [4]:
output = generator(input_text)
print("Generated text:", output[0]['generated_text'])




Generated text: Hugging Face crée des outils extraordinaires pour la communauté de la LNP.
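The original T5 checkpoints were trained with short task prefixes such as `translate English to French: ` and `summarize: `, so prompts that start with one of these prefixes tend to match what the model saw during training. The sketch below shows one way to build such prompts. The names `T5_PREFIXES` and `build_prompt` are hypothetical helpers introduced for illustration and are not part of the `transformers` library.

```python
# Task prefixes used when the original T5 checkpoints were trained.
# The dictionary keys are arbitrary labels chosen for this sketch.
T5_PREFIXES = {
    "translate_en_fr": "translate English to French: ",
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
}


def build_prompt(task, text):
    """Prepend the T5 task prefix for `task` to `text` (hypothetical helper)."""
    if task not in T5_PREFIXES:
        raise ValueError(f"Unknown task: {task!r}")
    return T5_PREFIXES[task] + text.strip()


prompt = build_prompt(
    "translate_en_fr",
    "Hugging Face is creating amazing tools for the NLP community.",
)
print(prompt)

# To run the prompt through the pipeline created earlier:
# output = generator(prompt)
# print("Generated text:", output[0]["generated_text"])
```

Keeping the prefixes in one place makes it easier to switch between tasks while reusing the same pipeline.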



# Fine-Tuning the T5 Model for Text-to-Text Generation

In this section, we'll demonstrate how to fine-tune the `t5-base` model for text-to-text generation tasks on a custom dataset. Fine-tuning allows the model to adapt to specific text generation tasks such as translation, summarization, or other natural language processing tasks.

We will use the Hugging Face `Trainer` API to handle the fine-tuning process.


In [None]:

# Step 1: Load a custom text-to-text dataset (e.g., a summarization dataset)
# For demonstration purposes, we’ll use the CNN/DailyMail dataset for summarization. You can replace this with your own dataset.

from datasets import load_dataset

dataset = load_dataset('cnn_dailymail', '3.0.0', split='train[:1%]')

# Step 2: Load the pretrained model and tokenizer
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained('t5-base')
model = T5ForConditionalGeneration.from_pretrained('t5-base')

# Step 3: Preprocess the dataset for fine-tuning
def preprocess_data(examples):
    inputs = ["summarize: " + doc for doc in examples["article"]]  # Prepend task prefix for T5
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding='max_length')

    # Tokenize the summaries (labels); text_target replaces the deprecated as_target_tokenizer()
    labels = tokenizer(text_target=examples["highlights"], max_length=128, truncation=True, padding='max_length')

    # Replace padding token ids with -100 so the loss ignores padded positions
    model_inputs["labels"] = [
        [tok if tok != tokenizer.pad_token_id else -100 for tok in seq]
        for seq in labels["input_ids"]
    ]
    return model_inputs

train_data = dataset.map(preprocess_data, batched=True, remove_columns=["article", "highlights"])

# Step 4: Define training arguments and initialize Trainer
from transformers import Trainer, TrainingArguments

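training_args = TrainingArguments(
    output_dir='./results',
    # Evaluation is disabled by default, so no strategy argument is needed here
    # (the old `evaluation_strategy` name became `eval_strategy` in newer transformers releases)
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    num_train_epochs=1
)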

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data
)

# Step 5: Fine-tune the model
trainer.train()


# Now check model's performance after fine-tuning
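One simple way to check the fine-tuned model is to generate a summary for an article and compare it by eye with the reference highlights. The sketch below assumes the `model`, `tokenizer`, and `dataset` objects from the fine-tuning cell are still in memory. The `summarize` function and its default `max_new_tokens=64` are illustrative choices, not values taken from the tutorial.

```python
def summarize(model, tokenizer, article, max_new_tokens=64):
    """Generate a summary for one article with a (fine-tuned) T5 model."""
    # Use the same task prefix and truncation length as during preprocessing
    inputs = tokenizer(
        "summarize: " + article,
        return_tensors="pt",
        max_length=512,
        truncation=True,
    )
    # After training, the model may live on a GPU, so move the inputs to its device
    inputs = {name: tensor.to(model.device) for name, tensor in inputs.items()}
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage with the objects from the fine-tuning cell:
# sample = dataset[0]
# print("Model summary:", summarize(model, tokenizer, sample["article"]))
# print("Reference:    ", sample["highlights"])
```

A side-by-side comparison like this is only a spot check. For a quantitative result, a held-out split and a metric such as ROUGE would be needed.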

# Exercise Questions:






*   Experiment with different tasks, such as summarization or translation to various languages. How does the model perform when given more complex tasks (e.g., summarizing large paragraphs)?
*   Modify the script to allow the user to continuously input new tasks and generate text without having to restart the program.
*   Try out different models from Hugging Face’s text2text-generation models (e.g., t5-large, t5-small) to see how they compare in terms of performance and speed.
*   Explore adjusting the max_length and min_length parameters in the generation function to control the length of the output text.
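As a starting point for the continuous-input exercise, the loop below keeps reading prompts until the user types `quit` or `exit`. It is a minimal sketch: `interactive_loop` is a hypothetical helper, and the `generate`, `read`, and `write` parameters are assumptions introduced so the loop can be tried without loading a model.

```python
def interactive_loop(generate, read=input, write=print):
    """Read prompts until the user types 'quit' or 'exit', printing each result."""
    while True:
        prompt = read("Enter a task (or 'quit' to stop): ").strip()
        if prompt.lower() in {"quit", "exit"}:
            break
        if not prompt:
            continue  # Ignore empty input and ask again
        write("Generated text:", generate(prompt))


# Usage with the pipeline created earlier:
# interactive_loop(lambda p: generator(p)[0]["generated_text"])
```

Passing the generation step as a function also makes it easy to swap in a different model, or different generation parameters, without changing the loop.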



