# Fine-tuned LAnguage Net Text-To-Text Transfer Transformer

Fine-tuned LAnguage Net Text-To-Text Transfer Transformer (FLAN-T5 for short) is a sequence-to-sequence large language model published by Google in late 2022. It is a T5 model that has been fine-tuned on a large collection of prompting (instruction) datasets. As a result, it can perform tasks such as summarization, classification, and translation when asked in plain text. The model architecture is very similar to the encoder-decoder transformer architecture introduced in the revolutionary [Attention Is All You Need](https://arxiv.org/abs/1706.03762) paper.
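
In the text-to-text framing used by T5 and FLAN-T5, every task is an input string and every answer is an output string. The task is chosen only by how the prompt is worded, for example with a prefix such as `translate English to German:` or `summarize:`. The sketch below illustrates that idea with plain string formatting. The `TASK_TEMPLATES` table and the `build_prompt` helper are hypothetical names; the templates themselves are the ones used later in this notebook.

```python
# Hypothetical table of prompt templates: each "task" is just a different
# way of wording the input string for the same text-to-text model.
TASK_TEMPLATES = {
    "translate_de": "translate English to German: {text}",
    "summarize": "summarize: {text}",
    "sentiment": "Review: {text} Sentiment:",
}


def build_prompt(task: str, text: str) -> str:
    """Return the model input for `task`; the model's answer is also plain text."""
    try:
        template = TASK_TEMPLATES[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
    return template.format(text=text.strip())


print(build_prompt("translate_de", "How old are you?"))
```

Because the model only ever sees strings, adding a new "task" means adding a new template, not changing the model.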

## 🛠️ Supported Hardware

This notebook can run on either a CPU or a GPU.

✅ AMD Instinct™ Accelerators  
✅ AMD Radeon™ RX/PRO Graphics Cards  
✅ AMD EPYC™ Processors  
✅ AMD Ryzen™ (AI) Processors  

## ⚡ Recommended Software Environment

::::{tab-set}

:::{tab-item} Linux
- [Install Docker container](https://amdresearch.github.io/aup-ai-tutorials//env/env-gpu.html)
- [Install PyTorch](https://amdresearch.github.io/aup-ai-tutorials//env/env-cpu.html)
:::

:::{tab-item} Windows
- [Install DirectML](https://amdresearch.github.io/aup-ai-tutorials//env/env-gpu-windows.html)
- [Install PyTorch](https://amdresearch.github.io/aup-ai-tutorials//env/env-cpu.html)
:::
::::

## 🎯 Goals

- Show you how to download a model from Hugging Face
- Run FLAN-T5 on an AMD platform
- Execute a few prompts

:::{seealso}
- [FLAN-T5](https://huggingface.co/docs/transformers/en/model_doc/flan-t5)
- [FLAN-T5 small on Hugging Face](https://huggingface.co/google/flan-t5-small)
- [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683)
:::

## 🚀 Run FLAN-T5 on an AMD Platform

Import the necessary packages:

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration
```

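Check whether a GPU can be used. On ROCm builds of PyTorch, AMD GPUs are exposed through the `torch.cuda` API, so the same check works on AMD hardware.

```python
# Use the GPU if PyTorch can see one; otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using {device=}')
```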
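The check above only looks at `torch.cuda`, which covers ROCm on Linux. With the Windows DirectML setup, PyTorch typically reports no CUDA device, so the notebook would fall back to the CPU. A minimal sketch that also tries DirectML is below. It assumes the separately installed `torch-directml` package and its `torch_directml.device()` function; `select_device` is a hypothetical helper name, and whether every operation used by `generate()` runs on DirectML is not verified here.

```python
def select_device():
    """Return a ROCm/CUDA GPU if visible, else a DirectML device, else the CPU."""
    import torch

    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API.
    if torch.cuda.is_available():
        return torch.device("cuda")
    try:
        # torch-directml (Windows) is an optional, separately installed package.
        import torch_directml
    except ImportError:
        return torch.device("cpu")
    return torch_directml.device()
```

To use it, replace the `device = ...` line with `device = select_device()`. With a DirectML device, moving the loaded model with `model.to(device)` is the conservative option.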

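Download both the tokenizer and the model from Hugging Face. The first call downloads the files and caches them locally; later runs reuse the cache.

```python
model_id = "google/flan-t5-small"

# legacy=False opts in to the non-legacy T5 tokenizer behaviour
tokenizer = T5Tokenizer.from_pretrained(model_id, legacy=False)
# .to(device) moves the weights to the selected device (device_map would also need the accelerate package)
model = T5ForConditionalGeneration.from_pretrained(model_id).to(device)
```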

```python
# Approximate size of the weights: parameter count x bytes per parameter, in MiB
print(f'Model size: {model.num_parameters() * model.dtype.itemsize / 1024 / 1024:.2f} MiB')
```
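The printed size is the number of parameters multiplied by the bytes per parameter (4 for `float32`, 2 for `float16` and `bfloat16`), divided by 1024 × 1024. It ignores buffers and runtime memory such as activations, so it is a lower bound on memory use rather than a full footprint. The same arithmetic as a standalone function is sketched below; `weights_size_mib` is a hypothetical helper and the 80 million parameter count is a made-up example value.

```python
# Bytes per parameter for common floating-point formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weights_size_mib(num_params: int, dtype: str = "float32") -> float:
    """Approximate size of the model weights in MiB (1 MiB = 1024 * 1024 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 * 1024)


n = 80_000_000  # hypothetical parameter count
print(f"{weights_size_mib(n):.2f} MiB in float32")
print(f"{weights_size_mib(n, 'bfloat16'):.2f} MiB in bfloat16")
```

Halving the bytes per parameter halves the estimate, which is why half-precision formats are attractive on memory-constrained hardware.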

Let's translate a sentence into German. First, we pass the prompt to the tokenizer, which returns the token IDs as a PyTorch tensor, and move them to the device (the GPU, if one is available). Then we pass the tokens to the model and cap the number of generated tokens with `max_new_tokens`. Finally, we decode the output tokens with the tokenizer and print the German text.

```python
input_text = "translate English to German: How old are you?"
# Tokenize the prompt into a tensor of token IDs and move it to the device
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
# Generate at most 500 new tokens
outputs = model.generate(input_ids, max_new_tokens=500)
# Decode the first (and only) generated sequence, dropping special tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
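The tokenize, generate, and decode steps repeat in every cell that follows, so they can be wrapped in one function. The sketch below is a hypothetical helper, `generate_text`, not part of the Transformers API. Unlike the cells in this notebook, it also passes the `attention_mask` returned by the tokenizer; for a single unpadded prompt this should not change the result, and it avoids relying on the mask being inferred.

```python
def generate_text(model, tokenizer, prompt: str, device, max_new_tokens: int = 50) -> str:
    """Tokenize `prompt`, generate on `device`, and decode the first output sequence."""
    # return_tensors="pt" returns PyTorch tensors; .to(device) moves
    # input_ids and attention_mask together.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # generate() receives input_ids and attention_mask as keyword arguments.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # skip_special_tokens=True drops tokens such as <pad> and </s> from the text.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Usage with the objects loaded above: `print(generate_text(model, tokenizer, "translate English to German: How old are you?", device, max_new_tokens=500))`.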

Next, let's do sentiment analysis on one entry of the [Poem Sentiment dataset](https://huggingface.co/datasets/google-research-datasets/poem_sentiment?row=3):

```python
input_text = "Review: when i peruse the conquered fame of heroes, and the victories of mighty generals, i do not envy the generals. Sentiment:"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
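An open-ended prompt such as `Sentiment:` lets the model answer with any text. One common pattern, offered here as an assumption rather than something tested in this notebook, is to list the allowed answers in the prompt, loosely modeled on the `OPTIONS:` lists that appear in FLAN's instruction templates, and then map the decoded answer back to a label. `sentiment_prompt`, `normalize_label`, and the label set are hypothetical; the poem_sentiment dataset defines its own labels.

```python
LABELS = ("positive", "negative", "neutral")  # illustrative label set


def sentiment_prompt(text: str, labels=LABELS) -> str:
    """Build a sentiment prompt that lists the allowed answers explicitly."""
    options = "\n".join(f"- {label}" for label in labels)
    return f"Review: {text.strip()}\nWhat is the sentiment of this review?\nOPTIONS:\n{options}"


def normalize_label(answer: str, labels=LABELS):
    """Map the decoded answer to one of `labels`, or return None if none matches."""
    answer = answer.strip().lower()
    for label in labels:
        if answer.startswith(label):
            return label
    return None


print(sentiment_prompt("i do not envy the generals."))
```

The prompt string can be passed to the tokenizer exactly like `input_text` above, and the decoded answer passed to `normalize_label`.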

Now let's use the model for summarization, using one of the entries of the [Wikipedia Summary Dataset](https://huggingface.co/datasets/jordiclive/wikipedia-summary-dataset?row=13):

```python
input_text = """summarize:
Amy: Hey Mark, have you heard about the new movie coming out this weekend?
Mark: Oh, no, I haven't. What's it called?
Amy: It's called "Stellar Odyssey." It's a sci-fi thriller with amazing special effects.
Mark: Sounds interesting. Who's in it?
Amy: The main lead is Emily Stone, and she's fantastic in the trailer. The plot revolves around a journey to a distant galaxy.
Mark: Nice! I'm definitely up for a good sci-fi flick. Want to catch it together on Saturday?
Amy: Sure, that sounds great! Let's meet at the theater around 7 pm.
"""
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
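The dialogue above is written by hand as one triple-quoted string. When the turns come from a dataset as (speaker, utterance) pairs, the same `summarize:` prompt can be assembled programmatically. `dialogue_prompt` is a hypothetical helper, and the sample turns are copied from the cell above.

```python
def dialogue_prompt(turns) -> str:
    """Join (speaker, utterance) pairs into a 'summarize:' prompt, one turn per line."""
    lines = [f"{speaker}: {utterance.strip()}" for speaker, utterance in turns]
    return "summarize:\n" + "\n".join(lines)


# First two turns of the dialogue above
turns = [
    ("Amy", "Hey Mark, have you heard about the new movie coming out this weekend?"),
    ("Mark", "Oh, no, I haven't. What's it called?"),
]
print(dialogue_prompt(turns))
```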

We can also ask the model a general-knowledge question about AMD:

```python
input_text = "What does AMD do?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

You can also inspect the raw token IDs of both the model input and the generated output:

```python
print(input_ids)
```

```python
print(outputs[0])
```
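In T5's vocabulary, ID `0` is the `<pad>` token, which T5 also uses as the decoder start token, and ID `1` is the end-of-sequence token `</s>`. That is why the input IDs end with `1` (the tokenizer appends `</s>`) and the generated sequence starts with `0`. Passing `skip_special_tokens=True` to `decode` removes such tokens; to see the token strings instead of IDs, `tokenizer.convert_ids_to_tokens(outputs[0].tolist())` can be used. The stdlib-only sketch below mimics the filtering for these two tokens on a plain list; `strip_special_ids` is a hypothetical helper, not the tokenizer's implementation, and the IDs other than 0 and 1 are made up.

```python
# Special-token IDs in the T5 vocabulary: <pad> = 0 (also the decoder start), </s> = 1.
PAD_ID = 0
EOS_ID = 1
SPECIAL_IDS = {PAD_ID, EOS_ID}


def strip_special_ids(ids):
    """Drop <pad> and </s> IDs, as skip_special_tokens=True does for these two tokens."""
    return [i for i in ids if i not in SPECIAL_IDS]


generated = [0, 71, 72, 73, 1]  # made-up IDs: decoder start, three tokens, end of sequence
print(strip_special_ids(generated))
```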

We can also give the model a math problem:

```python
input_text = "The square root of x is the cube root of y. What is y to the power of 2, if x = 4?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
outputs = model.generate(input_ids, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
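To judge the model's answer, it helps to work the problem out by hand. Substituting $x = 4$ gives the cube root of $y$, from which $y$ and then $y^2$ follow, so the correct answer is 64:

```{math}
\sqrt{x} = \sqrt[3]{y},\quad x = 4
\;\Rightarrow\; \sqrt[3]{y} = \sqrt{4} = 2
\;\Rightarrow\; y = 2^{3} = 8
\;\Rightarrow\; y^{2} = 8^{2} = 64
```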

----------
Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved.

SPDX-License-Identifier: MIT