# Fine-tune FLAN-T5 for chat & dialogue summarization

In this blog, you will learn how to fine-tune [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl) for chat & dialogue summarization using Hugging Face Transformers. If you already know T5, FLAN-T5 is its instruction-tuned successor: at the same parameter count, the models have additionally been fine-tuned on more than 1,000 tasks that also cover more languages.

In this example we will use the [samsum](https://huggingface.co/datasets/samsum) dataset, a collection of about 16k messenger-like conversations with summaries. The conversations were created and written down by linguists fluent in English.

You will learn how to:

1. [Setup Development Environment](#1-setup-development-environment)
2. [Load and prepare samsum dataset](#2-load-and-prepare-samsum-dataset)
3. [Fine-tune and evaluate FLAN-T5](#3-fine-tune-and-evaluate-flan-t5)
4. [Run Inference](#4-run-inference)

Before we can start, make sure you have a [Hugging Face Account](https://huggingface.co/join) to save artifacts and experiments. 

## Quick intro: FLAN-T5, just a better T5

FLAN-T5, released with the [Scaling Instruction-Finetuned Language Models](https://arxiv.org/pdf/2210.11416.pdf) paper, is an enhanced version of T5 that has been fine-tuned on a mixture of tasks. The paper explores instruction fine-tuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) fine-tuning on chain-of-thought data. It finds that instruction fine-tuning is a general method for improving the performance and usability of pretrained language models.

![flan-t5](../assets/flan-t5.png)

* Paper: https://arxiv.org/abs/2210.11416
* Official repo: https://github.com/google-research/t5x

--- 

Now we know what FLAN-T5 is, let's get started. 🚀

_Note: This tutorial was created and run on a g4dn.xlarge AWS EC2 instance with an NVIDIA T4._

## 1. Setup Development Environment

Our first step is to install the Hugging Face Libraries, including transformers and datasets. Running the following cell will install all the required packages. 

In [1]:
# install the Hugging Face libraries plus the data-loading and evaluation dependencies used below
!pip install scikit-learn pandas "transformers[torch]" datasets evaluate rouge-score nltk tensorboard py7zr --upgrade



In [2]:
# install git-lfs for pushing model and logs to the Hugging Face Hub
!pip install git-lfs 



This example uses the [Hugging Face Hub](https://huggingface.co/models) as a remote model versioning service. To push our model to the Hub, you need a [Hugging Face account](https://huggingface.co/join); if you already have one, you can skip the registration.
Once you have an account, we use the `notebook_login` util from the `huggingface_hub` package to log in and store our token (access key) on disk.

In [3]:
from huggingface_hub import notebook_login

notebook_login()


## 2. Load and prepare samsum dataset

We will use the [samsum](https://huggingface.co/datasets/samsum) dataset, a collection of about 16k messenger-like conversations with summaries. The conversations were created and written down by linguists fluent in English.

```json
{
  "id": "13818513",
  "summary": "Amanda baked cookies and will bring Jerry some tomorrow.",
  "dialogue": "Amanda: I baked cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)"
}
```

In [4]:
import datasets


In [5]:
#dataset_id = "samsum"

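To load the `samsum` dataset, we would use the `load_dataset()` method from the 🤗 Datasets library. That call is commented out below; the following cells instead read a local CSV of recipe titles, ingredients, and instructions with pandas.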


In [6]:
# Load dataset from the hub
# dataset_old = datasets.load_dataset(dataset_id)

# print(f"Train dataset size: {len(dataset_old['train'])}")
# print(f"Test dataset size: {len(dataset_old['test'])}")

# Train dataset size: 14732
# Test dataset size: 819

In [7]:
import numpy as np
import pandas as pd
data = pd.read_csv("/quaso data/dataset/transformed_data.csv")


In [8]:
print(data.size)
data.dropna(inplace=True)
print(data.size)

4462284
4462280


In [9]:
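# `DataFrame.size` counts cells (rows x 2 columns), so `size / 2` is the row count.
# n = 2100000  # numbers under 2000000 will work
# offsets noted from earlier runs: small did 181140, base did 51140
m = 61140  # number of leading rows to skip
data.drop(data.head(m).index, inplace=True)
print(data.size / 2)

# drop the last n rows so that only a 20,000-row window remains
n = 2150000
data.drop(data.tail(n).index, inplace=True)
(data.size / 2)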

2170000.0


20000.0
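
The two `drop` calls above amount to keeping one contiguous window of rows. The following is a minimal sketch of that arithmetic, not part of the original notebook: `window_bounds` is a hypothetical helper, and the total of 2,231,140 rows is inferred from the printed `data.size` (4462280 cells over 2 columns).

```python
def window_bounds(total_rows: int, skip_head: int, drop_tail: int) -> tuple[int, int]:
    """Return the positional [start, stop) range left after dropping rows at both ends."""
    start = skip_head
    stop = total_rows - drop_tail
    if stop < start:
        raise ValueError("skip_head + drop_tail exceeds the number of rows")
    return start, stop


# Assumption: 2 columns, so rows = cells / 2 (value taken from the printed data.size).
total_rows = 4462280 // 2
start, stop = window_bounds(total_rows, skip_head=61140, drop_tail=2150000)
print(start, stop, stop - start)
```

Under these assumptions, `data.iloc[start:stop]` should select the same 20,000 rows in a single positional slice, starting at the index 61140 seen in the next cell's output.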

In [10]:
X = data["Input"]
y = data["Output"]

print(X.head(3))
print(y.head(3))

61140    Title: All Bran Rolls Ingredients: shortening,...
61141    Title: Batter-Fried Mushrooms Ingredients: mus...
61142    Title: Triple Layer Pie Ingredients: instant c...
Name: Input, dtype: object
61140    Mix stir and cool shortening water sugar All B...
61141    Wipe mushrooms with damp cloth. In a medium bo...
61142    Prepare chocolate pudding with 1 3/4 cups milk...
Name: Output, dtype: object


In [11]:
from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.03)                # 3% test data (600 rows)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.0564)   # ~5.6% of the remainder as validation data (1095 rows)

print(X_train.size, X_test.size, X_val.size)
print(y_train.size, y_test.size, y_val.size)


18305 600 1095
18305 600 1095
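
The printed split sizes can be checked by hand. This is a small sketch, assuming the held-out share of each split is rounded up to a whole row, which is consistent with the counts above; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples: int, holdout_fraction: float) -> tuple[int, int]:
    """Return (kept, held_out) sizes when the held-out part is rounded up."""
    n_holdout = math.ceil(holdout_fraction * n_samples)
    return n_samples - n_holdout, n_holdout


n_rest, n_test = split_sizes(20000, 0.03)     # first split: test set
n_train, n_val = split_sizes(n_rest, 0.0564)  # second split: validation set
print(n_train, n_test, n_val)
```

Note that the second fraction applies to the 19,400 rows left after the first split, so the validation set is about 5.5% of the full 20,000 rows.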


Let's check out an example of the dataset.

In [12]:
#print(dataset_old)
dataset_train = datasets.Dataset.from_pandas(pd.DataFrame({"input": X_train, "target": y_train}))
dataset_test = datasets.Dataset.from_pandas(pd.DataFrame({"input": X_test , "target": y_test }))
dataset_val = datasets.Dataset.from_pandas(pd.DataFrame({"input": X_val, "target": y_val}))
dataset = datasets.DatasetDict({"train": dataset_train,
                                "test": dataset_test,
                                "val": dataset_val})
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['input', 'target', '__index_level_0__'],
        num_rows: 18305
    })
    test: Dataset({
        features: ['input', 'target', '__index_level_0__'],
        num_rows: 600
    })
    val: Dataset({
        features: ['input', 'target', '__index_level_0__'],
        num_rows: 1095
    })
})


In [13]:
from random import randrange        


sample = dataset['train'][randrange(len(dataset["train"]))]
print(f"input: \n{sample['input']}\n---------------")
print(f"target: \n{sample['target']}\n---------------")

input: 
Title: Beulah'S Fruit Cake Ingredients: cake flour, margarine, sugar, egg yolks, soda, water, lemon extract, dates, white raisins, pecans, candied red cherries, candied green cherries, candied pineapple, egg whites
---------------
target: 
Cream well together margarine and sugar. Add beaten egg yolks. Add soda which has been dissolved in the water. Add lemon extract and mix thoroughly. Set aside. In a large bowl mix dates raisins pecans and candied fruits with about 1/4 of the flour. Mix remainder of flour into the sugar butter mixture then add fruit and nut mix and stir until thoroughly blended. Beat egg whites and fold in. Spray a large tube pan with nonstick spray. Pour in batter making sure it settles down into pan. Bake at 275\u00b0 for 3 hours until wooden pick comes out clean.
---------------


To train our model, we need to convert our inputs (text) to token IDs. This is done by a 🤗 Transformers Tokenizer. If you are not sure what this means, check out [chapter 6](https://huggingface.co/course/chapter6/1?fw=tf) of the Hugging Face Course.

In [14]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id="MettBrot/flan-t5-base-quaso-gen2.2"

# Load tokenizer of FLAN-t5-base
tokenizer = AutoTokenizer.from_pretrained(model_id)


Before we can start training, we need to preprocess our data. Abstractive summarization is a text2text-generation task: the model takes a text as input and generates a summary as output. To batch our data efficiently, we first need to know how long our inputs and outputs are.

In [15]:
from datasets import concatenate_datasets

# The maximum total input sequence length after tokenization. 
# Sequences longer than this will be truncated, sequences shorter will be padded.
tokenized_inputs = concatenate_datasets([dataset["train"], dataset["test"]]).map(lambda x: tokenizer(x["input"], truncation=True), batched=True, remove_columns=["input", "target"])
max_source_length = max([len(x) for x in tokenized_inputs["input_ids"]])
print(f"Max source length: {max_source_length}")

# The maximum total sequence length for target text after tokenization.
# Sequences longer than this will be truncated, sequences shorter will be padded.
tokenized_targets = concatenate_datasets([dataset["train"], dataset["test"]]).map(lambda x: tokenizer(x["target"], truncation=True), batched=True, remove_columns=["input", "target"])
max_target_length = max([len(x) for x in tokenized_targets["input_ids"]])
print(f"Max target length: {max_target_length}")

Map:   0%|          | 0/18905 [00:00<?, ? examples/s]

Max source length: 100


Map:   0%|          | 0/18905 [00:00<?, ? examples/s]

Max target length: 319


In [16]:
def preprocess_function(sample,padding="max_length"):
    # add prefix to the input for t5
    inputs = ["Create a recipe for: " + item for item in sample["input"]]

    # tokenize inputs
    model_inputs = tokenizer(inputs, max_length=max_source_length, padding=padding, truncation=True)

    # Tokenize targets with the `text_target` keyword argument
    labels = tokenizer(text_target=sample["target"], max_length=max_target_length, padding=padding, truncation=True)

    # If we are padding here, replace all tokenizer.pad_token_id in the labels by -100 when we want to ignore
    # padding in the loss.
    if padding == "max_length":
        labels["input_ids"] = [
            [(l if l != tokenizer.pad_token_id else -100) for l in label] for label in labels["input_ids"]
        ]

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_dataset = dataset.map(preprocess_function, batched=True, remove_columns=["input", "target", "__index_level_0__"])
print(f"Keys of tokenized dataset: {list(tokenized_dataset['train'].features)}")

Map:   0%|          | 0/18305 [00:00<?, ? examples/s]

Map:   0%|          | 0/600 [00:00<?, ? examples/s]

Map:   0%|          | 0/1095 [00:00<?, ? examples/s]

Keys of tokenized dataset: ['input_ids', 'attention_mask', 'labels']
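
The label-masking step inside `preprocess_function` is easy to miss, so here is a stripped-down sketch of it on made-up token ids. The ids and the pad id of `0` are illustrative assumptions; `mask_padding` is a hypothetical helper that mirrors the list comprehension above.

```python
IGNORE_INDEX = -100  # label value used above so that padding is ignored in the loss


def mask_padding(label_ids: list[list[int]], pad_token_id: int) -> list[list[int]]:
    """Replace every pad token in the labels with IGNORE_INDEX."""
    return [
        [tok if tok != pad_token_id else IGNORE_INDEX for tok in seq]
        for seq in label_ids
    ]


# Toy batch: two label sequences already padded to length 5 (0 = pad, assumed).
toy_labels = [[37, 12, 1, 0, 0], [5, 1, 0, 0, 0]]
masked = mask_padding(toy_labels, pad_token_id=0)
print(masked)
```

Only the labels are masked this way; the padded `input_ids` keep their pad tokens, and the `attention_mask` tells the model which positions to ignore.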


## 3. Fine-tune and evaluate FLAN-T5

After we have processed our dataset, we can start training our model. To do so, we first need to load our [FLAN-T5](https://huggingface.co/models?search=flan-t5) from the Hugging Face Hub. In the example we are using an instance with an NVIDIA V100, meaning that we will fine-tune the `base` version of the model. 
_I plan to do a follow-up post on how to fine-tune the `xxl` version of the model using Deepspeed._


In [17]:
from transformers import AutoModelForSeq2SeqLM

# huggingface hub model id
#model_id="google/flan-t5-small"

# load model from the hub
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, max_length=512)

We want to evaluate our model during training. The `Trainer` supports this through a `compute_metrics` function.  
The most commonly used metric for evaluating summarization tasks is [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)), short for Recall-Oriented Understudy for Gisting Evaluation. This metric does not behave like standard accuracy: it compares a generated summary against a set of reference summaries.

We are going to use the `evaluate` library to compute the ROUGE score.

In [18]:
import evaluate
import nltk
import numpy as np
from nltk.tokenize import sent_tokenize
nltk.download("punkt")

# Metric
metric = evaluate.load("rouge")

# helper function to postprocess text
def postprocess_text(preds, labels):
    preds = [pred.strip() for pred in preds]
    labels = [label.strip() for label in labels]

    # rougeLSum expects newline after each sentence
    preds = ["\n".join(sent_tokenize(pred)) for pred in preds]
    labels = ["\n".join(sent_tokenize(label)) for label in labels]

    return preds, labels

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # Some simple post-processing
    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)

    result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    result = {k: round(v * 100, 4)/100 for k, v in result.items()}
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    result["gen_len"] = np.mean(prediction_lens)
    return result

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\florentin\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
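
To make the `rouge1` number less abstract, here is a deliberately simplified sketch of a unigram-overlap F1 score. It assumes lowercase whitespace tokenization and no stemming, so its values will not match the `rouge` metric loaded above (which is called with `use_stemmer=True`); `rouge1_f1` is a hypothetical helper for illustration only.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between a prediction and a single reference."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("mix all ingredients and bake", "mix all ingredients thoroughly")
print(round(score, 4))
```

Three of the five predicted words appear in the four-word reference, giving a precision of 0.6, a recall of 0.75, and an F1 of about 0.67.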


Before we can start training, we need to create a `DataCollator` that will take care of padding our inputs and labels. We will use the `DataCollatorForSeq2Seq` from the 🤗 Transformers library.

In [19]:
from transformers import DataCollatorForSeq2Seq

# we want to ignore tokenizer pad token in the loss
label_pad_token_id = -100
# Data collator
data_collator = DataCollatorForSeq2Seq(
    tokenizer,
    model=model,
    label_pad_token_id=label_pad_token_id,
    pad_to_multiple_of=8
)
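
As a rough illustration of what padding "to a multiple of 8" means, the sketch below pads a toy batch by hand. It is not the collator's implementation: `pad_batch` is a hypothetical helper, and the token ids and the input pad id of `0` are made up.

```python
def pad_batch(sequences: list[list[int]], pad_value: int, multiple: int = 8) -> list[list[int]]:
    """Right-pad all sequences to the longest length, rounded up to `multiple`."""
    longest = max(len(seq) for seq in sequences)
    target = -(-longest // multiple) * multiple  # ceiling division, then scale back up
    return [seq + [pad_value] * (target - len(seq)) for seq in sequences]


# Inputs are padded with the (assumed) pad id 0, labels with -100.
input_ids = pad_batch([[11, 12, 13], [21, 22, 23, 24, 25, 26, 27, 28, 29]], pad_value=0)
labels = pad_batch([[5, 1], [6, 7, 1]], pad_value=-100)
print([len(seq) for seq in input_ids], [len(seq) for seq in labels])
```

The longest input has 9 tokens, so both inputs grow to 16; the longest label has 3 tokens, so both labels grow to 8.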


The last step is to define the hyperparameters (`TrainingArguments`) we want to use for our training. The `Trainer` integrates with the [Hugging Face Hub](https://huggingface.co/models) and can automatically push checkpoints, logs and metrics to a repository during training. In the cell below `push_to_hub` is set to `False`, so nothing is uploaded until we call `trainer.push_to_hub()` after training.

In [20]:
from huggingface_hub import HfFolder
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# Hugging Face repository id
repository_id = "MettBrot/flan-t5-base-quaso-gen2.3"

# Define training args
training_args = Seq2SeqTrainingArguments(
    output_dir=repository_id,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    predict_with_generate=True,
    fp16=False, # Overflows with fp16
    bf16=True,  # fp16 cannot be used with this model, so train in bf16 instead
    learning_rate=5e-5,
    num_train_epochs=5,
    # logging & evaluation strategies
    logging_dir=f"{repository_id}/logs",
    logging_strategy="steps",
    logging_steps=500,
    eval_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=True,
    # metric_for_best_model="overall_f1",
    # push to hub parameters
    report_to="tensorboard",
    push_to_hub=False,
    hub_strategy="every_save",
    hub_model_id=repository_id,
    hub_token=HfFolder.get_token(),
)

# Create Trainer instance
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    compute_metrics=compute_metrics,
)




We can start our training by using the `train` method of the `Trainer`.

In [21]:
# Start training
trainer.train()

  0%|          | 0/22885 [00:00<?, ?it/s]

{'loss': 1.8482, 'grad_norm': 1.9129873514175415, 'learning_rate': 4.890758138518681e-05, 'epoch': 0.11}
{'loss': 1.838, 'grad_norm': 1.9941673278808594, 'learning_rate': 4.781516277037361e-05, 'epoch': 0.22}
{'loss': 1.8428, 'grad_norm': 1.6862812042236328, 'learning_rate': 4.672274415556041e-05, 'epoch': 0.33}
{'loss': 1.853, 'grad_norm': 2.1075921058654785, 'learning_rate': 4.5630325540747213e-05, 'epoch': 0.44}
{'loss': 1.8367, 'grad_norm': 2.2247538566589355, 'learning_rate': 4.453790692593402e-05, 'epoch': 0.55}
{'loss': 1.8282, 'grad_norm': 2.233309030532837, 'learning_rate': 4.344548831112082e-05, 'epoch': 0.66}
{'loss': 1.8157, 'grad_norm': 1.971685528755188, 'learning_rate': 4.235306969630763e-05, 'epoch': 0.76}
{'loss': 1.8068, 'grad_norm': 1.4842884540557861, 'learning_rate': 4.126065108149443e-05, 'epoch': 0.87}
{'loss': 1.8222, 'grad_norm': 2.435084104537964, 'learning_rate': 4.016823246668123e-05, 'epoch': 0.98}


  0%|          | 0/150 [00:00<?, ?it/s]

Non-default generation parameters: {'max_length': 512}


{'eval_loss': 1.715450406074524, 'eval_rouge1': 0.356378, 'eval_rouge2': 0.131576, 'eval_rougeL': 0.28763, 'eval_rougeLsum': 0.33907699999999996, 'eval_gen_len': 36.44833333333333, 'eval_runtime': 162.2386, 'eval_samples_per_second': 3.698, 'eval_steps_per_second': 0.925, 'epoch': 1.0}
{'loss': 1.7688, 'grad_norm': 1.8010356426239014, 'learning_rate': 3.907581385186804e-05, 'epoch': 1.09}
{'loss': 1.763, 'grad_norm': 1.5233500003814697, 'learning_rate': 3.798339523705484e-05, 'epoch': 1.2}
{'loss': 1.7776, 'grad_norm': 2.349514961242676, 'learning_rate': 3.689097662224165e-05, 'epoch': 1.31}
{'loss': 1.7759, 'grad_norm': 2.3652114868164062, 'learning_rate': 3.579855800742845e-05, 'epoch': 1.42}
{'loss': 1.7833, 'grad_norm': 1.68069589138031, 'learning_rate': 3.470613939261525e-05, 'epoch': 1.53}
{'loss': 1.7518, 'grad_norm': 1.7118258476257324, 'learning_rate': 3.361372077780205e-05, 'epoch': 1.64}
{'loss': 1.7677, 'grad_norm': 2.2422821521759033, 'learning_rate': 3.252130216298886e-05

  0%|          | 0/150 [00:00<?, ?it/s]

Non-default generation parameters: {'max_length': 512}


{'eval_loss': 1.6988353729248047, 'eval_rouge1': 0.36676499999999995, 'eval_rouge2': 0.138924, 'eval_rougeL': 0.29584299999999997, 'eval_rougeLsum': 0.347785, 'eval_gen_len': 43.85166666666667, 'eval_runtime': 213.5583, 'eval_samples_per_second': 2.81, 'eval_steps_per_second': 0.702, 'epoch': 2.0}
{'loss': 1.715, 'grad_norm': 2.2209997177124023, 'learning_rate': 2.9244046318549267e-05, 'epoch': 2.08}
{'loss': 1.7016, 'grad_norm': 1.900525450706482, 'learning_rate': 2.815162770373607e-05, 'epoch': 2.18}
{'loss': 1.7352, 'grad_norm': 2.150651454925537, 'learning_rate': 2.7059209088922875e-05, 'epoch': 2.29}
{'loss': 1.7324, 'grad_norm': 1.8237974643707275, 'learning_rate': 2.596679047410968e-05, 'epoch': 2.4}
{'loss': 1.7219, 'grad_norm': 2.2832772731781006, 'learning_rate': 2.4874371859296484e-05, 'epoch': 2.51}
{'loss': 1.7214, 'grad_norm': 1.503172516822815, 'learning_rate': 2.378195324448329e-05, 'epoch': 2.62}
{'loss': 1.7226, 'grad_norm': 2.2559854984283447, 'learning_rate': 2.2689

  0%|          | 0/150 [00:00<?, ?it/s]

Non-default generation parameters: {'max_length': 512}


{'eval_loss': 1.6921210289001465, 'eval_rouge1': 0.367031, 'eval_rouge2': 0.138945, 'eval_rougeL': 0.297269, 'eval_rougeLsum': 0.34847700000000004, 'eval_gen_len': 40.32833333333333, 'eval_runtime': 187.0917, 'eval_samples_per_second': 3.207, 'eval_steps_per_second': 0.802, 'epoch': 3.0}
{'loss': 1.6948, 'grad_norm': 1.8149487972259521, 'learning_rate': 1.94122787852305e-05, 'epoch': 3.06}
{'loss': 1.7069, 'grad_norm': 2.060004949569702, 'learning_rate': 1.8319860170417304e-05, 'epoch': 3.17}
{'loss': 1.6928, 'grad_norm': 2.342198371887207, 'learning_rate': 1.722744155560411e-05, 'epoch': 3.28}
{'loss': 1.6813, 'grad_norm': 2.7239925861358643, 'learning_rate': 1.6135022940790913e-05, 'epoch': 3.39}
{'loss': 1.6854, 'grad_norm': 2.1783459186553955, 'learning_rate': 1.5042604325977716e-05, 'epoch': 3.5}
{'loss': 1.6814, 'grad_norm': 1.7549923658370972, 'learning_rate': 1.3950185711164519e-05, 'epoch': 3.6}
{'loss': 1.676, 'grad_norm': 2.3355705738067627, 'learning_rate': 1.28577670963513

  0%|          | 0/150 [00:00<?, ?it/s]

Non-default generation parameters: {'max_length': 512}


{'eval_loss': 1.688697099685669, 'eval_rouge1': 0.37326, 'eval_rouge2': 0.142172, 'eval_rougeL': 0.300908, 'eval_rougeLsum': 0.353696, 'eval_gen_len': 42.035, 'eval_runtime': 201.231, 'eval_samples_per_second': 2.982, 'eval_steps_per_second': 0.745, 'epoch': 4.0}
{'loss': 1.6708, 'grad_norm': 1.838274359703064, 'learning_rate': 9.580511251911733e-06, 'epoch': 4.04}
{'loss': 1.6628, 'grad_norm': 1.4072484970092773, 'learning_rate': 8.488092637098537e-06, 'epoch': 4.15}
{'loss': 1.6523, 'grad_norm': 2.729693651199341, 'learning_rate': 7.39567402228534e-06, 'epoch': 4.26}
{'loss': 1.6546, 'grad_norm': 2.911719560623169, 'learning_rate': 6.303255407472143e-06, 'epoch': 4.37}
{'loss': 1.6567, 'grad_norm': 1.6340956687927246, 'learning_rate': 5.210836792658947e-06, 'epoch': 4.48}
{'loss': 1.686, 'grad_norm': 2.389305591583252, 'learning_rate': 4.118418177845751e-06, 'epoch': 4.59}
{'loss': 1.6728, 'grad_norm': 2.219902992248535, 'learning_rate': 3.025999563032554e-06, 'epoch': 4.7}
{'loss': 

Non-default generation parameters: {'max_length': 512}


  0%|          | 0/150 [00:00<?, ?it/s]

Non-default generation parameters: {'max_length': 512}


{'eval_loss': 1.6873698234558105, 'eval_rouge1': 0.371005, 'eval_rouge2': 0.142546, 'eval_rougeL': 0.300257, 'eval_rougeLsum': 0.35225, 'eval_gen_len': 43.495, 'eval_runtime': 216.3902, 'eval_samples_per_second': 2.773, 'eval_steps_per_second': 0.693, 'epoch': 5.0}


There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].


{'train_runtime': 7910.8687, 'train_samples_per_second': 11.57, 'train_steps_per_second': 2.893, 'train_loss': 1.735341051704685, 'epoch': 5.0}


TrainOutput(global_step=22885, training_loss=1.735341051704685, metrics={'train_runtime': 7910.8687, 'train_samples_per_second': 11.57, 'train_steps_per_second': 2.893, 'total_flos': 1.27303346386944e+16, 'train_loss': 1.735341051704685, 'epoch': 5.0})
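
The step count and the logged learning rates can be reproduced from the hyperparameters. The sketch below assumes a single device, no gradient accumulation, and a linear decay to zero without warmup; these assumptions are inferred from the logged values rather than stated in the text, and both helper functions are hypothetical.

```python
import math


def total_steps(n_train: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps for the whole run, with a partial last batch per epoch."""
    return math.ceil(n_train / batch_size) * epochs


def linear_lr(step: int, base_lr: float, n_steps: int) -> float:
    """Learning rate after `step` updates under linear decay to zero."""
    return base_lr * max(0.0, (n_steps - step) / n_steps)


n_steps = total_steps(n_train=18305, batch_size=4, epochs=5)
lr_500 = linear_lr(step=500, base_lr=5e-5, n_steps=n_steps)
print(n_steps, lr_500)
```

This gives 22885 steps, matching the progress bar, and a learning rate of roughly 4.89e-05 at step 500, matching the first log line. The same arithmetic explains the 150 evaluation steps: 600 test rows at an evaluation batch size of 4.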


![flan-t5-tensorboard](../assets/flan-t5-tensorboard.png)

Nice, we have trained our model. 🎉 Let's evaluate the best model again on the test set.


In [22]:
trainer.evaluate()

  0%|          | 0/150 [00:00<?, ?it/s]

{'eval_loss': 1.6873698234558105,
 'eval_rouge1': 0.371005,
 'eval_rouge2': 0.142546,
 'eval_rougeL': 0.300257,
 'eval_rougeLsum': 0.35225,
 'eval_gen_len': 43.495,
 'eval_runtime': 214.3138,
 'eval_samples_per_second': 2.8,
 'eval_steps_per_second': 0.7,
 'epoch': 5.0}
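
Because `metric_for_best_model` is commented out while `load_best_model_at_end=True`, the Trainer falls back to comparing checkpoints by evaluation loss. The sketch below replays that choice on the per-epoch values copied from the training log; it is an illustration of the selection rule, not the Trainer's code.

```python
# Per-epoch evaluation results, copied from the training log above.
eval_log = [
    {"epoch": 1, "eval_loss": 1.715450406074524, "eval_rouge1": 0.356378},
    {"epoch": 2, "eval_loss": 1.6988353729248047, "eval_rouge1": 0.36676499999999995},
    {"epoch": 3, "eval_loss": 1.6921210289001465, "eval_rouge1": 0.367031},
    {"epoch": 4, "eval_loss": 1.688697099685669, "eval_rouge1": 0.37326},
    {"epoch": 5, "eval_loss": 1.6873698234558105, "eval_rouge1": 0.371005},
]

best_by_loss = min(eval_log, key=lambda row: row["eval_loss"])       # lower is better
best_by_rouge1 = max(eval_log, key=lambda row: row["eval_rouge1"])   # higher is better
print(best_by_loss["epoch"], best_by_rouge1["epoch"])
```

Selecting by loss keeps the epoch 5 checkpoint, which is why `trainer.evaluate()` reports the epoch 5 numbers, even though epoch 4 has a slightly higher `rouge1`.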

The best checkpoint reaches a `rouge1` score of `0.371` (about 37.1 on the 0-100 scale). 

Let's save our results and tokenizer to the Hugging Face Hub and create a model card. 

In [23]:
# Save our tokenizer and create model card
tokenizer.save_pretrained(repository_id)
trainer.create_model_card()
# Push the results to the hub
trainer.push_to_hub()

Non-default generation parameters: {'max_length': 512}


events.out.tfevents.1720013070.Florentin-PC.26736.0:   0%|          | 0.00/18.5k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/990M [00:00<?, ?B/s]

Upload 4 LFS files:   0%|          | 0/4 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.30k [00:00<?, ?B/s]

events.out.tfevents.1720021195.Florentin-PC.26736.1:   0%|          | 0.00/623 [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/MettBrot/flan-t5-base-quaso-gen2.3/commit/2d7baf56494042b738aebe40e58d5ed4cbd0fab4', commit_message='End of training', commit_description='', oid='2d7baf56494042b738aebe40e58d5ed4cbd0fab4', pr_url=None, pr_revision=None, pr_num=None)

## 4. Run Inference

Now that we have a trained model, we can use it to run inference. We will use the `pipeline` API from transformers and an example from the `test` split of our dataset.

In [30]:
from transformers import pipeline
from random import randrange        

model_id = "MettBrot/flan-t5-small-quaso-gen3"

# load model and tokenizer from huggingface hub with pipeline
summarizer = pipeline("text2text-generation", model=model_id, device=0, max_length=1000)

model_id_old = "MettBrot/flan-t5-base-quaso-gen2.3"

summarizer_old = pipeline("text2text-generation", model=model_id_old, device=0, max_length=1000)

model_id_older = "MettBrot/flan-t5-base-quaso-gen2.2"

summarizer_older = pipeline("text2text-generation", model=model_id_older, device=0, max_length=1000)

# select a random test sample
sample = dataset['test'][randrange(len(dataset["test"]))]
print(f"Input: \nCreate a recipe for: {sample['input']}\n---------------")

print(f"Target: \n{sample['target']}\n---------------")

# generate a recipe with each model
res = summarizer("Create a recipe for: " + sample["input"])

print(f"quaso small gen 3 summary:\n{res}\n---------------")

res_old = summarizer_old("Create a recipe for: " + sample["input"])

print(f"quaso base gen 2.3 summary:\n{res_old}")

res_older = summarizer_older("Create a recipe for: " + sample["input"])

print(f"quaso base gen 2.2 summary:\n{res_older}")

Input: 
Create a recipe for: Title: Krispy Cheese Wafers Ingredients: oleo, flour, Cheddar cheese, Rice Krispies, red pepper
---------------
Target: 
Mix all ingredients thoroughly. Shape into balls. Place on a greased cookie sheet. Flatten with bottom of a glass dipped in flour. Bake at 350\u00b0 until done.
---------------
quaso small gen 3 summary:
[{'generated_text': 'Mix all ingredients together. Roll into small balls. Bake at 350u00b0 for 10 to 12 minutes.'}]
---------------
quaso base gen 2.3 summary:
[{'generated_text': 'Melt oleo in a 9 x 13-inch pan. Mix flour and cheese. Add Rice Krispies and red pepper. Mix well. Spread on oleo. Bake at 350u00b0 for 15 minutes.'}]
quaso base gen 2.2 summary:
[{'generated_text': 'Melt oleo in a large skillet. Add flour and stir until smooth. Add cheese and stir until melted. Add Rice Krispies and stir until well blended. Drop by teaspoonfuls onto ungreased cookie sheet. Bake at 350u00b0 for 10 to 12 minutes.'}]
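
The targets contain the literal escape sequence `\u00b0` where a degree sign was presumably intended, and the models reproduce it as `u00b0`. One possible way to tidy the generated text is sketched below; `restore_degree_sign` is a hypothetical helper that is not part of the original notebook, and cleaning the training data itself may be the better fix.

```python
import re

# Matches the artifact with or without its leading backslash.
_DEGREE_ARTIFACT = re.compile(r"\\?u00b0")


def restore_degree_sign(text: str) -> str:
    """Replace the literal 'u00b0' / '\\u00b0' artifact with a degree sign."""
    return _DEGREE_ARTIFACT.sub("°", text)


print(restore_degree_sign("Bake at 350u00b0 for 10 to 12 minutes."))
```

It could be applied to each `generated_text` value returned by the pipelines above.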
