### Fine-tuning T5-small

The T5-small model has 60M parameters and belongs to the T5 family of text-to-text models. It was pre-trained on C4 (Colossal Clean Crawled Corpus), a cleaned subset of Common Crawl.

Mount Google Drive (when running on Colab) and import the libraries used in the following subsections.

In [6]:
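try:
  # running on Google Colab: mount Drive and make the project folder importable
  from google.colab import drive
  drive.mount('/content/drive')
  import sys
  path_to_project = '/content/drive/MyDrive/NLP_Project'
  sys.path.append(path_to_project)
  IN_COLAB = True
except ImportError:
  # not on Colab: files are read from the current directory
  IN_COLAB = False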

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [7]:
import pandas as pd
import torch
import numpy as np

In [8]:
dataset_path = path_to_project + '/final_ds.csv' if IN_COLAB else './final_ds.csv'
df = pd.read_csv(dataset_path)

In [None]:
!pip install -q transformers
!pip install datasets

In [10]:
from datasets import Dataset, DatasetDict

#### Dataset Pre-processing



For the fine-tuning of T5-small we kept the first 5000 samples. The model is relatively small (only 60M parameters) and already pre-trained: in our experiments, using more samples led to worse performance, which we attribute to the training unnecessarily overwriting the pre-trained parameters.

In [None]:
df = df.head(5000).copy()
df.shape

(5000, 6)

Split the data into training, validation and test sets: 10% of the dataset is held out for testing, and the remaining 90% is split again into 80% training and 20% validation.

In [12]:
from sklearn.model_selection import train_test_split

train_val, test = train_test_split(df, test_size=0.1)
train, val = train_test_split(train_val, test_size=0.2)

In [13]:
print('# train instances: ', train.shape[0])
print('# test instances:  ', test.shape[0])
print('# val instances:   ', val.shape[0])

# train instances:  3600
# test instances:   500
# val instances:    900
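The sizes above follow from how scikit-learn rounds a fractional `test_size`: as far as we know, it takes the ceiling of `test_size * n` for the test part and assigns the rest to the train part. A minimal pure-Python sketch of that arithmetic (the helper `split_sizes` is hypothetical, not a scikit-learn function):

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a float test_size, rounding the test part up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

# first split: 10% of the 5000 samples go to the test set
n_train_val, n_test = split_sizes(5000, 0.1)
# second split: 20% of the remaining samples go to the validation set
n_train, n_val = split_sizes(n_train_val, 0.2)
print(n_train, n_val, n_test)  # 3600 900 500
```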


Next, we check whether a GPU is available.

In [14]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cuda


#### Model and Tokenizer

Load the pre-trained model and tokenizer from the Hugging Face Hub.

In [15]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

In [16]:
model_name = 'google-t5/t5-small'

Load the tokenizer.

In [17]:
tokenizer = T5Tokenizer.from_pretrained(model_name)

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


In [18]:
# note: T5 already defines its own `<pad>` token (id 0); this makes padding reuse `</s>` (id 1)
tokenizer.pad_token = tokenizer.eos_token

In [19]:
print("vocabulary size: ", tokenizer.vocab_size)

vocabulary size:  32000


In [20]:
list(tokenizer.get_vocab().items())[600:610]

[('▁big', 600),
 ('▁God', 601),
 ('▁dass', 602),
 ('im', 603),
 ('▁30', 604),
 ('▁event', 605),
 ('▁development', 606),
 ('▁form', 607),
 ('▁read', 608),
 ('▁hand', 609)]

In [21]:
text = "You have an array in input, order the elements in it in O(n) time complexity. Add a wrong wordd"
tokens = tokenizer.tokenize(text)  # public API: split the text into SentencePiece pieces
print(tokens)

['▁You', '▁have', '▁an', '▁array', '▁in', '▁input', ',', '▁order', '▁the', '▁elements', '▁in', '▁it', '▁in', '▁O', '(', 'n', ')', '▁time', '▁complexity', '.', '▁Add', '▁', 'a', '▁wrong', '▁word', 'd']
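SentencePiece marks the start of a word with `▁`, and a misspelled word such as `wordd` is split into known pieces (`▁word` + `d`). A minimal sketch, using only the pieces printed above and no tokenizer, that rebuilds the original sentence and lists the pieces that continue a previous word:

```python
# SentencePiece pieces printed above (copied from the notebook output)
pieces = ['▁You', '▁have', '▁an', '▁array', '▁in', '▁input', ',', '▁order', '▁the',
          '▁elements', '▁in', '▁it', '▁in', '▁O', '(', 'n', ')', '▁time', '▁complexity',
          '.', '▁Add', '▁', 'a', '▁wrong', '▁word', 'd']

# '▁' marks the start of a new word: joining the pieces and turning the
# marker back into a space recovers the original sentence
text = "".join(pieces).replace("▁", " ").strip()
print(text)

# pieces that do not start with '▁' are glued to the previous piece
continuations = [p for p in pieces if not p.startswith("▁")]
print(continuations)
```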


In [22]:
encoded_ids = tokenizer(text)['input_ids']
print(encoded_ids)

[148, 43, 46, 5590, 16, 3785, 6, 455, 8, 2479, 16, 34, 16, 411, 599, 29, 61, 97, 11641, 5, 2334, 3, 9, 1786, 1448, 26, 1]


Load the model and move it to the selected device.

In [23]:
t5 = T5ForConditionalGeneration.from_pretrained(model_name, device_map=device)
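# no tokens were added: this only trims the embedding matrix from 32128 rows to len(tokenizer) = 32100
t5.resize_token_embeddings(len(tokenizer))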

Embedding(32100, 512)

In [24]:
print(t5)

T5ForConditionalGeneration(
  (shared): Embedding(32100, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32100, 512)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=512, out_features=512, bias=False)
              (k): Linear(in_features=512, out_features=512, bias=False)
              (v): Linear(in_features=512, out_features=512, bias=False)
              (o): Linear(in_features=512, out_features=512, bias=False)
              (relative_attention_bias): Embedding(32, 8)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=512, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=512, bias=False)
              (dropout): Drop

Count the parameters of the model.

In [25]:
n_params = sum(param.numel() for param in t5.parameters())
n_params

60492288
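The count can be reproduced from the architecture. The sketch below assumes the published `t5-small` configuration (d_model 512, d_ff 2048, 8 heads and 32 relative-attention buckets, all visible in the printout above, plus 6 encoder and 6 decoder blocks and an original vocabulary of 32128 rows, which we take from the model card rather than from this notebook). The helper `t5_param_count` is hypothetical. Getting exactly 60492288 for 32100 rows is a useful check that `resize_token_embeddings` shrank the embedding.

```python
def t5_param_count(vocab_size, d_model=512, d_ff=2048, num_layers=6,
                   num_heads=8, num_buckets=32):
    """Count parameters of a T5 (v1.0) model whose embeddings are tied."""
    attention = 4 * d_model * d_model   # q, k, v, o projections, no biases
    feed_forward = 2 * d_model * d_ff   # wi and wo, no biases
    layer_norm = d_model                # T5LayerNorm only has a scale vector
    rel_bias = num_buckets * num_heads  # relative_attention_bias, first block only

    encoder_block = attention + feed_forward + 2 * layer_norm
    decoder_block = 2 * attention + feed_forward + 3 * layer_norm  # self- and cross-attention
    encoder = num_layers * encoder_block + rel_bias + layer_norm   # + final layer norm
    decoder = num_layers * decoder_block + rel_bias + layer_norm
    embeddings = vocab_size * d_model   # shared by encoder, decoder and lm_head
    return embeddings + encoder + decoder

print(t5_param_count(32128))  # 60506624, original t5-small embedding size
print(t5_param_count(32100))  # 60492288, after resize_token_embeddings(len(tokenizer))
```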

In [26]:
for name, param in t5.named_parameters():
    print(f"Parameter name: {name}")
    print(f"Parameter shape: {param.size()}")
    print(f"Is trainable: {param.requires_grad}")
    print()

Parameter name: shared.weight
Parameter shape: torch.Size([32100, 512])
Is trainable: True

Parameter name: encoder.block.0.layer.0.SelfAttention.q.weight
Parameter shape: torch.Size([512, 512])
Is trainable: True

Parameter name: encoder.block.0.layer.0.SelfAttention.k.weight
Parameter shape: torch.Size([512, 512])
Is trainable: True

Parameter name: encoder.block.0.layer.0.SelfAttention.v.weight
Parameter shape: torch.Size([512, 512])
Is trainable: True

Parameter name: encoder.block.0.layer.0.SelfAttention.o.weight
Parameter shape: torch.Size([512, 512])
Is trainable: True

Parameter name: encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight
Parameter shape: torch.Size([32, 8])
Is trainable: True

Parameter name: encoder.block.0.layer.0.layer_norm.weight
Parameter shape: torch.Size([512])
Is trainable: True

Parameter name: encoder.block.0.layer.1.DenseReluDense.wi.weight
Parameter shape: torch.Size([2048, 512])
Is trainable: True

Parameter name: encoder.block.0.lay

#### Training


Now that both the tokenizer and the model are loaded, we can format and tokenize the dataset, and then train the model.

In [27]:
def apply_eos_token(idx, df, eos_token):
    # build the user input including the problem description, the time complexity and the space complexity
    chat_string = 'User:' + df.loc[idx, 'problem_description'] + ' Time complexity: ' + df.loc[idx, 'time_complexity_inferred'] + '; Space complexity: ' + df.loc[idx, 'space_complexity_inferred']
    # now add the eos token and the response from the assistant, the code solution for the problem
    chat_string = chat_string + eos_token + 'Assistant: ' + df.loc[idx, 'solution_code'] + eos_token
    return chat_string
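The same string layout can be illustrated without a DataFrame. The sketch below (the helpers `build_chat` and `split_chat` and the toy record are hypothetical) builds a chat string and then splits it back into prompt and target on `</s>`, which is what the evaluation cells later do. It assumes the problem text itself never contains `</s>`.

```python
EOS = "</s>"  # T5's end-of-sequence token (tokenizer.eos_token)

def build_chat(problem, time_c, space_c, solution, eos=EOS):
    """Same layout as apply_eos_token, but taking plain strings instead of a DataFrame row."""
    user = "User:" + problem + " Time complexity: " + time_c + "; Space complexity: " + space_c
    return user + eos + "Assistant: " + solution + eos

def split_chat(chat, eos=EOS):
    """Recover the prompt (ending with 'Assistant: ') and the target solution."""
    user, assistant, _ = chat.split(eos)  # the trailing eos leaves an empty third part
    return user + eos + "Assistant: ", assistant + eos

# hypothetical toy record, not taken from the dataset
chat = build_chat("Print the sum of two integers.", "O(1)", "O(1)",
                  "print(sum(map(int, input().split())))")
prompt, target = split_chat(chat)
print(prompt)
print(target)
```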

In [28]:
train_str = [apply_eos_token(idx, train, tokenizer.eos_token) for idx in train.index]
test_str = [apply_eos_token(idx, test, tokenizer.eos_token) for idx in test.index]
val_str = [apply_eos_token(idx, val, tokenizer.eos_token) for idx in val.index]

In [29]:
train_str[0]

'User:You are given a sequence a_1, a_2, ..., a_n consisting of n integers.\n\nYou can choose any non-negative integer D (i.e. D ≥ 0), and for each a_i you can:\n\n  * add D (only once), i. e. perform a_i := a_i + D, or \n  * subtract D (only once), i. e. perform a_i := a_i - D, or \n  * leave the value of a_i unchanged. \n\n\n\nIt is possible that after an operation the value a_i becomes negative.\n\nYour goal is to choose such minimum non-negative integer D and perform changes in such a way, that all a_i are equal (i.e. a_1=a_2=...=a_n).\n\nPrint the required D or, if it is impossible to choose such value D, print -1.\n\nFor example, for array [2, 8] the value D=3 is minimum possible because you can obtain the array [5, 5] if you will add D to 2 and subtract D from 8. And for array [1, 4, 7, 7] the value D=3 is also minimum possible. You can add it to 1 and subtract it from 7 and obtain the array [4, 4, 4, 4].\n\nInput\n\nThe first line of the input contains one integer n (1 ≤ n ≤ 10

The formatted strings are wrapped into Hugging Face `Dataset` objects, which are then grouped in a `DatasetDict`.

In [30]:
train_data = Dataset.from_dict({'chat': train_str})
test_data = Dataset.from_dict({'chat': test_str})
val_data = Dataset.from_dict({'chat': val_str})

In [31]:
data = DatasetDict()
data['train'] = train_data
data['val'] = val_data
data['test'] = test_data

Tokenize the data, truncating and padding every example to 512 tokens.

In [32]:
def tokenize_function(examples):
    input_encodings = tokenizer(examples["chat"],
        truncation=True,
        padding="max_length",
        max_length=512)
    sample = {
        'input_ids': input_encodings.input_ids,
        'attention_mask': input_encodings.attention_mask,
        # labels are not built here: the data collator derives them from input_ids
        #'labels': input_encodings.input_ids.copy()
    }
    return sample

tokenized_data = data.map(tokenize_function, batched=True)


In [33]:
# batch the examples; with mlm=False the labels are a copy of input_ids, with padding set to -100
from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
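As we understand it, `DataCollatorForLanguageModeling` with `mlm=False` copies `input_ids` into `labels` and replaces every pad position with -100, so the loss ignores it. The function below is a simplified re-implementation for illustration only (the name `causal_lm_labels` is hypothetical, not the library's code). Two consequences follow for this setup: since `pad_token` was set to `eos_token`, every `</s>` (id 1) is masked out of the loss, not only the padding; and since T5 is an encoder-decoder model, these labels become the decoder target, so the model is trained to reproduce the whole chat, prompt included.

```python
def causal_lm_labels(input_ids, pad_token_id):
    """Copy input_ids and replace padding positions with -100 (ignored by the loss)."""
    return [[-100 if tok == pad_token_id else tok for tok in seq] for seq in input_ids]

# toy ids: prompt tokens, </s>, answer token, </s>, then padding (also </s> here)
batch = [[148, 43, 1, 2334, 1, 1, 1]]
print(causal_lm_labels(batch, pad_token_id=1))  # [[148, 43, -100, 2334, -100, -100, -100]]
```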

The dataset is now ready for training.

##### Calculate Perplexity before Fine-tuning

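Before fine-tuning, we compute the perplexity of the pre-trained model on a few test samples, so that we can check whether fine-tuning improves it. Perplexity is the exponential of the average negative log-likelihood that the model assigns to the reference tokens: lower is better.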
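The definition can be written directly in a few lines. This is a minimal sketch that works on per-token log-probabilities we supply by hand; the helper `perplexity` is hypothetical, and `lmppl` obtains the log-probabilities from the model itself.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood of the target tokens."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# a model that assigns probability 0.5 to every target token has perplexity 2
print(perplexity([math.log(0.5)] * 4))
# a uniform guess over a 32100-token vocabulary has perplexity 32100
print(perplexity([math.log(1 / 32100)] * 10))
```

Intuitively, a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k tokens.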

In [34]:
# get inputs from test_data
test_input = [dialogue.split('</s>')[0] + tokenizer.eos_token + 'Assistant: ' for dialogue in test_data['chat'][:3]]

# get outputs from test data
test_output = [dialogue.split('</s>')[1] + tokenizer.eos_token for dialogue in test_data['chat'][:3]]

In [None]:
!pip install lmppl

In [36]:
import lmppl

scorer = lmppl.EncoderDecoderLM(model_name)

ppl = scorer.get_perplexity(input_texts= test_input, output_texts=test_output)
print(list( ppl))
print(f"average perplexity: {sum(ppl)/len(ppl)}")

  0%|          | 0/1 [00:00<?, ?it/s]Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.
100%|██████████| 1/1 [00:01<00:00,  1.32s/it]

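[66.54613901709006, 105.79112153990572, 35432.329456935135]
average perplexity: 11868.222239164044

The cell below prints the test sample with the lowest perplexity. Note that the text labelled `prediction` is the reference solution stored in the dataset, not text generated by the model.
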
In [37]:
print(f"user input: \n{test_input[ppl.index(min(ppl))]}")
print('#################')
print(f"prediction: \n{test_output[ppl.index(min(ppl))]}")

user input: 
User:Write a program which prints $n$-th fibonacci number for a given integer $n$. The $n$-th fibonacci number is defined by the following recursive formula:

\begin{equation*} fib(n)= \left \\{ \begin{array}{ll} 1 & (n = 0) \\\ 1 & (n = 1) \\\ fib(n - 1) + fib(n - 2) & \\\ \end{array} \right. \end{equation*}

Constraints

* $0 \leq n \leq 44$

Input

An integer $n$ is given.

Example

Input

3


Output

3 Time complexity: O(nlogn); Space complexity: O(n**2)</s>Assistant: 
#################
prediction: 
Assistant: N = int(input())

dp = [1] * (N + 1)

for n in range(2, N + 1):
    dp[n] = dp[n - 1] + dp[n - 2]

print(dp[N])</s>


#### Fine-Tuning
Now we can start the fine-tuning.

In [38]:
from transformers import TrainingArguments, Trainer
import os

In [39]:
os.environ["WANDB_DISABLED"] = "true"

In [40]:
training_args = TrainingArguments(
    "t5_fine-tune",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    save_steps=500,
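    eval_steps=500,  # only takes effect if eval_strategy="steps" is also set; otherwise no validation runs
    learning_rate=1e-4,
    lr_scheduler_type="linear",
    bf16=True,  # needs a GPU with bfloat16 support
    report_to="none",  # the string "none" disables logging integrations such as W&B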
)

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


In [41]:
trainer = Trainer(
    model= t5,
    args= training_args,
    train_dataset= tokenized_data['train'],
    eval_dataset= tokenized_data['val'],
    data_collator= data_collator
)

Measure the wall-clock time needed to train the model.

In [42]:
import time
begin = time.time()

In [43]:
trainer.train()

| Step | Training Loss |
|------|---------------|
| 500  | 0.2624        |


TrainOutput(global_step=675, training_loss=0.20890251300953053, metrics={'train_runtime': 1313.4505, 'train_samples_per_second': 8.223, 'train_steps_per_second': 0.514, 'total_flos': 1461691455897600.0, 'train_loss': 0.20890251300953053, 'epoch': 3.0})
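The step count in `TrainOutput` can be checked by hand. With `per_device_train_batch_size=2` and `gradient_accumulation_steps=8`, each optimizer update sees 16 examples, so 3600 training examples give 225 updates per epoch and 675 over 3 epochs. The single loss entry at step 500 is consistent with the default `logging_steps=500`. The sketch below (hypothetical helper `total_optimizer_steps`, single GPU assumed) reproduces this arithmetic; it does not model how a leftover partial accumulation window is handled, which does not arise here because 1800 batches divide evenly by 8.

```python
import math

def total_optimizer_steps(n_train, per_device_batch, grad_accum, epochs, n_devices=1):
    """Approximate number of optimizer updates performed by the Trainer."""
    batches_per_epoch = math.ceil(n_train / (per_device_batch * n_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs

effective_batch = 2 * 8  # per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch)                       # 16
print(total_optimizer_steps(3600, 2, 8, 3))  # 675, matching global_step above
```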

In [44]:
end = time.time()
print("Training time: ", end - begin)

Training time:  1314.5914108753204


In [45]:
# Get all of the model's parameters as a list of tuples.
params = list(t5.named_parameters())

print('The T5 model has {:} different named parameters.\n'.format(len(params)))

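print('==== Embedding Layer ====\n')

# only params[0] ('shared.weight') is an embedding: T5 ties the encoder, decoder and
# LM-head embeddings to it, so params[1] is already the first attention projection
for p in params[0:2]:
    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))

print('\n==== First Transformer ====\n')

# encoder block 0 spans params[1:10], so this slice also runs into encoder block 1
for p in params[2:14]:
    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))

print('\n==== Output Layer ====\n')

# with tied embeddings there is no separate lm_head weight: the last two entries are layer norms
for p in params[-2:]:
    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))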

The T5 model has 131 different named parameters.

==== Embedding Layer ====

shared.weight                                           (32100, 512)
encoder.block.0.layer.0.SelfAttention.q.weight            (512, 512)

==== First Transformer ====

encoder.block.0.layer.0.SelfAttention.k.weight            (512, 512)
encoder.block.0.layer.0.SelfAttention.v.weight            (512, 512)
encoder.block.0.layer.0.SelfAttention.o.weight            (512, 512)
encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight      (32, 8)
encoder.block.0.layer.0.layer_norm.weight                     (512,)
encoder.block.0.layer.1.DenseReluDense.wi.weight         (2048, 512)
encoder.block.0.layer.1.DenseReluDense.wo.weight         (512, 2048)
encoder.block.0.layer.1.layer_norm.weight                     (512,)
encoder.block.1.layer.0.SelfAttention.q.weight            (512, 512)
encoder.block.1.layer.0.SelfAttention.k.weight            (512, 512)
encoder.block.1.layer.0.SelfAttention.v.weight      

In [46]:
t5.eval()

T5ForConditionalGeneration(
  (shared): Embedding(32100, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32100, 512)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=512, out_features=512, bias=False)
              (k): Linear(in_features=512, out_features=512, bias=False)
              (v): Linear(in_features=512, out_features=512, bias=False)
              (o): Linear(in_features=512, out_features=512, bias=False)
              (relative_attention_bias): Embedding(32, 8)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseActDense(
              (wi): Linear(in_features=512, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=512, bias=False)
              (dropout): Drop

Save the fine-tuned model and the tokenizer.

In [47]:
from datetime import datetime
# path_to_project only exists on Colab; fall back to the current directory otherwise
base_dir = path_to_project if IN_COLAB else '.'
t5_training_path = base_dir + '/Transformer-trained-models/' + f"t5_finetune_{datetime.now().strftime('%Y_%m_%d_%H_%M_%S')}"
tokenizer.save_pretrained(t5_training_path)
t5.save_pretrained(t5_training_path)
print(f"Checkpoint saved at: '{t5_training_path}'")

Checkpoint saved at: '/content/drive/MyDrive/NLP_Project/Transformer-trained-models/t5_finetune_2025_05_20_18_19_41'


#### Testing

Reload the fine-tuned model from the saved checkpoint.

In [48]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

#t5_training_path = '/content/drive/MyDrive/NLP_Project/Transformer-trained-models/t5_finetune_2025_05_18_20_03_51'
tokenizer = T5Tokenizer.from_pretrained(t5_training_path)
t5 = T5ForConditionalGeneration.from_pretrained(t5_training_path, device_map=device)

To test the model, we pick a random chat from the test set, give its user part to the model as input, and inspect the generated response.

In [49]:
import random

random.seed(43)

idx = random.choice(range(len(test_data))) # select a random conversation
print(idx)
dialogue = test_data['chat'][idx]
print(dialogue)

19
User:Lunar New Year is approaching, and Bob decides to take a wander in a nearby park.

The park can be represented as a connected graph with n nodes and m bidirectional edges. Initially Bob is at the node 1 and he records 1 on his notebook. He can wander from one node to another through those bidirectional edges. Whenever he visits a node not recorded on his notebook, he records it. After he visits all nodes at least once, he stops wandering, thus finally a permutation of nodes a_1, a_2, …, a_n is recorded.

Wandering is a boring thing, but solving problems is fascinating. Bob wants to know the lexicographically smallest sequence of nodes he can record while wandering. Bob thinks this problem is trivial, and he wants you to solve it.

A sequence x is lexicographically smaller than a sequence y if and only if one of the following holds: 

  * x is a prefix of y, but x ≠ y (this is impossible in this problem as all considered sequences have the same length); 
  * in the first positio

In [50]:
# split the chat into the user part and the assistant part
# (the trailing '</s>' leaves an empty third piece, which is discarded)
test_input, test_output, _ = dialogue.split('</s>')

# add the eos token back, and end the prompt with 'Assistant: ' exactly as in the training format
test_input = test_input + tokenizer.eos_token + 'Assistant: '
test_output = test_output + tokenizer.eos_token

print('User input: \n', test_input)
print('User input length (characters): ', len(test_input))
print('####')
print('Correct solution output: \n', test_output)

User input: 
 User:Lunar New Year is approaching, and Bob decides to take a wander in a nearby park.

The park can be represented as a connected graph with n nodes and m bidirectional edges. Initially Bob is at the node 1 and he records 1 on his notebook. He can wander from one node to another through those bidirectional edges. Whenever he visits a node not recorded on his notebook, he records it. After he visits all nodes at least once, he stops wandering, thus finally a permutation of nodes a_1, a_2, …, a_n is recorded.

Wandering is a boring thing, but solving problems is fascinating. Bob wants to know the lexicographically smallest sequence of nodes he can record while wandering. Bob thinks this problem is trivial, and he wants you to solve it.

A sequence x is lexicographically smaller than a sequence y if and only if one of the following holds: 

  * x is a prefix of y, but x ≠ y (this is impossible in this problem as all considered sequences have the same length); 
  * in the fi

In [51]:
inputs = tokenizer(test_input, return_tensors="pt", truncation=True, max_length=2000).to(device)
output = t5.generate(**inputs, max_new_tokens=800)
gen_text = tokenizer.decode(output[0])

print(gen_text)
print(gen_text.split('Assistant:')[-1])


<pad> User:Lunar New Year is approaching, and Bob decides to take a wander in a nearby park. The park can be represented as a connected graph with n nodes and m bidirectional edges. Initially Bob is at the node 1 and he records 1 on his notebook. He can wander from one node to another through those bidirectional edges. Whenever he visits a node not recorded on his notebook, he records it. After he visits all nodes at least once, he stops wandering, thus finally a permutation of nodes a_1, a_2,..., a_n is recorded. Wandering is a boring thing, but solving problems is fascinating. Bob wants to know the lexicographically smallest sequence of nodes he can record while wandering. Bob thinks this problem is trivial, and he wants you to solve it. A sequence x is lexicographically smaller than a sequence y if and only if one of the following holds: * x is a prefix of y, but x <unk> y (this is impossible in this problem as all considered sequences have the same length); * in the first position 

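Also in this case, the model does not perform well: it does not generate code, mostly copies the input, and fails to keep the separation between 'User' and 'Assistant'. A likely contributing factor is the training setup: with `DataCollatorForLanguageModeling(mlm=False)` the labels are a copy of `input_ids`, so the encoder-decoder T5 is trained to reproduce the whole chat, prompt included, rather than to map the prompt to the solution. Overall, the text generated by T5-small is less clear than the text generated by the fine-tuned T5-base, which is plausibly due to the larger number of parameters of T5-base.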

#### Evaluation Metrics

We evaluate the model with three metrics: perplexity, BLEU and F1.

**Perplexity**

In [52]:
# get inputs from test_data
test_input = [dialogue.split('</s>')[0] + tokenizer.eos_token + 'Assistant: ' for dialogue in test_data['chat'][:3]]

# get outputs from test data
test_output = [dialogue.split('</s>')[1] + tokenizer.eos_token for dialogue in test_data['chat'][:3]]

In [54]:
import lmppl

scorer = lmppl.EncoderDecoderLM(t5_training_path)

ppl = scorer.get_perplexity(input_texts= test_input, output_texts=test_output)
print(list( ppl))
print(f"average perplexity: {sum(ppl)/len(ppl)}")

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
100%|██████████| 1/1 [00:00<00:00,  8.32it/s]

[439.4691293912123, 720.7729464902942, 1981.2242516858746]
average perplexity: 1047.1554425224604





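Before fine-tuning, the perplexities of the same three samples were [66.54613901709006, 105.79112153990572, 35432.329456935135], with an average perplexity of 11868.222239164044.

Fine-tuning lowered the average perplexity from about 11868 to about 1047, but this drop is driven entirely by the third sample (from 35432 to 1981); for the other two samples the perplexity increased (from 66.5 to 439.5 and from 105.8 to 720.8). With only three test samples, we cannot conclude that fine-tuning considerably improved the model. In the cell below, the text labelled `prediction` is again the reference solution of the lowest-perplexity sample, not model output.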

In [55]:
print(f"user input: \n{test_input[ppl.index(min(ppl))]}")
print('#################')
print(f"prediction: \n{test_output[ppl.index(min(ppl))]}")

user input: 
User:Write a program which prints $n$-th fibonacci number for a given integer $n$. The $n$-th fibonacci number is defined by the following recursive formula:

\begin{equation*} fib(n)= \left \\{ \begin{array}{ll} 1 & (n = 0) \\\ 1 & (n = 1) \\\ fib(n - 1) + fib(n - 2) & \\\ \end{array} \right. \end{equation*}

Constraints

* $0 \leq n \leq 44$

Input

An integer $n$ is given.

Example

Input

3


Output

3 Time complexity: O(nlogn); Space complexity: O(n**2)</s>Assistant: 
#################
prediction: 
Assistant: N = int(input())

dp = [1] * (N + 1)

for n in range(2, N + 1):
    dp[n] = dp[n - 1] + dp[n - 2]

print(dp[N])</s>


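**BLEU**

BLEU measures the precision of the generated text against a reference: it counts the n-grams (up to 4-grams) of the output that also appear in the reference, combines these precisions with a geometric mean, and applies a brevity penalty to outputs shorter than the reference.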
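To make the definition concrete, here is a textbook sentence-level BLEU with a single reference and no smoothing. It is our own sketch (hypothetical helpers `ngrams` and `sentence_bleu`), not ParlAI's implementation: ParlAI also normalizes the text and may smooth zero counts, so its values will not match exactly. It shows why a generation that mostly echoes the problem statement scores close to zero against a code reference.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Plain sentence-level BLEU with one reference and no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # clipped matches: a candidate n-gram counts at most as often as it occurs in the reference
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if matches == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(matches / sum(cand_counts.values())))
    # brevity penalty: penalize candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "N = int ( input ( ) ) print ( N )"
print(sentence_bleu(ref, ref))                                # 1.0, identical text
print(sentence_bleu("completely different words here", ref))  # 0.0, no overlap
```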

In [None]:
!pip -q install parlai

In [57]:
from parlai.core.metrics import BleuMetric

# BLEU is computed for the first test sample only
inputs = tokenizer(test_input[0], return_tensors="pt", truncation=True, max_length=2000).to(device)
output = t5.generate(**inputs, max_new_tokens=800)
gen_text = tokenizer.decode(output[0])

bleu = BleuMetric.compute(gen_text, [test_output[0]])
print(f"BLEU: {bleu}")

BLEU: 7.417e-06


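**F1 Score**

The F1 score is the harmonic mean of precision (the fraction of generated tokens that appear in the reference) and recall (the fraction of reference tokens that appear in the generated text).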
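A minimal token-overlap F1 sketch follows. As far as we know, ParlAI's `F1Metric` computes an F1 of this kind after its own text normalization; the simplified version below (hypothetical helper `token_f1`) only lowercases and splits on whitespace, so its values may differ from ParlAI's.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Unigram F1 between two lowercased, whitespace-tokenized strings."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = Counter(pred) & Counter(ref)  # multiset intersection of the tokens
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred)
    recall = num_same / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("print dp N", "print dp n"))  # 1.0 after lowercasing
print(round(token_f1("x y z", "x y w"), 3))  # 0.667
```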

In [58]:
from parlai.core.metrics import F1Metric

f1_score = F1Metric.compute(gen_text, [test_output[0]])
print(f"F1: {f1_score}")

F1: 0.0744
