# Fine-Tuning LLMs

### Models covered: BERT and XLM-RoBERTa (encoder-only), T5 and FLAN-T5 (encoder-decoder), and LLaMA 3.2 (decoder-only)

Copy the model name:

* "bert-base-uncased" (BERT)

* "google-t5/t5-small" (T5)

* "meta-llama/Llama-3.2-1B" (LLaMA 3.2)

* "xlm-roberta-base" (XLM-RoBERTa)

* "google/flan-t5-small" (FLAN-T5)

In [1]:
!pip install transformers datasets


The **IMDB** dataset used in the fine-tuning script contains 50,000 movie reviews labeled as either positive (1) or negative (0).

Breakdown of **IMDB** Dataset:

* **Training Set**: 25,000 reviews

* **Test Set**: 25,000 reviews

* **Balanced**: 50% positive, 50% negative

* **Average Length**: ~231 words per review

Approximate size on disk and in memory:

* **Compressed** (Raw .tar.gz from Hugging Face/Keras) → ~80 MB

* **Uncompressed** (Tokenized in Memory for Training) → ~200-300 MB, depending on tokenization settings

* **With Additional Preprocessing** (Hugging Face datasets library, cached on disk) → ~1-2 GB

In [2]:
!pip install -U transformers



In [3]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer
from datasets import load_dataset
import torch

model_name = "bert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

raw_datasets = load_dataset("imdb")

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

def format_dataset(dataset):
    dataset = dataset.remove_columns(["text"])
    dataset = dataset.rename_column("label", "labels")
    dataset.set_format("torch")
    return dataset

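# Caveat: the IMDB splits are ordered by label, so taking the first 1,000 rows
# without shuffling is likely to return reviews from a single class.
train_dataset = format_dataset(tokenized_datasets["train"]).select(range(1000))
test_dataset = format_dataset(tokenized_datasets["test"]).select(range(1000))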

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

training_args = TrainingArguments(
    output_dir="./results",
    #evaluation_strategy="epoch",
    #save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

trainer.train()


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Step,Training Loss
10,0.1156
20,0.0059
30,0.0014
40,0.0007
50,0.0005
60,0.0004
70,0.0003
80,0.0003
90,0.0003
100,0.0003


TrainOutput(global_step=125, training_loss=0.010095012189820408, metrics={'train_runtime': 118.9531, 'train_samples_per_second': 8.407, 'train_steps_per_second': 1.051, 'total_flos': 263111055360000.0, 'train_loss': 0.010095012189820408, 'epoch': 1.0})

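* No validation loss is reported for this run because `evaluation_strategy` is commented out, so the log alone cannot show whether the model overfits.

* **Training Loss (Overall)**: 0.0101 → the logged loss falls from 0.1156 at step 10 to 0.0003 by step 70, which is unusually low for a sentiment task after a single epoch. A likely cause is that `select(range(1000))` takes the first 1,000 rows of a split ordered by label, so the subset probably contains only one class.

* **Samples per Second**: 8.41 → 1,000 examples were processed in about 119 seconds.

* **Epochs Completed**: 1 → 125 optimizer steps at batch size 8. If performance is not satisfactory, running more epochs may improve results, at the cost of a longer training time.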
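The step count and throughput in the `TrainOutput` line can be cross-checked with a little arithmetic. The sketch below is a rough estimate, assuming a single device and one optimizer update per `gradient_accumulation_steps` batches; `optimizer_steps` is a hypothetical helper written for this note, not part of `transformers`.

```python
import math

def optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps=1,
                    num_epochs=1, num_devices=1):
    """Estimate the number of optimizer updates for a training run (hypothetical helper)."""
    # Number of mini-batches the dataloader yields in one epoch
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One optimizer update happens every grad_accum_steps mini-batches
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return math.ceil(updates_per_epoch * num_epochs)

# BERT run above: 1,000 examples, batch size 8, 1 epoch
bert_steps = optimizer_steps(1000, 8)

# Throughput from the logged runtime of 118.9531 seconds
throughput = 1000 / 118.9531

print("Estimated steps:", bert_steps)
print("Samples per second:", round(throughput, 3))
```

Under these assumptions the estimate agrees with `global_step=125` in the log. The same helper can be used to predict how long a run will take before changing the batch size or the number of epochs.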

In [4]:
!pip install transformers datasets



In [5]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer
from datasets import load_dataset
import torch

model_name = "google-t5/t5-small"

tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

raw_datasets = load_dataset("imdb")

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

def format_dataset(dataset):
    dataset = dataset.remove_columns(["text"])
    dataset = dataset.rename_column("label", "labels")
    dataset.set_format("torch")
    return dataset

train_dataset = format_dataset(tokenized_datasets["train"]).select(range(1000))
test_dataset = format_dataset(tokenized_datasets["test"]).select(range(1000))


model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

training_args = TrainingArguments(
    output_dir="./results",
    #evaluation_strategy="epoch",
    #save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1, # 0.5
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

trainer.train() # W&B_API_KEY


Some weights of T5ForSequenceClassification were not initialized from the model checkpoint at google-t5/t5-small and are newly initialized: ['classification_head.dense.bias', 'classification_head.dense.weight', 'classification_head.out_proj.bias', 'classification_head.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Step,Training Loss
10,1.0476
20,0.453
30,0.2372
40,0.1202
50,0.061
60,0.0335
70,0.0211
80,0.0146
90,0.014
100,0.0123


TrainOutput(global_step=125, training_loss=0.1630739126801491, metrics={'train_runtime': 75.5484, 'train_samples_per_second': 13.237, 'train_steps_per_second': 1.655, 'total_flos': 136151832576000.0, 'train_loss': 0.1630739126801491, 'epoch': 1.0})

# F1-score:

**F1** is a widely used metric for evaluating classification models, especially on imbalanced datasets. It combines precision and recall into a single number by taking their harmonic mean. For large language models (LLMs) such as **T5** used for text classification, the F1 score is computed in the same way as for any other machine learning model.

### Computing the F1 score for text classification with an LLM

For binary classification, the F1 score is:

$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

* **Precision** is the fraction of predicted positives that are actually positive:

$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$

* **Recall** is the fraction of actual positives that the model identifies:

$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$
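As a worked illustration of the three formulas, here is a minimal sketch in plain Python. The counts (40 true positives, 10 false positives, 20 false negatives) are made up for the example, and `precision_recall_f1` is a hypothetical helper name.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts, returning 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up counts for illustration only
p, r, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values than an arithmetic mean would.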

### How F1 is used with LLMs

With an LLM such as flan-t5-small, the F1 score can be computed as follows:

* **Model output**: After the model scores the input, the output is typically a set of raw logits (one real-valued score per class).

* **Argmax**: Apply *argmax* to convert the logits into predicted labels.

* **F1 computation**: Compare the predicted labels with the ground truth (the actual labels), then compute precision and recall and combine them into the F1 score.
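The three steps above can be sketched without any library. The logits and labels below are invented for illustration; in the notebook they would come from the model output and the dataset.

```python
# Hypothetical logits for six examples, two classes each: [score_class_0, score_class_1]
logits = [[2.1, -0.3], [0.2, 1.5], [-1.0, 0.4], [0.9, 0.1], [0.3, 0.8], [1.2, -0.7]]
labels = [0, 1, 1, 1, 0, 0]  # hypothetical ground truth

# Step 2: argmax over the class dimension turns logits into predicted labels
preds = [max(range(len(row)), key=row.__getitem__) for row in logits]

# Step 3: count outcomes for the positive class (1) and compute F1
tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("Predictions:", preds)
print("F1:", round(f1, 4))
```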

### How to interpret the F1 score

* **F1 of 1** indicates perfect precision and recall: every positive is found and no negative is labeled as positive.

* **F1 of 0** means the model has zero precision or zero recall (no true positives), which indicates poor performance.

* In general, the closer the F1 score is to 1, the better.

In [6]:
!pip install transformers datasets



Example F1 score:

In [7]:
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 2, 2, 1, 0, 2, 1, 0]
y_pred = [0, 0, 2, 2, 1, 2, 1, 0, 1, 2, 1]

f1_per_class = f1_score(y_true, y_pred, average=None)
f1_micro = f1_score(y_true, y_pred, average='micro')
f1_macro = f1_score(y_true, y_pred, average='macro')
f1_weighted = f1_score(y_true, y_pred, average='weighted')

print("F1 score per class:", f1_per_class)
print("Micro-average F1 score:", f1_micro)
print("Macro-average F1 score:", f1_macro)
print("Weighted-average F1 score:", f1_weighted)

F1 score per class: [0.66666667 0.28571429 0.66666667]
Micro-average F1 score: 0.5454545454545454
Macro-average F1 score: 0.5396825396825397
Weighted-average F1 score: 0.5627705627705627
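To make the difference between the averaging modes concrete, the sketch below recomputes the same numbers by hand in plain Python. It uses the identity F1 = 2·TP / (2·TP + FP + FN), which is algebraically equivalent to the harmonic mean of precision and recall; the helper names are hypothetical.

```python
from collections import Counter

y_true = [0, 1, 2, 2, 2, 2, 1, 0, 2, 1, 0]
y_pred = [0, 0, 2, 2, 1, 2, 1, 0, 1, 2, 1]
classes = sorted(set(y_true) | set(y_pred))

def class_counts(cls):
    """Return (TP, FP, FN) treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp, fp, fn

def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

counts = {c: class_counts(c) for c in classes}
per_class = {c: f1_from_counts(*counts[c]) for c in classes}

# Macro: unweighted mean of the per-class scores
macro = sum(per_class.values()) / len(classes)

# Weighted: per-class scores weighted by the number of true examples of each class
support = Counter(y_true)
weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)

# Micro: pool the counts over all classes, then compute a single F1
micro = f1_from_counts(*(sum(col) for col in zip(*counts.values())))

print("Per class:", per_class)
print("Micro:", micro, "Macro:", macro, "Weighted:", weighted)
```

For single-label multiclass data like this, the micro average equals plain accuracy (6 correct out of 11).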


LLM Example:

In [None]:
# OLD Do not run

'''from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer
from datasets import load_dataset
from sklearn.metrics import f1_score
import torch

model_name = "distilbert-base-uncased" # Classification model needed

tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=256)


raw_datasets = load_dataset("imdb")

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

def format_dataset(dataset):
    dataset = dataset.remove_columns(["text"])
    dataset = dataset.rename_column("label", "labels")
    dataset.set_format("torch")
    return dataset

train_dataset = format_dataset(tokenized_datasets["train"]).select(range(1000))
test_dataset = format_dataset(tokenized_datasets["test"]).select(range(1000))

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Define a compute_metrics function to calculate F1 score
def compute_metrics(p):
    predictions, labels = p
    preds = torch.argmax(predictions, dim=1)  # Convert logits to predicted labels
    return {"f1": f1_score(labels, preds, average="binary")}

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=1, # 8
    per_device_eval_batch_size=1, # 8
    gradient_accumulation_steps=2, # Accumulate gradients over 2 steps ; 4
    num_train_epochs=0.3,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
    fp16=True, # Enable mixed precision
    optim="adamw_torch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,  # Pass the compute_metrics function
)

torch.cuda.empty_cache()  # Clear GPU memory before starting the training

trainer.train()'''

In [8]:
!pip install transformers datasets scikit-learn



In [9]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer
from datasets import load_dataset
from sklearn.metrics import f1_score
import torch

# DistilBERT, used here for sequence classification
model_name = "distilbert-base-uncased" # Classification model needed

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=256)

raw_datasets = load_dataset("imdb")

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

def format_dataset(dataset):
    dataset = dataset.remove_columns(["text"])
    dataset = dataset.rename_column("label", "labels")
    dataset.set_format("torch")
    return dataset

train_dataset = format_dataset(tokenized_datasets["train"]).select(range(1000))
test_dataset = format_dataset(tokenized_datasets["test"]).select(range(1000))

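# Metric function: compute the F1 score from the evaluation logits
def compute_metrics(p):
    predictions, labels = p
    preds = predictions.argmax(axis=1)  # predictions is a NumPy array of logits
    # zero_division sets the value returned when F1 is undefined (no positive labels or predictions)
    return {"f1": f1_score(labels, preds, average="binary", zero_division=1)}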

training_args = TrainingArguments(
    output_dir="./results",
    #evaluation_strategy="epoch",
    #save_strategy="epoch",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    num_train_epochs=3, # Make it 20
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
    fp16=True,
    optim="adamw_torch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
)

# Clear the GPU cache and start training
torch.cuda.empty_cache()

trainer.train()

# Print a few sample predictions:
preds_output = trainer.predict(test_dataset)
preds = preds_output.predictions.argmax(axis=1)

print("Sample predictions:", preds[:20])
print("True labels:", preds_output.label_ids[:20])


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Step,Training Loss
10,0.2824
20,0.0145
30,0.0026
40,0.0011
50,0.0006
60,0.0005
70,0.0004
80,0.0003
90,0.0002
100,0.0002


Sample predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
True labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]


The result is not good, not because the model is "bad", but because it has not actually learned anything yet.

Here is what the reported values mean:

* **Training Loss**: 0.000000 → unrealistically low.

* **Validation Loss**: 0.000023 → also too low, and inconsistent with an F1 of 0.0.

This usually happens when:

* The labels or the dataset formatting contain an error.

* The model gets "stuck" predicting a single class (most often 0), which is common when the training data is imbalanced.

* The F1 score is computed on empty or incorrect data.

In this run every sampled prediction and every true label printed above is 0, which suggests that the 1,000-row subsets contain only negative reviews.
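A quick way to catch the single-class problem is to count the labels in the subset before training. The sketch below simulates it in plain Python, assuming a 25,000-row split stored as 12,500 negatives followed by 12,500 positives; that ordering is an assumption made for illustration, and the list is a stand-in for the real label column.

```python
import random
from collections import Counter

# Hypothetical stand-in for a label column ordered by class
labels = [0] * 12_500 + [1] * 12_500

# Taking the first 1,000 rows of an ordered split yields a single class
first_rows = labels[:1000]
print("First 1000 rows:", Counter(first_rows))

# Shuffling before selecting gives a roughly balanced sample
rng = random.Random(42)
shuffled = labels[:]
rng.shuffle(shuffled)
shuffled_rows = shuffled[:1000]
print("After shuffling:", Counter(shuffled_rows))
```

With the `datasets` library the equivalent change would be to call `.shuffle(seed=42)` before `.select(range(1000))`, and to print `Counter` over the label column as a sanity check.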

Retraining on a balanced subset:

In [10]:
!pip install transformers datasets scikit-learn



In [11]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer
from datasets import load_dataset, Dataset, concatenate_datasets
from sklearn.metrics import f1_score
from collections import Counter
import torch

model_name = "distilbert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=256)

raw_datasets = load_dataset("imdb")
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)

def format_dataset(dataset):
    dataset = dataset.remove_columns(["text"])
    dataset = dataset.rename_column("label", "labels")
    dataset.set_format("torch")
    return dataset

train_full = format_dataset(tokenized_datasets["train"])
test_full = format_dataset(tokenized_datasets["test"])

# Filter by class and build balanced subsets
train_0 = train_full.filter(lambda x: x['labels'] == 0).select(range(500))
train_1 = train_full.filter(lambda x: x['labels'] == 1).select(range(500))
test_0 = test_full.filter(lambda x: x['labels'] == 0).select(range(250))
test_1 = test_full.filter(lambda x: x['labels'] == 1).select(range(250))

#train_dataset = Dataset.from_dict(train_0 + train_1)
#test_dataset = Dataset.from_dict(test_0 + test_1)
train_dataset = concatenate_datasets([train_0, train_1])
test_dataset = concatenate_datasets([test_0, test_1])

train_dataset = train_dataset.shuffle(seed=42)
test_dataset = test_dataset.shuffle(seed=42)

def compute_metrics(p):
    predictions, labels = p
    preds = predictions.argmax(axis=1)
    return {"f1": f1_score(labels, preds, average="binary", zero_division=1)}

training_args = TrainingArguments(
    output_dir="./results",
    #evaluation_strategy="epoch",
    #save_strategy="epoch",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=10,
    fp16=True,
    optim="adamw_torch",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
)

torch.cuda.empty_cache()
trainer.train()

# Inspect the predictions
preds_output = trainer.predict(test_dataset)
preds = preds_output.predictions.argmax(axis=1)

print("Sample predictions:", preds[:20])
print("True labels:", preds_output.label_ids[:20])

# Class distribution
print("Predicted label counts:", Counter(preds))
print("True label counts:", Counter(preds_output.label_ids))

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Step,Training Loss
10,0.7397
20,0.7085
30,0.6531
40,0.643
50,0.547
60,0.4668
70,0.586
80,0.5255
90,0.5302
100,0.5196


Sample predictions: [1 1 0 1 0 0 0 1 1 0 0 0 1 0 0 0 1 1 0 1]
True labels: [1 0 1 0 0 0 0 1 1 0 1 0 1 0 1 0 1 1 0 1]
Predicted label counts: Counter({np.int64(0): 262, np.int64(1): 238})
True label counts: Counter({np.int64(1): 250, np.int64(0): 250})


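F1 score by epoch:

| Epoch | Loss (train) | Loss (val) | F1-score |
|-------|--------------|------------|----------|
| 1 | 0.509 | 0.659 | 0.811 |
| 2 | 0.4151 | 0.666 | 0.842 |
| 3 | 0.002 | 0.752 | 0.847 |

> F1 ≈ 0.85 is a strong result for only 1,000 balanced training examples. Note that the validation loss rises from 0.659 to 0.752 while the training loss keeps falling, which is an early sign of overfitting.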

### Predictions vs. ground truth:

* **Predicted label counts**: {0: 262, 1: 238}

* **True label counts**: {0: 250, 1: 250}

* **Sample predictions**: well mixed between the two classes

Therefore:

1. The model is not "stuck" on one class.

2. The predictions are well balanced.

3. Training was successful!

Next steps:

A confusion matrix and accuracy / precision / recall can be added.
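As a minimal sketch of those extra metrics, the block below builds a 2×2 confusion matrix in plain Python from the 20 sample predictions and true labels printed above. It covers only that small sample, so the numbers are illustrative rather than the model's real test scores.

```python
# The 20 sample predictions and true labels from the output above
sample_preds  = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1]
sample_labels = [1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1]

# confusion[true_label][predicted_label]
confusion = [[0, 0], [0, 0]]
for y, p in zip(sample_labels, sample_preds):
    confusion[y][p] += 1

tn, fp = confusion[0]
fn, tp = confusion[1]

accuracy = (tp + tn) / len(sample_labels)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("Confusion matrix [true][pred]:", confusion)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

For the full test set, the same lists can be replaced by `preds` and `preds_output.label_ids` from the cell above.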

The model can also be tested on arbitrary text, for example:

In [16]:
text = "This movie was really amazing and touching!"

inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=256)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

pred = torch.argmax(outputs.logits, dim=1)

#print("Predicted label:", pred.item())  # 1 = positive, 0 = negative
if pred.item() == 1:
    print("Predicted label: positive")
else:
    print("Predicted label: negative")

Predicted label: positive
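The cell above reports only the winning label. To also see how confident the model is, the logits can be passed through a softmax. The sketch below shows the computation in plain Python; the two logit values are invented for illustration and do not come from the trained model.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one review: [negative, positive]
example_logits = [-1.2, 2.3]
probs = softmax(example_logits)

label = "positive" if probs[1] > probs[0] else "negative"
print(f"Predicted label: {label} (confidence {max(probs):.3f})")
```

In the notebook itself, `torch.softmax(outputs.logits, dim=1)` performs the same conversion on the real model output.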


--------------------------

### Save the model locally:

In [None]:
model.save_pretrained("model_sentiment_name")
tokenizer.save_pretrained("model_sentiment_name")

### Upload the model in HF:

In [None]:
from huggingface_hub import notebook_login

notebook_login()

In [None]:
# push_to_hub() is a method of the model and tokenizer objects, so no extra import is required

model.push_to_hub("distilbert-sentiment-MODEL-NAME")
tokenizer.push_to_hub("distilbert-sentiment-MODEL-NAME")

### HF Hub upload:

In [None]:
!pip install huggingface_hub

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from huggingface_hub import notebook_login

notebook_login()

# Save locally
model.save_pretrained("model_sentiment_name")
tokenizer.save_pretrained("model_sentiment_name")

# Upload to the Hugging Face Hub
model.push_to_hub("distilbert-sentiment-MODEL-NAME")
tokenizer.push_to_hub("distilbert-sentiment-MODEL-NAME")

The model will appear at:

`https://huggingface.co/<YOUR_PROFILE_NAME>/distilbert-sentiment-MODEL-NAME`

You can make it public or private, as you prefer.

---------------------

### Loading the model back from Hugging Face:

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "<YOUR_PROFILE_NAME>/distilbert-sentiment-MODEL-NAME"

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

---------------------

### Interactive demo page (Inference Widget)

How to set it up:

* Upload the model with the required parameters.

* Add a README.md file (the model card), which automatically creates the demo page.

* Configure the Inference Widget for text classification.

Upload:

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Save locally
model.save_pretrained("model_sentiment_name")
tokenizer.save_pretrained("model_sentiment_name")

# Upload to the Hugging Face Hub (token=True reuses the token stored by notebook_login)
model.push_to_hub("distilbert-sentiment-MODEL-NAME", token=True)
tokenizer.push_to_hub("distilbert-sentiment-MODEL-NAME", token=True)

Adding a README.md file with metadata:

In [None]:
# DO NOT RUN!
---
language: en
license: apache-2.0
tags:
- sentiment
- classification
- imdb
- distilbert
- transformers
datasets:
- imdb
inference: true
---

# DistilBERT Sentiment Classifier by NAME
This model is fine-tuned on a balanced subset of the IMDb dataset to classify **positive** or **negative** sentiment in movie reviews.

## How it works:
- Input: A sentence like "This movie was amazing!"
- Output: `positive` (label 1) or `negative` (label 0)

Try it in the widget below ⬇
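The upload step expects a local `README.md` file. One way to create it from a notebook is to write the model card text to disk, as in the sketch below. It assumes the file should be written to the current working directory and reuses the placeholder `NAME` from the card above.

```python
from pathlib import Path

# Model card text copied from the cell above; NAME is still a placeholder
readme_text = """---
language: en
license: apache-2.0
tags:
- sentiment
- classification
- imdb
- distilbert
- transformers
datasets:
- imdb
inference: true
---

# DistilBERT Sentiment Classifier by NAME
This model is fine-tuned on a balanced subset of the IMDb dataset to classify **positive** or **negative** sentiment in movie reviews.

## How it works:
- Input: A sentence like "This movie was amazing!"
- Output: `positive` (label 1) or `negative` (label 0)

Try it in the widget below ⬇
"""

# Write the card next to the notebook so the upload cell can find it
readme_path = Path("README.md")
readme_path.write_text(readme_text, encoding="utf-8")
print("Wrote", readme_path.resolve())
```

The YAML block between the two `---` lines must stay at the very top of the file, since that is where the Hub reads the metadata from.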

Uploading README.md:

In [None]:
from huggingface_hub import HfApi

api = HfApi()
api.upload_file(
    path_or_fileobj="README.md",
    path_in_repo="README.md",
    repo_id="<YOUR_PROFILE_NAME>/distilbert-sentiment-MODEL-NAME",
    repo_type="model"
)

Making the model public:

`https://huggingface.co/<YOUR_PROFILE_NAME>/distilbert-sentiment-MODEL-NAME/settings`

Select: Make this repository public

Embedding the model in a personal website, blog, portfolio, or anywhere else that accepts HTML:

```html
<iframe
  src="https://huggingface.co/spaces/HF_NAME/distilbert-sentiment-MODEL-NAME"
  frameborder="0"
  width="100%"
  height="500">
  LLM
</iframe>
```
