# Deep Learning and Natural Language Processing

## Homework No. 6
Study [Chapter 3, Fine-tuning a pretrained model, of the HuggingFace tutorials](https://huggingface.co/learn/nlp-course/chapter3/1)

1. Choose a BERT/GPT-architecture model from the HuggingFace Hub for a Text Classification / Text Generation task
2. Choose a dataset for that task from HuggingFace Datasets
3. Fine-tune the chosen model - 5 points
4. Compare quality before and after fine-tuning using metrics specific to the chosen task - 2 points
5. Make the solution reproducible: random_state is fixed and the notebook runs from start to finish without errors - 2 points
6. Follow code style at the level of pep8 and [On writing clean Jupyter notebooks](https://ploomber.io/blog/clean-nbs/) - 1 point

In [None]:
!pip install datasets evaluate

!pip uninstall -y transformers accelerate
!pip install transformers accelerate

In [1]:
import pandas as pd
import numpy as np
import torch

from datasets import load_dataset
import evaluate
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer

In [2]:
# Tokenization function
def tokenize_function(examples, tokenizer):
    return tokenizer(examples["text"], padding="max_length", truncation=True)


# Load the metrics once instead of on every evaluation call
accuracy_metric = evaluate.load("accuracy")
precision_metric = evaluate.load("precision")
recall_metric = evaluate.load("recall")
f1_metric = evaluate.load("f1")


# Metric computation function
def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)

    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)
    precision = precision_metric.compute(predictions=predictions, references=labels, average="macro")
    recall = recall_metric.compute(predictions=predictions, references=labels, average="macro")
    f1 = f1_metric.compute(predictions=predictions, references=labels, average="macro")

    return {**accuracy, **precision, **recall, **f1}
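To make the `np.argmax(logits, axis=-1)` step in `compute_metrics` concrete, here is a minimal sketch in plain Python. The logits and labels are made-up values for three examples and six classes, not outputs of the model in this notebook, and `argmax` is a small helper written only for this illustration.

```python
# Hypothetical logits for 3 examples over 6 classes (illustrative values only).
logits = [
    [0.1, 2.3, -0.5, 0.0, 0.4, -1.2],
    [1.7, 0.2, 0.3, -0.8, 0.1, 0.0],
    [-0.3, 0.0, 0.2, 3.1, 0.5, 0.9],
]
labels = [1, 0, 4]  # hypothetical reference labels


def argmax(row):
    """Return the index of the largest value in a row of logits."""
    return max(range(len(row)), key=row.__getitem__)


# The predicted class is the position of the highest logit in each row.
predictions = [argmax(row) for row in logits]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

print(predictions)  # [1, 0, 3]
print(accuracy)
```

Two of the three predictions match their labels, so the accuracy in this toy case is 2/3.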

In [3]:
# Summary table (float zeros so that metric values can be stored without a dtype change)
results = pd.DataFrame([[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
                       columns=["Accuracy", "Precision", "Recall", "F1"],
                       index=["Pre-trained", "Fine-tuned"])
SEED = 2023
model_name = "bert-base-cased"
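`SEED` only takes effect where it is passed explicitly. The following is a minimal sketch of seeding the common random number generators in one place. The helper name `seed_everything` is hypothetical and not part of the notebook; `transformers.set_seed` offers similar behaviour as a library call.

```python
import random


def seed_everything(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is optional for this sketch


seed_everything(2023)
first = [random.random() for _ in range(3)]
seed_everything(2023)
second = [random.random() for _ in range(3)]
print(first == second)  # True: the same seed reproduces the same draws
```

Calling such a helper before the model is created would also fix the random initialization of the new classification head, which happens at load time.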

## Loading the dataset

In [4]:
dataset = load_dataset("dair-ai/emotion")

In [5]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenized_datasets = dataset.map(tokenize_function, batched=True, fn_kwargs={"tokenizer": tokenizer})
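The options `padding="max_length"` and `truncation=True` bring every example to one fixed length. Below is a toy sketch of that idea on whitespace tokens, assuming a hypothetical `pad_or_truncate` helper and a made-up `max_length` of 6. The real tokenizer works on subword IDs and uses the model's own maximum length.

```python
def pad_or_truncate(tokens, max_length, pad_token="[PAD]"):
    """Cut a token list to max_length or right-pad it, and build the attention mask."""
    kept = tokens[:max_length]
    n_pad = max_length - len(kept)
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [1] * len(kept) + [0] * n_pad
    return kept + [pad_token] * n_pad, attention_mask


short = "i feel great".split()
long = "i did not expect to feel this calm today".split()

print(pad_or_truncate(short, max_length=6))
print(pad_or_truncate(long, max_length=6))
```

Padding every example to the maximum length is simple, but short texts then carry many padding positions through each forward pass.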

In [6]:
small_train_dataset = tokenized_datasets["train"].shuffle(seed=SEED).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=SEED).select(range(1000))

## Model

In [7]:
device = "cuda:0" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=6).to(device)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [8]:
training_args = TrainingArguments(
    output_dir="./finetuned",         # output directory
    overwrite_output_dir=True,        # overwrite the content of the output directory
    num_train_epochs=40,              # number of training epochs
    per_device_train_batch_size=20,   # batch size for training
    per_device_eval_batch_size=32,    # batch size for evaluation
    warmup_steps=9,                   # number of warmup steps for the learning rate scheduler
    gradient_accumulation_steps=5,    # batches accumulated before each optimizer step
    logging_steps=1,                  # log the training loss at every step
    evaluation_strategy="epoch",      # evaluate at the end of every epoch
    seed=SEED,                        # fix the training seed for reproducibility
)
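The arguments above determine how many optimizer steps the run performs. Here is a rough sketch of the arithmetic, assuming a single device, the 1000-example training subset, and the `Trainer` defaults of a linear schedule with a peak learning rate of 5e-5 (neither is set explicitly in the notebook). `lr_at` is a hypothetical helper.

```python
import math

train_size = 1000          # size of small_train_dataset
per_device_batch = 20      # per_device_train_batch_size
grad_accum = 5             # gradient_accumulation_steps
epochs = 40                # num_train_epochs
warmup_steps = 9
peak_lr = 5e-5             # assumed default learning rate

effective_batch = per_device_batch * grad_accum
batches_per_epoch = math.ceil(train_size / per_device_batch)
steps_per_epoch = batches_per_epoch // grad_accum
total_steps = steps_per_epoch * epochs


def lr_at(step):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


print(effective_batch, steps_per_epoch, total_steps)  # 100 10 400
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

Under these assumptions the 9 warmup steps finish within the first epoch (10 steps), and a run stopped after epoch 7 has completed about 70 of the 400 planned steps.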

In [9]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

### Pre-trained

In [None]:
pt_results = trainer.evaluate()

In [None]:
# Use .loc instead of chained indexing, which may write to a copy
results.loc['Pre-trained', 'Accuracy'] = pt_results['eval_accuracy']
results.loc['Pre-trained', 'Precision'] = pt_results['eval_precision']
results.loc['Pre-trained', 'Recall'] = pt_results['eval_recall']
results.loc['Pre-trained', 'F1'] = pt_results['eval_f1']

### Fine-tuned

In [12]:
trainer.train();

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 1.5737 | 1.612451 | 0.365 | 0.136927 | 0.183006 | 0.122246 |
| 2 | 1.4332 | 1.418502 | 0.533 | 0.189691 | 0.282746 | 0.221156 |
| 3 | 0.939 | 1.006642 | 0.634 | 0.476021 | 0.370444 | 0.345819 |
| 4 | 0.7256 | 0.796213 | 0.755 | 0.632486 | 0.571404 | 0.57463 |
| 5 | 0.5083 | 0.6464 | 0.792 | 0.821846 | 0.632939 | 0.638231 |
| 6 | 0.2304 | 0.570948 | 0.824 | 0.80596 | 0.741785 | 0.758838 |
| 7 | 0.1099 | 0.548462 | 0.848 | 0.804732 | 0.813959 | 0.807839 |



KeyboardInterrupt: ignored

In [None]:
ft_results = trainer.evaluate()

In [14]:
# Use .loc instead of chained indexing, which may write to a copy
results.loc['Fine-tuned', 'Accuracy'] = ft_results['eval_accuracy']
results.loc['Fine-tuned', 'Precision'] = ft_results['eval_precision']
results.loc['Fine-tuned', 'Recall'] = ft_results['eval_recall']
results.loc['Fine-tuned', 'F1'] = ft_results['eval_f1']

## Results

In [15]:
results

| | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| Pre-trained | 0.08 | 0.013333 | 0.166667 | 0.024691 |
| Fine-tuned | 0.854 | 0.836933 | 0.796831 | 0.811512 |
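The Pre-trained row is consistent with a classifier that assigns every example to a single class: macro recall is then 1/6, and macro precision is the accuracy divided by 6. The sketch below reproduces this under stated assumptions. The label counts are invented so that the predicted class covers 8% of 1000 examples, and `macro_scores` is a hypothetical helper that scores an undefined ratio as 0.

```python
def macro_scores(labels, predictions, num_classes):
    """Macro-averaged precision, recall and F1; undefined ratios count as 0."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(p == c and y == c for p, y in zip(predictions, labels))
        fp = sum(p == c and y != c for p, y in zip(predictions, labels))
        fn = sum(p != c and y == c for p, y in zip(predictions, labels))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    # Macro averaging: every class contributes equally, regardless of its size.
    return tuple(sum(v) / num_classes for v in (precisions, recalls, f1s))


# Invented label counts: class 2 has 80 of the 1000 examples.
labels = [0] * 300 + [1] * 350 + [2] * 80 + [3] * 140 + [4] * 110 + [5] * 20
predictions = [2] * len(labels)  # a classifier that always answers class 2

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
precision, recall, f1 = macro_scores(labels, predictions, num_classes=6)
print(round(accuracy, 6), round(precision, 6), round(recall, 6), round(f1, 6))
# 0.08 0.013333 0.166667 0.024691
```

Because macro averaging weights all six classes equally, the five classes that are never predicted each contribute zero, which is why these scores sit far below the accuracy.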


In [17]:
!pip freeze > requirements.txt

In [16]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
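# Pipelines have no evaluate() method, so call the classifier on a text with candidate labels
classifier(dataset["test"][0]["text"], candidate_labels=dataset["train"].features["label"].names)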