# Getting Started with Fine-Tuning in Hugging Face Transformers

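This example walks through the main steps of fine-tuning a model with Transformers, including:
- Downloading the dataset
- Preprocessing the data
- Configuring training hyperparameters
- Setting up evaluation metrics
- An introduction to the Trainer
- Running the training
- Saving the model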

## The YelpReviewFull Dataset

**Hugging Face dataset: [YelpReviewFull](https://huggingface.co/datasets/yelp_review_full)**

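### Dataset Summary

The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data.

### Supported Tasks and Leaderboards
Text classification, sentiment classification: the dataset is mainly used for text classification, that is, predicting the sentiment of a given text.

### Languages
The reviews are mainly written in English.

### Dataset Structure

#### Data Instances
A typical data point consists of a text and its corresponding label.

An example from the YelpReviewFull test set looks as follows: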

```python
{
    'label': 0,
    'text': 'I got \'new\' tires from them and within two weeks got a flat. I took my car to a local mechanic to see if i could get the hole patched, but they said the reason I had a flat was because the previous patch had blown - WAIT, WHAT? I just got the tire and never needed to have it patched? This was supposed to be a new tire. \\nI took the tire over to Flynn\'s and they told me that someone punctured my tire, then tried to patch it. So there are resentful tire slashers? I find that very unlikely. After arguing with the guy and telling him that his logic was far fetched he said he\'d give me a new tire \\"this time\\". \\nI will never go back to Flynn\'s b/c of the way this guy treated me and the simple fact that they gave me a used tire!'
}
```

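#### Data Fields

- 'text': The review text is escaped with double quotes ("), and any internal double quote is escaped with two double quotes (""). Newlines are escaped as a backslash followed by an "n" character, i.e. "\n".
- 'label': The star rating of the review (between 1 and 5), stored as a class index from 0 (1 star) to 4 (5 stars).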
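
The two field conventions above can be sketched in plain Python. The label names follow what this notebook's own output displays ("2 star", "3 stars", and so on); the "1 star" entry and both helper names are assumptions made for illustration, not part of the dataset API.

```python
# Assumed label names, in class-index order (index 0 is the lowest rating).
LABEL_NAMES = ["1 star", "2 star", "3 stars", "4 stars", "5 stars"]


def label_to_stars(label: int) -> int:
    """Convert a 0-based class index into a 1-5 star rating."""
    if not 0 <= label < len(LABEL_NAMES):
        raise ValueError(f"label must be in 0..{len(LABEL_NAMES) - 1}, got {label}")
    return label + 1


def unescape_review(text: str) -> str:
    """Replace escaped newlines and escaped double quotes with the real characters."""
    return text.replace("\\n", "\n").replace('\\"', '"')


# A shortened, made-up record in the same shape as the example above.
sample = {"label": 0, "text": 'He said \\"this time\\". \\nNever again.'}
print(label_to_stars(sample["label"]), LABEL_NAMES[sample["label"]])
print(unescape_review(sample["text"]))
```

Whether to unescape the text before tokenization is a preprocessing choice; the rest of this notebook feeds the raw text to the tokenizer unchanged.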

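#### Data Splits

The Yelp reviews full star dataset is constructed by randomly taking 130,000 training samples and 10,000 test samples for each star rating from 1 to 5. In total there are 650,000 training samples and 50,000 test samples.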

## Downloading the Dataset

In [1]:
%env HF_ENDPOINT=https://hf-mirror.com
from datasets import load_dataset
dataset = load_dataset("yelp_review_full")
# Note: if loading fails with "Loading a dataset cached in a LocalFileSystem is not supported.",
# upgrade datasets: pip install -U datasets

env: HF_ENDPOINT=https://hf-mirror.com


  from .autonotebook import tqdm as notebook_tqdm
Using the latest cached version of the module from /root/.cache/huggingface/modules/datasets_modules/datasets/yelp_review_full/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf (last modified on Thu Dec 28 22:50:48 2023) since it couldn't be found locally at yelp_review_full, or remotely on the Hugging Face Hub.
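
The `%env` line above is an IPython magic and is not available in a plain Python script. A minimal sketch of the equivalent setting, assuming the same mirror URL as this notebook and assuming (as far as I know) that the Hub client reads the variable when it is first imported, so it has to be set beforehand:

```python
import os

# Assumption: the mirror URL used in this notebook; replace it as needed.
MIRROR = "https://hf-mirror.com"

# Set the variable before importing `datasets` or `transformers`.
# setdefault keeps any value that is already configured in the environment.
os.environ.setdefault("HF_ENDPOINT", MIRROR)

print(os.environ["HF_ENDPOINT"])
```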


In [2]:
dataset

DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 650000
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 50000
    })
})

In [3]:
dataset["train"][10]

{'label': 0,
 'text': "Owning a driving range inside the city limits is like a license to print money.  I don't think I ask much out of a driving range.  Decent mats, clean balls and accessible hours.  Hell you need even less people now with the advent of the machine that doles out the balls.  This place has none of them.  It is april and there are no grass tees yet.  BTW they opened for the season this week although it has been golfing weather for a month.  The mats look like the carpet at my 107 year old aunt Irene's house.  Worn and thread bare.  Let's talk about the hours.  This place is equipped with lights yet they only sell buckets of balls until 730.  It is still light out.  Finally lets you have the pit to hit into.  When I arrived I wasn't sure if this was a driving range or an excavation site for a mastodon or a strip mining operation.  There is no grass on the range. Just mud.  Makes it a good tool to figure out how far you actually are hitting the ball.  Oh, they are cash 

In [4]:
import random
import pandas as pd
import datasets
from IPython.display import display, HTML

In [5]:
def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [6]:
show_random_elements(dataset["train"])

Unnamed: 0,label,text
0,3 stars,"I know, I know... I'm reviewing AVN. \n\nWell even though I *worked* the event this year (2012) I did *experience* the event as well because I was given a full access pass to all three floors of the event. This year that included: The Joint, Muse Hall, and Festival Hall (the \""novelties\"" aka sex toys floor). AVN took place, for the first time, at Hard Rock Hotel. AVN noted they did this because they \""wanted to take over an entire casino\"". Well, they definitely did.\n\nI worked information booth and happened to notice a lot of people confused as to what they could and could not see. I often felt bad because people thought they could go somewhere, but were denied access. The line to get tickets was OUTSIDE of Hard Rock, in the front of the hotel, and from what I have heard, extended forever and the wait was up to two hours for some to get tickets.\n\nLook, I'd never willingly go to a porn convention, but since I had all access I figured I'd check it out. Come on, it's the PORN CONVENTION. Basically porn stars signing autographs, companies selling sexy items, and girls getting half naked. I'll admit my favorite part of the convention was actually the Sapphire Gentleman's Lounge, who had strippers getting down in the hallway between the floors.\n\nOf course, you can collect many goodies to take home with you and some are up for purchase as well. I also gained access to the International Trade Show Floor, which shows you just what's up in the land of erotica in different countries.\n\nThe Hard Rock for 4 days straight was taken over by porn stars, a lot of dudes, and people just curious, or people serious about sex, porn, and et cetera. \n\nI personally thought it was confusing for people though and cramped as could be. AVN should reconsider the location for next year."
1,2 star,"The only reason I'm not giving just one star is because our server did try hard to make our experience a good one. Unfortunately, our food took forever, it was cold, my eggs were overdone... AND we were sitting where I could see right into the kitchen. This allowed me to see another server drop toast on the floor, then pick it up, put it on a plate and serve it to the people behind us. I'm not sure if my upset stomach is my mindset or bad food. We should've gone to TJ's. Won't make that mistake again."
2,5 stars,"Picasso - Bellagio\nOverall \n5 Stars\n\nGold Medal - Cooked to Perfection \nSilver Medal - Perfect Ambiance\nBronze Medal - Great Service\n\nPicasso is located inside the Fabulous Bellagio Hotel. There are various artworks and numerous ceramics by the Spanish Master located in the restaurant. Try to request a terrace or window seat, so you can see the Bellagio Lake and the Bellagio water show. Imagine eating dinner and hearing the amazing music blasting while watching the water show. Could there be anything more fantastic than this??\n\nTasting and prix-fixe menus are $123 -$130. There is also an extra option for wine pairing with your meal. \n\nThe food here is amazing. It is perfection. Yes, perfection!! Imagine eating something that could not have been cooked or could not have tasted any better. That is Picasso for you. The shrimp was cooked perfectly and all the sauces and flavors were perfect. Nothing was too chewy or too dry or too salty. The oysters and foie gras were like Heaven in my mouth. The lobster option had an extra charge, but it was worth the extra money paid. My taste buds have never been so happy. Everything was PERFECT! \n\nThe ambiance is extremely upscale and romantic. If you can get a terrace or window seat, you have the best seats in the restaurant. \n\nService was excellent. Fast and attentive. I asked where the restroom was located and the server escorted me all the way to the restroom. When I returned, my table napkin was refolded for me and the seating area had looked like I just arrived. Then, here comes the server to help me push my chair in behind me. Why, thank you :) So fancy!\n\nThis place is very pricey, but I think it is perfection and worth every penny. I have been to many other upscale restaurants, where the food did not taste as good and was still pricey. I would rather save my pennies and have perfection here.\n\nHAPPY PICASSOING!!!"
3,2 star,"We came here tonight with a groupon deal, with my parent's in law... The food was good but the waiter was not friendly at all. Nobody except us were in the restaurant and the waiter didn't take time for us or make any smile. It seems that we were disturbing his peaceful evening. We suggest him to take another waiter/ess more friendly than him. \nThe restaurant smell the toilet cleaning product too...eurk\nHowever the food was good so... we would like the ambiance to be as good..."
4,2 star,"I want to like it. I really do. I've been twice, but each time I found the dishes bland ... or with a spiciness that was just spice, not flavor."
5,5 stars,"Stop the presses.\n\nSkip other nightclubs.\n\nNo (or super low) cover charge? No line? No snootiness? No exclusiveness? Essentially no uptight dress code? No police-like bouncers? No male/female ratio nonsense? Open all day and all night? Fun, crazy bartenders? Outdoors? Temperature-controlled even though it's outdoors? Constant dancing? Convenient location? A party-like atmosphere? Great music? Gaming opportunities? Down-to-earth crowd? Super fun people and staff? Excellent people-watching ops? Nice, affordable drinks? Spaciousness to move yet lots of people? FREE seats? This establishment shows that 99% of the other clubs are not worth the effort it takes to be there."
6,2 star,"Marginal experience. Food was not very good. Wait time to order was slow - appears they are understaffed as the front door man was also seating people, getting items from the back room, taking orders, bussing tables, & checking on customers. Steak was undercooked, biscuits were bland, etc. Overall, I'm not surprised because it is Denny's but I'm still a little disappointed."
7,4 stars,"daughter and i were in vegas in april this year,. we had a coupon for buy one get one free entree she had mushroom chicken and i had shrimp and chicken with pineapple.. the entree on both was fresh and good.. the plate was huge i am headed back to vegas june 4th.. and wil go back here.. yummmmmmmm"
8,5 stars,"I can't believe (not to say the aren't true) some of the other reviews on here. I went there just window shopping yesterday and I would have to say that they were the nicest and most helpful of the many dealerships we went to yesterday. Maybe it is under new management because the name is not Biddulph anymore. But they gave me a great estimate on my trade in, better than any other dealer we went to. When we said we weren't ready to buy, they didn't pressure us at all, just told us to come back or call anytime we have any questions. Seriously the best experience I've ever had at a car lot. It is worth mentioning that we worked with Bob the salesman, and that the manager/owner also talked with us for a bit. They were both great and I would honestly feel bad if I didn't leave a good review for them. If you've had a bad experience in the past here, I would recommend trying again, because I can't imagine them being as horrible as some of these reviews."
9,3 stars,"Our experience at Otto's was mixed. They brought the breadsticks that are imported from Italy. We ask for bread, which came with no butter or oil. They brought oil after we ask. I has the braised pork shoulder and would give it a five. The apple reduction was out of this world. My wife had spaghetti with pepper and butter. It was just average. I ordered Lemoncello and it was terrible. I sent back and the waiter said the bar tender did not shake it up. The second glass was better, but still not good. I would not recommend. We share the three flavored of Gelato and we only found two favors and they were just OK, again I wouldn't reorder.\nOur service was good. The prices were high, but for what I have seen of Las Vegas, they were about right."


## Preprocessing the Data

Once the dataset has been downloaded, use a tokenizer to process the text. For inputs of varying length, padding and truncation strategies can be applied.
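
To make padding and truncation concrete, here is a simplified sketch on already-tokenized ids. It is not the tokenizer's real implementation: the function name is hypothetical, and the special ids (101 for `[CLS]`, 102 for `[SEP]`, 0 for padding) are taken from the `bert-base-cased` output shown later in this notebook.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # ids seen in this notebook's tokenizer output


def pad_and_truncate(token_ids, max_length):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    body = token_ids[: max_length - 2]        # truncation: drop tokens that do not fit
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)     # 1 marks real tokens
    padding = max_length - len(input_ids)     # padding: fill the remainder
    return input_ids + [PAD_ID] * padding, attention_mask + [0] * padding


short_ids, short_mask = pad_and_truncate([2785, 1363], max_length=6)
long_ids, long_mask = pad_and_truncate([2785, 1363, 19501, 1183, 188, 8167], max_length=6)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

The short input is padded with zeros and masked out, while the long input loses its trailing tokens but keeps the closing `[SEP]`.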

The `map` method of Datasets applies a preprocessing function to the entire dataset in one call.
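
With `batched=True`, the function passed to `map` receives a dictionary of column lists rather than a single row. The sketch below imitates that calling convention on a plain list of dictionaries; `batched_map` and `add_length` are hypothetical helpers for illustration, not the library's implementation.

```python
def batched_map(rows, fn, batch_size=1000):
    """Apply fn to column-wise batches of rows and merge the returned columns back in."""
    out = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start : start + batch_size]
        # Column-oriented view of the batch: {"label": [...], "text": [...]}
        batch = {key: [row[key] for row in chunk] for key in chunk[0]}
        new_columns = fn(batch)
        for i, row in enumerate(chunk):
            merged = dict(row)
            merged.update({key: values[i] for key, values in new_columns.items()})
            out.append(merged)
    return out


def add_length(batch):
    # Receives a list of texts and returns a list of the same length.
    return {"num_chars": [len(text) for text in batch["text"]]}


rows = [{"label": 0, "text": "bad"}, {"label": 4, "text": "great"}, {"label": 2, "text": "ok"}]
result = batched_map(rows, add_length, batch_size=2)
print(result)
```

Processing whole batches lets a fast tokenizer handle many texts per call, which is the main reason for passing `batched=True`.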

Below, the whole dataset is processed with the pad-to-max-length strategy:

In [9]:
from transformers import AutoTokenizer
%env HUGGINGFACE_CO_RESOLVE_ENDPOINT=https://hf-mirror.com

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")


def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)


tokenized_datasets = dataset.map(tokenize_function, batched=True)

env: HUGGINGFACE_CO_RESOLVE_ENDPOINT=https://hf-mirror.com


Map: 100%|██████████| 50000/50000 [00:10<00:00, 4587.94 examples/s]


In [13]:
show_random_elements(tokenized_datasets["train"], num_examples=1)

Unnamed: 0,label,text,input_ids,token_type_ids,attention_mask
0,3 stars,"pretty good crispy shredded beef tacos, the beans and rice were average nothing to scream about. the unlimited free chips and salsa was a plus. drink selection is okay did not see any alcohol on site. place looks a little run down i think it used to be a burger place. overall i wouldnt go out of my way to go there but if i was nearby and craving some tacos I would hit it up.","[101, 2785, 1363, 19501, 1183, 188, 8167, 23372, 1174, 14413, 27629, 13538, 117, 1103, 15154, 1105, 7738, 1127, 1903, 1720, 1106, 7015, 1164, 119, 1103, 22921, 1714, 13228, 1105, 21718, 3447, 1161, 1108, 170, 4882, 119, 3668, 4557, 1110, 3008, 1225, 1136, 1267, 1251, 6272, 1113, 1751, 119, 1282, 2736, 170, 1376, 1576, 1205, 178, 1341, 1122, 1215, 1106, 1129, 170, 171, 23872, 1282, 119, 2905, 178, 2010, 1204, 1301, 1149, 1104, 1139, 1236, 1106, 1301, 1175, 1133, 1191, 178, 1108, 2721, 1105, 172, 1611, 3970, 1199, 27629, 13538, 146, 1156, 1855, 1122, 1146, 119, 102, 0, 0, 0, 0, ...]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, ...]"


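### Sampling the Data

We use 1,000 samples to demonstrate small-scale training on BERT (with the PyTorch Trainer).

`shuffle()` randomly reorders the rows of the dataset. For more control over the algorithm used to shuffle the dataset, pass the `generator` argument to use a different `numpy.random.Generator`.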

In [18]:
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
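
The shuffle-then-select pattern in the cell above can be sketched with the standard library. This only illustrates the idea of a seeded permutation: the `datasets` library uses a NumPy generator, so the indices it picks will differ from the ones produced here, and `shuffle_and_select` is a hypothetical helper.

```python
import random


def shuffle_and_select(num_rows, sample_size, seed):
    """Return sample_size row indices drawn from a seeded permutation of range(num_rows)."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the permutation is reproducible
    return indices[:sample_size]


first = shuffle_and_select(num_rows=650_000, sample_size=1000, seed=42)
second = shuffle_and_select(num_rows=650_000, sample_size=1000, seed=42)
print(len(first), first == second, len(set(first)))
```

Fixing the seed means every run trains and evaluates on the same 1,000 rows, which keeps results comparable between runs.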

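## Fine-Tuning Configuration

### Loading the BERT Model

The warning below tells us that some weights are newly and randomly initialized (the `classifier.weight` and `classifier.bias` of the classification head). This is entirely normal when fine-tuning: the head used for the pretraining task (masked language modeling) is removed and replaced with a new head for which no pretrained weights exist. The library therefore warns that the model should be fine-tuned before it is used for inference, which is exactly what we are about to do.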

In [3]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

model.safetensors: 100%|██████████| 436M/436M [00:40<00:00, 10.7MB/s] 
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
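
To see why only `classifier.weight` and `classifier.bias` are listed, it helps to work out the size of the new head. This is a back-of-the-envelope sketch that assumes the head is a single linear layer from the 768-dimensional hidden state of BERT base to the 5 labels:

```python
HIDDEN_SIZE = 768  # hidden size of BERT base (assumed for bert-base-cased)
NUM_LABELS = 5     # num_labels passed to from_pretrained in this notebook

weight_shape = (NUM_LABELS, HIDDEN_SIZE)  # classifier.weight
bias_shape = (NUM_LABELS,)                # classifier.bias
new_parameters = NUM_LABELS * HIDDEN_SIZE + NUM_LABELS
print(weight_shape, bias_shape, new_parameters)
```

The new head is tiny compared with the pretrained encoder, so fine-tuning mostly adapts weights that already carry useful representations.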


### Training Hyperparameters (TrainingArguments)

Full list of parameters and default values: https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments

Source definition: https://github.com/huggingface/transformers/blob/v4.36.1/src/transformers/training_args.py#L161

**The most important setting: the path where model weights are saved (`output_dir`)**

In [4]:
from transformers import TrainingArguments

model_dir = "models/bert-base-cased"

# logging_steps defaults to 500; given our training data size and step count, set it to 100
training_args = TrainingArguments(output_dir=f"{model_dir}/test_trainer",
                                  logging_dir=f"{model_dir}/test_trainer/runs",
                                  logging_steps=100)
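
The choice of `logging_steps=100` follows from the number of optimizer steps in this run. A small sketch of that arithmetic, assuming the 1,000 training samples used in this notebook, a single device, and the default `per_device_train_batch_size=8` and `num_train_epochs=3` (`training_schedule` is a hypothetical helper):

```python
import math


def training_schedule(num_samples, batch_size, num_epochs, logging_steps):
    """Return the total number of steps and the steps at which the training loss is logged."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    logged_at = list(range(logging_steps, total_steps + 1, logging_steps))
    return total_steps, logged_at


total, at_100 = training_schedule(1000, 8, 3, logging_steps=100)
_, at_500 = training_schedule(1000, 8, 3, logging_steps=500)
print(total, at_100, at_500)
```

The total of 375 steps matches the `global_step=375` reported by `trainer.train()` in this notebook. With the default of 500, no logging step would be reached before training ends.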

In [5]:
# The full hyperparameter configuration
print(training_args)

TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=

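### Evaluating Metrics During Training (Evaluate)

The **[Hugging Face Evaluate library](https://huggingface.co/docs/evaluate/index)** lets you load, with a single line of code, dozens of evaluation methods across domains (natural language processing, computer vision, reinforcement learning, and more). The full list of currently supported metrics: https://huggingface.co/evaluate-metric

The Trainer does not automatically evaluate model performance during training. We therefore need to pass the Trainer a function that computes and reports metrics.

The Evaluate library provides a simple accuracy function, which you can load with `evaluate.load`: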

In [6]:
import numpy as np
import evaluate

metric = evaluate.load("accuracy")

2023-12-29 12:14:30.747495: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Downloading builder script: 4.20kB [00:00, 4.89MB/s]



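Next, call the metric's `compute` function to calculate the accuracy of the predictions.

Before passing predictions to `compute`, we need to convert the logits into predictions (**all Transformers models return logits**).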

In [7]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
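
What `compute_metrics` does can be reproduced on toy data without any library. The sketch below is a plain-Python stand-in for `np.argmax` followed by the accuracy metric; the logits and labels are made up for illustration.

```python
def argmax(values):
    """Index of the largest value (first one wins on ties), like np.argmax on a 1-D array."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the reference label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


toy_logits = [
    [0.1, 2.3, 0.2, -1.0, 0.0],  # highest score at class 1
    [1.5, 0.2, 0.1, 0.0, -0.3],  # highest score at class 0
    [0.0, 0.1, 0.2, 0.3, 4.0],   # highest score at class 4
    [0.9, 0.8, 0.1, 0.0, 0.2],   # highest score at class 0
]
toy_labels = [1, 0, 3, 0]
print(accuracy(toy_logits, toy_labels))
```

Three of the four toy rows are predicted correctly, so the reported accuracy is 0.75. No softmax is needed, because the argmax of the logits equals the argmax of the probabilities.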

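#### Monitoring Metrics During Training

To monitor how the evaluation metrics change during training, set the `evaluation_strategy` parameter in `TrainingArguments` so that the metrics are reported at the end of each epoch.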

In [8]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(output_dir=f"{model_dir}/test_trainer",
                                  evaluation_strategy="epoch", 
                                  logging_dir=f"{model_dir}/test_trainer/runs",
                                  logging_steps=100)
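
The cell above targets Transformers v4.36.1. To my knowledge, later releases introduced `eval_strategy` as the preferred spelling and deprecated `evaluation_strategy`, so treat the following as a hedged sketch of how a script might pick whichever keyword is available. The helper and the two stand-in classes are hypothetical; they exist only so the sketch runs without Transformers installed.

```python
import inspect


def eval_strategy_kwarg(args_class, value="epoch"):
    """Return {name: value} using whichever evaluation-strategy keyword args_class accepts."""
    accepted = inspect.signature(args_class.__init__).parameters
    for name in ("eval_strategy", "evaluation_strategy"):  # newer name first
        if name in accepted:
            return {name: value}
    raise TypeError(f"{args_class.__name__} accepts neither keyword")


# Stand-in classes (hypothetical) that mimic the two spellings.
class OldStyleArgs:
    def __init__(self, output_dir, evaluation_strategy="no"):
        self.output_dir, self.evaluation_strategy = output_dir, evaluation_strategy


class NewStyleArgs:
    def __init__(self, output_dir, eval_strategy="no"):
        self.output_dir, self.eval_strategy = output_dir, eval_strategy


print(eval_strategy_kwarg(OldStyleArgs), eval_strategy_kwarg(NewStyleArgs))
# Intended use: TrainingArguments(output_dir=..., **eval_strategy_kwarg(TrainingArguments))
```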

## Starting the Training

### Instantiating the Trainer

The `kernel version` warning shown below does not currently affect running this example.

In [28]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


## Monitoring GPU Usage with nvidia-smi

To watch GPU usage in real time, poll with the `watch` command: `watch -n 1 nvidia-smi`:

```shell
Every 1.0s: nvidia-smi                                                   Wed Dec 20 14:37:41 2023

Wed Dec 20 14:37:41 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:0D.0 Off |                    0 |
| N/A   64C    P0              69W /  70W |   6665MiB / 15360MiB |     98%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     18395      C   /root/miniconda3/bin/python                6660MiB |
+---------------------------------------------------------------------------------------+
```
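
The same figures can also be read from Python, for example to log them next to training metrics. This is a minimal sketch using the CSV query mode of `nvidia-smi`; the helper names are hypothetical, and the parser assumes every field is a plain integer (a GPU that reports `[N/A]` for a field would need extra handling).

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line):
    """Parse one CSV line such as '98, 6665, 15360' into a dictionary."""
    util, used, total = (int(field.strip()) for field in line.split(","))
    return {"util_percent": util, "memory_used_mib": used, "memory_total_mib": total}


def read_gpus():
    """Run nvidia-smi once and return one dict per GPU (requires an NVIDIA driver)."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]


# Parsing a line built from the values in the table above:
print(parse_gpu_line("98, 6665, 15360"))
```

Calling `read_gpus()` inside a loop with `time.sleep(1)` would approximate what `watch -n 1` does.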

In [30]:
trainer.train()



Epoch,Training Loss,Validation Loss,Accuracy
1,1.5285,1.221993,0.482
2,1.1599,1.02991,0.562
3,0.9064,1.030973,0.569


TrainOutput(global_step=375, training_loss=1.1027792460123698, metrics={'train_runtime': 345.485, 'train_samples_per_second': 8.683, 'train_steps_per_second': 1.085, 'total_flos': 789354427392000.0, 'train_loss': 1.1027792460123698, 'epoch': 3.0})

In [31]:
small_test_dataset = tokenized_datasets["test"].shuffle(seed=64).select(range(100))

In [32]:
trainer.evaluate(small_test_dataset)

{'eval_loss': 1.0372836589813232,
 'eval_accuracy': 0.55,
 'eval_runtime': 2.9366,
 'eval_samples_per_second': 34.053,
 'eval_steps_per_second': 4.427,
 'epoch': 3.0}
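
The throughput figures in this output can be cross-checked by hand. The sketch assumes the default `per_device_eval_batch_size=8` on a single device; the other numbers are copied from the output above.

```python
import math

num_samples = 100      # size of small_test_dataset
eval_batch_size = 8    # assumed default per_device_eval_batch_size, single device
eval_runtime = 2.9366  # 'eval_runtime' reported above, in seconds
eval_accuracy = 0.55   # 'eval_accuracy' reported above

eval_steps = math.ceil(num_samples / eval_batch_size)
samples_per_second = round(num_samples / eval_runtime, 3)
steps_per_second = round(eval_steps / eval_runtime, 3)
correct_predictions = round(eval_accuracy * num_samples)
print(eval_steps, samples_per_second, steps_per_second, correct_predictions)
```

Under these assumptions the computed rates agree with `eval_samples_per_second` and `eval_steps_per_second`, and an accuracy of 0.55 on 100 samples corresponds to 55 correct predictions.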

### Saving the Model and Training State

- Use `trainer.save_model` to save the model; it can later be reloaded with `from_pretrained()`
- Use `trainer.save_state` to save the training state

In [34]:
trainer.save_model(f"{model_dir}/finetuned-trainer")

In [36]:
trainer.save_state()
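
A sketch of how the saved artifacts might be used afterwards. It assumes that `save_state()` writes a `trainer_state.json` file into `output_dir` (here `models/bert-base-cased/test_trainer`) and that the tokenizer was not saved with the model, since the Trainer above was created without one. Both helper names are hypothetical.

```python
import json
import os


def read_log_history(output_dir):
    """Load trainer_state.json from output_dir and return (global_step, log_history)."""
    with open(os.path.join(output_dir, "trainer_state.json"), encoding="utf-8") as f:
        state = json.load(f)
    return state["global_step"], state["log_history"]


def load_finetuned(model_path="models/bert-base-cased/finetuned-trainer",
                   tokenizer_name="bert-base-cased"):
    """Reload the saved classifier; the tokenizer comes from the base checkpoint."""
    # Imported lazily so the rest of this sketch runs without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    return model, tokenizer


# Intended use after training:
#   step, history = read_log_history("models/bert-base-cased/test_trainer")
#   model, tokenizer = load_finetuned()
```

The log history is a list of dictionaries, so per-epoch entries such as `eval_accuracy` can be filtered out of it for plotting.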

## Homework: Train on the full YelpReviewFull dataset and see how high the accuracy can get

In [4]:
%env HF_ENDPOINT=https://hf-mirror.com
%env HUGGINGFACE_CO_RESOLVE_ENDPOINT=https://hf-mirror.com

from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
import numpy as np
import evaluate

dataset = load_dataset("yelp_review_full")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
model_dir = "models/bert-base-cased"
metric = evaluate.load("accuracy")
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
training_args = TrainingArguments(output_dir=f"{model_dir}/test_trainer",
                                  evaluation_strategy="epoch", 
                                  save_steps=10000,
                                  save_total_limit=3,
                                  logging_dir=f"{model_dir}/test_trainer/runs",
                                  logging_steps=100)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    compute_metrics=compute_metrics
)

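# resume_from_checkpoint=True resumes from the latest checkpoint in output_dir and
# raises an error if none exists; on a first run, call trainer.train() instead.
trainer.train(resume_from_checkpoint=True)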

trainer.save_model(f"{model_dir}/finetuned-trainer")

trainer.save_state()

env: HF_ENDPOINT=https://hf-mirror.com
env: HUGGINGFACE_CO_RESOLVE_ENDPOINT=https://hf-mirror.com


You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
Using the latest cached version of the module from /root/.cache/huggingface/modules/datasets_modules/datasets/yelp_review_full/e8e18e19d7be9e75642fc66b198abadb116f73599ec89a69ba5dd8d1e57ba0bf (last modified on Thu Dec 28 22:50:48 2023) since it couldn't be found locally at yelp_review_full, or remotely on the Hugging Face Hub.
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using the latest cached version of the module from /root/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--accuracy/f887c0aab52c2d38e1f8a215681126379eca617f96c

Epoch,Training Loss,Validation Loss,Accuracy
1,0.8302,0.868247,0.63394
2,0.7659,0.839738,0.65258
3,0.6937,0.803514,0.66234
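
To put the full run in perspective, the sketch below estimates its length and checkpoint count. It assumes a single device with the default batch size of 8 and 3 epochs, counts only the periodic checkpoints triggered by `save_steps`, and copies the accuracy values from the two result tables in this notebook.

```python
import math

train_samples, batch_size, epochs = 650_000, 8, 3  # batch size and epochs are assumed defaults
save_steps, save_total_limit = 10_000, 3           # values from the TrainingArguments above

total_steps = math.ceil(train_samples / batch_size) * epochs
checkpoints_written = total_steps // save_steps
checkpoints_kept = min(checkpoints_written, save_total_limit)
print(total_steps, checkpoints_written, checkpoints_kept)

# Final-epoch accuracy: 1,000-sample run versus full-dataset run.
small_run_accuracy, full_run_accuracy = 0.569, 0.66234
gain = round(full_run_accuracy - small_run_accuracy, 5)
print(gain)
```

Under these assumptions the full run takes 243,750 steps, against 375 for the small run, and improves accuracy by roughly 9 percentage points. `save_total_limit=3` keeps disk usage bounded by retaining only the most recent checkpoints.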
