<a href="https://colab.research.google.com/github/cenzhiming/LLM-quickstart/blob/main/transformers/fine-tune-quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Getting Started with Fine-Tuning in Hugging Face Transformers

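This example walks through the main steps of fine-tuning a model with Transformers:
- Downloading the dataset
- Preprocessing the data
- Configuring training hyperparameters
- Setting up evaluation metrics
- A basic introduction to the Trainer
- Running the training
- Saving the model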

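## The YelpReviewFull dataset

**Hugging Face dataset: [YelpReviewFull](https://huggingface.co/datasets/yelp_review_full)**

### Dataset summary

The Yelp reviews dataset consists of reviews from Yelp. It is extracted from the Yelp Dataset Challenge 2015 data.

### Supported tasks and leaderboards
Text classification, sentiment classification: the dataset is mainly used for text classification, that is, predicting the sentiment of a given text.

### Languages
The reviews are mainly written in English.

### Dataset structure

#### Data instances
A typical data point consists of a text and its corresponding label.

An example from the YelpReviewFull test set looks as follows: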

```json
{
    'label': 0,
    'text': 'I got \'new\' tires from them and within two weeks got a flat. I took my car to a local mechanic to see if i could get the hole patched, but they said the reason I had a flat was because the previous patch had blown - WAIT, WHAT? I just got the tire and never needed to have it patched? This was supposed to be a new tire. \\nI took the tire over to Flynn\'s and they told me that someone punctured my tire, then tried to patch it. So there are resentful tire slashers? I find that very unlikely. After arguing with the guy and telling him that his logic was far fetched he said he\'d give me a new tire \\"this time\\". \\nI will never go back to Flynn\'s b/c of the way this guy treated me and the simple fact that they gave me a used tire!'
}
```

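#### Data fields

- 'text': the review text, escaped with double quotes ("); any internal double quote is escaped as two double quotes (""). Newlines are escaped as a backslash followed by the character "n", i.e. "\n".
- 'label': the review score. The stored value is a class index from 0 to 4, corresponding to 1 to 5 stars (the example above has `label: 0`).

#### Data splits

The Yelp reviews full star dataset is built by randomly sampling 130,000 training examples and 10,000 test examples for each star rating from 1 to 5. In total there are 650,000 training examples and 50,000 test examples.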
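
As a quick check on the numbers above, the sketch below recomputes the split sizes from the per-star counts and converts a stored label index to a star rating. The `label + 1` mapping is an assumption based on the example record, where a clearly negative review carries `label: 0`; the helper name `label_to_stars` is hypothetical.

```python
# Per-star sample counts quoted in the dataset description.
TRAIN_PER_STAR = 130_000
TEST_PER_STAR = 10_000
NUM_STARS = 5

train_total = TRAIN_PER_STAR * NUM_STARS
test_total = TEST_PER_STAR * NUM_STARS


def label_to_stars(label: int) -> int:
    """Map a class index (assumed 0..4) to a star rating (1..5)."""
    if not 0 <= label < NUM_STARS:
        raise ValueError(f"label must be in 0..{NUM_STARS - 1}, got {label}")
    return label + 1


print(train_total, test_total)   # 650000 50000
print(label_to_stars(0))         # 1
```

Because every star rating contributes the same number of examples, the classes are balanced, so random guessing would score about 20% accuracy.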

## Downloading the dataset

In [2]:
!pip install --upgrade datasets fsspec
from datasets import load_dataset

dataset = load_dataset("yelp_review_full")


In [3]:
dataset

DatasetDict({
    train: Dataset({
        features: ['label', 'text'],
        num_rows: 650000
    })
    test: Dataset({
        features: ['label', 'text'],
        num_rows: 50000
    })
})

In [4]:
dataset["train"][121]

{'label': 1,
 'text': "In general I do like Shake N' Steak, but this location is a hit or miss location!  You never know what kind of quality or service you're going to find here.  A friend and myself went a few weeks back after a movie and it had to be one of the worst trips there EVER!  You can't entirely blame the waitress since she was the only one there for the entire place...poor scheduling on the manager's part. However, while she can't be accountable for the slooooow service, she was accountable for both orders being incorrect.  The burgers were over cooked and the fries were soggie and the milkshake was runny at best...\\n\\nBy far my worst visit to Steak n' Shake!"}

In [5]:
import random
import pandas as pd
import datasets
from IPython.display import display, HTML

In [6]:
def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)

    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, datasets.ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
    display(HTML(df.to_html()))

In [7]:
show_random_elements(dataset["train"])

Unnamed: 0,label,text
0,3 stars,"So this place used to be a Fridays or some such thing.\n\nFirst up, the parking lot in this end of the plaza is a zoo. Traffic coming in off of Gilbert road and people trying to get to or from the In and Out and Chick-filet make it wildly dangerous.\n\nIf you survive the parking lot then its time to eat. I've been there twice and they couldn't manage to deliver hot fries either time. Of course you don't come to a Greek restaurant to eat fries. The \""Greek\"" food part of the menu is serviceable but nothing special. Ordered a gyro and after 3 tries they got it right (no lettuce, no tomatoes). Never got the fries hot, but eventually got the gyro right.\n\nProbably a 2.5 in reality but that wasn't a choice."
1,4 stars,And they serve breakfast all day. No Sunday brunch crowd either.
2,2 star,"*The following is an update of my last update. Another Yelp reviewer recently sent me a PM due to her curiosity as to why I didn't do or say anything about my experience at The Cal (while I was still a guest there), as well as in regards to what I'd suggest the hotel could've done to make my stay more satisfactory. This is an excerpt from my reply to her.*\n\nThere's a few aspects of my review that I didn't mention because I thought they were obvious, would be redundant and/or would represent more of a chaotic rambling rather than a concise summary. But apparently not, so... here they are:\n\n1. Without fail, I heard housekeeping doing their rounds prior to noon. The only reason why I mentioned them was because of all the noises I heard, they were the loudest.\n\n2. I stayed at a few other hotels in LV since last December, and I've never encountered a roach before - and a huge one at that. Not sure how this could be remedied, since if there's one - there's definitely more!\n\n3. My roomie really didn't have issues with the room's \""deficiences\"" the same way I did. In fact, he already had super low expectations of The Cal, expectations that even I thought were too harsh (he also doesn't have a roach phobia, and the used bath water from the clogged tub barely reached his ankles and therefore wasn't too bothersome. I'm about a foot shorter than him). Also, the room was under both our names but reserved by him. So anything I reported would've ultimately fallen on him rather than me.\n\n4. The number of luggage/belongings I mentioned are only my things. My roomie had a very big carry-on at the beginning of the trip and, like me, accumulated more things as the trip went on. The idea of getting Bellhop to assist with transporting our things from one tower to another sounds like a good idea, but you've forgotten about tipping. 
I would feel compelled to tip and - quite honestly - that's not something I'm thrilled about since I wouldn't have had to tip in the first place if the room had already been decent. Having maintenance fix the problem with the clogged bathroom also sounds like a good idea, but if for some reason we couldn't change rooms before then, I would feel compelled to actually remain in the room while said person was checking up on it, which could be hours. It's kind of like asking a repairman to come to your house to fix something... usually, you'd stay at home while he's doing it because you really can't say for sure if he'd steal something or even be able to / earnestly try to fix the problem."
3,4 stars,"This place was just so cute. It was really old fashioned which I like, the quality is so good and they don't,t feel the need to do some fancy and inappropriate refurbishment which I respect a lot. I had a beautiful homemade potato and leek soup and then a coronation chicken salad. My mum liked her brie and apple panini too. Yummy."
4,4 stars,"This is one of the first places I go to when I come back to visit family and friends. The seasonal beers are always great, the food never tastes bad, and the people that frequent it are usually pretty mixed. The only problem is parking on a busy night, hit the alley in the back and you might be able to find a spot. See you in December."
5,1 star,"My friends and I went to Taco Haus today for dinner because I was told by a classmate that it was new and it was good...Well I am thinking that he has never tasted good food because it was very far from good. We went because we like trying new places but the food lacked something...and not just 1 or 2 items but everything we had lacked something. First the chips, salsa and guac. The salsa was way too smoky and in reading different reviews I was disappointed to here that there was a green salsa as well as the too smoky brown salsa. The guac was good and so were the chips..and that was about it aside from the soda. We all order Carne asada tacos and the meat was very tough, there was wayyyy to much pickling on the tacos. I ordered the columbian rice as well because I usually like columbian rice. However in the past the columbian rice I have had has a little spice to it but this rice was so gross. It had absolutely no flavor what so every. We then got the chirizo empanadas and the best I have to say about them is the outer crust. The chirizo lacked something and I have tasted better chirizo back in my hometown of Milwaukee. Overall a huge disappointment for the price. Chipotle is much better for mexican. We will not return for sure. So disappointed too because I really like the brat haus"
6,1 star,How does a new battery take 3 hours to install? \nI had 1/2 a mind to push my dead car to Advance Auto
7,1 star,"Horrible service, mediocre food. The first sign of a problem should've been only a few tables full on a Friday night at 9 p.m. At first, our waiter was attentive. We ordered appetizers, soup, entree and dessert. Our waiter disappeared right after the appetizer. A busser filled our empty water glasses after we were finished with our entrees but literally, there was no server for at least 45 minutes! We both wanted more cocktails and wine but there was no one to place an order with. Only the person who brought out the food. No checking in, zilch, nonexistant service! This has never happened to us, whether it's at a Chile's level restaurant and certainly not at a finer dining place!!!\nThe hot and sour soup was the highlight of the meal. The entree flavors were bland and the dessert was a joke - prefrozen pastry puffs with ice cream in the middle - served hard as a rock, you'd be better off with a $3 dessert from Trader Joe's feezer aisle."
8,3 stars,"We just left here. A little disappointed in two things. The bartender Phil had a very negative energy for an Asian restaurant. He seemed unpleasant as if he didn't wanna be there or something and he showed that through his service. Secondly, the chicken teriyaki was over cooked. I felt that the place over all was a fun hip cool place to go to but something's need more attention."
9,3 stars,"I didn't have the food but the dessert was awesome, and the coffee was great too. We didn't even realize before coming here for the Desert Belle boat ride that there would be anyplace to eat at all, we were pleasantly surprised to find this place. \n\nWe came on a Thursday afternoon in February and it was pretty crowded, we had no reservation but only waited about 10 minutes for a table. Didn't care about a view (there's plenty of view outside and on the boat) and got a table outside overlooking the parking lot, but no complaints here since the scenery around was breathtaking. \n\nOur dessert came in huge dishes and we couldn't even finish it (well alright maybe we could). I ordered the brownie sundae, Mom got the strawberry shortcake with vanilla ice crearm, and my daughter got ice cream. We were impressed it was so good. We got a nice hostess and waitress and our service was just the right pace and we didn't feel rushed. \n\nI contemplated ordering the pulled pork sandwich, which I may next time, because I'm sure I'll be back. I wouldn't come here expecting a fancy and definitely not romantic waterfront restaurant, it's really for casual dining amongst beautiful scenery and tourists. Pair this with a boat trip on the Desert Belle."


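## Preprocessing the data

Once the dataset has been downloaded, use a tokenizer to process the text. Because the inputs vary in length, padding and truncation strategies are used to bring them to a uniform size.

The `map` method in Datasets applies a preprocessing function to the entire dataset in one call.

The code below processes the whole dataset, padding every example to the maximum length: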

In [9]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")


def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)


tokenized_datasets = dataset.map(tokenize_function, batched=True)

Map:   0%|          | 0/650000 [00:00<?, ? examples/s]

Map:   0%|          | 0/50000 [00:00<?, ? examples/s]
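
To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a minimal pure-Python sketch of the idea. It is not the tokenizer's implementation: the function `pad_or_truncate` is hypothetical, the maximum length of 6 is chosen only to keep the output short (bert-base-cased uses 512), and a real tokenizer also keeps the closing `[SEP]` token when it truncates, which this toy version does not.

```python
from typing import Dict, List

PAD_ID = 0  # BERT's [PAD] token id; it shows up as trailing zeros in input_ids


def pad_or_truncate(token_ids: List[int], max_length: int) -> Dict[str, List[int]]:
    """Return fixed-length input_ids plus the matching attention_mask."""
    kept = token_ids[:max_length]      # truncation: drop tokens past the limit
    n_pad = max_length - len(kept)     # padding: fill whatever space remains
    return {
        "input_ids": kept + [PAD_ID] * n_pad,
        "attention_mask": [1] * len(kept) + [0] * n_pad,
    }


short = pad_or_truncate([101, 18988, 1108, 102], max_length=6)
long = pad_or_truncate([101, 146, 112, 1396, 1178, 1151, 102], max_length=6)
print(short)  # padded with two zeros, mask ends in two zeros
print(long)   # cut to six ids, mask is all ones
```

The attention mask is what lets the model ignore the padded positions, which is why it is 1 for real tokens and 0 for padding.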

In [10]:
show_random_elements(tokenized_datasets["train"], num_examples=3)

Unnamed: 0,label,text,input_ids,token_type_ids,attention_mask
0,1 star,"Kitchen was very slow, all food did not come at same time 15 minutes from the first delivery of food until the last person got there. The food was extremely over cooked and cold \n\nWould never come back","[101, 18988, 1108, 1304, 3345, 117, 1155, 2094, 1225, 1136, 1435, 1120, 1269, 1159, 1405, 1904, 1121, 1103, 1148, 6779, 1104, 2094, 1235, 1103, 1314, 1825, 1400, 1175, 119, 1109, 2094, 1108, 4450, 1166, 13446, 1105, 2504, 165, 183, 165, 183, 2924, 6094, 5253, 1309, 1435, 1171, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]"
1,2 star,"I've only been to Philip Pelusi for haircuts. The salon offers a range of prices for cuts. \r\n\r\nIf you go for a cheaper cut, you're likely to get one of their stylists-in-training. Those cuts are a real gamble; more often than not, they're just bad.\r\n\r\nEven with the more expensive cuts, the stylists are more talk than ability. They know their products, but when it comes to actually cutting the hair, they're still no good.\r\n\r\nThe Philip Pelusi on Murray might yet be good for spa treatments and waxes; I haven't tried either of those. Go somewhere else for a good cut.","[101, 146, 112, 1396, 1178, 1151, 1106, 4367, 153, 1883, 1361, 1182, 1111, 1716, 12734, 1116, 119, 1109, 20310, 3272, 170, 2079, 1104, 7352, 1111, 7484, 119, 165, 187, 165, 183, 165, 187, 165, 183, 2240, 2087, 1128, 1301, 1111, 170, 17780, 2195, 117, 1128, 112, 1231, 2620, 1106, 1243, 1141, 1104, 1147, 188, 2340, 18286, 118, 1107, 118, 2013, 119, 4435, 7484, 1132, 170, 1842, 176, 16033, 132, 1167, 1510, 1190, 1136, 117, 1152, 112, 1231, 1198, 2213, 119, 165, 187, 165, 183, 165, 187, 165, 183, 2036, 7912, 1114, 1103, 1167, 5865, 7484, 117, 1103, 188, 2340, 18286, ...]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]"
2,1 star,"Garbage. 50 dollars for a buffet not offering lobster or King crab legs. Don't make me laugh, over priced hype is what i paid for. There are way better options within the Vegas strip for the fraction of the price. Don't get duped into the \""number 1 buffet\"" marketing gimic they post all over the strip because quite frankly i went with a group of 15 people, the majority of us were disappointed.\n\nFood: \nThe steak was bland, parts of it were completely raw. They offered small snow crab legs that were bad quality, it was disappointing. The price in my opinion doesn't justify the ratings, or the food at all and is a misrepresentation of what to expect. This Buffet would be better represented if the price of entry was around 20 dollars. The Food was mediocre for the price of 50,service was mediocre deserts selection was mediocre.\n\nConclusion. It's a mediocre buffet, save your self the money and the drive and stay on the strip. I'm telling you now, Go to wicked spoon, Paris, The Rio buffet, the Wynn. 
ANYWHERE but this place.","[101, 144, 1813, 22070, 119, 1851, 5860, 1111, 170, 171, 9435, 2105, 1136, 4733, 25338, 4832, 2083, 1137, 1624, 24121, 2584, 119, 1790, 112, 189, 1294, 1143, 4046, 117, 1166, 23812, 177, 16726, 1110, 1184, 178, 3004, 1111, 119, 1247, 1132, 1236, 1618, 6665, 1439, 1103, 6554, 6322, 1111, 1103, 13394, 1104, 1103, 3945, 119, 1790, 112, 189, 1243, 3840, 3537, 1154, 1103, 165, 107, 1295, 122, 171, 9435, 2105, 165, 107, 6213, 176, 4060, 1596, 1152, 2112, 1155, 1166, 1103, 6322, 1272, 2385, 27642, 178, 1355, 1114, 170, 1372, 1104, 1405, 1234, 117, 1103, 2656, 1104, 1366, 1127, 9333, ...]","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]"


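### Sampling the data

Take 5,000 examples each from the training and test sets for a small-scale training demonstration on BERT (using the PyTorch Trainer).

`shuffle()` randomly reorders the rows of the dataset. For more control over the shuffling algorithm, pass the `generator` argument to use a different `numpy.random.Generator`.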

In [11]:
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(5000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(5000))
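
The `shuffle(seed=42).select(range(5000))` pattern amounts to "take the first k rows of a seeded permutation". The sketch below illustrates that idea with Python's standard `random` module. It is an illustration only: Datasets uses a NumPy generator internally, so the actual indices it picks will differ, and `shuffle_and_select` is a hypothetical helper.

```python
import random


def shuffle_and_select(num_rows: int, k: int, seed: int) -> list:
    """Return the first k row indices of a seeded permutation of range(num_rows)."""
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)   # private RNG: same seed, same order
    return indices[:k]


first = shuffle_and_select(num_rows=50, k=5, seed=42)
again = shuffle_and_select(num_rows=50, k=5, seed=42)
print(first == again)  # True: the subset is reproducible
```

Fixing the seed matters here because it makes the sampled subset, and therefore the reported metrics, repeatable between runs.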

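## Fine-tuning configuration

### Loading the BERT model

Loading the model prints a warning that some weights (here `classifier.weight` and `classifier.bias`) were not initialized from the checkpoint and are newly initialized. This is entirely normal when fine-tuning: the head used for the masked language modeling pretraining task is dropped and replaced with a new classification head for which there are no pretrained weights. The library therefore warns that the model should be fine-tuned before it is used for inference, which is exactly what we are about to do.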

In [12]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
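
To see why a newly initialized head is not usable for prediction, the following sketch builds a stand-in for the classifier in plain Python. Everything in it is illustrative: the weights, the `pooled` vector, and the 0.02 initialization scale are assumptions made for the example, not values read from the model. Only the shapes (hidden size 768 for bert-base, 5 labels) follow the setup above.

```python
import math
import random

HIDDEN_SIZE = 768   # hidden size of bert-base models
NUM_LABELS = 5      # matches num_labels=5 above
INIT_STD = 0.02     # assumed initialization scale, for illustration only

rng = random.Random(0)
# Hypothetical stand-ins for classifier.weight and classifier.bias.
weight = [[rng.gauss(0.0, INIT_STD) for _ in range(HIDDEN_SIZE)] for _ in range(NUM_LABELS)]
bias = [0.0] * NUM_LABELS


def softmax(xs):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# A made-up pooled sentence vector; a real one would come from the encoder.
pooled = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
logits = [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weight, bias)]
probs = softmax(logits)

num_new_params = HIDDEN_SIZE * NUM_LABELS + NUM_LABELS
print(num_new_params)                  # 3845 parameters in the new head
print([round(p, 3) for p in probs])    # probabilities driven by random weights
```

The probabilities depend only on the random draw, not on the review's sentiment, which is the situation the warning describes.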


### Training hyperparameters (TrainingArguments)

Full list of parameters and their defaults: https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments

Source definition: https://github.com/huggingface/transformers/blob/v4.36.1/src/transformers/training_args.py#L161

**Most important setting: the path where model weights are saved (`output_dir`)**

In [13]:
from transformers import TrainingArguments

model_dir = "models/bert-base-cased-finetune-yelp"

# logging_steps defaults to 500; given our training data size and batch size, set it to 100
training_args = TrainingArguments(output_dir=model_dir,
                                  per_device_train_batch_size=16,
                                  num_train_epochs=5,
                                  logging_steps=100)
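
The choice of `logging_steps` is easier to judge once the total number of optimizer steps is known. The sketch below works through that arithmetic, assuming a single device and no gradient accumulation; `total_optimizer_steps` is a hypothetical helper, not part of Transformers.

```python
import math


def total_optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Steps = ceil(examples / batch size) per epoch, times the number of epochs.

    Assumes a single device, no gradient accumulation, and that the last
    partial batch is kept.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


small = total_optimizer_steps(num_examples=5_000, batch_size=16, epochs=5)
full = total_optimizer_steps(num_examples=650_000, batch_size=16, epochs=5)
print(small, small // 100)   # 1565 steps, 15 log entries at logging_steps=100
print(full)                  # 203125
```

With the default of 500, a 5,000-example run would log the training loss only three times, which is too sparse to see a trend.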

In [14]:
# Full hyperparameter configuration
print(training_args)

TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
average_tokens_across_devices=False,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=IntervalStrategy.NO,
eval_use_gather_object=False,


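### Evaluating metrics during training (Evaluate)

The **[Hugging Face Evaluate library](https://huggingface.co/docs/evaluate/index)** provides dozens of evaluation methods across domains (natural language processing, computer vision, reinforcement learning, and more), each loadable with a single line of code. **Full list of supported metrics: https://huggingface.co/evaluate-metric**

The Trainer does not automatically evaluate model performance during training, so we need to pass it a function that computes and reports metrics.

The Evaluate library provides a simple accuracy metric, which you can load with `evaluate.load`: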

In [15]:
!pip install evaluate
import numpy as np
import evaluate

metric = evaluate.load("accuracy")




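Next, call the `compute` function to calculate the accuracy of the predictions.

Before passing predictions to `compute`, the logits must be converted into predicted class indices (**all Transformers models return logits**).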

In [16]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
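
The conversion from logits to an accuracy score can be traced by hand on a tiny example. The following sketch reimplements the two steps (argmax, then the fraction of matches) in plain Python; the logits and labels are made up, and the helper names are hypothetical.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up logits for three reviews over five classes.
toy_logits = [
    [2.1, 0.3, -1.0, -0.5, -2.0],   # predicts class 0
    [-1.2, 0.1, 1.9, 0.4, -0.3],    # predicts class 2
    [-0.7, -0.2, 0.0, 0.8, 1.5],    # predicts class 4
]
toy_labels = [0, 2, 3]

print(accuracy(toy_logits, toy_labels))   # 2 of 3 predictions are correct
```

Note that accuracy treats a 4-star prediction for a 5-star review the same as a 1-star prediction; it only counts exact matches.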

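#### Monitoring metrics during training

To monitor how evaluation metrics change during training, set the `eval_strategy` parameter in `TrainingArguments` (named `evaluation_strategy` in older Transformers releases) so that metrics are reported at the end of each epoch.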

In [21]:
#!pip install --upgrade transformers
from transformers import TrainingArguments, Trainer
import transformers

print(transformers.__version__)



4.53.1


In [23]:
training_args = TrainingArguments(output_dir=model_dir,
                                  eval_strategy="epoch",
                                  per_device_train_batch_size=16,
                                  num_train_epochs=3,
                                  logging_steps=30)

## Start training

### Instantiating the Trainer

`kernel version` issue: it does not currently affect running this example.

In [25]:
# Train and evaluate on the full tokenized dataset (not the 5,000-example subsets created earlier)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    compute_metrics=compute_metrics,
)

## Checking GPU usage with nvidia-smi

To watch GPU usage in real time, poll with the `watch` command: `watch -n 1 nvidia-smi`:

```shell
Every 1.0s: nvidia-smi                                                   Wed Dec 20 14:37:41 2023

Wed Dec 20 14:37:41 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:0D.0 Off |                    0 |
| N/A   64C    P0              69W /  70W |   6665MiB / 15360MiB |     98%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     18395      C   /root/miniconda3/bin/python                6660MiB |
+---------------------------------------------------------------------------------------+
```
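
When only a few numbers are of interest, `nvidia-smi` can also print them as CSV via its `--query-gpu` and `--format` options, which is easier to read from Python than the table above. The sketch below is one way to do that; the helper names are hypothetical, and the function returns an empty list when `nvidia-smi` is unavailable or its output cannot be parsed.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=memory.used,memory.total,utilization.gpu",
         "--format=csv,noheader,nounits"]


def parse_gpu_line(line: str) -> dict:
    """Parse one 'used, total, util' CSV line into numbers."""
    used, total, util = (float(field) for field in line.split(","))
    return {"used_mib": used, "total_mib": total, "util_pct": util,
            "mem_fraction": used / total}


def read_gpus() -> list:
    """Query every visible GPU; returns [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True,
                             check=True, timeout=10).stdout
        return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
    except (OSError, subprocess.SubprocessError, ValueError):
        return []


# Values copied from the Tesla T4 snapshot above.
sample = parse_gpu_line("6665, 15360, 98")
print(round(sample["mem_fraction"], 3))   # 0.434
print(read_gpus())
```

In the snapshot above, training uses roughly 43% of the T4's memory while keeping the GPU almost fully utilized.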

In [None]:
trainer.train()



wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter:

 ··········


wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: tonyzhimingcen (tonyzhimingcen-student) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


Epoch,Training Loss,Validation Loss


In [None]:
!nvidia-smi

Tue May 20 14:32:49 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   77C    P0             71W /   70W |    6504MiB /  15360MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [None]:
small_test_dataset = tokenized_datasets["test"].shuffle(seed=64).select(range(100))

In [None]:
trainer.evaluate(small_test_dataset)

{'eval_loss': 1.2956901788711548,
 'eval_accuracy': 0.64,
 'eval_runtime': 2.8591,
 'eval_samples_per_second': 34.976,
 'eval_steps_per_second': 4.547,
 'epoch': 3.0}

### Saving the model and training state

- Use `trainer.save_model` to save the model; it can later be reloaded with `from_pretrained()`
- Use `trainer.save_state` to save the training state

In [None]:
trainer.save_model(model_dir)

In [None]:
trainer.save_state()

In [None]:
# trainer.model.save_pretrained("./")
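
Once the training state has been saved, it can be inspected without loading the model. The sketch below assumes that `save_state` writes a `trainer_state.json` file into `output_dir` and that this file contains a `log_history` list of logged entries; treat both as assumptions to verify against your installed version. The helper `best_eval_accuracy` and the toy state are made up for illustration.

```python
import json
import os


def best_eval_accuracy(state: dict):
    """Return (epoch, accuracy) of the best evaluation entry, or None."""
    evals = [e for e in state.get("log_history", []) if "eval_accuracy" in e]
    if not evals:
        return None
    best = max(evals, key=lambda e: e["eval_accuracy"])
    return best.get("epoch"), best["eval_accuracy"]


# Assumed location of the saved state, based on model_dir above.
state_path = os.path.join("models/bert-base-cased-finetune-yelp", "trainer_state.json")
if os.path.exists(state_path):
    with open(state_path, encoding="utf-8") as f:
        print(best_eval_accuracy(json.load(f)))

# Made-up state with the assumed layout, for illustration only.
toy_state = {"log_history": [
    {"loss": 1.10, "epoch": 0.5, "step": 100},
    {"eval_loss": 0.95, "eval_accuracy": 0.58, "epoch": 1.0, "step": 200},
    {"eval_loss": 0.90, "eval_accuracy": 0.61, "epoch": 2.0, "step": 400},
]}
print(best_eval_accuracy(toy_state))   # (2.0, 0.61)
```

Reading the log history this way is a lightweight option for comparing runs, for instance when working on the homework below.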

## Homework: train on the full YelpReviewFull dataset and see how high the accuracy can go