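# 1. Project Background

This project uses PaddleNLP to fine-tune the pre-trained NeZha model on the SMP2020 Weibo six-class emotion dataset, producing a six-class emotion classifier. The classifier is exposed through a Web interface with a decoupled front end and back end, which supports fine-grained emotion prediction on input text.

Emotion analysis plays an important role in today's information industry. In public-opinion analysis, for example, analyzing the emotions around trending events and tracing their causes helps governments understand public sentiment and prevent harmful incidents.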

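# 2. Project Plan

## Overall Technical Approach

The model is built with PaddleNLP by fine-tuning the pre-trained NeZha model on the SMP2020 Weibo six-class emotion dataset, so that it can automatically identify the emotion expressed in a text.

Vue and FastAPI are used to deploy the Web application with a separate front end and back end, which makes emotion recognition on natural-language text easier to use.

![Technical approach](https://ai-studio-static-online.cdn.bcebos.com/855b6274ef794f57a4fe861b7438cdc96f63231b5cca4a839b361ae7f0cb2661)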

## Runtime Requirements

Note: model training requires a GPU environment.



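# 3. Data Description

Data source: the dataset of the SMP2020 Weibo Emotion Classification Technical Evaluation (SMP2020-EWECT).

The annotated dataset used in the evaluation was provided by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology. The raw data comes from Sina Weibo and was supplied by the Weiredian Big Data Research Institute (微热点大数据研究院). The dataset has two parts.

The first part is the general Weibo dataset. Its posts were collected at random, are not tied to any particular topic, and cover a wide range of content.

The second part is the epidemic Weibo dataset. Its posts were collected during the epidemic by filtering on related keywords, and all of them concern COVID-19.

Each post is labeled with one of six categories: neutral (no emotion), happy (positive), angry, sad, fear, and surprise.

The general Weibo dataset contains 27,768 training posts, 2,000 validation posts, and 5,000 test posts.

The epidemic Weibo dataset contains 8,606 training posts, 2,000 validation posts, and 3,000 test posts.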
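
As a quick sanity check, the split sizes above can be added up and compared with the row counts that pandas reports once the files are concatenated (36,374 training rows and 48,374 rows in total). The dictionary layout below is only an illustration and is not part of the project code.

```python
# Split sizes as stated in the dataset description above.
splits = {
    "usual": {"train": 27768, "dev": 2000, "test": 5000},
    "virus": {"train": 8606, "dev": 2000, "test": 3000},
}

# Sum each split across the two sub-datasets.
totals = {
    name: sum(part[name] for part in splits.values())
    for name in ("train", "dev", "test")
}
grand_total = sum(totals.values())

print(totals)
print(grand_total)
```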

## Dataset Processing

In [2]:
# Read and process the data with pandas
import pandas as pd

# Training sets
train1 = pd.read_csv('./usual_train.csv')
train2 = pd.read_csv('./virus_train.csv')

# Validation sets
dev1 = pd.read_csv('./usual_eval_labeled.csv')
dev2 = pd.read_csv('./virus_eval_labeled.csv')

# Test sets
test1 = pd.read_csv('./usual_test_labeled.csv')
test2 = pd.read_csv('./virus_test_labeled.csv')

# Merge the general and epidemic datasets
train = pd.concat([train1, train2])
dev = pd.concat([dev1, dev2])
test = pd.concat([test1, test2])

# Build one combined DataFrame for summary statistics
total = pd.concat([train, dev, test])

total.info()
total.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 48374 entries, 0 to 2999
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   数据编号    48374 non-null  int64 
 1   文本      48372 non-null  object
 2   情绪标签    48374 non-null  object
dtypes: int64(1), object(2)
memory usage: 1.5+ MB


|   | 数据编号 | 文本 | 情绪标签 |
|---|---|---|---|
| 0 | 1 | 气死姐姐了，快二是阵亡了吗，尼玛，一个半小时过去了也没上车 | angry |
| 1 | 2 | 妞妞啊，今天又承办了一个发文登记文号是126\~嘻\~么么哒\~晚安哟 | happy |
| 2 | 3 | 这里还值得注意另一个事实，就是张鞠存原有一个东溪草堂为其读书处。 | neutral |
| 3 | 4 | 这在前华约国家(尤其是东德)使用R-73的首次联合演习期间，被一些北约组织的飞行员所证实。 | neutral |
| 4 | 5 | TinyThief上wii了？！ | surprise |


In [3]:
# Convert the data to a (text_a, label) layout for uniform downstream processing
train['text_a'] = train['文本']
dev['text_a'] = dev['文本']
test['text_a'] = test['文本']
total['text_a'] = total['文本']

train['label'] = train['情绪标签']
dev['label'] = dev['情绪标签']
test['label'] = test['情绪标签']
total['label'] = total['情绪标签']

train = train[['text_a', 'label']]
dev = dev[['text_a', 'label']]
test = test[['text_a', 'label']]
total = total[['text_a', 'label']]

total.head()

|   | text_a | label |
|---|---|---|
| 0 | 气死姐姐了，快二是阵亡了吗，尼玛，一个半小时过去了也没上车 | angry |
| 1 | 妞妞啊，今天又承办了一个发文登记文号是126\~嘻\~么么哒\~晚安哟 | happy |
| 2 | 这里还值得注意另一个事实，就是张鞠存原有一个东溪草堂为其读书处。 | neutral |
| 3 | 这在前华约国家(尤其是东德)使用R-73的首次联合演习期间，被一些北约组织的飞行员所证实。 | neutral |
| 4 | TinyThief上wii了？！ | surprise |


## Dataset Inspection

In [4]:
# Inspect the DataFrame summaries
total.info()
train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 48374 entries, 0 to 2999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   text_a  48372 non-null  object
 1   label   48374 non-null  object
dtypes: object(2)
memory usage: 1.1+ MB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 36374 entries, 0 to 8605
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   text_a  36372 non-null  object
 1   label   36374 non-null  object
dtypes: object(2)
memory usage: 852.5+ KB


In [5]:
# text_a has missing values in the training data (2 rows); drop those rows
train = train.dropna(subset = ['text_a'])
total = total.dropna(subset = ['text_a'])

In [6]:
# Count how many posts carry each label
total['label'].value_counts()

happy       13673
angry       12536
neutral      9615
sad          7269
surprise     2942
fear         2337
Name: label, dtype: int64
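
The counts above show a clear class imbalance: happy and angry together make up more than half of the posts, while fear accounts for under 5%. The sketch below turns the printed counts into proportions and, as one optional idea that the original project does not use, into inverse-frequency class weights that could be passed to a weighted loss. The weighting formula `total / (num_classes * count)` is a common heuristic and is an assumption on my part.

```python
# Label counts copied from the value_counts() output above.
counts = {
    "happy": 13673, "angry": 12536, "neutral": 9615,
    "sad": 7269, "surprise": 2942, "fear": 2337,
}
total = sum(counts.values())

# Share of each class in the combined dataset.
proportions = {label: n / total for label, n in counts.items()}

# Inverse-frequency weights (heuristic, not used by the original project):
# a class with the average frequency gets weight 1.0, rarer classes get more.
class_weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in counts:
    print(f"{label:<9}{proportions[label]:6.1%}  weight={class_weights[label]:.2f}")
```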

In [7]:
# Save the processed splits as tab-separated files
train.to_csv('train.csv', sep='\t', index = False)
dev.to_csv('dev.csv', sep='\t', index = False)
test.to_csv('test.csv', sep='\t', index = False)

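# 4. Building the Emotion Analysis Model with PaddleNLP

PaddleNLP is PaddlePaddle's natural language processing library. It offers easy-to-use text APIs, application examples for many scenarios, and high-performance distributed training, and it aims to make developers more productive when building text models.


## 4.1 Loading the Pre-trained NeZha Model

NEZHA is Huawei's pre-trained language model. It builds on BERT with several optimizations, including Functional Relative Positional Encoding, the Whole Word Masking strategy, mixed-precision training, and the LAMB optimizer. Experiments show that NEZHA achieves strong results on several representative NLU tasks.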

![](https://ai-studio-static-online.cdn.bcebos.com/d295f95afd504f32897d2ad17c85cc3c1ee97b5b9f3a41b5b3e6ee2a142b0a66)
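
To make "Functional Relative Positional Encoding" more concrete: as I understand the NEZHA paper, the encoding for a query position `i` and a key position `j` is a fixed sinusoidal function of the offset `j - i`, rather than a learned embedding of the absolute position. The following is a minimal pure-Python sketch under that assumption. The function name and the tiny dimension are illustrative, and this is not PaddleNLP's actual implementation.

```python
import math

def relative_position_encoding(i, j, depth):
    """Sinusoidal encoding of the offset (j - i); `depth` must be even."""
    offset = j - i
    encoding = []
    for k in range(depth // 2):
        angle = offset / (10000 ** (2 * k / depth))
        encoding.append(math.sin(angle))  # even index 2k
        encoding.append(math.cos(angle))  # odd index 2k + 1
    return encoding

# The encoding depends only on the offset, not on the absolute positions.
a = relative_position_encoding(3, 5, depth=8)
b = relative_position_encoding(10, 12, depth=8)
print(a == b)
```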


In [8]:
# Import the required libraries
import math
import numpy as np
import os
import collections
from functools import partial
import random
import time
import inspect
import importlib
from tqdm import tqdm
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.io import IterableDataset
from paddle.utils.download import get_path_from_url

In [9]:
# Install the latest paddlenlp
!pip install --upgrade paddlenlp

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting paddlenlp
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d4/6e/209c5b64d45ef0dfea06da9a597dab09b27293fd9d6da9c1064e50de5030/paddlenlp-2.4.4-py3-none-any.whl (2.1 MB)
Collecting uvicorn
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/96/f3/f39ac8ac3bdf356b4934b8f7e56173e96681f67ef0cd92bd33a5059fae9e/uvicorn-0.20.0-py3-none-any.whl (56 kB)
Collecting typer
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/0d/44/56c3f48d2bb83d76f5c970aef8e2c3ebd6a832f09e3621c5395371fe6999/typer-0.7.0-py3-none-any.whl (38 kB)
Collecting fastapi
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d8/09/ce090f6d53ce8b6335954488087210fa1e054

In [10]:
# Import the PaddleNLP modules
import paddlenlp as ppnlp
from paddlenlp.data import JiebaTokenizer, Pad, Stack, Tuple, Vocab
from paddlenlp.datasets import MapDataset
from paddle.dataset.common import md5file
from paddlenlp.datasets import DatasetBuilder

In [None]:
# Load Huawei's NeZha model; the task has six emotion classes, so set num_classes=6
model = ppnlp.transformers.NeZhaForSequenceClassification.from_pretrained('nezha-large-wwm-chinese', num_classes = 6)
tokenizer = ppnlp.transformers.NeZhaTokenizer.from_pretrained('nezha-large-wwm-chinese')

## 4.2 Preparation Before Training

In [12]:
# Define the dataset files and how they are parsed
class EmotionData(DatasetBuilder):
    SPLITS = {
        'train': 'train.csv',  # training set
        'dev': 'dev.csv',      # validation set
        'test': 'test.csv',    # test set
    }

    def _get_data(self, mode, **kwargs):
        filename = self.SPLITS[mode]
        return filename

    def _read(self, filename):
        """Read one split and yield a dict per example."""
        with open(filename, 'r', encoding='utf-8') as f:
            head = None
            for line in f:
                data = line.strip().split("\t")    # columns are tab-separated
                if not head:
                    head = data                    # the first line is the header row
                else:
                    text_a, label = data
                    yield {"text_a": text_a, "label": label}  # example format: text_a, label

    def get_labels(self):
        return label_list   # class labels; label_list is defined before the dataset is loaded
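
The `_read` method above splits each line on tab characters, which works as long as no post contains a tab, a line break, or a double quote. pandas quotes such fields when writing, so a naive split would then see extra quote characters or the wrong number of columns. A possible more defensive variant, sketched here with the standard `csv` module and a hypothetical `read_tsv` helper, parses the same tab-separated layout while honoring the quoting.

```python
import csv
import io

def read_tsv(stream):
    """Yield {"text_a", "label"} dicts from a tab-separated file with a header row."""
    reader = csv.DictReader(stream, delimiter="\t")
    for row in reader:
        yield {"text_a": row["text_a"], "label": row["label"]}

# Small in-memory sample: the second text contains a tab, so it is quoted on disk.
sample = 'text_a\tlabel\nplain text\thappy\n"has\ta tab"\tsad\n'
examples = list(read_tsv(io.StringIO(sample)))
print(examples)
```

With a real file, open it with `open(filename, encoding='utf-8', newline='')` and pass the handle to `read_tsv`.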

In [16]:
# Define the dataset loading function
def load_dataset(name=None,
                 data_files=None,
                 splits=None,
                 lazy=None,
                 **kwargs):
   
    reader_cls = EmotionData
    print(reader_cls)
    if not name:
        reader_instance = reader_cls(lazy=lazy, **kwargs)
    else:
        reader_instance = reader_cls(lazy=lazy, name=name, **kwargs)

    datasets = reader_instance.read_datasets(data_files=data_files, splits=splits)
    return datasets


In [17]:
# Convert one example into model inputs (token ids, segment ids, and optionally the label)
def convert_example(example, tokenizer, max_seq_length=512, is_test=False):
    qtconcat = example["text_a"]
    encoded_inputs = tokenizer(text=qtconcat, max_seq_len=max_seq_length)
    input_ids = encoded_inputs["input_ids"]
    token_type_ids = encoded_inputs["token_type_ids"]

    if not is_test:
        label = np.array([example["label"]], dtype="int64")
        return input_ids, token_type_ids, label
    else:
        return input_ids, token_type_ids

# Build a DataLoader for a dataset
def create_dataloader(dataset,
                      mode='train',
                      batch_size=1,
                      batchify_fn=None,
                      trans_fn=None):
    if trans_fn:
        dataset = dataset.map(trans_fn)

    shuffle = True if mode == 'train' else False
    if mode == 'train':
        batch_sampler = paddle.io.DistributedBatchSampler(
            dataset, batch_size=batch_size, shuffle=shuffle)
    else:
        batch_sampler = paddle.io.BatchSampler(
            dataset, batch_size=batch_size, shuffle=shuffle)

    return paddle.io.DataLoader(
        dataset=dataset,
        batch_sampler=batch_sampler,
        collate_fn=batchify_fn,
        return_list=True)

## 4.3 Configuring the Training Parameters

A few parameters can be adjusted to suit your own training requirements.

In [19]:
# Define the classes to predict
label_list = ['angry', 'happy', 'neutral', 'surprise', 'sad', 'fear']
label_map = {idx: label for idx, label in enumerate(label_list)}
print(label_map)

{0: 'angry', 1: 'happy', 2: 'neutral', 3: 'surprise', 4: 'sad', 5: 'fear'}
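
`label_map` goes from class index to label name, which is what prediction needs. Training needs the opposite direction, because the CSV files store label names while the loss expects integer class ids. As far as I can tell, PaddleNLP's `DatasetBuilder` performs this conversion itself using the list returned by `get_labels()`. The sketch below only illustrates the mapping, with `label_to_id` as a hypothetical name.

```python
label_list = ['angry', 'happy', 'neutral', 'surprise', 'sad', 'fear']

# index -> label, as in the cell above
label_map = {idx: label for idx, label in enumerate(label_list)}
# label -> index, the inverse mapping used when encoding training labels
label_to_id = {label: idx for idx, label in label_map.items()}

encoded = [label_to_id[name] for name in ['sad', 'angry', 'fear']]
print(encoded)
```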


In [20]:
# Load the training, validation, and test sets
train_ds, dev_ds, test_ds = load_dataset(splits=["train", "dev", "test"])

<class '__main__.EmotionData'>


In [21]:
batch_size = 96  # batch size; adjust it to fit the available GPU memory
max_seq_length = 128  # maximum sequence length; longer texts are truncated

# Convert each example into the input format the model expects
trans_func = partial(
    convert_example,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length)

batchify_fn = lambda samples, fn=Tuple(
    Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input_ids
    Pad(axis=0, pad_val=tokenizer.pad_token_type_id),  # token_type_ids
    Stack()  # labels
): [data for data in fn(samples)]

# Training data loader
train_data_loader = create_dataloader(
    train_ds,
    mode='train',
    batch_size=batch_size,
    batchify_fn=batchify_fn,
    trans_fn=trans_func)
# Validation data loader
dev_data_loader = create_dataloader(
    dev_ds,
    mode='dev',
    batch_size=batch_size,
    batchify_fn=batchify_fn,
    trans_fn=trans_func)
# Test data loader
test_data_loader = create_dataloader(
    test_ds,
    mode='test',
    batch_size=batch_size,
    batchify_fn=batchify_fn,
    trans_fn=trans_func)
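
`batchify_fn` combines the per-example outputs of `convert_example` into one batch: the two `Pad` objects pad `input_ids` and `token_type_ids` to the longest sequence in the batch, and `Stack` stacks the labels. The pure-Python sketch below mimics that behavior on toy data to show the resulting shapes. `pad_batch` is a hypothetical helper, not the PaddleNLP implementation, and the token ids are made up.

```python
def pad_batch(sequences, pad_val=0):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_val] * (max_len - len(seq)) for seq in sequences]

# Three toy examples: (input_ids, token_type_ids, label)
samples = [
    ([101, 7, 8, 102], [0, 0, 0, 0], [1]),
    ([101, 9, 102], [0, 0, 0], [4]),
    ([101, 5, 6, 7, 8, 102], [0, 0, 0, 0, 0, 0], [0]),
]

input_ids = pad_batch([s[0] for s in samples])
token_type_ids = pad_batch([s[1] for s in samples])
labels = [s[2] for s in samples]  # labels all have the same shape, so they just stack

print(input_ids)
print(labels)
```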

In [22]:
# Define the hyperparameters, loss, optimizer, and metric
from paddlenlp.transformers import LinearDecayWithWarmup

# Peak learning rate during training
learning_rate = 2e-5
# Number of training epochs
epochs = 3
# Fraction of the training steps used for learning-rate warm-up
warmup_proportion = 0.1
# Weight decay coefficient, a regularization term that helps limit overfitting
weight_decay = 0.01

num_training_steps = len(train_data_loader) * epochs
lr_scheduler = LinearDecayWithWarmup(learning_rate, num_training_steps, warmup_proportion)

# Apply weight decay to every parameter except biases and normalization layers
decay_params = [
    p.name for n, p in model.named_parameters()
    if not any(nd in n for nd in ["bias", "norm"])
]

# AdamW optimizer
optimizer = paddle.optimizer.AdamW(
    learning_rate=lr_scheduler,
    parameters=model.parameters(),
    weight_decay=weight_decay,
    apply_decay_param_fun=lambda x: x in decay_params)

criterion = paddle.nn.loss.CrossEntropyLoss()  # cross-entropy loss
metric = paddle.metric.Accuracy()  # accuracy metric
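
`LinearDecayWithWarmup` raises the learning rate linearly from 0 to `learning_rate` over the first `warmup_proportion` of the steps and then decays it linearly back to 0. The sketch below reproduces that shape in plain Python, based on my reading of the scheduler. The step count is an estimate: it assumes 36,372 training rows (after the two empty texts are dropped), a batch size of 96, and that the final partial batch is kept.

```python
import math

learning_rate = 2e-5
epochs = 3
warmup_proportion = 0.1
steps_per_epoch = math.ceil(36372 / 96)              # batches per epoch
total_steps = steps_per_epoch * epochs
warmup_steps = int(warmup_proportion * total_steps)  # rounded down

def lr_at(step):
    """Learning rate after `step` scheduler updates."""
    if step < warmup_steps:
        factor = step / max(1, warmup_steps)
    else:
        factor = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return learning_rate * factor

for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```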

## 4.4 Training and Evaluation

In [23]:
# Define the evaluation function used for validation
@paddle.no_grad()
def evaluate(model, criterion, metric, data_loader):
    model.eval()
    metric.reset()
    losses = []
    for batch in data_loader:
        input_ids, token_type_ids, labels = batch
        logits = model(input_ids, token_type_ids)
        loss = criterion(logits, labels)
        losses.append(loss.numpy())
        correct = metric.compute(logits, labels)
        metric.update(correct)
    accu = metric.accumulate()  # accuracy over the whole data_loader
    print("eval loss: %.5f, accu: %.5f" % (np.mean(losses), accu))
    model.train()
    metric.reset()
    return accu  # return the accuracy
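
`evaluate` reports accuracy only. Because the classes are imbalanced (fear and surprise are much rarer than happy and angry), a per-class view such as macro-averaged F1 can be a useful complement. The function below is a small dependency-free sketch of macro-F1. It is not part of the original notebook, and the toy label ids are invented for illustration.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that never appears and is never predicted counts as F1 = 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
score = macro_f1(y_true, y_pred, num_classes=3)
print(round(score, 4))
```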

In [24]:
# Train the model
import paddle.nn.functional as F

save_dir = "checkpoint"
os.makedirs(save_dir, exist_ok=True)

pre_accu = 0  # best validation accuracy so far
global_step = 0
for epoch in range(1, epochs + 1):
    for step, batch in enumerate(train_data_loader, start=1):
        input_ids, segment_ids, labels = batch
        logits = model(input_ids, segment_ids)
        loss = criterion(logits, labels)
        probs = F.softmax(logits, axis=1)
        correct = metric.compute(probs, labels)
        metric.update(correct)
        acc = metric.accumulate()

        global_step += 1
        if global_step % 10 == 0:
            print("global step %d, epoch: %d, batch: %d, loss: %.5f, acc: %.5f" % (global_step, epoch, step, loss, acc))
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.clear_grad()
    # Evaluate on the validation set at the end of each epoch
    accu = evaluate(model, criterion, metric, dev_data_loader)
    print(accu)
    if accu > pre_accu:
        # Save the parameters whenever validation accuracy beats the best so far
        save_param_path = os.path.join(save_dir, 'model_state.pdparams')
        paddle.save(model.state_dict(), save_param_path)
        pre_accu = accu
tokenizer.save_pretrained(save_dir)

global step 10, epoch: 1, batch: 10, loss: 1.75473, acc: 0.22604
global step 20, epoch: 1, batch: 20, loss: 1.70342, acc: 0.24375
global step 30, epoch: 1, batch: 30, loss: 1.69516, acc: 0.24306
global step 40, epoch: 1, batch: 40, loss: 1.71487, acc: 0.25729
global step 50, epoch: 1, batch: 50, loss: 1.62575, acc: 0.26813
global step 60, epoch: 1, batch: 60, loss: 1.63088, acc: 0.28993
global step 70, epoch: 1, batch: 70, loss: 1.46991, acc: 0.31920
global step 80, epoch: 1, batch: 80, loss: 1.45474, acc: 0.34622
global step 90, epoch: 1, batch: 90, loss: 1.32177, acc: 0.37326
global step 100, epoch: 1, batch: 100, loss: 1.33373, acc: 0.39823
global step 110, epoch: 1, batch: 110, loss: 1.10039, acc: 0.42055
global step 120, epoch: 1, batch: 120, loss: 1.06426, acc: 0.43724
global step 130, epoch: 1, batch: 130, loss: 1.13974, acc: 0.45256
global step 140, epoch: 1, batch: 140, loss: 1.07292, acc: 0.46659
global step 150, epoch: 1, batch: 150, loss: 0.96340, acc: 0.48000

[2022-12-08 00:21:04,525] [    INFO] - tokenizer config file saved in checkpoint/tokenizer_config.json
[2022-12-08 00:21:04,528] [    INFO] - Special tokens file saved in checkpoint/special_tokens_map.json


('checkpoint/tokenizer_config.json',
 'checkpoint/special_tokens_map.json',
 'checkpoint/added_tokens.json')

## 4.5 Prediction

In [25]:
# Define the six classes
label_list = ['angry', 'happy', 'neutral', 'surprise', 'sad', 'fear']
label_map = {idx: label for idx, label in enumerate(label_list)}

In [30]:
# Define the prediction function
def predict(model, data, tokenizer, label_map, batch_size=1):
    examples = []
    for text in data:
        input_ids, segment_ids = convert_example(
            text,
            tokenizer,
            max_seq_length=128,
            is_test=True)
        examples.append((input_ids, segment_ids))

    batchify_fn = lambda samples, fn=Tuple(
        Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input ids
        Pad(axis=0, pad_val=tokenizer.pad_token_type_id),  # segment ids
    ): fn(samples)

    # Separate the examples into batches.
    batches = []
    one_batch = []
    for example in examples:
        one_batch.append(example)
        if len(one_batch) == batch_size:
            batches.append(one_batch)
            one_batch = []
    if one_batch:
        # The last batch whose size is less than the config batch_size setting.
        batches.append(one_batch)

    results = []
    model.eval()
    for batch in batches:
        input_ids, segment_ids = batchify_fn(batch)
        input_ids = paddle.to_tensor(input_ids)
        segment_ids = paddle.to_tensor(segment_ids)
        logits = model(input_ids, segment_ids)
        probs = F.softmax(logits, axis=1)
        idx = paddle.argmax(probs, axis=1).numpy()
        idx = idx.tolist()
        labels = [label_map[i] for i in idx]
        results.extend(labels)
    return results
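
The last steps of `predict` turn the model's raw logits into a label: softmax converts the six logits into probabilities, argmax picks the most probable class index, and `label_map` translates that index into a name. Here is a plain-Python sketch with made-up logits (the numbers are illustrative, not model output):

```python
import math

label_list = ['angry', 'happy', 'neutral', 'surprise', 'sad', 'fear']
label_map = {idx: label for idx, label in enumerate(label_list)}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.2, 3.1, 0.5, -1.0, 0.0, -0.7]   # hypothetical logits for one sentence
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # argmax
print(label_map[best], round(probs[best], 3))
```

Since softmax is monotonic, taking the argmax over the logits directly would give the same label; the probabilities are only needed if a confidence score is reported.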

In [None]:
# Load the NeZha model
model = ppnlp.transformers.NeZhaForSequenceClassification.from_pretrained('nezha-large-wwm-chinese', num_classes=6)
tokenizer = ppnlp.transformers.NeZhaTokenizer.from_pretrained('nezha-large-wwm-chinese')

In [32]:
# Load the fine-tuned weights
params_path = 'checkpoint/model_state.pdparams'
if params_path and os.path.isfile(params_path):
    # Load the saved parameters into the model
    state_dict = paddle.load(params_path)
    model.set_dict(state_dict)
    print("Loaded model parameters:", params_path)

Loaded model parameters: checkpoint/model_state.pdparams


In [33]:
# Sentences to classify (the comment above each one is its expected label)
data = [
    # angry
    {"text_a": '更年期的女boss真的让人受不了，烦躁'},
    # fear
    {"text_a": '尼玛吓死我了，人家剪个头发回来跟劳改犯一样短的可怕，后面什么鬼[黑线][黑线][黑线][白眼][白眼]'},
    # neutral
    {"text_a": "这个村的年轻人大多数都出外打工。"},
    # surprise
    {"text_a": "我竟然才知道我有一个富二代加官二代加红二代的朋友"},
    # sad
    {"text_a": "江泽民同志逝世的消息让他十分心痛"},
    # happy
    {"text_a": "今天吃火锅，香死我了！！！"},
]

In [34]:
# Run prediction and print the results
results = predict(model, data, tokenizer, label_map, batch_size=1)
for idx, text in enumerate(data):
    print('Text: {} \t Emotion: {}'.format(text['text_a'], results[idx]))

Text: 更年期的女boss真的让人受不了，烦躁 	 Emotion: angry
Text: 尼玛吓死我了，人家剪个头发回来跟劳改犯一样短的可怕，后面什么鬼[黑线][黑线][黑线][白眼][白眼] 	 Emotion: fear
Text: 这个村的年轻人大多数都出外打工。 	 Emotion: neutral
Text: 我竟然才知道我有一个富二代加官二代加红二代的朋友 	 Emotion: surprise
Text: 江泽民同志逝世的消息让他十分心痛 	 Emotion: sad
Text: 今天吃火锅，香死我了！！！ 	 Emotion: happy


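# 5. Building the Web Interface with FastAPI and Vue

**For the Web part, keep the trained model parameters saved under "checkpoint/"; they are needed for local deployment.**

Using FastAPI requires installing it and its dependencies with pip (```pip install fastapi```); see the [FastAPI documentation](https://fastapi.tiangolo.com/zh/#_3) for details.

## 5.1 Backend Model Handling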

```Python
# Import the required libraries
import paddle
import numpy as np
import paddle.nn.functional as F
from paddlenlp.data import Pad, Tuple
```

```python
# Formatting helpers
def format_print(results, data):
    for idx, text in enumerate(data):
        print('Text: {} \t Emotion: {}'.format(text['text_a'], results[idx]))

def parseTodata(input_text):
    return [{"text_a": input_text}]
```

```python
# Convert one example into model inputs (token ids, segment ids, and optionally the label)
def convert_example(example, tokenizer, max_seq_length=512, is_test=False):
    qtconcat = example["text_a"]
    encoded_inputs = tokenizer(text=qtconcat, max_seq_len=max_seq_length)
    input_ids = encoded_inputs["input_ids"]
    token_type_ids = encoded_inputs["token_type_ids"]

    if not is_test:
        label = np.array([example["label"]], dtype="int64")
        return input_ids, token_type_ids, label
    else:
        return input_ids, token_type_ids

# Define the prediction function
def predict(model, input_text, tokenizer, label_map, batch_size):
    data = parseTodata(input_text)
    examples = []
    for text in data:
        input_ids, segment_ids = convert_example(
            text,
            tokenizer,
            max_seq_length=128,
            is_test=True)
        examples.append((input_ids, segment_ids))

    batchify_fn = lambda samples, fn=Tuple(
        Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input ids
        Pad(axis=0, pad_val=tokenizer.pad_token_type_id),  # segment ids
    ): fn(samples)

    # Separate the examples into batches.
    batches = []
    one_batch = []
    for example in examples:
        one_batch.append(example)
        if len(one_batch) == batch_size:
            batches.append(one_batch)
            one_batch = []
    if one_batch:
        # The last batch whose size is less than the config batch_size setting.
        batches.append(one_batch)

    results = []
    model.eval()
    for batch in batches:
        input_ids, segment_ids = batchify_fn(batch)
        input_ids = paddle.to_tensor(input_ids)
        segment_ids = paddle.to_tensor(segment_ids)
        logits = model(input_ids, segment_ids)
        probs = F.softmax(logits, axis=1)
        idx = paddle.argmax(probs, axis=1).numpy()
        idx = idx.tolist()
        labels = [label_map[i] for i in idx]
        results.extend(labels)
    format_print(results, data)
    return  results
```

## 5.2 Loading the Model and Starting the Backend Service

```python
# Import the required libraries
import os
import paddle
import paddlenlp as ppnlp
from fastapi import FastAPI, HTTPException, UploadFile
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import uvicorn
```

```python
# Define the six classes
label_list = ['angry', 'happy', 'neutral', 'surprise', 'sad', 'fear']
label_map = {idx: label for idx, label in enumerate(label_list)}

# Load the NeZha model
model = ppnlp.transformers.NeZhaForSequenceClassification.from_pretrained('nezha-large-wwm-chinese', num_classes=6)
tokenizer = ppnlp.transformers.NeZhaTokenizer.from_pretrained('nezha-large-wwm-chinese')
print(str.center("NeZha model loaded", 80, "="))

# Load the fine-tuned weights
params_path = '../checkpoint/model_state.pdparams'
if params_path and os.path.isfile(params_path):
    # Load the saved parameters into the model
    state_dict = paddle.load(params_path)
    model.set_dict(state_dict)
    print("Loaded parameters from %s" % params_path)
    print(str.center("Fine-tuned weights loaded", 80, "="))

# Warm up the model with one prediction.
# predict() is the function from section 5.1; it must be defined or imported in this file.
batch_size = 1
input_text = "今天吃火锅，香死我了！！！"
predict(model, input_text, tokenizer, label_map, batch_size)
print(str.center("Model warm-up finished", 80, "="))
print(str.center("Starting the Web service", 80, "="))

# Create the FastAPI instance
PublicSentimentAnalysis = FastAPI()

# Configure CORS. Allowing every origin is convenient for local debugging;
# list the front end's origin explicitly for a real deployment.
PublicSentimentAnalysis.add_middleware(
    CORSMiddleware,
    allow_origins=['*'],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Single-text emotion analysis endpoint
@PublicSentimentAnalysis.get("/singleSentimentAnalysis/", status_code=200)
# Path operation function, called whenever the endpoint is requested
async def SingleSentimentAnalysis(text: str):
    try:
        # Text submitted by the user for emotion classification
        input_text = text
        # Run the loaded model on the text
        singleAnalysisResult = predict(model, input_text, tokenizer, label_map, batch_size)
        # Build the response
        results = {"message": "success", "inputText": input_text, "singleAnalysisResult": singleAnalysisResult[0]}
        return results
    # Error handling
    except Exception as e:
        print("Exception:", e)
        raise HTTPException(status_code=500, detail="Request failed: the server raised an exception. Details: " + str(e))

# Start the backend service
# Once it is running locally, open http://localhost:8000/docs to debug the API
uvicorn.run(PublicSentimentAnalysis, host="127.0.0.1", port=8000)
```

Backend API test screenshot

![Backend API debugging screenshot](https://ai-studio-static-online.cdn.bcebos.com/760f704d759b493eae1b2f388786f8e75dac988845d44137b531bb652af9c4d3)
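
Once the service is running, the endpoint can also be called from a script. The sketch below uses only the standard library and assumes the service is reachable at `http://127.0.0.1:8000`, as configured above. `build_url` and `analyze` are hypothetical helper names. The main point is that the Chinese input has to be URL-encoded when it is passed as the `text` query parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000/singleSentimentAnalysis/"

def build_url(text, base_url=BASE_URL):
    """Return the request URL with `text` percent-encoded as a query parameter."""
    return base_url + "?" + urlencode({"text": text})

def analyze(text):
    """Call the running backend and return the predicted emotion label."""
    with urlopen(build_url(text)) as response:
        payload = json.load(response)
    return payload["singleAnalysisResult"]

url = build_url("今天吃火锅，香死我了！！！")
print(url)
# analyze("今天吃火锅，香死我了！！！")  # requires the backend to be running
```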