add multi classification application #2675

lugimzzz · 2022-06-28T09:58:16Z

PR types

Others

PR changes

Others

Description

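Adds a multi-class classification application covering training, prediction, static graph export, and pruning. Three deployment options are currently supported: ONNX Runtime, Paddle Serving, and Triton Server.

The main code structure of this project is described below: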

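multi_class/
├── deploy # Deployment
│   ├── predictor # Export an ONNX model and deploy with ONNX Runtime
│   │   ├── infer.py # ONNX Runtime inference deployment example
│   │   ├── predictor.py
│   │   └── README.md # Usage instructions
│   ├── paddle_serving # Deployment with Paddle Serving
│   │   ├── config.yml # Configuration file for starting the classification server
│   │   ├── rpc_client.py # Script that sends pipeline prediction requests for the classification task
│   │   ├── service.py # Script that starts the classification server
│   │   └── README.md # Usage instructions
│   └── triton_serving # Deployment with Triton Server
│       ├── README.md # Usage instructions
│       ├── seqcls_grpc_client.py # Client test code
│       └── models # Models to deploy
│           ├── seqcls
│           │   └── config.pbtxt
│           ├── seqcls_model
│           │   └── config.pbtxt
│           ├── seqcls_postprocess
│           │   ├── 1
│           │   │   └── model.py
│           │   └── config.pbtxt
│           └── tokenizer
│               ├── 1
│               │   └── model.py
│               └── config.pbtxt
├── train.py # Training and evaluation script
├── predict.py # Prediction script
├── export_model.py # Script that exports dynamic graph parameters to static graph parameters
├── utils.py # Utility functions
├── metric.py # Metric script
├── prune.py # Pruning script
├── prune_trainer.py # Pruning trainer script
├── prune_config.py # Pruning training parameter configuration
├── requirements.txt # Environment dependencies
└── README.md # Usage instructions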

lugimzzz · 2022-07-01T16:00:25Z

Added doccano annotation support.

wawltor · 2022-07-04T02:41:19Z

applications/text_classification/doccano.md

+**Environment used for the annotation examples below:**
+
+- doccano 1.6.2
+- pip (Python 3.8+)


Consider pinning the Python version here first.

Fixed:

Environment used for the annotation examples below:

Python 3.8+

doccano 1.6.2

wawltor · 2022-07-04T02:42:03Z

applications/text_classification/doccano.md

+- pip (Python 3.8+)
+
+Run the following commands in a terminal:
+```shell


Per the code conventions, comments inside code must be in English; please make this consistent.

Fixed:

Install doccano with pip in a terminal:

pip install doccano==1.6.2

After installation, run the following commands:

# Initialize database.
doccano init
# Create a super user.
doccano createuser --username admin --password pass
# Start a web server.
doccano webserver --port 8000

In a new terminal, run the following commands:

# Start the task queue to handle file upload/download.
doccano task

wawltor · 2022-07-04T02:50:30Z

applications/text_classification/doccano.md

+# Start the task queue to handle file upload/download.
+doccano task
+```
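+Open http://127.0.0.1:8000/ in a browser, enter the username and password to log in, and start annotating with doccano. doccano supports a Chinese interface; click the upper-right corner and select ZH (Chinese).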


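The link here could use Markdown link syntax.

Fixed:

Open http://127.0.0.1:8000/ in a browser, enter the username and password to log in, and start annotating with doccano. doccano supports a Chinese interface; click the upper-right corner and select ZH (Chinese).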

wawltor · 2022-07-04T03:00:10Z

applications/text_classification/doccano.md

+<div align="center">
+    <img src=https://user-images.githubusercontent.com/63761690/175248039-ce1673f1-9b03-4804-b1cb-29e4b4193f86.png height=300 hspace='15'/>
+</div>
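+For hierarchical classification tasks, we recommend using the path to each leaf node in the label hierarchy as the label. Taking the label structure in the figure above as an example, we recommend `--` as the separator between labels at different levels: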


Suggest using the same separator as UIE here, i.e. `##`.

Changed; the default is now `##`.
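
As a rough illustration of the labeling convention discussed in this thread (not the code from this PR), a leaf label can be written as the path from the top level down to the leaf, joined with the separator. The function names `join_label_path` and `expand_label_path` and the example labels are hypothetical; only the `##` default comes from the reply above.

```python
SEPARATOR = "##"  # default separator mentioned in the reply above


def join_label_path(levels, separator=SEPARATOR):
    """Join the labels from the top level down to the leaf into one label string."""
    for level in levels:
        if separator in level:
            raise ValueError(f"label {level!r} contains the separator {separator!r}")
    return separator.join(levels)


def expand_label_path(label, separator=SEPARATOR):
    """Return every prefix of a leaf label path, from the top level to the leaf."""
    parts = label.split(separator)
    return [separator.join(parts[: i + 1]) for i in range(len(parts))]


# Hypothetical two-level label used only for demonstration.
leaf = join_label_path(["sports", "football"])
print(leaf)
print(expand_label_path(leaf))
```

Rejecting labels that already contain the separator keeps the path unambiguous when it is split again later.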

wawltor · 2022-07-04T03:06:01Z

applications/text_classification/doccano.py

+    if not os.path.exists(args.save_dir):
+        os.makedirs(args.save_dir)
+
+    if len(args.splits) != 0 and len(args.splits) != 2 and len(


The condition `len(args.splits) != 0` here is unnecessary.

Removed. Currently only two kinds of split are allowed: train/dev/test and train/dev.
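
A minimal sketch of the check described in the reply, assuming the split is given as a list of ratios. The function name `validate_splits` is hypothetical, and the requirement that the ratios sum to 1 is an assumption rather than something stated in this thread.

```python
import math


def validate_splits(splits):
    """Accept only [train, dev] or [train, dev, test] ratios."""
    if len(splits) not in (2, 3):
        raise ValueError(
            "splits must hold 2 ratios (train/dev) or 3 ratios (train/dev/test)"
        )
    if any(ratio < 0 for ratio in splits):
        raise ValueError("split ratios must be non-negative")
    # Assumption: the ratios are fractions of the whole dataset and sum to 1.
    if not math.isclose(sum(splits), 1.0, abs_tol=1e-6):
        raise ValueError("split ratios must sum to 1")
    return list(splits)


print(validate_splits([0.8, 0.1, 0.1]))
```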

wawltor · 2022-07-04T03:09:22Z

applications/text_classification/doccano.py

+                count += 1
+        logger.info("Save %d examples to %s." % (count, save_path))
+
+    if len(args.splits) == 0:


One question: if no dev set is provided, won't the algorithm code fail to run?

Right, so the split must now include a dev set.
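
To make the consequence concrete, here is a hedged sketch of a splitter that always produces a dev set. It is not the `doccano.py` implementation; `split_examples`, its `seed` parameter, and the returned dictionary layout are assumptions for illustration.

```python
import random


def split_examples(examples, splits, seed=1000):
    """Shuffle examples and cut them into train/dev or train/dev/test parts."""
    if len(splits) not in (2, 3):
        raise ValueError("splits must be train/dev or train/dev/test ratios")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * splits[0])
    if len(splits) == 2:
        # Everything that is not training data becomes the dev set.
        return {"train": shuffled[:n_train], "dev": shuffled[n_train:]}
    n_dev = int(len(shuffled) * splits[1])
    return {
        "train": shuffled[:n_train],
        "dev": shuffled[n_train:n_train + n_dev],
        "test": shuffled[n_train + n_dev:],
    }


parts = split_examples(range(8), [0.5, 0.25, 0.25])
print({name: len(items) for name, items in parts.items()})
```

Because the dev part is always present in the result, downstream training code can rely on it for evaluation.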

wawltor · 2022-07-04T03:18:31Z

applications/text_classification/multi_class/README.md

+Given the data to predict and the label mapping list, the model predicts the label for each input
+
+Run prediction:
+```shell


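Change the Chinese to English.

Fixed:

Run prediction on the default data:

python predict.py --params_path ./checkpoint/model_state.pdparams

Alternatively, run prediction on the local data file KUAKE_QIC/data.tsv:

python predict.py --params_path ./checkpoint/model_state.pdparams --dataset_dir KUAKE_QIC

The same issue has been fixed in the other places where it occurred.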

wawltor · 2022-07-04T03:24:20Z

applications/text_classification/multi_class/README.md

+The pruning feature requires the paddleslim package
+
+```shell
+pip install paddleslim


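The paddleslim version may need to be pinned here later. paddleslim has changed quite a lot recently, including some changes to its API.

Pinned to 2.2.2, the version I am currently using.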

wawltor · 2022-07-04T03:27:00Z

applications/text_classification/multi_class/deploy/paddle_serving/README.md

+
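+Use the installed paddle_serving_client to convert the static graph model into the serving format. For how to convert a trained model into a static graph model with the [static graph export script](../../export_model.py), see [Static graph model export](../../README.md).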
+
+```bash


Please make the use of `bash` and `shell` consistent here.

All unified to `shell`.

wawltor · 2022-07-04T03:29:34Z

applications/text_classification/multi_class/deploy/predictor/predictor.py

+
+import six
+import os
+import numpy as np


Mind the import order here.

Fixed:

import os
import json
import time

import six
import numpy as np
import paddle

wawltor · 2022-07-04T03:30:51Z

applications/text_classification/multi_class/deploy/triton_serving/README.md

+
+### Install Triton Server
+Download the Triton Server image and start it
+```


Changed to:

wawltor · 2022-07-04T03:31:01Z

applications/text_classification/multi_class/deploy/triton_serving/README.md

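+### Enter the container and prepare the PaddleNLP environment
+Pre- and post-processing for the whole service depend on PaddleNLP, so the required Python packages must be installed inside the container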
+
+```


Fixed, same as above.


wawltor

LGTM

add_multi_classification

d084f18

lugimzzz added the enhancement (New feature or request) and text classification labels Jun 28, 2022

lugimzzz requested a review from wawltor June 28, 2022 09:58

lugimzzz self-assigned this Jun 28, 2022

lugimzzz and others added 9 commits June 29, 2022 09:38

add_multi_classification

cf6053b

add_multi_classification

2a3f729

add_multi_classification

c7b19d4

evaluate_on_test_and_dev

74a76a3

add_triton_server

78f694b

add_triton _server

23f00d6

add_triton_server

c40f438

Merge branch 'PaddlePaddle:develop' into develop

6032a94

add_doccano_for_text_classification

8c50325

wawltor reviewed Jul 4, 2022


lugimzzz added 4 commits July 4, 2022 07:15

modify_multi_classification

3e62ce2

Merge branch 'develop' of https://github.com/lugimzzz/PaddleNLP into …

b8aed02

…add_multi_classification

modify_multi_classification

a282342

modify_multi_classification

10d68b9

wawltor approved these changes Jul 4, 2022


wawltor merged commit 0ca48c5 into PaddlePaddle:develop Jul 4, 2022

lugimzzz deleted the add_multi_classification branch July 4, 2022 09:36

lugimzzz mentioned this pull request Aug 1, 2022

PaddleNLP 2.3.5 Release Note Candidate #2907

Closed
