add multi classification application #2675

Merged
14 commits merged on Jul 4, 2022

@lugimzzz (Contributor) commented Jun 28, 2022

PR types

Others

PR changes

Others

Description

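Adds a multi-class classification application covering training, prediction, static graph export, and pruning. Three deployment options are currently supported: ONNXRuntime, Paddle Serving, and Triton Server.

The main code structure of this project is described below: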

multi_class/
├── deploy # Deployment
│   ├── predictor # Export an ONNX model and deploy it with ONNXRuntime
│   │   ├── infer.py # ONNXRuntime inference deployment example
│   │   ├── predictor.py
│   │   └── README.md # Usage instructions
│   ├── paddle_serving # Deployment with Paddle Serving
│   │   ├── config.yml # Configuration file for starting the classification server
│   │   ├── rpc_client.py # Script that sends pipeline prediction requests for the classification task
│   │   ├── service.py # Script that starts the classification server
│   │   └── README.md # Usage instructions
│   └── triton_serving # Deployment with Triton Server
│       ├── README.md # Usage instructions
│       ├── seqcls_grpc_client.py # Client test code
│       └── models # Models to deploy
│           ├── seqcls
│           │   └── config.pbtxt
│           ├── seqcls_model
│           │   └── config.pbtxt
│           ├── seqcls_postprocess
│           │   ├── 1
│           │   │   └── model.py
│           │   └── config.pbtxt
│           └── tokenizer
│               ├── 1
│               │   └── model.py
│               └── config.pbtxt
├── train.py # Training and evaluation script
├── predict.py # Prediction script
├── export_model.py # Script that exports dynamic graph parameters to static graph parameters
├── utils.py # Utility functions
├── metric.py # Metric script
├── prune.py # Pruning script
├── prune_trainer.py # Pruning trainer script
├── prune_config.py # Pruning training parameter configuration
├── requirements.txt # Dependencies
└── README.md # Usage instructions

@lugimzzz added the enhancement and text classification labels Jun 28, 2022
@lugimzzz requested a review from wawltor June 28, 2022 09:58
@lugimzzz self-assigned this Jun 28, 2022
@lugimzzz (Contributor Author) commented Jul 1, 2022

Added doccano annotation.

**Environment used for the annotation examples below:**

- doccano 1.6.2
- pip (Python 3.8+)
Collaborator

The Python version could be pinned here first.

Contributor Author

Updated:

Environment used for the annotation examples below:

  • Python 3.8+
  • doccano 1.6.2
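
The environment listed above can be checked before starting the annotation steps. The following is a minimal sketch and is not part of the PR: `installed_version` and `environment_problems` are hypothetical helper names, and treating any doccano version other than 1.6.2 as a mismatch is an assumption made for illustration.

```python
import sys
from importlib import metadata

# Versions taken from the environment list above.
MIN_PYTHON = (3, 8)
DOCCANO_VERSION = "1.6.2"


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_problems(python_version, doccano_version):
    """Return a list of readable problems; an empty list means the setup matches."""
    problems = []
    major_minor = tuple(python_version[:2])
    if major_minor < MIN_PYTHON:
        problems.append("Python %d.%d found, 3.8+ required" % major_minor)
    if doccano_version is None:
        problems.append("doccano is not installed")
    elif doccano_version != DOCCANO_VERSION:
        # Assumption: only the exact version used in the examples counts as a match.
        problems.append(
            "doccano %s found, %s expected" % (doccano_version, DOCCANO_VERSION))
    return problems


if __name__ == "__main__":
    for problem in environment_problems(sys.version_info,
                                        installed_version("doccano")):
        print(problem)
```

Running the script prints one line per mismatch and nothing when the environment matches the list.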

- pip (Python 3.8+)

Run the following commands in a terminal:
```shell
Collaborator

Per the code conventions, comments inside code need to be in English; please make them consistent here.

Contributor Author

Updated:

Install doccano with pip in a terminal:

pip install doccano==1.6.2

After installation, run the following commands:

# Initialize database.
doccano init
# Create a super user.
doccano createuser --username admin --password pass
# Start a web server.
doccano webserver --port 8000

Run the following command in a new terminal:

# Start the task queue to handle file upload/download.
doccano task

# Start the task queue to handle file upload/download.
doccano task
```
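Open http://127.0.0.1:8000/ in a browser, log in with the username and password, and start annotating with doccano. doccano also offers a Chinese interface: click the language selector in the top-right corner and choose ZH (Chinese).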
Collaborator

[image]

The link here could use Markdown link syntax.

Contributor Author

Updated:

Open http://127.0.0.1:8000/ in a browser, log in with the username and password, and start annotating with doccano. doccano also offers a Chinese interface: click the language selector in the top-right corner and choose ZH (Chinese).

<div align="center">
<img src=https://user-images.githubusercontent.com/63761690/175248039-ce1673f1-9b03-4804-b1cb-29e4b4193f86.png height=300 hspace='15'/>
</div>
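For the labels of a hierarchical classification task, we recommend using the path to each leaf node in the label hierarchy as the label. Taking the label structure in the figure above as an example, we recommend using `--` as the separator between labels at different levels: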
Collaborator

The separator suggested here should match the one UIE uses, the `##` symbol.

Contributor Author

Updated; the default is now `##`.
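
To illustrate the leaf-path labels discussed in this thread, here is a small sketch of joining and splitting hierarchical labels with `##`. The function names and the example labels are hypothetical and do not come from the PR; rejecting labels that already contain the separator is an assumption added so that paths stay unambiguous.

```python
# Separator between hierarchy levels; "##" follows the default agreed on in this thread.
SEPARATOR = "##"


def join_label_path(levels, separator=SEPARATOR):
    """Join the labels from root to leaf into a single path label."""
    for level in levels:
        if not level:
            raise ValueError("empty label in path")
        if separator in level:
            # Assumption: a label containing the separator would make the path ambiguous.
            raise ValueError(
                "label %r contains the separator %r" % (level, separator))
    return separator.join(levels)


def split_label_path(label, separator=SEPARATOR):
    """Split a path label back into its per-level labels."""
    return label.split(separator)


def ancestor_labels(label, separator=SEPARATOR):
    """Return the path label of every level, from the root down to the leaf."""
    levels = split_label_path(label, separator)
    return [separator.join(levels[:depth])
            for depth in range(1, len(levels) + 1)]
```

With the hypothetical levels `["sports", "football"]`, `join_label_path` returns `sports##football`, and `ancestor_labels("sports##football")` returns `["sports", "sports##football"]`.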

if not os.path.exists(args.save_dir):
    os.makedirs(args.save_dir)

if len(args.splits) != 0 and len(args.splits) != 2 and len(
Collaborator

The `len(args.splits) != 0` condition here is unnecessary.

Contributor Author

Removed. Only two ways of splitting are allowed now: train/dev/test and train/dev.
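
A check along the lines described in this reply could look like the sketch below. It is not the code from the PR: `validate_splits` is a hypothetical name, and the requirements that every ratio be positive and that the ratios sum to 1 are assumptions made for illustration.

```python
import math


def validate_splits(splits):
    """Check that `splits` describes a train/dev or train/dev/test split.

    Returns the ratios as a tuple of floats, or raises ValueError.
    """
    ratios = tuple(float(ratio) for ratio in splits)
    if len(ratios) not in (2, 3):
        raise ValueError(
            "splits needs 2 values (train/dev) or 3 values (train/dev/test), "
            "got %d" % len(ratios))
    # Assumption: every part, including dev, must receive a positive share.
    if any(ratio <= 0 for ratio in ratios):
        raise ValueError("every split ratio must be positive")
    # Assumption: the ratios are fractions of the whole dataset.
    if not math.isclose(sum(ratios), 1.0, abs_tol=1e-6):
        raise ValueError("split ratios must sum to 1")
    return ratios
```

Because an empty list is rejected by the length check, no separate `len(splits) != 0` branch is needed.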

count += 1
logger.info("Save %d examples to %s." % (count, save_path))

if len(args.splits) == 0:
Collaborator

One question: if no validation set is provided, won't the code for each algorithm fail to run?

Contributor Author

Right, so the split must now include a validation set.
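
One way to guarantee that a validation set always exists is to fail early when the dev part comes out empty. The sketch below is illustrative only and is not the PR's implementation: `split_examples` is a hypothetical name, and giving the last part all remaining examples is a design choice made here so that no example is dropped by rounding.

```python
import random


def split_examples(examples, ratios, seed=None):
    """Shuffle `examples` and cut them into train/dev or train/dev/test parts."""
    if len(ratios) not in (2, 3):
        raise ValueError("ratios must describe train/dev or train/dev/test")
    shuffled = list(examples)
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)

    names = ("train", "dev", "test")[:len(ratios)]
    parts = {}
    start = 0
    for index, name in enumerate(names):
        if index == len(names) - 1:
            # The last part takes whatever is left after the earlier cuts.
            end = len(shuffled)
        else:
            end = start + int(len(shuffled) * ratios[index])
        parts[name] = shuffled[start:end]
        start = end

    if not parts["dev"]:
        raise ValueError("the dev split is empty; training needs a validation set")
    return parts
```

For ten examples and ratios `(0.8, 0.1, 0.1)` this yields parts of size 8, 1, and 1; with only three examples the dev part would be empty, so the function raises instead of writing an unusable split.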

Given the data to predict and the label mapping list, the model predicts the label for each input.

Start prediction:
```shell
Collaborator

Change the Chinese to English.

Contributor Author

Updated:

Predict with the default data:

python predict.py --params_path ./checkpoint/model_state.pdparams

Alternatively, predict with the local data file KUAKE_QIC/data.tsv:

python predict.py --params_path ./checkpoint/model_state.pdparams --dataset_dir KUAKE_QIC

Contributor Author

The other places with the same issue have been updated as well.

The pruning feature requires the paddleslim package

```shell
pip install paddleslim
Collaborator

The paddleslim version may need to be pinned here later. Recent paddleslim releases have changed quite a bit, including some API changes.

Contributor Author

Pinned to 2.2.2, the version I am currently using.


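Use the installed paddle_serving_client to convert the static graph model into the serving format. For how to convert a trained model into a static graph model with the [static graph export script](../../export_model.py), see [Static graph model export](../../README.md).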

```bash
Collaborator

Make `bash` and `shell` consistent here.

Contributor Author

All unified to `shell`.


import six
import os
import numpy as np
Collaborator

Mind the ordering of the imports here.

Contributor Author

Updated:

import os
import json
import time

import six
import numpy as np
import paddle


### Install Triton Server
Download the Triton Server image and start it
```
Collaborator

Contributor Author

Changed to:

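### Enter the container and prepare the PaddleNLP environment
Pre- and post-processing for the whole service depends on PaddleNLP, so the relevant Python packages must be installed inside the container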

```
Collaborator

Same as above.

Contributor Author

Updated, same as above.

@wawltor (Collaborator) left a comment

LGTM

@wawltor wawltor merged commit 0ca48c5 into PaddlePaddle:develop Jul 4, 2022
@lugimzzz lugimzzz deleted the add_multi_classification branch July 4, 2022 09:36