add multi classification application #2675
Added doccano annotation
**Environment used for the annotation examples below:**

- doccano 1.6.2
- pip (Python 3.8+)
The Python version should be pinned down here first.
Fixed:
Environment used for the annotation examples below:
- Python 3.8+
- doccano 1.6.2
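A quick way to confirm a local environment matches the versions listed above is a small check script. This is only a sketch: the helper names `python_ok` and `installed_version` are made up for illustration, and it relies on `importlib.metadata`, which is itself only available from Python 3.8 onward.

```python
import sys
from importlib import metadata  # in the standard library since Python 3.8


def python_ok(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= minimum


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python 3.8+:", python_ok())
    # The annotation examples in this PR were written against doccano 1.6.2.
    print("doccano version:", installed_version("doccano"))
```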
- pip (Python 3.8+)

Run the following commands in a terminal:
```shell
Per the code conventions, comments inside code must be in English; please make this consistent.
Fixed:
Install doccano with pip in a terminal:
pip install doccano==1.6.2
After installation, run the following commands:
# Initialize database.
doccano init
# Create a super user.
doccano createuser --username admin --password pass
# Start a web server.
doccano webserver --port 8000
Run the following command in a new terminal:
# Start the task queue to handle file upload/download.
doccano task
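# Start the task queue to handle file upload/download.
doccano task
```
Open http://127.0.0.1:8000/ in a browser, log in with the username and password, and start annotating with doccano. doccano supports a Chinese interface; click the menu in the upper-right corner and select ZH(中文).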
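Fixed:
Open http://127.0.0.1:8000/ in a browser, log in with the username and password, and start annotating with doccano. doccano supports a Chinese interface; click the menu in the upper-right corner and select ZH(中文).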
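<div align="center">
<img src=https://user-images.githubusercontent.com/63761690/175248039-ce1673f1-9b03-4804-b1cb-29e4b4193f86.png height=300 hspace='15'/>
</div>
For hierarchical classification tasks, we recommend using the path to each leaf node in the label hierarchy as the classification label. Taking the label structure in the figure above as an example, we recommend using `--` as the separator between labels at different levels: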
I suggest making the separator here consistent with the one UIE uses, the `##` symbol.
Fixed; the default is now `##`.
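To illustrate the leaf-path labeling scheme discussed in this thread, here is a minimal sketch of joining the labels along a root-to-leaf path with the `##` separator. The function names and the example label names are hypothetical and not taken from the PR; rejecting labels that already contain the separator is also an assumption, added so that the path can be split back unambiguously.

```python
SEPARATOR = "##"  # default separator agreed on in this review thread


def leaf_label(path, sep=SEPARATOR):
    """Join the labels on a root-to-leaf path into one flat label string."""
    for name in path:
        if sep in name:
            # Assumption: a level name containing the separator is an error.
            raise ValueError(f"label {name!r} contains the separator {sep!r}")
    return sep.join(path)


def split_label(label, sep=SEPARATOR):
    """Recover the per-level labels from a flat leaf label."""
    return label.split(sep)


if __name__ == "__main__":
    # Hypothetical three-level hierarchy, for illustration only.
    flat = leaf_label(["level1", "level2", "level3"])
    print(flat)
    print(split_label(flat))
```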
if not os.path.exists(args.save_dir):
    os.makedirs(args.save_dir)

if len(args.splits) != 0 and len(args.splits) != 2 and len(
The condition len(args.splits) != 0 is unnecessary here.
Removed. Only two kinds of split are allowed now: train/dev/test and train/dev.
        count += 1
    logger.info("Save %d examples to %s." % (count, save_path))

if len(args.splits) == 0:
One question: if no validation set is provided, won't the code for each algorithm fail to run?
Right, so a validation set is now required when splitting.
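The rule the two threads above converge on (exactly two or three split ratios, always including a dev set) could be checked along these lines. This is a sketch, not the PR's actual code: the function name `split_sizes` is hypothetical, and requiring positive ratios that sum to 1 is an assumption rather than something stated in the review.

```python
import math


def split_sizes(num_examples, splits):
    """Return example counts for train/dev or train/dev/test ratios."""
    if len(splits) not in (2, 3):
        raise ValueError(
            "splits must hold 2 ratios (train/dev) or 3 (train/dev/test)")
    # Assumption: every split must be non-empty and the ratios must sum to 1.
    if any(ratio <= 0 for ratio in splits):
        raise ValueError("every split ratio must be positive")
    if not math.isclose(sum(splits), 1.0, abs_tol=1e-9):
        raise ValueError("split ratios must sum to 1")
    # Round down for all but the last split; the last takes the remainder
    # so that the counts always add up to num_examples.
    sizes = [int(num_examples * ratio) for ratio in splits[:-1]]
    sizes.append(num_examples - sum(sizes))
    return sizes


if __name__ == "__main__":
    print(split_sizes(1000, [0.8, 0.1, 0.1]))  # train/dev/test
    print(split_sizes(1000, [0.8, 0.2]))       # train/dev
```

An empty `splits` list is rejected here, which matches the author's reply that a dev set is mandatory.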
Given the data to predict and the data-label mapping list, the model predicts the label for each example.

Start prediction:
```shell
Change the Chinese to English.
Fixed:
Predict using the default data:
python predict.py --params_path ./checkpoint/model_state.pdparams
You can also predict using the local data file KUAKE_QIC/data.tsv:
python predict.py --params_path ./checkpoint/model_state.pdparams --dataset_dir KUAKE_QIC
The same issue has been fixed in the other places where it occurred.
Using the pruning feature requires installing the paddleslim package

```shell
pip install paddleslim
The paddleslim version may need to be pinned here later; paddleslim has been changing a lot recently, and some of its APIs have changed.
Pinned to 2.2.2, the version I am currently using.
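
Use the installed paddle_serving_client to convert the static graph model parameters into the serving format. For how to use the [static graph export script](../../export_model.py) to convert a trained model into a static graph model, see [Static graph model export](../../README.md).

```bash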
Make `bash` and `shell` consistent here.
All unified to `shell`.

import six
import os
import numpy as np
Mind the ordering of the imports here.
Fixed:
import os
import json
import time
import six
import numpy as np
import paddle

### Install Triton Server
Download the Triton Server image and start it
```
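Changed to:
### Enter the container and set up the PaddleNLP environment
The pre- and post-processing of the whole service depends on PaddleNLP, so the relevant Python packages need to be installed inside the container

```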
Same as above.
Same as above; fixed.
LGTM
PR types
Others
PR changes
Others
Description
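Adds a multi-class classification application covering training, prediction, static graph export, pruning, and related features. Three deployment options are currently supported: onnxruntime, paddle serving, and triton server.
The main code structure of this project and its description are as follows: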