From 423eb548a166427327da6102e99c5b230fbdc4db Mon Sep 17 00:00:00 2001 From: Huijuan Wang Date: Fri, 13 Jan 2023 15:56:26 +0800 Subject: [PATCH 1/4] [UTC] Add English documents --- .../zero_shot_text_classification/README.md | 4 +- .../README_en.md | 247 ++++++++++++++++++ .../label_studio_text.md | 4 +- .../label_studio_text_en.md | 135 ++++++++++ 4 files changed, 387 insertions(+), 3 deletions(-) create mode 100644 applications/zero_shot_text_classification/README_en.md create mode 100644 applications/zero_shot_text_classification/label_studio_text_en.md diff --git a/applications/zero_shot_text_classification/README.md b/applications/zero_shot_text_classification/README.md index 38d8fd89b912..7bf3adecd2f4 100644 --- a/applications/zero_shot_text_classification/README.md +++ b/applications/zero_shot_text_classification/README.md @@ -156,7 +156,9 @@ python -u -m paddle.distributed.launch --gpus "0,1" run_train.py \ * `seed`:全局随机种子,默认为 42。 * `model_name_or_path`:进行 few shot 训练使用的预训练模型。默认为 "utc-large"。 * `output_dir`:必须,模型训练或压缩后保存的模型目录;默认为 `None` 。 -* `dev_path`:开发集路径;默认为 `None` 。 +* `dataset_path`:数据集文件所在目录;默认为 `./data/` 。 +* `train_file`:训练集后缀;默认为 `train.txt` 。 +* `dev_file`:开发集后缀;默认为 `dev.txt` 。 * `max_seq_len`:文本最大切分长度,包括标签的输入超过最大长度时会对输入文本进行自动切分,标签部分不可切分,默认为512。 * `per_device_train_batch_size`:用于训练的每个 GPU 核心/CPU 的batch大小,默认为8。 * `per_device_eval_batch_size`:用于评估的每个 GPU 核心/CPU 的batch大小,默认为8。 diff --git a/applications/zero_shot_text_classification/README_en.md b/applications/zero_shot_text_classification/README_en.md new file mode 100644 index 000000000000..86f1c7b30b47 --- /dev/null +++ b/applications/zero_shot_text_classification/README_en.md @@ -0,0 +1,247 @@ +# Zero-shot Text Classification + +**Table of contents** +- [1. Zero-shot Text Classification Application](#1) +- [2. 
Quick Start](#2) + - [2.1 Code Structure](#21) + - [2.2 Data Annotation](#22) + - [2.3 Finetuning](#23) + - [2.4 Evaluation](#24) + - [2.5 Inference](#25) + - [2.6 Deployment](#26) + - [2.7 Experiments](#27) + + + +## 1. Zero-shot Text Classification + +This project provides an end-to-end application solution for universal text classification based on Universal Text Classification (UTC) finetuning and goes through the full lifecycle of **data labeling, model training and model deployment**. We hope this guide can help you apply Text Classification techniques with zero-shot ability in your own products or models. + +Text Classification refers to assigning a set of categories to given input text. Despite the advantages of tuning, applying text classification techniques in practice remains a challenge due to domain adaptation and lack of labeled data, etc. This PaddleNLP Zero-shot Text Classification Guide builds on our UTC from the Unified Semantic Matching (USM) model series and provides an industrial-level solution that supports universal text classification tasks, including but not limited to **semantic analysis, semantic matching, intention recognition and event detection**, allowing you to accomplish multiple tasks with a single model. Besides, our method brings good generalization performance through multi-task pretraining. + +**Highlights:** + +- **Comprehensive Coverage**🎓: Covers various mainstream tasks of text classification, including but not limited to semantic analysis, semantic matching, intention recognition and event detection. + +- **State-of-the-Art Performance**🏃: Strong performance from the UTC model. with good zero-shot performance and practicable few-shot ability. + +- **Easy to use**⚡: Three lines of code to use our Taskflow for out-of-box Zero-shot Text Classification capability. One line of command to model training and model deployment. 
+ +- **Efficient Tuning**✊: Developers can easily get started with the data labeling and model training process without a background in Machine Learning. + + + +## 2. Quick start + +For a quick start, you can directly use ```paddlenlp.Taskflow``` out-of-the-box, leveraging the zero-shot performance. For production use cases, we recommend labeling a small amount of data for model fine-tuning to further improve the performance. + + + +### 2.1 Code structure + +```shell +. +├── deploy/simple_serving/ # model deployment script +├── utils.py # data processing tools +├── run_train.py # model fine-tuning script +├── run_eval.py # model evaluation script +├── label_studio.py # data format conversion script +├── label_studio_text.md # data annotation instruction +└── README.md +``` + + +### 2.2 Data labeling + +We recommend using [Label Studio](https://labelstud.io/) for data labeling. You can export labeled data from Label Studio and convert it into the required input format. Please refer to [Label Studio Data Labeling Guide](./label_studio_text_en.md) for more details. + +Here we provide a pre-labeled example dataset `Medical Question Intent Classification Dataset`, which you can download with the following command. We will show how to use the data conversion script to generate training/validation/test set files for fine-tuning. + +Download the medical question intent classification dataset: + +```shell +wget https://bj.bcebos.com/paddlenlp/datasets/utc-medical.tar.gz +tar -xvf utc-medical.tar.gz +mv utc-medical data +rm utc-medical.tar.gz +``` + +Generate training/validation/test set files: + +```shell +python label_studio.py \ + --label_studio_file ./data/label_studio.json \ + --save_dir ./data \ + --splits 0.8 0.1 0.1 \ + --options ./data/label.txt +``` + +For multi-task training, you can convert the data of each task with the script separately and move the results to the same directory. 
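The `--options` argument in the conversion command above points to a plain-text label file with one label candidate per line. When preparing your own dataset instead of the example one, such a file could be written with a few lines of Python. The sketch below is only an illustration: the helper name `write_label_file`, the output path `./my_data/label.txt` and the shortened label list are assumptions and not part of the project scripts.

```python
from pathlib import Path


def write_label_file(labels, path):
    """Write one label candidate per line (hypothetical helper, not part of label_studio.py)."""
    # Strip whitespace, drop empty entries and duplicates, and keep the original order.
    unique_labels = list(dict.fromkeys(label.strip() for label in labels if label.strip()))
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(unique_labels) + "\n", encoding="utf-8")
    return unique_labels


if __name__ == "__main__":
    # Shortened, illustrative label list taken from the medical intent example.
    write_label_file(["病情诊断", "治疗方案", "病因分析", "其他"], "./my_data/label.txt")
```

The resulting file can then be passed to the conversion script through `--options`.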
+ + + +### 2.3 Finetuning + +Use the following command to fine-tune the model using `utc-large` as the pre-trained model, and save the fine-tuned model to `./checkpoint/model_best/`: + +Single GPU: + +```shell +python run_train.py \ + --device gpu \ + --logging_steps 10 \ + --save_steps 100 \ + --eval_steps 100 \ + --seed 1000 \ + --model_name_or_path utc-large \ + --output_dir ./checkpoint/model_best \ + --dataset_path ./data/ \ + --max_seq_length 512 \ + --per_device_train_batch_size 2 \ + --per_device_eval_batch_size 2 \ + --gradient_accumulation_steps 8 \ + --num_train_epochs 20 \ + --learning_rate 1e-5 \ + --do_train \ + --do_eval \ + --do_export \ + --export_model_dir ./checkpoint/model_best \ + --overwrite_output_dir \ + --disable_tqdm True \ + --metric_for_best_model macro_f1 \ + --load_best_model_at_end True \ + --save_total_limit 1 +``` + +Multiple GPUs: + +```shell +python -u -m paddle.distributed.launch --gpus "0,1" run_train.py \ + --device gpu \ + --logging_steps 10 \ + --save_steps 100 \ + --eval_steps 100 \ + --seed 1000 \ + --model_name_or_path utc-large \ + --output_dir ./checkpoint/model_best \ + --dataset_path ./data/ \ + --max_seq_length 512 \ + --per_device_train_batch_size 2 \ + --per_device_eval_batch_size 2 \ + --gradient_accumulation_steps 8 \ + --num_train_epochs 20 \ + --learning_rate 1e-5 \ + --do_train \ + --do_eval \ + --do_export \ + --export_model_dir ./checkpoint/model_best \ + --overwrite_output_dir \ + --disable_tqdm True \ + --metric_for_best_model macro_f1 \ + --load_best_model_at_end True \ + --save_total_limit 1 +``` + +Parameters: + +* `device`: Training device, one of 'cpu' and 'gpu' can be selected; the default is GPU training. +* `logging_steps`: The interval steps of log printing during training, the default is 10. +* `save_steps`: The number of interval steps to save the model checkpoint during training, the default is 100. 
+* `eval_steps`: The number of interval steps between evaluations during training, the default is 100. +* `seed`: Global random seed, default is 42. +* `model_name_or_path`: The pre-trained model used for few-shot training. Defaults to "utc-large". +* `output_dir`: Required, the directory where the model is saved after training or compression; the default is `None`. +* `dataset_path`: The directory containing the dataset files; defaults to `./data`. +* `train_file`: Training file name; defaults to `train.txt`. +* `dev_file`: Development file name; defaults to `dev.txt`. +* `max_seq_len`: The maximum segmentation length of the text and label candidates. When the input exceeds the maximum length, the input text will be automatically segmented. The default is 512. +* `per_device_train_batch_size`: The batch size of each GPU core/CPU used for training, the default is 8. +* `per_device_eval_batch_size`: Batch size per GPU core/CPU for evaluation, default is 8. +* `num_train_epochs`: Number of training epochs, 100 can be selected when using early stopping; the default is 10. +* `learning_rate`: The maximum learning rate for training, UTC recommends setting it to 1e-5; the default value is 3e-5. +* `do_train`: Whether to perform fine-tuning; setting this flag enables fine-tuning, and it is not set by default. +* `do_eval`: Whether to evaluate; setting this flag enables evaluation, and it is not set by default. +* `do_export`: Whether to export; setting this flag exports the static graph, and it is not set by default. +* `export_model_dir`: Static graph export directory, the default is `./checkpoint/model_best`. +* `overwrite_output_dir`: If `True`, overwrite the contents of the output directory. If `output_dir` points to a checkpoint directory, use it to continue training. +* `disable_tqdm`: Whether to disable the tqdm progress bar. +* `metric_for_best_model`: The metric used to select the best model, UTC recommends setting it to `macro_f1`, the default is None. 
+* `load_best_model_at_end`: Whether to load the best model after training, usually used in conjunction with `metric_for_best_model`, the default is False. +* `save_total_limit`: If this parameter is set, the total number of checkpoints is limited and older checkpoints in `output_dir` are deleted; defaults to None. + + + +### 2.4 Evaluation + +Model evaluation: + +```shell +python run_eval.py \ + --model_path ./checkpoint/model_best \ + --test_path ./data/test.txt \ + --per_device_eval_batch_size 2 \ + --max_seq_len 512 \ + --output_dir ./checkpoint_test +``` + +Parameters: + +- `model_path`: The path of the model folder for evaluation, which must contain the model weight file `model_state.pdparams` and the configuration file `model_config.json`. +- `test_path`: The test set file for evaluation. +- `per_device_eval_batch_size`: Batch size, adjust it according to your hardware, the default is 8. +- `max_seq_len`: The maximum segmentation length of the text and label candidates. When the input exceeds the maximum length, the input text will be automatically segmented. The default is 512. + + + +### 2.5 Inference + +You can use `paddlenlp.Taskflow` to load your custom model by specifying the path of the model weight file through `task_path`. + +```python +>>> from pprint import pprint +>>> from paddlenlp import Taskflow +>>> schema = ["病情诊断", "治疗方案", "病因分析", "指标解读", "就医建议", "疾病表述", "后果表述", "注意事项", "功效作用", "医疗费用", "其他"] +>>> my_cls = Taskflow("zero_shot_text_classification", schema=schema, task_path='./checkpoint/model_best', precision="fp16") +>>> pprint(my_cls("中性粒细胞比率偏低")) +``` + + + +### 2.6 Deployment + +We provide a deployment solution based on PaddleNLP SimpleServing, with which you can easily build your own deployment service in three lines of code. 
+ +``` +# Save at server.py +from paddlenlp import SimpleServer, Taskflow + +schema = ["病情诊断", "治疗方案", "病因分析", "指标解读", "就医建议"] +utc = Taskflow("zero_shot_text_classification", + schema=schema, + task_path="../../checkpoint/model_best/", + precision="fp32") +app = SimpleServer() +app.register_taskflow("taskflow/utc", utc) +``` + +``` +# Start the server +paddlenlp server server:app --host 0.0.0.0 --port 8990 +``` + +It supports FP16 (half-precision) and multiple processes for inference acceleration. + + + +### 2.7 Experiments + +The results reported here are based on the development set of KUAKE-QIC. + + | | Accuracy | Micro F1 | Macro F1 | + | :------: | :--------: | :--------: | :--------: | + | 0-shot | 28.69 | 87.03 | 60.90 | + | 5-shot | 64.75 | 93.34 | 80.33 | + | 10-shot | 65.88 | 93.76 | 81.34 | + | full-set | 81.81 | 96.65 | 89.87 | + +where k-shot means that k annotated samples per label are used for training. diff --git a/applications/zero_shot_text_classification/label_studio_text.md b/applications/zero_shot_text_classification/label_studio_text.md index 41a285449df6..6a441f199a4f 100644 --- a/applications/zero_shot_text_classification/label_studio_text.md +++ b/applications/zero_shot_text_classification/label_studio_text.md @@ -105,7 +105,7 @@ label-studio start 将导出的文件重命名为``label_studio.json``后,放入``./data``目录下。通过[label_studio.py](./label_studio.py)脚本可转为UTC的数据格式。 -在数据转换阶段,我们会自动构造用于模型训练的标签候选信息。例如在医疗意图分类中,标签候选为``["病情诊断", "治疗方案", "病因分析", "指标解读", "就医建议", "疾病表述", "后果表述", "注意事项", "功效作用", "医疗费用", "其他"]``,可通过``options``参数进行配置。 +在数据转换阶段,还需要提供标签候选信息,放在`./data/label.txt`文件中,每个标签占一行。例如在医疗意图分类中,标签候选为``["病情诊断", "治疗方案", "病因分析", "指标解读", "就医建议", "疾病表述", "后果表述", "注意事项", "功效作用", "医疗费用", "其他"]``,也可通过``options``参数直接进行配置。 ```shell python label_studio.py \ @@ -122,7 +122,7 @@ python label_studio.py \ - ``label_studio_file``: 从label studio导出的数据标注文件。 - ``save_dir``: 训练数据的保存目录,默认存储在``data``目录下。 - ``splits``: 划分数据集时训练集、验证集所占的比例。默认为[0.8, 0.1, 0.1]表示按照``8:1:1``的比例将数据划分为训练集、验证集和测试集。 
-- ``options``: 指定分类任务的类别标签。若输入类型为文件,则文件中每行一个标签。默认为None,自动从输入数据中构造标签候选集合,当数据量大时耗时较长。 +- ``options``: 指定分类任务的类别标签。若输入类型为文件,则文件中每行一个标签。 - ``is_shuffle``: 是否对数据集进行随机打散,默认为True。 - ``seed``: 随机种子,默认为1000. diff --git a/applications/zero_shot_text_classification/label_studio_text_en.md b/applications/zero_shot_text_classification/label_studio_text_en.md new file mode 100644 index 000000000000..8dc6fa0aa82a --- /dev/null +++ b/applications/zero_shot_text_classification/label_studio_text_en.md @@ -0,0 +1,135 @@ +# Label Studio User Guide - Text Classification + +**Table of contents** + +- [1. Installation](#1) +- [2. Text Classification Task Annotation](#2) + - [2.1 Project Creation](#21) + - [2.2 Data Upload](#22) + - [2.3 Label Construction](#23) + - [2.4 Task Annotation](#24) + - [2.5 Data Export](#25) + - [2.6 Data Conversion](#26) + - [2.7 More Configuration](#27) + + + +## 1. Installation + +**Environmental configuration used in the following annotation examples:** + +- Python 3.8+ +- label-studio == 1.6.0 + +Use pip to install label-studio in the terminal: + +```shell +pip install label-studio==1.6.0 +``` + +Once the installation is complete, run the following command line: +```shell +label-studio start +``` + +Open [http://localhost:8080/](http://127.0.0.1:8080/) in the browser, enter the user name and password to log in, and start using label-studio for labeling. + + + +2. Text Classification Task Annotation + + + +#### 2.1 Project Creation + +Click Create to start creating a new project, fill in the project name, description, and select ``Text Classification`` in ``Labeling Setup``. + +- Fill in the project name, description + +
+ +- Upload a local txt file, select ``List of tasks``, and then import it into this project. + + + +
+ +- Define labels + + + +
+ + + +#### 2.2 Data Upload + +You can continue to import local txt files after the project has been created. See more details in [Project Creation](#data). + + + +#### 2.3 Label Construction + +After project creation, you can add or delete labels in Settings/Labeling Interface, just as in [Project Creation](#label). + + + +#### 2.4 Task annotation + +
+ + + +#### 2.5 Data Export + +Tick the IDs of the annotated texts, choose ``JSON`` as the export file type, and export the data: + +
+ + + +#### 2.6 Data conversion + +First, create a label file in the `./data` directory, with one label candidate per line. You can also pass the list of label candidates directly through `options`. Rename the exported file to ``label_studio.json`` and put it in the ``./data`` directory. The [label_studio.py](./label_studio.py) script then converts it to the UTC data format. + + +```shell +python label_studio.py \ + --label_studio_file ./data/label_studio.json \ + --save_dir ./data \ + --splits 0.8 0.1 0.1 \ + --options ./data/label.txt +``` + + + +#### 2.7 More Configuration + +- ``label_studio_file``: Data labeling file exported from label studio. +- ``save_dir``: The storage directory of the training data, which is stored in the ``data`` directory by default. +- ``splits``: The proportions of the training, validation and test sets when dividing the data set. The default is [0.8, 0.1, 0.1], which means that the data is divided into training set, validation set and test set according to the ratio of ``8:1:1``. +- ``options``: Specify the set of label candidates. If a file name is given, the file should contain one label per line. If a list is given, it should contain more than one label. +- ``is_shuffle``: Whether to randomly shuffle the data set, the default is True. +- ``seed``: Random seed, default is 1000. + +Note: +- By default the [label_studio.py](./label_studio.py) script will divide the data proportionally into train/dev/test datasets +- Each time the [label_studio.py](./label_studio.py) script is executed, the existing data file with the same name will be overwritten +- By default, every example in a file exported from label_studio is assumed to have been labeled correctly by hand. 
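As a rough illustration of how the `splits`, `is_shuffle` and `seed` options described above could interact, the sketch below shuffles a list of examples with a fixed seed and cuts it into train/dev/test parts by ratio. It is a simplified assumption about the behaviour, not the actual implementation in `label_studio.py`; the function name `split_examples` is hypothetical.

```python
import random


def split_examples(examples, splits=(0.8, 0.1, 0.1), is_shuffle=True, seed=1000):
    """Split examples into train/dev/test by ratio (illustrative sketch only)."""
    if len(splits) != 3 or abs(sum(splits) - 1.0) > 1e-6:
        raise ValueError("splits must be three ratios that sum to 1")
    examples = list(examples)
    if is_shuffle:
        # A dedicated Random instance keeps the shuffle reproducible for a given seed.
        random.Random(seed).shuffle(examples)
    n_train = int(len(examples) * splits[0])
    n_dev = int(len(examples) * splits[1])
    train = examples[:n_train]
    dev = examples[n_train:n_train + n_dev]
    test = examples[n_train + n_dev:]  # the remainder goes to the test set
    return train, dev, test


if __name__ == "__main__":
    train, dev, test = split_examples(range(100))
    print(len(train), len(dev), len(test))  # 80 10 10
```

Fixing the seed makes the split repeatable across runs, which matters because each run of the conversion script overwrites the previously generated files.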
+ +## References +- **[Label Studio](https://labelstud.io/)** From d33d2a5d7b5ef5487587e91e19b89aad61d5a5aa Mon Sep 17 00:00:00 2001 From: Huijuan Wang Date: Fri, 13 Jan 2023 16:16:03 +0800 Subject: [PATCH 2/4] [UTC] Add English documents --- applications/zero_shot_text_classification/README_en.md | 6 +++++- .../zero_shot_text_classification/label_studio_text_en.md | 2 +- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/applications/zero_shot_text_classification/README_en.md b/applications/zero_shot_text_classification/README_en.md index 86f1c7b30b47..7e9b67763367 100644 --- a/applications/zero_shot_text_classification/README_en.md +++ b/applications/zero_shot_text_classification/README_en.md @@ -17,13 +17,17 @@ This project provides an end-to-end application solution for universal text classification based on Universal Text Classification (UTC) finetuning and goes through the full lifecycle of **data labeling, model training and model deployment**. We hope this guide can help you apply Text Classification techniques with zero-shot ability in your own products or models. +
+ UTC model architecture diagram +
+ Text Classification refers to assigning a set of categories to given input text. Despite the advantages of tuning, applying text classification techniques in practice remains a challenge due to domain adaptation and lack of labeled data, etc. This PaddleNLP Zero-shot Text Classification Guide builds on our UTC from the Unified Semantic Matching (USM) model series and provides an industrial-level solution that supports universal text classification tasks, including but not limited to **semantic analysis, semantic matching, intention recognition and event detection**, allowing you to accomplish multiple tasks with a single model. Besides, our method brings good generalization performance through multi-task pretraining. **Highlights:** - **Comprehensive Coverage**🎓: Covers various mainstream tasks of text classification, including but not limited to semantic analysis, semantic matching, intention recognition and event detection. -- **State-of-the-Art Performance**🏃: Strong performance from the UTC model. with good zero-shot performance and practicable few-shot ability. +- **State-of-the-Art Performance**🏃: Strong performance from the UTC model. with good zero-shot and few-shot performance. - **Easy to use**⚡: Three lines of code to use our Taskflow for out-of-box Zero-shot Text Classification capability. One line of command to model training and model deployment. diff --git a/applications/zero_shot_text_classification/label_studio_text_en.md b/applications/zero_shot_text_classification/label_studio_text_en.md index 8dc6fa0aa82a..d353a78a2370 100644 --- a/applications/zero_shot_text_classification/label_studio_text_en.md +++ b/applications/zero_shot_text_classification/label_studio_text_en.md @@ -36,7 +36,7 @@ Open [http://localhost:8080/](http://127.0.0.1:8080/) in the browser, enter the -2. Text Classification Task Annotation +## 2. 
Text Classification Task Annotation From 37399b659a3ea82074fd61ee8a5fc018b57d0d12 Mon Sep 17 00:00:00 2001 From: Huijuan Wang Date: Tue, 17 Jan 2023 11:59:35 +0800 Subject: [PATCH 3/4] [utc] update readmes --- applications/zero_shot_text_classification/README.md | 4 +++- applications/zero_shot_text_classification/README_en.md | 4 +++- .../zero_shot_text_classification/label_studio_text.md | 2 ++ .../zero_shot_text_classification/label_studio_text_en.md | 4 +++- 4 files changed, 11 insertions(+), 3 deletions(-) diff --git a/applications/zero_shot_text_classification/README.md b/applications/zero_shot_text_classification/README.md index 7bf3adecd2f4..399f21fbd04a 100644 --- a/applications/zero_shot_text_classification/README.md +++ b/applications/zero_shot_text_classification/README.md @@ -1,3 +1,5 @@ +简体中文 | [English](README_en.md) + # 零样本文本分类 **目录** @@ -27,7 +29,7 @@ **零样本文本分类应用亮点:** - **覆盖场景全面🎓:** 覆盖文本分类各类主流任务,支持多任务训练,满足开发者多样文本分类落地需求。 -- **效果领先🏃:** 具有突出分类效果的UTC模型作为训练基座,提供良好的零样本和小样本学习能力。 +- **效果领先🏃:** 具有突出分类效果的UTC模型作为训练基座,提供良好的零样本和小样本学习能力。该模型在ZeroCLUE和FewCLUE均取得榜首(截止2023年1月11日)。 - **简单易用:** 通过Taskflow实现三行代码可实现无标注数据的情况下进行快速调用,一行命令即可开启文本分类,轻松完成部署上线,降低多任务文本分类落地门槛。 - **高效调优✊:** 开发者无需机器学习背景知识,即可轻松上手数据标注及模型训练流程。 diff --git a/applications/zero_shot_text_classification/README_en.md b/applications/zero_shot_text_classification/README_en.md index 7e9b67763367..ad72e16f116c 100644 --- a/applications/zero_shot_text_classification/README_en.md +++ b/applications/zero_shot_text_classification/README_en.md @@ -1,3 +1,5 @@ +[简体中文](README.md) | English + # Zero-shot Text Classification **Table of contents** @@ -27,7 +29,7 @@ Text Classification refers to assigning a set of categories to given input text. - **Comprehensive Coverage**🎓: Covers various mainstream tasks of text classification, including but not limited to semantic analysis, semantic matching, intention recognition and event detection. -- **State-of-the-Art Performance**🏃: Strong performance from the UTC model. 
with good zero-shot and few-shot performance. +- **State-of-the-Art Performance**🏃: Strong performance from the UTC model. which ranks first on ZeroCLUE/FewCLUE as of 01/11/2023. - **Easy to use**⚡: Three lines of code to use our Taskflow for out-of-box Zero-shot Text Classification capability. One line of command to model training and model deployment. diff --git a/applications/zero_shot_text_classification/label_studio_text.md b/applications/zero_shot_text_classification/label_studio_text.md index 6a441f199a4f..a128560e68de 100644 --- a/applications/zero_shot_text_classification/label_studio_text.md +++ b/applications/zero_shot_text_classification/label_studio_text.md @@ -1,3 +1,5 @@ +简体中文 | [English](label_studio_text_en.md) + # 文本分类任务Label Studio使用指南 **目录** diff --git a/applications/zero_shot_text_classification/label_studio_text_en.md b/applications/zero_shot_text_classification/label_studio_text_en.md index d353a78a2370..458dc4b48981 100644 --- a/applications/zero_shot_text_classification/label_studio_text_en.md +++ b/applications/zero_shot_text_classification/label_studio_text_en.md @@ -1,3 +1,5 @@ +[简体中文](label_studio_text.md) | English + # Label Studio User Guide - Text Classification **Table of contents** @@ -16,7 +18,7 @@ ## 1. 
Installation -**Environmental configuration used in the following annotation examples:** +**Dependencies used in the following annotation examples:** - Python 3.8+ - label-studio == 1.6.0 From 2f6f69f90a2069b4c9a09e17046262fa00d2f7dd Mon Sep 17 00:00:00 2001 From: Huijuan Wang Date: Tue, 17 Jan 2023 14:45:53 +0800 Subject: [PATCH 4/4] [utc] add xclue link --- applications/zero_shot_text_classification/README.md | 2 +- applications/zero_shot_text_classification/README_en.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/applications/zero_shot_text_classification/README.md b/applications/zero_shot_text_classification/README.md index 399f21fbd04a..674b37628cda 100644 --- a/applications/zero_shot_text_classification/README.md +++ b/applications/zero_shot_text_classification/README.md @@ -29,7 +29,7 @@ **零样本文本分类应用亮点:** - **覆盖场景全面🎓:** 覆盖文本分类各类主流任务,支持多任务训练,满足开发者多样文本分类落地需求。 -- **效果领先🏃:** 具有突出分类效果的UTC模型作为训练基座,提供良好的零样本和小样本学习能力。该模型在ZeroCLUE和FewCLUE均取得榜首(截止2023年1月11日)。 +- **效果领先🏃:** 具有突出分类效果的UTC模型作为训练基座,提供良好的零样本和小样本学习能力。该模型在[ZeroCLUE](https://www.cluebenchmarks.com/zeroclue.html)和[FewCLUE](https://www.cluebenchmarks.com/fewclue.html)均取得榜首(截止2023年1月11日)。 - **简单易用:** 通过Taskflow实现三行代码可实现无标注数据的情况下进行快速调用,一行命令即可开启文本分类,轻松完成部署上线,降低多任务文本分类落地门槛。 - **高效调优✊:** 开发者无需机器学习背景知识,即可轻松上手数据标注及模型训练流程。 diff --git a/applications/zero_shot_text_classification/README_en.md b/applications/zero_shot_text_classification/README_en.md index ad72e16f116c..6a7fb0f8a878 100644 --- a/applications/zero_shot_text_classification/README_en.md +++ b/applications/zero_shot_text_classification/README_en.md @@ -29,7 +29,7 @@ Text Classification refers to assigning a set of categories to given input text. - **Comprehensive Coverage**🎓: Covers various mainstream tasks of text classification, including but not limited to semantic analysis, semantic matching, intention recognition and event detection. -- **State-of-the-Art Performance**🏃: Strong performance from the UTC model. 
which ranks first on ZeroCLUE/FewCLUE as of 01/11/2023. +- **State-of-the-Art Performance**🏃: Strong performance from the UTC model, which ranks first on [ZeroCLUE](https://www.cluebenchmarks.com/zeroclue.html)/[FewCLUE](https://www.cluebenchmarks.com/fewclue.html) as of 01/11/2023. - **Easy to use**⚡: Three lines of code to use our Taskflow for out-of-box Zero-shot Text Classification capability. One line of command to model training and model deployment.