Add model Prohetnet #1698

d294270681 · 2022-02-21T10:25:19Z

Description

Add new model Prophetnet
The model weight:
Link: https://pan.baidu.com/s/1FOnd01rNvDJoONYegacq1Q
Extraction code: o28q
The tokenizer vocab file:
Link: https://pan.baidu.com/s/1pUxLy6eGTZFqzf85OlIzUg
Extraction code: ltp6

smallv0221

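Please take another look at the dataset loading. Both the cnn_dailymail and gigaword datasets can be loaded by passing their names to load_dataset; the only difference is that the former is a PaddleNLP dataset and the latter is a HuggingFace dataset. Accessing and processing them should work in much the same way.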

smallv0221 · 2022-02-28T05:06:04Z

examples/text_summarization/prophetnet/README.md

+    --epochs=6 \
+    --lr=0.0001 \
+    --warmup_init_lr=1e-07 \
+    --warmup_updates=1000 \


It would be better to name this warmup_steps.
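For readers unfamiliar with these flags, here is a minimal sketch of how such warmup settings are commonly used. It assumes a fairseq-style inverse square root schedule, which the flag names suggest but this thread does not confirm; the function name `inverse_sqrt_lr` is hypothetical and the defaults simply mirror the README values above.

```python
def inverse_sqrt_lr(step, lr=1e-4, warmup_init_lr=1e-7, warmup_steps=1000):
    """Learning rate at a given optimizer step (assumed schedule, not the PR's code)."""
    if step < warmup_steps:
        # Linear ramp from warmup_init_lr up to the peak lr.
        return warmup_init_lr + step * (lr - warmup_init_lr) / warmup_steps
    # After warmup, decay proportionally to 1 / sqrt(step).
    return lr * (warmup_steps / step) ** 0.5


if __name__ == "__main__":
    for s in (0, 500, 1000, 4000):
        print(s, inverse_sqrt_lr(s))
```

Under this assumption the rate peaks at `lr` exactly when warmup ends, then halves each time the step count quadruples.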

smallv0221 · 2022-02-28T05:21:01Z

examples/text_summarization/prophetnet/generate.py

+    test_data_src = 'data/' + args.dataset + '_data/uncased_tok_data/test.src'
+    test_data_tgt = 'data/' + args.dataset + '_data/uncased_tok_data/test.tgt'
+
+    test_dataset = load_dataset(


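Could the cnn_dailymail dataset built into PaddleNLP be used directly here?

The original code uses the cnn_dailymail data from the GLGE baseline, which differs slightly from PaddleNLP's cnn_dailymail: the GLGE text contains an extra [S_SEP] tag, and I am not sure whether that would affect the results.

Are these two datasets from the GLGE baseline the same as the corresponding two datasets on Hugging Face?

Both cnndm and gigaword differ in some respects.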
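To make the [S_SEP] difference concrete, here is a minimal sketch of stripping that tag from whitespace-tokenized text. The helper name `strip_sentence_separators` is hypothetical and not part of this PR, and removing the tag does not by itself make the GLGE and PaddleNLP data equivalent, since the thread notes other differences.

```python
SENTENCE_SEP = "[S_SEP]"  # sentence separator tag mentioned in the thread


def strip_sentence_separators(text, sep=SENTENCE_SEP):
    """Drop standalone separator tokens from whitespace-tokenized text."""
    return " ".join(token for token in text.split() if token != sep)


if __name__ == "__main__":
    print(strip_sentence_separators("first sentence . [S_SEP] second sentence ."))
```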

smallv0221 · 2022-02-28T05:24:12Z

paddlenlp/transformers/prophetnet/tokenizer.py

+from .. import PretrainedTokenizer, BasicTokenizer, WordpieceTokenizer
+
+
+class Trie:


This Trie already exists in the base class, so it should not need to be redefined here.

smallv0221 · 2022-02-28T05:26:41Z

examples/text_summarization/prophetnet/train_prophetnet.py

+dev_data_src = 'data/' + args.dataset + '_data/uncased_tok_data/dev.src'
+dev_data_tgt = 'data/' + args.dataset + '_data/uncased_tok_data/dev.tgt'
+
+train_dataset = load_dataset(


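It should be possible to read the built-in cnn_dailymail dataset directly here. The gigaword dataset is also available on HuggingFace, and PaddleNLP's load_dataset can read HF datasets as well.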

smallv0221 · 2022-02-28T05:49:18Z

If both datasets can be loaded directly by passing the dataset name, some of the data-processing code could be dropped.
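As a rough illustration of the kind of file-based processing code this refers to, the sketch below reads parallel `.src`/`.tgt` files line by line. The function name `read_parallel` and the field names `article` and `highlights` are assumptions made for this example, not the PR's actual reader.

```python
import os
import tempfile


def read_parallel(src_path, tgt_path):
    """Yield one example per aligned line pair from a source and a target file."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for article, summary in zip(src, tgt):
            yield {"article": article.strip(), "highlights": summary.strip()}


# Small self-contained demo using temporary files instead of real data.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "dev.src")
    tgt_file = os.path.join(tmp, "dev.tgt")
    with open(src_file, "w", encoding="utf-8") as f:
        f.write("long article one\nlong article two\n")
    with open(tgt_file, "w", encoding="utf-8") as f:
        f.write("summary one\nsummary two\n")
    rows = list(read_parallel(src_file, tgt_file))
```

Loading by dataset name would make a reader like this, and the path handling around it, unnecessary.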

smallv0221

Please remove the __init__.py under example.

d294270681 · 2022-03-04T03:23:20Z

Please remove the __init__.py under example.

Fixed.

smallv0221

LGTM

d294270681 added 5 commits February 21, 2022 18:23

add Prohetnet model

a5cf998

update prohetnet

6f984b9

update format

555ad31

pre commit

74e7318

add prophetnet example

fb76a3b

d294270681 changed the title ~~add Prohetnet model~~ Add model Prohetnet Feb 22, 2022

yingyibiao added the contributions label Feb 25, 2022

smallv0221 requested changes Feb 28, 2022


update tokenizer.py,run_train.sh,train_prophetnet.py

80e2dca

smallv0221 reviewed Mar 3, 2022


remove evaluate/gigaword/__init__.py

7518275

smallv0221 approved these changes Mar 4, 2022


Merge branch 'develop' into develop

76385a7

smallv0221 merged commit 4871622 into PaddlePaddle:develop Mar 7, 2022

yingyibiao mentioned this pull request Mar 17, 2022

PaddleNLP 2.2.5 Release Note Candidate #1772

Closed

guoshengCS mentioned this pull request Apr 29, 2022

PaddleNLP v2.3rc Release Note Candidate #2031

Closed
