Training on a new dataset #46

xxllp · 2022-08-05T02:00:06Z

Agreement

Fill the space in brackets with x to check the agreement items.
Before submitting this issue, I've fully checked the instructions in README.md.
Before submitting this issue, I'd searched in the issue area and didn't find a solved issue that covers my problem.
This issue is about the toolkit itself, not Python, pip or other programming basics.
I understand that if I do not check all the agreement items above, my issue MAY BE CLOSED OR REMOVED WITHOUT FURTHER EXPLANATIONS.

Problem

How should I get started with preprocessing the training data for my own new dataset? Is there a concrete step-by-step guide?

Environment

Environment	Values
System	Windows/Linux
GPU Device
CUDA Version
Python Version
PyTorch Version
dee (the Toolkit) Version


Spico197 · 2022-08-05T04:20:41Z

You can step through the code in a debugger on one of the existing datasets to see what each module does. The discussion in issue #41 may also help.

xxllp · 2022-08-05T06:17:16Z

One more question: in this code, can a role within a single event take multiple mentions? Some roles in my data are filled by several consecutive entities.

Spico197 · 2022-08-05T08:20:46Z

Please see the discussion in this issue: #38 (comment)

xxllp · 2022-08-05T09:14:28Z

Thanks a lot, my data runs now. Is prediction directly on a single file already supported?

Spico197 · 2022-08-05T11:07:53Z

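inference.py contains an example of predicting a single instance. To predict a whole file, I'd suggest writing a batched prediction loop yourself, which should be somewhat faster.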
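The batching part of that suggestion can be sketched roughly as follows. This is only a minimal sketch, not code from the toolkit: `predict_batch` is a hypothetical callable standing in for whatever batched forward pass you write around the model, and `iter_batches` and `predict_documents` are illustrative names.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_documents(
    docs: Sequence[T],
    predict_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int = 16,
) -> List[R]:
    """Run a batch-level predictor over all documents, preserving order.

    predict_batch is hypothetical (an assumption of this sketch): it takes
    a batch of documents and returns one result per document.
    """
    results: List[R] = []
    for batch in iter_batches(docs, batch_size):
        outputs = predict_batch(batch)
        if len(outputs) != len(batch):
            raise RuntimeError("predictor must return one result per document")
        results.extend(outputs)
    return results
```

Keeping the batching separate from the model call makes it easy to tune `batch_size` to the available GPU memory without touching the prediction code.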

xxllp · 2022-08-06T02:16:24Z

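During model evaluation, results are reported for both gold_span and predict_span. The former uses the NER ground truth (GT), right?
On my data, the role F1 in the predict results is more than 10 percentage points below the gold_span one.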

xxllp · 2022-08-06T03:15:19Z

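Also, all my metrics on the test set are currently 0, while the ones on dev are normal. What could cause this? The dev and test files are in exactly the same format at the moment.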

Spico197 · 2022-08-06T04:42:13Z

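> During model evaluation, results are reported for both gold_span and predict_span. The former uses the NER ground truth (GT), right? On my data, the role F1 in the predict results is more than 10 percentage points below the gold_span one.

What do you mean by "the GT of NER"? I don't quite follow. And what do you mean by role F1?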

Spico197 · 2022-08-06T04:42:43Z

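> Also, all my metrics on the test set are currently 0, while the ones on dev are normal. What could cause this? The dev and test files are in exactly the same format at the moment.

I'm not sure; this needs further checking.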
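One way to start that checking is to compare the structure of the two files mechanically rather than by eye. The sketch below assumes only that both files are JSON; it makes no assumption about the toolkit's data schema, and the helper names are illustrative. It collects every "path:type" pair that occurs in each file and reports the ones found on one side only.

```python
import json
from typing import Any, List, Set, Tuple


def load_json(path: str) -> Any:
    """Load a JSON file as plain Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def shape_paths(value: Any, prefix: str = "$") -> Set[str]:
    """Collect 'path:type' strings describing the structure of a JSON value."""
    paths = {f"{prefix}:{type(value).__name__}"}
    if isinstance(value, dict):
        for key, child in value.items():
            paths |= shape_paths(child, f"{prefix}.{key}")
    elif isinstance(value, list):
        # All list elements share one path, so differing lengths do not matter.
        for child in value:
            paths |= shape_paths(child, f"{prefix}[]")
    return paths


def shape_diff(dev: Any, test: Any) -> Tuple[List[str], List[str]]:
    """Return (paths only in dev, paths only in test), each sorted."""
    dev_paths, test_paths = shape_paths(dev), shape_paths(test)
    return sorted(dev_paths - test_paths), sorted(test_paths - dev_paths)
```

Calling `shape_diff(load_json(dev_path), load_json(test_path))` with your own file paths gives two lists. A path that appears only on the dev side, for example a list of gold events that is always empty in the test file, is worth inspecting first. If dictionary keys in your data are document-specific (such as IDs), expect some harmless noise in the output.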

xxllp · 2022-08-06T06:40:13Z

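> > During model evaluation, results are reported for both gold_span and predict_span. The former uses the NER ground truth (GT), right? On my data, the role F1 in the predict results is more than 10 percentage points below the gold_span one.

> What do you mean by "the GT of NER"? I don't quite follow. And what do you mean by role F1?

The output directory contains files named like dee_eval.dev.gold_span.TriggerAwarePrunedCompleteGraph.json. I take gold_span to mean that gold NER is used, right? By role F1 I mean the MacroF1 under overall-overall in that JSON, i.e. the F1 over all roles.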

xxllp · 2022-08-08T02:35:50Z

In the JSON returned by predict_one, how are comments and event_list related? Why does event_list contain fewer arguments than comments?

Spico197 · 2022-08-08T02:37:50Z

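> > > During model evaluation, results are reported for both gold_span and predict_span. The former uses the NER ground truth (GT), right? On my data, the role F1 in the predict results is more than 10 percentage points below the gold_span one.

> > What do you mean by "the GT of NER"? I don't quite follow. And what do you mean by role F1?

> The output directory contains files named like dee_eval.dev.gold_span.TriggerAwarePrunedCompleteGraph.json. I take gold_span to mean that gold NER is used, right? By role F1 I mean the MacroF1 under overall-overall in that JSON, i.e. the F1 over all roles.

Yes, that's right: gold_span means that gold entities are used when predicting. What you call role F1 is what we call the overall F1, because the event type has to match first. NER matters a lot in document-level event extraction, so the overall F1 with gold NER is much higher.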
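For readers unsure how a macro-averaged F1 over roles differs from a micro-averaged one, here is a small sketch using the standard textbook definitions. It is not the toolkit's evaluation code, which may match arguments at the event-record level and differ in detail; the role names and counts are made up purely for illustration.

```python
from typing import Dict, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float]:
    """Return (macro F1, micro F1) from per-role (tp, fp, fn) counts."""
    if not counts:
        return 0.0, 0.0
    # Macro: average the per-role F1 scores, so every role weighs the same.
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts first, so frequent roles dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, f1_from_counts(tp, fp, fn)


# Hypothetical counts: one frequent, well-predicted role and one rare, weak role.
example = {"role_a": (8, 2, 2), "role_b": (1, 1, 3)}
macro, micro = macro_micro_f1(example)
```

With these made-up counts the macro F1 is about 0.567 and the micro F1 about 0.692. A rare, poorly predicted role pulls the macro average down more than the micro one, which is worth keeping in mind when comparing span-level and event-level numbers.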

Spico197 · 2022-08-08T02:38:23Z

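> In the JSON returned by predict_one, how are comments and event_list related? Why does event_list contain fewer arguments than comments?

Because not every entity is an argument that participates in an event.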

xxllp · 2022-08-08T02:53:45Z

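But from what I can see, most of the gap comes from arguments that are missing; only a few are entities that simply don't participate. Strange.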

Spico197 · 2022-08-08T02:59:06Z

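> But from what I can see, most of the gap comes from arguments that are missing; only a few are entities that simply don't participate. Strange.

That is indeed strange; it could be a latent bug. Did you train on your own dataset? Do you see the same problem with the models released in the repo? I'll see whether I can reproduce it.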

xxllp · 2022-08-08T03:26:53Z

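It's my own dataset; I haven't looked closely at the released models. The NER span F1 already reaches 0.90+, yet the final event role F1 is only 0.82, which is clearly a good deal lower. Possibly, after the connection graph step, the links between some entities are all 0, and that is why arguments go missing.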

xxllp · 2022-08-15T06:02:26Z

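A new question: does anything need to be changed for multi-GPU training of the PTPCG model? When I launch it directly with scripts/train_multi.sh, only one GPU is actually running.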

Spico197 · 2022-08-15T06:05:57Z

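> A new question: does anything need to be changed for multi-GPU training of the PTPCG model? When I launch it directly with scripts/train_multi.sh, only one GPU is actually running.

You can follow how the Doc2EDAG script is launched and add the --parallel_decorate flag. Since the current discussion is unrelated to this issue, I'm closing it for now; feel free to open new issues for other questions.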

xxllp added the question Further information is requested label Aug 5, 2022

Spico197 closed this as completed Aug 15, 2022
