Which code in this repository is used by the PTPCG algorithm? #45

xxllp · 2022-08-02T02:24:15Z

Agreement

Fill the space in brackets with x to check the agreement items.
Before submitting this issue, I've fully checked the instructions in README.md.
Before submitting this issue, I'd searched in the issue area and didn't find a solved issue that covers my problem.
This issue is about the toolkit itself, not Python, pip or other programming basics.
I understand that if I do not check all the agreement items above, my issue MAY BE CLOSED OR REMOVED WITHOUT FURTHER EXPLANATIONS.

Problem

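I would like to read only the parts that are relevant to this algorithm and skip everything else. Is there a historical branch or a separate project that contains just that code?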

Environment

Environment	Values
System	Windows/Linux
GPU Device
CUDA Version
Python Version
PyTorch Version
dee (the Toolkit) Version

Full Log

Log:


Spico197 · 2022-08-02T07:39:19Z

No, there isn't. You could step through the code in a debugger, starting from the entry point.

xxllp · 2022-08-02T07:44:36Z

Also, I feel that this toolkit requires too much configuration when switching to a new dataset.

Spico197 · 2022-08-02T07:46:18Z

> Also, I feel that this toolkit requires too much configuration when switching to a new dataset.

Could you elaborate? PRs are welcome.

xxllp · 2022-08-02T07:49:15Z

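Could the generation of the event type classes be made more automatic?
Otherwise every new dataset needs a new class, and with many event types that becomes very tedious to write. This is clearly an extensibility problem.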

Spico197 · 2022-08-02T07:53:37Z

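Yes, this part mainly follows the style of Doc2EDAG, and extensibility is indeed a problem. However, when pseudo-triggers are selected in Data/trigger.py, the corresponding templates are generated automatically. Combined with a few automation scripts, this means that as long as the dataset is ready, you generally do not have to write templates by hand.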
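To illustrate the general idea of deriving templates from data instead of writing them by hand, here is a minimal sketch. It is not the logic in Data/trigger.py: the record format `(event_type, {role: argument})`, the function name `collect_event_templates`, and the example event types are all assumptions made for illustration.

```python
def collect_event_templates(records):
    """Collect the role list of each event type from annotated records.

    `records` is a hypothetical iterable of (event_type, {role: argument}) pairs.
    Roles are kept in the order in which they are first seen.
    """
    templates = {}
    for event_type, role2arg in records:
        roles = templates.setdefault(event_type, [])
        for role in role2arg:
            if role not in roles:
                roles.append(role)
    return templates


# Toy records standing in for a real dataset (hypothetical values).
records = [
    ("EquityPledge", {"Pledger": "A Corp", "PledgedShares": "1,000 shares"}),
    ("EquityPledge", {"Pledger": "B Corp", "Pledgee": "C Bank"}),
    ("EquityFreeze", {"EquityHolder": "D Ltd", "FrozeShares": "500 shares"}),
]
templates = collect_event_templates(records)
for event_type, roles in templates.items():
    print(event_type, roles)
```

A dictionary like `templates` could then feed a script that emits the per-event-type definitions, so that adding a dataset does not require a hand-written class per event type.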

xxllp · 2022-08-02T08:00:02Z

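I still have some questions about the PTPCG paper:
1. The pseudo-trigger combination is computed in advance. Where is it used afterwards? So far I have only seen that the number of triggers is limited.
2. In Pruned Complete Graph Construction, the step that obtains the combinations has to decide which nodes are pseudo-triggers, and this appears to be done by a graph algorithm. But the role of each entity is unknown at that point, so how can we guarantee that the nodes picked out of the graph really are pseudo-triggers?

If the pseudo-triggers in 1 and 2 mean the same thing, I don't quite understand how they correspond to each other. Or can this only be determined at the very end?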

Spico197 · 2022-08-02T08:06:11Z

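OK, sure.

The graph construction step does rely on the properties of pseudo-triggers. As described in the paper, the structure of the graph changes with the number of pseudo-triggers. For this part, see the following code: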

DocEE/dee/helper/arg_rel.py

Lines 280 to 282 in 692a1a2

    def build_directed_graph(
        self, event_args_objs, event_idx, event_type_fields_list, at_least_one=False
    ):

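In the pruned complete graph, pseudo-triggers satisfy certain structural properties (they are fully connected to each other and share common neighbor nodes), so they can be extracted directly from the properties of the graph. Roles are determined later, after classification. There is in fact no hard constraint tying roles to pseudo-triggers here, which is something that could be improved in future work.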
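The structural properties mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in dee/helper/arg_rel.py: it assumes a 0/1 directed adjacency matrix in which pseudo-triggers of one combination are linked in both directions and point to their ordinary arguments, and it uses a small Bron-Kerbosch routine to find the mutually connected groups. The names `bron_kerbosch` and `decode_combinations` are hypothetical.

```python
def bron_kerbosch(r, p, x, nbrs, out):
    """Collect the maximal cliques of an undirected graph given as neighbor sets."""
    if not p and not x:
        out.append(sorted(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & nbrs[v], x & nbrs[v], nbrs, out)
        p = p - {v}
        x = x | {v}


def decode_combinations(adj):
    """Return (pseudo_triggers, shared_arguments) pairs read off the graph."""
    n = len(adj)
    # Candidate pseudo-triggers: nodes with at least one outgoing edge.
    sources = {i for i in range(n) if any(adj[i][j] for j in range(n) if j != i)}
    if not sources:
        return []
    # Mutual links are assumed to connect pseudo-triggers of the same combination.
    mutual = {
        i: {j for j in sources if j != i and adj[i][j] and adj[j][i]}
        for i in sources
    }
    cliques = []
    bron_kerbosch(set(), set(sources), set(), mutual, cliques)
    combos = []
    for clique in cliques:
        # Ordinary arguments are the neighbors shared by every pseudo-trigger.
        shared = None
        for t in clique:
            outs = {j for j in range(n) if adj[t][j] and j not in clique}
            shared = outs if shared is None else shared & outs
        combos.append((clique, sorted(shared)))
    return sorted(combos)


# Toy graph: entities 0 and 1 are pseudo-triggers sharing arguments 2 and 3;
# entity 4 is a lone pseudo-trigger with argument 3.
adj = [[0] * 5 for _ in range(5)]
for src, dst in [(0, 1), (1, 0), (0, 2), (0, 3), (1, 2), (1, 3), (4, 3)]:
    adj[src][dst] = 1
combos = decode_combinations(adj)
print(combos)
```

Note that nothing in this sketch looks at roles: the groups come purely from connectivity, which is why the role of each pseudo-trigger still has to be decided by the later classification step.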

xxllp · 2022-08-02T08:26:04Z

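OK, thanks!
One more question: in the initial NER model, the entity types are just coarse-grained types rather than argument roles, right?
But with coarse types the annotation may be incomplete. Wouldn't that also introduce some noise?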

Spico197 · 2022-08-02T08:30:09Z

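Yes, they are entity types, such as person names, organizations and times. If you switch to a dataset that does not provide such entity types, you can use the argument roles as entity types instead.
Coarse-grained annotation should be fine. If argument roles are used as the types, the annotation may be incomplete, which does introduce some noise. You can therefore complete the entity annotations with regular expressions or other methods, which brings some improvement on PTPCG.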
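One simple way to do such a completion is to search the document for every surface string that is already annotated somewhere and label the unannotated occurrences too. The sketch below assumes a hypothetical mapping from surface string to type and a hypothetical function name `complete_mentions`; it is not the script used for the reported PTPCG results.

```python
import re


def complete_mentions(text, known_entities):
    """Find every occurrence of the known entity strings in `text`.

    `known_entities` maps a surface string to its type (hypothetical format).
    Returns sorted (start, end, surface, type) tuples without overlaps.
    """
    spans = []
    # Match longer strings first so a short entity cannot split a longer one.
    for surface in sorted(known_entities, key=len, reverse=True):
        for match in re.finditer(re.escape(surface), text):
            overlaps = any(s < match.end() and match.start() < e for s, e, _, _ in spans)
            if not overlaps:
                spans.append((match.start(), match.end(), surface, known_entities[surface]))
    return sorted(spans)


text = "A Corp pledged shares to C Bank. Later, A Corp released them."
known = {"A Corp": "Pledger", "C Bank": "Pledgee"}
spans = complete_mentions(text, known)
for start, end, surface, etype in spans:
    print(start, end, surface, etype)
```

Exact string matching is the most conservative choice. Patterns for dates, amounts or share counts could be added in the same way, at the cost of more false positives.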

xxllp · 2022-08-02T08:48:51Z

Got it, the overall approach makes sense. It just feels that this part is somewhat redundant with the later role classification.

Spico197 · 2022-08-02T08:50:54Z

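It is fine when the number of pseudo-triggers is 1. When there is more than one, role classification is still required, although a constraint is indeed missing here.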

Spico197 · 2022-08-02T08:51:29Z

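> It is fine when the number of pseudo-triggers is 1. When there is more than one, role classification is still required, although a constraint is indeed missing here.

To clarify, role classification here means role classification for the pseudo-triggers. Role classification for ordinary entities is always required.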

xxllp · 2022-08-02T09:56:29Z

Can this kind of DEE model also be expected to perform well on texts that are not very long?

Spico197 · 2022-08-02T11:55:04Z

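> Can this kind of DEE model also be expected to perform well on texts that are not very long?

We have no experimental results for that. Short texts usually come with triggers, so a trigger-free model like this one is not needed. That said, PTPCG's pseudo-triggers can serve as a complement, and they do bring improvements on the document-level event extraction task.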

xxllp · 2022-08-03T01:24:22Z

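The results of the compared models in the paper are all reproduced, but quite a few of them are lower than the numbers reported in the original papers.
Did you average over multiple random runs, or do anything similar, to make the results reliable?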

Spico197 · 2022-08-03T04:14:09Z

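> The results of the compared models in the paper are all reproduced, but quite a few of them are lower than the numbers reported in the original papers. Did you average over multiple random runs, or do anything similar, to make the results reliable?

It's not too bad: some are higher and some are lower. The earlier papers did not report explicit results for the analyses we wanted to do, so we had to run their code. We did not have the resources to run the baselines with different random seeds (a single model takes a week on 4 GPUs), so we only reported the results obtained with the fixed seed in the official code.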

xxllp · 2022-08-03T06:05:40Z

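There is one more thing I don't quite understand: training on the entity similarity matrix yields the adjacency matrix between entities.
Where does the ground truth come from? Is it derived from the importance of the roles, or from something else?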

Spico197 · 2022-08-03T06:08:45Z

Do you mean the gold adjacency matrix? It is simply the gold pruned complete graph.

xxllp · 2022-08-03T06:15:09Z

Yes. And where does that gold pruned complete graph come from?

Spico197 · 2022-08-03T06:17:17Z

I see. It is built with the method described in the paper. See the Pruned complete graph building subsection in the arXiv version of the paper.
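As a rough picture of how a gold adjacency matrix can be derived from gold event records, here is a minimal sketch. It assumes that the arguments filling a chosen set of roles act as pseudo-triggers, that pseudo-triggers of one record are linked in both directions, and that each pseudo-trigger points to the ordinary arguments of its record. The record format, the role names and the function name `build_gold_graph` are hypothetical, and details such as self-loops are omitted, so the paper remains the authoritative description.

```python
def build_gold_graph(num_entities, records, trigger_roles):
    """Build a 0/1 directed adjacency matrix from gold event records.

    `records` is a hypothetical list of {role: entity_index or None} dicts.
    `trigger_roles` is the set of roles whose arguments act as pseudo-triggers.
    """
    adj = [[0] * num_entities for _ in range(num_entities)]
    for record in records:
        filled = {role: ent for role, ent in record.items() if ent is not None}
        triggers = {ent for role, ent in filled.items() if role in trigger_roles}
        others = {ent for role, ent in filled.items() if role not in trigger_roles}
        others -= triggers
        for t in triggers:
            for u in triggers:
                if u != t:
                    adj[t][u] = 1  # pseudo-triggers: links in both directions
            for o in others:
                adj[t][o] = 1  # pseudo-trigger -> ordinary argument
    return adj


# Two toy records over five entities (indices 0..4); entity 3 is shared.
records = [
    {"Pledger": 0, "Pledgee": 1, "Shares": 2, "Date": 3},
    {"Pledger": 4, "Pledgee": None, "Shares": None, "Date": 3},
]
gold = build_gold_graph(5, records, trigger_roles={"Pledger", "Pledgee"})
for row in gold:
    print(row)
```

A matrix built this way can serve as the training target for the predicted entity-to-entity links, which is the sense in which the gold adjacency matrix and the gold pruned complete graph are the same object.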

xxllp · 2022-08-03T06:29:02Z

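OK, I had a look and I think I roughly understand.
But what if an entity appears at different positions in several paragraphs?
Which position's entity representation is used when computing the similarity? Or is some kind of average pooling applied?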

Spico197 · 2022-08-03T06:31:33Z

Hi, this is described in the Entity Representation subsection of the paper.

xxllp · 2022-08-03T06:57:45Z

Looking at the comparison experiments in the paper, this method performs much worse on multi-event data.

Spico197 · 2022-08-03T06:59:35Z

Yes, it is somewhat worse. We also give some quantitative error analysis in the future discussion section. This is an area that needs further research.

xxllp · 2022-08-03T08:05:49Z

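Another question: in Event Records Generation, event types and combinations appear to be paired directly with a Cartesian product.
So the combinations are classified separately for each event type, is that right?
Could this be what causes the mediocre performance on multi-event data?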

Spico197 · 2022-08-03T16:06:49Z

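> Another question: in Event Records Generation, event types and combinations appear to be paired directly with a Cartesian product. So the combinations are classified separately for each event type, is that right? Could this be what causes the mediocre performance on multi-event data?

Yes.
The combinations are probably the bigger problem. The Cartesian product part does not seem to be much of an issue.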
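The pairing discussed here can be pictured with a short sketch: every detected event type is paired with every extracted combination, and each pair is then classified on its own. The function name `pair_types_with_combinations` and the example values are assumptions for illustration, not names from the toolkit.

```python
from itertools import product


def pair_types_with_combinations(event_types, combinations):
    """Pair each detected event type with each argument combination."""
    return [(etype, tuple(combo)) for etype, combo in product(event_types, combinations)]


detected_types = ["EquityPledge", "EquityFreeze"]  # hypothetical detected types
combinations = [[0, 1, 2, 3], [3, 4]]  # entity indices per combination
pairs = pair_types_with_combinations(detected_types, combinations)
for etype, combo in pairs:
    print(etype, combo)
```

The number of pairs grows only as the product of the two list lengths, which is consistent with the remark that this step matters less than the quality of the combinations themselves.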

xxllp · 2022-08-04T06:12:33Z

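OK. Do you mean that the precision of the similarity graph prediction is the problem, or the decoding of the graph after prediction? I would like to work on improving the performance later.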

Spico197 · 2022-08-04T06:16:17Z

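The similarity prediction is worth further exploration. There is more detail in the future discussion section of the paper. You are welcome to cite our work!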

Spico197 · 2022-08-04T10:46:29Z

If you have other questions, feel free to open a new issue. As for the code structure issue mentioned earlier, PRs are welcome!

xxllp added the question Further information is requested label Aug 2, 2022

Spico197 closed this as completed Aug 2, 2022

Spico197 reopened this Aug 2, 2022

Spico197 added the discussion Discussion on DocEE and SentEE label Aug 2, 2022

Spico197 closed this as completed Aug 4, 2022
