Why doesn't ner_token_labels include the augmented OtherType entities? #33

Closed
chenxshuo opened this issue May 22, 2022 · 6 comments
Labels
bug Something isn't working discussion Discussion on DocEE and SentEE documentation Improvements or additions to documentation

Comments

@chenxshuo

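**Problems**
Why doesn't the ner_token_labels fed into the model during NER training include the augmented entities mentioned in the paper, such as Money and Time?

I found that the code here runs an "in" membership check on each entity label, and the dict used for that check comes from DEEExample. That list, however, contains neither B-OtherType nor I-OtherType.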
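
To make the symptom concrete, here is a minimal sketch of how a membership check against a label vocabulary can silently drop entities. The function names, fields, and spans are hypothetical and only illustrate the pattern; this is not the actual DocEE code.

```python
def build_bio_labels(fields):
    """Build a BIO label list: "O" plus a B-/I- pair for every field."""
    labels = ["O"]
    for field in fields:
        labels.append(f"B-{field}")
        labels.append(f"I-{field}")
    return labels


def tag_tokens(num_tokens, spans, label2index):
    """Assign label ids to tokens; spans are (start, end, field), end exclusive."""
    token_labels = [label2index["O"]] * num_tokens
    for start, end, field in spans:
        begin_label = f"B-{field}"
        inside_label = f"I-{field}"
        if begin_label not in label2index:
            # The field is missing from the vocabulary, so the span is skipped
            # and its tokens keep the "O" label.
            continue
        token_labels[start] = label2index[begin_label]
        for i in range(start + 1, end):
            token_labels[i] = label2index[inside_label]
    return token_labels


# Hypothetical vocabulary without OtherType.
fields = ["StockCode", "CompanyName"]
label2index = {label: i for i, label in enumerate(build_bio_labels(fields))}

# One annotated entity and one complementary (OtherType) entity.
spans = [(0, 2, "CompanyName"), (3, 5, "OtherType")]
labels = tag_tokens(6, spans, label2index)
print(labels)
```

In this sketch the OtherType span never reaches the token labels, which is the behaviour I am asking about.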

@chenxshuo chenxshuo added the discussion Discussion on DocEE and SentEE label May 22, 2022
@Spico197
Owner

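Hi, thanks for your interest in this project.
The OtherType mentioned in the paper covers two kinds of entities:

  1. OtherType entities already present in the original ChFinAnn data (provided you use a template whose common_fields contains OtherType):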

common_fields = ["StockCode", "StockAbbr", "CompanyName", "OtherType"]

  2. Entities matched by regular expressions (set include_complementary_ents=True; these are hacked in during data processing):

DocEE/dee/helper/dee.py

Lines 48 to 67 in c6640b3

if inlcude_complementary_ents:
    # build index
    comp_ents_sent_index = defaultdict(list)
    comp_ents_start_index = defaultdict(list)
    comp_ents_end_index = defaultdict(list)
    for raw_field, ents in self.complementary_field2ents.items():
        # field = 'Other' + field.title()
        field = "OtherType"
        for ent, pos_span in ents:
            pos_span = list(pos_span)
            if ent not in detail_align_dict["ann_valid_mspans"]:
                comp_ents_sent_index[pos_span[0]].append(
                    [ent, raw_field, pos_span]
                )
                comp_ents_start_index[(pos_span[0], pos_span[1])].append(
                    [ent, raw_field, pos_span]
                )
                comp_ents_end_index[(pos_span[0], pos_span[2])].append(
                    [ent, raw_field, pos_span]
                )
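
For readers unfamiliar with the regex step, the sketch below shows roughly how complementary entities could be collected into the (entity, (sentence index, start, end)) shape that the snippet above consumes. The patterns, the function name, and the sample sentences are assumptions made for illustration; the patterns actually used in the repo may differ.

```python
import re
from collections import defaultdict

# Hypothetical patterns for two complementary entity types.
COMPLEMENTARY_PATTERNS = {
    "money": re.compile(r"\d+(?:\.\d+)?(?:万|亿)?元"),
    "date": re.compile(r"\d{4}年\d{1,2}月\d{1,2}日"),
}


def match_complementary_ents(sentences):
    """Return {raw_field: [(entity_text, (sent_idx, char_start, char_end)), ...]}."""
    field2ents = defaultdict(list)
    for sent_idx, sent in enumerate(sentences):
        for raw_field, pattern in COMPLEMENTARY_PATTERNS.items():
            for match in pattern.finditer(sent):
                # char_end is exclusive, following Python slicing conventions.
                span = (sent_idx, match.start(), match.end())
                field2ents[raw_field].append((match.group(), span))
    return field2ents


sentences = ["公司于2018年3月5日质押股份。", "涉及金额500万元。"]
found = match_complementary_ents(sentences)
for raw_field, ents in found.items():
    print(raw_field, ents)
```

Every match, whatever its raw field, would then be labelled OtherType, as in the loop above.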

@chenxshuo
Author

chenxshuo commented May 23, 2022

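Thanks for the reply! The data processing does add OtherType to DEEExample. However, I found that the NERFeature object is missing the entities that should have been added. As a result, when the NER part is trained with ner_token_labels, the labels contain no augmented entities, so the enhancement has no effect (this concerns DocEE-Fin). Personally, I wonder whether the following two lines should be added here: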

entity_label_list.append("B-OtherType")
entity_label_list.append("I-OtherType")

@Spico197 Spico197 added the bug Something isn't working label May 23, 2022
@Spico197
Owner

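Yes, you're right, this looks like a bug. However, I'd suggest simply adding or removing OtherType in the common_fields of the two templates below as needed. If it were hard-coded into the repo, it would be inconvenient for others running ablation studies later. I'll update the templates.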

common_fields = []
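
A minimal sketch of this suggestion, assuming the entity label list is derived from the template's common_fields plus the event fields. The function name, the flag, and the "Pledger" event field are illustrative assumptions, not the repo's actual API.

```python
# Common fields taken from the template shown earlier in this thread.
BASE_COMMON_FIELDS = ["StockCode", "StockAbbr", "CompanyName"]


def make_entity_label_list(event_fields, include_other_type):
    """Build BIO labels from template fields; OtherType is opt-in per template."""
    common_fields = list(BASE_COMMON_FIELDS)
    if include_other_type:
        common_fields.append("OtherType")
    entity_label_list = ["O"]
    for field in common_fields + list(event_fields):
        entity_label_list.append("B-" + field)
        entity_label_list.append("I-" + field)
    return entity_label_list


# Two configurations, e.g. for an ablation study.
with_other = make_entity_label_list(["Pledger"], include_other_type=True)
without_other = make_entity_label_list(["Pledger"], include_other_type=False)
print(len(with_other), len(without_other))
```

Keeping the switch in the template means the B-OtherType and I-OtherType labels appear only when the augmentation is actually enabled.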

@Spico197 Spico197 added the documentation Improvements or additions to documentation label May 23, 2022
@chenxshuo
Author

Indeed, hard-coding it would cause problems whenever the augmentation is not used. Thanks for the reply!

@Spico197
Owner

Thank you for your contribution to this project.

@Spico197
Owner

Fixed in the latest version.
