BIBM-2019-Distantly Supervised Biomedical Named Entity Recognition with Dictionary Expansion #282

Open
BrambleXu opened this issue Nov 8, 2019 · 1 comment
Assignees
Labels
NER(T) Named Entity Recognition Task

Comments

@BrambleXu
Owner

Summary:

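An application of #275 to the biomedical domain. The approach is built on AutoNER, but the focus is on how to automatically construct a high-quality dictionary.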

Resource:

  • pdf
  • [code](
  • [paper-with-code](

Paper information:

  • Author:
  • Dataset:
  • keywords:

Notes:

This paper seems to share my overall direction: applying AutoNER to a specific domain.

A recent work, AutoNER [24], uses a neural model that leverages distant supervision from entity dictionaries. However, these existing studies can only use limited information from the user-input dictionaries, especially when the dictionaries are incomplete in real-world applications.

Our AUTOBIONER framework does not need any human annotated data and relies on incomplete entity dictionaries.

AUTOBIONER first exploits statistical signals from massive corpora for candidate entity generation and user-input dictionaries for training example annotation. Since the dictionaries are assumed to be incomplete, AUTOBIONER performs a novel automatic entity set expansion for corpus-level new entity recognition and dictionary completion.

It treats matched entities as positive examples to infer the type of unmatched candidates using context information. The expanded dictionaries are then used as distant supervision to train a neural model for BioNER.

Based on the description above, their focus is on how to automatically build an expanded dictionary.
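To make the expansion idea concrete for myself, here is a minimal sketch of "treat matched entities as positive examples and infer the type of unmatched candidates from context". It is not the paper's algorithm: the bag-of-words context vectors, the cosine similarity, the `threshold` value, and all function names (`context_vector`, `expand_dictionary`) are my own assumptions for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def context_vector(phrase: str, sentences: List[List[str]], window: int = 2) -> Counter:
    """Count the words within `window` tokens of every mention of `phrase`."""
    target = phrase.lower().split()
    n = len(target)
    ctx: Counter = Counter()
    for tokens in sentences:
        lowered = [t.lower() for t in tokens]
        for i in range(len(lowered) - n + 1):
            if lowered[i:i + n] == target:
                ctx.update(lowered[max(0, i - window):i])   # left context
                ctx.update(lowered[i + n:i + n + window])   # right context
    return ctx


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand_dictionary(
    seed: Dict[str, str],
    candidates: List[str],
    sentences: List[List[str]],
    threshold: float = 0.3,  # assumed cut-off, not taken from the paper
) -> Dict[str, str]:
    """Assign each unmatched candidate the type whose seed contexts it resembles most."""
    # Dictionary-matched entities act as positive examples: one context centroid per type.
    centroids: Dict[str, Counter] = defaultdict(Counter)
    for phrase, etype in seed.items():
        centroids[etype].update(context_vector(phrase, sentences))

    expanded = dict(seed)
    for cand in candidates:
        if cand in seed:
            continue
        vec = context_vector(cand, sentences)
        best_type, best_score = None, 0.0
        for etype, centroid in centroids.items():
            score = cosine(vec, centroid)
            if score > best_score:
                best_type, best_score = etype, score
        if best_type is not None and best_score >= threshold:
            expanded[cand] = best_type
    return expanded


# Toy data, invented for this sketch.
sentences = [
    "patients were treated with aspirin daily".split(),
    "patients were treated with ibuprofen daily".split(),
    "mutations in the brca1 gene were detected".split(),
    "mutations in the tp53 gene were detected".split(),
]
seed = {"aspirin": "Chemical", "brca1": "Gene"}
expanded = expand_dictionary(seed, ["ibuprofen", "tp53"], sentences)
```

On this toy corpus, `ibuprofen` shares its context with `aspirin` and `tp53` with `brca1`, so each candidate picks up the corresponding seed type. The expanded dictionary would then serve as the distant supervision for the neural tagger.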

Model Graph:

[image: model graph]

My reading above was correct: this paper is mainly about constructing the dictionary.

A. Phrase Mining and Dictionary Matching

Phrase Mining. Uses AutoPhrase.

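Dictionary Tailoring. To avoid producing too many false positives when matching aliases, a dictionary tailoring step is added: the dictionary is filtered against the corpus. If an entity's canonical name never appears in the corpus, not even once, the entry is removed. (This is mainly aimed at improving precision. In a real-world setting, however, there is no such corpus; nobody knows where a company name is going to show up.)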
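A rough sketch of the tailoring rule as I understand it from the description above. The dictionary layout (canonical name mapped to a list of aliases), the case-insensitive whole-phrase matching, and the name `tailor_dictionary` are assumptions of mine, not details from the paper.

```python
import re
from typing import Dict, List


def tailor_dictionary(dictionary: Dict[str, List[str]], corpus: List[str]) -> Dict[str, List[str]]:
    """Keep only entries whose canonical name occurs at least once in the corpus."""
    text = "\n".join(corpus).lower()
    tailored = {}
    for canonical, aliases in dictionary.items():
        # Whole-phrase match, so that "asa" does not match inside "nasal".
        pattern = r"(?<!\w)" + re.escape(canonical.lower()) + r"(?!\w)"
        if re.search(pattern, text):
            tailored[canonical] = aliases
    return tailored


# Toy data, invented for this sketch.
dictionary = {
    "aspirin": ["ASA", "acetylsalicylic acid"],
    "imatinib": ["Gleevec"],
}
corpus = [
    "Patients received aspirin daily.",
    "ASA reduced the risk of stroke.",
    "Gleevec was well tolerated.",
]
tailored = tailor_dictionary(dictionary, corpus)
```

Here `imatinib` is dropped even though its alias `Gleevec` occurs in the corpus, because the rule only checks the canonical name. That trades recall for precision, which is the concern raised in the note above.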

B. Entity Expansion

[image: entity expansion]

This part is the core of the paper.

Result:

Thoughts:

Next Reading:

@BrambleXu BrambleXu self-assigned this Nov 8, 2019
@BrambleXu BrambleXu added DA(T) Domain Adaptation/Domain Specific Task NER(T) Named Entity Recognition Task and removed DA(T) Domain Adaptation/Domain Specific Task labels Nov 8, 2019
@JackySnake

Where can I read this paper? I have not been able to find it.
