
Add documentation mapping IOB tags with the DEFT paper's Tables 2 and 3 #12

Closed
Franck-Dernoncourt opened this issue Sep 30, 2019 · 7 comments

Comments

@Franck-Dernoncourt
Contributor

Franck-Dernoncourt commented Sep 30, 2019

It would be useful to have some documentation mapping the IOB tags to the DEFT paper's Tables 2 and 3. For example:

DNA /Users/sspala/dev/definition_extraction/textbook_sentences/adjudication_files_082219_FINAL/ksun/biology/t1_biology_jlee_0.txt 17742 17745 B-Definiti-frag T123-frag T123 fragment
This token has the IOB tag B-Definiti-frag, which might not be obvious to link to the DEFT paper's Table 2.
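The structure of such a tag can be made explicit with a minimal sketch that splits it into its B/I prefix, base label, and a fragment flag. The function name `parse_tag` is hypothetical. Reading `-frag` as a fragment marker and `Definiti` as a truncated `Definition` are assumptions, since the paper documents neither.

```python
from typing import NamedTuple, Optional


class ParsedTag(NamedTuple):
    prefix: Optional[str]  # "B", "I", or None for "O" and bare labels
    label: str             # base label, e.g. "Definiti" or "Term"
    is_fragment: bool      # True when the label carries a "-frag" suffix (assumed meaning)


def parse_tag(tag: str) -> ParsedTag:
    """Split an IOB tag such as 'B-Definiti-frag' into its parts (hypothetical helper)."""
    if tag == "O":
        return ParsedTag(None, "O", False)
    prefix = None
    if tag[:2] in ("B-", "I-"):
        prefix, tag = tag[0], tag[2:]
    is_fragment = tag.endswith("-frag")
    if is_fragment:
        tag = tag[: -len("-frag")]
    return ParsedTag(prefix, tag, is_fragment)


print(parse_tag("B-Definiti-frag"))
```

For `B-Definiti-frag` this yields prefix `B`, label `Definiti`, and a fragment flag of `True`. That makes it easier to see which part of the tag would need to be matched against the paper's tables.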

@Franck-Dernoncourt Franck-Dernoncourt changed the title Add documentation mapping IOB tags with the paper Add documentation mapping IOB tags with the DEFT paper's Tables 2 and 3 Sep 30, 2019
@sashaspala
Collaborator

You're right, we don't really discuss fragments in the paper. They're not very common in the textbook data, but they do come up occasionally when the definition or term phrase is non-contiguous (often when the term is embedded in the definition phrase). I'll add this to the to-do list.
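A non-contiguous definition or term only becomes a single unit once its pieces are reassembled. In the example row in the first comment, the token's annotation ID is `T123-frag` and the next column is `T123`. Based on that, the sketch below assumes a fragment's ID is its parent's ID plus a `-frag` suffix. It folds the pieces back into one annotation that has several character spans. The simplified row layout, the function name `merge_fragments`, and the rule that tokens at most one character apart belong to the same span are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical simplified row: (token, start_offset, end_offset, annotation_id)
Row = Tuple[str, int, int, str]


def merge_fragments(rows: List[Row]) -> Dict[str, List[Tuple[int, int]]]:
    """Group tokens into annotations, folding 'Tnnn-frag' pieces into 'Tnnn'.

    Returns, for each annotation ID, its contiguous character spans in order.
    An annotation with more than one span is non-contiguous.
    """
    offsets: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for _token, start, end, ann_id in rows:
        # Assumption: a fragment ID is the parent ID with a "-frag" suffix.
        base_id = ann_id[: -len("-frag")] if ann_id.endswith("-frag") else ann_id
        offsets[base_id].append((start, end))

    spans: Dict[str, List[Tuple[int, int]]] = {}
    for ann_id, pieces in offsets.items():
        pieces.sort()
        merged = [list(pieces[0])]
        for start, end in pieces[1:]:
            # Assumed heuristic: tokens separated by at most one character
            # (a single space) are part of the same contiguous span.
            if start - merged[-1][1] <= 1:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        spans[ann_id] = [(s, e) for s, e in merged]
    return spans


rows = [("the", 0, 3, "T1"), ("cell", 4, 8, "T1"), ("wall", 20, 24, "T1-frag")]
print(merge_fragments(rows))
```

Here the two adjacent tokens merge into one span and the fragment stays a separate span of the same annotation, which is exactly the non-contiguous case described above.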

@Franck-Dernoncourt
Contributor Author

Franck-Dernoncourt commented Sep 30, 2019

Thanks, FYI this is the tag list I had generated two months ago, but I am not sure if it is still up-to-date:

[image: tag list]

@mukesh-mehta

There are a few other tags in the data that are not BIO tags (they have no B- or I- prefix):

“	data/source_txt/train/t1_biology_1_404.txt	 22015	 22016	 I-Qualifier	 T221	 T220	 Supplements
seahorse”)—a	data/source_txt/train/t1_biology_1_404.txt	 22016	 22028	 Qualifier	 T221	 T220	 Supplements
transfer	data/source_txt/train/t1_biology_1_404.txt	 26543	 26551	 B-Qualifier	 T253	 T251	 Supplements
steady	data/source_txt/train/t1_biology_1_0.txt	 2683	 2689	 I-Alias-Term	 T211	 T210	 AKA
state”)—the	data/source_txt/train/t1_biology_1_0.txt	 2690	 2701	 Alias-Term	 T211	 T210	 AKA
DNA	data/source_txt/train/t1_biology_1_0.txt	 3323	 3326	 B-Alias-Term	 T221	 T220	 AKA
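In the rows above, the bare labels `Qualifier` and `Alias-Term` each follow an `I-` tag of the same type. The sketch below shows one plausible way to repair them when loading the data. The rule is an assumption for illustration, not the maintainers' fix: a bare label becomes `I-` if it continues the previous entity type, and `B-` otherwise. The function name `repair_bio` is also hypothetical.

```python
from typing import List, Optional


def repair_bio(tags: List[str]) -> List[str]:
    """Add a B-/I- prefix to bare labels such as 'Qualifier' (hypothetical helper).

    Assumed heuristic: a bare label continues the previous entity when that
    entity has the same type; otherwise it starts a new entity.
    """
    repaired: List[str] = []
    prev_label: Optional[str] = None
    for tag in tags:
        if tag == "O":
            repaired.append(tag)
            prev_label = None
            continue
        if tag.startswith(("B-", "I-")):
            label = tag[2:]
            repaired.append(tag)
        else:
            label = tag
            prefix = "I" if prev_label == label else "B"
            repaired.append(f"{prefix}-{label}")
        prev_label = label
    return repaired


print(repair_bio(["B-Alias-Term", "I-Alias-Term", "Alias-Term", "O", "Qualifier"]))
```

With this rule, `Alias-Term` after `I-Alias-Term` becomes `I-Alias-Term`, while an isolated `Qualifier` after `O` becomes `B-Qualifier`.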

I found the following list of tags in the data:

  • 'O',
  • 'I-Definition',
  • 'I-Term',
  • 'I-Secondary-Definition',
  • 'B-Term',
  • 'B-Definition',
  • 'I-Definiti-frag',
  • 'I-Qualifier',
  • 'I-Alias-Term',
  • 'B-Alias-Term',
  • 'B-Secondary-Definition',
  • 'I-Referential-Definition',
  • 'B-Referential-Definition',
  • 'B-Qualifier',
  • 'B-Referential-Term',
  • 'I-Referential-Term',
  • 'B-Definiti-frag',
  • 'I-Ordered-Definition',
  • 'Definition',
  • 'Term',
  • 'I-Ordered-Term',
  • 'Alias-Term',
  • 'B-Te-frag',
  • 'B-Ordered-Definition',
  • 'B-Ordered-Term',
  • 'Secondary-Definition',
  • 'I-Te-frag',
  • 'Qualifier',
  • 'Referential-Definition',
  • 'B-Alias-Te-frag'
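Three labels in this list are truncated: `Definiti-frag`, `Te-frag`, and `Alias-Te-frag`. They appear to correspond to `Definition-frag`, `Term-frag`, and `Alias-Term-frag` in the updated list posted later in this thread. A minimal normalization sketch follows. The mapping is inferred from comparing the two lists and is an assumption, and `normalize_tag` is a hypothetical name.

```python
from typing import Dict

# Assumed mapping from truncated fragment labels to their full names,
# inferred by comparing the old and updated tag lists.
TRUNCATED_LABELS: Dict[str, str] = {
    "Definiti-frag": "Definition-frag",
    "Te-frag": "Term-frag",
    "Alias-Te-frag": "Alias-Term-frag",
}


def normalize_tag(tag: str) -> str:
    """Expand a truncated fragment label, keeping any B-/I- prefix."""
    if tag.startswith(("B-", "I-")):
        prefix, label = tag[:2], tag[2:]
    else:
        prefix, label = "", tag
    return prefix + TRUNCATED_LABELS.get(label, label)


print(normalize_tag("B-Definiti-frag"))
```

Tags that are not truncated, including `O`, pass through unchanged.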

@mukesh-mehta

Any update on the issue??

@mukesh-mehta

Thanks @sashaspala for resolving the issue

@Franck-Dernoncourt
Contributor Author

Franck-Dernoncourt commented Jan 8, 2020

Thanks @sashaspala, this fixes the tagging issue raised by Mukesh Mehta, but was any documentation added to explain all the tags, such as B-Definiti-frag?

@mukesh-mehta

Updated list of BIO tags from the train and dev sets:

['B-Referential-Definition',
 'I-Referential-Term',
 'I-Alias-Term',
 'I-Qualifier',
 'B-Ordered-Term',
 'B-Ordered-Definition',
 'B-Referential-Term',
 'O',
 'B-Qualifier',
 'I-Term-frag',
 'I-Definition',
 'I-Definition-frag',
 'I-Referential-Definition',
 'I-Term',
 'B-Secondary-Definition',
 'I-Ordered-Definition',
 'B-Alias-Term',
 'I-Ordered-Term',
 'B-Definition',
 'B-Term-frag',
 'B-Definition-frag',
 'I-Secondary-Definition',
 'B-Term',
 'B-Alias-Term-frag']
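Someone training a sequence tagger on this data needs the entity types behind these tags and a fixed tag-to-index mapping. The sketch below derives both from the list above and flags any type whose `I-` tag never occurs. The helper names are hypothetical. Placing `O` at index 0 is a common convention, not something the dataset prescribes.

```python
from typing import Dict, List, Set

# Tag list copied from the comment above (train + dev sets).
TAGS: List[str] = [
    "B-Referential-Definition", "I-Referential-Term", "I-Alias-Term", "I-Qualifier",
    "B-Ordered-Term", "B-Ordered-Definition", "B-Referential-Term", "O",
    "B-Qualifier", "I-Term-frag", "I-Definition", "I-Definition-frag",
    "I-Referential-Definition", "I-Term", "B-Secondary-Definition", "I-Ordered-Definition",
    "B-Alias-Term", "I-Ordered-Term", "B-Definition", "B-Term-frag",
    "B-Definition-frag", "I-Secondary-Definition", "B-Term", "B-Alias-Term-frag",
]


def entity_types(tags: List[str]) -> Set[str]:
    """Strip B-/I- prefixes to get the underlying entity types."""
    return {t[2:] for t in tags if t.startswith(("B-", "I-"))}


def missing_inside_tags(tags: List[str]) -> List[str]:
    """List I- tags that would be expected for a type but never occur."""
    present = set(tags)
    return sorted(f"I-{ty}" for ty in entity_types(tags) if f"I-{ty}" not in present)


def build_label_index(tags: List[str]) -> Dict[str, int]:
    """Deterministic tag -> id mapping with 'O' fixed at 0 (assumed convention)."""
    index = {"O": 0}
    for i, tag in enumerate(sorted(set(tags) - {"O"}), start=1):
        index[tag] = i
    return index


print(len(entity_types(TAGS)), missing_inside_tags(TAGS))
```

On this list the check reports 12 entity types. Only `I-Alias-Term-frag` is missing, so `B-Alias-Term-frag` is the one tag without an `I-` counterpart in the train and dev sets.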
