noun chunks are not consistently extracted #1818

pengyu · 2018-01-09T18:10:04Z

The following example shows that "Oxtr(-/-) mice" is sometimes extracted as a noun chunk and sometimes not. How can I make the results consistent? Thanks.

$ cat main2.py 
#!/usr/bin/env python
# vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:

import spacy
nlp = spacy.load('en', disable=['tokenizer', 'ner', 'textcat'])
## 'tagger' and 'parser' cannot be disabled.

doc = nlp(u'Male Oxtr(-/-) mice failed to maintain their body temperatures during exposure to a cold environment.')
print([x for x in doc.noun_chunks])
doc = nlp(u'Oxtr(-/-) mice also showed decreased neuronal activation in the thermoregulatory hypothalamic region during cold exposure.')
print([x for x in doc.noun_chunks])
$ ./main2.py 
[Male Oxtr(-/-) mice, their body temperatures, exposure, a cold environment]
[mice, decreased neuronal activation, the thermoregulatory hypothalamic region, cold exposure]

ines · 2018-12-14T11:28:52Z

The noun chunks depend on the part-of-speech tags and dependency parse, so this issue likely comes down to incorrect predictions made by the tagger or parser.
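One way to check this is to print the tag, dependency label and head of every token next to the noun chunks for both sentences. The following is only a minimal sketch: the helper names `token_rows` and `compare_sentences` are made up for illustration, and the model name `en_core_web_sm` is an assumption, so substitute whichever English model is installed. If the tokens around `Oxtr(-/-)` receive different tags or heads in the two sentences, that would account for the different chunks.

```python
from collections import namedtuple

# One row per token: the attributes worth comparing between the two sentences.
TokenRow = namedtuple("TokenRow", ["text", "pos", "tag", "dep", "head"])


def token_rows(doc):
    """Collect text, coarse POS, fine-grained tag, dependency label and head text."""
    return [TokenRow(t.text, t.pos_, t.tag_, t.dep_, t.head.text) for t in doc]


def compare_sentences(texts, model="en_core_web_sm"):
    """Parse each text, then print its token rows followed by its noun chunks.

    The default model name is an assumption, not something stated in this thread.
    """
    import spacy  # imported lazily so token_rows() can be used without spaCy

    nlp = spacy.load(model)
    for text in texts:
        doc = nlp(text)
        for row in token_rows(doc):
            print("{:<18}{:<7}{:<6}{:<10}{}".format(*row))
        print([chunk.text for chunk in doc.noun_chunks])
```

Calling `compare_sentences([...])` with the two sentences from the report prints one table per sentence, which makes it easier to see where the predictions diverge.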

I'm merging this with #3052. We've now added a master thread for incorrect predictions and related reports – see the issue for more details.

lock · 2019-01-13T16:58:58Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

honnibal added the performance label Jan 12, 2018

ines added the lang / en, feat / tagger, feat / parser and perf / accuracy labels and removed the performance label Aug 15, 2018

ines closed this as completed Dec 14, 2018

lock bot locked as resolved and limited conversation to collaborators Jan 13, 2019
