Noun chunking inconsistency #2053

johnnyleitrim · 2018-03-03T10:08:26Z

Doc.noun_chunks is sometimes handing me back shorter noun chunks than I would have expected.

For example, in the code below, kobe bryant is sometimes split across noun chunks:

Example

import en_core_web_lg

nlp = en_core_web_lg.load()
sent = u'''Add kobe bryant shoes to my cart'''.strip()
tokens = nlp(sent)
print(sent)
for chunk in tokens.noun_chunks:
    print("\t{}".format(chunk.orth_))

sent = u'''Add kobe beef and kobe bryant shoes to my cart'''.strip()
tokens = nlp(sent)
print(sent)
for chunk in tokens.noun_chunks:
    print("\t{}".format(chunk.orth_))

Output

Add kobe bryant shoes to my cart
    kobe bryant shoes
    my cart
Add kobe beef and kobe bryant shoes to my cart
    kobe beef
    kobe
    bryant shoes
    my cart

In the second example, I would not have expected kobe and bryant shoes to end up in separate noun chunks, given that kobe bryant shoes came back as a single chunk in the first example.

Environment

spaCy version: 2.0.5
Python version: 3.5.1
Models: en_core_web_lg
Platform: Darwin-17.4.0-x86_64-i386-64bit


ines · 2018-12-14T11:27:40Z

The noun chunks depend on the part-of-speech tags and dependency parse, so this issue likely comes down to incorrect predictions made by the tagger or parser.
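One way to check this for the two sentences in the report is to print the tag, dependency label and head the pipeline assigns to each token, then compare the rows for kobe and bryant. The sketch below is only a diagnostic aid: the helper names parse_rows and print_parse are made up for illustration, and nlp is assumed to be an already loaded pipeline such as the en_core_web_lg one from the example.

```python
def parse_rows(nlp, text):
    """Return one (text, pos, dep, head text) tuple per token in `text`.

    `nlp` is assumed to be a loaded spaCy pipeline, i.e. a callable that
    returns a Doc whose tokens expose .text, .pos_, .dep_ and .head.
    """
    doc = nlp(text)
    return [(tok.text, tok.pos_, tok.dep_, tok.head.text) for tok in doc]


def print_parse(nlp, text):
    """Print the sentence followed by one aligned row per token."""
    print(text)
    for word, pos, dep, head in parse_rows(nlp, text):
        print("\t{:<8}{:<7}{:<10}{}".format(word, pos, dep, head))
```

Calling print_parse(nlp, sent) for each sentence makes it easy to see whether kobe receives a different head or dependency label in the longer sentence, which would account for the different chunk boundaries.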

I'm merging this with #3052. We've now added a master thread for incorrect predictions and related reports – see the issue for more details.

lock · 2019-01-13T16:59:06Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

honnibal added the performance label Mar 27, 2018

ines added the lang / en English language data and models label Apr 29, 2018

ines added perf / accuracy Performance: accuracy and removed performance labels Aug 15, 2018

ines closed this as completed Dec 14, 2018

lock bot locked as resolved and limited conversation to collaborators Jan 13, 2019
