French model is confused by "tu" #2251

randomstuff · 2018-04-23T07:44:16Z

spaCy (2.0.11) seems to be confused by the word "tu" in French:

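import spacy
from IPython.display import display, HTML
from spacy import displacy
from jinja2 import Environment, DictLoader, select_autoescape

def show(doc):
    # Render the dependency parse inline in a Jupyter notebook;
    # jupyter=False makes displacy return the markup instead of displaying it itself
    html = displacy.render(doc, style='dep', jupyter=False)
    return display(HTML(html))

# In-memory template: one table row per token with its text, tag and lemma
loader = DictLoader({
    "words.html": """
        <table>
            <thead>
                <tr>
                    <th>Texte</th>
                    <th>Tag</th>
                    <th>Lemma</th>
                </tr>
            </thead>
            <tbody>
                {% for word in words %}
                <tr>
                    <td>{{ word.text   | escape }}</td>
                    <td>{{ word.tag_   | escape }}</td>
                    <td>{{ word.lemma_ | escape }}</td>
                </tr>
                {% endfor %}
            </tbody>
        </table>
    """
})

env = Environment(
    loader=loader,
    autoescape=select_autoescape(['html', 'xml']))

def word_analyze(doc):
    # Iterating over a Doc yields Token objects, which the template reads
    html = env.get_template('words.html').render(words=doc)
    return HTML(html)

nlp = spacy.load("fr_core_news_md")

# Each call renders one of the tables below
display(word_analyze(nlp("Je vais bien.")))
display(word_analyze(nlp("Tu vas bien.")))
display(word_analyze(nlp("Comment vas-tu?")))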

| Texte | Tag | Lemma |
| --- | --- | --- |
| Je | PRON__Number=Sing\|Person=1 | Je |
| vais | VERB__Mood=Ind\|Number=Sing\|Person=1\|Tense=Pres\|VerbForm=Fin | aller |
| bien | ADV___ | bien |
| . | PUNCT___ | . |

| Texte | Tag | Lemma |
| --- | --- | --- |
| Tu | AUX__Mood=Ind\|Number=Sing\|Person=3\|Tense=Pres\|VerbForm=Fin | Tu |
| vas | AUX__Tense=Past\|VerbForm=Part | aller |
| bien | ADV___ | bien |
| . | PUNCT___ | . |

| Texte | Tag | Lemma |
| --- | --- | --- |
| Comment | ADV__PronType=Int | Comment |
| vas | VERB__Mood=Ind\|Number=Sing\|Person=1\|Tense=Pres\|VerbForm=Fin | aller |
| - | PUNCT___ | - |
| tu | AUX__Gender=Masc\|Number=Sing\|Tense=Past\|VerbForm=Part | taire |
| ? | PUNCT___ | ? |

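The correct tag for "tu" should be "PRON__Number=Sing|Person=2" in both cases, and the lemma should be "tu"/"Tu". Instead, "Tu" is tagged as a third-person auxiliary, and in "vas-tu" it is tagged as a past participle with the lemma "taire" (tu is also the past participle of taire).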
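To see exactly where the reported tags diverge from the expected one, each fine-grained tag can be split into its POS part and its morphological features. This is a minimal sketch that assumes the `POS__Feat=Val|Feat=Val` layout visible in the tables above, with `_` meaning "no features"; `parse_tag`, `tag_diff` and `EXPECTED_TU` are hypothetical helpers, not spaCy API. The strings use a plain `|` because the `\|` in the tables is only Markdown escaping.

```python
from __future__ import annotations


def parse_tag(tag: str) -> tuple[str, dict[str, str]]:
    """Split a tag such as 'PRON__Number=Sing|Person=2' into POS and features.

    Assumption: the 'POS__Feat=Val|Feat=Val' layout inferred from the
    spaCy 2.0.11 French output in this issue, not a documented format.
    """
    pos, _, feats = tag.partition("__")
    # A lone '_' after the separator (as in 'ADV___') means "no features"
    if feats in ("", "_"):
        return pos, {}
    return pos, dict(item.split("=", 1) for item in feats.split("|"))


def tag_diff(observed: str, expected: str) -> dict[str, tuple]:
    """Map each field (the POS or a feature name) whose value differs to an
    (observed, expected) pair; None marks a feature missing on one side."""
    obs_pos, obs_feats = parse_tag(observed)
    exp_pos, exp_feats = parse_tag(expected)
    diff: dict[str, tuple] = {}
    if obs_pos != exp_pos:
        diff["POS"] = (obs_pos, exp_pos)
    for key in sorted(obs_feats.keys() | exp_feats.keys()):
        if obs_feats.get(key) != exp_feats.get(key):
            diff[key] = (obs_feats.get(key), exp_feats.get(key))
    return diff


EXPECTED_TU = "PRON__Number=Sing|Person=2"

# Tag reported for "Tu" in "Tu vas bien."
print(tag_diff("AUX__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", EXPECTED_TU))
# → {'POS': ('AUX', 'PRON'), 'Mood': ('Ind', None), 'Person': ('3', '2'), 'Tense': ('Pres', None), 'VerbForm': ('Fin', None)}
```

For "Tu" the diff isolates the wrong POS (AUX instead of PRON) and the wrong person (3 instead of 2), plus verb-only features such as Mood, Tense and VerbForm that a pronoun should not carry.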

GaneshBaronAloir · 2018-06-26T18:22:54Z

I have the same issue. Did you find a solution?

ines · 2018-07-02T14:05:29Z

@randomstuff @mrsaboteur Thanks for the reports – we're actually just working on improving the model tests, so we can run the new v2.1 models against examples like this.

I just created simplified test cases from the two examples above. Are those correct (do they describe the correct, intended behaviour), or did I miss something here?

import spacy

nlp = spacy.load("fr_core_news_md")

doc = nlp("Tu vas bien.")
assert doc[0].tag_ == 'PRON__Number=Sing|Person=2'

doc = nlp("Comment vas-tu?")
# Tokens: Comment, vas, -, tu, ?
assert doc[3].tag_ == 'PRON__Number=Sing|Person=2'
assert doc[3].lemma_ == 'tu'
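The same test cases can also be kept as plain data, so they can be checked against recorded output when the model is not installed. This is a sketch under that assumption: `EXPECTED`, `RECORDED_2_0_11`, `record` and `failures` are hypothetical names, not part of spaCy's test suite, and the recorded triples are copied from the tables in the first comment.

```python
# Each entry mirrors one assertion in the test cases above:
# (sentence, token index, token attribute, expected value)
EXPECTED = [
    ("Tu vas bien.", 0, "tag_", "PRON__Number=Sing|Person=2"),
    ("Comment vas-tu?", 3, "tag_", "PRON__Number=Sing|Person=2"),
    ("Comment vas-tu?", 3, "lemma_", "tu"),
]

# Position of each attribute inside a recorded (text, tag_, lemma_) triple
FIELDS = ("text", "tag_", "lemma_")

# Output reported for spaCy 2.0.11 with fr_core_news_md in this issue
RECORDED_2_0_11 = {
    "Tu vas bien.": [
        ("Tu", "AUX__Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "Tu"),
        ("vas", "AUX__Tense=Past|VerbForm=Part", "aller"),
        ("bien", "ADV___", "bien"),
        (".", "PUNCT___", "."),
    ],
    "Comment vas-tu?": [
        ("Comment", "ADV__PronType=Int", "Comment"),
        ("vas", "VERB__Mood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin", "aller"),
        ("-", "PUNCT___", "-"),
        ("tu", "AUX__Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part", "taire"),
        ("?", "PUNCT___", "?"),
    ],
}


def record(nlp):
    """Collect (text, tag_, lemma_) triples for each test sentence from a
    loaded pipeline (assumed: the object returned by spacy.load)."""
    sentences = {sentence for sentence, _, _, _ in EXPECTED}
    return {s: [(t.text, t.tag_, t.lemma_) for t in nlp(s)] for s in sentences}


def failures(recorded):
    """Return (sentence, index, attribute, got, expected) for every failed check."""
    failed = []
    for sentence, index, attr, expected in EXPECTED:
        got = recorded[sentence][index][FIELDS.index(attr)]
        if got != expected:
            failed.append((sentence, index, attr, got, expected))
    return failed


# All three checks fail on the output reported for 2.0.11
for failure in failures(RECORDED_2_0_11):
    print(failure)
```

With the model installed, `failures(record(nlp))` would run the same checks against a live pipeline; keeping the reported output as data makes the regression visible even where the model cannot be loaded.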

randomstuff · 2018-07-02T14:38:11Z

@ines, yes, these tags for "tu" are correct.

GaneshBaronAloir · 2018-07-03T13:24:00Z

@ines I confirm the tags are correct. Thank you for looking into it.

ines · 2018-12-14T11:25:50Z

Merging this with #3052. We've now added a master thread for incorrect predictions and related reports – see the issue for more details.

lock · 2019-01-13T16:59:12Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

ines added the "performance", "models" and "lang / fr" labels Apr 28, 2018

ines added the "feat / tagger" and "feat / lemmatizer" labels Jul 2, 2018

ines mentioned this issue Aug 14, 2018:

💫 Improve rule-based lemmatization and replace lookups #2668 (Closed)

ines added the "perf / accuracy" label and removed the "performance" label Aug 15, 2018

ines closed this as completed Dec 14, 2018

lock bot locked as resolved and limited conversation to collaborators Jan 13, 2019
