Issues with SLP model and POS tagging (pattern.en) #182

markus-beuckelmann · 2017-06-13T21:01:41Z

There are currently some issues with POS misclassification in pattern.en on Python 2.7 which cause multiple tests in test_en.py and test_text.py to fail. Let's take a look at test_tag() for instance (see Travis log), which looks like this:

```python
# Assert [("black", "JJ"), ("cats", "NNS")].
v = en.tag("black cats")
self.assertEqual(v, [("black", "JJ"), ("cats", "NNS")])
```

The test fails because 'black' gets classified as JJS (adjective, superlative) instead of JJ (adjective).

Here is what happens: en.tag() passes its input down to en.parse(), which is handled by parse() in text/__init__.py (source), which in turn calls find_tags() (source). Inside find_tags(), the word is looked up in the lexicon (here), which assigns the correct (!) tag JJ. This tag is then overruled (here) by the model, because 'black' is listed in model.unknown, and classify() (source) returns the wrong tag 'JJS'.
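To make the failure mode concrete, here is a minimal, hypothetical sketch of a "lexicon first, then model override" tagging pass as described above. It assumes, following that description, that the model only re-tags words listed in its `unknown` set. `find_tags_sketch`, `bad_model` and the toy lexicon are illustrative names, not pattern's API, and the real find_tags() and SLP model do considerably more than this.

```python
# Hypothetical sketch, NOT pattern's actual find_tags(): tag each word from a
# lexicon, then let a context model re-tag the words listed in `unknown`.

def find_tags_sketch(tokens, lexicon, classify=None, unknown=(), default="NN"):
    # Step 1: lexicon lookup, with a default tag for out-of-lexicon words.
    tagged = [(w, lexicon.get(w.lower(), default)) for w in tokens]
    # Step 2 (assumed behavior): the model overrides the lexicon tag for
    # every word in `unknown`, using the neighboring tags as context.
    if classify is not None:
        for i, (w, _) in enumerate(tagged):
            if w.lower() in unknown:
                prev_tag = tagged[i - 1][1] if i > 0 else None
                next_tag = tagged[i + 1][1] if i + 1 < len(tagged) else None
                tagged[i] = (w, classify(w, prev_tag, next_tag))
    return tagged


def bad_model(word, prev_tag, next_tag):
    # Deliberately wrong toy model that mimics the reported JJS output.
    return "JJS"


lexicon = {"black": "JJ", "cats": "NNS"}
print(find_tags_sketch(["black", "cats"], lexicon))
# [('black', 'JJ'), ('cats', 'NNS')]
print(find_tags_sketch(["black", "cats"], lexicon, bad_model, unknown={"black"}))
# [('black', 'JJS'), ('cats', 'NNS')]
```

In this sketch the lexicon's correct JJ survives only while 'black' is outside the override set; once it is inside, whatever the model returns wins, so any change to which words the model is allowed to re-tag can flip results that previously matched the lexicon.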

There are many similar examples you can look at: test_parse (see e.g. the misclassification of 'sat'), test_find_tags, test_tagged_string, test_word, and test_document.

Sure, the SLP model is a statistical model and is therefore allowed to be wrong in some cases, but what bothers me is that this apparently used to work some time ago. Sentences of the form "The black cat sat on..." are scattered so widely across the unit tests that I can't believe the model always got them wrong.

I just can't find the cause for this change. @tom-de-smedt, what am I missing?


markus-beuckelmann · 2017-07-28T15:39:17Z

I finally narrowed down the cause of this problem. It looks like this line in pattern/text/__init__.py, introduced in dc85534, is responsible for the problems mentioned above.

piyush0609 · 2017-07-30T21:28:17Z

@markus-beuckelmann I would be glad to work on this issue and try to resolve it, if you are not already working on it.

markus-beuckelmann · 2017-08-01T16:51:56Z

@piyush0609, it is already fixed as of 93235fe. It's great that you want to contribute, though; keep an eye out for issues tagged with the "help" label.

markus-beuckelmann added the bug label Jun 13, 2017

Xsardas1000 closed this as completed in 93235fe Jul 24, 2018
