Issues with SLP model and POS tagging (pattern.en) #182

Closed
markus-beuckelmann opened this issue Jun 13, 2017 · 3 comments

@markus-beuckelmann
Collaborator

There are currently some issues with POS misclassification in pattern.en on Python 2.7 which cause multiple tests in test_en.py and test_text.py to fail. Let's take a look at test_tag() for instance (see Travis log), which looks like this:

# Assert [("black", "JJ"), ("cats", "NNS")].
v = en.tag("black cats")
self.assertEqual(v, [("black", "JJ"), ("cats", "NNS")])

The test fails because 'black' gets classified as JJS (adjective, superlative) instead of JJ (adjective).

Here is what happens: a call to en.tag() is passed down to en.parse(), which is handled by parse() in text/__init__.py (source), which in turn calls find_tags() (source). Inside find_tags(), the word is looked up in the lexicon (here), which assigns the correct (!) label JJ. This label is then overruled (here) by the model, because 'black' is listed in model.unknown, and classify() (source) returns the wrong label 'JJS'.
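
To make that flow easier to follow, here is a minimal, self-contained sketch of the lookup-then-override logic. It is not pattern's actual code: ToyModel, its hard-coded classify() rule, the simplified find_tags() signature and the two-word lexicon are all assumptions made up for illustration, and the real find_tags() is considerably more involved.

```python
# Toy reconstruction of the control flow described above.
# NOT pattern's implementation: ToyModel, its classify() rule and the
# lexicon contents are hypothetical and exist only for illustration.

class ToyModel:
    """Stand-in for the statistical tagger; deliberately predicts a wrong tag."""

    def __init__(self, unknown):
        # Words for which the model is allowed to overrule the lexicon.
        self.unknown = set(unknown)

    def classify(self, token):
        # Hard-coded bad prediction that mimics the misclassification in this issue.
        return "JJS" if token == "black" else "NN"


def find_tags(tokens, lexicon, model=None):
    tagged = []
    for token in tokens:
        # Step 1: look the word up in the lexicon.
        tag = lexicon.get(token)
        # Step 2: the model takes over if the word has no lexicon tag
        # or if it is listed in model.unknown.
        if model is not None and (tag is None or token in model.unknown):
            tag = model.classify(token)
        tagged.append((token, tag))
    return tagged


lexicon = {"black": "JJ", "cats": "NNS"}
model = ToyModel(unknown=["black"])

print(find_tags(["black", "cats"], lexicon))         # lexicon only
print(find_tags(["black", "cats"], lexicon, model))  # model overrules 'black'
```

Without a model, the lexicon tag JJ survives. With the toy model, 'black' ends up as JJS even though the lexicon had it right, which is the same pattern as in the failing test.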

There are many similar examples that you can look at: test_parse (see e.g. misclassification for 'sat'), test_find_tags, test_tagged_string, test_word, test_document.

Sure, the SLP model is a statistical model and is therefore allowed to be wrong in some cases, but what bothers me is that it apparently used to work some time ago. Sentences of the form "The black cat sat on..." are scattered so widely across the unit tests that I can't believe the model got them wrong all along.

I just can't find the cause for this change. @tom-de-smedt, what am I missing?

@markus-beuckelmann
Collaborator Author

I finally narrowed down the cause of this problem. It looks like this line in pattern/text/__init__.py, introduced in dc85534, is responsible for the problems mentioned above.

@piyush0609

@markus-beuckelmann I would be glad to work on the issue and try to resolve it, if you are not already working on it.

@markus-beuckelmann
Collaborator Author

@piyush0609, it is already fixed as of 93235fe. It's great that you want to contribute, though; keep an eye out for issues tagged with the "help" label.
