There are currently some issues with POS misclassification in `pattern.en` on Python 2.7 that cause multiple tests in `test_en.py` and `test_text.py` to fail. Take `test_tag()` for instance (see Travis log).
The test fails because 'black' gets classified as `JJS` (adjective, superlative) instead of `JJ` (adjective).
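Because the test compares lists of (word, tag) pairs, a small helper that reports exactly which tokens diverge makes failures like this one easier to read. This is a hypothetical debugging aid, not part of pattern's test suite, and the pairs in the usage lines are made up to mirror the 'black' case.

```python
def tag_mismatches(expected, actual):
    """Return (index, word, expected_tag, actual_tag) for every differing tag.

    Both arguments are lists of (word, tag) tuples, as produced by a POS tagger.
    """
    if len(expected) != len(actual):
        raise ValueError("length mismatch: %d vs %d" % (len(expected), len(actual)))
    mismatches = []
    for i, ((w1, t1), (w2, t2)) in enumerate(zip(expected, actual)):
        # The tokens themselves must line up; only the tags may differ.
        if w1 != w2:
            raise ValueError("token mismatch at %d: %r vs %r" % (i, w1, w2))
        if t1 != t2:
            mismatches.append((i, w1, t1, t2))
    return mismatches


# Hypothetical expected/actual tagger output for a phrase like "black cat".
expected = [("black", "JJ"), ("cat", "NN")]
actual = [("black", "JJS"), ("cat", "NN")]
print(tag_mismatches(expected, actual))  # [(0, 'black', 'JJ', 'JJS')]
```

The helper avoids f-strings so it also runs on Python 2.7, the version this issue is about.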
Here is what happens: when we call `en.tag()`, the call is passed down to `en.parse()`, which is then handled by `parse()` in `text/__init__.py` (source), which in turn calls `find_tags()` (source). Inside `find_tags()`, the word is looked up in the lexicon (here), which assigns the correct (!) label `JJ`. This label is then overruled (here) by the model, because 'black' is listed in `model.unknown`, and `classify()` (source) returns the wrong label `JJS`.
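The two-step control flow described above (lexicon lookup, then a model override for words in the model's unknown set) can be sketched with toy stand-ins. Everything here is hypothetical: `LEXICON`, `ToyModel` and `toy_find_tags()` only mimic the behaviour reported in this issue and are not pattern's actual implementation or signatures.

```python
# Hypothetical stand-ins; the real lexicon and model live in pattern.text.
LEXICON = {"the": "DT", "black": "JJ", "cat": "NN", "sat": "VBD"}


class ToyModel(object):
    """Stand-in for the statistical model; `unknown` lists words it re-tags."""

    def __init__(self, unknown, guesses):
        self.unknown = set(unknown)
        self._guesses = guesses  # word -> tag this toy model predicts

    def classify(self, word):
        return self._guesses.get(word, "NN")


def toy_find_tags(tokens, lexicon, model=None):
    tagged = []
    for word in tokens:
        # Step 1: lexicon lookup ('black' correctly receives JJ here).
        tag = lexicon.get(word.lower(), "NN")
        # Step 2: if the word is in model.unknown, the model's guess
        # replaces the lexicon tag, whether or not the guess is right.
        if model is not None and word.lower() in model.unknown:
            tag = model.classify(word.lower())
        tagged.append((word, tag))
    return tagged


model = ToyModel(unknown={"black"}, guesses={"black": "JJS"})
print(toy_find_tags(["The", "black", "cat", "sat"], LEXICON, model))
# [('The', 'DT'), ('black', 'JJS'), ('cat', 'NN'), ('sat', 'VBD')]
```

Passing `model=None` leaves the lexicon tag untouched, which is a quick way to check whether a wrong tag comes from the lexicon or from the model override.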
There are many similar examples that you can look at: `test_parse` (see e.g. the misclassification of 'sat'), `test_find_tags`, `test_tagged_string`, `test_word` and `test_document`.
Sure, the SLP model is a statistical model and is consequently allowed to be wrong in some cases, but what bothers me is that it apparently used to work some time ago. Sentences of the form "The black cat sat on..." are scattered across so many unit tests that I can't believe the model got them wrong all along.
I just can't find the cause of this change. @tom-de-smedt, what am I missing?
I finally narrowed down the cause of this problem. It looks like this line in `pattern/text/__init__.py`, introduced in dc85534, is responsible for the problems mentioned above.
@piyush0609, it is already fixed as of 93235fe. It's great that you want to contribute, though; keep an eye out for issues tagged with the "help" label.