POS taggers comparison

I compared three taggers:  
Pymorphy2  
TreeTagger  
Mystem3  


The test text is a short description of the medieval Boyar Duma taken from Wikipedia (I also lowercased it, because pymorphy2 interprets lowercase and uppercase words differently).

In [1]:
text = 'В состав думы Московского государства входили только бояре в древнем значении этого слова, то есть свободные землевладельцы. Затем с превращением их в служилых людей возникло разделение на бояр вообще и бояр служилых в точном смысле. Высший класс служилых именуется «боярами введёнными», то есть введёнными во дворец для постоянной помощи великому князю в делах управления.'.lower()

Let's start with pymorphy2.

In [2]:
import pymorphy2
morph = pymorphy2.MorphAnalyzer()

In [3]:
# Parse every whitespace-separated token; each entry is a list of candidate analyses
listoftags = [morph.parse(token) for token in text.split()]
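
Note that `text.split()` splits on whitespace only, so punctuation stays attached to the neighbouring word (for example `'слова,'` or `'«боярами'`), and such tokens may not be found in the dictionary. Below is a minimal sketch of a stricter tokenizer, assuming a simple regular expression is good enough for this text. `tokenize` is a hypothetical helper, not part of pymorphy2.

```python
import re

# Hypothetical helper: keep only runs of Cyrillic letters (optionally hyphenated),
# dropping commas, full stops and quotation marks.
WORD_RE = re.compile(r"[а-яё]+(?:-[а-яё]+)*")

def tokenize(text):
    """Return the lowercase Cyrillic word tokens of `text` without punctuation."""
    return WORD_RE.findall(text.lower())

sample = "бояре в древнем значении этого слова, то есть «боярами введёнными»."
tokens = tokenize(sample)
print(tokens)
# The tokens could then be passed to the analyzer instead of text.split():
# listoftags = [morph.parse(token) for token in tokens]
```

This drops punctuation entirely, which is fine for comparing word tags but loses sentence boundaries.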

In [4]:
# Just the first few elements
listoftags[1:4]

[[Parse(word='состав', tag=OpencorporaTag('NOUN,inan,masc sing,accs'), normal_form='состав', score=0.794871, methods_stack=((<DictionaryAnalyzer>, 'состав', 33, 3),)),
  Parse(word='состав', tag=OpencorporaTag('NOUN,inan,masc sing,nomn'), normal_form='состав', score=0.205128, methods_stack=((<DictionaryAnalyzer>, 'состав', 33, 0),))],
 [Parse(word='думы', tag=OpencorporaTag('NOUN,inan,femn sing,gent'), normal_form='дума', score=0.894736, methods_stack=((<DictionaryAnalyzer>, 'думы', 55, 1),)),
  Parse(word='думы', tag=OpencorporaTag('NOUN,inan,femn plur,nomn'), normal_form='дума', score=0.052631, methods_stack=((<DictionaryAnalyzer>, 'думы', 55, 7),)),
  Parse(word='думы', tag=OpencorporaTag('NOUN,inan,femn plur,accs'), normal_form='дума', score=0.052631, methods_stack=((<DictionaryAnalyzer>, 'думы', 55, 10),))],
 [Parse(word='московского', tag=OpencorporaTag('ADJF neut,sing,gent'), normal_form='московский', score=0.333333, methods_stack=((<DictionaryAnalyzer>, 'московского', 16, 15),)

So, looking at the first three elements of our tag list, we can see that pymorphy2 does not commit to a single answer and prefers to stay ambiguous: it gives the word "состав" two analyses with different cases, and "думы" three. A nice thing about pymorphy2 is that it attaches a score (an estimated probability) to each interpretation. Note that this score is estimated per word form and does not take the surrounding context into account. If an interpretation has a score of 1, it is the only one returned.  
Its interpretation of 'в' as a noun is rather curious, though.
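
If a single answer per word is needed, the candidates can be reduced to the one with the highest score. The sketch below uses a stand-in `Parse` tuple filled with the values printed above, so it runs without pymorphy2. With the real library the same `max(...)` call should work on the lists returned by `morph.parse`, since those objects also expose `tag` and `score`. `best_parse` is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-in for pymorphy2's Parse objects (assumption: only the fields used here).
Parse = namedtuple("Parse", "word tag normal_form score")

# Values copied from the output above.
candidates = {
    "состав": [
        Parse("состав", "NOUN,inan,masc sing,accs", "состав", 0.794871),
        Parse("состав", "NOUN,inan,masc sing,nomn", "состав", 0.205128),
    ],
    "думы": [
        Parse("думы", "NOUN,inan,femn sing,gent", "дума", 0.894736),
        Parse("думы", "NOUN,inan,femn plur,nomn", "дума", 0.052631),
        Parse("думы", "NOUN,inan,femn plur,accs", "дума", 0.052631),
    ],
}

def best_parse(parses):
    """Pick the candidate analysis with the highest score."""
    return max(parses, key=lambda p: p.score)

best = {word: best_parse(parses) for word, parses in candidates.items()}
for word, parse in best.items():
    print(word, parse.tag, parse.score)
```

Keeping only the top candidate makes the output directly comparable with a tagger that returns one tag per word, at the cost of discarding the ambiguity information.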

Let's consider TreeTagger.

In [5]:
import pprint
import treetaggerwrapper
tagger = treetaggerwrapper.TreeTagger(TAGLANG='ru')

In [6]:
tags = tagger.tag_text(text)
tags2 = treetaggerwrapper.make_tags(tags)

In [7]:
pprint.pprint(tags2)

[Tag(word='в', pos='Sp-a', lemma='в'),
 Tag(word='состав', pos='Ncmsan', lemma='состав'),
 Tag(word='думы', pos='Ncfsgn', lemma='дума'),
 Tag(word='московского', pos='Afpnsgf', lemma='московский'),
 Tag(word='государства', pos='Ncnsgn', lemma='государство'),
 Tag(word='входили', pos='Vmis-p-a-e', lemma='входить'),
 Tag(word='только', pos='Q', lemma='только'),
 Tag(word='бояре', pos='Ncmpny', lemma='боярин'),
 Tag(word='в', pos='Sp-l', lemma='в'),
 Tag(word='древнем', pos='Afpnslf', lemma='древний'),
 Tag(word='значении', pos='Ncnsln', lemma='значение'),
 Tag(word='этого', pos='P--nsga', lemma='этот'),
 Tag(word='слова', pos='Ncnsgn', lemma='слово'),
 Tag(word=',', pos=',', lemma=','),
 Tag(word='то', pos='P--nsnn', lemma='то'),
 Tag(word='есть', pos='Vmip3s-a-e', lemma='быть'),
 Tag(word='свободные', pos='Afpmpnf', lemma='свободный'),
 Tag(word='землевладельцы', pos='Ncmpny', lemma='землевладелец'),
 Tag(word='.', pos='SENT', lemma='.'),
 Tag(word='затем', pos='R', lemma='затем'),
 Tag

As we can see, TreeTagger gives only one interpretation per word, and it matches the highest-scoring results from pymorphy2 (noun/masculine/singular/accusative for 'состав' and noun/feminine/singular/genitive for 'думы').
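
The TreeTagger tags are positional: each character encodes one feature. Here is a rough sketch of how the noun tags above could be decoded. The field order and letter codes are an assumption based on the MULTEXT-East convention and on the readings given in this notebook, and only nouns are covered; `decode_noun_tag` is a hypothetical helper.

```python
# Assumed positional layout for noun tags: N, type, gender, number, case, animacy.
NOUN_FIELDS = [
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "p": "plural"}),
    ("case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "l": "locative", "i": "instrumental"}),
    ("animate", {"n": "no", "y": "yes"}),
]

def decode_noun_tag(tag):
    """Expand a positional noun tag such as 'Ncmsan' into named features."""
    if not tag.startswith("N"):
        raise ValueError("not a noun tag: " + tag)
    decoded = {"pos": "noun"}
    for code, (name, values) in zip(tag[1:], NOUN_FIELDS):
        decoded[name] = values.get(code, code)  # unknown codes are kept as-is
    return decoded

for tag in ("Ncmsan", "Ncfsgn", "Ncmpny"):
    print(tag, decode_noun_tag(tag))
```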

Let's look at mystem.

In [8]:
from pymystem3 import Mystem
import json
m = Mystem()

In [9]:
tags3 = json.dumps(m.analyze(text), ensure_ascii=False)

In [10]:
print(tags3)

[{"analysis": [{"gr": "PR=", "lex": "в"}], "text": "в"}, {"text": " "}, {"analysis": [{"gr": "S,муж,неод=(вин,ед|им,ед)", "lex": "состав"}], "text": "состав"}, {"text": " "}, {"analysis": [{"gr": "S,жен,неод=(вин,мн|род,ед|им,мн)", "lex": "дума"}], "text": "думы"}, {"text": " "}, {"analysis": [{"gr": "A=(вин,ед,полн,муж,од|род,ед,полн,муж|род,ед,полн,сред)", "lex": "московский"}], "text": "московского"}, {"text": " "}, {"analysis": [{"gr": "S,сред,неод=(вин,мн|род,ед|им,мн)", "lex": "государство"}], "text": "государства"}, {"text": " "}, {"analysis": [{"gr": "V,нп=прош,мн,изъяв,несов", "lex": "входить"}], "text": "входили"}, {"text": " "}, {"analysis": [{"gr": "PART=", "lex": "только"}], "text": "только"}, {"text": " "}, {"analysis": [{"gr": "S,муж,од=им,мн", "lex": "боярин"}], "text": "бояре"}, {"text": " "}, {"analysis": [{"gr": "PR=", "lex": "в"}], "text": "в"}, {"text": " "}, {"analysis": [{"gr": "A=(пр,ед,полн,муж|пр,ед,полн,сред)", "lex": "древний"}], "text": "древнем"}, {"text":

Here we can see that mystem, like pymorphy2, gives several interpretations, but without explicit probabilities, which makes them less useful. mystem also returns spaces and punctuation as separate items, so they have to be filtered out; TreeTagger at least drops the spaces, and pymorphy2 never saw them because the text was split on whitespace beforehand. Moreover, this tagger's output looks like one of the least user-friendly.
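
The mystem output becomes easier to read once the items without an analysis are dropped and the `gr` string is split into its parts. The sketch below works on a few items copied from the output above, so it runs without pymystem3. `split_gr` and `flatten` are hypothetical helpers, and the assumption is that `gr` always has the form `lexical features=inflection variants`, with the variants optionally wrapped in parentheses and separated by `|`.

```python
# A few items copied from the mystem output above.
items = [
    {"analysis": [{"gr": "PR=", "lex": "в"}], "text": "в"},
    {"text": " "},
    {"analysis": [{"gr": "S,муж,неод=(вин,ед|им,ед)", "lex": "состав"}], "text": "состав"},
    {"text": " "},
    {"analysis": [{"gr": "V,нп=прош,мн,изъяв,несов", "lex": "входить"}], "text": "входили"},
]

def split_gr(gr):
    """Split a 'gr' string into the part of speech and a list of inflection variants."""
    lexical, _, inflection = gr.partition("=")
    pos = lexical.split(",")[0]
    variants = inflection.strip("()").split("|") if inflection else []
    return pos, variants

def flatten(items):
    """Return (word, lemma, pos, variants) rows, skipping spaces and punctuation."""
    rows = []
    for item in items:
        for analysis in item.get("analysis", []):
            pos, variants = split_gr(analysis["gr"])
            rows.append((item["text"], analysis["lex"], pos, variants))
    return rows

rows = flatten(items)
for row in rows:
    print(row)
```

The same `flatten` call could be applied to the list returned by `m.analyze(text)` directly, before it is serialized with `json.dumps`.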

I also went through the results looking for differences in tagging (only the interesting cases are listed).  
Well, the first major difference comes with the word <b>'этого'</b> (a demonstrative pronoun): for pymorphy2 it is NPRO (noun-pronoun, with a score of 0.97), TreeTagger considers it just a pronoun without any further classification, and for mystem it is APRO (adjective-pronoun). If I am not mistaken, mystem gives the most correct interpretation here, but this may be due to a difference in terminology.  
The next word, <b>'слова'</b>, was tagged correctly by TreeTagger (one correct interpretation as noun/genitive/singular) and by mystem (one correct and two wrong); pymorphy2 somehow failed to give any variants at all, possibly because `text.split()` left the trailing comma attached to the token.  
Then comes <b>'то есть'</b>, a compound conjunction. Sadly, none of the three taggers could interpret these two words as a single unit. But their interpretations of <b>'то'</b> and <b>'есть'</b> separately are pretty decent.  
We also run into trouble with the word <b>'затем'</b>: pymorphy2 and mystem give a correct interpretation, but TreeTagger tags it as R. I looked through the documentation but did not find such a tag there (mystery!); if the tagset follows the MULTEXT-East convention, R would stand for adverb, which would be correct as well.  
<b>'С'</b>: no trouble at all, but pymorphy2 gives extra interpretations: <b>'с'</b> can be an abbreviation of <b>'секунд'</b> (seconds). Interesting!  
<b>'Превращением'</b>: here we see a difference in terminology (ablative vs. instrumental).  
<b>'Их'</b>: only TreeTagger gives a single, correct interpretation; the other taggers give several variants.  
<b>'Служилых'</b>: pymorphy2 gives three equally scored interpretations (as does mystem), while TreeTagger gives one correct interpretation.  
<b>'Бояр'</b>: TreeTagger and mystem give a correct interpretation (with the lemma <b>'боярин'</b>), but pymorphy2 provides three equally scored interpretations, one of them with the lemma <b>'бояр'</b> rather than <b>'боярин'</b>.  
<b>'Введёнными'</b>: pymorphy2 and TreeTagger fail completely, and mystem is the champion! It provides a correct interpretation as a participle of <b>'вводить'</b>.  
<b>'Для'</b>: all taggers provide a correct interpretation as a preposition, but pymorphy2 also offers a wonderful variant: a gerund of <b>'длить'</b>.
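
Several of the differences above are only differences in tagset names. One way to compare the taggers more fairly is to map each tagger's part-of-speech label to a shared coarse label first. The sketch below is a partial, hand-written mapping covering only a few tags mentioned in this notebook. The coarse labels and the `coarse_pos` helper are my own assumptions, not part of any of the three libraries.

```python
import re

# Partial, hand-written mapping (assumption): only a few tags are covered.
COARSE = {
    "pymorphy2": {"NOUN": "NOUN", "ADJF": "ADJ", "NPRO": "PRON", "PREP": "ADP"},
    "mystem": {"S": "NOUN", "A": "ADJ", "APRO": "PRON", "SPRO": "PRON", "PR": "ADP"},
    # TreeTagger: only the first letter of the positional tag is used here.
    "treetagger": {"N": "NOUN", "A": "ADJ", "P": "PRON", "S": "ADP"},
}

def coarse_pos(tagger, tag):
    """Map a tagger-specific tag string to a coarse part-of-speech label."""
    table = COARSE[tagger]
    if tagger == "treetagger":
        key = tag[:1]
    else:
        # pymorphy2 tags look like 'NOUN,inan,masc sing,accs',
        # mystem tags look like 'S,муж,неод=(вин,ед|им,ед)'.
        key = re.split(r"[,= ]", tag, maxsplit=1)[0]
    return table.get(key, "OTHER")

# 'этого': NPRO for pymorphy2, APRO for mystem, 'P--nsga' for TreeTagger.
labels = [
    coarse_pos("pymorphy2", "NPRO"),
    coarse_pos("mystem", "APRO"),
    coarse_pos("treetagger", "P--nsga"),
]
print(labels)
```

With such a mapping, 'этого' counts as an agreement at the coarse level, and only the finer noun-pronoun vs. adjective-pronoun distinction remains a real difference.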


In terms of speed, pymorphy2 is the fastest and TreeTagger the slowest. Overall, my sympathies are with TreeTagger for its concise and accurate output, but mystem looks like the most accurate tagger of the three.
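
For reference, a speed comparison like this can be reproduced with a small timing helper. This is only a sketch: `time_taggers` is a hypothetical helper, and the callable used below is a stand-in so that the code runs without any of the taggers installed.

```python
import time

def time_taggers(taggers, text, repeats=3):
    """Return the best wall-clock time in seconds for each callable in `taggers`."""
    results = {}
    for name, tag in taggers.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            tag(text)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results

# Stand-in callable so the sketch runs on its own; replace it with real calls such as
# lambda t: [morph.parse(w) for w in t.split()], tagger.tag_text or m.analyze.
timings = time_taggers({"whitespace split": str.split}, "в состав думы")
print(timings)
```

Taking the best of several runs reduces the influence of one-off start-up costs, which matter for taggers that launch an external process.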