# PyMorphy2: the most popular library for morphological parsing of Russian

**Morphological parsing** is a part of **natural language processing (NLP)** that identifies the morphological properties of a given word (or token). The analysis relies only on the information contained in the word form itself; information about the surrounding words is not used. It also includes **lemmatization**, the process of reducing a word to its normal form, the one listed in a dictionary.
 
Here I look at PyMorphy2, a library that does all of this for the Russian language.

## PyMorphy2

This is how the library is installed and imported:

In [1]:
!pip install pymorphy2



In [2]:
import pymorphy2

Morphological analysis is done with a dedicated class, *MorphAnalyzer*.

Here we create an object, *morph*, which is the tool that gives us all the necessary information about a word's properties.

Russian is the default language.

In [3]:
morph = pymorphy2.MorphAnalyzer()
type(morph)

pymorphy2.analyzer.MorphAnalyzer

The **.parse()** method returns a list of all possible morphological analyses of a word:

In [4]:
steklo_all = morph.parse('стекло')
print(steklo_all, '\n')
print(type(steklo_all))
print(type(steklo_all[0]))

[Parse(word='стекло', tag=OpencorporaTag('NOUN,inan,neut sing,nomn'), normal_form='стекло', score=0.690476, methods_stack=((DictionaryAnalyzer(), 'стекло', 157, 0),)), Parse(word='стекло', tag=OpencorporaTag('NOUN,inan,neut sing,accs'), normal_form='стекло', score=0.285714, methods_stack=((DictionaryAnalyzer(), 'стекло', 157, 3),)), Parse(word='стекло', tag=OpencorporaTag('VERB,perf,intr neut,sing,past,indc'), normal_form='стечь', score=0.023809, methods_stack=((DictionaryAnalyzer(), 'стекло', 1015, 3),))] 

<class 'list'>
<class 'pymorphy2.analyzer.Parse'>
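Each analysis carries a **score**, so a common next step is to pick the most probable one. Here is a minimal sketch of that step. To keep it runnable without the library it uses a hypothetical `Analysis` namedtuple as a stand-in for the real `Parse` objects, filled with the values printed above.

```python
from collections import namedtuple

# Hypothetical stand-in for pymorphy2's Parse objects; only the fields used here.
# The values are copied from the output above.
Analysis = namedtuple("Analysis", ["word", "tag", "normal_form", "score"])

steklo_candidates = [
    Analysis("стекло", "NOUN,inan,neut sing,nomn", "стекло", 0.690476),
    Analysis("стекло", "NOUN,inan,neut sing,accs", "стекло", 0.285714),
    Analysis("стекло", "VERB,perf,intr neut,sing,past,indc", "стечь", 0.023809),
]


def most_likely(analyses):
    """Return the analysis with the highest score."""
    return max(analyses, key=lambda a: a.score)


best = most_likely(steklo_candidates)
print(best.normal_form, best.tag)
```

With the real objects the same idea would read `max(steklo_all, key=lambda p: p.score)`. In the output above the list already appears in descending score order, so `steklo_all[0]` gives the same result here.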


Each element of this list has a **tag** - a set of grammemes, the grammatical properties of the word. For example, here are the possible grammeme sets of the ambiguous word "стекло" as a noun and as a verb:

In [5]:
steklo_n = steklo_all[0]
print(steklo_n.tag)
steklo_v = steklo_all[2]
print(steklo_v.tag)

NOUN,inan,neut sing,nomn
VERB,perf,intr neut,sing,past,indc
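To make the idea of a tag as a set of grammemes concrete, here is a small sketch that splits the printed tag strings by hand. It does not use the library. The helper `split_tag` is hypothetical, and the reading of the space as a separator between lexeme-level and form-level grammemes is my assumption about the OpenCorpora tag layout.

```python
def split_tag(tag_string):
    """Split an OpenCorpora-style tag string into two sets of grammemes.

    Assumption: the part before the space describes the whole lexeme,
    the part after it describes this particular word form.
    """
    lexeme_part, _, form_part = tag_string.partition(" ")
    lexeme_grammemes = set(lexeme_part.split(","))
    form_grammemes = set(form_part.split(",")) if form_part else set()
    return lexeme_grammemes, form_grammemes


noun_lex, noun_form = split_tag("NOUN,inan,neut sing,nomn")
verb_lex, verb_form = split_tag("VERB,perf,intr neut,sing,past,indc")

print(sorted(noun_lex), sorted(noun_form))
print(sorted(verb_lex), sorted(verb_form))
```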


Here are some attributes to get a particular grammeme:
* **.POS** (Part Of Speech)
* **.case**
* **.number** (pluralia tantum and singularia tantum nouns carry the special marks *Pltm* and *Sgtm*)
* **.gender** (there are special marks for "common" gender, *ms-f*, and for nouns without an expressed gender, *GNdr*)
* **.tense**
* **.aspect**

In [6]:
print(steklo_v.tag.aspect, '\n', steklo_n.tag.POS,'\n', steklo_v.tag.tense)

perf 
 NOUN 
 past


In [7]:
morph.parse('дрова')[0].tag

OpencorporaTag('NOUN,inan,GNdr,Pltm plur,nomn')

In [8]:
morph.parse('сирота')[0].tag

OpencorporaTag('NOUN,anim,ms-f sing,nomn')

In [9]:
morph.parse('свет')[0].tag

OpencorporaTag('NOUN,inan,masc,Sgtm sing,nomn')

Similar attributes (**.person**, **.transitivity**, **.animacy**, **.mood**, **.voice**) return the person, transitivity, animacy, mood and voice.

It is possible to check whether a word has a particular grammeme by means of the **in** operator:

In [10]:
'Sgtm' in morph.parse('свет')[0].tag

True
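The same membership test can be used to filter a list of analyses. The sketch below is an illustration under assumptions: `FakeParse` is a hypothetical stand-in whose `tag` is a plain `frozenset`, which supports `in` just like the tag in the cell above, and `filter_by_grammemes` is a helper I introduce here, not a library function.

```python
from collections import namedtuple

# Hypothetical stand-in for Parse: `tag` is a frozenset of grammemes.
FakeParse = namedtuple("FakeParse", ["word", "tag", "normal_form"])


def filter_by_grammemes(parses, required):
    """Keep only the parses whose tag contains every grammeme in `required`."""
    return [p for p in parses if all(g in p.tag for g in required)]


# Grammemes copied from the analyses of "стекло" shown earlier.
steklo = [
    FakeParse("стекло", frozenset({"NOUN", "inan", "neut", "sing", "nomn"}), "стекло"),
    FakeParse("стекло", frozenset({"NOUN", "inan", "neut", "sing", "accs"}), "стекло"),
    FakeParse("стекло", frozenset({"VERB", "perf", "intr", "neut", "sing", "past", "indc"}), "стечь"),
]

nouns = filter_by_grammemes(steklo, {"NOUN"})
past_verbs = filter_by_grammemes(steklo, {"VERB", "past"})
print(len(nouns), [p.normal_form for p in past_verbs])
```

Because the helper only relies on `in`, it should also accept the list returned by `morph.parse(...)`.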

The attributes **.normal_form** and **.normalized** return the lemma of a word, as a string and as a parse object respectively:

In [11]:
steklo_v.normal_form

'стечь'

In [12]:
steklo_v.normalized

Parse(word='стечь', tag=OpencorporaTag('INFN,perf,intr'), normal_form='стечь', score=1.0, methods_stack=((DictionaryAnalyzer(), 'стечь', 1015, 0),))
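Lemmatizing a whole list of tokens follows directly from `.normal_form`. In the sketch below the function `lemmatize` and the stub `fake_parse` are hypothetical. The parser is passed in as an argument, so the example runs without the library, and the stub only knows two words taken from this notebook.

```python
from collections import namedtuple

# Hypothetical stand-in for Parse with just the fields needed here.
FakeParse = namedtuple("FakeParse", ["word", "normal_form"])


def lemmatize(tokens, parse):
    """Map each token to the normal form of its first analysis.

    `parse` is any callable that returns a list of objects with a
    `.normal_form` attribute; tokens without analyses are kept as-is.
    """
    lemmas = []
    for token in tokens:
        analyses = parse(token)
        lemmas.append(analyses[0].normal_form if analyses else token)
    return lemmas


def fake_parse(word):
    """Stub parser: a tiny lookup table instead of a real dictionary."""
    table = {"облаками": "облако", "стекло": "стекло"}
    lemma = table.get(word)
    return [FakeParse(word, lemma)] if lemma else []


lemmas = lemmatize(["стекло", "облаками", "xyz"], fake_parse)
print(lemmas)
```

With the real analyzer the call would be `lemmatize(tokens, morph.parse)`. Taking the first analysis ignores context, so an ambiguous form such as "стекло" always gets the lemma of that first analysis.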

If a word is unknown to the parser, it tries to predict which grammemes the word has:

In [13]:
for i in morph.parse('кудрячит'):
    print(i.tag, i.normal_form)

VERB,impf,intr sing,3per,pres,indc кудрячать
VERB,perf,tran sing,3per,futr,indc кудрячить
VERB,impf,tran sing,3per,pres,indc кудрячить
VERB,perf,tran sing,3per,futr,indc кудрячить
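The output above contains a repeated line, so the predictions may need deduplicating before further use. A minimal sketch on plain `(tag, lemma)` pairs copied from that output, with no library calls:

```python
# (tag, lemma) pairs copied from the output above.
predictions = [
    ("VERB,impf,intr sing,3per,pres,indc", "кудрячать"),
    ("VERB,perf,tran sing,3per,futr,indc", "кудрячить"),
    ("VERB,impf,tran sing,3per,pres,indc", "кудрячить"),
    ("VERB,perf,tran sing,3per,futr,indc", "кудрячить"),
]

# dict.fromkeys keeps the first occurrence of each key and preserves order.
unique_predictions = list(dict.fromkeys(predictions))
candidate_lemmas = list(dict.fromkeys(lemma for _, lemma in predictions))

print(len(unique_predictions))
print(candidate_lemmas)
```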


The **.inflect()** method puts a word into a particular word form of its lexeme. NB: grammemes that are not requested are taken from the parse the method is called on, so here it is applied to a lemmatized parse object:

In [14]:
cloud = morph.parse('облаками')[0].normalized
cloud.inflect({'loct'})

Parse(word='облаке', tag=OpencorporaTag('NOUN,inan,neut sing,loct'), normal_form='облако', score=1.0, methods_stack=((DictionaryAnalyzer(), 'облаке', 2242, 5),))
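Conceptually, inflection can be seen as choosing, among the forms of a lexeme, one whose tag contains the requested grammemes. The sketch below illustrates only that idea and is not the library's implementation. The helper `find_form` is hypothetical, and the table holds the six singular forms of "облако" written by hand.

```python
# Grammemes shared by all singular forms of "облако".
BASE = {"NOUN", "inan", "neut", "sing"}

# (word, grammemes) pairs for the singular paradigm, written by hand.
OBLAKO_SINGULAR = [
    (word, BASE | {case})
    for word, case in [
        ("облако", "nomn"),
        ("облака", "gent"),
        ("облаку", "datv"),
        ("облако", "accs"),
        ("облаком", "ablt"),
        ("облаке", "loct"),
    ]
]


def find_form(forms, required):
    """Return the first word whose grammemes include all of `required`, or None."""
    for word, grammemes in forms:
        if required <= grammemes:
            return word
    return None


print(find_form(OBLAKO_SINGULAR, {"loct"}))
print(find_form(OBLAKO_SINGULAR, {"plur"}))
```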

The **.lexeme** attribute returns the whole set of word forms of the lexeme:

In [15]:
cloud.lexeme

[Parse(word='облако', tag=OpencorporaTag('NOUN,inan,neut sing,nomn'), normal_form='облако', score=1.0, methods_stack=((DictionaryAnalyzer(), 'облако', 2242, 0),)),
 Parse(word='облака', tag=OpencorporaTag('NOUN,inan,neut sing,gent'), normal_form='облако', score=1.0, methods_stack=((DictionaryAnalyzer(), 'облака', 2242, 1),)),
 Parse(word='облаку', tag=OpencorporaTag('NOUN,inan,neut sing,datv'), normal_form='облако', score=1.0, methods_stack=((DictionaryAnalyzer(), 'облаку', 2242, 2),)),
 Parse(word='облако', tag=OpencorporaTag('NOUN,inan,neut sing,accs'), normal_form='облако', score=1.0, methods_stack=((DictionaryAnalyzer(), 'облако', 2242, 3),)),
 Parse(word='облаком', tag=OpencorporaTag('NOUN,inan,neut sing,ablt'), normal_form='облако', score=1.0, methods_stack=((DictionaryAnalyzer(), 'облаком', 2242, 4),)),
 Parse(word='облаке', tag=OpencorporaTag('NOUN,inan,neut sing,loct'), normal_form='облако', score=1.0, methods_stack=((DictionaryAnalyzer(), 'облаке', 2242, 5),)),
 Parse(word='о