# KRNNT 

KRNNT is a morphological tagger for Polish based on **recurrent neural networks**. It obtained the best lemmatization accuracy in the [PolEval 2017 competition](http://poleval.pl/). Unlike Morfeusz, it uses the context of a word when computing its lemma.

### How to run the following notebook?

You may install KRNNT locally. The instructions are given on [this GitHub page](https://github.com/kwrobel-nlp/krnnt).

Alternatively, you may use [this Dockerfile](../docker/krnnt). 

To do so, navigate to the root directory of the `nlp_workshop` repository and run the following in your terminal:

```
make docker-build-krnnt
make docker-run-krnnt
```

The commands above will start the Jupyter notebook kernel.

Note: Building the Docker image may take a while.

Now you are ready to run this notebook!

#### 1. Import krnnt and specify settings

In [1]:
from krnnt.keras_models import BEST
from krnnt.pipeline import KRNNTSingle

Using TensorFlow backend.


In [2]:
pref_args = {'keras_batch_size': 32, 
             'internal_neurons': 256, 
             'feature_name': 'tags4e3', 
             'label_name': 'label',
             'keras_model_class': BEST,
             'output_format': 'plain', 
             'weight_path': '../data/reana/weights_reana.hdf5', 
             'lemmatisation_path': '../data/reana/lemmatisation_reana.pkl', 
             'UniqueFeaturesValues': '../data/reana/dictionary_reana.pkl'}
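
Because the settings point at three files on disk, it can be useful to confirm that they exist before building the tagger. The following is a minimal sketch, not part of KRNNT: the helper `missing_model_files` is a hypothetical name introduced here, and it only assumes that the three keys from the settings above hold file paths.

```python
import os

# Keys from the settings dictionary above that are assumed to hold file paths.
PATH_KEYS = ("weight_path", "lemmatisation_path", "UniqueFeaturesValues")


def missing_model_files(settings, keys=PATH_KEYS):
    """Return the keys whose paths are absent or do not point to an existing file."""
    return [key for key in keys if not os.path.isfile(settings.get(key, ""))]


# Hypothetical usage with the same paths as in the settings above.
example_settings = {
    "weight_path": "../data/reana/weights_reana.hdf5",
    "lemmatisation_path": "../data/reana/lemmatisation_reana.pkl",
    "UniqueFeaturesValues": "../data/reana/dictionary_reana.pkl",
}
print("Missing files for:", missing_model_files(example_settings))
```

An empty list means that all three files were found relative to the current working directory.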

In [3]:
krnnt = KRNNTSingle(pref_args)



#### 2. Feed it with a list of sentences

In [8]:
simple_example = 'Ala ma kota.\nAlo, czy to Twój kot?'.split('\n')

In [11]:
results = krnnt.tag_sentences(simple_example) 
results[0] # first sentence

[{'lemmas': ['Ala'],
  'prob': 0.9973284,
  'sep': 'none',
  'tag': 'subst:sg:nom:f',
  'token': 'Ala'},
 {'lemmas': ['mieć'],
  'prob': 0.99783665,
  'sep': 'space',
  'tag': 'fin:sg:ter:imperf',
  'token': 'ma'},
 {'lemmas': ['kot'],
  'prob': 0.9716419,
  'sep': 'space',
  'tag': 'subst:sg:acc:m2',
  'token': 'kota'},
 {'lemmas': ['.'],
  'prob': 0.9999964,
  'sep': 'none',
  'tag': 'interp',
  'token': '.'}]
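
Each `tag` value in the output above is a colon-separated string: the first field is the grammatical class and the remaining fields are category values such as number, case, and gender. A small sketch for taking a tag apart follows; `split_tag` is a hypothetical helper, not a KRNNT function.

```python
def split_tag(tag):
    """Split a colon-separated tag into (grammatical class, list of category values)."""
    grammatical_class, *values = tag.split(":")
    return grammatical_class, values


# Tags copied from the output above.
for tag in ["subst:sg:nom:f", "fin:sg:ter:imperf", "interp"]:
    print(split_tag(tag))
```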

In [5]:
[x['lemmas'] for y in [0, 1] for x in results[y]]

[['Ala'],
 ['mieć'],
 ['kot'],
 ['.'],
 ['Alo'],
 [','],
 ['czy'],
 ['to'],
 ['twój'],
 ['kot'],
 ['?']]
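
The comprehension above indexes sentences 0 and 1 explicitly. A more general sketch iterates over every sentence and pairs each token with its lemma. It assumes the result structure shown earlier (a list of sentences, each a list of dictionaries with `token` and `lemmas` keys) and takes the first entry of `lemmas`; `lemma_pairs` and `sample_results` are names introduced here for illustration, with the sample data copied from the first sentence's output.

```python
def lemma_pairs(results):
    """Flatten tagged sentences into (token, first lemma) pairs."""
    return [
        (token["token"], token["lemmas"][0])
        for sentence in results
        for token in sentence
    ]


# Sample data copied from the output of the first sentence above.
sample_results = [
    [
        {"lemmas": ["Ala"], "prob": 0.9973284, "sep": "none",
         "tag": "subst:sg:nom:f", "token": "Ala"},
        {"lemmas": ["mieć"], "prob": 0.99783665, "sep": "space",
         "tag": "fin:sg:ter:imperf", "token": "ma"},
        {"lemmas": ["kot"], "prob": 0.9716419, "sep": "space",
         "tag": "subst:sg:acc:m2", "token": "kota"},
        {"lemmas": ["."], "prob": 0.9999964, "sep": "none",
         "tag": "interp", "token": "."},
    ]
]

print(lemma_pairs(sample_results))
```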

Note that even in such a simple example the lemmatizer makes a mistake: the vocative form "Alo" is lemmatized as "Alo" rather than "Ala".
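
One way to look for doubtful analyses is to list the tokens whose `prob` value falls below a threshold. This is only a sketch: it assumes that `prob` reflects the model's confidence in the chosen tag, and the threshold of 0.98 is an arbitrary value picked for illustration. A high `prob` does not guarantee a correct lemma.

```python
def low_confidence(sentence, threshold=0.98):
    """Return (token, tag, prob) for tokens whose prob is below the threshold."""
    return [
        (token["token"], token["tag"], token["prob"])
        for token in sentence
        if token["prob"] < threshold
    ]


# Token, tag and prob values copied from the first sentence's output above.
first_sentence = [
    {"token": "Ala", "tag": "subst:sg:nom:f", "prob": 0.9973284},
    {"token": "ma", "tag": "fin:sg:ter:imperf", "prob": 0.99783665},
    {"token": "kota", "tag": "subst:sg:acc:m2", "prob": 0.9716419},
    {"token": ".", "tag": "interp", "prob": 0.9999964},
]

print(low_confidence(first_sentence))
```

With this threshold only "kota" is reported for the first sentence.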

#### 3. Experiment with more examples

In [None]:
# Lemmatize something yourself!