Multi-Task Deep Morph Analyzer

A multi-task learning CNN-RNN model that uses task-optimized phonetic features to predict the lemma, POS category, gender, number, person, case, and tense-aspect-mood (TAM) of Hindi words.




Getting started

Clone the repository

git clone
cd morph_analyzer

Provide the arguments

The file takes the following command-line arguments:

| Argument | Values | Required | Specification |
| --- | --- | --- | --- |
| lang | hindi, urdu | Yes | Language |
| mode | train, test and predict (i.e., no gold labels required) | Yes | Training, testing and predictions. |
| phonetic | True/1/yes/y/t and False/0/no/n/f | No (default=False) | Use MOO-driven phonological features or not. |
| freezing | True/1/yes/y/t and False/0/no/n/f | No (default=False) | Use progressive freezing for training or not (see FreezeOut). |
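The arguments listed above can be mirrored with `argparse`. What follows is a minimal sketch, not the repository's actual parser: the helper names `str2bool` and `build_parser` are hypothetical, and only the argument names, accepted values, and defaults are taken from the argument list.

```python
import argparse


def str2bool(value):
    """Map the documented boolean spellings to a Python bool."""
    text = str(value).strip().lower()
    if text in {"true", "1", "yes", "y", "t"}:
        return True
    if text in {"false", "0", "no", "n", "f"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    """Hypothetical parser covering the four documented arguments."""
    parser = argparse.ArgumentParser(description="Multi-task morphological analyzer")
    parser.add_argument("--lang", required=True, choices=["hindi", "urdu"])
    parser.add_argument("--mode", required=True, choices=["train", "test", "predict"])
    # Both switches are optional and default to False, as documented.
    parser.add_argument("--phonetic", type=str2bool, default=False)
    parser.add_argument("--freezing", type=str2bool, default=False)
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(
    ["--lang", "urdu", "--mode", "train", "--phonetic", "true", "--freezing", "true"]
)
print(args.lang, args.mode, args.phonetic, args.freezing)
```

Passing a custom `type` function rather than `action="store_true"` is what allows spellings such as `yes` or `0` to be given explicitly on the command line.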

The train and test modes operate on the standard train-test split specified by the HDTB and UDTB datasets (see the datasets README), while predict uses the text provided manually in src/[lang]_predict_data/.

Sample run commands:


>>> python --lang urdu --mode train --phonetic true --freezing true #train


>>> python --lang urdu --mode test --phonetic true --freezing true #test


>>> python --lang urdu --mode predict --phonetic true --freezing true #predict

For prediction, the plain text should be provided within src/[lang]_predict_data/test_data.txt.


For the test mode:

  • The predicted roots and features, along with their gold-labelled counterparts, are written to separate files within output/[lang]/: roots.txt, feature_0.txt, ..., feature_6.txt.
  • Micro-averaged precision-recall graphs are stored in graph_outputs/[lang]/.
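For readers unfamiliar with micro-averaging: true positives, false positives, and false negatives are pooled over all classes before precision and recall are computed, and sweeping a score threshold traces out the curve. The sketch below illustrates that definition in plain Python. The function name `micro_precision_recall` and the toy data are illustrative assumptions, and this is not the plotting code used in this repository.

```python
def micro_precision_recall(y_true, y_score, threshold):
    """Pool TP/FP/FN over all samples and classes at one threshold.

    y_true:  rows of 0/1 indicators, one column per class.
    y_score: rows of predicted scores, same shape as y_true.
    """
    tp = fp = fn = 0
    for true_row, score_row in zip(y_true, y_score):
        for truth, score in zip(true_row, score_row):
            predicted = score >= threshold
            if predicted and truth:
                tp += 1
            elif predicted and not truth:
                fp += 1
            elif truth:
                fn += 1
    # By convention, precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example: 3 samples, 3 classes (values are made up for illustration).
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_score = [[0.9, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]

# One (precision, recall) point per threshold.
curve = [micro_precision_recall(y_true, y_score, t) for t in (0.25, 0.5, 0.75)]
```

Because the counts are pooled before dividing, frequent classes contribute more to the micro-averaged curve than rare ones.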

For the predict mode, all the predictions (i.e., roots + features) are written to: output/[lang]/predictions.txt.
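The output layout described above can be summarized programmatically. This is a small sketch, assuming the feature files are numbered 0 through 6 exactly as listed. `expected_outputs` is a hypothetical helper, not part of the repository.

```python
from pathlib import PurePosixPath


def expected_outputs(lang, mode):
    """Return the output paths documented for a given language and mode."""
    base = PurePosixPath("output") / lang
    if mode == "test":
        # roots.txt plus feature_0.txt ... feature_6.txt
        names = ["roots.txt"] + [f"feature_{i}.txt" for i in range(7)]
        return [str(base / name) for name in names]
    if mode == "predict":
        return [str(base / "predictions.txt")]
    # The README does not document output files for other modes.
    raise ValueError(f"no documented output files for mode {mode!r}")


print(expected_outputs("urdu", "predict"))
```

A helper like this could be used to check that a run produced every expected file before any further analysis.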

Graph outputs

Micro-averaged precision-recall curves for each class, arranged by increasing F1 score:



If this repo was helpful in your research, consider citing our work:

  title={Multi Task Deep Morphological Analyzer: Context Aware Joint Morphological Tagging and Lemma Prediction},
  author={Jha, Saurav and Sudhakar, Akhilesh and Singh, Anil Kumar},
  journal={arXiv preprint arXiv:1811.08619},