
Multi-Task Deep Morph Analyzer

A multi-task learning CNN-RNN model that, combined with task-optimized phonetic features, predicts the lemma, POS category, gender, number, person, case, and tense-aspect-mood (TAM) of Hindi and Urdu words.


Framework

[Figure: model framework]

Getting started

Clone the repository

git clone git@github.com:Saurav0074/morph_analyzer.git
cd morph_analyzer

Provide the arguments

The file main.py takes the following command-line arguments:

  • --lang: hindi or urdu. Required. Language of the input.
  • --mode: train, test, or predict (predict requires no gold labels). Required. Selects training, testing, or prediction.
  • --phonetic: True/1/yes/y/t or False/0/no/n/f. Optional (default: False). Whether to use MOO-driven phonological features.
  • --freezing: same values as --phonetic. Optional (default: False). Whether to use progressive freezing during training (see FreezeOut).
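The accepted boolean spellings above follow a common argparse pattern. The sketch below shows one way such an interface could be parsed. It is not the code in main.py. The helper names str2bool and build_parser are hypothetical, and only the flag names and values come from the list above.

```python
import argparse


def str2bool(value):
    """Map the accepted spellings (True/1/yes/y/t, False/0/no/n/f) to a bool."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes", "y", "t"):
        return True
    if lowered in ("false", "0", "no", "n", "f"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(description="Multi-task morph analyzer (sketch)")
    parser.add_argument("--lang", required=True, choices=["hindi", "urdu"])
    parser.add_argument("--mode", required=True, choices=["train", "test", "predict"])
    # Both switches are optional and default to False, as in the list above.
    parser.add_argument("--phonetic", type=str2bool, default=False)
    parser.add_argument("--freezing", type=str2bool, default=False)
    return parser
```

For example, build_parser().parse_args(["--lang", "urdu", "--mode", "train", "--phonetic", "yes"]) yields phonetic=True and freezing=False.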

The train and test modes use the standard train-test splits of the HDTB and UDTB datasets (see the datasets README), while predict uses the text provided manually in src/[lang]_predict_data/.

Sample run commands:

Training:

python main.py --lang urdu --mode train --phonetic true --freezing true

Testing:

python main.py --lang urdu --mode test --phonetic true --freezing true

Predicting:

python main.py --lang urdu --mode predict --phonetic true --freezing true
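To run the three stages one after another, a small driver script can build the same command lines. This is a minimal sketch under stated assumptions. The helpers build_command and run_pipeline are hypothetical. The script assumes it runs from the repository root and uses the current interpreter (sys.executable) in place of a bare python.

```python
import subprocess
import sys


def build_command(lang, mode, phonetic=True, freezing=True):
    """Return the argv list for one run of main.py (mirrors the sample commands)."""
    if lang not in ("hindi", "urdu"):
        raise ValueError(f"unsupported language: {lang}")
    if mode not in ("train", "test", "predict"):
        raise ValueError(f"unknown mode: {mode}")
    return [
        sys.executable, "main.py",
        "--lang", lang,
        "--mode", mode,
        "--phonetic", str(phonetic).lower(),  # True -> "true"
        "--freezing", str(freezing).lower(),
    ]


def run_pipeline(lang="urdu", modes=("train", "test", "predict"), dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for mode in modes:
        cmd = build_command(lang, mode)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop if a stage exits with a non-zero status.
            subprocess.run(cmd, check=True)
```

Keeping dry_run=True as the default lets you inspect the commands before starting a long training run.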

For prediction, the plain text must be placed in src/[lang]_predict_data/test_data.txt.
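The input file can also be prepared from a script. This is a sketch only: the helper name write_predict_input is hypothetical. It assumes the file is UTF-8 encoded and that overwriting any existing test_data.txt is acceptable. The README only says the input is plain text.

```python
from pathlib import Path


def write_predict_input(lang, text, repo_root="."):
    """Write plain text to src/<lang>_predict_data/test_data.txt and return the path."""
    if lang not in ("hindi", "urdu"):
        raise ValueError(f"unsupported language: {lang}")
    target = Path(repo_root) / "src" / f"{lang}_predict_data" / "test_data.txt"
    # Create the directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: UTF-8, since Hindi (Devanagari) and Urdu (Perso-Arabic) text is non-ASCII.
    target.write_text(text, encoding="utf-8")
    return target
```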

Outputs

For the test mode:

  • the predicted roots and features, together with their gold-labelled counterparts, are written to separate files in output/[lang]/: roots.txt, feature_0.txt, ..., feature_6.txt.
  • Micro-averaged precision-recall graphs are stored in graph_outputs/[lang]/.
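Micro-averaging pools every (word, class) cell into a single binary problem before sweeping the decision threshold. The pure-Python sketch below illustrates that computation. It is not necessarily how the graphs in graph_outputs/ were produced, since the repository's plotting code may use a library routine instead. The function name micro_pr_curve and its input layout (one-hot gold rows, per-class score rows) are assumptions.

```python
def micro_pr_curve(y_true, y_score):
    """Micro-averaged precision-recall points for a multi-class problem.

    y_true:  list of one-hot label rows, one row per word.
    y_score: list of score rows of the same shape (e.g. softmax outputs).
    Returns (precisions, recalls), one point per distinct score threshold,
    ordered from the highest threshold to the lowest.
    """
    # Pool all (word, class) cells into one list of (score, is_positive) pairs.
    pairs = [
        (score, label)
        for true_row, score_row in zip(y_true, y_score)
        for label, score in zip(true_row, score_row)
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)
    total_pos = sum(label for _, label in pairs)
    if total_pos == 0:
        raise ValueError("no positive labels")
    precisions, recalls = [], []
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all cells tied at this score are counted.
        if i + 1 < len(pairs) and pairs[i + 1][0] == score:
            continue
        precisions.append(tp / (tp + fp))
        recalls.append(tp / total_pos)
    return precisions, recalls
```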

For the predict mode, all the predictions (i.e., roots + features) are written to: output/[lang]/predictions.txt.

Graph outputs

Micro-averaged precision-recall curves for each class, arranged by increasing F1 score:

[Figure: pr-curves]
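Ordering classes by increasing F1, as in the figure above, requires a per-class F1 score. Each score can be computed as F1 = 2TP / (2TP + FP + FN). The sketch below works on one gold and one predicted label sequence for a single feature. The function names and the sample labels are illustrative assumptions, not the repository's code.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for every label seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed the gold label g
    scores = {}
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def classes_by_increasing_f1(gold, pred):
    """List (label, F1) pairs from the weakest class to the strongest."""
    scores = per_class_f1(gold, pred)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))
```

For example, with gold ["sg", "sg", "pl", "pl"] and predictions ["sg", "pl", "pl", "pl"], "sg" scores 2/3 and "pl" scores 0.8, so "sg" is listed first.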

Citation

If this repo was helpful in your research, consider citing our work:

@article{jha2018multi,
  title={Multi Task Deep Morphological Analyzer: Context Aware Joint Morphological Tagging and Lemma Prediction},
  author={Jha, Saurav and Sudhakar, Akhilesh and Singh, Anil Kumar},
  journal={arXiv preprint arXiv:1811.08619},
  year={2018}
}