IWNLP-py

IWNLP-py is a Python port of IWNLP.Lemmatizer. It lemmatizes German words using data that IWNLP extracts from the German Wiktionary.

How to set up IWNLP-py

Use pip to install iwnlp

pip install iwnlp

Download the latest processed IWNLP dump from https://dbs.cs.uni-duesseldorf.de/datasets/iwnlp/IWNLP.Lemmatizer_20181001.zip and unzip it.
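If you prefer to unpack the archive from Python, a minimal sketch could look like the following. The helper name `extract_dump` and the `data` target directory are assumptions made for illustration; the sketch also assumes the archive has already been downloaded and contains the lemmatizer JSON file.

```python
import zipfile
from pathlib import Path


def extract_dump(zip_path, target_dir="data"):
    """Unpack an IWNLP dump archive and return the extracted JSON files.

    `extract_dump` is a hypothetical helper, not part of the iwnlp package.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Assumption: the archive contains the lemmatizer data as a .json file.
    return sorted(target.rglob("*.json"))
```

The returned path can then be passed as `lemmatizer_path` when creating the `IWNLPWrapper`.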

How to use IWNLP-py

The Python package consists of the IWNLPWrapper class. Keep in mind that the lemmatizer will return None for unknown words rather than guessing a lemma. If more than one lemma is found, all lemmas are returned. In order to lemmatize single words, you can choose between two functions:

lemmatize: Use this function if POS tags are available for your words. The tags must follow Google's universal POS tagset. The lemmatization performance is tuned to be as high as possible, as listed here. Our paper describes our approach in more detail. Keep in mind that our results have improved a lot over the last two years.

def lemmatize(self, word, pos_universal_google):

Usage:

from iwnlp.iwnlp_wrapper import IWNLPWrapper
lemmatizer = IWNLPWrapper(lemmatizer_path='data/IWNLP.Lemmatizer_20181001.json')
lemmatizer.lemmatize('Lkws', pos_universal_google='NOUN')
# ['Lkw']
lemmatizer.lemmatize('Onlineauftritten', pos_universal_google='NOUN')
# ['Onlineauftritt']
lemmatizer.lemmatize('gespielt', pos_universal_google='VERB')
# ['spielen']
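Because unknown words yield None, calling code usually needs a fallback. The sketch below shows one possible approach: keep the original word when no lemma is found. The helper name `lemmas_or_fallback` is hypothetical and not part of the package; it only relies on the `lemmatize` signature shown above.

```python
def lemmas_or_fallback(lemmatizer, word, pos):
    """Return all lemmas for `word`, or [word] if IWNLP does not know it.

    `lemmatizer` is expected to be an IWNLPWrapper instance (or any object
    with a compatible `lemmatize` method).
    """
    lemmas = lemmatizer.lemmatize(word, pos_universal_google=pos)
    if lemmas is None:
        # Unknown word: fall back to the surface form instead of guessing.
        return [word]
    return lemmas
```

Whether falling back to the surface form is appropriate depends on your application; some pipelines may prefer to keep None and handle unknown words separately.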

lemmatize_plain: If you don't have access to POS tags or don't want to use them, you can pass the word without a POS tag and retrieve every lemma for it that is present in IWNLP. You may also specify whether the lookup should be case sensitive, which it is by default.

def lemmatize_plain(self, word, ignore_case=False):

Usage:

from iwnlp.iwnlp_wrapper import IWNLPWrapper
lemmatizer = IWNLPWrapper(lemmatizer_path='data/IWNLP.Lemmatizer_20181001.json')
lemmatizer.lemmatize_plain('birne')
# no result since the noun is lowercased
lemmatizer.lemmatize_plain('birne', ignore_case=True)
# ['Birne']
lemmatizer.lemmatize_plain('zerstreut', ignore_case=True)
# ['zerstreut', 'zerstreuen']
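One way to combine both modes is to try the case-sensitive lookup first and only ignore case when it finds nothing. This is a sketch of that idea; the helper name `lemmatize_plain_with_fallback` is hypothetical and not part of the package.

```python
def lemmatize_plain_with_fallback(lemmatizer, word):
    """Look up `word` case-sensitively, then retry while ignoring case.

    `lemmatizer` is expected to be an IWNLPWrapper instance (or any object
    with a compatible `lemmatize_plain` method).
    """
    lemmas = lemmatizer.lemmatize_plain(word)
    if not lemmas:
        # No exact match: retry without case sensitivity, e.g. for
        # lowercased nouns such as 'birne'.
        lemmas = lemmatizer.lemmatize_plain(word, ignore_case=True)
    return lemmas
```

The result is still None when neither lookup finds the word, so callers need to handle that case.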

Citation

Please include the following BibTeX if you use IWNLP in your work:

@InProceedings{liebeck-conrad:2015:ACL-IJCNLP,
  author    = {Liebeck, Matthias  and  Conrad, Stefan},
  title     = {{IWNLP: Inverse Wiktionary for Natural Language Processing}},
  booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
  year      = {2015},
  publisher = {Association for Computational Linguistics},
  pages     = {414--418},
  url       = {http://www.aclweb.org/anthology/P15-2068}
}
