DEMorphy

DEMorphy is a morphological analyzer for the German language. It provides full inflection information (gender, person, number and so on) as well as the lemma of each word.

Source code and usage docs: GitHub
Companion German morphological dictionaries: GitHub

Installation

OS X and Linux, directly from GitHub:

$ git clone https://github.com/DuyguA/DEMorphy
$ cd DEMorphy

Download the dictionary file demorphy/data/words.dg and put it in the corresponding directory of your clone, replacing the file that is already there. Then you are ready to run the setup script:

$ python setup.py install

Usage

Basic usage:

$ python

>>> from demorphy import Analyzer
>>> analyzer = Analyzer(char_subs_allowed=True)
>>> s = analyzer.analyze(u"gegangen")
>>> for analysis in s:
...     print(analysis)
...
{'CATEGORY': u'ADJ', 'PTB_TAG': u'JJ', 'STTS_TAG': u'ADJD', 'ADDITIONAL_ATTRIBUTES': u'<adv>', 'DEGREE': u'pos', 'LEMMA': u'gegangen'}
{'CATEGORY': u'ADJ', 'PTB_TAG': u'JJ', 'STTS_TAG': u'ADJD', 'ADDITIONAL_ATTRIBUTES': u'<pred>', 'DEGREE': u'pos', 'LEMMA': u'gegangen'}
{'CATEGORY': u'V', 'LEMMA': u'gehen', 'STTS_TAG': u'V', 'TENSE': u'ppast', 'PTB_TAG': u'V'}
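
Each analysis in the output above prints like a dictionary, so it can be post-processed with ordinary dictionary operations. The following is a minimal sketch, assuming the analyses behave like plain dicts with the keys shown above; the helper name `lemmas_by_category` is hypothetical and not part of DEMorphy, and the sample data is copied from the output above rather than produced by a live call.

```python
# Sample analyses copied from the output above (u'' prefixes dropped).
# Assumption: real results can be read like plain dicts with these keys.
analyses = [
    {'CATEGORY': 'ADJ', 'PTB_TAG': 'JJ', 'STTS_TAG': 'ADJD',
     'ADDITIONAL_ATTRIBUTES': '<adv>', 'DEGREE': 'pos', 'LEMMA': 'gegangen'},
    {'CATEGORY': 'ADJ', 'PTB_TAG': 'JJ', 'STTS_TAG': 'ADJD',
     'ADDITIONAL_ATTRIBUTES': '<pred>', 'DEGREE': 'pos', 'LEMMA': 'gegangen'},
    {'CATEGORY': 'V', 'LEMMA': 'gehen', 'STTS_TAG': 'V',
     'TENSE': 'ppast', 'PTB_TAG': 'V'},
]


def lemmas_by_category(analyses):
    """Group the distinct lemmas of a word by their CATEGORY value."""
    grouped = {}
    for analysis in analyses:
        lemmas = grouped.setdefault(analysis['CATEGORY'], [])
        if analysis['LEMMA'] not in lemmas:
            lemmas.append(analysis['LEMMA'])
    return grouped


print(lemmas_by_category(analyses))
```

For "gegangen" this separates the adjectival reading (lemma "gegangen") from the verbal one (lemma "gehen"), which is usually the first decision a downstream lemmatizer has to make.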

Usage with cache decorators:

>>> from demorphy import Analyzer
>>> from demorphy.cache import memoize, lrudecorator
>>> analyzer = Analyzer(char_subs_allowed=True)
>>> cache_size = 200  # an integer for an LRU cache of that size, or "unlim" for an unbounded cache; we recommend 200 for German
>>> cached = memoize if cache_size=="unlim" else (lrudecorator(cache_size) if cache_size else (lambda x: x))
>>> analyze = cached(analyzer.analyze)
>>> s = analyze(u"gegangen")
>>> for analysis in s:
...     print(analysis)
...
{'CATEGORY': u'ADJ', 'PTB_TAG': u'JJ', 'STTS_TAG': u'ADJD', 'ADDITIONAL_ATTRIBUTES': u'<adv>', 'DEGREE': u'pos', 'LEMMA': u'gegangen'}
{'CATEGORY': u'ADJ', 'PTB_TAG': u'JJ', 'STTS_TAG': u'ADJD', 'ADDITIONAL_ATTRIBUTES': u'<pred>', 'DEGREE': u'pos', 'LEMMA': u'gegangen'}
{'CATEGORY': u'V', 'LEMMA': u'gehen', 'STTS_TAG': u'V', 'TENSE': u'ppast', 'PTB_TAG': u'V'}
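
To illustrate what wrapping `analyze` in a bounded cache buys you, here is a rough sketch that uses `functools.lru_cache` from the Python 3 standard library as a stand-in. It does not show how `demorphy.cache` is implemented; `slow_analyze` is a hypothetical placeholder for `analyzer.analyze`, so the example runs without the dictionary installed.

```python
from functools import lru_cache

calls = []  # records every word that actually reaches the analyzer


def slow_analyze(word):
    """Hypothetical stand-in for analyzer.analyze (not the DEMorphy API)."""
    calls.append(word)
    return ("analysis of " + word,)


cache_size = 200
# Same wrapping pattern as above, with the standard-library LRU cache.
cached_analyze = lru_cache(maxsize=cache_size)(slow_analyze)

for word in ["gegangen", "gehen", "gegangen", "gegangen"]:
    cached_analyze(word)

print(calls)                        # only the first occurrence of each word
print(cached_analyze.cache_info())  # hit and miss counters
```

Repeated word forms are answered from the cache, so the underlying analysis runs once per distinct form until the cache fills up and the least recently used entries are evicted.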

Iterating over the whole lexicon:

>>> from demorphy import Analyzer
>>> analyzer = Analyzer(char_subs_allowed=True)
>>> ix = analyzer.iter_lexicon_formatted()
>>> for i in ix:
...     print(i)
...

You can also iterate over only those lexicon words that share a given prefix. The following code iterates over all the words that begin with "ge":

>>> ix = analyzer.iter_lexicon_formatted(prefix=u"ge")
>>> for i in ix:
...     print(i)
...
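
Because the lexicon is consumed through a plain `for` loop, a preview of the first few entries can be taken with `itertools.islice` instead of printing everything. This is only a sketch: `fake_lexicon` is a hypothetical stand-in for `analyzer.iter_lexicon_formatted`, with a made-up five-word list, and the real method may yield richer entries than bare strings.

```python
from itertools import islice


def fake_lexicon(prefix=u""):
    """Hypothetical stand-in for analyzer.iter_lexicon_formatted."""
    words = [u"gabel", u"geben", u"gegangen", u"gehen", u"haus"]
    # Yield lazily, keeping only the words that start with the prefix.
    return (w for w in words if w.startswith(prefix))


# Take at most two matching entries without exhausting the iterator.
first_two = list(islice(fake_lexicon(prefix=u"ge"), 2))
print(first_two)
```

With the real analyzer, the same `islice` call would be applied to the iterator returned by `iter_lexicon_formatted(prefix=u"ge")`.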

Citing

Altinok, D.: DEMorphy, German Language Analyzer
Berlin, 2018

Authors

Duygu Altinok

License

MIT
