DEMorphy

DEMorphy is a morphological analyzer for the German language. For each word form it provides the full inflection information (gender, person, number and so on) as well as the lemma.

  • source code and usage docs: GitHub
  • companion German morphological dictionaries: GitHub

Installation

OS X and Linux, directly from GitHub:

$ git clone https://github.com/DuyguA/DEMorphy
$ cd DEMorphy

Download the dictionary file demorphy/data/words.dg and put it back in the same directory of your clone, replacing the file that is already there. Then you are ready to run the setup script:

$ python setup.py install
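
Before running the setup script it can help to confirm that the dictionary file really is in place. The following is a minimal sketch, not part of DEMorphy: the helper name `dictionary_ready` is hypothetical, and it only checks that the file exists and is not empty, which says nothing about whether its contents are valid.

```python
import os


def dictionary_ready(path):
    """Return True if `path` is an existing, non-empty regular file."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


if __name__ == "__main__":
    # Path taken from the installation step above, relative to the clone root.
    words_path = os.path.join("demorphy", "data", "words.dg")
    if dictionary_ready(words_path):
        print("Dictionary found: " + words_path)
    else:
        print("Dictionary missing or empty: " + words_path)
```

Run it from the root of the cloned repository, since the path is relative.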

Usage

Basic usage:

$ python
>>> from demorphy import Analyzer
>>> analyzer = Analyzer(char_subs_allowed=True)
>>> s = analyzer.analyze(u"gegangen")
>>> for analysis in s:
...     print(analysis)
{'CATEGORY': u'ADJ', 'PTB_TAG': u'JJ', 'STTS_TAG': u'ADJD', 'ADDITIONAL_ATTRIBUTES': u'<adv>', 'DEGREE': u'pos', 'LEMMA': u'gegangen'}
{'CATEGORY': u'ADJ', 'PTB_TAG': u'JJ', 'STTS_TAG': u'ADJD', 'ADDITIONAL_ATTRIBUTES': u'<pred>', 'DEGREE': u'pos', 'LEMMA': u'gegangen'}
{'CATEGORY': u'V', 'LEMMA': u'gehen', 'STTS_TAG': u'V', 'TENSE': u'ppast', 'PTB_TAG': u'V'}
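
A word form can have several analyses, so a common next step is to collect the candidate lemmas per part of speech. The sketch below assumes each analysis can be read like a dictionary with the `CATEGORY` and `LEMMA` keys seen in the printed output; if the objects returned by `analyze` expose these values differently, the lookups need adapting. The helper `lemmas_by_category` is hypothetical and works on plain dictionaries so that it runs without DEMorphy installed.

```python
from collections import defaultdict


def lemmas_by_category(analyses):
    """Group the distinct lemmas of a word's analyses by their CATEGORY."""
    grouped = defaultdict(set)
    for analysis in analyses:
        grouped[analysis["CATEGORY"]].add(analysis["LEMMA"])
    # Sort the lemmas so the result is stable across runs.
    return {category: sorted(lemmas) for category, lemmas in grouped.items()}


# Sample data copied from the output above, trimmed to the two keys used here.
sample = [
    {"CATEGORY": "ADJ", "LEMMA": "gegangen"},
    {"CATEGORY": "ADJ", "LEMMA": "gegangen"},
    {"CATEGORY": "V", "LEMMA": "gehen"},
]
print(lemmas_by_category(sample))
```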

Usage with cache decorators:

>>> from demorphy import Analyzer
>>> from demorphy.cache import memoize, lrudecorator
>>> analyzer = Analyzer(char_subs_allowed=True)
>>> cache_size = 200  # set a size, or "unlim" for an unlimited cache; for German we recommend 200
>>> cached = memoize if cache_size=="unlim" else (lrudecorator(cache_size) if cache_size else (lambda x: x))
>>> analyze = cached(analyzer.analyze)
>>> s = analyze(u"gegangen")
>>> for analysis in s:
...     print(analysis)
{'CATEGORY': u'ADJ', 'PTB_TAG': u'JJ', 'STTS_TAG': u'ADJD', 'ADDITIONAL_ATTRIBUTES': u'<adv>', 'DEGREE': u'pos', 'LEMMA': u'gegangen'}
{'CATEGORY': u'ADJ', 'PTB_TAG': u'JJ', 'STTS_TAG': u'ADJD', 'ADDITIONAL_ATTRIBUTES': u'<pred>', 'DEGREE': u'pos', 'LEMMA': u'gegangen'}
{'CATEGORY': u'V', 'LEMMA': u'gehen', 'STTS_TAG': u'V', 'TENSE': u'ppast', 'PTB_TAG': u'V'}
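
The one-line selection above picks an unbounded cache for `"unlim"`, an LRU cache for a positive size, and no cache at all for a falsy size. To illustrate that three-way choice without DEMorphy, here is a sketch built on the standard library's `functools.lru_cache` (Python 3) rather than on `demorphy.cache`; `make_cached` and `slow_analyze` are hypothetical names, and `slow_analyze` merely stands in for `analyzer.analyze`.

```python
from functools import lru_cache


def make_cached(func, cache_size):
    """Wrap `func` using the same three-way choice as the snippet above.

    "unlim" -> unbounded cache, positive int -> LRU cache of that size,
    falsy value (0 or None) -> no caching, `func` is returned unchanged.
    """
    if cache_size == "unlim":
        return lru_cache(maxsize=None)(func)
    if cache_size:
        return lru_cache(maxsize=cache_size)(func)
    return func


calls = []


def slow_analyze(word):
    """Stand-in for analyzer.analyze; records every real call."""
    calls.append(word)
    return (word.upper(),)


analyze = make_cached(slow_analyze, 200)
analyze("gegangen")
analyze("gegangen")  # served from the cache, slow_analyze is not called again
print(len(calls))
```

Note that `lru_cache` requires hashable arguments, which holds for the word strings passed here.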

Iterating over the whole lexicon:

>>> from demorphy import Analyzer
>>> analyzer = Analyzer(char_subs_allowed=True)
>>> ix = analyzer.iter_lexicon_formatted()
>>> for i in ix:
...     print(i)

You can also iterate over only the lexicon words that share a given prefix. The following code iterates over all words that begin with "ge":

>>> ix = analyzer.iter_lexicon_formatted(prefix=u"ge")
>>> for i in ix:
...     print(i)
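
Printing every match can produce a lot of output if the lexicon is large, so it may be handy to look at only the first few entries. The sketch below uses `itertools.islice` for that; `preview` and `fake_lexicon` are hypothetical names, and the generator only stands in for `analyzer.iter_lexicon_formatted(prefix=u"ge")`, whose entry format is not assumed here.

```python
from itertools import islice


def preview(entries, limit=10):
    """Return at most `limit` items from an iterable as a list."""
    return list(islice(entries, limit))


def fake_lexicon():
    """Stand-in generator; with DEMorphy, pass the lexicon iterator instead."""
    for word in ["gehen", "gegangen", "geben", "gelb"]:
        yield word


for entry in preview(fake_lexicon(), limit=2):
    print(entry)
```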

Citing

Altinok, D.: DEMorphy, German Language Analyzer
Berlin, 2018

Authors

  • Duygu Altinok

License

MIT
