DEMorphy

DEMorphy is a morphological analyzer for the German language. It provides full inflection information (gender, person, number, and so on) as well as the lemma of each word.

  • source code and usage docs: GitHub
  • companion German morphological dictionaries: GitHub

Installation

OS X and Linux, directly from GitHub:

$ git clone https://github.com/DuyguA/DEMorphy
$ cd DEMorphy

Download the dictionary file demorphy/data/words.dg and put it back in the same directory of your clone, replacing the copy that is there. Then you are ready to run the setup script:

$ python setup.py install

Usage

Basic usage:

$ python
>>> from demorphy import Analyzer
>>> analyzer = Analyzer(char_subs_allowed=True)
>>> s = analyzer.analyze(u"gegangen")
>>> for analysis in s:
...     print(analysis)
{'CATEGORY': u'ADJ', 'PTB_TAG': u'JJ', 'STTS_TAG': u'ADJD', 'ADDITIONAL_ATTRIBUTES': u'<adv>', 'DEGREE': u'pos', 'LEMMA': u'gegangen'}
{'CATEGORY': u'ADJ', 'PTB_TAG': u'JJ', 'STTS_TAG': u'ADJD', 'ADDITIONAL_ATTRIBUTES': u'<pred>', 'DEGREE': u'pos', 'LEMMA': u'gegangen'}
{'CATEGORY': u'V', 'LEMMA': u'gehen', 'STTS_TAG': u'V', 'TENSE': u'ppast', 'PTB_TAG': u'V'}
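Because a single surface form can have several analyses, it is often useful to group them before further processing. The following is a minimal sketch, assuming each analysis can be read like the dictionaries printed above. The sample data is copied from that output rather than produced by DEMorphy, and `lemmas_by_category` is a hypothetical helper, not part of the library.

```python
from collections import defaultdict

# Sample analyses copied from the printed output above. Treating them as
# plain dicts is an assumption about what analyzer.analyze() returns.
SAMPLE_ANALYSES = [
    {'CATEGORY': 'ADJ', 'PTB_TAG': 'JJ', 'STTS_TAG': 'ADJD',
     'ADDITIONAL_ATTRIBUTES': '<adv>', 'DEGREE': 'pos', 'LEMMA': 'gegangen'},
    {'CATEGORY': 'ADJ', 'PTB_TAG': 'JJ', 'STTS_TAG': 'ADJD',
     'ADDITIONAL_ATTRIBUTES': '<pred>', 'DEGREE': 'pos', 'LEMMA': 'gegangen'},
    {'CATEGORY': 'V', 'LEMMA': 'gehen', 'STTS_TAG': 'V',
     'TENSE': 'ppast', 'PTB_TAG': 'V'},
]


def lemmas_by_category(analyses):
    """Hypothetical helper: map each CATEGORY to its sorted, unique lemmas."""
    grouped = defaultdict(set)
    for analysis in analyses:
        grouped[analysis['CATEGORY']].add(analysis['LEMMA'])
    return {category: sorted(lemmas) for category, lemmas in grouped.items()}


result = lemmas_by_category(SAMPLE_ANALYSES)
print(result)
```

For "gegangen" this separates the adjectival reading (lemma "gegangen") from the verbal one (lemma "gehen").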

Usage with cache decorators:

>>> from demorphy import Analyzer
>>> from demorphy.cache import memoize, lrudecorator
>>> analyzer = Analyzer(char_subs_allowed=True)
>>> cache_size = 200  # an integer for an LRU cache, "unlim" for an unbounded cache, 0 or None for no cache; we recommend 200 for German
>>> cached = memoize if cache_size=="unlim" else (lrudecorator(cache_size) if cache_size else (lambda x: x))
>>> analyze = cached(analyzer.analyze)
>>> s = analyze(u"gegangen")
>>> for analysis in s:
...     print(analysis)
{'CATEGORY': u'ADJ', 'PTB_TAG': u'JJ', 'STTS_TAG': u'ADJD', 'ADDITIONAL_ATTRIBUTES': u'<adv>', 'DEGREE': u'pos', 'LEMMA': u'gegangen'}
{'CATEGORY': u'ADJ', 'PTB_TAG': u'JJ', 'STTS_TAG': u'ADJD', 'ADDITIONAL_ATTRIBUTES': u'<pred>', 'DEGREE': u'pos', 'LEMMA': u'gegangen'}
{'CATEGORY': u'V', 'LEMMA': u'gehen', 'STTS_TAG': u'V', 'TENSE': u'ppast', 'PTB_TAG': u'V'}
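To illustrate what wrapping `analyze` in a cache buys you, here is a small sketch that uses the standard library's `functools.lru_cache` in place of `demorphy.cache.lrudecorator`. This is a substitution for illustration only; the two are assumed to behave similarly but this has not been verified. `slow_analyze` is a hypothetical stand-in for `analyzer.analyze`.

```python
from functools import lru_cache

calls = []  # records every word that actually reaches the wrapped function


def slow_analyze(word):
    """Hypothetical stand-in for analyzer.analyze; not the real analyzer."""
    calls.append(word)
    return (word.upper(),)


# Bounded cache of 200 entries, mirroring the cache size recommended above.
cached_analyze = lru_cache(maxsize=200)(slow_analyze)

for word in ["gegangen", "gehen", "gegangen", "gegangen"]:
    cached_analyze(word)

info = cached_analyze.cache_info()
print(calls)                    # words that were really analyzed
print(info.hits, info.misses)   # repeated lookups are served from the cache
```

Running text repeats frequent word forms, so even a small cache can avoid a large share of dictionary lookups.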

Iterating over the whole lexicon:

>>> from demorphy import Analyzer
>>> analyzer = Analyzer(char_subs_allowed=True)
>>> ix = analyzer.iter_lexicon_formatted()
>>> for i in ix:
...     print(i)

You can also iterate over only those lexicon words that share a given prefix. The following code iterates over all words that begin with "ge":

>>> ix = analyzer.iter_lexicon_formatted(prefix=u"ge")
>>> for i in ix:
...     print(i)
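As a rough picture of what prefix iteration does, the sketch below walks a small sorted word list and yields only the entries that start with the prefix. It is an illustration over toy data, not DEMorphy's implementation. `TOY_LEXICON` and `iter_with_prefix` are hypothetical names, and the real lexicon storage may work quite differently.

```python
from bisect import bisect_left

# Toy stand-in for the lexicon; the real one lives in demorphy/data/words.dg.
TOY_LEXICON = sorted(["gab", "geben", "gegangen", "gehen", "gut", "haben"])


def iter_with_prefix(sorted_words, prefix):
    """Yield the words in a sorted list that start with the given prefix."""
    # Binary search finds the first word that is >= prefix.
    start = bisect_left(sorted_words, prefix)
    for word in sorted_words[start:]:
        # In a sorted list all matches are contiguous, so stop at the first miss.
        if not word.startswith(prefix):
            break
        yield word


matches = list(iter_with_prefix(TOY_LEXICON, "ge"))
print(matches)
```

Because matches are contiguous in sorted order, the loop never scans words beyond the prefix range.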

Citing

Altinok, D.: DEMorphy, German Language Analyzer
Berlin, 2018

Authors

  • Duygu Altinok

License

MIT
