German lemmatization with IWNLP as extension for spaCy

spacy-iwnlp


This package uses the spaCy 2.0 extension mechanism to add IWNLP-py as a German lemmatizer directly to your spaCy pipeline.

Please report bugs in spacy-iwnlp as issues in the IWNLP-py repository.

Usage

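import spacy
from spacy_iwnlp import spaCyIWNLP

# Load the German spaCy model
nlp = spacy.load('de')
# Point the component at the unzipped IWNLP dump (see Installation)
iwnlp = spaCyIWNLP(lemmatizer_path='data/IWNLP.Lemmatizer_20181001.json')
# Append the IWNLP component to the pipeline
nlp.add_pipe(iwnlp)
doc = nlp('Wir mögen Fußballspiele mit ausgedehnten Verlängerungen.')
for token in doc:
    # iwnlp_lemmas is the custom token attribute set by the component
    print('POS: {}\tIWNLP:{}'.format(token.pos_, token._.iwnlp_lemmas))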
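
The attribute name iwnlp_lemmas is plural, so downstream code usually has to reduce it to one lemma per token. The following is a minimal sketch of one way to do that. It assumes that the attribute holds a list of candidate lemmas and is None (or empty) when IWNLP has no entry for a token; pick_lemma is a hypothetical helper for illustration and is not part of spacy-iwnlp.

```python
from typing import Optional, Sequence


def pick_lemma(text: str, lemmas: Optional[Sequence[str]]) -> str:
    """Return the first candidate lemma, or the surface form if there is none."""
    # Assumption: `lemmas` is None or empty when no lemma was found.
    if not lemmas:
        return text
    # Taking the first candidate is an arbitrary choice made for this sketch.
    return lemmas[0]


if __name__ == "__main__":
    print(pick_lemma("Fußballspiele", ["Fußballspiel"]))
    print(pick_lemma("Wir", None))
```

With the doc object from the usage example, this could be applied as [pick_lemma(token.text, token._.iwnlp_lemmas) for token in doc]. Falling back to the surface form keeps the output aligned with the tokens; other fallbacks, such as spaCy's own token.lemma_, may suit some applications better.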

Installation

  1. Use pip to install spacy-iwnlp:
pip install spacy-iwnlp
  2. Download the latest processed IWNLP dump from http://lager.cs.uni-duesseldorf.de/NLP/IWNLP/IWNLP.Lemmatizer_20181001.zip and unzip it.
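
The download step can also be scripted. The sketch below uses only the Python standard library; the function names download_dump and extract_dump and the data/ target directory are assumptions chosen to match the path used in the usage example, not something the package provides. The layout inside the archive is not stated here, so check the printed paths before passing one as lemmatizer_path.

```python
import urllib.request
import zipfile
from pathlib import Path

DUMP_URL = "http://lager.cs.uni-duesseldorf.de/NLP/IWNLP/IWNLP.Lemmatizer_20181001.zip"


def download_dump(url: str, zip_path: Path) -> Path:
    """Download the zipped dump to zip_path and return that path."""
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(zip_path))
    return zip_path


def extract_dump(zip_path: Path, target_dir: Path) -> list:
    """Unzip the archive into target_dir and return the extracted paths."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return [target_dir / name for name in archive.namelist()]


if __name__ == "__main__":
    # Hypothetical local layout: keep the archive and its contents under data/.
    archive_path = download_dump(DUMP_URL, Path("data/IWNLP.Lemmatizer_20181001.zip"))
    for path in extract_dump(archive_path, Path("data")):
        print(path)
```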

Citation

Please include the following BibTeX if you use IWNLP in your work:

@InProceedings{liebeck-conrad:2015:ACL-IJCNLP,
  author    = {Liebeck, Matthias  and  Conrad, Stefan},
  title     = {{IWNLP: Inverse Wiktionary for Natural Language Processing}},
  booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
  year      = {2015},
  publisher = {Association for Computational Linguistics},
  pages     = {414--418},
  url       = {http://www.aclweb.org/anthology/P15-2068}
}