A plugin for the GATE language technology framework for finding lemmata of words.

gateplugin-Lemmatizer

A plugin for the GATE language technology framework for finding the lemmata of words.

This plugin finds lemmata for tokens by combining word lists from Wiktionary with, where available, morphological transducers created for the Helsinki Finite-State Technology (HFST) software.
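
The combination of the two resources can be pictured roughly as follows. This is only a minimal sketch, not the plugin's actual code: the class `LemmaLookupSketch`, its methods, the lookup order (word list first, transducer second, surface form as a last resort) and the use of a plain function in place of a real transducer are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch: word-list lookup with an optional transducer fallback. */
public class LemmaLookupSketch {
    // Word list keyed by lower-cased form plus POS tag (assumed key layout).
    private final Map<String, String> wordList = new HashMap<>();
    // Stand-in for a morphological transducer; null when none is available.
    private final Function<String, Optional<String>> transducer;

    public LemmaLookupSketch(Function<String, Optional<String>> transducer) {
        this.transducer = transducer;
    }

    private static String key(String form, String upos) {
        return form.toLowerCase(Locale.ROOT) + "\t" + upos;
    }

    public void addEntry(String form, String upos, String lemma) {
        wordList.put(key(form, upos), lemma);
    }

    /** Returns a lemma for the token, or the form itself if nothing matches. */
    public String lemmatize(String form, String upos) {
        String fromList = wordList.get(key(form, upos));
        if (fromList != null) {
            return fromList;
        }
        if (transducer != null) {
            Optional<String> fromTransducer = transducer.apply(form);
            if (fromTransducer.isPresent()) {
                return fromTransducer.get();
            }
        }
        return form; // assumed fallback: leave the token unchanged
    }
}
```

Passing `null` as the transducer models a language for which only the word list exists.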

Currently, the following languages are supported:

  • en (English)
  • de (German)
  • fr (French)
  • it (Italian)
  • nl (Dutch)
  • es (Spanish)

The input for the processing resource (PR) must already be tokenised, and every token must carry a Universal Dependencies POS tag as a feature.
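
To check this precondition before running the PR, something along these lines may help. It is a hedged sketch only: plain maps stand in for token feature maps, and the feature name `upos` is a hypothetical placeholder, since the feature name the plugin actually reads is not stated here. The tag set is the 17 universal POS tags defined by Universal Dependencies.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical pre-flight check that every token carries a universal POS tag. */
public class UposInputCheck {
    // The 17 universal POS tags from Universal Dependencies.
    static final Set<String> UPOS_TAGS = Set.of(
        "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
        "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X");

    // Assumed feature name; adjust to whatever the tagger in the pipeline writes.
    static final String POS_FEATURE = "upos";

    /** True if the token's features contain a valid universal POS tag. */
    static boolean hasUposTag(Map<String, ?> features) {
        Object tag = features.get(POS_FEATURE);
        return tag instanceof String && UPOS_TAGS.contains(tag);
    }

    /** True if every token in the list passes hasUposTag. */
    static boolean allTagged(List<? extends Map<String, ?>> tokens) {
        return tokens.stream().allMatch(UposInputCheck::hasUposTag);
    }
}
```

A token tagged only with a language-specific tag set (for example a Penn Treebank tag such as `NNS`) would fail this check.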

This plugin is partly based on the code developed by Ahmet Aker for POS tagging and lemmatization in several languages.