A plugin for the GATE language technology framework that finds lemmata for words.
This plugin combines word lists from Wiktionary with, where available, morphological transducers created for the Helsinki Finite-State Technology (HFST) software to find lemmata for tokens.
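
To illustrate how a word list and a transducer can be combined, here is a minimal Python sketch. It is not the plugin's actual code: the names `WORD_LIST`, `fst_analyse` and `lemmatise`, the lookup order (word list first, transducer second) and the fallback to the surface form are all assumptions made for the example, and the "transducer" is a toy stand-in rather than a real FST.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical word list: (lowercased form, UD POS tag) -> lemma.
WORD_LIST: Dict[Tuple[str, str], str] = {
    ("houses", "NOUN"): "house",
    ("went", "VERB"): "go",
}


def fst_analyse(form: str, pos: str) -> Optional[str]:
    """Toy stand-in for a morphological transducer (assumption):
    it only strips a final "s" from nouns."""
    if pos == "NOUN" and len(form) > 1 and form.endswith("s"):
        return form[:-1]
    return None


def lemmatise(
    form: str,
    pos: str,
    word_list: Dict[Tuple[str, str], str] = WORD_LIST,
    transducer: Optional[Callable[[str, str], Optional[str]]] = fst_analyse,
) -> str:
    """Look the token up in the word list, then try the transducer if one
    is available, and otherwise return the surface form unchanged."""
    key = (form.lower(), pos)
    if key in word_list:
        return word_list[key]
    if transducer is not None:
        lemma = transducer(form.lower(), pos)
        if lemma is not None:
            return lemma
    return form
```

Passing `transducer=None` models a language for which no transducer is available, so only the word list is consulted.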
Currently, the following languages are supported:
- en (English)
- de (German)
- fr (French)
- it (Italian)
- nl (Dutch)
- es (Spanish)
The input for the processing resource (PR) must already be tokenised, and every token must carry a Universal Dependencies (UD) POS tag as a feature.
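
As a rough illustration of this precondition, the Python sketch below checks that every token carries a valid UD POS tag. Tokens are modelled as plain dictionaries, and the feature name `upos` is a hypothetical placeholder; the feature name the plugin actually expects is not stated here. The tag set is the standard list of 17 universal POS tags.

```python
from typing import Dict, List

# The 17 universal POS tags defined by Universal Dependencies.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def tokens_missing_upos(
    tokens: List[Dict[str, str]], pos_feature: str = "upos"
) -> List[int]:
    """Return the indices of tokens whose POS feature is absent or is not
    a valid UD tag. `pos_feature` is a hypothetical feature name."""
    return [
        i for i, tok in enumerate(tokens)
        if tok.get(pos_feature) not in UPOS_TAGS
    ]


tokens = [
    {"string": "Dogs", "upos": "NOUN"},
    {"string": "bark"},                   # no POS feature at all
    {"string": "!", "upos": "PUNKT"},     # not a UD tag
]
problems = tokens_missing_upos(tokens)
```

An empty result means every token satisfies the requirement; here the second and third tokens would be reported.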
This plugin is partly based on the code developed by Ahmet Aker for POS tagging and lemmatization in several languages.