Dictionary-based lemmatizer [LUCENE-6254] #7316

@asfimport

Today, the only way to achieve lemmatization is to use the SynonymFilterFactory. The available stemmers are also inaccurate because they only follow simplistic rules.

A dictionary-based lemmatizer can be more precise because it can take the part of speech into account. It therefore reduces words to their base forms more accurately than dictionary-based stemmers such as Hunspell.
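
To illustrate the idea, here is a minimal sketch in plain Java, assuming a toy in-memory dictionary keyed by surface form and part of speech. The class `PosLemmatizerSketch`, its methods, the tag names, and the dictionary entries are all hypothetical and chosen for illustration; this is not the API of the attached patch or of Lucene. It contrasts a part-of-speech-aware lookup with a simplistic suffix rule.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not the attached patch and not a Lucene API.
final class PosLemmatizerSketch {

    // Toy dictionary keyed by "surface form|part of speech".
    // Entries and tag names are illustrative assumptions.
    private static final Map<String, String> DICTIONARY = new HashMap<>();

    static {
        DICTIONARY.put("saw|VERB", "see");
        DICTIONARY.put("saw|NOUN", "saw");
        DICTIONARY.put("mice|NOUN", "mouse");
        DICTIONARY.put("better|ADJ", "good");
    }

    private PosLemmatizerSketch() {
    }

    // Looks up the lemma for a word with a known part of speech.
    // Falls back to the unchanged word when the dictionary has no entry.
    static String lemmatize(String word, String partOfSpeech) {
        String key = word.toLowerCase(Locale.ROOT) + "|" + partOfSpeech;
        return DICTIONARY.getOrDefault(key, word);
    }

    // Simplistic rule-based stemmer for contrast: strips one trailing "s"
    // from words longer than three characters.
    static String naiveStem(String word) {
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }
}
```

With this sketch, `lemmatize("saw", "VERB")` and `lemmatize("saw", "NOUN")` resolve to different lemmas for the same surface form, whereas the suffix rule leaves an irregular form such as "mice" untouched and turns "news" into "new".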

This is my effort to develop such a lemmatizer for Apache Lucene. The documentation is temporarily placed here:
http://folk.uio.no/erlendfg/solr/lemmatizer.html


Migrated from LUCENE-6254 by Erlend Garåsen
Attachments: LUCENE-6254.patch
