apache/opennlp-models

Welcome to Apache OpenNLP Models!

The Apache OpenNLP library provides binary models for processing natural language text. This repository is intended for the distribution of model files as Maven artifacts.

Useful Links

For additional information, visit the OpenNLP Home Page

You can use OpenNLP with any language; further demo models are provided here.

The models are fully compatible with the latest release and can be used for testing or getting started.

Please train your own models for all other use cases.

Documentation, including JavaDocs, code usage and command-line interface examples, is available here.

You can also follow our mailing lists for news and updates.

Overview

| Component | Language | Compatibility | Description | README and Reports |
| --- | --- | --- | --- | --- |
| Language Detector | Detects 103 languages | >= 1.8.3 | Detects 103 languages, identified according to the ISO 639-3 standard. Works well with longer texts that have at least two sentences in the same language. | README, Effectiveness, Misclassified |
| Sentence | fr | >= 1.0.0 | Sentence detection model for French | README, Evaluation Logs |
| Sentence | de | >= 1.0.0 | Sentence detection model for German | README, Evaluation Logs |
| Sentence | en | >= 1.0.0 | Sentence detection model for English | README, Evaluation Logs |
| Sentence | it | >= 1.0.0 | Sentence detection model for Italian | README, Evaluation Logs |
| Sentence | nl | >= 1.0.0 | Sentence detection model for Dutch | README, Evaluation Logs |
| Parts of Speech | de | >= 1.0.0 | Parts of speech model for German | README, Evaluation Logs |
| Parts of Speech | en | >= 1.0.0 | Parts of speech model for English | README, Evaluation Logs |
| Parts of Speech | fr | >= 1.0.0 | Parts of speech model for French | README, Evaluation Logs |
| Parts of Speech | it | >= 1.0.0 | Parts of speech model for Italian | README, Evaluation Logs |
| Parts of Speech | nl | >= 1.0.0 | Parts of speech model for Dutch | README, Evaluation Logs |
| Tokens | de | >= 1.0.0 | Tokenizer model for German | README, Evaluation Logs |
| Tokens | en | >= 1.0.0 | Tokenizer model for English | README, Evaluation Logs |
| Tokens | fr | >= 1.0.0 | Tokenizer model for French | README, Evaluation Logs |
| Tokens | it | >= 1.0.0 | Tokenizer model for Italian | README, Evaluation Logs |
| Tokens | nl | >= 1.0.0 | Tokenizer model for Dutch | README, Evaluation Logs |

Getting Started

You can import a model artifact directly via Maven, SBT or Gradle, for instance:

Maven

<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-models-langdetect</artifactId>
    <version>${opennlp.models.version}</version>
</dependency>

SBT

libraryDependencies += "org.apache.opennlp" % "opennlp-models-langdetect" % "${opennlp.models.version}"

Gradle

implementation group: "org.apache.opennlp", name: "opennlp-models-langdetect", version: "${opennlp.models.version}"

For more details, please check our documentation.

Adding a new Model

When you add a new model, make sure to also list it in the expected-models.txt file located in opennlp-models-test.
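
As a rough illustration of that step, the sketch below checks whether an entry is already present in expected-models.txt. It is not part of this repository: the class name ExpectedModelsCheck is hypothetical, and the code assumes the file holds one entry per line, which may not match the actual format. Adapt it to whatever the file really contains.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not shipped with opennlp-models.
public class ExpectedModelsCheck {

    // Reads the list. ASSUMPTION: one entry per line; blank lines are skipped
    // and surrounding whitespace is ignored.
    static Set<String> readExpected(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.map(String::trim)
                        .filter(line -> !line.isEmpty())
                        .collect(Collectors.toSet());
        }
    }

    // Returns true if the given entry is already listed in the file.
    static boolean isListed(Path file, String entry) throws IOException {
        return readExpected(file).contains(entry.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: ExpectedModelsCheck <path-to-expected-models.txt> <entry>");
            return;
        }
        Path file = Path.of(args[0]);
        String entry = args[1];
        System.out.println(isListed(file, entry) ? "listed" : "missing: " + entry);
    }
}
```

Run it with the path to expected-models.txt and the entry you expect to find; a "missing" result suggests the file still needs updating.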

Contributing

The Apache OpenNLP project is developed by volunteers and is always looking for new contributors to work on all parts of the project. Every contribution is welcome and needed to make it better. A contribution can be anything from a small documentation typo fix to a new component.

If you would like to get involved, please follow the instructions here.