Ruthwik/Language-Detection

Language Detection using Apache OpenNLP

This is a Java project for Language Detection using Apache OpenNLP.

The Apache OpenNLP library is a machine-learning-based toolkit for processing natural language text.

Language Detector

The Apache OpenNLP team released Language Detector Model 1.8.3 for Apache OpenNLP 1.8.3. The model can detect 103 languages and outputs ISO 639-3 codes.

The model is trained for, and works best with, longer texts that contain at least two sentences in the same language.
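Because the model is aimed at inputs of two or more sentences, it can help to check the input length before relying on a prediction. The following is a minimal sketch using only the JDK's `java.text.BreakIterator`; the class name `SentenceCountCheck` and the helper `isLongEnough` are hypothetical and not part of this project or of OpenNLP, and the sentence splitting is a rough approximation rather than a linguistic guarantee.

```java
import java.text.BreakIterator;
import java.util.Locale;

class SentenceCountCheck {

    // Counts sentences using the JDK's locale-neutral sentence boundary rules.
    static int countSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
        it.setText(text);
        int count = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Ignore segments that contain only whitespace.
            if (!text.substring(start, end).trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: true when the input meets the two-sentence guideline.
    static boolean isLongEnough(String text) {
        return countSentences(text) >= 2;
    }

    public static void main(String[] args) {
        System.out.println(isLongEnough("Puedo darte ejemplos de los métodos"));
        System.out.println(isLongEnough("This is the first sentence. This is the second one."));
    }
}
```

Short inputs can still be passed to the detector; a check like this only flags that the result may be less reliable.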

More information about the Language Detector Model can be found in its README.txt. Details on the model's effectiveness can be found in the following report.

Use

How to build a Language Detector

The following steps show how to use the Language Detector from Apache OpenNLP. The pre-trained Language Detector Model is used, so no training step is required.

The steps for training your own model can be found here.

How to load the model

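import java.io.File;
import opennlp.tools.langdetect.LanguageDetector;
import opennlp.tools.langdetect.LanguageDetectorME;
import opennlp.tools.langdetect.LanguageDetectorModel;

// Point to the pre-trained Language Detector Model file
File modelFile = new File("resources/langdetect-183.bin");

// Load the model from the file (throws IOException if it cannot be read)
LanguageDetectorModel trainedModel = new LanguageDetectorModel(modelFile);

// Create the detector from the loaded model
LanguageDetector languageDetector = new LanguageDetectorME(trainedModel);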

How to predict the language

Pass the text whose language you want to detect.

// Requires: import opennlp.tools.langdetect.Language;
Language[] languages = languageDetector.predictLanguages("Puedo darte ejemplos de los métodos");
System.out.println("Predicted language: " + languages[0].getLang());

The returned array contains the candidate languages with their respective confidences. The first element is the language with the highest confidence, which is the predicted language.
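When the top confidence is low, it may be preferable to report "undetermined" instead of a weak guess. The sketch below illustrates that idea without depending on OpenNLP: `Prediction` is a hypothetical stand-in for OpenNLP's `Language` (a language code plus a confidence), `TopLanguagePicker` and `pick` are illustrative names, and the confidence values in `main` are made up for the example rather than real model output. The fallback `und` is the ISO 639-3 code for an undetermined language.

```java
import java.util.Arrays;
import java.util.Comparator;

class TopLanguagePicker {

    // Hypothetical stand-in for OpenNLP's Language: a code plus a confidence.
    static final class Prediction {
        final String lang;
        final double confidence;

        Prediction(String lang, double confidence) {
            this.lang = lang;
            this.confidence = confidence;
        }
    }

    // ISO 639-3 code for "undetermined".
    static final String UNDETERMINED = "und";

    // Returns the highest-confidence language code, or "und" when that
    // confidence is below minConfidence or there are no predictions.
    static String pick(Prediction[] predictions, double minConfidence) {
        return Arrays.stream(predictions)
                .max(Comparator.comparingDouble((Prediction p) -> p.confidence))
                .filter(p -> p.confidence >= minConfidence)
                .map(p -> p.lang)
                .orElse(UNDETERMINED);
    }

    public static void main(String[] args) {
        // Made-up values for illustration only.
        Prediction[] predictions = {
            new Prediction("spa", 0.62),
            new Prediction("por", 0.21),
            new Prediction("cat", 0.05)
        };
        System.out.println(pick(predictions, 0.5));
        System.out.println(pick(predictions, 0.9));
    }
}
```

With the real API, the same check could be applied to `languages[0].getConfidence()`; a suitable threshold depends on the texts being processed and is left as an assumption here.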

Tools used

  • Java 1.8
  • opennlp-tools-1.8.3
  • Eclipse

LICENSE

Apache OpenNLP is released under the Apache License, version 2.0 (http://www.apache.org/licenses/LICENSE-2.0.html).
