MarkusKrug/NERDetection

NERDetection

Named entity (NE) detection as a DKPro component

This plain Java project (no Maven, no Ant, no Eclipse plugin) can be used as a UIMA component to detect names in German novels. It uses a MaxEnt classifier, which was found to perform better than a linear-chain CRF.
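
To illustrate what a MaxEnt classifier does for a single token, here is a minimal, self-contained sketch. It is not the code or the model used in this project: the labels, feature names and weights are invented for illustration. Unlike a linear-chain CRF, which scores a whole label sequence including label-to-label transitions, this kind of classifier sums the weights of the active features per label and normalizes with a softmax, one token at a time.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hand-rolled MaxEnt (multinomial logistic regression)
// scorer for one token. Labels, features and weights are hypothetical.
class MaxEntSketch {

    // Returns P(label | features) for every label via a softmax over summed weights.
    static Map<String, Double> classify(Map<String, Map<String, Double>> weights,
                                        List<String> activeFeatures) {
        Map<String, Double> scores = new LinkedHashMap<>();
        double max = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> e : weights.entrySet()) {
            double s = 0.0;
            for (String f : activeFeatures) {
                // Features without a weight for this label contribute nothing.
                s += e.getValue().getOrDefault(f, 0.0);
            }
            scores.put(e.getKey(), s);
            max = Math.max(max, s);
        }
        // Subtract the maximum before exponentiating for numerical stability.
        double z = 0.0;
        for (double s : scores.values()) {
            z += Math.exp(s - max);
        }
        Map<String, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            probs.put(e.getKey(), Math.exp(e.getValue() - max) / z);
        }
        return probs;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> weights = new LinkedHashMap<>();
        weights.put("NAME", Map.of("capitalized", 1.5, "pos=NE", 2.0));
        weights.put("OTHER", Map.of("pos=NN", 1.0));
        // Prints the probability of each label for a capitalized proper noun.
        System.out.println(classify(weights, List.of("capitalized", "pos=NE")));
    }
}
```

In the actual component the features come from the features file and the weights from the trained model; the sketch only shows the shape of the per-token decision.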

The required model is stored in the resources folder. Note that it is expected to change over the next few days. This project ships with all the JARs it depends on.

Its only requirements are Mallet 2.07 RC2, UIMA-Core and UIMA-Fit.

Basic usage (following DKPro conventions):

    // createReaderDescription, createEngineDescription and runPipeline are static
    // imports from uimaFIT (CollectionReaderFactory, AnalysisEngineFactory, SimplePipeline).
    public static void main(String[] args) throws Exception {

        // Read the input text and mark it as German.
        CollectionReaderDescription cr = createReaderDescription(TextReader.class,
                TextReader.PARAM_PATH, "<Input-File>",
                TextReader.PARAM_LANGUAGE, "de");

        // Sentence splitting and tokenization.
        AnalysisEngineDescription segmenter = createEngineDescription(OpenNlpSegmenter.class);

        // Part-of-speech tagging.
        AnalysisEngineDescription tagger = createEngineDescription(OpenNlpPosTagger.class);

        // ======== Parameters for this analysis engine. ========
        // It requires sentence annotations and POS tags, so it must run
        // after the segmenter and the tagger.

        // Forward slashes work on Windows as well as on Linux and macOS.
        String modelLocation = "resources/modelNERRegular.bin";
        String featuresFile = "resources/features.txt";

        AnalysisEngineDescription neDetection = createEngineDescription(RomaneNERAnnotator.class,
                RomaneNERAnnotator.PARAM_FEATURE_FILE_LOCATION, featuresFile,
                RomaneNERAnnotator.PARAM_MODEL_LOCATION, modelLocation);

        // ========

        // Dump the resulting CAS, including the detected names, to a text file.
        AnalysisEngineDescription cc = createEngineDescription(CasDumpWriter.class,
                CasDumpWriter.PARAM_OUTPUT_FILE, "<outputfile>");

        runPipeline(cr, segmenter, tagger, neDetection, cc);
    }
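
Because the annotator depends on a model file and a features file in the resources folder, it can help to verify that both exist before starting the pipeline. The following is a minimal sketch using only the Java standard library. The class name `ResourceCheck` and its `missing` helper are hypothetical and not part of this project; the two paths are the ones from the usage example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of NERDetection.
class ResourceCheck {

    // Returns the given locations that are not readable regular files.
    static List<Path> missing(String... locations) {
        List<Path> result = new ArrayList<>();
        for (String location : locations) {
            Path p = Paths.get(location);
            if (!Files.isRegularFile(p) || !Files.isReadable(p)) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Paths are resolved relative to the working directory.
        List<Path> notFound = missing("resources/modelNERRegular.bin",
                                      "resources/features.txt");
        if (!notFound.isEmpty()) {
            throw new IllegalStateException("Missing resources: " + notFound);
        }
        System.out.println("All resources found.");
    }
}
```

Calling such a check at the top of `main` fails fast with a clear message instead of failing later inside the pipeline.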
