Named Entity Recognition (NER) models (neural and sparse) implemented based on package LibN3L
.settings
Debug
basic
example
.cproject
.project
CMakeLists.txt
GRNNCRFMLLabeler.cpp
GRNNCRFMLLabeler.h
GRNNCRFMMLabeler.cpp
GRNNCRFMMLabeler.h
GRNNLabeler.cpp
GRNNLabeler.h
GatedCRFMLLabeler.cpp
GatedCRFMLLabeler.h
GatedCRFMMLabeler.cpp
GatedCRFMMLabeler.h
GatedLabeler.cpp
GatedLabeler.h
LSTMCRFMLLabeler.cpp
LSTMCRFMLLabeler.h
LSTMCRFMMLabeler.cpp
LSTMCRFMMLabeler.h
LSTMLabeler.cpp
LSTMLabeler.h
Options.h
README.md
RNNCRFMLLabeler.cpp
RNNCRFMLLabeler.h
RNNCRFMMLabeler.cpp
RNNCRFMMLabeler.h
RNNLabeler.cpp
RNNLabeler.h
SparseCRFMLLabeler.cpp
SparseCRFMLLabeler.h
SparseCRFMMLabeler.cpp
SparseCRFMMLabeler.h
SparseGRNNCRFMLLabeler.cpp
SparseGRNNCRFMLLabeler.h
SparseGRNNCRFMMLabeler.cpp
SparseGRNNCRFMMLabeler.h
SparseGRNNLabeler.cpp
SparseGRNNLabeler.h
SparseGatedCRFMLLabeler.cpp
SparseGatedCRFMLLabeler.h
SparseGatedCRFMMLabeler.cpp
SparseGatedCRFMMLabeler.h
SparseGatedLabeler.cpp
SparseGatedLabeler.h
SparseLSTMCRFMLLabeler.cpp
SparseLSTMCRFMLLabeler.h
SparseLSTMCRFMMLabeler.cpp
SparseLSTMCRFMMLabeler.h
SparseLSTMLabeler.cpp
SparseLSTMLabeler.h
SparseLabeler.cpp
SparseLabeler.h
SparseRNNCRFMLLabeler.cpp
SparseRNNCRFMLLabeler.h
SparseRNNCRFMMLabeler.cpp
SparseRNNCRFMMLabeler.h
SparseRNNLabeler.cpp
SparseRNNLabeler.h
SparseTNNCRFMLLabeler.cpp
SparseTNNCRFMLLabeler.h
SparseTNNCRFMMLabeler.cpp
SparseTNNCRFMMLabeler.h
SparseTNNLabeler.cpp
SparseTNNLabeler.h
TNNCRFMLLabeler.cpp
TNNCRFMLLabeler.h
TNNCRFMMLabeler.cpp
TNNCRFMMLabeler.h
TNNLabeler.cpp
TNNLabeler.h
cleanall.sh
debug.sh
demo-entity.sh
store.sh

README.md

NNNamedEntity

NNNamedEntity is a package for Named Entity Recognition (NER) using neural networks, built on the LibN3L package. It combines several neural network architectures (TNN, RNN, GatedNN, LSTM and GRNN) with several objective functions (sigmoid, CRF max-margin, CRF maximum likelihood). Sparse features can be combined with any of these models. The package can also be extended with user-defined neural network structures.

Demo system

The demo system includes English named entity recognition sample data ("Entity.train", "Entity.dev" and "Entity.test"), an English word embedding sample file ("sena.emb") and a parameter settings file ("demo.option"). All of these files are in the folder NNNamedEntity/example.

This demo system runs a SparseTNNCRFMLLabeler model, that is, a traditional neural network with sparse features that uses CRF maximum likelihood as the objective function.

The demo system generates three files in NNNamedEntity/example: "Entity.devOUTdemo", "demo.model" and "Entity.test.output". "Entity.devOUTdemo" is the tagged dev file produced during training. "demo.model" is the best-performing model found during training. "Entity.test.output" is the final tagged result for "Entity.test", produced in the tagging step with the generated model "demo.model".

Note:

  • The current version is only compatible with LibN3L versions after Dec. 10th, 2015, which contain the model saving and loading module.
  • The example files are only meant to verify that the code runs. For copyright reasons, we include only a few hundred sentences as examples. Hence the results on these example datasets do not represent real performance on a large dataset.
  • The .cpp files also provide gradient checking for verifying your code; it is commented out by default. You must set "dropout = 0" in demo.option before enabling gradient checking. Note that gradient checking is not available for the max-margin (MM) models (such as LSTMCRFMMLabeler, SparseLSTMCRFMMLabeler).
  • If you have set wordEmbFineTune/charEmbFineTune to false and want to check gradients, comment out the "checkgrad" line for _words/_chars in "basic/*Classifier.h".
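
For readers unfamiliar with gradient checking, the idea can be sketched as follows: perturb each parameter slightly, re-evaluate the loss, and compare the finite-difference slope with the analytic gradient. This is a generic illustration, not the routine shipped in the .cpp files or in LibN3L; the function name `maxGradientError` and its signature are assumptions made for this example. The check only makes sense when the loss is deterministic between evaluations, which is presumably why dropout has to be disabled first.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of NNNamedEntity or LibN3L).
// Returns the largest absolute difference between the analytic gradient
// and a central finite-difference estimate of the gradient of `loss`.
double maxGradientError(const std::function<double(const std::vector<double>&)>& loss,
                        const std::vector<double>& analyticGrad,
                        std::vector<double> params,  // copied so the caller's values stay untouched
                        double eps = 1e-4) {
    double worst = 0.0;
    for (std::size_t i = 0; i < params.size(); ++i) {
        const double saved = params[i];

        params[i] = saved + eps;           // loss with parameter i nudged up
        const double up = loss(params);

        params[i] = saved - eps;           // loss with parameter i nudged down
        const double down = loss(params);

        params[i] = saved;                 // restore before moving on

        const double numeric = (up - down) / (2.0 * eps);
        worst = std::max(worst, std::fabs(numeric - analyticGrad[i]));
    }
    return worst;
}
```

A small returned value suggests the analytic gradient agrees with the loss; a large one points at a bug in the backward pass for some parameter.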

Feature format

Consider the following sentence in Entity.test:

Foreign - invested enterprises have played a prominent role in improving China 's export commodity structure .

The sample features for the word China are:

China [S]PoCNNP [S]PoBiLVBG.NNP [S]PoBiNNNP.POS [S]PoTrVBG.NNP.POS [S]WPCChina.NNP [S]UnCChina [S]UnLimproving [S]UnN's [S]CaC1 [S]CaL0 [S]CaN0 [S]CaCC1China [S]CaLC0China [S]CaNC0China [S]CaLL0improving [S]CaNN0's [S]ShC2111 [S]ShL1111 [S]ShN3133 [S]BShL11112111 [S]BShN21113133 [S]ConCnone [S]ConLnone [S]ConNnone [S]ConCaL0none [S]ConCaN0none [S]CaConL1none [S]BiLTin.improving [S]BiNT's.export [S]BiLimproving.China [S]BiNChina.'s [S]ClC301 [S]ClL448 [S]ClN181 [S]BClL448.301 [S]BClN301.181 [S]PrC0C [S]PrC1h [S]PrC2i [S]PrC3n [S]SuC0h [S]SuC1i [S]SuC2n [S]SuC3a [S]SuL0v [S]SuL1i [S]SuL2n [S]SuL3g [S]PrN0' [S]PrN1s [S]PrN2*N* [S]PrN3*N* [T]NNP B-GPE

where

  • The first token, China, is the current word.
  • Features starting with [S] are sparse features. For example, [S]BiNChina.'s can be divided into three parts ([S] + BiN + China.'s): [S] marks a sparse feature; BiN indicates that the feature is the word bigram of the current word and the next word; China.'s is the combination of the current word and the next word. The BiN part can be any name, as long as it is not already used by another feature type.
  • Features starting with "[T" are additional features to be embedded. In our example, [T]NNP means the POS tag of each word will be embedded in the neural network. You can add other features to be embedded using the format [T1]feature1 [T2]feature2.
  • The last tag, B-GPE, is the label for the current word; it is used for model evaluation.
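
The layout described above can be illustrated with a small parser. This is a minimal sketch, not the reader used by NNNamedEntity: the `ParsedToken` struct and `parseFeatureLine` function are hypothetical names, and the code assumes that fields are separated by whitespace and contain no whitespace themselves.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One line of the feature file, split into the parts described above.
struct ParsedToken {
    std::string word;                   // first field: the current word
    std::vector<std::string> sparse;    // "[S]" fields, with the "[S]" prefix removed
    std::vector<std::string> embedded;  // "[T..." fields, kept whole, e.g. "[T]NNP"
    std::string label;                  // last field: the gold label, e.g. "B-GPE"
};

// Hypothetical helper: splits one whitespace-separated feature line.
// Returns an empty ParsedToken if the line has fewer than two fields.
ParsedToken parseFeatureLine(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    for (std::string f; in >> f;) fields.push_back(f);

    ParsedToken tok;
    if (fields.size() < 2) return tok;  // need at least a word and a label

    tok.word = fields.front();
    tok.label = fields.back();
    for (std::size_t i = 1; i + 1 < fields.size(); ++i) {
        const std::string& f = fields[i];
        if (f.compare(0, 3, "[S]") == 0) {
            tok.sparse.push_back(f.substr(3));
        } else if (f.compare(0, 2, "[T") == 0) {
            tok.embedded.push_back(f);
        }
        // Fields with any other prefix are ignored in this sketch.
    }
    return tok;
}
```

Applied to a shortened version of the sample line, `China [S]UnCChina [S]BiNChina.'s [T]NNP B-GPE`, this yields the word `China`, two sparse features, one embedded feature and the label `B-GPE`.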

Monitoring information

While running, the NER system may print log information like the following:

Recall: P=97/199=0.487437, Accuracy: P=97/162=0.598765, Fmeasure: 0.537396

test:

Recall: P=158/267=0.59176, Accuracy: P=158/226=0.699115, Fmeasure: 0.640974

Exceeds best previous performance of 0.523161. Saving model file..

The first "Recall..." line shows performance on the dev set and the second "Recall..." line shows performance on the test set.
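
The numbers in these log lines can be reproduced from the raw counts. The sketch below is inferred from the arithmetic of the sample log, not taken from the source code: the reported Fmeasure matches the harmonic mean of the two ratios, which suggests that the value labelled "Accuracy" plays the role of precision (correct / predicted) while "Recall" is correct / gold. The `Scores` struct and `score` function are hypothetical names.

```cpp
// Hypothetical helper: recomputes the three values printed in one log line.
struct Scores {
    double recall;     // correct / gold entities
    double precision;  // correct / predicted entities (labelled "Accuracy" in the log)
    double f;          // harmonic mean of recall and precision
};

Scores score(int correct, int gold, int predicted) {
    Scores s{0.0, 0.0, 0.0};
    if (gold > 0) s.recall = static_cast<double>(correct) / gold;
    if (predicted > 0) s.precision = static_cast<double>(correct) / predicted;
    if (s.recall + s.precision > 0.0) {
        s.f = 2.0 * s.recall * s.precision / (s.recall + s.precision);
    }
    return s;
}
```

With the dev-set counts from the sample log, `score(97, 199, 162)` gives a recall of about 0.487437, a precision of about 0.598765 and an F-measure of about 0.537396, matching the printed line.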

Updating...

  • 2017-01-02: update CMakeLists.txt; no need to manually comment out the compiler setting when switching between macOS and Linux.
  • 2016-01-15: fix init() with setwordembeddingfinetune bug.
  • 2015-12-11: fix test() output file bug. The previous version wrote the gold test file as output; this version writes the predicted file.
  • 2015-12-02: support model saving and loading.