Hierarchical Lexicon Embedding Architecture for CNER

This repository contains the implementation for our paper "Hierarchical Lexicon Embedding Architecture for Chinese Named Entity Recognition". The method combines high efficiency, strong performance, and good transferability; see the paper for details.

Source code description

Requirements:

Python 3.6
PyTorch 0.4.1
jieba

Input format:

CoNLL format: one character and its label per line, separated by whitespace. The "BMES" tag scheme is preferred.

走 O
过 O
南 B-GPE
京 M-GPE
市 E-GPE
长 B-LOC
江 M-LOC
大 M-LOC
桥 E-LOC
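
As a rough illustration of the format above, the following sketch reads character/label lines and recovers entity spans from BMES tags. It is not taken from this repository: the function names `read_conll` and `bmes_spans` are hypothetical, and it assumes that sentences are separated by blank lines, as is usual for CoNLL-style files.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (character, label) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group 'char label' lines into sentences (assumed blank-line separated)."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))  # first field: char, last: label
    if current:
        sentences.append(current)
    return sentences


def bmes_spans(sentence: Sentence) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, entity_type) for each B..E or S entity."""
    spans: List[Tuple[int, int, str]] = []
    start = None
    for i, (_, label) in enumerate(sentence):
        prefix, _, etype = label.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, etype))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            # Use the type recorded on the opening B tag.
            spans.append((start, i + 1, sentence[start][1].partition("-")[2]))
            start = None
        elif prefix == "O":
            start = None
        # "M" continues the current entity; nothing to do.
    return spans


sample = """走 O
过 O
南 B-GPE
京 M-GPE
市 E-GPE
长 B-LOC
江 M-LOC
大 M-LOC
桥 E-LOC""".splitlines()

sentence = read_conll(sample)[0]
for start, end, etype in bmes_spans(sentence):
    text = "".join(ch for ch, _ in sentence[start:end])
    print(etype, text)
```

On the sample sentence this yields one GPE span (南京市) and one LOC span (长江大桥).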

Pretrained embeddings:

The POS tag and relative position embeddings are randomly initialized, while the word, character, and bichar embeddings are the same as those used by Lattice LSTM.
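
The split between pretrained and randomly initialized embeddings can be sketched as follows. This is a minimal illustration, not the loading code in this repository: it assumes the embedding files are plain text with one token followed by its vector per line, and the helper names, the uniform initialization range, and the seed are all assumptions made for the example.

```python
import random
from typing import Dict, Iterable, List


def load_text_embeddings(lines: Iterable[str], dim: int) -> Dict[str, List[float]]:
    """Parse 'token v1 v2 ... v_dim' lines; skip headers and malformed lines."""
    table: Dict[str, List[float]] = {}
    for raw in lines:
        parts = raw.rstrip().split()
        if len(parts) != dim + 1:
            continue  # e.g. a 'count dim' header line
        table[parts[0]] = [float(x) for x in parts[1:]]
    return table


def build_matrix(vocab: List[str], pretrained: Dict[str, List[float]],
                 dim: int, seed: int = 0) -> List[List[float]]:
    """One row per vocabulary item: pretrained if available, else random."""
    rng = random.Random(seed)
    scale = (3.0 / dim) ** 0.5  # assumed range for random initialization
    matrix: List[List[float]] = []
    for token in vocab:
        vec = pretrained.get(token)
        if vec is None:
            vec = [rng.uniform(-scale, scale) for _ in range(dim)]
        matrix.append(vec)
    return matrix


# Toy 3-dimensional "pretrained" file content (illustrative values only).
toy_file = ["2 3", "南 0.1 0.2 0.3", "京 0.4 0.5 0.6"]
pretrained = load_text_embeddings(toy_file, dim=3)

# Characters found in the file reuse their vectors; others are random.
char_matrix = build_matrix(["南", "桥"], pretrained, dim=3)

# POS tag embeddings have no pretrained source, so every row is random.
pos_matrix = build_matrix(["n", "v", "ns"], {}, dim=3)
```

In a PyTorch model, matrices built this way would typically be copied into an embedding layer's weights before training.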

Run the code:

Download the character embeddings and word embeddings from Lattice LSTM and put them in the data folder.
Download your datasets and put them in the data folder.
To train on the dataset:

python main.py --model_type MODEL_TYPE --train TRAIN_FILE_PATH --dev DEV_FILE_PATH --test TEST_FILE_PATH --model_path MODEL_PATH --modelname MODEL_NAME --savedset DATASET_SAVE_PATH --num_iter NUM_ITER --seed SEED --hidden_dim HIDDEN_DIM --batch_size BATCH_SIZE --drop DROP --lr LR
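
When launching several runs, it can help to assemble this command programmatically. The sketch below only builds the argument list from the flags shown above; the helper name `build_train_command` and all example values (paths, learning rate, and so on) are hypothetical, and `MODEL_TYPE` is left as a placeholder because the accepted values are defined by main.py.

```python
import shlex
from typing import Dict, List

# Flags exactly as they appear in the training command above.
FLAGS = [
    "model_type", "train", "dev", "test", "model_path", "modelname",
    "savedset", "num_iter", "seed", "hidden_dim", "batch_size", "drop", "lr",
]


def build_train_command(options: Dict[str, object]) -> List[str]:
    """Build the argv list for main.py from the supplied options."""
    unknown = sorted(set(options) - set(FLAGS))
    if unknown:
        raise ValueError(f"unknown options: {unknown}")
    cmd = ["python", "main.py"]
    for flag in FLAGS:  # keep the order used in the command above
        if flag in options:
            cmd += [f"--{flag}", str(options[flag])]
    return cmd


# Hypothetical values for illustration only.
example = build_train_command({
    "model_type": "MODEL_TYPE",
    "train": "data/train.char.bmes",
    "dev": "data/dev.char.bmes",
    "test": "data/test.char.bmes",
    "num_iter": 100,
    "seed": 42,
    "lr": 0.0015,
})
print(shlex.join(example))
```

The resulting list can be passed to `subprocess.run(example, check=True)` from the repository root.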

BUAAJustin/HLEA

Folders and files

Latest commit

History

Repository files navigation

Hierarchical Lexicon Embedding Architecture for CNER

Source code description

Requirement:

Input format:

Pretrain embedding:

Run the code:

About

Resources

Stars

Watchers

Forks

Languages