
SuperBruceJia/Chinese-Chat-Title-NER-BERT-BiLSTM-CRF


This is a step-by-step implementation of Chinese chat title entity recognition via a BERT-BiLSTM-CRF model.

For the original code and tutorials, please visit here.
For Chinese readers, please visit here.

This project focuses on Chinese chat title named entity recognition by fine-tuning the BERT model.
To that end, I changed several lines of code and extended the original dataset with more Chinese chat title entities.
The BERT-BiLSTM-CRF model achieves 96.73% accuracy.

The main purpose of the task (examples):

Input One: 小贾你最近忙什么呢?
Input Two: 贾舒越
Output: 小贾 is 贾舒越

Input One: 建勋师兄你何时来实验室?
Input Two: 邸建勋
Output: 建勋师兄 is 邸建勋

Input One: 最近王宇航学习怎么样呀
Input Two: 王海生
Output: There is no match for 王海生.

Input One: 贾泽阳现在回家了嘛
Input Two: 吴泽阳
Output: There is no match for 吴泽阳.
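
To make the matching criterion concrete, here is a minimal, purely illustrative Python sketch that reproduces the four examples above. It is not the repository's actual logic, which relies on the title entity extracted by the BERT-BiLSTM-CRF model; the heuristic and the function name are assumptions for illustration only.

# Purely illustrative heuristic, not the repo's model-based matching.
# Assumes single-character Chinese surnames.
def is_match(title, name):
    surname, given = name[0], name[1:]
    # Nickname form "小/老 + surname", e.g. 小贾 matches 贾舒越.
    if title in ('小' + surname, '老' + surname):
        return True
    # The title contains the given name, e.g. 建勋师兄 matches 邸建勋 ...
    idx = title.find(given)
    if idx != -1:
        # ... unless the given name is preceded by a different surname,
        # e.g. 贾泽阳 does not match 吴泽阳.
        return idx == 0 or title[idx - 1] == surname
    return False

print(is_match('小贾', '贾舒越'))      # True
print(is_match('建勋师兄', '邸建勋'))  # True
print(is_match('王宇航', '王海生'))    # False
print(is_match('贾泽阳', '吴泽阳'))    # False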

Chinese readers can refer to 提取聊天对方的称谓 - 方案与deadline.pdf for further details.

Step One: Configure the TensorFlow and BERT environment

pip install bert-base==0.0.7 -i https://pypi.python.org/simple
tensorflow >= 1.12.0
tensorflow-gpu >= 1.12.0  # GPU version of TensorFlow.
GPUtil >= 1.3.0  # not needed if you don't have a GPU
pyzmq >= 17.1.0  # python zmq
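
After installing, a quick sanity check of the environment (a minimal sketch; tf.test.is_gpu_available() simply reports whether TensorFlow sees a GPU):

import tensorflow as tf

# The bert-base 0.0.7 package targets TensorFlow 1.x.
print('TensorFlow version:', tf.__version__)         # should be >= 1.12.0
print('GPU available:', tf.test.is_gpu_available())  # False is fine on CPU-only machines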

Step Two: Download the BERT pre-trained model and the training dataset

Download the BERT pre-trained model from here.
Be sure to place the extracted folder "chinese_L-12_H-768_A-12" in the "init_checkpoint" folder.
Download the training dataset from here.
Be sure to place "train.txt" in the "data" folder.

Step Three: Train the model via the command line

Open the CMD terminal or the Anaconda Prompt, navigate to your working path, and activate the TensorFlow environment:
e.g., my working path is /Users/shuyuej/Desktop/Python-Files/Chinese-Chat-Title-NER-BERT-BiLSTM-CRF/.

Then input the command:

bert-base-ner-train -data_dir /Users/shuyuej/Desktop/Python-Files/BERT-BiLSTM-CRF-NER/data/ -output_dir /Users/shuyuej/Desktop/Python-Files/BERT-BiLSTM-CRF-NER/final_output/ -init_checkpoint /Users/shuyuej/Desktop/Python-Files/BERT-BiLSTM-CRF-NER/init_checkpoint/chinese_L-12_H-768_A-12/bert_model.ckpt -bert_config_file /Users/shuyuej/Desktop/Python-Files/BERT-BiLSTM-CRF-NER/init_checkpoint/chinese_L-12_H-768_A-12/bert_config.json -vocab_file /Users/shuyuej/Desktop/Python-Files/BERT-BiLSTM-CRF-NER/init_checkpoint/chinese_L-12_H-768_A-12/vocab.txt -batch_size 8

FYI, be sure to change my "/Users/shuyuej/Desktop/Python-Files/BERT-BiLSTM-CRF-NER/" to your own BERT-BiLSTM-CRF-NER path.

On Windows, I use the following command:

bert-base-ner-train -data_dir E:\BERT-BiLSTM-CRF-NER\data\ -output_dir E:\BERT-BiLSTM-CRF-NER\final_output\ -init_checkpoint E:\BERT-BiLSTM-CRF-NER\init_checkpoint\chinese_L-12_H-768_A-12\bert_model.ckpt -bert_config_file E:\BERT-BiLSTM-CRF-NER\init_checkpoint\chinese_L-12_H-768_A-12\bert_config.json -vocab_file E:\BERT-BiLSTM-CRF-NER\init_checkpoint\chinese_L-12_H-768_A-12\vocab.txt -batch_size 8

The final trained model will be in the "final_output" folder.

Step Four: Test and enjoy the model

The test script is "test.py"; it runs on "Test-set.xlsx" and produces the results.
Before you execute the file, be sure to change the paths of the trained BERT model, the original pre-trained BERT model, and Test-set.xlsx to your own.
Then you can see the results and the power of BERT.

Another executable script is "predict-test.py", in which you can input a sentence and a name and get the match result. Be sure to change the paths in the same way as in "test.py".
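
For reference, a batch run over a spreadsheet like Test-set.xlsx might look like the sketch below. The column names and the predict() helper are assumptions for illustration; wire predict() to the model-loading code in test.py:

import pandas as pd

# Hypothetical column names; adjust them to the actual schema of Test-set.xlsx.
df = pd.read_excel('Test-set.xlsx')

def predict(sentence, name):
    # Placeholder: call the trained BERT-BiLSTM-CRF model here (as test.py does)
    # and return the matched title string, or None if there is no match.
    raise NotImplementedError

for _, row in df.iterrows():
    title = predict(row['sentence'], row['name'])
    if title is not None:
        print(title, 'is', row['name'])
    else:
        print('There is no match for ' + row['name'] + '.')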