GapBasedCWS

This repository implements the paper "A Gap-Based Framework for Chinese Word Segmentation via Very Deep Convolutional Networks" (https://arxiv.org/abs/1712.09509).

We would like to thank FudanNLP: some of the code here was copied from https://github.com/FudanNLP/adversarial-multi-criteria-learning-for-CWS .

Dependencies

Python 3.6.3 :: Anaconda custom (64-bit)

TensorFlow: 1.4.1

NumPy: 1.13.3

pandas: 0.20.3

Data Format

The dev, train, and test files in each data_directory use the following format, with one character per line:

有#有#1

一#一#0

家#家#1

眼#眼#0

镜#镜#1

店#店#1

销#销#0

售#售#1

的#的#1

３#<NUM>#0

０#<NUM>#0

度#度#1

老#老#0

花#花#0

镜#镜#1

，#<PUNC>#1

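Each line has three fields separated by #: the original character (e.g. ，), the processed character (e.g. <PUNC>), and the segmentation tag (e.g. 1). Judging from the sample above, the tag is 1 when the character is the last one in a word and 0 otherwise.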
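To make the format concrete, here is a minimal sketch of reading such lines and rebuilding the segmented words. It is not part of this repository: the helper names parse_line and segment are hypothetical, and it assumes that a tag of 1 marks the last character of a word, as the sample suggests.

```python
def parse_line(line):
    """Split one 'original#processed#tag' line into its three fields.

    Hypothetical helper, not taken from the repository. Splitting from the
    right keeps a '#' in the first field intact.
    """
    original, processed, tag = line.rstrip("\n").rsplit("#", 2)
    return original, processed, int(tag)


def segment(lines):
    """Rebuild words, assuming tag 1 marks the last character of a word."""
    words, current = [], ""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        original, _processed, tag = parse_line(line)
        current += original
        if tag == 1:
            words.append(current)
            current = ""
    if current:  # keep a trailing word that has no closing tag
        words.append(current)
    return words


sample = ["有#有#1", "一#一#0", "家#家#1", "眼#眼#0", "镜#镜#1", "店#店#1"]
print(segment(sample))  # ['有', '一家', '眼镜', '店']
```

The processed field is ignored when rebuilding words, since placeholders such as <NUM> and <PUNC> replace the original characters.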

Code Usage

prepare_data_index.py produces the .csv files that are used as direct input.

model.py, train.py, and test.py are the matching model definition, training script, and test script.

Run

The hyperparameters are defined in config.py and as TensorFlow flags (tf.FLAGS).

Once all the necessary files are in place, train the model:

python train.py

Once train.py has produced the necessary checkpoint files (it writes them one at a time), run the test:

python test.py
