Introduction

Code for the paper "Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition" (Journal of Biomedical Informatics, 2020).

Usage

The N-gram statistics and the trained BERT classifier cannot be made public because of our privacy policy.

Use from the command line

python -m graces -s "饮食可，睡眠可，大便不规律，小便正常，体重无明显减轻。"
python -m graces -f ./input.txt -o ./output.txt

Import from Python

import graces
graces.cut("饮食可，睡眠可，大便不规律，小便正常，体重无明显减轻。") # Segment a single sentence
graces.cut_k("饮食可，睡眠可，大便不规律，小便正常，体重无明显减轻。", k=8) # Segment a single sentence with fixed word count k.
graces.cut_file("./input.txt", "./output.txt") # Segment a file
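To segment many sentences held in memory rather than in a file, the single-sentence call can be wrapped in a small helper. The sketch below is only illustrative: `segment_sentences` is a hypothetical helper that is not part of the package, and it assumes that `graces.cut` returns an iterable of word strings, which this README does not state.

```python
from typing import Callable, Iterable, List


def segment_sentences(
    sentences: Iterable[str],
    cut: Callable[[str], Iterable[str]],
) -> List[List[str]]:
    """Apply a segmenter to every non-empty sentence.

    `cut` is any callable that maps a sentence to an iterable of words.
    ASSUMPTION: graces.cut behaves this way; check the package source.
    """
    results = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip blank lines instead of passing them to the segmenter
        results.append(list(cut(sentence)))
    return results


# With the package installed, the call would look like:
#   import graces
#   words_per_sentence = segment_sentences(open("./input.txt", encoding="utf-8"), graces.cut)
```

Passing the segmenter as an argument keeps the helper usable with `graces.cut`, with a partial application of `graces.cut_k`, or with a stand-in function during testing.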

Data

We asked MD students to annotate coarse-level and fine-level word segmentations on EHRs for validation. These data are not used for training.

dev.txt: Unlabeled EHRs from part of CCKS2019.
dev_label_coarse.txt: Coarse-level word segmentation labels.
dev_label_fine.txt: Fine-level word segmentation labels.
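One way to use the label files is to score a segmenter's output against them with word-level precision, recall and F1. The sketch below is an illustration rather than the evaluation script used in the paper. It assumes that each line of a label file is one sentence with words separated by whitespace, which this README does not specify, and the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a word sequence into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(
    gold_lines: Iterable[str],
    pred_lines: Iterable[str],
) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1 over paired lines.

    ASSUMPTION: each line is one sentence, words separated by whitespace.
    A predicted word counts as correct only if both of its boundaries
    match a gold word exactly.
    """
    true_pos = n_gold = n_pred = 0
    for gold, pred in zip(gold_lines, pred_lines):
        gold_spans = to_spans(gold.split())
        pred_spans = to_spans(pred.split())
        true_pos += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Scoring the same predictions once against dev_label_coarse.txt and once against dev_label_fine.txt shows which granularity a given segmentation is closer to.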

Citation

If you find our code or data useful, please cite:

@article{YUAN2020103542,
title = "Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition",
journal = "Journal of Biomedical Informatics",
volume = "110",
pages = "103542",
year = "2020",
issn = "1532-0464",
doi = "https://doi.org/10.1016/j.jbi.2020.103542",
url = "http://www.sciencedirect.com/science/article/pii/S1532046420301702",
author = "Zheng Yuan and Yuanhao Liu and Qiuyang Yin and Boyao Li and Xiaobin Feng and Guoming Zhang and Sheng Yu",
}
