polyphone

Our paper: CVTE-Poly A New Benchmark for Chinese Polyphone Disambiguation, Interspeech 2023 (accepted)

files

├── README.md
├── data
│   ├── CVTE-poly            
│   ├── POLYPHONIC_CHARS.txt  # generate from `digest_cedict.pkl` by script `cedict2csv.py`
│   ├── cpp                   # combine `g2pm/data` and `g2pW/cpp_dataset`
│   └── digest_cedict.pkl     # dictionary has been corrected
│   └── cedict2csv.py 
│   └── POStagging.py         # generate the `.pos` files
├── g2pW                      # fork from `g2pW` repo and modify
├── g2pm 
└── scripts                   # my self model

results

	acc (Avg. by #sample)	acc (Avg. by #polyphone)	acc (Avg. by #phoneme)
g2pM (ori paper BiLSTM)	97.31	-	-
g2pM (ori paper BERT)	97.85	-	-
g2pM (pip and test)	-	-	-
g2pW (ori paper)	99.08	-	-
g2pW (trained on CPP)	99.0711	98.608	96.4619
g2pW (trained on CPP + CVTE-poly)	99.0711	98.5023	96.6914
self (trained on CPP)
self (trained on CPP + CVTE-poly)

notes

.pos files are prepared for g2pW training.

The POS are labeled by ckiptagger [https://github.com/p208p2002/ckiptagger?tab=readme-ov-file], same as g2pW does. However, I do not directly download the .pos files for CPP dataset from g2pW repo, because the version of ckiptagger varied.
results of original paper are over-estimated.

In the original test set of CPP, there are 1319 sentences which take a non-polyphone as a target character. After removing these sentences, there are only 8925 sentences in the test set. Using the original test set results in over-estimation of the models.

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
data		data
g2pW		g2pW
g2pm @ 170526e		g2pm @ 170526e
.gitmodules		.gitmodules
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

polyphone

files

results

notes

About

Releases

Packages

Languages

NewZsh/polyphone

Folders and files

Latest commit

History

Repository files navigation

polyphone

files

results

notes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages