
Repo for Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning


PaperMechanica/SemiPPL


Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning
 
Interspeech 2021
Yi Shi* Congyi Wang** Yu Chen Bin Wang
 

*First Author **Corresponding Author

This is the official repository of the Interspeech 2021 paper "Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning". Here we provide the data and other useful links.

Links

Paper | Video | Slides | DataTest | DataTrain

Dataset:

The data is split into a training set and a test set. The training set was scraped from the internet; its annotations were auto-labeled using traditional techniques and then corrected manually. The correctness of the training labels is not guaranteed, so if you find incorrect labels, please report them in the Issues section. The test set was created manually and covers some of the most difficult cases found in real-life conversations, for robustness evaluation.

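Training data file name format: aug<#local label>
Data line format: <pinyin> <position> <sentence>\n, i.e. the pinyin label, a space, the position of the target character in the sentence, a space, the sentence itself, and a newline.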
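As an illustration, the line format above could be read with a small parser like the one below. This is a minimal sketch, not code shipped with this repo. It assumes UTF-8 files and single-space separators, and it uses `maxsplit=2` so that any spaces inside the sentence are preserved. The names `PolyphoneExample`, `parse_line`, `parse_lines` and `load_file` are hypothetical. The sample line used in the usage note is made up, and whether positions are 0- or 1-based is not stated here, so check that against the released files.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class PolyphoneExample:
    pinyin: str    # pronunciation label of the target character
    position: int  # position of the target character (index base not specified)
    sentence: str  # the raw Mandarin sentence


def parse_line(line: str) -> PolyphoneExample:
    """Parse one '<pinyin> <position> <sentence>' line (hypothetical helper)."""
    # maxsplit=2: split off pinyin and position, keep the rest as the sentence
    parts = line.rstrip("\n").split(" ", 2)
    if len(parts) != 3:
        raise ValueError(f"malformed line: {line!r}")
    pinyin, position, sentence = parts
    return PolyphoneExample(pinyin, int(position), sentence)


def parse_lines(lines: Iterable[str]) -> Iterator[PolyphoneExample]:
    """Yield one example per non-blank line."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield parse_line(line)


def load_file(path: str) -> List[PolyphoneExample]:
    """Load a whole data file, assuming UTF-8 encoding."""
    with open(path, encoding="utf-8") as f:
        return list(parse_lines(f))
```

For example, `parse_line("hang2 1 银行在哪里\n")` returns `PolyphoneExample(pinyin='hang2', position=1, sentence='银行在哪里')`.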

License:

This research was conducted by Xmov (魔珐科技, https://www.xmov.ai/about/). Use of the dataset is restricted to education and research purposes only.

If you would like to cite our paper:

@inproceedings{shi21semp,
  author={Shi,Yi and Wang,Congyi and Chen,Yu and Wang,Bin},
  title={Polyphone Disambiguation in Mandarin Chinese with Semi-Supervised Learning},
  year={2021},
  booktitle={Proc. Interspeech 2021},
  pages={4109--4113},
  doi={10.21437/Interspeech.2021-502}
}

For further questions, feel free to contact shiyi2008@gmail.com.
