[Update][1 Feb, 2024] We have released the .wav data of the dataset: Google Drive Link. Please use it for research purposes only.

MDCC Dataset

This repository contains code and metadata to download the MDCC dataset, as described in the following paper:

Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, and Pascale Fung. "Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset." Link: https://arxiv.org/pdf/2201.02419.pdf

@misc{yu2022automatic,
      title={Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset}, 
      author={Tiezheng Yu and Rita Frieske and Peng Xu and Samuel Cahyawijaya and Cheuk Tung Shadow Yiu and Holy Lovenia and Wenliang Dai and Elham J. Barezi and Qifeng Chen and Xiaojuan Ma and Bertram E. Shi and Pascale Fung},
      year={2022},
      eprint={2201.02419},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Download

  1. The license file, named "MDCC_LICENSE", is in this folder. Please sign it and send it to chinatysonyu@gmail.com.
  2. Then you can download the data from this Google Drive Link.

Download checkpoints

Google Drive Link

How to run the code?

[TODO]