ayaka14732/yue-cmn-classification-task

Cantonese/Mandarin Classification Task

The aim of this task is to classify texts as either Cantonese or Mandarin. This is useful for filtering Cantonese text from large web-crawled corpora.

Scores

| Model | Author | Accuracy |
| --- | --- | --- |
| Cantonese text classifier | CanCLID | 82.49% |

Please update this list if you have built your own model.

Test

python compute_accuracy.py output.txt
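
For readers who want to sanity-check their predictions before running the script, the following is a minimal sketch of how an accuracy score of this kind is typically computed. It is not the repository's `compute_accuracy.py`. It assumes one label per line in each file, and the label strings `yue` and `cmn`, the gold-label list, and the helper names `read_labels` and `accuracy` are all hypothetical.

```python
from pathlib import Path


def read_labels(path):
    """Read one label per line (assumed file format), stripping whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines()]


def accuracy(predicted, gold):
    """Return the fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError(
            f"length mismatch: {len(predicted)} predictions vs {len(gold)} gold labels"
        )
    if not gold:
        raise ValueError("no labels to score")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical labels: "yue" for Cantonese, "cmn" for Mandarin.
    predicted = ["yue", "cmn", "yue", "cmn"]
    gold = ["yue", "cmn", "cmn", "cmn"]
    print(f"{accuracy(predicted, gold):.2%}")
```

Raising an error on a length mismatch, rather than truncating with `zip`, avoids silently scoring a partial output file.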

Source

The Cantonese test data are extracted from 粵語對話語料.

The Mandarin test data are extracted from PTT 八卦版問答中文語料.
