Sofwath/DhivehiDatasets

Dhivehi Datasets

Some Dhivehi/Thaana datasets that I use for my machine learning experiments. They are not suitable for production.

Thaana Text Corpus

Corpus of Dhivehi text, mostly news (* 307 MB)

https://drive.google.com/file/d/1G_bwvnGiMOMuWvw_O9rnjgxcfeMrtqvI/view?usp=sharing
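When working with a Thaana text corpus like this one, it can help to check how much of a line is actually written in Thaana script. The sketch below relies on the Unicode Thaana block (U+0780 to U+07BF). The function name `thaana_ratio` is hypothetical, and it assumes the corpus has already been read as UTF-8 text; the actual file format of the download is not stated here.

```python
# Unicode Thaana block boundaries (U+0780 to U+07BF).
THAANA_START = "\u0780"
THAANA_END = "\u07bf"


def thaana_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that fall in the Thaana block.

    Hypothetical helper for illustration; not part of this repository.
    """
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    thaana_count = sum(1 for ch in chars if THAANA_START <= ch <= THAANA_END)
    return thaana_count / len(chars)
```

A threshold on this ratio (the right value depends on the data) could be used to drop lines that are mostly Latin text or markup.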

Dhivehi News Classification

Dhivehi news headlines with news categories such as politics, entertainment, lifestyle, general news, and sports (* 12 MB)

https://drive.google.com/file/d/1XBzr-tih1yGsZQSuajI1HfxoYTzljwlE/view?usp=sharing

Dhivehi Speech

Dhivehi speech data collected from PO MV (* 1 GB)

https://drive.google.com/file/d/1vhMXoB2L23i4HfAGX7EYa4L-sfE4ThU5/view?usp=sharing

Akuru-MNIST

Akuru-MNIST is an MNIST-style akuru dataset for OCR (* 161 MB)

https://drive.google.com/file/d/16LSVcNcoPmaMPTkisOned9rl61YwfZKB/view?usp=sharing

Latin

Maldivian Latin to Thaana dataset - needs a lot of fixing (* 3 MB)

https://drive.google.com/file/d/1lPLREUbHI-Z4XDbyuaL3mwsaq6xiRNre/view?usp=sharing

Dhivehi Neural Machine Translation

Dhivehi-English texts extracted from websites and other sources. (* 4 MB)

https://drive.google.com/file/d/1qiD1XOPO5Fv-UAX0NcD_rlOrZz2WvGAo/view?usp=sharing
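All of the links above are Google Drive share links of the form `https://drive.google.com/file/d/<id>/view?usp=sharing`. For scripted downloads, a minimal sketch for pulling the file ID out of such a link might look like the following. The helper names are hypothetical, and the `uc?export=download` URL pattern is an assumption about Google Drive's behavior; larger files may still return a confirmation page instead of the file itself.

```python
import re
from typing import Optional

# Matches the file ID segment of a Drive share link such as
# https://drive.google.com/file/d/<id>/view?usp=sharing
_DRIVE_FILE_RE = re.compile(r"drive\.google\.com/file/d/([A-Za-z0-9_-]+)")


def drive_file_id(share_url: str) -> Optional[str]:
    """Return the file ID from a Drive share link, or None if it does not match."""
    match = _DRIVE_FILE_RE.search(share_url)
    return match.group(1) if match else None


def direct_download_url(share_url: str) -> str:
    """Build a direct-download style URL (assumed pattern, not guaranteed)."""
    file_id = drive_file_id(share_url)
    if file_id is None:
        raise ValueError(f"not a Drive file share link: {share_url}")
    return f"https://drive.google.com/uc?export=download&id={file_id}"


if __name__ == "__main__":
    link = "https://drive.google.com/file/d/1qiD1XOPO5Fv-UAX0NcD_rlOrZz2WvGAo/view?usp=sharing"
    print(drive_file_id(link))
    print(direct_download_url(link))
```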
