Some Dhivehi/Thaana datasets I use for my machine learning experiments. They are not suitable for production use.
Corpus of Dhivehi text, mostly news (* 307 MB)
https://drive.google.com/file/d/1G_bwvnGiMOMuWvw_O9rnjgxcfeMrtqvI/view?usp=sharing
Dhivehi news headlines across various news categories, such as politics, entertainment, lifestyle, general news, and sports (* 12 MB)
https://drive.google.com/file/d/1XBzr-tih1yGsZQSuajI1HfxoYTzljwlE/view?usp=sharing
Dhivehi speech data collected from PO MV (* 1 GB)
https://drive.google.com/file/d/1vhMXoB2L23i4HfAGX7EYa4L-sfE4ThU5/view?usp=sharing
Akuru-MNIST is an MNIST-style akuru dataset for OCR (* 161 MB)
https://drive.google.com/file/d/16LSVcNcoPmaMPTkisOned9rl61YwfZKB/view?usp=sharing
Maldivian Latin to Thaana dataset - needs a lot of fixing (* 3 MB)
https://drive.google.com/file/d/1lPLREUbHI-Z4XDbyuaL3mwsaq6xiRNre/view?usp=sharing
Dhivehi-English texts extracted from websites and other sources (* 4 MB)
https://drive.google.com/file/d/1qiD1XOPO5Fv-UAX0NcD_rlOrZz2WvGAo/view?usp=sharing
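
Since these datasets are rough and the file formats are not described here, a quick sanity check on any of the text files can help before training. The sketch below measures how much of a line falls in the Unicode Thaana block (U+0780 to U+07BF) and keeps only lines that are mostly Thaana. The function names `thaana_ratio` and `mostly_thaana` and the 0.8 threshold are hypothetical choices for illustration, not part of the datasets, and the code assumes the text has already been read into Python strings.

```python
# Thaana occupies the Unicode block U+0780..U+07BF.
THAANA_START, THAANA_END = 0x0780, 0x07BF


def thaana_ratio(text: str) -> float:
    """Share of non-whitespace characters in `text` that are in the Thaana block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    thaana = sum(THAANA_START <= ord(c) <= THAANA_END for c in chars)
    return thaana / len(chars)


def mostly_thaana(lines, threshold=0.8):
    """Yield only the lines whose Thaana share is at least `threshold` (assumed value)."""
    for line in lines:
        if thaana_ratio(line) >= threshold:
            yield line


if __name__ == "__main__":
    # Made-up sample lines, not taken from any of the datasets above.
    sample = ["ދިވެހި", "hello world", "ދިވެހި news"]
    for line in mostly_thaana(sample):
        print(line)
```

For the Latin to Thaana and Dhivehi-English data, the same ratio could be applied per column or per side of a pair, once the actual file layout is known.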