The goal of this repo is to curate resources related to machine learning and language tools for Dhivehi, so that any newcomer can easily understand the current state of things and use the prebuilt tools 😄
@mapmeld has done some amazing work training and fine-tuning language models for Dhivehi:
- dv-wave: ELECTRA model trained from scratch on Dhivehi text
- dv-muril: Experiment in inserting equivalent Dhivehi words into MuRIL
- dv-labse: Inserting Dhivehi WordPiece tokens into Google's LaBSE model
Tacotron2 trained on Common Voice data, up to about 300k (demo)
Current best model: Tacotron2 ~300k
No work has been done in this area in a way that benefits the public
- Training Mozilla's Tacotron2 implementation with data from Mozilla Common Voice (Griffin-Lim)
- Also training Mozilla's Tacotron2 implementation with data from Mozilla Common Voice (Griffin-Lim)
- Training Multi-Band MelGAN on Mozilla Common Voice data (single speaker), ~10k model
- Processing Common Voice data into LJSpeech-1.1 format (also allows generating audio from specified speakers only)
- [WIP]
- Training a Seq2Seq model to transliterate Dhivehi to Latin, based on div-transliteration
- Inference for the div-transliteration model
- DhivehiDatasets: Many types of curated Dhivehi datasets from many sources (News, )
- Common Voice: Crowdsourced voice dataset
- opendatamv: Effort by the open source community to collect various types of data
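
For readers who want a feel for the Common Voice to LJSpeech-1.1 conversion mentioned among the notebooks above, here is a minimal sketch of the metadata step only. It is not the code from that notebook. It assumes the Common Voice TSV has `client_id`, `path` and `sentence` columns and that the target is the pipe-separated `id|text|normalized text` layout of LJSpeech's `metadata.csv`. The function names are hypothetical, and converting the audio clips themselves to WAV is left out.

```python
import csv
from pathlib import Path


def common_voice_to_ljspeech_rows(tsv_file, speakers=None):
    """Yield LJSpeech-style (clip_id, text, normalized_text) tuples.

    tsv_file: an open text file in Common Voice TSV layout (assumed to
              contain client_id, path and sentence columns).
    speakers: optional set of client_id values to keep; None keeps everyone.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Optional speaker filter, e.g. to build a single-speaker dataset.
        if speakers is not None and row["client_id"] not in speakers:
            continue
        # "|" is the LJSpeech field separator, so drop it from the text,
        # then collapse any runs of whitespace.
        text = " ".join(row["sentence"].replace("|", " ").split())
        if not text:
            continue
        # LJSpeech ids are file names without the extension.
        clip_id = Path(row["path"]).stem
        # No text normalization is attempted here, so both columns match.
        yield clip_id, text, text


def write_metadata(rows, out_file):
    """Write rows as pipe-separated lines, one clip per line."""
    for clip_id, text, normalized in rows:
        out_file.write(f"{clip_id}|{text}|{normalized}\n")
```

A typical use would be to open a Common Voice TSV file with `encoding="utf-8"`, pass it to `common_voice_to_ljspeech_rows` together with a set of chosen `client_id` values, and write the result to `metadata.csv` next to a `wavs/` folder holding the converted audio.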
Needs to be decided and created