Curating and organizing public Machine learning resources related to Dhivehi

Dharisd/DhivehiML

DhivehiML 📋

Curating and organizing Dhivehi ML projects and experiments

The goal of this repo is to curate resources related to machine learning and language tools in general aimed at Dhivehi, so that any newcomer can easily understand the current state and use the prebuilt tools 😄

Current State

Language models

@mapmeid has done some amazing work training and fine-tuning language models for Dhivehi

  • dv-wave: ELECTRA model trained from scratch on Dhivehi text
  • dv-muril: Experiment in inserting equivalent Dhivehi words into MuRIL
  • dv-labse: Inserting Dhivehi WordPiece tokens into Google's LaBSE models
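
To make the idea of inserting WordPiece tokens more concrete, here is a minimal sketch of greedy longest-match WordPiece tokenization over a toy vocabulary. It is not the code used for dv-labse or dv-muril; the function names `wordpiece_tokenize` and `extend_vocab`, the toy vocabulary, and the way the example word is split are all assumptions made for illustration.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece tokenization of a single word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece covers this position, so the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def extend_vocab(vocab, new_tokens):
    """Return a copy of vocab with unseen tokens appended at fresh ids."""
    extended = dict(vocab)
    for token in new_tokens:
        if token not in extended:
            extended[token] = len(extended)
    return extended


# Toy vocabulary without any Thaana pieces (hypothetical).
base_vocab = {"[UNK]": 0, "hello": 1, "##s": 2}

word = "ދިވެހި"  # "Dhivehi" written in Thaana script
before = wordpiece_tokenize(word, base_vocab)

# Hypothetical split of the word into two pieces, chosen only for the demo.
new_tokens = [word[:4], "##" + word[4:]]
extended_vocab = extend_vocab(base_vocab, new_tokens)
after = wordpiece_tokenize(word, extended_vocab)

print(before)  # the word is unknown to the base vocabulary
print(after)   # the word is covered by the two inserted pieces
```

With a real pretrained model, the embedding matrix also has to grow to match the new vocabulary size. In Hugging Face `transformers` this is usually done with `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`, and the new embeddings then need further training to become useful.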

Text to Speech

Tacotron2 trained on Common Voice data up to about 300k (demo)

  • Synthesize audio from Tacotron2 (Griffin-Lim) Open In Colab

Current best model: Tacotron2 ~300k

Speech to Text

No publicly available work has been done in this area yet

Notebooks

Text to speech experiments

  • Training Mozilla's Tacotron2 implementation with data from Mozilla Common Voice (Griffin-Lim) Open In Colab
  • Also training Mozilla's Tacotron2 implementation with data from Mozilla Common Voice (Griffin-Lim) Open In Colab
  • Training MultiBand MelGAN on Mozilla Common Voice data (single speaker), ~10k model Open In Colab
  • Process Common Voice data into LJSpeech-1.1 format (also allows generating audio only from specified speakers) Open In Colab
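
As a rough illustration of the Common Voice to LJSpeech step, the sketch below turns Common Voice TSV rows into LJSpeech-style `metadata.csv` lines (`id|text|normalized text`) and optionally keeps only selected speakers. It is not the notebook's code: the function name `commonvoice_to_ljspeech` and the sample rows are made up, and it assumes the TSV has `client_id`, `path`, and `sentence` columns. Converting the audio clips themselves to WAV is out of scope here.

```python
import csv
import io
from pathlib import PurePosixPath


def commonvoice_to_ljspeech(tsv_file, speakers=None):
    """Yield LJSpeech-style metadata lines from Common Voice TSV rows.

    tsv_file: an open text file (or file-like object) with a header row.
    speakers: optional set of client_id values to keep; None keeps everyone.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if speakers is not None and row["client_id"] not in speakers:
            continue
        # LJSpeech ids are file names without the extension.
        clip_id = PurePosixPath(row["path"]).stem
        # "|" is the LJSpeech field separator, so it must not appear in the text.
        text = row["sentence"].strip().replace("|", " ")
        # No separate normalization is done here, so both text fields match.
        yield f"{clip_id}|{text}|{text}"


# Hypothetical sample rows standing in for a real Common Voice TSV file.
sample_tsv = (
    "client_id\tpath\tsentence\n"
    "speaker_a\tclip_0001.mp3\tfirst sentence\n"
    "speaker_b\tclip_0002.mp3\tsecond sentence\n"
    "speaker_a\tclip_0003.mp3\tthird sentence\n"
)

lines = list(
    commonvoice_to_ljspeech(io.StringIO(sample_tsv), speakers={"speaker_a"})
)
print("\n".join(lines))
```

Filtering on `client_id` is what makes a single-speaker subset possible, which matters for models such as Tacotron2 that are usually trained on one voice.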

Speech to text

  • [WIP]

Transliteration

Named Entity Recognition

Datasets

  • DhivehiDatasets: Many types of curated Dhivehi datasets from many sources, such as news
  • Common Voice: Crowdsourced voice dataset
  • opendatamv: An effort by the open source community to collect various types of data

Tasks For Evaluation

Needs to be decided and created
