Migrate DeepSpeech scripts to Italian #2

Closed
Mte90 opened this issue Sep 4, 2019 · 7 comments

Comments

@Mte90
Member

commented Sep 4, 2019

We need to migrate all the bash scripts and the Dockerfile, replacing every French reference, from files to parameters, with Italian ones.
Right now, localizing the README in that folder into English or Italian is not a priority: https://github.com/MozillaItalia/commonvoice-it/tree/master/DeepSpeech
The scripts in that folder download various packages from other resources, such as Lingua Libre, to add more data for model generation, and then package and generate the model for DeepSpeech.

Until this is done, we cannot generate the Italian model to use with DeepSpeech.
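A minimal sketch of what "parameters instead of hard-coded French references" could look like in the Python helpers, assuming a hypothetical `CV_LANG` environment variable that defaults to Italian (`it`). Neither the variable nor the helper names come from the repository; they only illustrate the idea of reading the language from one place instead of scattering `fr` through the code.

```python
import os
from typing import Optional

# Hypothetical default: Italian replaces the previously hard-coded French ("fr").
DEFAULT_LANG = "it"


def get_lang() -> str:
    """Return the language code from the hypothetical CV_LANG variable, or the default."""
    return os.environ.get("CV_LANG", DEFAULT_LANG)


def lang_path(base_dir: str, filename: str, lang: Optional[str] = None) -> str:
    """Build a language-specific path such as <base_dir>/<lang>/<filename>."""
    return os.path.join(base_dir, lang or get_lang(), filename)
```

With this shape, switching a script from French to Italian means changing one default (or one environment variable) rather than editing every file path.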

@Mte90 Mte90 added the help wanted label Sep 11, 2019
@Mte90

Member Author

commented Sep 12, 2019

Just an update: the Dockerfile is already set up for Italian, so if you want to contribute:

  • Read the Dockerfile, even if you don't know Docker, because it is the entry point for all the scripts
  • Read run.sh
  • Read run_fr.sh to understand the order in which the scripts run
  • Migrate each file to use Italian parameters or Italian files

For any other information, you can find us on Telegram via [at]mozitabot in the Developers channel.

You can also open a PR for just one file if you don't have much time, and other people will pick up the others.

@MozillaItalia MozillaItalia deleted a comment from Karm46 Sep 12, 2019
@Mte90

Member Author

commented Sep 12, 2019

Already started working on it #3
YEAH!

@alex179ohm

Contributor

commented Sep 12, 2019

Missing files:

  • CommonVoice-Data/names.py (needs data from an Italian source, and must be patched to parse Italian data) #4
  • CommonVoice-Data/libretheatre.py (needs data from an Italian source; may have to be rewritten entirely) #5
  • CommonVoice-Data/wikipedia.py (done)
  • CommonVoice-Data/wikisource.py (needs an Italian translation of the book "Les Forceurs de blocus" and has to be rewritten to scrape the Italian book) #5
  • CommonVoice-Data/framabook.py #5
  • CommonVoice-Data/utils.py (needs to be adapted to the Italian language)
@Mte90

Member Author

commented Sep 12, 2019

After talking with @lissyx: libretheatre and framabook can be removed and replaced with something else, while for wikisource we can use another book.
The point of these scripts is to provide resources for testing the generated model, so we can evaluate what to use.

So let's focus on names.py

@Mte90

Member Author

commented Sep 12, 2019

One of the things we have to do is generate the model and upload it: https://github.com/MozillaItalia/commonvoice-it/blob/master/DeepSpeech/build_lm.sh#L12
If the model doesn't exist, this script automatically downloads one that is already available.
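The check-then-download behaviour described above can be sketched in Python as follows. This is only an illustration of the pattern: the real logic lives in `build_lm.sh`, a shell script, and `ensure_language_model` with its parameters is a hypothetical helper; no actual download URL is assumed.

```python
import os
import urllib.request


def ensure_language_model(path: str, fallback_url: str) -> str:
    """Return path, downloading a prebuilt file from fallback_url only if path is missing.

    Both the function and its parameters are hypothetical, for illustration only.
    """
    if not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        # Fetch the already-available model instead of building it from scratch.
        urllib.request.urlretrieve(fallback_url, path)
    return path
```

Checking for the file first means the download happens at most once; an existing local model is always left untouched.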

@Mte90

Member Author

commented Sep 13, 2019

We now have 2 new tickets to better track what we have to do: #4 and #5

@Mte90

Member Author

commented Sep 13, 2019

We can remove TrainingSpeech because it is a French project, and I don't think a similar one exists for Italian: https://gitlab.com/nicolaspanel/TrainingSpeech

Done

@Mte90 Mte90 referenced this issue Sep 13, 2019
1 of 6 tasks complete
@Mte90 Mte90 closed this Sep 23, 2019
2 participants