About using some Festcat project files to create non-Catalan voices. #7

Open
Pallas1303 opened this issue Dec 1, 2023 · 11 comments

Comments

@Pallas1303

Hello, I have been looking at the Festcat project. Your work is very good; it is the only project of its kind for Festival Speech Synthesis.

I am developing a similar project for Festival Speech Synthesis, but for my native language (Brazilian Portuguese). It is at an early stage, but it already gives good results.

I am using unit selection (clunits) for natural-sounding speech, with a speech database of phonetically balanced phrases.

I have looked at the upc_ca_base code, specifically the tokenization code: how it converts numbers to words and how it handles text. I would like to request permission to adapt and use this code in my project, which adds a new language to Festival Speech Synthesis.

Naturally, I would credit the authors and the Festcat project.

I sent an email to Antonio Bonafonte about this, and the reply said that he is no longer working at the University. Whom should I ask for permission to use the Festcat project files?

@zeehio
Member

zeehio commented Dec 2, 2023

Hi!

Thanks for reaching out to us!

Our only requirements for you to comply with our license terms are explained in the headers of the source files themselves and in the copyright/license files. This message is just some comments for you to better understand them.

The Festcat project is free software, and this means that it grants you permission to use Festcat for any purpose. It also grants you permission to study how Festcat works and to make changes to your Festcat copy. You are allowed to redistribute those changes, under some reasonable conditions. For instance, "you should preserve authors names", and "we do not want people changing Festcat and making other people believe their modified version is our original version" (your modifications should be clearly marked as such), since that would lead to confusion.

You can read what you are allowed to do with any source file in the header of that file. You will see that, for some of those files, we the Festcat developers are not even their original authors. Like you, we took some of the Festival and Festvox sources and adapted them, preserving the original authors and complying with their license conditions. You can proceed in a similar way for your modifications to our sources.

Being more specific, when we carefully read the license header of those files, we saw a clause that said "Modifications should be clearly marked as such". Since we were making lots of changes, we decided to ship in our Festcat sources the original unmodified files besides our own customised changes. Similarly, if you copy and modify a Festcat file that has that clause to your new project, I suggest you keep the original unmodified Festcat file in your sources, so if any user or developer wants to understand your modifications he/she can easily compare your file with the original copy you also preserve.

If your project is also free open source software, complying with the license conditions will be straightforward.

In case you want your project to be "non-free software" or closed source, it might still be possible, but a bit more challenging, to comply with the licenses. In that case I would suggest you get professional legal advice (ask a lawyer).

Feel free to ask here if you have more doubts. The nice thing about free open source software is that when you receive your copy of the software the license gives you permission to do lots of things (even without explicitly asking us!). It is nice to know our code is useful to others, so we really appreciate you contacting us.

I'm looking forward to seeing your progress.

Have a great day 👍

@Pallas1303
Author

Thank you very much for your response. My project is free software. I will read the licenses of the Festcat files, and I will include the original Festcat and Festvox files in my project.

I am happy to get your response. Yes, I have various questions about the Festcat project, and also about the Festvox project, that I have not found answers to.

Why the ISO-8859-1 encoding in Festival Speech Synthesis?

Festcat and other language-support projects for Festival use the ISO-8859-1 encoding. For example, in the Festcat file "upc_catalan_tokenizer.scm", some characters are only displayed correctly in ISO-8859-1.

I built my voice with a phoneset in UTF-8 and with all files in UTF-8, and it works normally, with no errors. Why does Festival Speech Synthesis use ISO-8859-1?

I look forward to your reply!

Have a good day! 🌞

@zeehio
Member

zeehio commented Dec 5, 2023

> I am happy to get your response. Yes, I have various questions about the Festcat project, and also about the Festvox project, that I have not found answers to.
>
> Why the ISO-8859-1 encoding in Festival Speech Synthesis?
>
> Festcat and other language-support projects for Festival use the ISO-8859-1 encoding. For example, in the Festcat file "upc_catalan_tokenizer.scm", some characters are only displayed correctly in ISO-8859-1.
>
> I built my voice with a phoneset in UTF-8 and with all files in UTF-8, and it works normally, with no errors. Why does Festival Speech Synthesis use ISO-8859-1?

I would need to check, but I believe that Festival did not provide full UTF-8 support. Some parts of festival assumed "1 byte = 1 character" and either we used some weird hard-to-read notations everywhere or we used an 8 bit encoding. We chose the 8-bit encoding ISO-8859-15 because it was suitable for Catalan and it included the € currency sign.
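The "1 byte = 1 character" assumption can be checked from a shell. The sketch below is only an illustration and is not part of Festival or Festcat: it writes the letter ç once in ISO-8859-15 (one byte, octal 347) and once in UTF-8 (two bytes, octal 303 247) and counts the bytes with wc. The helper name count_bytes is made up for this example.

```shell
#!/bin/sh
# Illustration only: count the bytes of "ç" in two encodings.
# Octal escapes are used so the result does not depend on this file's own encoding.

count_bytes() {
    # Print the number of bytes read from standard input, without padding.
    wc -c | tr -d ' '
}

latin9_len=$(printf '\347' | count_bytes)       # "ç" in ISO-8859-15
utf8_len=$(printf '\303\247' | count_bytes)     # "ç" in UTF-8

echo "ISO-8859-15: $latin9_len byte(s)"
echo "UTF-8:       $utf8_len byte(s)"
```

Code that walks a string byte by byte sees one unit for ç in the 8-bit encoding and two units in UTF-8, which is where byte-oriented matching or rule tables can go wrong.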

Which parts of Festival didn't have UTF-8 support? I don't fully recall. Probably one is the regular expression module. The other, I believe, was the hand-written letter-to-sound rules. http://festvox.org/bsv/x1431.html

If I recall correctly, other languages eventually managed to provide UTF-8 support, so I guess if they could do it, so can you. Back when Festcat was built, UTF-8 support existed but it wasn't as popular as it is now.

I once thought about porting Festcat to UTF-8, but since it would probably mean a significant amount of work and testing, and possibly breaking existing workflows of sight-limited people (using screen readers) I decided to let it be.

I hope this answer helps you.

Please feel free to ask and insist all you like 👍

@Pallas1303
Author

Pallas1303 commented Dec 6, 2023

Understood, thank you very much.

I have added support for reading numbers and emoji. For numbers, I use the num2words command-line tool.

In my tokenization code:

(define (pallas_pt::number token name)
  "(pallas_pt::number token name)
Return list of words that pronounce this
number in pt."
  ;; Based on festival/lib/clunits_build.scm

  ;; Build the num2words command; its output is redirected to a file.
  (set! command (format nil "num2words -l pt-BR %s > /tmp/n.txt" name))

  ;; Run the command with the system function.
  (system command)

  ;; Build the command for the bash script that turns the result into Scheme code.
  (set! command "bash list2string.sh /tmp/n.txt /tmp/f.txt")

  ;; Run the command.
  (system command)

  ;; Load and evaluate the generated file, which sets the variable OI.
  (load "/tmp/f.txt")

  ;; Return the resulting string as a one-element list.
  (list (string-append OI)))

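My script list2string.sh:

#!/bin/bash
# $1 is the file with the num2words result.
# $2 is the output file with the generated Scheme code.

N=$(cat "$1")

# Write (set! OI "...") to the output file; tr turns each / into a double quote.
echo "(set! OI /$N/)" | tr / '"' > "$2"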

@Pallas1303
Author

Pallas1303 commented Dec 6, 2023

The load function loads a file and evaluates it. I could not find a way to set a variable to the output of an external command.

An example of the list2string.sh output, for the number 89:

(set! OI "oitenta e nove")

The script generates code for the load function that defines the string OI with the number conversion done by num2words.

Have a good day!

@Pallas1303
Author

Pallas1303 commented Dec 6, 2023


About emoji, this is based on an NVDA issue.

I have a list of symbols and emojis with their transcriptions, but I do not know how to implement this in Festival Speech Synthesis. Would an addenda lexicon be a good idea? The list has more than 1000 entries.

@zeehio
Member

zeehio commented Dec 9, 2023

With respect to emojis, since they can't be represented in ISO-8859-15 I can't support them. If you want to do anything about them it's up to you.

With respect to your approach converting numbers to words I can see several issues you would need to consider. If your solution is good for your requirements then don't worry about it.

  • performance: you are creating a process every time the function is called. You are creating a file every time it is called and you are reading that file as well. All that will contribute to the time it takes to generate the speech. Make sure it is within your requirements.
  • parallelization: multiple calls on independent TTS (text-to-speech) processes: If I am using your tts system to turn a book into speech I may want to process book chapters in parallel. Make sure that the file you created has a unique name if you want to allow for that use case.
  • make sure you delete tmp files after you do not need them
  • security: Make sure other users can't read that tmp file. If you want people using screen readers to use your tool you probably don't want their credit card numbers to be readable by other users.
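The temporary-file points above (unique name, cleanup, private permissions) can be sketched in shell roughly as follows. This is a minimal sketch under assumptions, not the method used by Festcat or FestPB: the festpb file prefix and the emit_scheme_string helper are hypothetical, and a fixed phrase stands in for the converter's output so the example runs without num2words installed.

```shell
#!/bin/sh
# Sketch only: "festpb" and emit_scheme_string are illustrative names.

# Print a Scheme (set! OI "...") form holding $1 as a string literal,
# escaping backslashes and double quotes so the form stays well-formed.
emit_scheme_string() {
    escaped=$(printf '%s' "$1" | sed 's/[\\"]/\\&/g')
    printf '(set! OI "%s")\n' "$escaped"
}

# mktemp chooses a unique, unpredictable name and creates the file
# readable and writable by its owner only.
tmpfile=$(mktemp "${TMPDIR:-/tmp}/festpb.XXXXXX") || exit 1

# Remove the file when the script exits, even after an error.
trap 'rm -f "$tmpfile"' EXIT

# A fixed phrase stands in for the converter's output here.
emit_scheme_string "oitenta e nove" > "$tmpfile"
cat "$tmpfile"
```

Because every run gets its own file name, parallel processes do not overwrite each other's results. The performance point is not addressed: a process and a file are still created on every call.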

Have you considered other tools besides festival? Festival was designed in the 1990s as a research tool, with flite to be used in production environments. Besides, there are deep learning based tools now providing better speech quality than the techniques available in festival.

I'm just saying because if I was starting Festcat today I would probably start by getting to know and evaluating the alternatives. It's up to you anyway

@Pallas1303
Author

Hello! Thanks for your response.

> With respect to emojis, since they can't be represented in ISO-8859-15 I can't support them. If you want to do anything about them it's up to you.

No problem! On my side, emoji are currently being read. I have an idea of how to process them.

> With respect to your approach converting numbers to words I can see several issues you would need to consider. If your solution is good for your requirements then don't worry about it.
>
>   • performance: you are creating a process every time the function is called. You are creating a file every time it is called and you are reading that file as well. All that will contribute to the time it takes to generate the speech. Make sure it is within your requirements.
>   • parallelization: multiple calls on independent TTS (text-to-speech) processes: If I am using your tts system to turn a book into speech I may want to process book chapters in parallel. Make sure that the file you created has a unique name if you want to allow for that use case.
>   • make sure you delete tmp files after you do not need them
>   • security: Make sure other users can't read that tmp file. If you want people using screen readers to use your tool you probably don't want their credit card numbers to be readable by other users.

This number-to-words function will not be in the project. Because of the problems mentioned, it is just a very experimental thing. Instead, I want to use the Festcat code, modified for this task.

> Have you considered other tools besides festival? Festival was designed in the 1990s as a research tool, with flite to be used in production environments. Besides, there are deep learning based tools now providing better speech quality than the techniques available in festival.
>
> I'm just saying because if I was starting Festcat today I would probably start by getting to know and evaluating the alternatives. It's up to you anyway

Yes, there are various tools for speech synthesis. But I chose Festival because it is "easy" for me, and mainly because I want to create another TTS alternative for my language.

Sorry for my long delay in responding.

Have a good day! 🌞

@Pallas1303
Author

Hello, how are things?

I have several things to ask and some things that I found out.

I managed to use audio with a sample rate other than 16 kHz. For example, 48 kHz!

For pitchmark generation, I used REAPER. It gave good results in text synthesis.

How did you train and implement the n-gram model for POS tagging? I did not find documentation on this, only a POS tagging file for English in Festival.

I am writing scripts for my project, an upgrade of my old repository. But I need a license for my own scripts. Do you have a license suggestion? One that does not require naming my institution.

@zeehio
Member

zeehio commented Dec 18, 2023

> Hello, how are things?

Good thanks!

> I have several things to ask and some things that I found out.
>
> I managed to use audio with a sample rate other than 16 kHz. For example, 48 kHz!

That will improve the quality of the voice. It also increases the size of the model (especially in clunits and other concatenation-based voices) and the computational cost.
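As a rough back-of-the-envelope check of the size increase, the sketch below compares raw audio storage for one hour of speech at 16 kHz and at 48 kHz. It assumes uncompressed 16-bit mono samples, which the thread does not state, and it covers waveform storage only, not the other data a clunits voice keeps. The helper name pcm_bytes is made up for this example.

```shell
#!/bin/sh
# Rough estimate: bytes of raw audio for a given sample rate and duration.
# Assumes uncompressed 16-bit (2-byte) mono samples.

pcm_bytes() {
    # $1 = sample rate in Hz, $2 = duration in seconds
    echo $(( $1 * 2 * $2 ))
}

one_hour=3600
size_16k=$(pcm_bytes 16000 "$one_hour")
size_48k=$(pcm_bytes 48000 "$one_hour")

echo "16 kHz: $size_16k bytes per hour"
echo "48 kHz: $size_48k bytes per hour"
echo "ratio:  $(( size_48k / size_16k ))x"
```

Under these assumptions the waveform data grows by a factor of three, in line with the ratio of the two sample rates.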

> For pitchmark generation, I used REAPER. It gave good results in text synthesis.
>
> How did you train and implement the n-gram model for POS tagging? I did not find documentation on this, only a POS tagging file for English in Festival.

We had a tagged corpus available:

https://github.com/FestCat/festival-ca/tree/upstream/src/data/pos/ancora-ca

We use a bunch of perl scripts to process the data and eventually call ngram-build.

Everything is reproducible from the Makefile, although I must say it is not easy to follow.

> I am writing scripts for my project, an upgrade of my old repository. But I need a license for my own scripts. Do you have a license suggestion? One that does not require naming my institution.

There is a saying I like that says that "People write into a license the things they are scared of". There are very simple straightforward licenses such as the MIT license and there are much longer licenses such as the Apache license.

I have three pieces of advice:

  • Pick an OSI (Open Source Initiative) approved license
  • Do not write a license yourself
  • This is a good place to pick one https://choosealicense.com/

The license tells those who receive your software what they are allowed to do.

The copyright is the header that tells who owns the intellectual property of the software and what year it was last modified (the year is important since copyright expires, usually some years after the death of the copyright owners). "Who" is typically you or maybe your employer.

I hope this helped you a bit!

@Pallas1303
Author

Hello, Happy New Year! How are things?

About POS: this information will be very useful in the future.

I updated my repository with new information and project files. The name is FestPB. Unfortunately, all the written documentation is in my language.

How did you build the clunits voice "upc_ca_ona_clunits-1.2.tgz"? How many hours of speech does the voice have?

I am trying to build a large clunits voice (14 hours) from a speaker in MLS (Multilingual LibriSpeech). But it has more than 450,000 phoneme units, and I mainly get RAM and processing errors. I only used 2 hours, about 70,000 units. I have good results, but I am looking for methods to train with more hours.
