Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolves #118: text to speech #119

Open
wants to merge 18 commits into
base: master
Choose a base branch
from
Open

Resolves #118: text to speech #119

wants to merge 18 commits into from

Conversation

Baret
Copy link
Owner

@Baret Baret commented Oct 20, 2020

I stumbled upon a text-to-speech library and thought about letting it read out the radio calls. And well, a little tinkering and... here we are :)

@Baret
Copy link
Owner Author

Baret commented Oct 26, 2020

Please have a review (before I can't stop myself and put even more in this "just trying smth real quick"-issue :D).
The different mary versions are required because in 5.2 the rate effect does not accelerate the voice output (aka the effect does not work).

Unfortunately currently the game is only startable from the IDE. When building with maven and starting the application-0.2.0-SNAPSHOT-jar-with-dependencies.jar you get a stacktrace:

Exception in thread "main" java.lang.ExceptionInInitializerError
        at de.gleex.pltcmd.game.application.Main.run(main.kt:58)
        at de.gleex.pltcmd.game.application.MainKt.main(main.kt:43)
        at de.gleex.pltcmd.game.application.MainKt.main(main.kt)
Caused by: marytts.exceptions.MaryConfigurationException: Cannot start MARY server
        at marytts.LocalMaryInterface.<init>(LocalMaryInterface.java:66)
        at de.gleex.pltcmd.game.ui.sound.speech.Speaker.<clinit>(Speaker.kt:79)
        ... 3 more
Caused by: java.lang.NullPointerException
        at marytts.util.MaryRuntimeUtils.ensureMaryStarted(MaryRuntimeUtils.java:70)
        at marytts.LocalMaryInterface.<init>(LocalMaryInterface.java:64)
        ... 4 more

:(

Hint: for proper testing/debugging you should adjust the logback.xml:

    <logger name="de.gleex.pltcmd.game.ui" level="INFO"/>
    <logger name="de.gleex.pltcmd.game.sound" level="DEBUG"/>

@Baret Baret requested a review from Loomie October 26, 2020 17:06
@Baret Baret added this to In Progress in pltcmd via automation Oct 26, 2020
@Baret Baret moved this from In Progress to Review in pltcmd Oct 27, 2020
@Baret
Copy link
Owner Author

Baret commented Oct 28, 2020

Done.

BTW as a future improvement I thought about "randomized effects". To not make it too monotonous have slight variations in the used effects for every voice line, for example adjust the rate to have slightly faster or slower replay. Or even better: Derive some pitching or equal effect from the callsign so that every element has "its own unique voice". But that is something we can add later.

That is the reason why the filename consists of the text and the effect hash. Actually the speech example first had the option to set different effects to directly compare how they sound on the same text. And it always generated one wav file for a voice line and changing the effect did nothing :D

import javax.sound.sampled.SourceDataLine

/**
* Use this object for text-to-speech. All strings given to [say] wll be read aloud one after another. This means
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Use this object for text-to-speech. All strings given to [say] wll be read aloud one after another. This means
* Use this object for text-to-speech. All strings given to [say] will be read aloud one after another. This means


playLoop = GlobalScope.launch {
log.debug("Launching play loop")
while (isActive) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be always true as "Stopped play loop" is never logged! I think an own scope is needed that can be cancelled.

log.debug("Ignored phrase detected, not saying '$text'")
return
}
val speechDirectory = File(FOLDER_SPEECH_FILES)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File is soo before Java 7. Path and Files is the new I/O 2.

But kotlin.io doesn't seem to know that!?


// reuse existing texts if they had the same effects
mary.audioEffects = effects.toString()
val path = "${speechDirectory.absolutePath}/${text.hashCode()}_${mary.audioEffects.hashCode()}.wav"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, use a model class (File/Path) instead of String! So you don't have to use the os specific path separator '/ '.


// reuse existing texts if they had the same effects
mary.audioEffects = effects.toString()
val path = "${speechDirectory.absolutePath}/${text.hashCode()}_${mary.audioEffects.hashCode()}.wav"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hashCode() method returns the same hash only in the same JRE runtime instance! It is not fix for multiple application startups!

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are deleted at (normal) JVM-termination anyways ;)

}
soundFile.deleteOnExit()

filenames.send(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use soundFile instead of path as String is not very useful.

log.debug("Starting MaryTTS engine...")
mary = LocalMaryInterface()
if(Mary.currentState() == Mary.STATE_RUNNING) {
log.debug("MaryTTS started.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you mention in the KDoc that the startup may take some time, it would be nice to log the time it took. For me currently the difference between the two log messages is about 1.500 ms.

@@ -50,6 +55,8 @@ open class Main {
open fun run() {
val application = SwingApplications.startApplication(UiOptions.buildAppConfig())

Speaker.startup()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this takes some time consider calling this asynchronous!

private fun play(filename: String) {
val soundFile = File(filename)
if (soundFile.exists()) {
val audioStream = AudioSystem.getAudioInputStream(soundFile)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check if reading files is faster than calculating the text-to-speech. Because disk i/o is relatively slow. But we could also cache the bytes in memory (for example with a fixed 10 MiB pool). But this are performance considerations.


it.drain()
}
} finally {
Copy link
Collaborator

@Loomie Loomie Oct 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you want to lock the audio when in use. So instead of an AtomicBoolean I would suggest a corresponding mutual exclusion functionality from java.util.concurrent. Maybe a Semaphore is more approtiate as a Lock. The benefit is that you can just wait for it to be freed instead of the while-delay in waitForQueueToEmpty().

Edit: This looks like a Guarded Block.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: maybe we can just use Jobs? They can be waited for (joined).

it.start()

var count = 0
val buffer = ByteArray(4096)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 4096 instead of sourceDataLine.getBufferSize()?

@ExperimentalCoroutinesApi
object Speaker {

private const val FOLDER_SPEECH_FILES = "./speech"
Copy link
Collaborator

@Loomie Loomie Oct 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the os specific "./" and just use the name. Or provide a path object (see below)

Copy link
Collaborator

@Loomie Loomie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks and works good! For some implementation detail improvements see the comments.

@Baret Baret added the feature A completely new feature label Nov 26, 2020
@Baret Baret linked an issue Nov 26, 2020 that may be closed by this pull request
…peech

# Conflicts:
#	game/application/src/main/resources/logback.xml
#	game/ui/src/main/kotlin/de/gleex/pltcmd/game/ui/views/GameView.kt
@Baret Baret moved this from Review to In Progress in pltcmd Jan 4, 2021
# Conflicts:
#	game/application/src/main/resources/logback.xml
@Baret Baret removed this from In Progress in pltcmd Feb 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A completely new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Text to Speech
2 participants