Glitchy audio after 5 to 10 minutes #20

Open
roschler opened this issue Sep 8, 2020 · 20 comments
Comments

@roschler

roschler commented Sep 8, 2020

I have noticed in my scene that after about 5 to 10 minutes of generating audio, the audio gets "glitchy". In other words, after my three Sumerian Host characters talk for about 5 to 10 minutes straight, the audio develops really bad ticks and pops, to the point that it is unlistenable (think of a really old, scratchy vinyl record being played). This usually ends up being an audio buffer handling problem somewhere. It happens every time and the symptoms are always the same: a few minutes of perfect audio, then a single tick or pop here and there, and once the ticks and pops start, they increase rapidly until they occur with every audio buffer being played and the audio is unlistenable.

I know this might be a Chrome thing, but just in case, is there a way to "patch" into the audio generation code of the Sumerian host library and play the audio with something else, in my case Howler.js, while maintaining the lip sync between the audio and the character/host animation?

@c-morten

c-morten commented Sep 8, 2020

Hi @roschler, I'm happy to look into this. Can you provide some steps for how to reproduce this issue? If you had a transcript of host speech that would cause this to occur that would be really helpful. Also would be good to get some information about your browser and version and which rendering engine you're using. Have you tried using a different browser like Firefox to see if the issue still occurs?

For the audio implementation, we are using Web Audio. For the three.js build we create an Audio object and either a THREE.PositionalAudio or THREE.Audio object, then connect the Audio object to the three.js Audio object using Audio.setMediaElementSource. In Babylon.js the rendering engine handles the creation of the web audio object; we just pass the url to the constructor of BABYLON.Sound.

If you wanted to circumvent the host audio I can think of two possibilities. The first option is a little hacky but quicker to implement. You could call the setVolume method of the TextToSpeechFeature to set it to 0. Then you could listen for the TextToSpeechFeature's play event, which will supply a Speech object as an argument to your listener function. When you catch the event you could immediately pause speech and use the Speech object's audio property to get a handle to the Web Audio object, which will point you to the url the audio was loaded from. Use that to create your Howler.js audio, and once it's ready, play it at the same time as resuming speech on the host.

The second option would be to pull the repository and create your own custom build that overloads the speech implementation. TextToSpeechFeature._synthesizeAudio is where you'd need to create your custom audio. You may also need to overload play/pause/resume/stop of the Speech class depending on how Howler.js audio works. I can provide more details on the second option if you do want to try that.
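The first option could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's documented usage: `tts` stands in for the host's TextToSpeechFeature (only the `setVolume`, `pause` and `resume` methods mentioned above are used), `listenForPlay` is a hypothetical function that subscribes a callback to the feature's play event, `createPlayer` is a hypothetical wrapper around the replacement audio library, and reading the url from `speech.audio.src` assumes the audio property exposes a media element.

```javascript
// Sketch only. All three parameters are stand-ins supplied by the caller:
//   tts           - object exposing setVolume(), pause() and resume()
//   listenForPlay - hypothetical: registers a callback for the play event
//   createPlayer  - hypothetical: createPlayer(url, onReady) loads the url in
//                   another audio library and calls onReady(player) when loaded
function routeSpeechThroughExternalPlayer(tts, listenForPlay, createPlayer) {
  // Mute the built-in audio so only the external player is heard.
  tts.setVolume(0);

  listenForPlay((speech) => {
    // Hold the host's speech (and its visemes) until the other audio is ready.
    tts.pause();

    // Assumption: the Speech object's audio property exposes the source url.
    const url = speech.audio.src;

    createPlayer(url, (player) => {
      // Start both together so lip sync stays roughly aligned.
      player.play();
      tts.resume();
    });
  });
}
```

With Howler.js, `createPlayer` might be written as `(url, onReady) => { const howl = new Howl({ src: [url], format: ['mp3'], onload: () => onReady(howl) }); }`. The explicit `format` is there because a generated audio url may not end in a file extension. Any delay between `player.play()` and `tts.resume()` would show up as lip sync drift, so this is worth measuring before relying on it.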

@roschler
Author

roschler commented Sep 8, 2020

@c-morten The text doesn't matter. I've run a lot of tests. To test for yourself, just grab 5 to 10 minutes worth of text off the web, anywhere, and just keep generating TTS with the host TextToSpeechFeature facility with it. It's a quantity thing.

A big thanks for the audio internals details. Hopefully it doesn't come to that (at least for now). Eventually I'll want to replace the audio anyway, but hopefully not for a long time. I'd like to apply volume and sound effects to the voices eventually and I don't think there's a way to do that with the current library. Note, it would be nice if there was an easy way to swap out control of the audio, so that once the audio needs to be started in sync with the viseme stream to effect lip sync TTS, the audio side of things could be handed off to a consumer-provided callback.

For now, I'm going to try the same test on my other stations. Hopefully it's an Ubuntu 14.04 audio driver issue and nothing else. That's an old Linux build.

@roschler
Author

roschler commented Sep 9, 2020

@c-morten Do the Sumerian Hosts use WAV or MP3 audio when creating TTS through Polly via the TextToSpeechFeature._synthesizeAudio call? I found this Stack Overflow post that mentions crackling audio when using WAV formatted audio and suggests switching to MP3:

https://stackoverflow.com/questions/6955957/html5-audio-crackle-in-chrome

@c-morten

c-morten commented Sep 9, 2020

The audio format is specified in the options you pass in when adding the TextToSpeechFeature, or when you play speech. If you don't define it we default to MP3, so you most likely were not getting WAV audio.

@roschler
Author

roschler commented Sep 9, 2020

Ok, thanks. I was hoping it was WAV. Looks like I'm going to have to fork and dig deeper. It happens on all stations.

@c-morten

I have not yet been able to reproduce this. I have run test audio for over 30 minutes straight on all 3 builds with no issues so far. Can I get more information on your test scenario:

  • Which build are you using?
  • Is the browser and tab that's playing the audio active for the entire time leading up to when the issue occurs?
  • When it starts happening, do you notice any memory spikes in the console?
  • Do you encounter the same issue when using Firefox instead of Chrome?

@roschler
Author

roschler commented Sep 10, 2020 via email

@roschler
Author

@c-morten Just tested on Firefox. It happens with Firefox too, with the same pattern.

Where do I look to give you the correct answer to "what build are you using?"?

Regarding memory spikes, do you mean in the main system monitor or in the Chrome Task Manager (i.e., Chrome's internal system monitor)?

Here's a note, not related to the audio crackling. Just a general comment about Sumerian Hosts audio on Firefox compared to Chrome. On Chrome, before the crackling occurs, the audio is smooth. On Firefox, the audio seems to get "clipped" at the start and the end of the waveform. If you have ever worked with music gear, it feels like a noise gate with the volume threshold set too high, so when the audio starts there's an abrupt jump from no sound to some sound instead of the gentle, smooth easing in that sound normally has.

@roschler
Author

@c-morten I watched memory/CPU/GPU in both the main System Monitor (Windows 8) and the Chrome Task Manager. Memory did not jump around much, but I did see something strange. When the audio was smooth, the CPU% was around 46% and then dropped to about 5% when the host animation/audio playing stopped. However, when the audio started crackling, especially heavily, the CPU was around 79% or worse. Also, after the scene stopped, the CPU stayed at that same high consumption level instead of dropping precipitously like it usually does after a scene stops. It's as if something in the browser is stuck doing something and won't stop.

This is wild speculation, but if for some reason some audio rendering process got stuck, then further attempts to play audio could easily cause crackling since the audio buffers would not be delivered properly with gaps between their delivery. This would get worse with each attempt if each attempt added another stuck audio process on the "stack".

@roschler
Author

@c-morten I found a tutorial on debugging web audio problems using Chrome DevTools, especially in regards to crackling:

https://web.dev/profiling-web-audio-apps-in-chrome/

Here are some screenshots showing the performance metrics before and after crackling has begun. I have drawn boxes around the stats that are most notable (to me):

VIEW: tracing

SECTOR: AudioOutputDevice

 PHASE: Before Crackling Has Begun

image

 PHASE: During Crackling

image

NOTE: For the wasapi_render_thread, I didn't see any glaring differences, but when I look at the average durations the load appears to be about 25% greater during crackling compared to before crackling.

VIEW: tracing
SECTOR: wasapi_render_thread

 PHASE: Before Crackling Has Begun

image

 PHASE: During Crackling

image

VIEW: WebAudio Tools

NOTE: Look at the status line at the bottom of the screen for each of the following screenshots.

 PHASE: IDLE (i.e. - baseline, **before** any audio rendering has begun)
 NOTE: All values in the status line are zero.

image

 PHASE: ACTIVE (i.e. - actively rendering scene and audio, but **before** crackling has begun)

image

 PHASE: DURING CRACKLING (i.e. - the scene is rendering and crackling has made the audio unlistenable)

image

 PHASE: IDLE, AFTER CRACKLING HAS BEGUN (i.e. - the scene is no longer rendering, after crackling has made the audio unlistenable)

image

As you can see, the audio rendering system is completely damaged. I tried the trash can icon to execute an explicit garbage collection operation, and it did not help at all, no change. Note, the tutorial I linked to above also has tips on how to restructure audio rendering code to try and correct problems that might be causing the audio rendering difficulties. Let me know if you need anything else.

@c-morten

Thanks for the link, I'll try debugging this way. In regards to figuring out which build you are on, are you using host.three.js or host.babylon.js? These would either be referenced in a script tag in your html file or you would have installed amazon-sumerian-hosts via npm and imported one of those.

@roschler
Author

@c-morten

Here's the package.json reference for amazon-sumerian-hosts:

  "devDependencies": {
    "amazon-sumerian-hosts": "^1.3.1"
  }

I am using host.three.js.

roschler changed the title from "Glitchy audio after 5 to 10 minues" to "Glitchy audio after 5 to 10 minutes" on Sep 18, 2020
@roschler
Author

Any updates? I still have this problem and it happens consistently.

@c-morten

I have not had much luck reproducing this yet. It's not happening for me even within 30 minutes, so it's difficult to know how long I need to let things run before calling it quits. Since it is happening consistently for you, there are a few things I would want to test that you might give a try. It would be good to know your results:

  • Can you reproduce this using three.js traditional audio rather than positional audio? To do this, do not define the attachTo property of the options object you pass when creating the TextToSpeechFeature. If this option is not defined it will default to creating a three.js Audio object rather than a PositionalAudio object.

  • A little more involved, but can you reproduce this using the host.babylon.js build rather than host.three.js? Trying to determine if this is specific to the rendering engine audio system since hosts hook into the audio system of the rendering engine being used.

  • Last resort, I would try generating audio files for the dialog you are passing to the host system using the AWS Polly console. Then create an application that uses three.js without the host package and play that audio in sequence using the three.js audio system. Does this reproduce the issue?
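The last test in the list above, playing pre-generated files back to back without the host package, needs little more than a sequencing loop. The following is a minimal sketch of that loop only. The names `playInSequence`, `playOne` and `onDone` are hypothetical, and the actual playback is left to the caller so that the same loop can drive the three.js audio system or any other player.

```javascript
// Plays each url in order, starting the next one only after the previous
// one reports that it has finished.
//   urls    - array of audio file urls (for example files exported from Polly)
//   playOne - caller-supplied: playOne(url, onEnded) starts playback and
//             calls onEnded() when that clip is done
//   onDone  - optional: called with the number of clips played
function playInSequence(urls, playOne, onDone) {
  let index = 0;

  function next() {
    if (index >= urls.length) {
      if (onDone) onDone(index);
      return;
    }
    const url = urls[index];
    index += 1;
    playOne(url, next);
  }

  next();
}
```

For the three.js case, `playOne` could load the file with THREE.AudioLoader, play it through a THREE.Audio object and call `onEnded` when the clip finishes. Looping over the same set of files for longer than the usual 5 to 10 minutes would show whether the crackling appears without the host package involved.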

@roschler
Author

"Can you reproduce this using three.js traditional audio rather than positional audio? To do this, do not define the attachTo property of the options object you pass when creating the TextToSpeechFeature. If this option is not defined it will default to creating a three.js Audio object rather than a PositionalAudio object."

Thanks. I'll give that a try. I don't have time to do the host.babylon.js test right now because that would be a massive refactor. But I'll try disabling positional audio as you suggest.

BTW, I found this interesting post:

https://bugs.chromium.org/p/chromium/issues/detail?id=175363

I'm not sure if this is relevant, but this and other posts I found describe problems with the use of scriptProcessorNode that can cause crackling audio.

@c-morten

I'm taking a wild guess here, but I'm thinking there may just be too much audio stored if you are continually playing dialog for long periods of time. We don't have any system in place for managing the storage of audio you are creating, but maybe you could set up a test to confirm whether or not this is actually the case. You will need to access internal host variables to get to the place where the host audio is stored. Assuming you have a HostObject variable named host, we store the speech audio that gets generated in the following location: host._features.TextToSpeechFeature._speechCache. Try setting up a keyboard event to set this variable to {}, then execute that keyboard event once you hear the audio crackling. Monitor the memory to try to determine when the next garbage collection happens after executing that event. Does the next piece of audio that plays after garbage collection happens play back normally?
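A minimal sketch of that keyboard test might look like the following. The cache path is the internal one given above. The helper name `installCacheResetKey`, the choice of the `c` key and the `target` parameter (normally `window`) are assumptions made for illustration.

```javascript
// Diagnostic helper: pressing the chosen key replaces the speech cache with
// an empty object so the previously stored audio becomes collectable.
// Relies on internal, undocumented fields, so treat it as a test aid only.
function installCacheResetKey(host, target, key = 'c') {
  target.addEventListener('keydown', (event) => {
    if (event.key !== key) return;
    host._features.TextToSpeechFeature._speechCache = {};
    console.log('Speech cache cleared');
  });
}
```

In a page this would be called once after the host is created, for example `installCacheResetKey(host, window)`, and the key pressed as soon as the crackling becomes audible.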

@c-morten

I was just scanning through the three.js audio documentation and I noticed there’s a mistake in our three.html example file, I’m wondering if it may be causing your issue. How closely are you following the example code? In our createHost method we’re creating a separate THREE.AudioListener instance for each host. However the three.js docs state that there’s only meant to be one listener per scene. If you are also using multiple listeners, try using just one instead.
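One way to guarantee a single listener, sketched under assumptions: the helper below creates the listener lazily on first use, adds it to the camera once, and returns the same instance on every later call. `createListenerProvider` and `makeListener` are hypothetical names, and how the listener is handed to each host depends on the options your createHost code already passes.

```javascript
// Returns a function that always yields the same listener instance.
//   camera       - object with an add() method (a three.js camera)
//   makeListener - caller-supplied factory, e.g. () => new THREE.AudioListener()
function createListenerProvider(camera, makeListener) {
  let listener = null;

  return function getListener() {
    if (listener === null) {
      listener = makeListener();
      camera.add(listener); // attached exactly once
    }
    return listener;
  };
}
```

During scene setup this could be used as `const getListener = createListenerProvider(camera, () => new THREE.AudioListener());`, with every per-host setup call then using `getListener()` instead of constructing its own listener.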

@roschler
Author

To set up the hosts I'm using the code from the examples. I just checked my code and indeed three audioListener objects are being added to the camera object (odd place to add a listener object, don't you think?). I'm going to move that code out of the per-host setup code into the scene initialization stage and do that operation only once. I'll tell you how it goes tomorrow.

@roschler
Author

@c-morten The audioListener idea was helpful but I don't think it solved the original problem. I say this because now that I only create one audioListener object instead of 3, the glitchy audio still occurs, it just takes 3 times longer to start degrading. This is a big help but I would still like to get rid of the problem completely. When I get the chance I'll try your cache clean-up idea.

Side note. How can I get a list of the emotes? I looked at the emote.glb file but that's in a format that is not readable by a standard editor. When I try and open it I see non-ASCII characters. I see the animations in the gestures.json file that exists for each character, but not the emotes? Does the "Alien" character only have the one "angry" emote?

@c-morten

Hi @roschler. The .glb format is viewable in DCC applications like Blender. You can also import the files into glTF viewers like https://gltf-viewer.donmccurdy.com/ and https://sandbox.babylonjs.com/ to preview the names of the animations contained within. Currently the "Alien" character only has the "angry" emote. That character has a more limited animation set because it was used as a test to prove out that we could use the PointOfInterestFeature on characters whose rigs have varying proportions and joint orientations/names.
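The animation names can also be read without a viewer, because a .glb file is a small binary header followed by a JSON chunk that lists the animations. The function below is a minimal sketch that assumes a well-formed glTF 2.0 binary whose first chunk is the JSON chunk. The name `listGlbAnimationNames` and the `animation_N` fallback for unnamed entries are made up for this example.

```javascript
// Extracts animation names from the JSON chunk of a binary glTF (.glb) file.
// `bytes` is a Uint8Array (a Node.js Buffer also works).
function listGlbAnimationNames(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // 12-byte header: magic "glTF", version, total length (all little-endian).
  if (view.getUint32(0, true) !== 0x46546c67) {
    throw new Error('Not a GLB file');
  }

  // First chunk header: byte length, then type ("JSON").
  const chunkLength = view.getUint32(12, true);
  const chunkType = view.getUint32(16, true);
  if (chunkType !== 0x4e4f534a) {
    throw new Error('First chunk is not JSON');
  }

  // The chunk may be padded with trailing spaces, which JSON.parse accepts.
  const jsonBytes = bytes.subarray(20, 20 + chunkLength);
  const gltf = JSON.parse(new TextDecoder().decode(jsonBytes));

  return (gltf.animations || []).map(
    (animation, i) => animation.name || `animation_${i}`
  );
}
```

In Node.js this could be run as `listGlbAnimationNames(require('fs').readFileSync('emote.glb'))`, where the path is a placeholder for wherever the character's emote file lives.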
