Timing tweaks to improve bot accuracy #8
+28
−3
During some experimentation using Nexmo with Amazon Lex, we found that it was difficult to have a fluid two-way conversation with the bot. We would encounter issues such as double responses, commands getting cut off, and unpredictable delays.
The lex-connector server currently processes both the user's command and Lex's response in the same thread. This means that incoming speech is not processed while a response is playing, which seems to cause commands to queue up. If you aren't completely silent while Lex is playing its response, any noise in the meantime might be misinterpreted as a command, leading to a confusing conversation. The inherent delay in the phone connection makes this worse, since it is hard to tell which command led to which response.
This PR moves the processing of a Lex response to a background thread, so that incoming speech data can still be read from the buffer while the response plays. That incoming speech is discarded, however, while a response is playing and for another 0.5 seconds afterwards (to account for the phone delay), so that nothing the user says during a response can cut it off or be treated as a new command.
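As a rough illustration of the idea above (not the code in this PR), the gating could be sketched as follows. `SpeechGate`, `play_in_background`, and `should_discard` are hypothetical names, and `play_fn` stands in for whatever actually streams the Lex audio back to the caller; only the 0.5 second grace period comes from the description.

```python
import threading
import time

POST_PLAYBACK_GRACE = 0.5  # seconds; value taken from the PR description


class SpeechGate:
    """Hypothetical helper: tells the receive loop when to drop incoming audio."""

    def __init__(self, grace=POST_PLAYBACK_GRACE, clock=time.monotonic):
        self._grace = grace
        self._clock = clock
        self._lock = threading.Lock()
        self._playing = False
        self._ignore_until = 0.0

    def play_in_background(self, play_fn):
        """Run play_fn (assumed to block until playback ends) on a worker thread."""
        with self._lock:
            self._playing = True

        def worker():
            try:
                play_fn()
            finally:
                # Keep discarding for a short grace period after playback ends.
                with self._lock:
                    self._playing = False
                    self._ignore_until = self._clock() + self._grace

        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        return thread

    def should_discard(self):
        """True while a response is playing or the grace period is running."""
        with self._lock:
            return self._playing or self._clock() < self._ignore_until
```

The receive loop would keep draining the buffer as usual and simply skip any frame for which `should_discard()` returns true, so stale audio never accumulates into a queued command.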
Another tweak is to keep a few frames from the beginning of the silent segment that marks the end of a command. Previously, the tail end of a spoken command would sometimes be classified as "silence" and get cut off.
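A minimal sketch of this second tweak, under stated assumptions: `CommandSegmenter`, `is_speech`, `trailing_frames`, and `end_after` are hypothetical names and values, not taken from the PR. `is_speech` stands in for whatever voice-activity check the server uses, and `end_after` is assumed to be at least `trailing_frames`.

```python
class CommandSegmenter:
    """Hypothetical segmenter that keeps the first few silent frames of each pause."""

    def __init__(self, is_speech, trailing_frames=3, end_after=10):
        self._is_speech = is_speech      # callable: frame -> bool
        self._trailing = trailing_frames  # silent frames to keep per silent run
        self._end_after = end_after       # silent frames that end a command
        self._frames = []
        self._silent_run = 0
        self._in_command = False

    def feed(self, frame):
        """Add one frame; return the finished command's frames, or None."""
        if self._is_speech(frame):
            self._in_command = True
            self._silent_run = 0
            self._frames.append(frame)
            return None
        if not self._in_command:
            return None  # silence before any speech is ignored
        self._silent_run += 1
        if self._silent_run <= self._trailing:
            # Keep the start of the silence: it may hold the tail of a word.
            self._frames.append(frame)
        if self._silent_run >= self._end_after:
            command, self._frames = self._frames, []
            self._in_command = False
            self._silent_run = 0
            return command
        return None
```

For example, with `trailing_frames=2` and `end_after=4`, two speech frames followed by four silent frames yield a command containing the two speech frames plus the first two silent ones.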