allowing nested calls after recognition #12

ezavesky · 2019-09-05T12:14:57Z

I would like to chain additional processing steps after recognition has completed. This would let other cool things run on top of the recognized speech: sentiment analysis, topic understanding, speaker detection, etc.

Here's a rough sketch of the concept...

Each may have its own "server" file that launches a new service, so that it doesn't complicate the existing single-server architecture.
Each would communicate over web calls (REST) to avoid process confusion; in the future, we could expand this to something more rigorous, like a message queue.
Each can communicate via stored JSON/metadata or audio files written to disk.
Each can "register" itself with the main speech server as a secondary process on start-up. For example, the "speaker detection" module will (a) launch its own service, (b) register with the primary server, and (c) accept REST calls and reply with JSON / text as required.
As just one example, each module could leverage other OSS like uis-rnn or pyannote-audio (both taken from this great repo of examples).
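
To make the register-then-call flow above more concrete, here is a minimal sketch using only the Python standard library. Everything named here is an assumption for illustration and not part of the existing server: the `/register` and `/process` paths, the in-memory `REGISTRY` dict, the JSON payload shapes, and the `word_count` function standing in for a real module such as speaker detection.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical registry kept by the primary speech server: module name -> base URL.
REGISTRY = {}


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def make_handler(routes):
    """Build a handler class that dispatches POST paths to plain functions."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            func = routes.get(self.path)
            if func is None:
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length) or b"{}")
            reply = json.dumps(func(body)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(reply)))
            self.end_headers()
            self.wfile.write(reply)

        def log_message(self, *args):  # keep the sketch quiet
            pass

    return Handler


def serve(routes, port=0):
    """Start a server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), make_handler(routes))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d" % server.server_address[1]


def register(body):
    """Primary-side endpoint: remember a secondary module's URL."""
    REGISTRY[body["name"]] = body["url"]
    return {"registered": sorted(REGISTRY)}


def run_chain(transcript, send=post_json):
    """After recognition, pass the transcript to every registered module."""
    return {name: send(url + "/process", {"text": transcript})
            for name, url in REGISTRY.items()}


def word_count(body):
    """Stand-in for a real module such as speaker detection."""
    return {"words": len(body.get("text", "").split())}


def demo():
    """(a) launch the module, (b) register it, (c) call it over REST."""
    primary, primary_url = serve({"/register": register})
    module, module_url = serve({"/process": word_count})
    post_json(primary_url + "/register", {"name": "word_count", "url": module_url})
    result = run_chain("hello nested calls")
    module.shutdown()
    primary.shutdown()
    return result
```

Calling `demo()` starts both services on free local ports, registers the module with the primary, and returns the chained replies keyed by module name. The `send` parameter on `run_chain` is there so the transport could later be swapped for something like a message queue without touching the modules.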

Seeking opinions at this point, with more details to be fleshed out later. Of course, eventually we may convert this suite into a package (e.g. satisfying #2), but that's not paramount right now.
