Thanks for this work. This is the first purely speech-modal conversational model I have read about, but I have a question.
Current conversational speech synthesis systems are bridged through text, so that they can use an NLP/LLM as the brain; much of the safety and controllability work is done in the LLM.
But in pure audio speech-to-speech, there is no such convenient intermediary: the 8 Hz speech codes are unreadable and not semantically centered. So how do you control the response content of the hertz model?