Controllability and security of pure-audio (speech-in/speech-out) generation #32

@JohnHerry

Thanks for the work. This is the first pure speech-modality conversational model I have read about, but I have a question.
Current conversational speech synthesis systems are bridged through text, so they can use a text LLM as the "brain", and much of the security and controllability is handled inside that LLM.
In pure-audio speech-to-speech, however, there is no such convenient intermediate representation: the 8 Hz speech codes are not human-readable and are not semantically centered. So how can the response content of the hertz model be controlled?