diff --git a/fern/customization/speech-configuration.mdx b/fern/customization/speech-configuration.mdx
index 9a34243de..607112056 100644
--- a/fern/customization/speech-configuration.mdx
+++ b/fern/customization/speech-configuration.mdx
@@ -23,7 +23,16 @@ This plan defines the parameters for when the assistant begins speaking after th
 
   ![LiveKit Smart Endpointing Configuration](../static/images/advanced-tab/livekit-smart-endpointing.png)
 
-  **Example:** In insurance claims, Vapi's smart endpointing helps avoid interruptions while customers think through complex responses. For instance, when the assistant asks "do you want a loan," the system can intelligently wait for the complete response rather than interrupting after the initial "yes" or "no." For responses requiring number sequences like "What's your account number?", the system can detect natural pauses between digits without prematurely ending the customer's turn to speak.
+  **LiveKit Smart Endpointing Configuration:**
+  When using LiveKit, you can customize the `waitFunction` parameter, which determines how long the assistant waits before it starts speaking, based on how likely it is that the customer is still speaking:
+
+  ```
+  waitFunction: "200 + 8000 * x"
+  ```
+
+  This function maps a probability `x` (0 to 1) to a wait time in milliseconds. An `x` of 0 means high confidence that the customer has stopped speaking; an `x` of 1 means high confidence that they are still speaking. The default function, `200 + 8000 * x`, produces a wait of 200 ms at `x = 0` and 8,200 ms at `x = 1`. You can supply your own mathematical expression instead, such as `4000 * (1 - cos(pi * x))`, an S-shaped curve that rises from 0 ms at `x = 0` to 8,000 ms at `x = 1`.
+
+  **Example:** In insurance claims, smart endpointing helps avoid interrupting customers while they think through complex responses. For instance, when the assistant asks "Do you want a loan?", the system can wait for the complete response rather than cutting in after an initial "yes" or "no." For responses involving number sequences, such as an answer to "What's your account number?", the system can detect natural pauses between digits without prematurely ending the customer's turn to speak.
 - **Transcription-Based Detection**: Customize how the assistant determines that the customer has stopped speaking based on what they’re saying. This offers more control over the timing.
   **Example:** When a customer says, "My account number is 123456789, I want to transfer $500."
   - The system detects the number "123456789" and waits for 0.5 seconds (`WaitSeconds`) to ensure the customer isn't still speaking.
diff --git a/fern/docs.yml b/fern/docs.yml
index 9d771ec44..2d4df5bdc 100644
--- a/fern/docs.yml
+++ b/fern/docs.yml
@@ -276,6 +276,8 @@ navigation:
             path: providers/voice/rimeai.mdx
           - page: Deepgram
             path: providers/voice/deepgram.mdx
+          - page: Sesame
+            path: providers/voice/sesame.mdx
 
       - section: Video Models
         contents:
diff --git a/fern/providers/voice/sesame.mdx b/fern/providers/voice/sesame.mdx
new file mode 100644
index 000000000..1752c528e
--- /dev/null
+++ b/fern/providers/voice/sesame.mdx
@@ -0,0 +1,32 @@
+---
+title: Sesame
+subtitle: What is Sesame CSM-1B?
+slug: providers/voice/sesame
+---
+
+**What is Sesame CSM-1B?**
+
+Sesame CSM-1B is an open-source text-to-speech (TTS) model that Vapi hosts, so you can use it in your voice applications without running it yourself. It is currently in beta and provides natural-sounding speech synthesis with a single default voice.
+
+**Key Features:**
+
+- **Vapi-Hosted Solution**: Access this open-source model directly through Vapi without managing your own infrastructure
+- **Single Default Voice**: Currently offers one voice option optimized for clarity and naturalness
+- **Beta Release**: Early access to this emerging TTS technology
+
+**Integration Benefits:**
+
+- Simplified setup with no need to self-host the model
+- Consistent performance through Vapi's optimized infrastructure
+- Compatibility with all Vapi voice applications
+
+**Use Cases:**
+
+- Virtual assistants and conversational AI
+- Content narration and audio generation
+- Interactive voice applications
+- Prototyping voice-driven experiences
+
+**Current Limitations:**
+
+Because this is a beta release, customization is limited and only one default voice is available. Additional features and voices may be added in future updates.
\ No newline at end of file
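The `waitFunction` mapping in the speech-configuration change can be checked numerically. The TypeScript sketch below is not Vapi or LiveKit code. It hand-codes the two example expressions from that paragraph instead of parsing a `waitFunction` string, and the helper names (`clampProbability`, `defaultWaitMs`, `cosineWaitMs`) are hypothetical. Clamping `x` to [0, 1] is an assumption of this sketch, not documented behavior.

```typescript
// Hypothetical helpers that reproduce the arithmetic of the two example
// waitFunction expressions. Vapi evaluates the waitFunction string itself;
// this sketch does not parse that string.

// Assumption of this sketch: clamp x into [0, 1], the documented input range.
function clampProbability(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Default curve from the docs: 200 + 8000 * x (linear, 200 ms to 8200 ms).
function defaultWaitMs(x: number): number {
  const p = clampProbability(x);
  return 200 + 8000 * p;
}

// Alternative curve from the docs: 4000 * (1 - cos(pi * x)).
// Flat near x = 0 and x = 1, steepest at x = 0.5; ranges from 0 ms to 8000 ms.
function cosineWaitMs(x: number): number {
  const p = clampProbability(x);
  return 4000 * (1 - Math.cos(Math.PI * p));
}

// Print both curves at a few sample probabilities for comparison.
for (const x of [0, 0.25, 0.5, 0.75, 1]) {
  console.log(
    `x=${x.toFixed(2)}  linear=${defaultWaitMs(x).toFixed(0)} ms  cosine=${cosineWaitMs(x).toFixed(0)} ms`
  );
}
```

Compared with the default, the cosine curve waits less when the customer has most likely finished (about 1,172 ms versus 2,200 ms at `x = 0.25`) and longer in the upper-middle range (about 6,828 ms versus 6,200 ms at `x = 0.75`), while topping out at 8,000 ms instead of 8,200 ms.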