
Voice AI Bot SDK for dTelecom (LiveKit‑based)

This SDK makes it easy to connect a voice AI bot to a dTelecom room. It builds a streaming pipeline from participants’ audio: speech recognition → LLM text processing → speech synthesis, and publishes the response back to the room as an Opus track.

Features

  • Connect a bot to a dTelecom/LiveKit room via URL + ROOM_TOKEN.
  • Processing pipeline: STT (Deepgram) → LLM (ChatGPT) → TTS (Deepgram).
  • Multi‑participant support: the bot listens to participants’ microphones and replies with synthesized voice.
  • Flexible extensibility via SpeechToText, TextProcessor, TextToSpeech interfaces and agent constructor options.

Requirements

  • Go 1.24+
  • dTelecom/LiveKit account and a room token (ROOM_TOKEN).
  • API keys:
    • DEEPGRAM_API_KEY — for Deepgram STT/TTS
    • CHATGPT_API_KEY — for the text processor (ChatGPT)

Installation

Add the module to your project:

go get github.com/dTelecom/sdk-ai-bot

If your build fails to resolve the Deepgram module (this SDK depends on a fork), add this replace directive to your project's go.mod:

replace github.com/deepgram/deepgram-go-sdk/v3 => github.com/dTelecom/deepgram-go-sdk/v3 v3.5.1-0.20251012194105-df6ec5cf4d79

This SDK already uses that replace internally; adding it to your app ensures consistent resolution when your build tooling vendors dependencies or overrides the module graph.
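In context, your application's go.mod might look like the following sketch; the module path and SDK version are placeholders, only the replace line is taken verbatim from above:

```
module example.com/voicebot

go 1.24

require github.com/dTelecom/sdk-ai-bot v0.1.0 // placeholder version

replace github.com/deepgram/deepgram-go-sdk/v3 => github.com/dTelecom/deepgram-go-sdk/v3 v3.5.1-0.20251012194105-df6ec5cf4d79
```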

Environment variables

Note: the included examples use godotenv and expect a .env file for convenience. Your own application can source these values any way you prefer (a .env file is not required).

Create a .env file in the example directory (or your app root) or set environment variables directly:

DTELECOM_URL=...          # your dTelecom server URL
ROOM_TOKEN=...            # dTelecom room token
DEEPGRAM_API_KEY=...      # Deepgram API key
CHATGPT_API_KEY=...       # OpenAI (ChatGPT) API key

Quick start (connect an agent to a room)

The simplest example is in examples/default_agent.

Run:

cd examples/default_agent
go run .

Examples read the URL from the DTELECOM_URL env var. Set it to your own deployment.

What the example does:

  1. Loads .env via godotenv (examples only) and initializes Deepgram SDK logging.
  2. Creates agent.New(logger) with default pipeline (Deepgram STT, ChatGPT, Deepgram TTS).
  3. Calls a.Connect(url, ROOM_TOKEN), publishes a local Opus track, and starts listening to participants.

Example: agent with custom prompt

examples/agent_with_prompt shows how to pass your own TextProcessor to agent.New via options:

textProcessor, _ := buildTextProcessor(logger) // ChatGPT with SystemPrompt; errors elided for brevity
a, _ := agent.New(logger, agent.WithTextProcessor(textProcessor))
a.Connect(os.Getenv("DTELECOM_URL"), os.Getenv("ROOM_TOKEN"))

The buildTextProcessor function configures a system prompt and uses CHATGPT_API_KEY.

Example: local pipeline (no LiveKit)

examples/pipeline demonstrates a pure local pipeline without connecting to a room: microphone → STT → ChatGPT → TTS → local playback.

Run:

cd examples/pipeline
go run .

Public API

Agent

type Agent struct { /* ... */ }

func New(logger *zap.Logger, options ...Option) (*Agent, error)
func (a *Agent) Connect(url, token string) error
  • New — builds the pipeline from components (Deepgram STT, ChatGPT, Deepgram TTS by default) or accepts your implementations via options.
  • Connect — connects to the room, publishes a local Opus track, and subscribes to participants’ audio. Each participant’s audio flows through the pipeline; responses are synthesized and sent back to the room.

Pipeline (pkg.Pipeline)

type Pipeline struct { /* ... */ }

func NewPipeline(stt SpeechToText, tp TextProcessor, tts TextToSpeech) *Pipeline
func (p *Pipeline) Start(ctx context.Context) (<-chan AudioChunk, error)
func (p *Pipeline) AddParticipant(ctx context.Context, name string, chunks <-chan AudioChunk) error
  • Start — starts processing and returns the bot’s audio chunk channel (Opus or PCM depending on TTS/transcoder).
  • AddParticipant — adds a participant: audio stream → STT → phrase accumulation via speech start/end control tokens → questions go to TextProcessor.
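The phrase accumulation described above can be pictured with a simplified, self-contained sketch. The `SpeechChunk` shape and the start/end fields here are assumptions for illustration, not the SDK's actual types:

```go
package main

import "fmt"

// SpeechChunk is a simplified stand-in for an STT output item: either a
// control token (speech start/end) or a fragment of transcribed text.
type SpeechChunk struct {
	Start bool   // control token: speech began
	End   bool   // control token: speech ended
	Text  string // transcribed fragment
}

// accumulate collects fragments between start/end control tokens into full
// phrases and emits each completed phrase on the returned channel.
func accumulate(in <-chan SpeechChunk) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		var phrase string
		for c := range in {
			switch {
			case c.Start:
				phrase = ""
			case c.End:
				if phrase != "" {
					out <- phrase
				}
			default:
				if phrase != "" {
					phrase += " "
				}
				phrase += c.Text
			}
		}
	}()
	return out
}

func main() {
	in := make(chan SpeechChunk, 8)
	in <- SpeechChunk{Start: true}
	in <- SpeechChunk{Text: "what is"}
	in <- SpeechChunk{Text: "the weather"}
	in <- SpeechChunk{End: true}
	close(in)
	for p := range accumulate(in) {
		fmt.Println(p) // prints "what is the weather"
	}
}
```

In the real pipeline, each completed phrase would then be forwarded to the TextProcessor as a question.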

Interfaces for extensibility

type SpeechToText interface {
    Transcribe(ctx context.Context, r <-chan AudioChunk) (<-chan SpeechChunk, error)
}

type TextProcessor interface {
    Process(ctx context.Context, question <-chan TextChunk) (<-chan TextChunk, error)
}

type TextToSpeech interface {
    Synthesize(ctx context.Context, text <-chan TextChunk) (<-chan AudioChunk, error)
}

Implement these interfaces to swap out Deepgram/ChatGPT for other providers. For the agent, use options:

agent.WithSTT(customSTT)
agent.WithTextProcessor(customTP)
agent.WithTTS(customTTS)

Running tests

The project includes unit and integration tests for STT/TTS components and utilities. Run:

go test ./...

Integration tests for Deepgram and transcoders may require valid API keys and audio files from test_data.

License

This project is licensed under the MIT License. See the LICENSE file for details.
