Darshan0312/Voice_Agent

Voice Agent Core

A flexible, conversational voice companion bot framework for Python. Easily build your own AI assistant by plugging in any LLM (OpenAI, Gemini, etc.) and using built-in voice tools. Great for personal productivity, home automation, or just having a friendly AI to talk to!

Features

  • Conversational AI: Integrate any LLM (OpenAI, Gemini, etc.) for smart, natural conversations.
  • Speech Recognition: Uses Whisper and SpeechRecognition for accurate voice input.
  • Text-to-Speech: Responds with high-quality voice using TTS APIs and local fallback.
  • Extensible Tools: Add your own Python functions as tools (play music, check weather, control apps, etc.).
  • Easy API: Just provide an LLM handler and start your bot!

Requirements

  • Python 3.8+
  • System dependencies for audio:
    • Linux (Debian/Ubuntu): sudo apt-get install portaudio19-dev ffmpeg
    • macOS: brew install portaudio ffmpeg
    • Windows: Install FFmpeg and ensure it's in your PATH.
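
Before installing the package, it can help to confirm that FFmpeg is actually reachable from your PATH. The snippet below is a minimal sketch using only the Python standard library; the function name `missing_tools` is hypothetical and not part of voice-agent-core. It checks only for the `ffmpeg` executable, since PortAudio is a library rather than a command-line program.

```python
import shutil


def missing_tools(names=("ffmpeg",)):
    """Return the command names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required command-line tools were found.")
```

If `ffmpeg` is reported as missing, recheck the install step for your platform and open a new terminal so the updated PATH takes effect.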

Installation

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install voice-agent-core

Quick Start: Your Own Companion Bot

Create a Python file (e.g., my_bot.py):

from voice_agent_core import VoiceCompanionBot, get_tools
import datetime

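def my_llm_handler(text):
    # Compare in lowercase so "Date" and "TIME" are matched as well.
    lowered = text.lower()
    if "date" in lowered or "time" in lowered:
        now = datetime.datetime.now()
        return {"type": "text_response", "content": f"The current date and time is: {now}"}
    # Otherwise, echo the recognized speech back to the user.
    return {"type": "text_response", "content": "I am your companion bot! You said: " + text}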

bot = VoiceCompanionBot(llm_handler=my_llm_handler, tools=get_tools())
bot.listen_and_respond()

Run it:

python my_bot.py

Speak to your bot! It will respond with the date/time or echo your message.
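
Because the handler is a plain function that takes text and returns a dict, you can try it without a microphone. The sketch below repeats a handler like the one above and checks the shape of its return value; `check_response` is a hypothetical helper written for this illustration, not part of voice_agent_core.

```python
import datetime


def my_llm_handler(text):
    # Same idea as the Quick Start handler: answer date/time questions, else echo.
    lowered = text.lower()
    if "date" in lowered or "time" in lowered:
        now = datetime.datetime.now()
        return {"type": "text_response", "content": f"The current date and time is: {now}"}
    return {"type": "text_response", "content": "I am your companion bot! You said: " + text}


def check_response(response):
    """Hypothetical helper: verify the text-response dict shape used in this README."""
    return (
        isinstance(response, dict)
        and response.get("type") == "text_response"
        and isinstance(response.get("content"), str)
    )


reply = my_llm_handler("hello there")
assert check_response(reply)
print(reply["content"])  # I am your companion bot! You said: hello there
```

Checking the handler this way separates problems in your LLM logic from problems in the audio setup.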

Advanced: Use Any LLM (OpenAI Function Calling Example)

To let your LLM automatically call tools (like playing YouTube or checking weather), use OpenAI's function calling feature:

from dotenv import load_dotenv
load_dotenv()

from voice_agent_core import VoiceCompanionBot, get_tools
import openai
import os

openai.api_key = os.getenv("OPENAI_API_KEY")

functions = [
    {
        "name": "play_on_youtube",
        "description": "Plays a video or song on YouTube.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The name of the song or video to play."}
            },
            "required": ["query"]
        }
    },
    {
        "name": "pause_or_resume",
        "description": "Pauses or resumes the currently playing media by simulating a spacebar press.",
        "parameters": {"type": "object", "properties": {}}
    },
    {
        "name": "stop_current_task",
        "description": "Stops the current task by closing the active tab in the browser (Ctrl+W).",
        "parameters": {"type": "object", "properties": {}}
    },
    {
        "name": "open_website",
        "description": "Opens a website in the default browser given a valid URL.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "The full URL of the website to open. Must start with http or https."}
            },
            "required": ["url"]
        }
    },
    {
        "name": "search_google",
        "description": "Searches for a query on Google.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The topic or question to search for on Google."}
            },
            "required": ["query"]
        }
    },
    {
        "name": "open_vscode",
        "description": "Opens the Visual Studio Code application.",
        "parameters": {"type": "object", "properties": {}}
    },
    {
        "name": "get_weather",
        "description": "Fetches the current weather for a specified location using the OpenWeatherMap API.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city name to get the weather for. For example: 'London' or 'Tokyo'."}
            },
            "required": ["location"]
        }
    }
]

system_prompt = (
    "You are a helpful, friendly voice companion. "
    "If the user asks to play something on YouTube, call the function 'play_on_youtube' with the song or video name as the 'query' argument. "
    "You can also call other tools for media, weather, websites, and more."
)

def openai_llm_handler(text):
    # Note: openai.ChatCompletion is the legacy interface and requires openai<1.0.
    import json

    response = openai.ChatCompletion.create(
        model="gpt-4o-mini-2024-07-18",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text}
        ],
        functions=functions,
        function_call="auto"
    )
    message = response.choices[0].message
    # The model either requests a tool call or answers in plain text.
    if hasattr(message, "function_call") and message.function_call:
        name = message.function_call.name
        # Arguments usually arrive as a JSON-encoded string; decode them into a dict.
        args = message.function_call.arguments
        args = json.loads(args) if isinstance(args, str) else args
        return {"type": "function_call", "call": {"name": name, "args": args}}
    else:
        return {"type": "text_response", "content": message["content"]}

bot = VoiceCompanionBot(llm_handler=openai_llm_handler, tools=get_tools())
bot.listen_and_respond()
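
The `functions` and `function_call` parameters used above are the older form of OpenAI function calling. Newer versions of the Chat Completions API accept a `tools` list instead, in which each schema is wrapped as `{"type": "function", "function": {...}}`. The sketch below shows only that conversion, assuming you want to reuse the same schemas; `to_tools` is a hypothetical helper and is not part of voice_agent_core or the openai package.

```python
def to_tools(functions):
    """Wrap legacy function schemas in the newer 'tools' envelope."""
    return [{"type": "function", "function": schema} for schema in functions]


# A shortened version of the schema list from the example above.
functions = [
    {
        "name": "open_vscode",
        "description": "Opens the Visual Studio Code application.",
        "parameters": {"type": "object", "properties": {}},
    }
]

tools = to_tools(functions)
print(tools[0]["type"])              # function
print(tools[0]["function"]["name"])  # open_vscode
```

The handler's return value does not need to change: whichever request form you use, it still returns either a `text_response` dict or a `function_call` dict.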

How it works:

  • Each tool is a Python function (see actions.py).
  • The LLM can call any tool by name and arguments using OpenAI function calling.
  • Add your own tools by writing a function and adding its schema to the functions list.
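
As an illustration of the last point, the sketch below defines a new tool, its schema, and a small dispatcher that consumes the `function_call` dict returned by the handler. How voice_agent_core registers and invokes tools internally is not shown in this README, so `set_timer`, `registry`, and `dispatch` are all hypothetical stand-ins rather than the library's actual mechanism.

```python
def set_timer(minutes):
    """Hypothetical example tool: confirm a timer request."""
    return f"Timer set for {minutes} minute(s)."


# Schema describing the tool to the LLM; append it to the functions list.
set_timer_schema = {
    "name": "set_timer",
    "description": "Sets a countdown timer.",
    "parameters": {
        "type": "object",
        "properties": {
            "minutes": {"type": "integer", "description": "Length of the timer in minutes."}
        },
        "required": ["minutes"],
    },
}

# Hypothetical registry mapping tool names to Python callables.
registry = {"set_timer": set_timer}


def dispatch(result, registry):
    """Run the tool named in a handler result, or pass plain text through."""
    if result["type"] != "function_call":
        return result["content"]
    call = result["call"]
    tool = registry.get(call["name"])
    if tool is None:
        return f"Unknown tool: {call['name']}"
    return tool(**call["args"])


handler_result = {"type": "function_call", "call": {"name": "set_timer", "args": {"minutes": 5}}}
print(dispatch(handler_result, registry))  # Timer set for 5 minute(s).
```

Keeping the schema's `name` identical to the key used for lookup is what lets the name chosen by the LLM resolve to the right Python function.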

License

MIT


For more details, see the API reference and examples above. Enjoy building your own AI companion!

About

I built a voice agent using the Gemini LLM and other tools that can access the local system, much like a Jarvis-style agent.
