# 🗣️ Talk2Find: Voice-Driven Knowledge Agent (with Synthetic Audio)

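This notebook demonstrates a voice-driven knowledge agent built with PraisonAI Agents. It synthesizes a spoken question from text with gTTS, converts the resulting MP3 to a valid WAV file with pydub, transcribes the audio with SpeechRecognition, and passes the transcript to a PraisonAI agent for an answer. Because the audio is generated inside the notebook, no manual upload is needed and every run starts from the same input text. gTTS and the Google recognizer used for transcription both call online services, so network access is required.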

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Dhivya-Bharathy/PraisonAI/blob/main/examples/cookbooks/talk2find_voice_agent_demo.ipynb)


# Dependencies

In [None]:
!pip install praisonaiagents openai SpeechRecognition pydub gTTS --quiet
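
pydub decodes MP3 through an external `ffmpeg` (or `avconv`) executable, which `pip` does not install. Colab images usually include it, but a local environment may not. The check below is a minimal sketch using only the standard library; the helper name `ffmpeg_available` is hypothetical and not part of pydub.

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg or avconv executable is found on PATH."""
    return any(shutil.which(name) for name in ("ffmpeg", "avconv"))

if not ffmpeg_available():
    # Assumption: a Debian/Ubuntu-style system; use your platform's package manager otherwise.
    print("ffmpeg not found. Install it (e.g. `apt-get install ffmpeg`) before converting MP3 to WAV.")
```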

# Set API Key

In [2]:
import os
os.environ["OPENAI_API_KEY"] = "Enter your api key"  # <-- Replace with your actual OpenAI API key
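
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. As an alternative, the sketch below reads the key from the environment and prompts for it only when it is missing. The helper name `ensure_api_key` is an assumption made for illustration.

```python
import os
from getpass import getpass

def ensure_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting once if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the typed value so it does not end up in the cell output
        key = getpass(f"Enter {var_name}: ")
        os.environ[var_name] = key
    return key
```

Calling `ensure_api_key()` once before creating the agent would take the place of the assignment above.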

# Tools

In [3]:
from praisonaiagents import Agent
import speech_recognition as sr
from gtts import gTTS
from pydub import AudioSegment
from IPython.display import Audio

# Generate a Valid WAV File from Text

In [4]:
# Step 1: Define the text you want to convert to speech
text = "What time is the meeting?"

# Step 2: Generate speech from text and save as MP3
tts = gTTS(text=text, lang='en')
tts.save("question.mp3")

# Step 3: Convert MP3 to WAV using pydub
sound = AudioSegment.from_mp3("question.mp3")
sound.export("question.wav", format="wav")

print("✅ WAV file 'question.wav' has been created.")

# Optional: Play the audio (works in Jupyter/Colab)
try:
    display(Audio("question.wav"))
except NameError:  # display() is only defined inside IPython/Jupyter
    print("You can now play 'question.wav' manually.")

✅ WAV file 'question.wav' has been created.
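
Before handing the file to the recognizer, it can help to confirm that the export produced a readable PCM WAV. The sketch below uses Python's standard `wave` module to read the header; the helper name `wav_info` is hypothetical.

```python
import wave

def wav_info(path: str) -> dict:
    """Read basic header fields from a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second
            "duration_s": frames / rate if rate else 0.0,
        }
```

Running `print(wav_info("question.wav"))` after the export step should report a duration of a few seconds for the short question used here; `wave.open` raises `wave.Error` if the file is not a WAV it can parse.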


# Prompt (Role, Goal, Instructions)

In [5]:
ROLE = "Voice-driven knowledge assistant. Expert in understanding spoken queries and providing accurate, concise answers."
GOAL = "Transcribe the user's voice input and answer their question or find relevant information."
INSTRUCTIONS = "Given a transcribed user question, provide a clear, factual, and helpful answer. If the question is unclear, ask for clarification."

# Main (Transcription + Agent)

In [6]:
def transcribe_audio(audio_file):
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_file) as source:
        audio = recognizer.record(source)
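    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The recognizer could not make out any speech in the audio
        print("Transcription error: speech was unintelligible.")
        return None
    except sr.RequestError as e:
        # The Google Web Speech API was unreachable or rejected the request
        print("Transcription error:", e)
        return None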

def talk2find_agent(audio_file):
    query = transcribe_audio(audio_file)
    if not query:
        return {"error": "Could not transcribe audio."}
    print("Transcribed text:", query)
    agent = Agent(role=ROLE, goal=GOAL, instructions=INSTRUCTIONS)
    return agent.start(query)

In [7]:
audio_path = "question.wav"
result = talk2find_agent(audio_path)
print(result)

Transcribed text: what time is the meeting



I'm sorry, I don't have access to your schedule or specific meeting details. Could you provide more information about which meeting you're referring to?
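
Note that `talk2find_agent` returns a dict with an `"error"` key when transcription fails and the agent's answer otherwise, so callers see two different shapes. A minimal sketch of normalizing both into a printable string follows; the helper name `render_result` is an assumption made for illustration.

```python
def render_result(result) -> str:
    """Turn either return shape of talk2find_agent into a printable string."""
    if isinstance(result, dict) and "error" in result:
        return f"Error: {result['error']}"
    return str(result)
```

With this helper, `print(render_result(talk2find_agent(audio_path)))` prints either the answer or a labeled error message.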
