This project is organized into four main components:

1. **Speech-to-Text** — to capture and transcribe the user's voice input  
2. **LLM Response Generation** — to process the input and generate a reply  
3. **Text-to-Speech** — to convert the response into audio output  
4. **Main Loop** — to integrate all components into a continuous voice interaction cycle

# 1. Speech Recognition

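This section defines a function that records a single phrase from the microphone and transcribes it to text with Google's speech recognition service (via `recognize_google`). Recording stops after 2 seconds of silence (`pause_threshold`), and if the speech cannot be understood the function listens again.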

In [None]:
import speech_recognition as sr

r = sr.Recognizer()

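def recognize_speech_from_mic(recognizer, verbose=True):
    """Record one phrase from the default microphone and return its transcription.

    Keeps listening until Google's speech recognition service returns a result.
    Network or API errors (sr.RequestError) are raised instead of being retried.
    """
    recognizer.pause_threshold = 2  # seconds of silence that end a phrase
    while True:
        with sr.Microphone() as source:
            if verbose:
                print("Listening...")
            audio_text = recognizer.listen(source)
        if verbose:
            print("Recognizing...")

        try:
            return recognizer.recognize_google(audio_text)
        except sr.UnknownValueError:
            # Speech was unintelligible: loop and listen again instead of recursing
            if verbose:
                print("Could not understand the audio. Please try again.")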

# print(recognize_speech_from_mic(r))

# 2. LLM Response Generation

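Here, we define `converse`, which sends the user's message to a language model and returns its reply. It uses a hosted model through Groq when a Groq API key is available (passed in or read from `GROQ_API_KEY`) and otherwise falls back to a local model served by Ollama. The conversation history is saved to a JSON file and sent with every request, which gives the model context from previous turns.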
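Because the whole history list is sent with every request, the prompt grows with each turn. A minimal sketch of one way to bound it, assuming a hypothetical `trim_history` helper that keeps only the most recent messages (the original code does not trim), together with a hypothetical `load_history` that treats a missing or empty file as a fresh conversation:

```python
import json
import os
import tempfile

def load_history(path):
    """Return the saved message list, or [] if the file is missing or empty."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return []
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def trim_history(messages, max_messages=20):
    """Hypothetical context policy: keep only the last `max_messages` entries."""
    # Guard against max_messages <= 0, since messages[-0:] would return the full list
    return messages[-max_messages:] if max_messages > 0 else []

# Round-trip example using a temporary history file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "conversation_history.txt")
    history = load_history(path)  # file does not exist yet, so this is []
    for i in range(5):
        history.append({"role": "user", "content": f"message {i}"})
        history.append({"role": "assistant", "content": f"reply {i}"})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(history, f, indent=4)
    reloaded = load_history(path)
    recent = trim_history(reloaded, max_messages=4)

print(len(reloaded), [m["content"] for m in recent])
```

Inside `converse`, the trimmed list would be what gets passed as `messages=` to the model, while the full list could still be written back to disk.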

In [None]:
from groq import Groq
import os
import json
from dotenv import load_dotenv
import ollama

load_dotenv()

def converse(new_message, model="llama-3.1-8b-instant", api_key=None,
             history_file="conversation_history.txt", local_model="llama3.2:3b"):
    """
    Send a message to a chatbot and keep the conversation context in a JSON history file.

    If a Groq API key is available, the request goes to Groq; otherwise it falls back
    to a local model served by Ollama.

    Parameters:
    - new_message (str): The user's latest input message.
    - model (str): The Groq model to use (default: "llama-3.1-8b-instant").
    - api_key (str): Groq API key (default: None, falls back to GROQ_API_KEY from the environment).
    - history_file (str): File storing the history as a JSON list of messages (default: "conversation_history.txt").
    - local_model (str): The Ollama model used when no API key is found (default: "llama3.2:3b").

    Returns:
    - str: The chatbot's response.
    """

    # Load conversation history (a missing or empty file means a fresh conversation)
    if os.path.exists(history_file) and os.path.getsize(history_file) > 0:
        with open(history_file, "r", encoding="utf-8") as file:
            messages = json.load(file)
    else:
        messages = []

    # Append new user message
    messages.append({"role": "user", "content": new_message})

    api_key = api_key or os.getenv("GROQ_API_KEY")

    if api_key:
        # Remote inference through Groq, streaming the reply chunk by chunk
        client = Groq(api_key=api_key)
        completion = client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=1,
            max_completion_tokens=1024,
            top_p=1,
            stream=True,
            stop=None,
        )

        response_text = ""
        for chunk in completion:
            response_text += chunk.choices[0].delta.content or ""
    else:
        # Local inference through Ollama (the Ollama server must be running with the model pulled)
        response = ollama.chat(model=local_model, messages=messages)
        response_text = response["message"]["content"]

    # Append assistant response
    messages.append({"role": "assistant", "content": response_text})

    # Save updated conversation history
    with open(history_file, "w", encoding="utf-8") as file:
        json.dump(messages, file, indent=4)

    return response_text

# Example usage:
# print(converse("It's me!"))


In [None]:
print(converse("the song name is correct but you kinda wiffed on the lyrics ngl"))

# 3. Text-to-Speech

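This section handles the audio output. `speak` synthesizes the response with the `edge-tts` package (Microsoft Edge's online text-to-speech service), decodes the resulting MP3 with `pydub` (which needs `ffmpeg` installed), and plays it through the system's audio output.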
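The `rate` argument is a signed percentage relative to the voice's normal speaking speed (for example, `"+20%"` is 20% faster and `"-10%"` is 10% slower). If you prefer to think in speed multipliers, a small hypothetical helper, `to_edge_rate`, can build that string; it is not part of `edge-tts` itself:

```python
def to_edge_rate(speed: float) -> str:
    """Hypothetical helper: convert a speed multiplier (1.0 = normal) into an edge-tts rate string."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    percent = round((speed - 1.0) * 100)  # 1.5 -> 50, 0.8 -> -20
    return f"{percent:+d}%"  # "+d" always includes the sign, so 0 becomes "+0"

print(to_edge_rate(1.5), to_edge_rate(0.8), to_edge_rate(1.0))
```

It could then be used as `await tts_play(response, rate=to_edge_rate(1.5))`.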

In [None]:
import asyncio
import edge_tts
import io
import nest_asyncio
from pydub import AudioSegment
from pydub.playback import play

# Patch the event loop so nested asyncio calls (e.g. asyncio.run) work inside Jupyter's already-running loop
nest_asyncio.apply()

async def speak(text: str, voice="en-US-MichelleNeural", rate="+100%"):
    """Convert text to speech with edge-tts and play it through the system's audio output."""
    tts = edge_tts.Communicate(text, voice, rate=rate)
    stream = io.BytesIO()

    # Collect the MP3 audio chunks; other chunk types (e.g. word boundaries) are skipped
    async for chunk in tts.stream():
        if chunk["type"] == "audio":
            stream.write(chunk["data"])

    stream.seek(0)
    audio = AudioSegment.from_file(stream, format="mp3")  # decoding requires ffmpeg
    play(audio)  # blocks until playback finishes

# Convenience wrapper with the defaults used in this notebook; call it with `await` (top-level await works in Jupyter)
async def tts_play(text, voice="en-US-MichelleNeural", rate="+20%"):
    await speak(text, voice, rate)

# Example usage
# await tts_play("""It was a dark and stormy night. All of a sudden, a voice came out of the darkness and said, 'Hello! I'm here to help you with your query. How can I assist you today?'""")


# 4. Main Loop

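This part brings everything together. Each iteration records one spoken phrase, sends its transcription (along with the saved history) to the LLM, prints both sides of the exchange, and speaks the reply. The loop runs until the kernel is interrupted.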
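The main loop in this section has no exit condition besides interrupting the kernel. A minimal sketch of one alternative, assuming a hypothetical `run_voice_loop` function and exit phrases of your choosing: the three stages are passed in as callables, so the control flow can be exercised with stand-ins and no microphone, model, or speaker.

```python
import asyncio

async def run_voice_loop(listen, reply, speak, exit_phrases=("goodbye", "stop")):
    """Run listen -> reply -> speak turns until the user says an exit phrase.

    `listen` and `reply` are ordinary callables; `speak` is a coroutine function.
    Returns the number of completed turns.
    """
    turns = 0
    while True:
        user_text = listen()
        if user_text.strip().lower() in exit_phrases:
            await speak("Goodbye!")
            return turns
        response = reply(user_text)
        await speak(response)
        turns += 1

# Stand-in components: scripted user input, an echo "model", and a speaker that records text
inputs = iter(["hello", "what time is it", "goodbye"])
spoken = []

async def fake_speak(text):
    spoken.append(text)

turns = asyncio.run(run_voice_loop(lambda: next(inputs), lambda t: f"echo: {t}", fake_speak))
print(turns, spoken)
```

In the notebook, the real components could be plugged in with `await run_voice_loop(lambda: recognize_speech_from_mic(r), converse, tts_play)`.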

In [None]:
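# Listen -> think -> speak, repeated until the kernel is interrupted
while True:
    user_text = recognize_speech_from_mic(r)  # blocks until a phrase is transcribed
    print("user: ", user_text)
    response = converse(user_text, model="llama-3.1-8b-instant")
    print("bot: ", response)
    await tts_play(response, voice="en-US-MichelleNeural", rate="+50%")  # top-level await works in Jupyter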