### Building a Chatbot Interface, with Text or Voice Input, Multi-LLM support, and Memory Persistence

In this tutorial, we'll use Gradio to build a simple chatbot prototype with a user-friendly interface. The chatbot supports multiple language models, and the user can switch models at any point during the conversation. It also offers optional memory persistence: the chat history is stored and forwarded to whichever model is selected, so memory is shared across models even when switching mid-chat.

In this project, we'll use OpenAI's API, Anthropic's Claude, and Meta's Llama, which runs locally via an Ollama server. For voice input, we'll record audio with `sounddevice` and convert speech to text with Python's `speech_recognition` module.

It's worth noting that some APIs, such as OpenAI's, now support direct audio input, so speech capabilities can also be integrated end-to-end without a separate transcription module.
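As a rough illustration of that end-to-end route, the sketch below wraps raw 16-bit PCM samples in an in-memory WAV file and sends it to OpenAI's transcription endpoint. The helper names (`pcm16_to_wav_bytes`, `transcribe_with_openai`) and the choice of the `whisper-1` model are assumptions made for illustration; they are not part of this tutorial's code.

```python
import io
import wave


def pcm16_to_wav_bytes(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM samples in a WAV container, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)            # 16-bit samples = 2 bytes each
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def transcribe_with_openai(wav_bytes: bytes, model: str = "whisper-1") -> str:
    """Hypothetical helper: send WAV audio to OpenAI's transcription endpoint."""
    from openai import OpenAI  # Imported lazily so the WAV helper works without the SDK

    client = OpenAI()  # Reads OPENAI_API_KEY from the environment
    result = client.audio.transcriptions.create(
        model=model,
        file=("speech.wav", wav_bytes),  # (filename, content) tuple
    )
    return result.text
```

With the NumPy array produced by the recording step later in this tutorial, `pcm16_to_wav_bytes(audio.tobytes())` would give the payload to pass to `transcribe_with_openai`.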

In [37]:
import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
import anthropic

In [38]:
# Speech recording and recognition libraries
import speech_recognition as sr
import sounddevice as sd
import numpy as np

In [39]:
# GUI prototyping
import gradio as gr

In [40]:
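buffer = []  # Temporarily holds the recorded audio blocks

# Callback invoked by sounddevice for every captured audio block
def callback(indata, frames, time, status):
    if status:
        print('Audio stream status:', status)  # Report overflows and similar issues
    buffer.append(indata.copy())  # Copy, because sounddevice reuses the input buffer

# 16 kHz, mono, 16-bit samples: the format that transcribe() expects
stream = sd.InputStream(callback=callback, samplerate=16000, channels=1, dtype='int16')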

In [41]:

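# Toggle recording on/off; returns the new button label, the textbox text, and the new state
def toggle_recording(state):
    if not state:
        buffer.clear()
        stream.start()
        return gr.update(value="Stop Recording"), 'Recording...', True

    stream.stop()
    if not buffer:  # Nothing was captured, so there is nothing to transcribe
        return gr.update(value="Start Recording"), '', False
    audio = np.concatenate(buffer, axis=0)
    try:
        text = transcribe(audio)
    except (sr.UnknownValueError, sr.RequestError) as e:
        # Speech was unintelligible or the recognition service was unreachable
        print('Transcription failed:', e)
        text = ''
    return gr.update(value="Start Recording"), text, False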

# Function that converts speech to text via the Google Web Speech API (through speech_recognition)
def transcribe(recording, sample_rate=16000):
    r = sr.Recognizer()

    # Convert NumPy array to AudioData
    audio_data = sr.AudioData(
        recording.tobytes(),  # Raw byte data
        sample_rate,          # Sample rate in Hz
        2                     # Sample width in bytes (16-bit = 2 bytes)
    )

    text = r.recognize_google(audio_data)
    print("You said:", text)
    return text
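
The sample width also fixes the arithmetic between bytes and seconds: at 16 kHz mono with 2 bytes per sample, one second of audio is 32,000 bytes. The sketch below uses that relationship to skip recordings that are too short to be worth sending to the recognizer. The function names and the 0.3-second threshold are assumptions chosen for illustration.

```python
def recording_duration(raw: bytes, sample_rate: int = 16000,
                       sample_width: int = 2, channels: int = 1) -> float:
    """Return the length of a raw PCM recording in seconds."""
    bytes_per_second = sample_rate * sample_width * channels
    return len(raw) / bytes_per_second


def is_worth_transcribing(raw: bytes, min_seconds: float = 0.3) -> bool:
    """Skip near-empty recordings, e.g. from an accidental double click."""
    return recording_duration(raw) >= min_seconds


one_second = b"\x00\x00" * 16000                  # 16,000 silent 16-bit samples
print(recording_duration(one_second))             # 1.0
print(is_worth_transcribing(one_second[:3200]))   # False (only 0.1 s of audio)
```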

### LLM & API set-up

##### Load API keys from .env

In [42]:
# Load environment variables from a file called .env
# Print the key prefixes to help with any debugging

load_dotenv(override=True)
openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:8]}")
else:
    print("Google API Key not set")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
Google API Key not set
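
The three near-identical checks in this cell could be collapsed into one helper. The sketch below is one way to do that; `describe_key` and the sample key value are hypothetical, and the prefix length is a parameter because it differs per provider in the cell above.

```python
import os
from typing import Mapping, Optional


def describe_key(label: str, var: str, prefix_len: int = 8,
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Report whether an API key is set, showing only its first few characters."""
    env = os.environ if env is None else env
    key = env.get(var)
    if key:
        return f"{label} API Key exists and begins {key[:prefix_len]}"
    return f"{label} API Key not set"


# Illustrative environment with a made-up key value
fake_env = {"OPENAI_API_KEY": "sk-proj-EXAMPLE"}
print(describe_key("OpenAI", "OPENAI_API_KEY", env=fake_env))
print(describe_key("Google", "GOOGLE_API_KEY", env=fake_env))
```

Passing `env` explicitly keeps the helper easy to test; in the notebook it would be called without it, after `load_dotenv`.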


### Class for handling API calls and routing requests to the selected models

In [43]:
class LLMHandler:
    def __init__(self, system_message: str = '', ollama_api: str = 'http://localhost:11434/api/chat'):
        # Default system message if none provided
        self.system_message = system_message if system_message else "You are a helpful assistant. Always reply in Markdown"
        self.message_history = []

        # Initialize LLM clients
        self.openai = OpenAI()
        self.claude = anthropic.Anthropic()
        self.OLLAMA_API = ollama_api
        self.OLLAMA_HEADERS = {"Content-Type": "application/json"}

    def llm_call(self, model: str = 'gpt-4o-mini', prompt: str = '', memory_persistence=True):
        if not model:
            return 'No model specified'

        # Use the full template (system prompt + user message) only on the first turn;
        # Claude receives the system prompt separately via its `system` parameter
        if not self.message_history and 'claude' not in model:
            message = self.get_message_template(prompt, initial=True)
        else:
            message = self.get_message_template(prompt)

        # Handle memory persistence
        if memory_persistence:
            self.message_history.extend(message)
        else:
            self.message_history = message

        # Model-specific dispatch
        try:
            if 'gpt' in model:
                response = self.call_openai(model=model)
            elif 'claude' in model:
                response = self.call_claude(model=model)
            elif 'llama' in model:
                response = self.call_ollama(model=model)
            else:
                response = f'{model.title()} is not supported or not a valid model name.'
        except Exception as e:
            response = f'Failed to retrieve response. Reason: {e}'

        # Save assistant's reply to history if memory is enabled
        if memory_persistence:
            self.message_history.append({
                "role": "assistant",
                "content": response
            })

        return response

    def get_message_template(self, prompt: str = '', initial=False):
        # Returns a message template with or without system prompt
        initial_template = [
            {"role": "system", "content": self.system_message},
            {"role": "user", "content": prompt}
        ]
        general_template = [
            {"role": "user", "content": prompt}
        ]
        return initial_template if initial else general_template

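    def call_openai(self, model: str = 'gpt-4o-mini'):
        # Sends chat completion request to OpenAI API.
        # The system prompt is added here because the shared history may lack one
        # (for example, when the conversation started with Claude)
        messages = [{"role": "system", "content": self.system_message}] + [
            m for m in self.message_history if m["role"] != "system"
        ]
        completion = self.openai.chat.completions.create(
            model=model,
            messages=messages,
        )
        return completion.choices[0].message.content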

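    def call_ollama(self, model: str = "llama3.2"):
        # Sends chat request to the local Ollama server, system prompt first
        messages = [{"role": "system", "content": self.system_message}] + [
            m for m in self.message_history if m["role"] != "system"
        ]
        payload = {
            "model": model,
            "messages": messages,
            "stream": False
        }

        response = requests.post(url=self.OLLAMA_API, headers=self.OLLAMA_HEADERS, json=payload)
        response.raise_for_status()  # Surface HTTP errors instead of a confusing KeyError
        return response.json()["message"]["content"]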

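    def call_claude(self, model: str = "claude-3-haiku-20240307"):
        # Sends chat request to Anthropic Claude API.
        # Claude takes the system prompt via `system`; its messages list accepts only
        # "user" and "assistant" roles, so drop any system entry that was stored
        # while chatting with another model
        messages = [m for m in self.message_history if m["role"] != "system"]
        message = self.claude.messages.create(
            model=model,
            system=self.system_message,
            messages=messages,
            max_tokens=500
        )
        return message.content[0].text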
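
Sharing one history across providers works only if each request is shaped the way that provider expects: OpenAI-style chat APIs take the system prompt as the first message, while Anthropic's Messages API takes it as a separate `system` argument and accepts only `user` and `assistant` roles in `messages`. The sketch below shows that adaptation in isolation; the function names are hypothetical and are not part of the class above.

```python
from typing import Dict, List

Message = Dict[str, str]


def strip_system(history: List[Message]) -> List[Message]:
    """Return the conversation turns without any system entries."""
    return [m for m in history if m["role"] != "system"]


def to_openai_messages(history: List[Message], system: str) -> List[Message]:
    """OpenAI-style payload: system prompt first, then the turns."""
    return [{"role": "system", "content": system}] + strip_system(history)


def to_anthropic_kwargs(history: List[Message], system: str) -> dict:
    """Anthropic-style payload: system prompt passed separately."""
    return {"system": system, "messages": strip_system(history)}


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print(len(to_openai_messages(history, "Be brief.")))               # 3
print(len(to_anthropic_kwargs(history, "Be brief.")["messages"]))  # 2
```

Keeping the stored history free of provider-specific entries, and adding them only when a request is built, is what lets a conversation move between models without rewriting what was stored.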


In [44]:
llm_handler = LLMHandler()

# Function to handle user prompts received by the interface
def llm_call(model, prompt, memory_persistence):
    response = llm_handler.llm_call(model=model, prompt=prompt, memory_persistence=memory_persistence)
    return response, ''
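
The wrapper returns only the latest reply. If the output pane should show the whole conversation instead, the stored history could be rendered as Markdown. This is a minimal sketch that assumes the history is a list of `{"role", "content"}` dictionaries like `message_history`; `history_to_markdown` is a hypothetical helper.

```python
from typing import Dict, List


def history_to_markdown(history: List[Dict[str, str]]) -> str:
    """Render user and assistant turns as a Markdown transcript."""
    labels = {"user": "**You**", "assistant": "**Assistant**"}
    lines = [
        f"{labels[m['role']]}: {m['content']}"
        for m in history
        if m["role"] in labels  # System prompts are not shown to the user
    ]
    return "\n\n".join(lines)


sample = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print(history_to_markdown(sample))
```

Returning `history_to_markdown(llm_handler.message_history)` in place of `response` would show the transcript; with memory turned off, the history holds only the latest user message, so the reply would need to be appended separately.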


In [45]:
# Specify available model names for the dropdown component
AVAILABLE_MODELS = ["gpt-4", "gpt-3.5-turbo", "claude-3-haiku-20240307", "llama3.2", "gpt-4o-mini"]
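
Routing in `LLMHandler.llm_call` relies on substrings of the model name (`gpt`, `claude`, `llama`), so every dropdown entry has to contain one of them. A small check like the following, built around a hypothetical `provider_for` helper, can catch an entry that would not be routed before the interface starts. The provider labels and the sample list are illustrative.

```python
from typing import Optional

# Substring-to-provider mapping mirroring the checks in LLMHandler.llm_call
ROUTES = {"gpt": "openai", "claude": "anthropic", "llama": "ollama"}


def provider_for(model: str) -> Optional[str]:
    """Return the provider label for a model name, or None if nothing matches."""
    for fragment, provider in ROUTES.items():
        if fragment in model:
            return provider
    return None


models = ["gpt-4o-mini", "claude-3-haiku-20240307", "llama3.2", "mistral"]
unroutable = [m for m in models if provider_for(m) is None]
print(unroutable)  # ['mistral']
```

A substring match says nothing about whether the provider actually serves that model name, so this check complements, rather than replaces, the error handling around the API calls.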


In [46]:

with gr.Blocks() as demo:
    state = gr.State(False)  # Recording state (on/off)
    with gr.Row():
        with gr.Column():
            out = gr.Markdown(label='Message history')
            with gr.Row():
                memory = gr.Checkbox(label='Toggle memory', value=True)  # Memory on/off
                # Model selection dropdown, with a default so the first message has a model
                model_choice = gr.Dropdown(label='Model', choices=AVAILABLE_MODELS, value='gpt-4o-mini', interactive=True)
            query_box = gr.Textbox(label='ChatBox', placeholder="Your message")
            record_btn = gr.Button(value='Start Recording')  # Start/stop recording button
            send_btn = gr.Button("Send")  # Send prompt button

    record_btn.click(fn=toggle_recording, inputs=state, outputs=[record_btn, query_box, state])
    send_btn.click(fn=llm_call, inputs=[model_choice, query_box, memory], outputs=[out, query_box])

demo.launch()

* Running on local URL:  http://127.0.0.1:7868
* To create a public link, set `share=True` in `launch()`.


