### Setup Instructions

This notebook contains the code for a voice-enabled LLM chat application. Before running it, set up the environment and download the required models as described in the '*Local Setup Instructions*' section of the README.md file.
**Important note: this notebook is intended to run only in a local environment.**

### Usage Instructions

Once the setup is complete and the models are loaded, you can run the application cells to start the voice chat interface.

#### Starting the Application

Run the code cell directly following the "Launching the Application" markdown heading. This cell initializes and displays the chat interface.

#### Interacting with the Application

After running the start cell, a chat interface will appear.
*   **Voice Input:** Look for a microphone button and click it to start speaking. Speak clearly and concisely. Click the STOP button to stop recording. Your spoken input will be transcribed into text.
*   **Text Input:** You can also type your message into the text box provided in the interface and press Enter or click the send button.
*   **Receiving Responses:** The application will process your input using the LLM. The response will be displayed as text in the chat interface. If the Text-to-Speech model is working correctly, you will also hear the response spoken aloud.

#### Stopping the Application

To gracefully stop the application and release resources, run the code cell directly following the "Stopping the Application" markdown heading.

### Libraries Import

In [None]:
import sys
import time
from IPython.display import Javascript, HTML, display
from vosk import Model, KaldiRecognizer
import threading
import ipywidgets as widgets
from transformers import AutoModelForCausalLM, AutoTokenizer
from piper.voice import PiperVoice
from utils.voice_llm_chat import VoiceLLMChatBackend
from utils.voice_llm_chat_frontend import VoiceLLMChatFrontend_Local

### Loading Models

In [None]:
# # List of available Piper voices
# !python -m piper.download_voices

# Download a specific Piper voice
!python -m piper.download_voices en_US-danny-low

# Models are stored in the Open Neural Network Exchange (ONNX) format
piper_voice_name = "en_US-danny-low.onnx"
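A Piper voice consists of an `.onnx` model plus a JSON config file named after it (here `en_US-danny-low.onnx.json`), and `PiperVoice.load` normally looks for that config next to the model. If either file is missing, the loading cell below reports a `FileNotFoundError`. The following is a minimal pre-flight check using only the standard library; the helper name `missing_piper_files` is hypothetical, and the "model name + `.json`" layout is an assumption based on how Piper voices are usually distributed.

```python
from pathlib import Path


def missing_piper_files(model_file, search_dir="."):
    """Return the paths of Piper voice files that are missing from search_dir.

    Assumes the config file is named after the model with an extra
    ".json" suffix (e.g. "voice.onnx" -> "voice.onnx.json").
    """
    model_path = Path(search_dir) / model_file
    config_path = model_path.with_name(model_path.name + ".json")
    return [str(p) for p in (model_path, config_path) if not p.is_file()]


# Check the voice downloaded above before trying to load it.
missing = missing_piper_files("en_US-danny-low.onnx")
if missing:
    print("Missing Piper voice files:", ", ".join(missing))
```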

In [None]:
sample_rate = 16000

# Load the Piper voice model
try:
    voice_model = PiperVoice.load(piper_voice_name)
except FileNotFoundError:
    print(f"Error: Piper voice model file not found. Please ensure '{piper_voice_name}' is in the correct directory.", file=sys.stderr)
    voice_model = None
except Exception as e:
    print(f"An unexpected error occurred while loading the Piper model: {e}", file=sys.stderr)
    voice_model = None

# Load the Vosk speech recognition model.
# This is a relatively small model; larger and much more accurate English models
# (for example "vosk-model-en-us-0.22") are also available.
try:
    speech_model = Model(model_name="vosk-model-en-us-0.22-lgraph")
except Exception as e:
    print(f"Error loading Vosk model: {e}. Please ensure the model is downloaded and accessible.", file=sys.stderr)
    speech_model = None
    speech_recognizer = None

if speech_model:
    try:
        speech_recognizer = KaldiRecognizer(speech_model, sample_rate)
        speech_recognizer.SetWords(True)
    except Exception as e:
        print(f"Error creating Vosk recognizer: {e}", file=sys.stderr)
        speech_recognizer = None

# Initialization of the LLM model and tokenizer
llm_model_name = "Gensyn/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(llm_model_name)
llm_model = AutoModelForCausalLM.from_pretrained(
    llm_model_name, pad_token_id=tokenizer.eos_token_id
)
llm_model.eval()
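Because the recognizer is created with `SetWords(True)`, Vosk's JSON results include a `result` list with per-word `word`, `start`, `end` and `conf` fields alongside the overall `text`. The sketch below parses such a result with the standard `json` module and flags low-confidence words. The sample string is made up for illustration, and the helper name and the 0.5 threshold are arbitrary choices, not part of Vosk or of this notebook.

```python
import json


def summarize_vosk_result(result_json, min_conf=0.5):
    """Return (text, low_confidence_words) from a Vosk JSON result string.

    Word-level entries only appear when the recognizer was created with
    SetWords(True); without them, the list of flagged words is empty.
    """
    result = json.loads(result_json)
    words = result.get("result", [])
    low_conf = [w["word"] for w in words if w.get("conf", 1.0) < min_conf]
    return result.get("text", ""), low_conf


# Hypothetical result, shaped like the output of speech_recognizer.Result()
sample = (
    '{"result": ['
    '{"conf": 0.98, "end": 0.51, "start": 0.12, "word": "hello"}, '
    '{"conf": 0.42, "end": 1.03, "start": 0.57, "word": "world"}], '
    '"text": "hello world"}'
)
text, unsure = summarize_vosk_result(sample)
print(text, unsure)
```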

### Voice LLM Chat Initialization

In [None]:
# Choose the system message that best meets your needs.
llm_model_system_message = "You are a supportive voice assistant that replies with one or two brief sentences. Your replies should avoid any text formatting."

# You can test various parameter configurations.
llm_model_temperature = 0.1
llm_model_max_tokens = 256
llm_model_top_k = 100
llm_model_top_p = 1.0
app = VoiceLLMChatBackend(llm_model, tokenizer, voice_model, speech_recognizer)
# Initialization of LLM model parameters.
app.set_model_parameters(llm_model_temperature, llm_model_max_tokens, llm_model_top_k, llm_model_top_p, locale="en")
app.set_system_message(llm_model_system_message)
# Uncomment to print backend logs when troubleshooting.
# app.should_print_logs = True
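The four parameters above control how each next token is chosen. As a generic illustration (not the code inside `VoiceLLMChatBackend`, which is not shown here), the sketch below applies temperature scaling, top-k filtering and top-p (nucleus) filtering to a toy list of logits using only the standard library. In Hugging Face `transformers` the same knobs exist as the `temperature`, `top_k`, `top_p` and `max_new_tokens` arguments of `generate()` with `do_sample=True`; presumably the backend forwards the values set here, but that is an assumption. Note that with `temperature = 0.1` almost all probability mass tends to land on the highest logit, so the configuration above behaves close to greedy decoding.

```python
import math
import random


def sample_next_token(logits, temperature=0.1, top_k=100, top_p=1.0, rng=random):
    """Sample a token index from logits with temperature, top-k and top-p filtering."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # Softmax, subtracting the maximum for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable candidates, best first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Draw one of the kept candidates; choices() renormalizes the weights itself.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [2.0, 1.5, 0.3, -1.0]
print(sample_next_token(toy_logits, temperature=0.1, top_k=100, top_p=1.0))
```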

### Ipywidgets Initialization

In [None]:
# Output widget to display messages and recognized text
app_output_widget = widgets.Output()
# Display object to update the HTML output area
html_display_area = display(HTML(""), display_id=True)

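# Hidden text area used by the JavaScript frontend to pass data (the prompt text
# or the recorded audio) to the Python handlers below.
requestDataContainer = widgets.Textarea(
    value='',
    placeholder='Request Data Container',
    description='Request Data Container',
    layout=widgets.Layout(display='none')
)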

### Button Click Handlers

In [None]:
output_lock = threading.Lock()

def start_new_chat_py(button):
    """Starts a new chat session."""
    app.start_new_chat()
    with output_lock:
        html_display_area.update(Javascript('newChatStarted();'))

def send_prompt_py(button):
    """Sends a user prompt to the LLM."""
    prompt = requestDataContainer.value.strip()
    app.send_prompt(prompt)
    with output_lock:
        html_display_area.update(Javascript('promptSent();'))

def interrupt_response_py(button):
    """Interrupts the LLM's response."""
    app.interrupt_response()
    while app.is_model_working:
        time.sleep(0.1)
    response = app.get_last_response()
    # Call JavaScript function to update the interrupted message
    with output_lock:
        html_display_area.update(Javascript(f'responseInterrupted(`{response}`, `{str(app.get_context_load())}`);'))

def transcribe_py(button):
    """Transcribes audio to text."""
    transcription = app.transcribe(requestDataContainer.value)
    with output_lock:
        html_display_area.update(Javascript(f'displayTranscription(`{transcription}`);'))
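The handlers above embed Python strings in JavaScript template literals (backtick strings). That works for plain text, but a reply containing a backtick or a `${...}` sequence would break or alter the generated call. A common alternative is to serialize each argument with `json.dumps`, whose output is a valid JavaScript string literal. The helper below is a hypothetical sketch of that approach; the JavaScript function names are the ones this notebook calls, and whether the frontend applies any escaping of its own has not been verified here.

```python
import json


def js_call(function_name, *args):
    """Build a JavaScript call expression with safely quoted string arguments.

    Each argument is converted to str and serialized with json.dumps, so
    quotes, backslashes and newlines are escaped, and backticks or "${"
    sequences stay plain characters inside a double-quoted literal.
    """
    quoted = ", ".join(json.dumps(str(a)) for a in args)
    return f"{function_name}({quoted});"


code = js_call("displayTranscription", "reply with `code` and ${x}")
print(code)

# Hypothetical use inside a handler:
# html_display_area.update(Javascript(js_call("displayTranscription", transcription)))
```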

### Functions Controlling the Application

These functions start and stop the application and define the background thread function that refreshes the conversation window.

In [None]:
stop_event = threading.Event()

def update_conversation(html_display_area, app):
    """
    Continuously updates the conversation in the HTML output area.
    """
    try:
        while not stop_event.is_set():
            data = app.get_completed_data_chunk()
            if data is not None:
                display_sentence, encoded_audio = data
                with output_lock:
                    if encoded_audio != "":
                        html_display_area.update(Javascript(f'appendAudio(`{encoded_audio}`);'))
                    if display_sentence != "":
                        html_display_area.update(Javascript(f'updateMessage(`{display_sentence}`);'))
            else:
                with output_lock:
                    html_display_area.update(Javascript(f'assistantResponseFinished(`{str(app.get_context_load())}`);'))
    except Exception as e:
        print(f"An unexpected error occurred in the conversation update thread: {e}", file=sys.stderr)
        stop_event.set()

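def start_application():
    app.start()
    stop_event.clear()
    # Background thread that streams response text and audio to the frontend.
    # daemon=True prevents it from blocking kernel shutdown.
    update_thread = threading.Thread(target=update_conversation, args=(html_display_area, app), daemon=True)
    update_thread.start()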

def stop_application():
    app.stop()
    stop_event.set()
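`update_conversation` runs in a background thread and exits once `stop_event` is set, but it can only notice the event between reads. Whether `get_completed_data_chunk()` blocks, and for how long, is not shown in this notebook. The sketch below shows the general pattern with a standard `queue.Queue` read with a timeout, so the loop wakes up regularly and re-checks the stop flag. The queue, the `consume` function and the `handle` callback are hypothetical stand-ins for the backend's internals; `None` marks the end of a response, mirroring the loop above.

```python
import queue
import threading


def consume(chunks, stop, handle, poll_seconds=0.1):
    """Pass queued items to handle() until stop is set."""
    while not stop.is_set():
        try:
            item = chunks.get(timeout=poll_seconds)
        except queue.Empty:
            continue  # nothing yet; loop back and re-check the stop flag
        try:
            handle(item)
        finally:
            chunks.task_done()  # lets chunks.join() know this item is finished


chunks = queue.Queue()
stop = threading.Event()
received = []
worker = threading.Thread(target=consume, args=(chunks, stop, received.append), daemon=True)
worker.start()

for piece in ["Hello", "world", None]:  # None marks the end of a response
    chunks.put(piece)

chunks.join()             # wait until every queued item has been handled
stop.set()                # ask the worker to leave its loop
worker.join(timeout=1.0)  # it exits within about one poll interval
print(received, worker.is_alive())
```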

### Buttons

In [None]:
# The buttons below are hidden; the JavaScript frontend clicks them to call the Python handlers.
new_chat_button = widgets.Button(description='New_Chat', layout=widgets.Layout(display='none'))
send_button = widgets.Button(description='Send_Prompt', layout=widgets.Layout(display='none'))
transcribe_button = widgets.Button(description='Transcribe_Audio', layout=widgets.Layout(display='none'))
stop_reply_button = widgets.Button(description='Stop_Reply', layout=widgets.Layout(display='none'))

# Arrange widgets in a layout
control_panel = widgets.HBox([new_chat_button, send_button, transcribe_button, stop_reply_button])

# Link button clicks to the respective functions
new_chat_button.on_click(start_new_chat_py)
send_button.on_click(send_prompt_py)
transcribe_button.on_click(transcribe_py)
stop_reply_button.on_click(interrupt_response_py)

### Application Frontend

The user interface components and the JavaScript functions that manage voice recording and communication with the Python backend are imported as a ready-made module. The module provides an HTML document with an embedded stylesheet and script; you can view its content with `print(llmChatFrontend)`.

In [None]:
voiceLLmFrontend = VoiceLLMChatFrontend_Local(
    # Setting up our assistant's logo.
    assistantAvatarSrc = "https://qwenlm.github.io/img/logo.png",
    # For the user, use the Colab logo.
    userAvatarSrc = "https://colab.research.google.com/img/colab_favicon_256px.png"
    )

# Static HTML document with the application's interface.
llmChatFrontend = voiceLLmFrontend.getDocument()
# # You can also preview the content of the document.
# print(llmChatFrontend)

### Launching the Application

You need to allow the browser to use the microphone if you haven't done so yet. When running the application for the first time, it may be necessary to rerun the code below. The first transcription takes a little longer due to the initialization of the speech recognition model.

In [None]:
if app.initialized:
    # Display the buttons and the output area
    app_output_widget.outputs = []
    display(control_panel, requestDataContainer, app_output_widget)
    start_application()
    
    # Add the HTML structure for the message display area
    app_output_widget.append_display_data(HTML(llmChatFrontend))
else:
    print("Initialization failed.", file=sys.stderr)
    # To troubleshoot, set 'app.should_print_logs = True' in the initialization cell, rerun the notebook, and check the logs.

### Stopping the Application

To properly stop the application and release resources, uncomment and run the code below.

In [None]:
# stop_application()

The full conversation is stored as an unformatted list of text messages in the object below, from which it can be exported:

In [None]:
app.chat_messages
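To keep a copy of the conversation, `app.chat_messages` can be written to a text file. The notebook does not document the element type, so the sketch below assumes each element is either a plain string or a dictionary with `role` and `content` keys (the layout commonly used with chat templates). The function name `export_messages` and the file name are hypothetical.

```python
from pathlib import Path


def export_messages(messages, path):
    """Write messages to a UTF-8 text file, one entry per message.

    Dict messages with a "content" key are rendered as "role: content";
    anything else is written using str().
    """
    lines = []
    for message in messages:
        if isinstance(message, dict) and "content" in message:
            role = message.get("role", "unknown")
            lines.append(f"{role}: {message['content']}")
        else:
            lines.append(str(message))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Hypothetical usage in the notebook:
# export_messages(app.chat_messages, "conversation.txt")
```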