# 🎙️ Speech-to-Text with Barge-in Testing

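This notebook demonstrates a real-time voice loop: streaming speech recognition from the microphone, a streamed Azure OpenAI chat completion, sentence-by-sentence text-to-speech, and barge-in (the user interrupting the assistant while it is speaking).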

## Quick Test Steps:
1. Run all cells in order
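2. **IMPORTANT**: Use headphones. Through speakers, the microphone picks up the assistant's voice, which is then transcribed as user input and triggers barge-in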
3. Speak into your microphone
4. Try interrupting the assistant while it's speaking
5. Say "quit" to exit

## What's Fixed:
- ✅ Simplified barge-in logic
- ✅ Better feedback prevention
- ✅ Reliable speech recognition
- ✅ Clear status indicators
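
The barge-in logic can be pictured as a single guarded flag: set it when the assistant starts speaking, and clear it (stopping playback) as soon as the recognizer reports user speech. Below is a minimal sketch, assuming only that the synthesizer exposes `start_speaking_text` and `stop_speaking`, as the notebook's `SpeechSynthesizer` is used later on. `BargeInController` and `FakeSynthesizer` are hypothetical names introduced for illustration, not part of the repository.

```python
import threading


class BargeInController:
    """Tracks whether the assistant is speaking and stops playback on barge-in."""

    def __init__(self, synthesizer):
        self._synth = synthesizer
        self._lock = threading.Lock()  # recognizer callbacks run on other threads
        self._speaking = False

    def speak(self, text):
        """Mark the assistant as speaking and hand the text to the synthesizer."""
        with self._lock:
            self._speaking = True
        self._synth.start_speaking_text(text)

    def on_user_speech(self):
        """Call from the partial-result callback. Returns True if it interrupted."""
        with self._lock:
            if not self._speaking:
                return False
            self._speaking = False
        self._synth.stop_speaking()
        return True


class FakeSynthesizer:
    """Hypothetical stand-in for a TTS client so the sketch runs without audio."""

    def __init__(self):
        self.calls = []

    def start_speaking_text(self, text):
        self.calls.append(("start", text))

    def stop_speaking(self):
        self.calls.append(("stop",))


synth = FakeSynthesizer()
controller = BargeInController(synth)
controller.speak("Hello!")
interrupted = controller.on_user_speech()  # user talks over the assistant
interrupted_again = controller.on_user_speech()  # nothing left to stop
print(interrupted, interrupted_again, synth.calls)
```

The lock matters because the flag is written from the main loop and read from the recognizer's callback thread; without it, two overlapping partial results could both try to stop playback.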

In [1]:
import logging
import os

# Move up to the repository root (or TARGET_DIRECTORY, if set) so that `src.*` imports resolve
try:
    os.chdir("../../../")
    target_directory = os.getenv(
        "TARGET_DIRECTORY", os.getcwd()
    )  # Use environment variable if available
    if os.path.exists(target_directory):
        os.chdir(target_directory)
        print(f"Changed directory to: {os.getcwd()}")
        logging.info(f"Successfully changed directory to: {os.getcwd()}")
    else:
        logging.error(f"Directory does not exist: {target_directory}")
except Exception as e:
    logging.exception(f"An error occurred while changing directory: {e}")

Changed directory to: c:\Users\pablosal\Desktop\gbb-ai-audio-agent


In [2]:
from src.speech.text_to_speech import SpeechSynthesizer
from src.speech.speech_recognizer import StreamingSpeechRecognizerFromBytes
from openai import AzureOpenAI

if "az_speech_recognizer_stream_client" not in locals():
    az_speech_recognizer_stream_client = StreamingSpeechRecognizerFromBytes(
        vad_silence_timeout_ms=800,
        use_semantic_segmentation=False,
        audio_format="pcm",
        candidate_languages=["en-US", "fr-FR", "de-DE", "es-ES", "it-IT"],
        enable_diarisation=False,
        speaker_count_hint=2,
        enable_neural_fe=False,
    )

if "az_speach_synthesizer_client" not in locals():
    az_speach_synthesizer_client = SpeechSynthesizer()

# Ensure Azure OpenAI client is initialized only if not already defined
if "client" not in locals():
    client = AzureOpenAI(
        api_version="2025-02-01-preview",
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_KEY"),
    )

[2025-08-29 11:00:49,899] INFO - src.speech.speech_recognizer: Azure Monitor tracing initialized for speech recognizer
[2025-08-29 11:00:49,909] INFO - src.speech.speech_recognizer: Creating SpeechConfig with API key authentication
[2025-08-29 11:00:49,921] INFO - src.speech.text_to_speech: Azure Monitor tracing initialized for speech synthesizer

In [6]:
# Define end-of-sentence markers for TTS
TTS_ENDS = {".", "!", "?", ";", "\n"}

# Reset all flags and buffers (these will be redefined in the main cell)
print("🔄 Resetting conversation state...")

# Clear any existing variables to prevent conflicts
if 'user_buffer' in locals():
    del user_buffer
if 'is_synthesizing' in locals():
    del is_synthesizing
if 'audio_lock' in locals():
    del audio_lock

print("✅ State reset complete")

🔄 Resetting conversation state...
✅ State reset complete
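
The `TTS_ENDS` markers decide when accumulated LLM output is complete enough to hand to the synthesizer. Streamed chunks do not always align with sentence boundaries, so a character-level scan is one way to make the split independent of how the model tokenizes. The following is a sketch only; `iter_sentences` is a hypothetical helper, not something the notebook defines.

```python
TTS_ENDS = {".", "!", "?", ";", "\n"}


def iter_sentences(chunks, ends=TTS_ENDS):
    """Yield complete sentences from an iterable of streamed text chunks."""
    buffer = []
    for chunk in chunks:
        for ch in chunk:
            buffer.append(ch)
            if ch in ends:
                sentence = "".join(buffer).strip()
                buffer.clear()
                if sentence:
                    yield sentence
    # Whatever is left when the stream ends is spoken as a final fragment
    tail = "".join(buffer).strip()
    if tail:
        yield tail


chunks = ["Hello", "!", " I'm here", " and ready to help you.", " How can I", " assist"]
sentences = list(iter_sentences(chunks))
print(sentences)
```

This simple rule also splits on decimals such as `3.14` and on each dot of an ellipsis, so a production splitter would likely need extra handling for those cases.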


In [7]:
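import os
import threading
import time

# Initialize before the callbacks are registered: the reset cell deletes this
# flag, so the first partial result would otherwise raise NameError in on_partial.
is_synthesizing = False  # True once the assistant has started speaking


def on_final(text, lang):
    """Append a finalized utterance to the buffer consumed by the main loop."""
    global user_buffer
    user_buffer += text.strip() + "\n"
    print(f"🧾 User (final) in {lang}: {text}")


def assistant_speak(text):
    """Send one sentence to TTS and mark the assistant as speaking."""
    global is_synthesizing
    print("Hi there, I am a assistant_speak callback!")
    is_synthesizing = True
    print("Syntetixing:", is_synthesizing)
    az_speach_synthesizer_client.start_speaking_text(text)


def on_partial(text, language, speaker_id):
    """Barge-in: a partial result while the assistant speaks stops playback."""
    global is_synthesizing
    if is_synthesizing:
        az_speach_synthesizer_client.stop_speaking()
        is_synthesizing = False
    print(f"🗣️ User (partial) in {language}: {text}")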

az_speech_recognizer_stream_client.set_partial_result_callback(on_partial)
az_speech_recognizer_stream_client.set_final_result_callback(on_final)

# Start recognition
az_speech_recognizer_stream_client.start()
print("🎙️ Speak now...")

# Start mic streaming thread
import pyaudio

RATE, CHANNELS, CHUNK = 16000, 1, 1024
audio = pyaudio.PyAudio()
stream = audio.open(
    format=pyaudio.paInt16,
    channels=CHANNELS,
    rate=RATE,
    input=True,
    frames_per_buffer=CHUNK,
)


def mic_loop():
    while True:
        data = stream.read(CHUNK, exception_on_overflow=False)
        az_speech_recognizer_stream_client.write_bytes(data)


threading.Thread(target=mic_loop, daemon=True).start()

messages = [{"role": "system", "content": "You are a helpful assistant."}]

user_buffer = ""  # This should be filled in by your STT callback as before

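while True:
    if user_buffer:
        # Take the pending utterance(s) and clear the buffer for the next turn
        user_text, user_buffer = user_buffer.strip(), ""

        # Exit on "quit" (the recognizer may add capitalization or punctuation)
        if user_text.lower().strip(" .!?") == "quit":
            print("👋 Exiting conversation loop.")
            break

        messages.append({"role": "user", "content": user_text})

        completion = client.chat.completions.create(
            stream=True,
            messages=messages,
            max_tokens=4096,
            temperature=1.0,
            top_p=1.0,
            model=os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT_ID"),
        )

        collected_messages = []  # chunks of the sentence currently being built
        assistant_reply = []  # full reply, kept for the conversation history

        for chunk in completion:
            if chunk.choices and hasattr(chunk.choices[0].delta, "content"):
                chunk_text = chunk.choices[0].delta.content
                if chunk_text:
                    collected_messages.append(chunk_text)
                    assistant_reply.append(chunk_text)
                    print(chunk_text, end="", flush=True)
                    # Stream to TTS when the chunk ends with a sentence marker
                    if chunk_text[-1] in TTS_ENDS:
                        text = "".join(collected_messages).strip()
                        if text:
                            assistant_speak(text)
                        collected_messages.clear()

        # Speak any trailing text that did not end with a sentence marker
        tail = "".join(collected_messages).strip()
        if tail:
            assistant_speak(tail)

        # Store the assistant turn so later requests include it as context
        messages.append({"role": "assistant", "content": "".join(assistant_reply)})
        print()  # finish line after streaming LLM response

    time.sleep(0.1)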

[2025-08-29 11:01:42,438] INFO - src.speech.speech_recognizer: Starting recognition from byte stream…
[2025-08-29 11:01:42,452] INFO - src.speech.speech_recognizer: Speech-SDK prepare_start – format=pcm  neuralFE=False  diar=False
[2025-08-29 11:01:42,522] INFO - src.speech.speech_recognizer: Speech-SDK ready (neuralFE=False, diarisation=False, speakers=2)

🎙️ Speak now...

🧾 User (final) in en-US: Hi there, how are you doing?
Hello!Hi there, I am a assistant_speak callback!
Syntetixing: True

[2025-08-29 11:01:48,307] INFO - src.speech.text_to_speech: [🔊] Starting streaming speech synthesis for text: Hello!...

 I'm here and ready to help you.Hi there, I am a assistant_speak callback!
Syntetixing: True

[2025-08-29 11:01:48,348] INFO - src.speech.text_to_speech: [🔊] Starting streaming speech synthesis for text: I'm here and ready to help you....

 How can I assist you today?Hi there, I am a assistant_speak callback!
Syntetixing: True

[2025-08-29 11:01:48,390] INFO - src.speech.text_to_speech: [🔊] Starting streaming speech synthesis for text: How can I assist you today?...

[2025-08-29 11:01:50,558] INFO - src.speech.text_to_speech: [🛑] Stopping speech synthesis...

🗣️ User (partial) in en-US: hello i'm
🗣️ User (partial) in en-US: hello i'm here and ready to
🗣️ User (partial) in en-US: hello i'm here and ready to help
🧾 User (final) in en-US: Hello I'm here and ready to help.
That's great to hear!Hi there, I am a assistant_speak callback!
Syntetixing: True

[2025-08-29 11:01:52,434] INFO - src.speech.text_to_speech: [🔊] Starting streaming speech synthesis for text: That's great to hear!...

 How can I assist you today?Hi there, I am a assistant_speak callback!
Syntetixing: True

[2025-08-29 11:01:52,475] INFO - src.speech.text_to_speech: [🔊] Starting streaming speech synthesis for text: How can I assist you today?...

[2025-08-29 11:01:53,918] INFO - src.speech.text_to_speech: [🛑] Stopping speech synthesis...


🗣️ User (partial) in en-US: that's great to
🗣️ User (partial) in en-US: that's great to hear
🧾 User (final) in en-US: That's great to hear.
I'm glad to hear that!Hi there, I am a assistant_speak callback!
Syntetixing: True

[2025-08-29 11:01:56,028] INFO - src.speech.text_to_speech: [🔊] Starting streaming speech synthesis for text: I'm glad to hear that!...

 How can I assist you today?Hi there, I am a assistant_speak callback!
Syntetixing: True

[2025-08-29 11:01:56,066] INFO - src.speech.text_to_speech: [🔊] Starting streaming speech synthesis for text: How can I assist you today?...

[2025-08-29 11:01:57,526] INFO - src.speech.text_to_speech: [🛑] Stopping speech synthesis...


🗣️ User (partial) in en-US: i'm glad to
🗣️ User (partial) in en-US: i'm glad to hear that
🧾 User (final) in en-US: I'm glad to hear that.
What can I assist you with today?Hi there, I am a assistant_speak callback!
Syntetixing: True

[2025-08-29 11:01:59,215] INFO - src.speech.text_to_speech: [🔊] Starting streaming speech synthesis for text: What can I assist you with today?...

[2025-08-29 11:02:00,716] INFO - src.speech.text_to_speech: [🛑] Stopping speech synthesis...


🗣️ User (partial) in en-US: what can i
🗣️ User (partial) in en-US: what can i assist you with
🗣️ User (partial) in en-US: what can i assist you with today
🧾 User (final) in en-US: What can I assist you with today?
I’m here to help you!Hi there, I am a assistant_speak callback!
Syntetixing: True

[2025-08-29 11:02:02,613] INFO - src.speech.text_to_speech: [🔊] Starting streaming speech synthesis for text: I’m here to help you!...

 What do you need assistance with today?Hi there, I am a assistant_speak callback!
Syntetixing: True

[2025-08-29 11:02:02,660] INFO - src.speech.text_to_speech: [🔊] Starting streaming speech synthesis for text: What do you need assistance with today?...

[2025-08-29 11:02:04,120] INFO - src.speech.text_to_speech: [🛑] Stopping speech synthesis...


🗣️ User (partial) in en-US: i'm here to help
🗣️ User (partial) in en-US: i'm here to help you
🧾 User (final) in en-US: I'm here to help you.


KeyboardInterrupt: 
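
In the recorded run above, several "user" utterances are the assistant's own sentences picked up by the microphone (for example, "Hello I'm here and ready to help."). Headphones avoid this at the source. As an additional software guard, one option is to drop finalized text that closely matches what the assistant just said. This is a sketch under stated assumptions: `is_probable_echo`, the three-word minimum, and the 0.8 threshold are hypothetical choices, not part of the notebook or the Speech SDK.

```python
import difflib
import string


def _normalize(text):
    """Lowercase and strip punctuation so STT text and LLM text are comparable."""
    text = text.replace("’", "'")  # the LLM may emit typographic apostrophes
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def is_probable_echo(user_text, recent_assistant_sentences, threshold=0.8):
    """Return True if user_text looks like a transcription of recent TTS output."""
    user_norm = _normalize(user_text)
    if not user_norm:
        return False
    spoken = _normalize(" ".join(recent_assistant_sentences))
    # Word-aligned containment catches echoes cut short by barge-in
    if len(user_norm.split()) >= 3 and f" {user_norm} " in f" {spoken} ":
        return True
    # Fuzzy match tolerates small recognition errors in a full echo
    ratio = difflib.SequenceMatcher(None, user_norm, spoken).ratio()
    return ratio >= threshold


spoken = ["Hello!", "I'm here and ready to help you.", "How can I assist you today?"]
echo = is_probable_echo("Hello I'm here and ready to help.", spoken)
genuine = is_probable_echo("What's the weather in Paris?", spoken)
print(echo, genuine)
```

The trade-off is that a user who deliberately repeats the assistant's words would also be filtered, so a guard like this is better treated as a fallback than as a replacement for headphones or acoustic echo cancellation.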

In [None]:
az_speech_recognizer_stream_client.close_stream()
az_speach_synthesizer_client.stop_speaking()

[2025-08-29 11:02:09,319] INFO - src.speech.text_to_speech: [🛑] Stopping speech synthesis...


: Recognition canceled: SpeechRecognitionCanceledEventArgs(session_id=b36c85c7eea040879c1093ed8293531d, result=SpeechRecognitionResult(result_id=9e89a8a8830e410aa786cd2b597ef285, text="", reason=ResultReason.Canceled))
: Reason: CancellationReason.EndOfStream, Error: 
[2025-08-29 11:02:09,465] INFO - src.speech.speech_recognizer: Session stopped.
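
The cleanup cell closes the recognizer stream and stops the synthesizer, but the microphone thread started earlier runs `while True` and has no way to exit. One way to make it stoppable is a `threading.Event`, sketched below. `FakeStream` and `FakeRecognizer` are hypothetical stand-ins so the example runs without audio hardware; with PyAudio, the matching cleanup after the thread exits would be `stream.stop_stream()`, `stream.close()`, and `audio.terminate()`.

```python
import threading
import time

stop_event = threading.Event()


def mic_loop(stream, recognizer, chunk=1024):
    """Forward audio chunks to the recognizer until stop_event is set."""
    while not stop_event.is_set():
        data = stream.read(chunk)
        recognizer.write_bytes(data)


class FakeStream:
    """Hypothetical stand-in for a PyAudio input stream."""

    def read(self, num_frames):
        time.sleep(0.001)  # simulate a short blocking read
        return b"\x00" * (2 * num_frames)  # 16-bit mono silence


class FakeRecognizer:
    """Hypothetical stand-in for the streaming recognizer."""

    def __init__(self):
        self.bytes_received = 0

    def write_bytes(self, data):
        self.bytes_received += len(data)


recognizer = FakeRecognizer()
thread = threading.Thread(
    target=mic_loop, args=(FakeStream(), recognizer), daemon=True
)
thread.start()
time.sleep(0.05)  # let a few chunks through
stop_event.set()  # ask the loop to finish
thread.join(timeout=1.0)  # wait for it before closing the audio device
print(thread.is_alive(), recognizer.bytes_received > 0)
```

Joining the thread before closing the audio device avoids a read on a stream that has already been closed.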
