# Overview: Using the Speech Service from Python

This lab shows how to use the Speech Service through the Speech SDK for Python. It illustrates how the SDK can be used to recognize speech from file input.

See the [accompanying article](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/index-speech-to-text) for more details.

## Prerequisites

Before you get started, here's a list of prerequisites:

* A subscription key for the Speech service. 

### Install Libraries in Terminal:
* gstreamer:
  ```sh
  sudo apt install libgstreamer1.0-0 \
  gstreamer1.0-plugins-base \
  gstreamer1.0-plugins-good \
  gstreamer1.0-plugins-bad \
  gstreamer1.0-plugins-ugly
  ```

### Speech SDK Python Package

**By downloading the Microsoft Cognitive Services Speech SDK, you acknowledge its license, see [Speech SDK license agreement](https://aka.ms/csspeech/license).**

The Speech SDK Python package should be installed together with the other libraries via `requirements.txt`.
If the package was not installed, you can install the Cognitive Services Speech SDK Python package from [PyPI](https://pypi.org/) with this command:

```sh
pip install azure-cognitiveservices-speech
```


## Speech Recognition Using the Speech SDK

First, set up some general items. Import the Speech SDK for Python:

In [None]:
import azure.cognitiveservices.speech as speechsdk
import time

### Load environment variables
The code below imports the `load_dotenv` function from the `dotenv` module and uses it to load environment variables from a `.env` file.
To access the Azure Speech service through the Python SDK, we need an **access key** and a **service region**.

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
if load_dotenv():
    # format() avoids a TypeError if the variable is missing and getenv returns None
    print("Found Azure AI Services Speech region: {}".format(os.getenv("AZURE_AI_SPEECH_REGION")))
else:
    print("Azure AI Services Speech region not found. Have you configured the .env file?")

service_region = os.getenv("AZURE_AI_SPEECH_REGION")
speech_key = os.getenv("AZURE_AI_SPEECH_KEY")
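
A missing variable is easier to diagnose when it is checked up front rather than when the first request fails. The following is a minimal sketch, assuming the same variable names as in the cell above; `require_env` is a hypothetical helper written for this lab, not part of the Speech SDK or `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper for this lab; not part of any library.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Have you configured the .env file?"
        )
    return value
```

After `load_dotenv()` has run, it could be used as `speech_key = require_env("AZURE_AI_SPEECH_KEY")` and `service_region = require_env("AZURE_AI_SPEECH_REGION")`.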


## 1. Speech recognition from mp3 file

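### Run speech recognition for audio files
Here we run speech recognition and collect all the outputs. Each call to `recognize_once_async()` returns after the first recognized utterance, and the recognized text is appended to the `all_results` list.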

In [None]:
import os
import azure.cognitiveservices.speech as speechsdk

def recognize_from_file(filename):
    # Uses `speech_key` and `service_region`, loaded earlier from the
    # AZURE_AI_SPEECH_KEY and AZURE_AI_SPEECH_REGION environment variables
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.speech_recognition_language="en-US"

    audio_config = speechsdk.audio.AudioConfig(filename=filename)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    speech_recognition_result = speech_recognizer.recognize_once_async().get()

    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(speech_recognition_result.text))
        all_results.append(speech_recognition_result.text)
    elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
    elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_recognition_result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")



In [None]:
all_results = []
recognize_from_file("chunk0.wav")
recognize_from_file("chunk1.wav")
print(all_results)
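
Before passing files such as `chunk0.wav` to the recognizer, it can help to confirm what format they actually contain. The Speech SDK documentation describes WAV (16 kHz or 8 kHz, 16-bit, mono PCM) as the default input format; treat the check below as a sketch under that assumption and verify it against the documentation for your SDK version. It uses only the standard-library `wave` module, and both function names are hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic format information from the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def looks_like_default_input(info: dict) -> bool:
    """Return True for 16-bit mono audio sampled at 8 kHz or 16 kHz."""
    return (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["sample_rate_hz"] in (8000, 16000)
    )
```

For example, `looks_like_default_input(describe_wav("chunk0.wav"))` returning `False` would suggest converting the file before recognition.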

### Save results to file
The following snippet writes the transcribed text to the `transcript_Azure_STT.txt` file.

In [None]:
# Specify the file path
file_path = "transcript_Azure_STT.txt"

# Write the content to the file, separating the results with a space
with open(file_path, "w", encoding="utf-8") as file:
    file.write(" ".join(all_results))

## 2. Get transcription with speaker recognition

The Azure Speech to Text service offers several powerful features:

- **High-Quality Transcription**: Accurately transcribe spoken audio to text in over 100 languages and variants. It provides state-of-the-art speech recognition for high-quality transcription.

- **Customizable Models**: Tailor the speech models to your specific needs. You can add domain-specific terminology by including specific words in your base vocabulary or even build your own custom speech-to-text models. This flexibility allows you to overcome barriers like background noise, accents, or unique vocabulary.

- **Speaker Diarization**: Determine who said what and when by using speaker diarization. This feature helps differentiate speakers in the transcribed text.

- **Automatic Formatting and Punctuation**: Get readable transcripts with automatic formatting and proper punctuation.

For detailed documentation and tutorials, visit the [Speech to Text documentation page](https://docs.microsoft.com/azure/cognitive-services/speech-service/index-speech-to-text).

#### This code is needed if you want special processing for real-time speech recognition
This code snippet checks the status of real-time recognition and collects all pieces for further processing.
If you use batch processing instead of real-time recognition, the service provides the full output.

In [None]:
def conversation_transcriber_recognition_canceled_cb(evt: speechsdk.SessionEventArgs):
    print('Canceled event')

def conversation_transcriber_session_stopped_cb(evt: speechsdk.SessionEventArgs):
    print('SessionStopped event')

def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
    print('TRANSCRIBED:')
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print('\tSpeaker ID={}'.format(evt.result.speaker_id))
        print('\tText={}'.format(evt.result.text))
        all_results.append('\n'+evt.result.speaker_id+': ')
        all_results.append(evt.result.text)
    elif evt.result.reason == speechsdk.ResultReason.NoMatch:
        print('\tNOMATCH: Speech could not be TRANSCRIBED: {}'.format(evt.result.no_match_details))

def conversation_transcriber_session_started_cb(evt: speechsdk.SessionEventArgs):
    print('SessionStarted event')
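
The callbacks above append to a global list and mix speaker labels and text as separate strings. One alternative is to collect structured `(speaker_id, text)` pairs. The sketch below assumes only that the event exposes `evt.result.speaker_id` and `evt.result.text`, as used above; `make_segment_collector` is a hypothetical helper, and the event is simulated with `SimpleNamespace` so the example runs without the SDK or a network call. As a simplification, it skips empty text instead of checking `evt.result.reason`.

```python
from types import SimpleNamespace


def make_segment_collector(segments: list):
    """Build a callback that appends (speaker_id, text) tuples to `segments`.

    Hypothetical helper for this lab; not part of the Speech SDK.
    """
    def on_transcribed(evt):
        result = evt.result
        # Skip events that carry no text, such as NoMatch results
        if result.text:
            segments.append((result.speaker_id, result.text))
    return on_transcribed


# Simulated events; the speaker labels and texts are made-up stand-ins
segments = []
callback = make_segment_collector(segments)
callback(SimpleNamespace(result=SimpleNamespace(speaker_id="Guest-1", text="Hello there.")))
callback(SimpleNamespace(result=SimpleNamespace(speaker_id="Guest-2", text="")))
print(segments)  # [('Guest-1', 'Hello there.')]
```

A callback built this way could be passed to `conversation_transcriber.transcribed.connect(...)` in place of a module-level function.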

### Transcribe audio file with diarization
The Python function `recognize_speackers_from_file` uses Azure's Speech SDK to transcribe a conversation from an audio file and attribute each phrase to a speaker.

- The function takes one argument, `filename`, which is the path to the audio file.
- A speech configuration is created using `speechsdk.SpeechConfig` with a subscription key and service region. The speech recognition language is set to "en-US".
- An audio configuration is created using `speechsdk.audio.AudioConfig` with the name of the audio file.
- A conversation transcriber is created using `speechsdk.transcription.ConversationTranscriber` with the speech and audio configurations.

- Several event handlers are connected to the conversation transcriber. These handlers will be called when the corresponding events are fired by the transcriber.
- The function waits until `transcribing_stop` is `True`, checking every 0.5 seconds.
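
The polling loop described in the last step can also be expressed with `threading.Event`, which blocks until the callback signals completion instead of waking every 0.5 seconds. This is a minimal sketch, not the lab's implementation: it assumes the callback is invoked from a different thread than the one waiting (which the polling loop relies on as well), and it simulates the SDK event with a `threading.Timer`.

```python
import threading

# Set by the callback once transcription has finished
done = threading.Event()


def stop_cb(evt):
    """Signal the waiting thread that transcription has finished."""
    print("CLOSING on {}".format(evt))
    done.set()


# Stand-in for the SDK firing session_stopped from a background thread
timer = threading.Timer(0.1, stop_cb, args=("session_stopped (simulated)",))
timer.start()

# Block until the event is set; the timeout guards against waiting forever
finished = done.wait(timeout=5.0)
print("finished:", finished)
```

The timeout is a design choice: without it, a session that never reports `session_stopped` or `canceled` would block the notebook cell indefinitely.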


In [None]:
def recognize_speackers_from_file(filename):
    # Creates an audio stream format. As an example, an MP3 compressed file could be read like this:
    #compressed_format = speechsdk.audio.AudioStreamFormat(compressed_stream_format=speechsdk.AudioStreamContainerFormat.MP3)
    #callback = BinaryFileReaderCallback(filename=filename)
    #stream = speechsdk.audio.PullAudioInputStream(stream_format=compressed_format, pull_stream_callback=callback)

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.speech_recognition_language="en-US"
    audio_config = speechsdk.audio.AudioConfig(filename=filename)

    conversation_transcriber = speechsdk.transcription.ConversationTranscriber(speech_config=speech_config, audio_config=audio_config)

    transcribing_stop = False

    def stop_cb(evt: speechsdk.SessionEventArgs):
        """Callback that signals to stop continuous recognition upon receiving an event `evt`."""
        print('CLOSING on {}'.format(evt))
        nonlocal transcribing_stop
        transcribing_stop = True

    # Connect callbacks to the events fired by the conversation transcriber
    conversation_transcriber.transcribed.connect(conversation_transcriber_transcribed_cb)
    conversation_transcriber.session_started.connect(conversation_transcriber_session_started_cb)
    conversation_transcriber.session_stopped.connect(conversation_transcriber_session_stopped_cb)
    conversation_transcriber.canceled.connect(conversation_transcriber_recognition_canceled_cb)
    # stop transcribing on either session stopped or canceled events
    conversation_transcriber.session_stopped.connect(stop_cb)
    conversation_transcriber.canceled.connect(stop_cb)

    conversation_transcriber.start_transcribing_async()

    # Waits for completion.
    while not transcribing_stop:
        time.sleep(.5)

    conversation_transcriber.stop_transcribing_async()

# Main



In [None]:
try:
    all_results = []
    recognize_speackers_from_file("chunk0.wav")
    recognize_speackers_from_file("chunk1.wav")
except Exception as err:
    print("Encountered exception. {}".format(err))

### Save results to file
The following snippet writes the transcribed text, including speaker labels, to the `transcript_Azure_STT_with_speaker.txt` file.

In [None]:
# Specify the file path
file_path = "transcript_Azure_STT_with_speaker.txt"

# Write the content to the file
with open(file_path, "w") as file:
    file.write("".join(all_results))
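
The saved transcript contains one `speaker: text` line per recognized phrase, so the same speaker can appear on several consecutive lines. The following sketch shows one possible way to read such content back and merge consecutive turns. Both function names are hypothetical, and the sample text is made up for illustration.

```python
def parse_transcript(content: str):
    """Split 'speaker: text' lines into (speaker, text) pairs."""
    pairs = []
    for line in content.splitlines():
        # Lines without a 'speaker: ' prefix (for example empty lines) are skipped
        if ": " in line:
            speaker, text = line.split(": ", 1)
            pairs.append((speaker, text))
    return pairs


def merge_consecutive_turns(segments):
    """Merge neighbouring (speaker, text) pairs that share the same speaker."""
    merged = []
    for speaker, text in segments:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged


# Made-up sample in the same layout the cell above writes to the file
sample = "\nGuest-1: Hello.\nGuest-1: How are you?\nGuest-2: Fine, thanks."
print(merge_consecutive_turns(parse_transcript(sample)))
# [('Guest-1', 'Hello. How are you?'), ('Guest-2', 'Fine, thanks.')]
```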