# Quickstart: Using the Speech Service from Python

This sample shows how to use the Speech service through the Speech SDK for Python. It illustrates how the SDK can be used to recognize speech from an MP3 audio file and to transcribe a conversation with speaker labels.

See the [accompanying article](https://docs.microsoft.com/azure/cognitive-services/speech-service/quickstart-python) on the SDK documentation page for step-by-step instructions.

## Prerequisites

Before you get started, here's a list of prerequisites:

* A subscription key for the Speech service. 

### Install Libraries

* On Ubuntu 16.04 or 18.04, run the following commands to install the required packages:
  ```sh
  sudo apt-get update
  sudo apt-get install libssl-dev libasound2
  ```
* On Debian 9, run the following commands to install the required packages:
  ```sh
  sudo apt install libgstreamer1.0-0 \
  gstreamer1.0-plugins-base \
  gstreamer1.0-plugins-good \
  gstreamer1.0-plugins-bad \
  gstreamer1.0-plugins-ugly
  ```
## Get the Speech SDK Python Package

**By downloading the Microsoft Cognitive Services Speech SDK, you acknowledge its license; see the [Speech SDK license agreement](https://aka.ms/csspeech/license).**

The Cognitive Services Speech SDK Python package can be installed from [PyPI](https://pypi.org/) using this command:

```sh
pip install azure-cognitiveservices-speech
```


## Speech Recognition Using the Speech SDK

First, set up some general items. Import the Speech SDK for Python and the standard `time` module:

In [1]:
import azure.cognitiveservices.speech as speechsdk
import time

### Load environment variables

In [2]:
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
if load_dotenv():
    # Use format() so a missing variable prints as None instead of raising TypeError
    print("Found Azure AI Services Speech Endpoint: {}".format(os.getenv("AZURE_AI_SPEECH_REGION")))
else:
    print("Azure AI Services Speech Endpoint not found. Have you configured the .env file?")
    
service_region = os.getenv("AZURE_AI_SPEECH_REGION")
speech_key = os.getenv("AZURE_AI_SPEECH_KEY")


Found Azure AI Services Speech Endpoint: westeurope
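
If the `.env` file exists but one of the variables is missing, `os.getenv` returns `None` and the problem only surfaces later. A minimal sketch of failing early, using only the standard library; the helper name `require_env` is hypothetical and is not part of the Speech SDK or of `python-dotenv`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            "Environment variable {} is not set. Have you configured the .env file?".format(name)
        )
    return value
```

With this helper, `service_region = require_env("AZURE_AI_SPEECH_REGION")` and `speech_key = require_env("AZURE_AI_SPEECH_KEY")` would stop the notebook at this cell instead of at a later one.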


### Get input file

In [3]:
# Inputs about the podcast
podcast_url = "https://www.microsoft.com/behind-the-tech"
podcast_audio_file = "../data/PodcastSnippet.mp3"

Create a speech config instance with your own subscription key and service region (e.g., "westus").

Create a recognizer with those settings. If no explicit audio config is specified, the recognizer uses the default microphone, so make sure the audio settings are correct. The cells below pass an explicit audio config instead, because the input is an MP3 file.

`recognize_once()` starts speech recognition and returns after a single utterance is recognized. The end of a single utterance is determined by listening for silence at the end, or until a maximum of 15 seconds of audio is processed. The call returns the recognized text as its result.

Note: Because `recognize_once()` returns only a single utterance, it is suitable only for single-shot recognition such as a command or query.
For long-running multi-utterance recognition, use `start_continuous_recognition()` instead.

#### This code is needed to process compressed files like MP3.

In [4]:
class BinaryFileReaderCallback(speechsdk.audio.PullAudioInputStreamCallback):
    def __init__(self, filename: str):
        super().__init__()
        self._file_h = open(filename, "rb")

    def read(self, buffer: memoryview) -> int:
        try:
            size = buffer.nbytes
            frames = self._file_h.read(size)
            buffer[:len(frames)] = frames

            return len(frames)
        except Exception as ex:
            print('Exception in `read`: {}'.format(ex))
            raise

    def close(self) -> None:
        print('closing file')
        try:
            self._file_h.close()
        except Exception as ex:
            print('Exception in `close`: {}'.format(ex))
            raise
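
The `read` method above follows a pull contract: the caller hands over a writable buffer, the callback copies at most `buffer.nbytes` bytes into it and returns the count, and a return value of 0 signals the end of the stream. The sketch below exercises that contract without the SDK, using an in-memory `io.BytesIO` in place of the MP3 file. The names `InMemoryReader` and `drain` are hypothetical, and `drain` only imitates how a consumer might call `read`; it is not how the SDK is implemented.

```python
import io


class InMemoryReader:
    """Stand-in for BinaryFileReaderCallback that reads from bytes instead of a file."""

    def __init__(self, data: bytes):
        self._stream = io.BytesIO(data)

    def read(self, buffer: memoryview) -> int:
        # Copy at most buffer.nbytes bytes into the caller's buffer.
        chunk = self._stream.read(buffer.nbytes)
        buffer[:len(chunk)] = chunk
        return len(chunk)  # 0 means the stream is exhausted

    def close(self) -> None:
        self._stream.close()


def drain(reader: InMemoryReader, chunk_size: int = 4) -> bytes:
    """Imitate a consumer: call read() repeatedly until it returns 0."""
    received = bytearray()
    buffer = memoryview(bytearray(chunk_size))
    while True:
        count = reader.read(buffer)
        if count == 0:
            break
        received += buffer[:count]
    reader.close()
    return bytes(received)


print(drain(InMemoryReader(b"0123456789")))
```

Running the same kind of loop against a small test file is a quick way to check a callback before connecting it to a `PullAudioInputStream`.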

### Speech recognition from mp3 file

In [5]:
all_results = []
def speech_recognizer_recognised_cb(evt: speechsdk.SpeechRecognitionEventArgs):
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print('\tRecognised text={}'.format(evt.result.text))
        all_results.append(evt.result.text)
    elif evt.result.reason == speechsdk.ResultReason.NoMatch:
        print('\tNOMATCH: Speech could not be TRANSCRIBED: {}'.format(evt.result.no_match_details))
    return

In [6]:
def speech_recognize_continuous_from_file(filename):
    """performs continuous speech recognition with input from an audio file"""
    # <SpeechContinuousRecognitionWithFile>
    # Create an audio stream format; this example uses an MP3-compressed file
    compressed_format = speechsdk.audio.AudioStreamFormat(compressed_stream_format=speechsdk.AudioStreamContainerFormat.MP3)
    callback = BinaryFileReaderCallback(filename=filename)
    stream = speechsdk.audio.PullAudioInputStream(stream_format=compressed_format, pull_stream_callback=callback)

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(stream=stream)

    # Create a speech recognizer that uses the file stream as audio input
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config, audio_config)

    done = False

    def stop_cb(evt: speechsdk.SessionEventArgs):
        """callback that signals to stop continuous recognition upon receiving an event `evt`"""
        #print('CLOSING on {}'.format(evt))
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    #speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.recognized.connect(speech_recognizer_recognised_cb)
    speech_recognizer.session_started.connect(lambda evt: print('SessionStarted event'))
    speech_recognizer.session_stopped.connect(lambda evt: print('SessionStopped event'))
    speech_recognizer.canceled.connect(lambda evt: print('Canceled event'))
    # Stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start continuous speech recognition
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)

    speech_recognizer.stop_continuous_recognition()
    return
    # </SpeechContinuousRecognitionWithFile>
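
The function above polls a `done` flag every half second. Since the callbacks fire while the main thread sleeps, a `threading.Event` is a common alternative that wakes the waiting code as soon as the stop callback runs. The sketch below shows the pattern with the standard library only; a `threading.Timer` plays the role of the recognizer firing its stop event, and the name `wait_for_stop` is illustrative rather than part of the SDK.

```python
import threading

done = threading.Event()


def stop_cb(evt) -> None:
    """Signal the waiting thread that recognition has finished."""
    done.set()


def wait_for_stop(timeout_seconds: float) -> bool:
    """Block until stop_cb has run; return False if the timeout expires first."""
    return done.wait(timeout=timeout_seconds)


# Simulate the recognizer firing its stop event from another thread after 0.1 s.
timer = threading.Timer(0.1, stop_cb, args=("SessionStopped",))
timer.start()
finished = wait_for_stop(timeout_seconds=5.0)
timer.join()
print("stopped" if finished else "timed out")
```

In the recognition function, the same kind of `stop_cb` could be connected to `session_stopped` and `canceled`, with `done.wait()` taking the place of the `while not done` loop.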

In [7]:
all_results = []
speech_recognize_continuous_from_file(podcast_audio_file)
print("Recognized text:")
print(all_results)

SessionStarted event
	Recognised text=Neil deGrasse Tyson is one of America's best known astrophysicists and a beloved educator and advocate for the sciences. He has a great talent for presenting complex concepts in a clear and accessible manner. He's the head of the Hayden Planetarium and has been the director there since 1996. He's hosted numerous space related TV and radio programs, published several books, and host the podcast Star Talk Radio. I am thrilled to have you on the podcast today, Neil.
	Recognised text=Well, thanks. Thanks for having me. Thank you. Why do you take you this long to invite me? I just want to know. Shame, shame, shame on me. I'm not hidden right. You aren't. And and I will say, when I started this podcast and when I wrote my book and I started doing this very uncomfortable thing for me, which is trying to talk more about technology in the public, you were literally my role model. I said, you know, Neil deGrasse Tyson does such a wonderful job.
	Recognised t

### Save results to file

In [8]:
# Specify the file path
file_path = "transcript_Azure_STT.txt"

# Write the content to the file, separating utterances with a space
with open(file_path, "w", encoding="utf-8") as file:
    file.write(" ".join(all_results))

## Get transcription with speaker recognition

In [9]:
def conversation_transcriber_recognition_canceled_cb(evt: speechsdk.SessionEventArgs):
    print('Canceled event')

def conversation_transcriber_session_stopped_cb(evt: speechsdk.SessionEventArgs):
    print('SessionStopped event')

def conversation_transcriber_transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
    print('TRANSCRIBED:')
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print('\tSpeaker ID={}'.format(evt.result.speaker_id))
        print('\tText={}'.format(evt.result.text))
        all_results.append('\n'+evt.result.speaker_id+': ')
        all_results.append(evt.result.text)
    elif evt.result.reason == speechsdk.ResultReason.NoMatch:
        print('\tNOMATCH: Speech could not be TRANSCRIBED: {}'.format(evt.result.no_match_details))

def conversation_transcriber_session_started_cb(evt: speechsdk.SessionEventArgs):
    print('SessionStarted event')
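
The `transcribed` callback above appends the speaker label and the text to `all_results` as separate strings. A hypothetical alternative is to collect `(speaker_id, text)` pairs and format them afterwards, which also makes it easy to merge consecutive segments from the same speaker. The function name `format_transcript` and the sample segments are assumptions for illustration only:

```python
from typing import List, Tuple


def format_transcript(segments: List[Tuple[str, str]]) -> str:
    """Merge consecutive segments from the same speaker and render one line per turn."""
    turns: List[Tuple[str, List[str]]] = []
    for speaker_id, text in segments:
        if turns and turns[-1][0] == speaker_id:
            turns[-1][1].append(text)  # same speaker keeps talking
        else:
            turns.append((speaker_id, [text]))
    return "\n".join("{}: {}".format(speaker, " ".join(texts)) for speaker, texts in turns)


# Illustrative data, not real service output.
segments = [
    ("Guest-1", "I am thrilled to have you on the podcast today, Neil."),
    ("Guest-2", "Well, thanks."),
    ("Guest-2", "Thanks for having me."),
]
print(format_transcript(segments))
```

To use it, the callback would append `(evt.result.speaker_id, evt.result.text)` to a list instead of two separate strings.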

In [10]:
def recognize_from_file(filename):
    # Create an audio stream format; this example uses an MP3-compressed file
    compressed_format = speechsdk.audio.AudioStreamFormat(compressed_stream_format=speechsdk.AudioStreamContainerFormat.MP3)
    callback = BinaryFileReaderCallback(filename=filename)
    stream = speechsdk.audio.PullAudioInputStream(stream_format=compressed_format, pull_stream_callback=callback)

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.speech_recognition_language="en-US"
    audio_config = speechsdk.audio.AudioConfig(stream=stream)

    conversation_transcriber = speechsdk.transcription.ConversationTranscriber(speech_config=speech_config, audio_config=audio_config)

    transcribing_stop = False

    def stop_cb(evt: speechsdk.SessionEventArgs):
        """callback that signals to stop continuous transcription upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        nonlocal transcribing_stop
        transcribing_stop = True

    # Connect callbacks to the events fired by the conversation transcriber
    conversation_transcriber.transcribed.connect(conversation_transcriber_transcribed_cb)
    conversation_transcriber.session_started.connect(conversation_transcriber_session_started_cb)
    conversation_transcriber.session_stopped.connect(conversation_transcriber_session_stopped_cb)
    conversation_transcriber.canceled.connect(conversation_transcriber_recognition_canceled_cb)
    # stop transcribing on either session stopped or canceled events
    conversation_transcriber.session_stopped.connect(stop_cb)
    conversation_transcriber.canceled.connect(stop_cb)

    conversation_transcriber.start_transcribing_async()

    # Waits for completion.
    while not transcribing_stop:
        time.sleep(.5)

    conversation_transcriber.stop_transcribing_async()

# Main

try:
    all_results = []
    recognize_from_file(podcast_audio_file)
except Exception as err:
    print("Encountered exception. {}".format(err))

SessionStarted event
TRANSCRIBED:
	Speaker ID=Guest-1
	Text=Neil deGrasse Tyson is one of America's best known astrophysicists and a beloved educator and advocate for the sciences. He has a great talent for presenting complex concepts in a clear and accessible manner. He's the head of the Hayden Planetarium and has been the director there since 1996. He's hosted numerous space related TV and radio programs, published several books, and host the podcast Star Talk Radio. I am thrilled to have you on the podcast today, Neil.
TRANSCRIBED:
	Speaker ID=Guest-2
	Text=Well, thanks. Thanks for having me. Thank you. Why do you take you this long to?
TRANSCRIBED:
	Speaker ID=Guest-1
	Text=Invite me. I just wanna know. Shame, shame, shame on me.
TRANSCRIBED:
	Speaker ID=Guest-2
	Text=I'm not hidden right.
TRANSCRIBED:
	Speaker ID=Guest-1
	Text=You aren't. And and I will say, when I started this podcast and when I wrote my book and I started doing this very uncomfortable thing for me, which is tryi

### Save results to file

In [11]:
# Specify the file path
file_path = "transcript_Azure_STT_with_speaker.txt"

# Write the content to the file
with open(file_path, "w") as file:
    file.write("".join(all_results))