# Transcribing Audio Using the Azure Speech Service - Batch APIs and Python

### Overview of Steps
* Get a blob SAS URI that you can pass to the API
* Submit the transcription request
* Wait for the transcription to complete (check status)
* Download completed transcriptions
* Combine files
* Cleanup / Next Steps

### What you'll need: 
* An Azure subscription
* An Azure Speech service instance provisioned (Note: it [needs to be in the standard pricing tier](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription#subscription-key); the free tier doesn't currently work with the Batch API)
* A Speech service API key
* Audio file uploaded to an Azure Blob Storage account. See [supported formats here](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription#supported-formats)

### Manual Step: Generate SAS URI for Audio File in Blob Storage
The Batch API service needs to be able to access the audio files you have in blob storage. One way to do that is to make your blobs publicly accessible. But if your audio is sensitive, a better way is to use a signed URL called a SAS (shared access signature) URI. This lets you grant access for a limited amount of time and from a restricted set of IP addresses. 

To do this from the Azure portal:
1. Navigate to the blob storage account and container where your audio is stored. 
2. Right-click the audio file you'd like to transcribe. 
3. Select "Generate SAS". 
4. In the pane that opens, set the permissions to "Read". The other defaults should be fine for the purposes of this demo. 
5. Click "Generate blob SAS token and URL". 
6. Copy the Blob SAS URL and save it for the upcoming steps.

_Note: if you are transcribing audio at scale or in a production system, you'll likely want to automate this step. {TODO: insert link to helpful documentation}_
   

### Submit your Audio File for Transcription

In [None]:
audio_sas_url = 'your_blob_sas_url'
speech_service_region = 'your_speech_service_region'  # e.g. eastus2
speech_service_key = 'your_speech_service_key'

In [None]:
import requests
import json

speech_batch_url = "https://{}.cris.ai/api/speechtotext/v2.0/transcriptions".format(speech_service_region)

headers = {
    "Ocp-Apim-Subscription-Key": speech_service_key, 
    "Content-Type":"application/json"
}

body = {
  "recordingsUrl": audio_sas_url,
  "models": [],
  "locale": "en-US",
  "name": "FileNameOrSomethingHere",
  "description": "Audio Transcription Submitted from BDL Jupyter Notebook",
  "properties": {
    "ProfanityFilterMode": "Masked",
    "PunctuationMode": "DictatedAndAutomatic"
  }
}

In [None]:
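r = requests.post(url=speech_batch_url, headers=headers, data=json.dumps(body))

# Check the status of the response; 202 means the request was accepted and queued
if r.status_code == 202:
    # The Operation-Location header holds the URL used to check on this job
    submission_url = r.headers['Operation-Location']
    print("Audio submitted for processing.")
    print("Check status of this submission at {}".format(submission_url))
else:
    # Show the status code and response body to help diagnose the failure
    print("There was an error submitting audio for processing: {} {}".format(r.status_code, r.text))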

### Check on Transcription Status
The batch API is not intended to provide an immediate response. Instead, requests are queued and processed over time.

The response header of the submission contains a URL with an ID for the job, which can be used to check the status and get download links for the transcription once it is complete. 
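
If you only kept the status URL and need the job ID on its own (for example, for the cleanup step), it can probably be recovered from the URL itself. This is a minimal sketch that assumes the ID is the last path segment of the `Operation-Location` URL; the helper name `transcription_id_from_location` and the example URL are hypothetical and not part of the service's API.

```python
from urllib.parse import urlparse


def transcription_id_from_location(operation_location):
    """Return the last path segment of the status URL.

    Assumption: the job ID is the final path segment of the
    Operation-Location URL returned when the audio was submitted.
    """
    path = urlparse(operation_location).path
    # Drop any trailing slash, then keep everything after the last "/"
    return path.rstrip("/").rsplit("/", 1)[-1]


# Hypothetical URL, used only to illustrate the parsing
example_url = "https://eastus2.cris.ai/api/speechtotext/v2.0/transcriptions/1234abcd"
print(transcription_id_from_location(example_url))
```

The status response also carries an `id` field, so this is only needed when that response is not at hand.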

In [None]:
r2 = requests.get(url=submission_url, headers = headers)
transcript_info = json.loads(r2.content)
transcript_status = transcript_info['status']
transcript_id = transcript_info['id']

print("Transcription ID# {} Status is {}".format(transcript_id, transcript_status))
if transcript_status == "Succeeded":
    print("Transcripts can be downloaded: ")
    for key, value in transcript_info['resultsUrls'].items():
        print(key + " : " + value)
elif transcript_status == "NotStarted":
    print("Transcription has not started yet. Please check again later.")
elif transcript_status == "Running":
    print("The submitted transcription is running. Check back soon.")
else:
    print("There appears to have been some sort of issue with the transcription. {}".format(r2.content))

Once the transcription status is "Succeeded", you can proceed to the next step. In my experience this usually happens within a few minutes to an hour of submission. 
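
Rather than re-running the status cell by hand, the check can be wrapped in a polling loop. The sketch below is one way to do it, not the only one: `wait_for_transcription`, its parameters, and the default intervals are all hypothetical choices. It takes a function that returns the current status string, so the waiting logic stays separate from the HTTP call.

```python
import time


def wait_for_transcription(get_status, poll_seconds=30, timeout_seconds=3600,
                           sleep=time.sleep):
    """Call get_status() repeatedly until the job is no longer pending.

    Returns the first status that is not "NotStarted" or "Running"
    (for example "Succeeded"). Raises TimeoutError if the job is still
    pending after roughly timeout_seconds of waiting.
    """
    waited = 0
    while True:
        status = get_status()
        if status not in ("NotStarted", "Running"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError("Transcription still pending after {}s".format(waited))
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with the variables defined earlier in this notebook:
# final_status = wait_for_transcription(
#     lambda: requests.get(submission_url, headers=headers).json()["status"]
# )
```

Passing `sleep` as a parameter is only there to make the loop easy to try out without actually waiting.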

### Download Completed Transcription
If you upload stereo audio, each channel will be processed into a separate transcript. Mono audio is processed into a single transcript. 

Each result is a JSON document that lists the utterances in the file along with their offset and duration. Each utterance is given a confidence score. 

You can iterate through the JSON transcript files to prepare a block of text that can be submitted to other services for further analysis. 
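
To make the shape of the result concrete, here is a small sketch that pulls the text, confidence, and timing out of each utterance. The `AudioFileResults`, `SegmentResults`, `NBest`, and `Display` keys are the ones used in this notebook; the `Offset`, `Duration`, and `Confidence` keys, and the assumption that timings are expressed in 100-nanosecond ticks, should be checked against your own output. The `sample` document is made up for illustration and `summarize_segments` is a hypothetical helper.

```python
TICKS_PER_SECOND = 10_000_000  # assumption: timings are in 100-nanosecond ticks


def summarize_segments(transcript):
    """Return one dict per utterance with its text, confidence, and timing."""
    rows = []
    for segment in transcript["AudioFileResults"][0]["SegmentResults"]:
        best = segment["NBest"][0]  # highest-ranked recognition candidate
        rows.append({
            "text": best["Display"],
            "confidence": best.get("Confidence"),  # assumed key name
            "offset_s": segment.get("Offset", 0) / TICKS_PER_SECOND,
            "duration_s": segment.get("Duration", 0) / TICKS_PER_SECOND,
        })
    return rows


# Made-up transcript with the assumed structure
sample = {
    "AudioFileResults": [{
        "SegmentResults": [
            {"Offset": 5_000_000, "Duration": 20_000_000,
             "NBest": [{"Confidence": 0.9, "Display": "Hello there."}]},
            {"Offset": 30_000_000, "Duration": 10_000_000,
             "NBest": [{"Confidence": 0.8, "Display": "How are you?"}]},
        ]
    }]
}

for row in summarize_segments(sample):
    print(row)
```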

In [None]:
# For each channel, download the JSON transcript
for key, value in transcript_info['resultsUrls'].items():
    transcript = json.loads(requests.get(value).content)

    # Then, combine the segment results into a block of text,
    # separating utterances with a space so words don't run together.
    segments = transcript["AudioFileResults"][0]["SegmentResults"]
    full_text = " ".join(t['NBest'][0]['Display'] for t in segments)

    print("Transcript for {}: \r\n {}".format(key, full_text))

### Cleanup
You may wish to clean up the items you've submitted to the service by sending a DELETE request to the endpoint with the ID of the submitted transcription.

In [None]:
r = requests.delete(url=speech_batch_url+"/"+transcript_id, headers=headers)
r.status_code

### Next Steps
Once you've gotten your audio transcript, you can do all kinds of fun things, such as performing various text analytics tasks using the Azure Text Analytics API. 

### References: 
[Batch Transcription (REST) Documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription)

[Custom Speech API Swagger Documentation](https://eastus2.cris.ai/swagger/ui/index#/Custom%20Speech%20transcriptions%3A/CreateTranscription)