Merge pull request #6910 from mahalaxminanya/my-update
My update
v-stsavell committed Jul 7, 2022
2 parents eb05b11 + 9a65dc8 commit 6d0a6e8
Showing 5 changed files with 2,333 additions and 453 deletions.
Binary file modified docs/browse/thumbs/speech-services.png
34 changes: 21 additions & 13 deletions docs/solution-ideas/articles/speech-services-content.md
@@ -1,41 +1,49 @@
[!INCLUDE [header_file](../../../includes/sol-idea-header.md)]

With Speech services, it's easy to transcribe every call. Index the transcription for [full-text search](/azure/search/search-what-is-azure-search), or apply [Text Analytics](/azure/cognitive-services/Text-Analytics) to detect sentiment, language, and key phrases for insights. If your call center recordings involve specialized terminology, such as product names or IT jargon, create a custom [language model](/azure/cognitive-services/speech-service/how-to-customize-language-model) to teach Speech Services the vocabulary. A custom [acoustic model](/azure/cognitive-services/speech-service/how-to-customize-acoustic-models) helps Speech Services understand speakers even with background noise or poor phone connections.
Speech Services is part of the Cognitive Services family of products and uses AI to process audio files. One of the most common use cases is transcribing an audio file into a text file with the Speech-to-text API. Other Cognitive Services can amplify the value of that use case by processing the transcriptions and mining the data with APIs such as Text Analytics, Sentiment Analysis, Cognitive Search, and Translation.

For more information, read how [batch transcription](/azure/cognitive-services/speech-service/batch-transcription) works with Speech Services.
Many organizations support their customers over the phone, including call centers, medical response units, and emergency services units.

Traditionally, a call center relies on agents who talk over the phone with customers. The agents need to handle two jobs at the same time: listening and speaking over the phone, while also taking notes for later analysis and documentation of a particular case. This makes the job harder for the agent and less efficient. It can even hurt the call center's most common KPIs, such as AHT (average handling time) and FCR (first call resolution).

## Potential use cases

This solution can be for organizations that record conversations (for training or quality assurance) that also want a written transcript. It's ideal for the education, retail, and nonprofit industries.
This solution suits organizations that record conversations (for training or quality assurance) and also want a written transcript. It's ideal for the education, retail, healthcare, and nonprofit industries.

## Architecture

![Architecture diagram shows recorded calls to Azure Trans Queue to Speech Endpoint to Transcription Result Queue to Transcript Blob and Insights.](../media/speech-services.png)
![Architecture diagram shows recorded calls to Azure trans queue to speech endpoint to transcription result queue to transcript blob and insights.](../media/speech-services.png)
*Download an [SVG](../media/speech-services.svg) of this architecture.*

### Dataflow

1. Adapt a model for your domain and deploy that model.
1. Upload your recordings to a blob container.
1. Create a POST request to batch transcription.
1. The Speech service schedules the transcription job.
1. Stereo files are split into two channels.
1. Mono files undergo diarization to distinguish between speakers.
1. Download the transcription using the transcription ID.
1. The first step is data collection. Calls in a call center are usually recorded. Store those recordings in their raw state (.wav or .mp3 file formats) in Blob Storage.
1. A function app then issues a GET request to a Speech service endpoint to get the recordings transcribed. You can also use Queue Storage to partition the files before issuing the request. For customization, you can use Custom Speech to build a custom model and deploy the model to an endpoint to get the recordings transcribed.
1. The transcription output is a .txt file, which can be moved to Blob Storage by using a POST request to the Speech service endpoint.
1. Queue Storage works with individual files before sending them to their final destination. A call transcripts blob stores the call transcripts in .txt file format, and a transcription insights blob stores the insights that Language services generate by detecting sentiment, language, and key phrases.
1. Finally, the results can be visualized in either a web app or a Power BI dashboard.
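
As a rough illustration of the transcription step, the following Python sketch builds (but doesn't send) the two HTTP requests that a function app might issue. It follows the batch transcription flow from the earlier version of this list: a POST request creates the job, and the results are downloaded later by transcription ID. The `v3.0` route, the `contentUrls` and `diarizationEnabled` fields, and the helper names are assumptions based on the Speech-to-text REST API rather than on this article, and the region, key, and blob URL are placeholders. Check the current API reference before relying on any of them.

```python
import json
import urllib.request

# Assumption: v3.0 of the Speech-to-text REST API; newer versions may differ.
API_VERSION = "v3.0"


def transcriptions_url(region: str) -> str:
    """Build the batch transcription collection URL for a region (assumed route)."""
    return (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/speechtotext/{API_VERSION}/transcriptions"
    )


def build_create_request(region, key, content_urls, locale="en-US",
                         display_name="call-center-batch"):
    """Prepare the POST request that asks the service to schedule a job."""
    body = {
        "contentUrls": list(content_urls),  # SAS URLs of recordings in Blob Storage
        "locale": locale,
        "displayName": display_name,
        # Assumed property name: asks for speaker separation on mono audio.
        "properties": {"diarizationEnabled": True},
    }
    return urllib.request.Request(
        transcriptions_url(region),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_files_request(transcription_url, key):
    """Prepare the GET request that lists result files for a finished job."""
    return urllib.request.Request(
        f"{transcription_url}/files",
        headers={"Ocp-Apim-Subscription-Key": key},
        method="GET",
    )
```

To send a prepared request, a function app could pass it to `urllib.request.urlopen`. Building the requests separately keeps the sketch testable without a Speech resource.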


### Components

* [Azure Blob Storage](https://azure.microsoft.com/services/storage/blobs)
* [Speech service](https://azure.microsoft.com/en-us/services/cognitive-services/speech-services)
* [Speech service](https://azure.microsoft.com/services/cognitive-services/speech-services)
* [Cognitive Service for Language](https://azure.microsoft.com/services/cognitive-services/language-service)
* [Azure Functions](https://azure.microsoft.com/services/functions)
* [Azure Queue Storage](https://azure.microsoft.com/services/storage/queues)

## Next steps

To learn more about these services, see the following articles:

* [Azure Blob Storage](/azure/storage/blobs)
* [Speech service](/azure/cognitive-services/Speech-Service)
* [Train a Custom Speech model](/azure/cognitive-services/speech-service/how-to-custom-speech-train-model)
* [Cognitive Service for Language](/azure/cognitive-services/language-service/overview)
* [Azure Functions](/azure/azure-functions/functions-reference)
* [Azure Queue Storage](/azure/storage/queues/storage-queues-introduction)

## Related resources
## Related resource

* [Artificial intelligence (AI) - Architectural overview](../../data-guide/big-data/ai-overview.md)
* [Speech-to-text conversion](../../reference-architectures/ai/speech-to-text-transcription-pipeline.yml)
2 changes: 1 addition & 1 deletion docs/solution-ideas/articles/speech-services.yml
@@ -5,7 +5,7 @@ metadata:
description: Use Speech services to transcribe calls and then run full-text searches, detect sentiment and language, and create custom language and acoustic models.
author: edprice-msft
ms.author: edprice
ms.date: 06/14/2022
ms.date: 07/07/2022
ms.topic: conceptual
ms.service: architecture-center
ms.subservice: solution-idea
Binary file modified docs/solution-ideas/media/speech-services.png