# [Eval] Evaluate Trained Custom Speech Models
This sample demonstrates how to evaluate trained Custom Speech models by calling the Speech to text REST API.

> ✨ ***Note*** <br>
> You can test the accuracy of your custom model by creating a test. A test requires a collection of audio files and their corresponding transcriptions. You can compare a custom model's accuracy with a speech to text base model or with another custom model. After you get the test results, compare the word error rate (WER) of each model, which measures how far the speech recognition results are from the human-labeled transcriptions.

## Prerequisites
Configure a Python virtual environment with Python 3.10 or later in Visual Studio Code:
 1. Open the Command Palette (Ctrl+Shift+P).
 1. Search for Python: Create Environment.
 1. Select Venv or Conda and choose where to create the new environment.
 1. Select the Python interpreter version. Use version 3.10 or later.


## Set up the environment

In [1]:
import os
import json
import time
import datetime

import requests
import pandas as pd
import azure.cognitiveservices.speech as speechsdk
from openai import AzureOpenAI
from dotenv import load_dotenv
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, generate_blob_sas, BlobSasPermissions
from tqdm import tqdm
from IPython.display import display, HTML

# Notebook helpers used below (get_latest_base_model, create_dataset, monitor_status, ...)
from utils.common import *

load_dotenv()

SPEECH_KEY = os.getenv("AZURE_AI_SPEECH_API_KEY")
SPEECH_REGION = os.getenv("AZURE_AI_SPEECH_REGION")
CUSTOM_SPEECH_LANG = os.getenv("CUSTOM_SPEECH_LANG")
CUSTOM_SPEECH_LOCALE = os.getenv("CUSTOM_SPEECH_LOCALE")

# Restore the project and custom model IDs stored by the previous notebook
project_id = ""
custom_model_with_plain_id = ""
custom_model_with_acoustic_id = ""
%store -r project_id
%store -r custom_model_with_plain_id
%store -r custom_model_with_acoustic_id

# The variables default to empty strings, so check for empty values (a NameError can never occur here)
if not all([project_id, custom_model_with_plain_id, custom_model_with_acoustic_id]):
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")
    print("[ERROR] Stored IDs not found. Please run the previous notebook again.")
    print("++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++")

print(project_id, custom_model_with_plain_id, custom_model_with_acoustic_id)

352518e0-0bb0-41fa-a57a-c1acb9b49a8a 7897ebf8-059c-442c-8a33-86f4b5842123 b331efb4-915f-41cc-8bb9-f074f39a3fdb
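After `load_dotenv()` has run, a missing or empty setting usually surfaces later as a confusing HTTP error. A quick sanity check on the interpreter version and on the settings this notebook reads with `os.getenv` could look like this; `missing_settings` is a hypothetical helper, not part of `utils.common`.

```python
import os
import sys

# Settings this notebook reads with os.getenv (Speech resource, language, storage account)
REQUIRED_SETTINGS = [
    "AZURE_AI_SPEECH_API_KEY",
    "AZURE_AI_SPEECH_REGION",
    "CUSTOM_SPEECH_LANG",
    "CUSTOM_SPEECH_LOCALE",
    "AZURE_STORAGE_ACCOUNT_NAME",
    "AZURE_STORAGE_ACCOUNT_KEY",
    "AZURE_STORAGE_CONTAINER_NAME",
]


def missing_settings(env=os.environ, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are unset or empty."""
    return [name for name in required if not env.get(name)]


if sys.version_info < (3, 10):
    print(f"[WARN] Python {sys.version.split()[0]} found; 3.10 or later is recommended.")

missing = missing_settings()
if missing:
    print("[WARN] Missing settings in .env:", ", ".join(missing))
```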


## 🧪 Test base and custom speech models
- To learn how to quantitatively measure and improve the accuracy of the base speech to text model or your own custom models, see https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-evaluate-data?pivots=speech-cli#create-a-test

To evaluate the word error rate (WER) of a base model with Azure AI Speech in Speech Studio, follow these steps:

1. Sign in to Speech Studio: go to Azure Speech Studio.
1. Create a test:
   - Navigate to Custom speech and select your project.
   - Go to Test models and select Create new test.
   - Select Evaluate accuracy, then select Next.
   - Choose an audio + human-labeled transcription dataset. If you don't have any datasets, upload them in the Speech datasets menu.
   - Select up to two models to evaluate, then select Next.
   - Enter the test name and description, then select Next.
   - Review the test details and select Save and close.
1. Get the test results: when the test is complete (its status is Succeeded), the results include the WER of each tested model.
1. Evaluate WER: WER is the sum of insertion (I), deletion (D), and substitution (S) errors divided by the total number of words in the reference transcript (N), multiplied by 100 to express it as a percentage: WER = (I + D + S) / N × 100.

For more detailed instructions, see https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-custom-speech-evaluate-data?pivots=rest-api.
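The WER formula above can be illustrated with a word-level Levenshtein alignment that counts substitutions, deletions, and insertions. This is a minimal sketch for understanding the metric, not the Speech service's scoring code; the service may normalize text (casing, punctuation) before scoring, which this sketch does not do. `edit_operations` and `word_error_rate` are hypothetical helper names.

```python
def edit_operations(reference, hypothesis):
    """Count (substitutions, deletions, insertions) in a minimum-cost alignment.

    Classic Levenshtein dynamic program; each cell stores
    (total_cost, substitutions, deletions, insertions).
    """
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    dp = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = (i, 0, i, 0)  # delete every reference token
    for j in range(1, cols):
        dp[0][j] = (j, 0, 0, j)  # insert every hypothesis token
    for i in range(1, rows):
        for j in range(1, cols):
            if reference[i - 1] == hypothesis[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match: no extra cost
                continue
            c, s, d, n = dp[i - 1][j - 1]
            substitute = (c + 1, s + 1, d, n)
            c, s, d, n = dp[i - 1][j]
            delete = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insert = (c + 1, s, d, n + 1)
            dp[i][j] = min(substitute, delete, insert, key=lambda cell: cell[0])
    _, s, d, n = dp[-1][-1]
    return s, d, n


def word_error_rate(reference_text, hypothesis_text):
    """WER in percent: (S + D + I) / N * 100, with N = number of reference words."""
    reference = reference_text.split()
    hypothesis = hypothesis_text.split()
    if not reference:
        raise ValueError("reference transcript is empty")
    s, d, n = edit_operations(reference, hypothesis)
    return (s + d + n) / len(reference) * 100
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` finds one substitution and one deletion over six reference words, about 33.3%. Because insertions are counted too, WER can exceed 100%.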


In [2]:
# Base URL for the Speech Services REST API
base_url = f'https://{SPEECH_REGION}.api.cognitive.microsoft.com/speechtotext'

# Headers for authentication
headers = {
    'Ocp-Apim-Subscription-Key': SPEECH_KEY,
    'Content-Type': 'application/json'
}
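The helpers imported from `utils.common` wrap calls against this base URL; their bodies are not shown here. As a minimal sketch, assuming the v3.2 endpoint `GET {base_url}/v3.2/models/base` accepts a `filter` query parameter (as the filter string passed to `get_latest_base_model` suggests) and returns pages of `values` linked by `@nextLink`, listing base models directly could look like this. `list_base_models` and `model_id_from_self` are hypothetical names, and the `get` parameter exists only so the HTTP call can be replaced in tests.

```python
def list_base_models(base_url, headers, locale, api_version="v3.2", get=None):
    """Return all base models for `locale`, following '@nextLink' pagination."""
    if get is None:
        import requests  # imported lazily so tests can pass their own `get`
        get = requests.get
    url = f"{base_url}/{api_version}/models/base"
    params = {"filter": f"locale eq '{locale}'"}
    models = []
    while url:
        response = get(url, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        page = response.json()
        models.extend(page.get("values", []))
        url = page.get("@nextLink")  # assumed to be absent on the last page
        params = None  # the next link already carries its own query string
    return models


def model_id_from_self(self_url):
    """The model ID is the last path segment of the model's 'self' URL."""
    return self_url.rstrip("/").split("/")[-1]
```

Usage: `models = list_base_models(base_url, headers, CUSTOM_SPEECH_LOCALE)`.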

### Check the base and custom model IDs to evaluate

In [3]:
# Option 1: copy the base model ID from "Train a new model" in Azure Speech Studio.
base_model_id = "8066b5fb-0114-4837-90b6-0c245928a896"  # Vietnamese (vi-VN) base model ID

# Option 2: look up the base models for the locale through the REST API.
# If a suitable model is found, its ID overrides the Option 1 value.
base_model = get_latest_base_model(base_url, headers, f"locale eq '{CUSTOM_SPEECH_LOCALE}' and status eq 'Succeeded'")

# Keep the base models that support 'Language' adaptations and pick the most recently created one
filtered_models = [
    model for model in base_model['values']
    if 'Language' in model.get('properties', {}).get('features', {}).get('supportsAdaptationsWith', [])
]
if filtered_models:
    latest_model = max(filtered_models, key=lambda x: x['createdDateTime'])
    print("Latest model supporting 'Language' adaptations:")
    print(latest_model)
    # The model ID is the last segment of the self link, for example 8066b5fb-0114-4837-90b6-0c245928a896 in
    # 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/8066b5fb-0114-4837-90b6-0c245928a896'
    base_model_id = latest_model['self'].split('/')[-1]
else:
    print("No models found that support 'Language' adaptations; keeping the Option 1 base model ID.")

Latest model supporting 'Language' adaptations:
{'links': {'manifest': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/8066b5fb-0114-4837-90b6-0c245928a896/manifest'}, 'properties': {'deprecationDates': {'adaptationDateTime': '2025-04-15T00:00:00Z', 'transcriptionDateTime': '2025-04-15T00:00:00Z'}, 'features': {'supportsAdaptationsWith': ['Language'], 'supportsTranscriptions': True, 'supportsEndpoints': True, 'supportsTranscriptionsOnSpeechContainers': False, 'supportedOutputFormats': ['Display', 'Lexical']}, 'chargeForAdaptation': False}, 'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/8066b5fb-0114-4837-90b6-0c245928a896', 'displayName': '20230111', 'description': 'vi-VN base model', 'locale': 'vi-VN', 'createdDateTime': '2023-01-31T12:16:46Z', 'lastActionDateTime': '2023-01-31T13:08:53Z', 'status': 'Succeeded'}


In [4]:
# Compare with the model IDs shown under "Train a new model" in Azure Speech Studio.
# Base model IDs differ by language (locale).
print("Latest base model id:", base_model_id)
print("custom_model_with_plain_id: ", custom_model_with_plain_id)
print("custom_model_with_acoustic_id: ", custom_model_with_acoustic_id)

Latest base model id: 8066b5fb-0114-4837-90b6-0c245928a896
custom_model_with_plain_id:  7897ebf8-059c-442c-8a33-86f4b5842123
custom_model_with_acoustic_id:  b331efb4-915f-41cc-8bb9-f074f39a3fdb


### Upload zip files to a storage account and generate content URLs

In [5]:
data_folder = "eval_dataset"
account_name = os.getenv("AZURE_STORAGE_ACCOUNT_NAME")
account_key = os.getenv("AZURE_STORAGE_ACCOUNT_KEY")
container_name = os.getenv("AZURE_STORAGE_CONTAINER_NAME")

uploaded_files, url = upload_dataset_to_storage(data_folder, container_name, account_name, account_key)

Files uploaded successfully.
uploaded_files: ['eval_vi-VN_20241217132834', 'Call2-merge-err2', 'Call2-merge-err3', 'Call2-merge-err1']
url: {'eval_vi-VN_20241217132834': 'https://aoaihub1storageaccount.blob.core.windows.net/stt-container/eval_vi-VN_20241217132834.zip?se=2025-01-15T13%3A52%3A24Z&sp=r&sv=2025-01-05&sr=b&sig=ujIc/aGN5eRiOf094DmrK6EZw%2BZEtRm0Z2yQm0h3QHg%3D', 'Call2-merge-err2': 'https://aoaihub1storageaccount.blob.core.windows.net/stt-container/Call2-merge-err2.zip?se=2025-01-15T13%3A52%3A24Z&sp=r&sv=2025-01-05&sr=b&sig=KeQoR8JYhiZzZlMECFwzctm2wwXmEb1qiFtcVLhThNQ%3D', 'Call2-merge-err3': 'https://aoaihub1storageaccount.blob.core.windows.net/stt-container/Call2-merge-err3.zip?se=2025-01-15T13%3A52%3A24Z&sp=r&sv=2025-01-05&sr=b&sig=StZw6jXFiknc6OXO8koayVd4UP/YT0ITugVriG/5mzM%3D', 'Call2-merge-err1': 'https://aoaihub1storageaccount.blob.core.windows.net/stt-container/Call2-merge-err1.zip?se=2025-01-15T13%3A52%3A24Z&sp=r&sv=2025-01-05&sr=b&sig=BxHzNpcPoopE7JIBx4vHneUOnOwcsBWE

In [6]:
uploaded_files

['eval_vi-VN_20241217132834',
 'Call2-merge-err2',
 'Call2-merge-err3',
 'Call2-merge-err1']
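`upload_dataset_to_storage` (from `utils.common`) uploads one .zip per dataset; how it builds those archives is not shown. An audio + human-labeled transcription dataset is a .zip of audio files plus a transcription text file with one `<wav file name><TAB><transcript>` line per file. The sketch below packages a local folder that way and checks that every listed audio file exists. It assumes the transcription file is named `trans.txt` at the folder root; `package_audio_dataset` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def package_audio_dataset(folder, zip_path):
    """Zip trans.txt and the .wav files it lists; return the listed file names."""
    folder = Path(folder)
    transcript_file = folder / "trans.txt"
    listed = []
    # utf-8-sig transparently drops a UTF-8 byte order mark if one is present
    lines = transcript_file.read_text(encoding="utf-8-sig").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        name, sep, text = line.partition("\t")
        if not sep or not text.strip():
            raise ValueError(f"trans.txt line {line_no} is not '<file>\\t<transcript>'")
        if not (folder / name).is_file():
            raise FileNotFoundError(f"trans.txt line {line_no} references missing file {name}")
        listed.append(name)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(transcript_file, arcname="trans.txt")
        for name in listed:
            archive.write(folder / name, arcname=name)
    return listed
```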

### Create datasets for evaluation

In [7]:
kind = "Acoustic"  # audio + human-labeled transcription dataset
description = f"[eval] Dataset for evaluating the {CUSTOM_SPEECH_LANG} base model"
dataset_ids = {}

for display_name in uploaded_files:
    dataset_ids[display_name] = create_dataset(base_url, headers, project_id, url[display_name], kind, display_name, description, CUSTOM_SPEECH_LOCALE)

Dataset created with ID: 867e28d4-2ae5-4d47-adce-fe325f829061
Dataset created with ID: 41a0a991-e8ee-4f6e-bbad-f4cee3855fcf
Dataset created with ID: c683d4d6-8021-413e-ba7b-79004e9587d8
Dataset created with ID: 8a7eb804-25a1-486b-8255-d32f94069bc1
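The body of `create_dataset` in `utils.common` is not shown. A sketch of an equivalent request, assuming the v3.2 `POST {base_url}/v3.2/datasets` endpoint accepts `kind` and `contentUrl` together with the field names that appear in the evaluation request printed later in this notebook (`displayName`, `description`, `locale`, `project.self`). `build_dataset_payload` and `create_dataset_sketch` are hypothetical names; `post` is injectable only for testing.

```python
def build_dataset_payload(base_url, project_id, content_url, display_name,
                          description, locale, kind="Acoustic", api_version="v3.2"):
    """JSON body for creating a dataset from a (SAS) content URL."""
    return {
        "kind": kind,
        "displayName": display_name,
        "description": description,
        "locale": locale,
        "contentUrl": content_url,
        "project": {"self": f"{base_url}/{api_version}/projects/{project_id}"},
    }


def create_dataset_sketch(base_url, headers, payload, api_version="v3.2", post=None):
    """POST the payload and return the new dataset ID (last segment of its 'self' URL)."""
    if post is None:
        import requests  # imported lazily so the payload builder works without requests
        post = requests.post
    response = post(f"{base_url}/{api_version}/datasets", headers=headers, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()["self"].rstrip("/").split("/")[-1]
```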


In [8]:
for display_name in uploaded_files:
    monitor_status(base_url, headers, get_dataset_status, dataset_ids[display_name])

Running Status:   0%|          | 0/3 [00:00<?, ?step/s]

Current Status: Running


Running Status:  67%|██████▋   | 2/3 [00:10<00:05,  5.04s/step]

Current Status: Running


Running Status: 100%|██████████| 3/3 [00:20<00:00,  6.72s/step]


Operation Completed


Running Status: 100%|██████████| 3/3 [00:00<00:00, 83.71step/s]


Operation Completed


Running Status: 100%|██████████| 3/3 [00:00<00:00, 82.83step/s]


Operation Completed


Running Status: 100%|██████████| 3/3 [00:00<00:00, 92.86step/s]

Operation Completed
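`monitor_status` (from `utils.common`) polls until each operation finishes. The same idea as a self-contained sketch: call a status function until it returns a terminal state, sleeping between calls and giving up after a timeout. It assumes the terminal states are `Succeeded` and `Failed` (the intermediate `NotStarted` and `Running` states appear in this notebook's output); `wait_for_terminal_status` is a hypothetical name, and `sleep`/`clock` are injectable only to make it testable.

```python
import time

TERMINAL_STATES = {"Succeeded", "Failed"}


def wait_for_terminal_status(get_status, interval_s=10.0, timeout_s=3600.0,
                             sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal state; raise on timeout."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

With the REST API, the status function could be `lambda: requests.get(f"{base_url}/v3.2/datasets/{dataset_id}", headers=headers, timeout=30).json()["status"]`, assuming the dataset resource exposes a `status` field as the base model listing above does.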





### Test the accuracy of the trained Custom Speech model by creating evaluations (tests)

In [9]:
description = f"[{CUSTOM_SPEECH_LOCALE}] Evaluation of the {CUSTOM_SPEECH_LANG} base and custom model"
evaluation_ids={}
for display_name in uploaded_files:
    evaluation_ids[display_name] = create_evaluation(base_url, headers, project_id, dataset_ids[display_name], base_model_id, custom_model_with_acoustic_id, f'vi_eval_base_vs_custom_{display_name}', description, CUSTOM_SPEECH_LOCALE)

https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/evaluations
352518e0-0bb0-41fa-a57a-c1acb9b49a8a 867e28d4-2ae5-4d47-adce-fe325f829061 8066b5fb-0114-4837-90b6-0c245928a896 b331efb4-915f-41cc-8bb9-f074f39a3fdb
{'model1': {'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/models/8066b5fb-0114-4837-90b6-0c245928a896'}, 'model2': {'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/models/b331efb4-915f-41cc-8bb9-f074f39a3fdb'}, 'dataset': {'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/datasets/867e28d4-2ae5-4d47-adce-fe325f829061'}, 'project': {'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/projects/352518e0-0bb0-41fa-a57a-c1acb9b49a8a'}, 'displayName': 'vi_eval_base_vs_custom_eval_vi-VN_20241217132834', 'description': '[vi-VN] Evaluation of the Vietnamese base and custom model', 'locale': 'vi-VN'}
Evaluation job created with ID: 91e05bf6-f0d4-4243-8aa7-3e336a
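The request printed above shows the JSON body that `create_evaluation` sends to `.../v3.2/evaluations`. A sketch that builds the same body: the request above references the base model as `.../models/{id}`, while the evaluation details printed later in this notebook report it under `.../models/base/{id}`, so this sketch uses the `models/base` path for the base model. That choice is an assumption about what the service expects. `build_evaluation_payload` is a hypothetical name.

```python
def build_evaluation_payload(base_url, project_id, dataset_id, base_model_id,
                             custom_model_id, display_name, description, locale,
                             api_version="v3.2"):
    """JSON body comparing a base model (model1) with a custom model (model2)."""
    root = f"{base_url}/{api_version}"
    return {
        "model1": {"self": f"{root}/models/base/{base_model_id}"},  # assumed base model path
        "model2": {"self": f"{root}/models/{custom_model_id}"},
        "dataset": {"self": f"{root}/datasets/{dataset_id}"},
        "project": {"self": f"{root}/projects/{project_id}"},
        "displayName": display_name,
        "description": description,
        "locale": locale,
    }
```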

In [10]:
for display_name in uploaded_files:
    monitor_status(base_url, headers, get_evaluation_status, evaluation_ids[display_name])

Running Status:   0%|          | 0/3 [00:00<?, ?step/s]

Current Status: NotStarted
Current Status: NotStarted
Current Status: NotStarted
Current Status: NotStarted
Current Status: NotStarted
Current Status: NotStarted


Running Status:  67%|██████▋   | 2/3 [01:00<00:30, 30.19s/step]

Current Status: Running


Running Status: 100%|██████████| 3/3 [01:10<00:00, 23.49s/step]


Operation Completed


Running Status: 100%|██████████| 3/3 [00:00<00:00, 90.65step/s]


Operation Completed


Running Status:   0%|          | 0/3 [00:00<?, ?step/s]

Current Status: Running


Running Status: 100%|██████████| 3/3 [00:10<00:00,  3.36s/step]


Operation Completed


Running Status: 100%|██████████| 3/3 [00:00<00:00, 41.29step/s]

Operation Completed





## 🧪 Print evaluation results with WER

In [11]:
# Collect the WER of both models for each evaluation dataset
wer_results = []
eval_title = "Evaluation results for base model and custom model: "
for display_name in uploaded_files:
    eval_info = get_evaluation_results(base_url, headers, evaluation_ids[display_name])
    print(eval_info)
    eval_title += display_name + " "
    wer_results.append({
        'Dataset': display_name,
        'WER_base_model': eval_info['properties']['wordErrorRate1'],    # model1 = base model
        'WER_custom_model': eval_info['properties']['wordErrorRate2'],  # model2 = custom model
    })

# Create a DataFrame to display the results
wer_df = pd.DataFrame(wer_results)
print(eval_title)
print(wer_df)

{'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/evaluations/91e05bf6-f0d4-4243-8aa7-3e336a0f3483', 'model1': {'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/models/base/8066b5fb-0114-4837-90b6-0c245928a896'}, 'model2': {'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/models/b331efb4-915f-41cc-8bb9-f074f39a3fdb'}, 'dataset': {'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/datasets/867e28d4-2ae5-4d47-adce-fe325f829061'}, 'transcription2': {'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions/7e30f88c-92db-4e19-be8d-b1fab52bc3d8'}, 'transcription1': {'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions/4a97025a-9f75-4cb7-9ce5-79de50add8a7'}, 'project': {'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.2/projects/352518e0-0bb0-41fa-a57a-c1acb9b49a8a'}, 'links': {'files':
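To summarize how much the custom model improves on the base model, a relative WER reduction can be added to each row of `wer_results`. A minimal sketch; `with_relative_reduction` and the `WER_reduction_pct` column are hypothetical additions, not fields returned by the service.

```python
def with_relative_reduction(rows):
    """Return copies of the rows with 'WER_reduction_pct' = (base - custom) / base * 100."""
    result = []
    for row in rows:
        base, custom = row["WER_base_model"], row["WER_custom_model"]
        # A base WER of 0 leaves nothing to reduce, so the ratio is undefined.
        reduction = (base - custom) / base * 100 if base else None
        result.append({**row, "WER_reduction_pct": reduction})
    return result
```

Usage: `wer_df = pd.DataFrame(with_relative_reduction(wer_results))`. A negative value means the custom model made more errors than the base model on that dataset.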

In [12]:
# Create a Markdown file with a table of the scoring results for each evaluation
md_table_scoring_result(base_url, headers, evaluation_ids, uploaded_files)

Evaluation eval_vi-VN_20241217132834 Scoring results
Evaluation Call2-merge-err2 Scoring results
Evaluation Call2-merge-err3 Scoring results
Evaluation Call2-merge-err1 Scoring results
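`md_table_scoring_result` comes from `utils.common`, and its exact output format is not shown here. If you only need the per-dataset WER summary from `wer_results` as Markdown, a small stand-in could look like this; `wer_markdown_table` is a hypothetical name.

```python
def wer_markdown_table(rows, columns=("Dataset", "WER_base_model", "WER_custom_model")):
    """Render a list of dicts as a Markdown table (floats shown with 2 decimals)."""
    def cell(value):
        text = f"{value:.2f}" if isinstance(value, float) else str(value)
        return text.replace("|", "\\|")  # an unescaped '|' would split the cell

    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(row.get(col, "")) for col in columns) + " |")
    return "\n".join(lines)
```

Usage: `Path("wer_summary.md").write_text(wer_markdown_table(wer_results), encoding="utf-8")` after `from pathlib import Path`.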


## 🧪 Calculate WER/CER from the evaluation result
> ❗️ To calculate the word error rate (WER) and character error rate (CER) from the evaluation result, download the result files from Azure AI Foundry or Speech Studio, then compare the Human labeled transcription (normalized) with the Machine recognition result > txt_lexical.
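Scores computed locally depend on how both transcripts are normalized. In the results below, differences such as "Contoso" vs "contoso" and "mình, có" vs "mình. Có" count as errors. A sketch of a normalization step you could apply to both transcripts before computing WER/CER; `normalize_transcript` is a hypothetical helper, and whether `utils.wercer_tool` normalizes this way is not shown. NFC matters for Vietnamese because the same accented letter can be stored as one code point or as a base letter plus combining marks, which changes character-level comparisons.

```python
import re
import unicodedata


def normalize_transcript(text):
    """Lowercase, NFC-normalize, strip punctuation and a BOM, collapse whitespace."""
    text = text.replace("\ufeff", "")          # drop a UTF-8 byte order mark
    text = unicodedata.normalize("NFC", text)  # one code point per accented letter
    text = text.lower()
    # Replace punctuation characters (Unicode categories P*) with spaces.
    text = "".join(" " if unicodedata.category(ch).startswith("P") else ch for ch in text)
    return re.sub(r"\s+", " ", text).strip()
```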

In [None]:
from utils.wercer_tool import evaluate_stt
from utils.wercer_tool import OriginFileType

origin_path = "wer_origin"
stt_folder = "wer_result"
output_csv_path = "stt_evaluation_results.csv"
output_html_path = "stt_evaluation_results.html"

evaluation_df = evaluate_stt(
    # Use OriginFileType.MULTI_LINE when the original and STT files contain multiple lines
    OriginFileType.MULTI_LINE,
    origin_path=origin_path,
    stt_folder=stt_folder,
    output_csv=output_csv_path,
    output_html=output_html_path
)
evaluation_df

|   | wav_name | original_transcript | stt_transcript | WER | CER | highlighted_original | highlighted_stt |
|---|---|---|---|---|---|---|---|
| 0 | vi_Call6_preds | để gặp trực tiếp nhân viên hỗ trợ mời quý khá... | để gặp trực tiếp nhân viên hỗ trợ mời quý khá... | 0.190476 | 0.055556 | `<span style='background-color:yellow'>để</spa...` | `<span style='background-color:yellow'>để</spa...` |
| 1 | vi_Call6_preds | dạ contoso số năm không một ba xin nghe ạ rồi em ơi... | dạ contoso số năm không một ba xin nghe ạ tụi em ơi... | 0.361446 | 0.170149 | `<span style='background-color:yellow'>dạ</span...` | `<span style='background-color:yellow'>dạ</span...` |


In [6]:
from utils.wercer_tool import evaluate_stt
from utils.wercer_tool import OriginFileType

origin_path = "wer_origin_vi/trans_original.txt"
stt_folder = "wer_result_vi"
output_csv_path = "stt_evaluation_results.csv"
output_html_path = "stt_evaluation_results.html"

evaluation_df = evaluate_stt(
    # Use OriginFileType.SINGLE_LINE_TRAINING when the original transcript is a single
    # training-style file with one "<wav file name><TAB><transcript>" entry per line, for example:
    # 1_vi-VN_20241217132830.wav	Chào bộ phận hỗ trợ khách hàng của Contoso Electronics, tôi muốn hỏi về sản phẩm mới nhất của công ty.
    # 10_vi-VN_20241217132833.wav	Tôi muốn biết cách liên hệ với bộ phận hỗ trợ khách hàng của Contoso Electronics, có thể bạn hướng dẫn tôi cách liên hệ không?
    # 2_vi-VN_20241217132830.wav	Tôi đang gặp vấn đề với máy tính xách tay của mình, có thể bạn có thể giúp tôi khôi phục dữ liệu không?
    OriginFileType.SINGLE_LINE_TRAINING,
    origin_path=origin_path,
    stt_folder=stt_folder,
    output_csv=output_csv_path,
    output_html=output_html_path
)
evaluation_df

|   | wav_name | original_transcript | stt_transcript | WER | CER | highlighted_original | highlighted_stt |
|---|---|---|---|---|---|---|---|
| 0 | 1_vi-VN_20241217132830.wav | Chào bộ phận hỗ trợ khách hàng của Contoso Ele... | Chào bộ phận hỗ trợ khách hàng của contoso ele... | 0.095238 | 0.019608 | `<span style='background-color:yellow'>Chào</sp...` | `<span style='background-color:yellow'>Chào</sp...` |
| 1 | 10_vi-VN_20241217132833.wav | Tôi muốn biết cách liên hệ với bộ phận hỗ trợ ... | Tôi muốn biết cách liên hệ với bộ phận hỗ trợ ... | 0.076923 | 0.02381 | `<span style='background-color:yellow'>Tôi</spa...` | `<span style='background-color:yellow'>Tôi</spa...` |
| 2 | 2_vi-VN_20241217132830.wav | Tôi đang gặp vấn đề với máy tính xách tay của ... | đang gặp vấn đề với máy tính xách tay của mình... | 0.125 | 0.058252 | `<span style='background-color:red'>Tôi</span> ...` | `<span style='background-color:yellow'>đang</sp...` |
| 3 | 3_vi-VN_20241217132830.wav | Tôi muốn đổi sản phẩm mới nhất của Contoso Ele... | Tôi muốn đổi sản phẩm mới nhất của contoso ele... | 0.090909 | 0.027273 | `<span style='background-color:yellow'>Tôi</spa...` | `<span style='background-color:yellow'>Tôi</spa...` |
| 4 | 4_vi-VN_20241217132831.wav | Tôi muốn hủy đơn hàng của mình, có thể bạn hướ... | Tôi muốn hủy đơn hàng của mình. Có thể bạn hướ... | 0.111111 | 0.024691 | `<span style='background-color:yellow'>Tôi</spa...` | `<span style='background-color:yellow'>Tôi</spa...` |
| 5 | 5_vi-VN_20241217132831.wav | Tôi muốn mua thêm phụ kiện cho máy tính xách t... | Tôi muốn mua thêm phụ kiện cho máy tính xách t... | 0.083333 | 0.018519 | `<span style='background-color:yellow'>Tôi</spa...` | `<span style='background-color:yellow'>Tôi</spa...` |
| 6 | 6_vi-VN_20241217132831.wav | Tôi muốn biết thời gian giao hàng của sản phẩm... | Tôi muốn biết thời gian giao hàng của sản phẩm... | 0.068966 | 0.061644 | `<span style='background-color:yellow'>Tôi</spa...` | `<span style='background-color:yellow'>Tôi</spa...` |
| 7 | 7_vi-VN_20241217132831.wav | Tôi muốn biết chính sách bảo hành của sản phẩm... | Tôi muốn biết chính sách bảo hành của sản phẩm... | 0.068966 | 0.020548 | `<span style='background-color:yellow'>Tôi</spa...` | `<span style='background-color:yellow'>Tôi</spa...` |
| 8 | 8_vi-VN_20241217132832.wav | Tôi muốn biết cách sử dụng sản phẩm mới nhất c... | Tôi muốn biết cách sử dụng sản phẩm mới nhất c... | 0.08 | 0.02439 | `<span style='background-color:yellow'>Tôi</spa...` | `<span style='background-color:yellow'>Tôi</spa...` |
| 9 | 9_vi-VN_20241217132832.wav | Tôi muốn biết cách truy cập vào tài khoản của ... | Tôi muốn biết cách truy cập vào tài khoản của ... | 0.103448 | 0.027211 | `<span style='background-color:yellow'>Tôi</spa...` | `<span style='background-color:yellow'>Tôi</spa...` |
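The table lists WER per file as a fraction (for example, 0.095238 for the first file). Averaging those values weights a short utterance the same as a long one, whereas the WER formula divides total errors by total reference words. The sketch below pools per-file results by weighting each file's WER by its reference word count. It assumes you can recover each file's reference word count (for example, by splitting the original transcript on whitespace), which may differ from the count `utils.wercer_tool` uses internally; `corpus_wer` and `mean_wer` are hypothetical helpers.

```python
def corpus_wer(per_file):
    """Pool per-file results: sum(WER_i * N_i) / sum(N_i).

    `per_file` holds (wer, reference_word_count) pairs; WER_i * N_i recovers
    the number of errors in file i.
    """
    total_words = sum(n for _, n in per_file)
    if total_words == 0:
        raise ValueError("no reference words")
    return sum(wer * n for wer, n in per_file) / total_words


def mean_wer(per_file):
    """Unweighted average: every file counts equally, whatever its length."""
    return sum(wer for wer, _ in per_file) / len(per_file)
```

Usage, assuming `evaluation_df` holds the full (untruncated) transcripts: `per_file = list(zip(evaluation_df["WER"], evaluation_df["original_transcript"].str.split().str.len()))`.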
