title | titleSuffix | description | author | manager | ms.service | ms.topic | ms.date | ms.author | zone_pivot_groups |
---|---|---|---|---|---|---|---|---|---|
Upload training and testing datasets for custom speech - Speech service | Azure AI services | Learn about how to upload data to test or train a custom speech model. | eric-urban | nitinme | azure-ai-speech | how-to | 4/15/2024 | eur | speech-studio-cli-rest |
You need audio or text data for testing the accuracy of speech recognition or training your custom models. For information about the data types supported for testing or training your model, see Training and testing datasets.
[!TIP] You can also use the online transcription editor to create and refine labeled audio datasets.
::: zone pivot="speech-studio"
To upload your own datasets in Speech Studio, follow these steps:
- Sign in to the Speech Studio.
- Select Custom speech > Your project name > Speech datasets > Upload data.
- Select the Training data or Testing data tab.
- Select a dataset type, and then select Next.
- Specify the dataset location, and then select Next. You can choose a local file or enter a remote location such as an Azure Blob URL. If you select a remote location and you don't use the trusted Azure services security mechanism, the remote location should be a URL that can be retrieved with a simple anonymous GET request, for example, a SAS URL or a publicly accessible URL. URLs that require extra authorization or expect user interaction aren't supported.

  [!NOTE] If you use an Azure Blob URL, you can ensure maximum security of your dataset files by using the trusted Azure services security mechanism. You use the same techniques as for Batch transcription and plain Storage Account URLs for your dataset files. See details here.
- Enter the dataset name and description, and then select Next.
- Review your settings, and then select Save and close.
After your dataset is uploaded, go to the Train custom models page to train a custom model.
::: zone-end
::: zone pivot="speech-cli"
[!INCLUDE Map CLI and API kind to Speech Studio options]
To create a dataset and connect it to an existing project, use the `spx csr dataset create` command. Construct the request parameters according to the following instructions:
- Set the `project` parameter to the ID of an existing project. This parameter is recommended so that you can also view and manage the dataset in Speech Studio. You can run the `spx csr project list` command to get available projects.
- Set the required `kind` parameter. The possible values for dataset kind are: Language, Acoustic, Pronunciation, and AudioFiles.
- Set the required `contentUrl` parameter. This parameter is the location of the dataset. If you don't use the trusted Azure services security mechanism (see the following note), the `contentUrl` parameter should be a URL that can be retrieved with a simple anonymous GET request, for example, a SAS URL or a publicly accessible URL. URLs that require extra authorization or expect user interaction aren't supported.

  [!NOTE] If you use an Azure Blob URL, you can ensure maximum security of your dataset files by using the trusted Azure services security mechanism. You use the same techniques as for Batch transcription and plain Storage Account URLs for your dataset files. See details here.
- Set the required `language` parameter. The dataset locale must match the locale of the project. The locale can't be changed later. The Speech CLI `language` parameter corresponds to the `locale` property in the JSON request and response.
- Set the required `name` parameter. This parameter is the name that is displayed in the Speech Studio. The Speech CLI `name` parameter corresponds to the `displayName` property in the JSON request and response.
Here's an example Speech CLI command that creates a dataset and connects it to an existing project:
spx csr dataset create --api-version v3.1 --kind "Acoustic" --name "My Acoustic Dataset" --description "My Acoustic Dataset Description" --project YourProjectId --content YourContentUrl --language "en-US"
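When the command is assembled by a script, building it as an argument list avoids quoting problems with names that contain spaces. The following Python sketch is one possible approach, not part of the Speech CLI: the helper name `build_dataset_create_command` is hypothetical, and it uses only the flags that appear in the example command above.

```python
import shlex

# Dataset kinds listed in this article.
ALLOWED_KINDS = {"Language", "Acoustic", "Pronunciation", "AudioFiles"}


def build_dataset_create_command(kind, name, content_url, language,
                                 project_id=None, description=None,
                                 api_version="v3.1"):
    """Return an argument list for `spx csr dataset create` (hypothetical helper)."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"Unsupported dataset kind: {kind!r}")
    args = ["spx", "csr", "dataset", "create",
            "--api-version", api_version,
            "--kind", kind,
            "--name", name]
    # Description and project are optional, so add them only when provided.
    if description is not None:
        args += ["--description", description]
    if project_id is not None:
        args += ["--project", project_id]
    args += ["--content", content_url, "--language", language]
    return args


command = build_dataset_create_command(
    kind="Acoustic",
    name="My Acoustic Dataset",
    description="My Acoustic Dataset Description",
    project_id="YourProjectId",
    content_url="YourContentUrl",
    language="en-US",
)
# Print a shell-quoted form of the command for review.
print(shlex.join(command))
```

To run the command rather than print it, the list can be passed to `subprocess.run(command, check=True)`, assuming the Speech CLI is installed and configured.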
You should receive a response body in the following format:
{
  "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c",
  "kind": "Acoustic",
  "contentUrl": "https://contoso.com/mydatasetlocation",
  "links": {
    "files": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c/files"
  },
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/projects/70ccbffc-cafb-4301-aa9f-ef658559d96e"
  },
  "properties": {
    "acceptedLineCount": 0,
    "rejectedLineCount": 0
  },
  "lastActionDateTime": "2022-05-20T14:07:11Z",
  "status": "NotStarted",
  "createdDateTime": "2022-05-20T14:07:11Z",
  "locale": "en-US",
  "displayName": "My Acoustic Dataset",
  "description": "My Acoustic Dataset Description"
}
The top-level `self` property in the response body is the dataset's URI. Use this URI to get details about the dataset's project and files. You also use this URI to update or delete a dataset.
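The response reports the dataset and project as URIs, while the Speech CLI `project` parameter expects an ID. As a rough sketch, the following Python snippet pulls the IDs out of a saved response. It assumes the ID is the last path segment of the `self` URI, which matches the sample response above but isn't stated as a guarantee here. The `resource_id` helper is hypothetical, and the JSON is trimmed to the fields used.

```python
import json
from urllib.parse import urlparse


def resource_id(self_uri):
    """Return the last path segment of a self URI (assumed to be the resource ID)."""
    path = urlparse(self_uri).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


# Sample response, trimmed to the fields used in this sketch.
response_text = """
{
  "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c",
  "links": {
    "files": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c/files"
  },
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/projects/70ccbffc-cafb-4301-aa9f-ef658559d96e"
  },
  "status": "NotStarted"
}
"""

dataset = json.loads(response_text)
dataset_id = resource_id(dataset["self"])
project_id = resource_id(dataset["project"]["self"])
files_uri = dataset["links"]["files"]

print("dataset:", dataset_id)
print("project:", project_id)
print("files:", files_uri)
```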
For Speech CLI help with datasets, run the following command:
spx help csr dataset
::: zone-end
::: zone pivot="rest-api"
[!INCLUDE Map CLI and API kind to Speech Studio options]
To create a dataset and connect it to an existing project, use the Datasets_Create operation of the Speech to text REST API. Construct the request body according to the following instructions:
- Set the `project` property to the URI of an existing project. This property is recommended so that you can also view and manage the dataset in Speech Studio. You can make a Projects_List request to get available projects.
- Set the required `kind` property. The possible values for dataset kind are: Language, Acoustic, Pronunciation, and AudioFiles.
- Set the required `contentUrl` property. This property is the location of the dataset. If you don't use the trusted Azure services security mechanism (see the following note), the `contentUrl` property should be a URL that can be retrieved with a simple anonymous GET request, for example, a SAS URL or a publicly accessible URL. URLs that require extra authorization or expect user interaction aren't supported.

  [!NOTE] If you use an Azure Blob URL, you can ensure maximum security of your dataset files by using the trusted Azure services security mechanism. You use the same techniques as for Batch transcription and plain Storage Account URLs for your dataset files. See details here.
- Set the required `locale` property. The dataset locale must match the locale of the project. The locale can't be changed later.
- Set the required `displayName` property. This property is the name that is displayed in the Speech Studio.
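Before sending a request, it can help to check the body against the rules in the preceding list. The following Python sketch is a client-side convenience only, with a hypothetical helper name (`check_dataset_body`). The service performs its own validation, which may be stricter than this.

```python
# Property names and dataset kinds are taken from the list above.
REQUIRED_PROPERTIES = ("kind", "contentUrl", "locale", "displayName")
ALLOWED_KINDS = {"Language", "Acoustic", "Pronunciation", "AudioFiles"}


def check_dataset_body(body):
    """Return a list of problems found in a dataset request body (empty if none)."""
    problems = []
    for key in REQUIRED_PROPERTIES:
        if not body.get(key):
            problems.append(f"missing required property: {key}")
    kind = body.get("kind")
    if kind and kind not in ALLOWED_KINDS:
        problems.append(f"unsupported kind: {kind}")
    # The project is optional, but when present it must carry the project's URI.
    project = body.get("project")
    if project is not None and not (isinstance(project, dict) and project.get("self")):
        problems.append("project must be an object with a 'self' URI")
    return problems


body = {
    "kind": "Acoustic",
    "displayName": "My Acoustic Dataset",
    "contentUrl": "https://contoso.com/mydatasetlocation",
    "locale": "en-US",
}
print(check_dataset_body(body))
```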
Make an HTTP POST request using the URI as shown in the following example. Replace `YourSubscriptionKey` with your Speech resource key, replace `YourServiceRegion` with your Speech resource region, and set the request body properties as previously described.
curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-Type: application/json" -d '{
  "kind": "Acoustic",
  "displayName": "My Acoustic Dataset",
  "description": "My Acoustic Dataset Description",
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/projects/70ccbffc-cafb-4301-aa9f-ef658559d96e"
  },
  "contentUrl": "https://contoso.com/mydatasetlocation",
  "locale": "en-US"
}' "https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/v3.1/datasets"
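The same request can be prepared from Python. The following sketch uses only the standard library and mirrors the curl example: same endpoint pattern, headers, and body. The function name `build_create_dataset_request` is hypothetical, and the placeholders must be replaced with real values before anything is sent. Serializing the body with `json.dumps` also guards against hand-written JSON mistakes such as trailing commas.

```python
import json
import urllib.request


def build_create_dataset_request(region, subscription_key, body):
    """Prepare (but don't send) the POST request that creates a dataset."""
    url = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/datasets"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
    )


body = {
    "kind": "Acoustic",
    "displayName": "My Acoustic Dataset",
    "description": "My Acoustic Dataset Description",
    "project": {
        "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/projects/70ccbffc-cafb-4301-aa9f-ef658559d96e"
    },
    "contentUrl": "https://contoso.com/mydatasetlocation",
    "locale": "en-US",
}

request = build_create_dataset_request("YourServiceRegion", "YourSubscriptionKey", body)
print(request.get_method(), request.full_url)

# To send the request once the placeholders hold real values:
#     with urllib.request.urlopen(request) as response:
#         dataset = json.load(response)
```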
You should receive a response body in the following format:
{
  "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c",
  "kind": "Acoustic",
  "contentUrl": "https://contoso.com/mydatasetlocation",
  "links": {
    "files": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c/files"
  },
  "project": {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/projects/70ccbffc-cafb-4301-aa9f-ef658559d96e"
  },
  "properties": {
    "acceptedLineCount": 0,
    "rejectedLineCount": 0
  },
  "lastActionDateTime": "2022-05-20T14:07:11Z",
  "status": "NotStarted",
  "createdDateTime": "2022-05-20T14:07:11Z",
  "locale": "en-US",
  "displayName": "My Acoustic Dataset",
  "description": "My Acoustic Dataset Description"
}
The top-level `self` property in the response body is the dataset's URI. Use this URI to get details about the dataset's project and files. You also use this URI to update or delete the dataset.
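As an illustration of reading the dataset back through its `self` URI, the following Python sketch prepares a GET request and summarizes the fields shown in the sample response. It assumes the GET request uses the same `Ocp-Apim-Subscription-Key` header as the create request. Both helper names are hypothetical, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import urllib.request


def build_get_dataset_request(dataset_self_uri, subscription_key):
    """Prepare (but don't send) a GET request for the dataset's self URI."""
    return urllib.request.Request(
        dataset_self_uri,
        method="GET",
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
    )


def summarize_dataset(dataset):
    """Pick out the status and line counts from a parsed dataset response."""
    properties = dataset.get("properties", {})
    return {
        "status": dataset.get("status"),
        "acceptedLineCount": properties.get("acceptedLineCount"),
        "rejectedLineCount": properties.get("rejectedLineCount"),
    }


# Values copied from the sample response body.
sample = {
    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/e0ea620b-e8c3-4a26-acb2-95fd0cbc625c",
    "properties": {"acceptedLineCount": 0, "rejectedLineCount": 0},
    "status": "NotStarted",
}

request = build_get_dataset_request(sample["self"], "YourSubscriptionKey")
print(request.get_method(), request.full_url)
print(summarize_dataset(sample))
```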
::: zone-end
[!IMPORTANT] Connecting a dataset to a custom speech project isn't required to train and test a custom model using the REST API or Speech CLI. But if the dataset is not connected to any project, you can't select it for training or testing in the Speech Studio.