# [STT] Create a Custom Speech Model
This sample demonstrates how to create a Custom Speech model by calling the Speech to text REST API (v3.1).

> ✨ ***Note*** <br>
> Please check Custom Speech support for your language before you get started: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt#:~:text=Custom%20speech%20support 

## Prerequisites
Git clone the repository to your local machine. 

```bash
git clone https://github.com/hyogrin/Azure_OpenAI_samples.git
```

* A subscription key for the Speech service. See [Try the speech service for free](https://docs.microsoft.com/azure/cognitive-services/speech-service/get-started).
* Python 3.5 or later needs to be installed. Downloads are available [here](https://www.python.org/downloads/).
* The Python Speech SDK package is available for Windows (x64 or x86) and Linux (x64; Ubuntu 16.04 or Ubuntu 18.04).
* On Ubuntu 16.04 or 18.04, run the following commands to install the required packages:
  ```sh
  sudo apt-get update
  sudo apt-get install libssl1.0.0 libasound2
  ```
* On Debian 9, run the following commands to install the required packages:
  ```sh
  sudo apt-get update
  sudo apt-get install libssl1.0.2 libasound2
  ```
* On Windows you need the [Microsoft Visual C++ Redistributable for Visual Studio 2017](https://support.microsoft.com/help/2977003/the-latest-supported-visual-c-downloads) for your platform.

Configure a Python virtual environment (Python 3.10 or later) in VS Code:
 1. Open the Command Palette (Ctrl+Shift+P).
 1. Search for **Python: Create Environment**.
 1. Select **Venv** or **Conda** and choose where to create the new environment.
 1. Select the Python interpreter version. Use 3.10 or later.

Then install the dependencies into the new environment:

```bash
pip install -r requirements.txt
```

Create a `.env` file based on the `.env-sample` file, copy it to the folder that contains your notebook, and set `AZURE_AI_SPEECH_API_KEY` and `AZURE_AI_SPEECH_REGION` (for example, `swedencentral`).

## Set up the environment

```python
import azure.cognitiveservices.speech as speechsdk
import os
import json
from openai import AzureOpenAI
import requests
from dotenv import load_dotenv

# Load AZURE_AI_SPEECH_API_KEY and AZURE_AI_SPEECH_REGION from the .env file
load_dotenv()

speech_key = os.getenv("AZURE_AI_SPEECH_API_KEY")
speech_region = os.getenv("AZURE_AI_SPEECH_REGION")

# Default headers for JSON requests to the Speech to text REST API
headers = {
    'Ocp-Apim-Subscription-Key': speech_key,
    'Content-Type': 'application/json'
}
```
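Every request below depends on `AZURE_AI_SPEECH_API_KEY` and `AZURE_AI_SPEECH_REGION`, and a missing value otherwise surfaces later as a confusing HTTP error. A minimal sketch that fails fast instead; `load_speech_config` is a hypothetical helper (not part of any SDK), and it assumes the `https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1` base URL used throughout this notebook.

```python
import os


def load_speech_config(env=None):
    """Return (key, region, base_url), raising if a required variable is unset.

    Hypothetical helper: the variable names match the .env file used above.
    """
    env = os.environ if env is None else env
    required = ("AZURE_AI_SPEECH_API_KEY", "AZURE_AI_SPEECH_REGION")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    key = env["AZURE_AI_SPEECH_API_KEY"]
    # Strip stray whitespace that often sneaks into .env values
    region = env["AZURE_AI_SPEECH_REGION"].strip()
    # Same base URL pattern the REST calls in this notebook use
    base_url = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1"
    return key, region, base_url
```

Called after `load_dotenv()`, `speech_key, speech_region, base_url = load_speech_config()` would replace the two `os.getenv` lines and stop early with a clear message if the `.env` file was not picked up.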


## Create a project

```python
# Step 1: Create a Project
def create_project(name, description, locale):
    project_url = f"https://{speech_region}.api.cognitive.microsoft.com/speechtotext/v3.1/projects"
    data = {
        "displayName": name,
        "locale": locale,
        "description": description
    }
    response = requests.post(project_url, headers=headers, json=data)
    if response.status_code == 201:
        project = response.json()
        print(f"Project created : {project}")
        return project["self"]
    else:
        print(f"Failed to create project. Status code: {response.status_code}")
        print("Error message:", response.text)
        return None
```
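`create_project` returns the project's `self` URL rather than its ID. When the bare GUID is needed (for example, to build a URL by hand), a small helper can take the last path segment. `resource_id_from_self` is hypothetical, and it assumes the ID is always the final segment, as in `.../speechtotext/v3.1/projects/<id>`.

```python
from urllib.parse import urlparse


def resource_id_from_self(self_url):
    """Return the last path segment of a `self` URL (the resource GUID).

    Hypothetical helper; assumes the ID is the final path segment.
    """
    path = urlparse(self_url).path.rstrip("/")
    if not path:
        raise ValueError(f"No path in URL: {self_url!r}")
    return path.rsplit("/", 1)[-1]
```

For example, `project_id = resource_id_from_self(project)` would give the GUID of the project created in the next cell.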

```python
project = create_project("MyProject1", "Project for custom speech model1", "en-US")
print(project)
```

```text
Project created : {'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.1/projects/013d27fb-ae20-474f-b53c-cf9be14a007a', 'links': {'evaluations': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.1/projects/013d27fb-ae20-474f-b53c-cf9be14a007a/evaluations', 'datasets': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.1/projects/013d27fb-ae20-474f-b53c-cf9be14a007a/datasets', 'models': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.1/projects/013d27fb-ae20-474f-b53c-cf9be14a007a/models', 'endpoints': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.1/projects/013d27fb-ae20-474f-b53c-cf9be14a007a/endpoints', 'transcriptions': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.1/projects/013d27fb-ae20-474f-b53c-cf9be14a007a/transcriptions'}, 'properties': {'datasetCount': 0, 'evaluationCount': 0, 'modelCount': 0, 'transcriptionCount': 0, 'endpointCount': 0}, 'createdDateTime': '
```
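The project response lists its related collections under `links` (`datasets`, `models`, `endpoints`, and so on), each at `<project self URL>/<collection>`. A minimal sketch for reading one of those collections, assuming the v3.1 paged list shape `{"values": [...], "@nextLink": ...}`. `list_all` is a hypothetical helper, and `get` is passed in (for example `requests.get`) so the paging logic can be exercised without network access.

```python
def list_all(url, headers, get):
    """Yield every item from a paged list endpoint, following '@nextLink'.

    Assumes each page looks like {"values": [...], "@nextLink": "<next url>"}
    and that the last page omits '@nextLink'.
    """
    while url:
        response = get(url, headers=headers)
        response.raise_for_status()
        page = response.json()
        yield from page.get("values", [])
        url = page.get("@nextLink")
```

For example, `list(list_all(f"{project}/datasets", headers, requests.get))` would collect the datasets attached to the project created above. Using a generator keeps memory flat when a collection spans many pages.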

## Create a dataset

```python
# Step 2: Create an Acoustic Dataset
def create_dataset(project, name, description, locale):
    dataset_url = f"https://{speech_region}.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/upload"
    data = {
        "kind": "Acoustic",
        "locale": locale,
        "displayName": name,
        "description": description,
        "project": project,
        "data": {
            "contentUrls": [
                "https://speechsamples.blob.core.windows.net/speech/short-audio.wav"
            ]
        }
    }
    response = requests.post(dataset_url, headers=headers, json=data)
    if response.status_code == 201:
        dataset = response.json()
        print(f"dataset created : {dataset}")
        return dataset['self']
    else:
        print(f"Failed to create dataset. Status code: {response.status_code}")
        print("Error message:", response.text)
        return None
```
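Datasets are processed asynchronously, so the `status` field of a dataset changes over time. A minimal polling sketch, assuming the v3 status values `NotStarted`, `Running`, `Succeeded`, and `Failed`, and that a GET on the dataset's `self` URL returns its current state. `wait_for_status` is a hypothetical helper; `get` and `sleep` are injected so the loop can be tested offline. A dataset that is still waiting for data may stay in `NotStarted`, which is why the helper gives up after `timeout_s` seconds.

```python
import time

TERMINAL_STATUSES = ("Succeeded", "Failed")


def wait_for_status(self_url, headers, get, timeout_s=600, interval_s=10, sleep=time.sleep):
    """Poll `self_url` until the resource reaches a terminal status.

    Returns the final JSON; raises TimeoutError once `timeout_s` has elapsed.
    """
    waited = 0
    while True:
        response = get(self_url, headers=headers)
        response.raise_for_status()
        resource = response.json()
        if resource.get("status") in TERMINAL_STATUSES:
            return resource
        if waited >= timeout_s:
            raise TimeoutError(f"{self_url} still {resource.get('status')} after {waited}s")
        sleep(interval_s)
        waited += interval_s
```

For example, `final = wait_for_status(dataset, headers, requests.get)` followed by a check of `final["status"]` would tell you whether the dataset is usable for training.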

```python
dataset = None  # stays None if project creation failed
if project:
    dataset = create_dataset(project, "MyDataset", "Dataset for custom speech model", "en-US")
print(dataset)
```

```text
dataset created : {'self': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/6b6f2403-9cb0-4c43-9620-c8a4d92289ff', 'kind': 'Acoustic', 'links': {'files': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/6b6f2403-9cb0-4c43-9620-c8a4d92289ff/files', 'commitBlocks': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/6b6f2403-9cb0-4c43-9620-c8a4d92289ff/blocks:commit', 'listBlocks': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/6b6f2403-9cb0-4c43-9620-c8a4d92289ff/blocks', 'uploadBlocks': 'https://swedencentral.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/6b6f2403-9cb0-4c43-9620-c8a4d92289ff/blocks'}, 'properties': {'acceptedLineCount': 0, 'rejectedLineCount': 0}, 'lastActionDateTime': '2024-11-04T10:14:16Z', 'status': 'NotStarted', 'createdDateTime': '2024-11-04T10:14:16Z', 'locale': 'en-US', 'displayName': 'MyDataset', 'description': 'Dataset for custom speech mo
```

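## Upload files to the dataset

```python
def upload_files_to_dataset(dataset_id, file_path):
    # Accept either a bare dataset ID or the "self" URL returned by create_dataset
    dataset_id = dataset_id.rstrip("/").split("/")[-1]
    upload_url = f"https://{speech_region}.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/{dataset_id}/files"
    file_name = os.path.basename(file_path)
    params = {
        "fileName": file_name
    }
    # Open the file in a context manager so the handle is always closed
    with open(file_path, 'rb') as audio_file:
        files = {
            'data': (file_name, audio_file, 'audio/wav')
        }
        # Send only the key header so requests can set the multipart Content-Type itself
        response = requests.post(upload_url, headers={'Ocp-Apim-Subscription-Key': speech_key}, params=params, files=files)
    if response.status_code == 202:
        print(f"File '{file_name}' uploaded successfully.")
    else:
        print(f"Failed to upload file '{file_name}'. Status code: {response.status_code}")
        print("Error message:", response.text)
```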
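The dataset response printed earlier advertises `uploadBlocks` (`<dataset self URL>/blocks`) and `commitBlocks` (`<dataset self URL>/blocks:commit`) links while the dataset is in `NotStarted` status. As an alternative to the `/files` upload, a block-based upload could look roughly like the sketch below. The `blockid` query parameter name, the commit body shape, the base64 block-ID format, and the 4 MiB block size are all assumptions modeled on Azure Blob Storage block uploads, so check them against the v3.1 REST reference before relying on this. `block_id`, `read_blocks`, and `upload_in_blocks` are hypothetical helpers.

```python
import base64


def block_id(index):
    """Return a fixed-length, base64-encoded block ID for chunk `index`.

    Assumption: IDs follow Azure Blob Storage rules (base64, equal length).
    """
    return base64.b64encode(f"block-{index:06d}".encode("ascii")).decode("ascii")


def read_blocks(path, block_size):
    """Yield (block_id, chunk) pairs for a local file, in order."""
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            yield block_id(index), chunk
            index += 1


def upload_in_blocks(upload_url, commit_url, path, key, put, post,
                     block_size=4 * 1024 * 1024):
    """Upload `path` block by block, then commit the blocks in order.

    Assumed request shapes (verify against the v3.1 REST reference):
      PUT  <upload_url>?blockid=<id>   body: raw bytes of the block
      POST <commit_url>                body: [{"kind": "Latest", "id": <id>}, ...]
    `put` and `post` are injected, e.g. requests.put and requests.post.
    """
    auth = {"Ocp-Apim-Subscription-Key": key}
    entries = []
    for bid, chunk in read_blocks(path, block_size):
        response = put(upload_url, params={"blockid": bid}, headers=auth, data=chunk)
        response.raise_for_status()
        entries.append({"kind": "Latest", "id": bid})
    if not entries:
        raise ValueError(f"{path} is empty; nothing to upload")
    response = post(commit_url, headers=auth, json=entries)
    response.raise_for_status()
    return entries
```

With the dataset created above, a call might look like `upload_in_blocks(f"{dataset}/blocks", f"{dataset}/blocks:commit", "my_dataset.zip", speech_key, requests.put, requests.post)`, where `my_dataset.zip` is a placeholder path. Committing the IDs in upload order is what lets the service reassemble the file correctly.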