# Getting started with Speech to Text

The IBM® Speech to Text service provides APIs that use IBM's speech-recognition capabilities to produce transcripts of spoken audio. The service can transcribe speech from various languages and audio formats. In addition to basic transcription, the service can produce detailed information about many different aspects of the audio. For most languages, the service supports two sampling rates, broadband and narrowband. It returns all JSON response content in the UTF-8 character set.  

This notebook explains how to create, train, and use a custom acoustic model for your chosen language.

IBM documentation for Watson Speech to Text: https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-gettingStarted

IBM API documentation for Watson Speech to Text: https://cloud.ibm.com/apidocs/speech-to-text?code=python#introduction


## Before you begin

a) Create an instance of the service.
<br>
b) Go to the Speech to Text page in the IBM Cloud Catalog and select Watson Speech to Text (STT).
<br>
c) Select your region and plan, then click Create.
<br>
d) Copy the credentials that authenticate requests to your service instance.
<br>
e) To find the credentials, open the IBM Cloud Resource list and click your Speech to Text service instance to go to the Speech to Text service dashboard page.


## 1. Setup

##### To prepare your environment, you need to install some packages and enter credentials for the Watson services.

In [None]:
!pip install --upgrade "ibm-watson>=4.4.1"

### 1.1 Import Packages and Libraries

In [None]:
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from os.path import join, dirname
import json
import os
import time

### 1.2 Add Service Credentials From IBM Cloud for Watson Services

#### Edit the following cell to provide your credentials for Watson STT

- Insert API key
- Insert STT URL (example https://api.eu-de.speech-to-text.watson.cloud.ibm.com )
- Run the cell.


In [None]:
url = "<Insert URL from credentials>"
api_key = "<Insert apikey from credentials>"
authenticator = IAMAuthenticator(api_key)
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)
speech_to_text.set_service_url(url)
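
Hard-coding an API key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the credentials could instead be read from environment variables. The variable names `STT_APIKEY` and `STT_URL` are hypothetical names chosen for this example; they are not defined by the SDK used in this notebook.

```python
import os


def load_stt_credentials(env=None):
    """Return (api_key, url) read from an environment-style mapping.

    STT_APIKEY and STT_URL are hypothetical variable names chosen for this
    sketch; use whatever names you set in your own environment.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("STT_APIKEY", "STT_URL") if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    # Strip a trailing slash so the URL matches the form shown in the credentials.
    return env["STT_APIKEY"], env["STT_URL"].rstrip("/")
```

The returned values can then be passed to `IAMAuthenticator` and `set_service_url` in the same way as the `api_key` and `url` variables in the cell above.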

### 1.3 Creating a local path for your notebook

In [None]:
notebook_path = os.path.dirname(os.path.abspath("Acoustic Model Training.ipynb"))

## 2. Create a new acoustic model in Watson STT in Dutch

This step creates a new custom acoustic model for a specified base model. The custom acoustic model can be used only with the base model for which it is created. The model is owned by the instance of the service whose credentials are used to create it.

You can create a maximum of 1024 custom acoustic models per owning credentials. The service returns an error if you attempt to create more than 1024 models. You do not lose any models, but you cannot create any more until your model count is below the limit.
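
The limit described above can be checked before a new model is created. The sketch below counts the entries in a model listing and reports how many more models could be created. It assumes the listing is a dictionary with a `customizations` list, which is the shape this notebook reads later from `list_acoustic_models`; the helper name and the sample data are illustrative only. Because the limit applies to all models owned by the credentials, the listing would need to be requested without a language filter.

```python
MAX_ACOUSTIC_MODELS = 1024  # limit stated in the text above


def remaining_model_slots(list_response, limit=MAX_ACOUSTIC_MODELS):
    """Return how many more custom acoustic models can still be created."""
    existing = list_response.get("customizations", [])
    return max(limit - len(existing), 0)


# Illustrative listing with two existing models (made-up IDs).
example = {"customizations": [{"customization_id": "abc"}, {"customization_id": "def"}]}
print(remaining_model_slots(example))  # prints 1022
```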



### 2.1 List the existing Dutch custom acoustic models within the Watson STT service

- Change the language if required (example: en-US)

This link provides information about the available languages and models: https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models

In [None]:
acoustic_models = speech_to_text.list_acoustic_models(language='nl-NL').get_result()
print(json.dumps(acoustic_models, indent=2))


### 2.2 Create a Dutch acoustic model

- Add a name for your model
- Add the base model for the language you require (example: nl-NL_NarrowbandModel)
- Provide a description of the model

***The output is the acoustic model ID (customization_id), which you will need later in this notebook.***

In [None]:
# Create an acoustic model
acoustic_model = speech_to_text.create_acoustic_model(
    '<add name of the model>',
    'nl-NL_NarrowbandModel',
    description='<add description>').get_result()
print(json.dumps(acoustic_model, indent=2))

## 3. Add audio to the custom acoustic model

The service accepts the same audio file formats for acoustic modeling that it accepts for speech recognition. It also accepts archive files that contain multiple audio files. Archive files are the preferred means of adding audio resources. You can repeat the method to add more audio or archive files to a custom model.

More details can be found at the following link https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acoustic#createModel-acoustic

- Add the acoustic model ID (customization_id) created in section 2.2 to the cell below

***The output describes the "total minutes of audio" within the custom acoustic model.***

In [None]:
custom_acoustic_narrowband_model_id = '<add customization_id>'
audio_resources = speech_to_text.list_audio(custom_acoustic_narrowband_model_id).get_result()
print(json.dumps(audio_resources, indent=2))
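
The listing printed above can also be summarized programmatically. The following sketch assumes the response contains a `total_minutes_of_audio` value and an `audio` list whose items carry `name` and `status` fields, with `ok` marking a resource that is ready; these field names are taken from the service's API reference as an assumption and should be checked against your own output. The sample data is made up.

```python
def summarize_audio(audio_resources):
    """Return (total_minutes, names of resources whose status is not 'ok')."""
    total_minutes = audio_resources.get("total_minutes_of_audio", 0)
    not_ready = [
        item.get("name")
        for item in audio_resources.get("audio", [])
        if item.get("status") != "ok"
    ]
    return total_minutes, not_ready


# Made-up sample in the assumed response shape.
sample = {
    "total_minutes_of_audio": 12.5,
    "audio": [
        {"name": "audio1", "duration": 450, "status": "ok"},
        {"name": "audio2", "duration": 300, "status": "being_processed"},
    ],
}
print(summarize_audio(sample))
```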

### 3.1 Adding an audio resource

You can add individual audio files or archive files that contain multiple audio files to a custom acoustic model. The recommended means of adding audio resources is by adding archive files. Creating and adding a single archive file is considerably more efficient than adding multiple audio files individually. You can also submit requests to add multiple different audio resources at the same time.

More information about audio resources can be found in the following link https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audioResources#addAudioResource

- Add the path to your audio file
- Create a name for the upload (example: audio1)
- Change the content type if needed. The example below uses WAV (audio/wav); for a bulk upload in a .zip archive, use application/zip
- Set allow_overwrite to False if you do not want to overwrite an existing audio resource with the same name

In [None]:
with open(join(dirname('.'), '.', '<add file>'),
               'rb') as audio_file:
    speech_to_text.add_audio(
        custom_acoustic_narrowband_model_id,
        'audio1',
        audio_file,
        content_type='audio/wav',
        allow_overwrite=True
    )
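
Because archive files are the recommended way to add audio, it can help to bundle a folder of recordings before uploading. This is a minimal sketch that uses only the Python standard library; the function name, the folder layout, and the `*.wav` pattern are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def build_audio_archive(audio_dir, archive_path, pattern="*.wav"):
    """Zip every file in audio_dir that matches pattern; return the names added."""
    audio_dir = Path(audio_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(audio_dir.glob(pattern)):
            # Store files at the archive root so each keeps its plain file name.
            archive.write(path, arcname=path.name)
            added.append(path.name)
    return added
```

The resulting archive could then be opened in binary mode and passed to `add_audio` with the content type `application/zip`, as noted in the list above.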

## 4. Training the acoustic model

**You may need to wait before running the training cell: the model stays locked while the service processes the audio added in the previous step.**


Once you add audio resources to the custom model, you must train the model. Training prepares the custom acoustic model for use in speech recognition. Training can take a significant amount of time. The length of the training depends on the amount of audio data that the model contains.

The total audio duration must be between 10 minutes and 12000 minutes (200 hours).

In [None]:
speech_to_text.train_acoustic_model(custom_acoustic_narrowband_model_id)
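
The duration limits stated above can be checked before a training request is sent. The sketch below treats both bounds as inclusive, which is an assumption; the constant and function names are illustrative. The total could come from the "total minutes of audio" value reported when the audio resources are listed in section 3.

```python
MIN_TRAINING_MINUTES = 10      # lower bound stated in the text
MAX_TRAINING_MINUTES = 12000   # upper bound stated in the text (200 hours)


def can_train(total_minutes):
    """Return True if the total audio duration is within the stated limits."""
    # Both bounds are treated as inclusive here, which is an assumption.
    return MIN_TRAINING_MINUTES <= total_minutes <= MAX_TRAINING_MINUTES


print(can_train(12.5))  # prints True
```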

### 4.1 Check the status of the training
Depending on the audio size (or number of files), this can take a while to complete.

In [None]:
# Poll the acoustic model until its status is 'available'.
# The status is fetched again on every pass so that changes are seen.
while True:
    model = speech_to_text.get_acoustic_model(
        custom_acoustic_narrowband_model_id).get_result()
    print(model['status'])
    if model['status'] in ('available', 'failed'):
        # 'available' means training finished; 'failed' means it will not finish
        break
    time.sleep(15)
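
The status check above has no upper bound on how long it waits. As a sketch under stated assumptions, the helper below polls any status function until a wanted status is reached and gives up after a timeout. The helper name, its parameters, and the treatment of a `failed` status as an error are choices made for this example, not part of the SDK.

```python
import time


def wait_for_status(fetch_status, wanted="available", interval=15, timeout=3600,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns `wanted`; return the final status.

    Raises TimeoutError if `timeout` seconds pass first, and RuntimeError
    if the reported status is 'failed'.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status == wanted:
            return status
        if status == "failed":
            raise RuntimeError("model reported status 'failed'")
        if clock() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout} seconds")
        sleep(interval)
```

In this notebook, `fetch_status` could be a small function that calls `speech_to_text.get_acoustic_model(custom_acoustic_narrowband_model_id).get_result()` and returns its `status` field. The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting.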

## 5. Using a custom acoustic model

Once you create and train your custom acoustic model, you can use it in speech recognition requests. You use the acoustic_customization_id parameter to specify the custom acoustic model for a request.

- Add the file name + type (example test.wav)
- Add the content type - (example audio/wav)
- Add the model type (example nl-NL_NarrowbandModel)

In [None]:
files = ['<add file>']
for file in files:
    # Open each audio file in binary mode and send it for recognition
    with open(file, 'rb') as audio_file:
        speech_recognition_results = speech_to_text.recognize(
            audio=audio_file,
            content_type='audio/wav',
            model='nl-NL_NarrowbandModel',
            acoustic_customization_id=custom_acoustic_narrowband_model_id
        ).get_result()
    print(json.dumps(speech_recognition_results, indent=2))
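
The JSON printed above is verbose when only the text is needed. The sketch below pulls the first transcript out of each result. It assumes the response has a `results` list whose items hold an `alternatives` list with a `transcript` string, which is the shape described in the service's API reference; verify it against your own output. The sample data is made up.

```python
def extract_transcripts(recognition_results):
    """Return the first transcript of each result, stripped of padding spaces."""
    transcripts = []
    for result in recognition_results.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            transcripts.append(alternatives[0]["transcript"].strip())
    return transcripts


# Made-up sample in the assumed response shape.
sample_results = {
    "result_index": 0,
    "results": [
        {"final": True, "alternatives": [{"transcript": "goedemorgen ", "confidence": 0.91}]},
        {"final": True, "alternatives": []},
    ],
}
print(extract_transcripts(sample_results))  # prints ['goedemorgen']
```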


## 6. Danger Zone - Delete Acoustic Model

- Add a specific customization_id in the cell below if required (see cell 2.1)

In [None]:
speech_to_text.delete_acoustic_model('<add customization_id>')