# Lexicon - Custom Model


## Overview

For this project, I will build a simple custom language model that learns from any text data provided and returns a transcript, with confidence values, for spoken input. I will use Google's cloud-based services to preprocess the input audio and produce an initial transcription. I will then train a model to improve on the Google Cloud Speech API's response.



## Getting Started

To use Google's cloud-based services, you first need to create an account on the [Google Cloud Platform](https://cloud.google.com/).

Then you have to enable each service you want to use. This notebook uses the Cloud Speech API.

In [1]:
!pip install --upgrade google-cloud-speech

Requirement already up-to-date: google-cloud-speech in /Users/deanmwebb/anaconda/envs/sdc_dev/lib/python3.5/site-packages
Requirement already up-to-date: google-gax<0.16dev,>=0.15.14 in /Users/deanmwebb/anaconda/envs/sdc_dev/lib/python3.5/site-packages (from google-cloud-speech)
Requirement already up-to-date: googleapis-common-protos[grpc]<2.0dev,>=1.5.2 in /Users/deanmwebb/anaconda/envs/sdc_dev/lib/python3.5/site-packages (from google-cloud-speech)
Requirement already up-to-date: google-cloud-core<0.28dev,>=0.27.0 in /Users/deanmwebb/anaconda/envs/sdc_dev/lib/python3.5/site-packages (from google-cloud-speech)
Requirement already up-to-date: protobuf<4.0dev,>=3.0.0 in /Users/deanmwebb/anaconda/envs/sdc_dev/lib/python3.5/site-packages (from google-gax<0.16dev,>=0.15.14->google-cloud-speech)
Requirement already up-to-date: ply==3.8 in /Users/deanmwebb/anaconda/envs/sdc_dev/lib/python3.5/site-packages (from google-gax<0.16dev,>=0.15.14->google-cloud-speech)
Requirement already up-to-date

### Install the [Google Cloud SDK](https://cloud.google.com/sdk/docs/)

In [2]:
!CLOUDSDK_CORE_DISABLE_PROMPTS=1 ./google-cloud-sdk/install.sh

Welcome to the Google Cloud SDK!

To help improve the quality of this product, we collect anonymized usage data
and anonymized stacktraces when crashes are encountered; additional information
is available at <https://cloud.google.com/sdk/usage-statistics>. You may choose
to opt out of this collection now (by choosing 'N' at the below prompt), or at
any time in the future by running the following command:

    gcloud config set disable_usage_reporting true


Your current Cloud SDK version is: 170.0.1
The latest available version is: 170.0.1

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                  Components                                                 │
├───────────────┬──────────────────────────────────────────────────────┬──────────────────────────┬───────────┤
│     Status    │                         Name                         │            ID            │    Size   │
├────

### Authenticate with the Google Cloud API

In [4]:
!source google-cloud-sdk/completion.bash.inc && \
source google-cloud-sdk/path.bash.inc && \
gcloud auth activate-service-account lexicon-bot@exemplary-oath-179301.iam.gserviceaccount.com --key-file=Lexicon-e94eff39fad7.json

Activated service account credentials for: [lexicon-bot@exemplary-oath-179301.iam.gserviceaccount.com]


In [5]:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS']='/Users/deanmwebb/Google Drive/Development/consulting/lexicon/Lexicon-e94eff39fad7.json'
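
A missing or malformed key file usually only surfaces later, when the first API call fails. As a minimal sketch, the helpers below check that the path exists and that the JSON looks like a service-account key before the environment variable is set. The names `load_service_account_key` and `set_credentials` are hypothetical and not part of any Google library; the fields checked are the ones normally present in a downloaded service-account key file.

```python
import json
import os


def load_service_account_key(path):
    """Return the parsed key file, or raise if it does not look like a
    service-account key.

    Hypothetical helper for illustration; not part of the Google Cloud
    client library.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError('Key file not found: {}'.format(path))

    with open(path, 'r') as key_file:
        key = json.load(key_file)

    # Fields normally present in a downloaded service-account key.
    required = ('type', 'client_email', 'private_key')
    missing = [field for field in required if field not in key]
    if missing:
        raise ValueError(
            'Key file is missing fields: {}'.format(', '.join(missing)))
    if key['type'] != 'service_account':
        raise ValueError(
            'Expected a service_account key, got: {}'.format(key['type']))
    return key


def set_credentials(path):
    """Validate the key file, then point the client libraries at it.

    Returns the service account's e-mail address for a quick sanity check.
    """
    key = load_service_account_key(path)
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.abspath(path)
    return key['client_email']
```

Calling `set_credentials(...)` with the key file path used above would then either return the service account's e-mail address or fail immediately with a readable error.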

### Test out the Cloud Speech API
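
The request in this section declares `LINEAR16` audio sampled at 16000 Hz. A mismatch between the file and the declared config can lead to errors or poor transcripts, so it may be worth checking the WAV header locally first. The sketch below uses the standard-library `wave` module. `check_wav` is a hypothetical helper name, and its default expected values are taken from the config used in this notebook.

```python
import wave


def check_wav(path, expected_rate=16000, expected_sample_width=2):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file.

    Raises ValueError if the header does not match the expected values.
    expected_sample_width=2 bytes corresponds to 16-bit PCM (LINEAR16).
    """
    with wave.open(path, 'rb') as wav_file:
        rate = wav_file.getframerate()
        width = wav_file.getsampwidth()
        channels = wav_file.getnchannels()
        frames = wav_file.getnframes()

    if rate != expected_rate:
        raise ValueError(
            'Sample rate is {} Hz, expected {} Hz'.format(rate, expected_rate))
    if width != expected_sample_width:
        raise ValueError(
            'Sample width is {} bytes, expected {}'.format(
                width, expected_sample_width))

    # Duration follows from the frame count and the frame rate.
    return rate, channels, frames / float(rate)
```

For example, `check_wav(dev_file_name)` could be called just before the audio file is read into memory.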

In [23]:
import io
import os

# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

# Instantiates a client
client = speech.SpeechClient()

# The name of the audio file to transcribe
dev_file_name = os.path.join(
    os.getcwd(),
    'RNN-Tutorial-master',
    'data',
    'raw',
    'librivox',
    'LibriSpeech',
    'dev-clean-wav',
    '777-126732-0068.wav')

# Loads the audio into memory
with io.open(dev_file_name, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US',
    enable_word_time_offsets=True)

# Detects speech and words in the audio file
operation = client.long_running_recognize(config, audio)

print('Waiting for operation to complete...')
result = operation.result(timeout=90)

alternatives = result.results[0].alternatives
for alternative in alternatives:
    print('Transcript: {}'.format(alternative.transcript))
    print('Confidence Score: {}'.format(alternative.confidence))

    for word_info in alternative.words:
        word = word_info.word
        start_time = word_info.start_time
        end_time = word_info.end_time
        start = start_time.seconds + start_time.nanos * 1e-9
        end = end_time.seconds + end_time.nanos * 1e-9
        delta = end - start
        
        print('Word: {}, start_time: {}, end_time: {}, total_time: {}'.format(
            word,
            start,
            end,
            delta))

Waiting for operation to complete...
Transcript: the boy hears too much of what is talked about here
Confidence Score: 0.9152038097381592
Word: the, start_time: 0.1, end_time: 0.5, total_time: 0.4
Word: boy, start_time: 0.5, end_time: 0.7000000000000001, total_time: 0.20000000000000007
Word: hears, start_time: 0.7000000000000001, end_time: 1.1, total_time: 0.4
Word: too, start_time: 1.1, end_time: 1.2, total_time: 0.09999999999999987
Word: much, start_time: 1.2, end_time: 1.4, total_time: 0.19999999999999996
Word: of, start_time: 1.4, end_time: 1.5, total_time: 0.10000000000000009
Word: what, start_time: 1.5, end_time: 1.6, total_time: 0.10000000000000009
Word: is, start_time: 1.6, end_time: 1.7000000000000002, total_time: 0.10000000000000009
Word: talked, start_time: 1.7000000000000002, end_time: 2.0, total_time: 0.2999999999999998
Word: about, start_time: 2.0, end_time: 2.1, total_time: 0.10000000000000009
Word: here, start_time: 2.1, end_time: 2.5, total_time: 0.3999999999999999
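
The times above show floating-point noise (for example `0.7000000000000001`) because each offset is rebuilt from separate `seconds` and `nanos` fields in binary floating point. As a sketch, the offsets can be converted once and rounded for display. `duration_to_seconds` and `word_timings` are hypothetical helper names, and `Offset` and `WordInfo` are stand-in tuples so the example runs without calling the API. With a real response, the `alternative.words` sequence from the loop above would be passed instead.

```python
from collections import namedtuple

# Stand-ins for the API's duration and word objects (assumption: only the
# attributes used in the loop above are modelled here).
Offset = namedtuple('Offset', ['seconds', 'nanos'])
WordInfo = namedtuple('WordInfo', ['word', 'start_time', 'end_time'])


def duration_to_seconds(offset, ndigits=3):
    """Convert a seconds/nanos pair to a float rounded to ndigits places."""
    return round(offset.seconds + offset.nanos * 1e-9, ndigits)


def word_timings(words, ndigits=3):
    """Return (word, start, end, duration) tuples with rounded times."""
    timings = []
    for word_info in words:
        start = duration_to_seconds(word_info.start_time, ndigits)
        end = duration_to_seconds(word_info.end_time, ndigits)
        timings.append(
            (word_info.word, start, end, round(end - start, ndigits)))
    return timings


# Two words from the transcript above, expressed as seconds/nanos pairs.
sample = [
    WordInfo('boy', Offset(0, 500000000), Offset(0, 700000000)),
    WordInfo('hears', Offset(0, 700000000), Offset(1, 100000000)),
]
for word, start, end, duration in word_timings(sample):
    print('Word: {}, start_time: {}, end_time: {}, total_time: {}'.format(
        word, start, end, duration))
```

Rounding to three decimal places is an arbitrary choice for display. The offsets in the output above all fall on 0.1 s steps, so millisecond precision loses nothing here.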
