# Lexicon - Orchestrator


## Overview

For this project, I will build a simple custom ochestrator that processes data objects from the "Lexicon" class.
    - These objects are custom datasets that are modeled after the Ted Talk speakers. 
    - Each Lexicon has a corpus and some helper methods aimed at training and prediction
    - Lexicon class will also have a preprocessing and caching function.
    - Each object will have two methods of prediction, n-gram language model and a recurrent neural network model
    - Each object has a custom reporting function that reports the results of training
    - Each object will be able to learn from any text data provided, and return a transcript with confidence values from input posed in speech utterances. 
        - I will use Google's cloud-based services to preprocess the input audio data and transcribe into an initial guess. Then I will train a model to improve on Google cloud speech API's response.


In [1]:
from urllib.request import urlretrieve
from os.path import isfile, isdir
from tqdm import tqdm
import tarfile

librispeech_dataset_folder_path = 'LibriSpeech'
tar_gz_path = 'dev-clean.tar.gz'

books_path = 'original-books.tar.gz'

class DLProgress(tqdm):
    last_block = 0

    def hook(self, block_num=1, block_size=1, total_size=None):
        self.total = total_size
        self.update((block_num - self.last_block) * block_size)
        self.last_block = block_num

if not isfile(books_path):
    with DLProgress(unit='B', unit_scale=True, miniters=1, desc='Librispeech Book Texts') as pbar:
        urlretrieve(
            'http://www.openslr.org/resources/12/original-books.tar.gz',
            books_path,
            pbar.hook)

if not isdir(librispeech_dataset_folder_path+'/books'):
    with tarfile.open(books_path) as tar:
        tar.extractall()
        tar.close()
        
        
        
if not isfile(tar_gz_path):
    with DLProgress(unit='B', unit_scale=True, miniters=1, desc='Librispeech dev-clean.tar.gz') as pbar:
        urlretrieve(
            'http://www.openslr.org/resources/12/dev-clean.tar.gz',
            tar_gz_path,
            pbar.hook)

if not isdir(librispeech_dataset_folder_path):
    with tarfile.open(tar_gz_path) as tar:
        tar.extractall()
        tar.close()
        
        
        

In [2]:
# Prepare a plain text corpus from which we train a languague model
import glob
import os
import utils

# Gather all text files from directory
LIBRISPEECH_DIRECTORY = os.path.join(os.getcwd(),'LibriSpeech/')
TEDLIUM_DIRECTORY = os.path.join(os.getcwd(),'TEDLIUM_release1/')

# TRAINING_DIRECTORY = os.path.abspath(os.path.join(os.sep,'Volumes',"My\ Passport\ for\ Mac",'lexicon','LibriSpeech'))
dev_librispeech_path = "{}{}{}{}".format(LIBRISPEECH_DIRECTORY, 'dev-clean/', '**/', '*.txt*')
train_librispeech_path = "{}{}{}{}{}".format(LIBRISPEECH_DIRECTORY, 'books/', 'utf-8/', '**/', '*.txt*')
TED_path = "{}{}{}{}".format(TEDLIUM_DIRECTORY,'train/','**/', '*.stm')

text_paths = sorted(glob.glob(train_librispeech_path, recursive=True))
segmented_text_paths = sorted(glob.glob(dev_librispeech_path, recursive=True))
stm_paths = sorted(glob.glob(TED_path, recursive=True))

print('Found:',len(text_paths),"text files in the directories {0}\n{1} segmented text files in the {2} directory and \n{3} stm files in directory: {4}:".format(train_librispeech_path, 
        len(segmented_text_paths), dev_librispeech_path, len(stm_paths),TED_path ))

Found: 41 text files in the directories /src/lexicon/LibriSpeech/books/utf-8/**/*.txt*
97 segmented text files in the /src/lexicon/LibriSpeech/dev-clean/**/*.txt* directory and 
774 stm files in directory: /src/lexicon/TEDLIUM_release1/train/**/*.stm:


### Build Text Corpuses for Training

In [3]:
import tensorflow as tf
import re
import codecs
import string
from lexicon import Lexicon
from speech import Speech
      
corpus_raw = u""
stm_segments = []
speakers = []
lexicons = {} # {speaker_id: lexicon_object}
speeches = {} # {speech_id: speech_object}
segmented_librispeeches = {}

for book_filename in text_paths:
    with codecs.open(book_filename, "r", "utf-8") as book_file:
        lines = book_file.read()
        corpus_raw += lines
            
        
for stm_filename in stm_paths: # Process STM files (Tedlium)
        stm_segments.append(utils.parse_stm_file(stm_filename))

for segments in stm_segments[:10]:
    for segment in segments:
        segment_key = "{0}_{1}_{2}".format(segment.speaker_id.strip(), str(segment.start_time).replace('.','_'),
                                          str(segment.stop_time).replace('.','_'))

        speech = None
        # If not already exist
        if segment.speaker_id not in speeches.keys():
            # Connect to Cloud API to get Candidate Transcripts
            source_file = os.path.join(os.getcwd(), 'TEDLIUM_release1', 'train','sph', '{}.sph'.format(segment.filename))
            speech = Speech(speaker_id=segment.speaker_id,
                                           speech_id = segment_key,
                                           source_file=source_file,
                                           ground_truth = ' '.join(segment.transcript.split()[:-1]),
                                           start = segment.start_time,
                                           stop = segment.stop_time,
                                           audio_type = 'LINEAR16')
        else:
            speech = speeches[segment.speaker_id.strip()]
            print('Already found speech in list at location: ', speech)
        
        speeches[segment_key] = speech
        
        
        # Add Lexicon to list if not already exists
        lexicon = None
        if segment.speaker_id not in lexicons.keys():
            lexicon = Lexicon(base_corpus=corpus_raw, name=segment.speaker_id)
            lexicons[segment.speaker_id.strip()] = lexicon
        else:
            lexicon = lexicons[segment.speaker_id.strip()]
        
        # Add Speech to Lexicon
        if speech not in lexicon.speeches:
            lexicon.add_speech(speech)


### Build Speech Objects from Librispeech Dataset for Training

In [4]:
# Build Speech Objects from Librispeech Dataset for Training
for transcript_file in segmented_text_paths:
    #print(transcript_file)
    with open(transcript_file,"r") as filep:   
        for i,line in enumerate(filep):
            # extracting the text sentence from each line
            speech_id, transcript = line.split()[0], " ".join(line.split()[1:])
            speaker_id, transcript_id, _ = speech_id.split('-')
            librispeech = None
            # If speech not already exist
            if speech_id not in segmented_librispeeches.keys():
                # Connect to Cloud API to get Candidate Transcripts
                source_file = os.path.join(os.getcwd(), LIBRISPEECH_DIRECTORY, 'dev-clean',
                                       speaker_id, transcript_id,'{}.flac'.format(speech_id))

                librispeech = Speech(speaker_id=speaker_id,
                                               speech_id = speech_id,
                                               source_file=source_file,
                                               ground_truth = transcript,
                                               start = 0,
                                               stop = 0,
                                               audio_type = 'FLAC')
            else:
                librispeech = segmented_librispeeches[speech_id]
                print('Already found speech in list at location: ', speech)

            # Add Librispeech to Lexicon for Training
            for speaker_id, lexicon in lexicons.items():
                speech_ids = [speech.speech_id for speech in lexicon.speeches]
                if librispeech.speech_id not in speech_ids:
                    lexicon.add_speech(librispeech)

    # # # Print Loading Report for Lexicons
    # for speaker_id, lexicon in lexicons.items():
    #     lexicon.print_loading_report()

    # #Preprocess and Save Data
    # for speaker_id, lexicon in lexicons.items():
    #     lexicon.preprocess_and_save()


In [5]:
for speaker_id, lexicon in lexicons.items():
    lexicon.preprocess_and_save()
    for speech in lexicon.speeches:
        speech.preprocess_and_save()

### Load Preprocessed Data

In [6]:
# speaker_list = list(lexicons)
# lexicon = speaker_list[0]
# lexicon.print_loading_report()

# print(lexicon.speech_corpus)

In [7]:
# import helper
# import numpy as np

# for speaker_id, lexicon in lexicons.items():
#     cache_file = os.path.join(os.getcwd(), 'datacache', 'lexicon_objects',
#                                        '{}_preprocess.p'.format(speaker_id.strip()))
#     (name,
#      base_corpus,
#      full_corpus,
#      int_text, 
#      vocab_to_int, 
#      int_to_vocab)  = Lexicon.load_preprocess(cache_file)
#     print("{0} int_text length: {1}".format(name, len(int_text)))

### Load Speech Objects

In [8]:
# from speech import Speech
# stm_segments = []

# for stm_filename in stm_paths: # Process STM files (Tedlium)
#         stm_segments.append(utils.parse_stm_file(stm_filename))        

# speakers = []
# speeches = {} # {speech_id: speech_object}

# for segments in stm_segments[:5]:
#     for segment in segments:
#         segment_key = "{0}_{1}_{2}".format(segment.speaker_id.strip(), str(segment.start_time).replace('.','_'),
#                                           str(segment.stop_time).replace('.','_'))

#         speech = None
#         # If not already exist
#         if segment.speaker_id not in speeches.keys():
#             # Connect to Cloud API to get Candidate Transcripts
#             source_file = os.path.join(os.getcwd(), 'TEDLIUM_release1', 'train','sph', '{}.sph'.format(segment.filename))
#             speech = Speech(speaker_id=segment.speaker_id,
#                                            speech_id = segment_key,
#                                            source_file=source_file,
#                                            ground_truth = ' '.join(segment.transcript.split()[:-1]),
#                                            start = segment.start_time,
#                                            stop = segment.stop_time,
#                                            audio_type = 'LINEAR16')
#         else:
#             speech = speeches[segment.speaker_id.strip()]
#             print('Already found speech in list at location: ', speech)
        
#         speeches[segment_key] = speech


### Load GCS Transcripts using GCS Wrapper

In [9]:
from gcs_api_wrapper import GCSWrapper

cache_directory = os.path.join(os.getcwd(), 'datacache', 'speech_objects')
libri_gcs = GCSWrapper(encoding='FLAC')
for speech_id, librispeech in segmented_librispeeches.items():
    try:
        for speech_id, speech in test_speeches.items():
            # Not already saved in prepocess cache
            cache_file = os.path.join(cache_directory,'{}_preprocess.p'.format(speech.speech_id))
            if not os.path.exists(cache_file): 
                result = gcs.transcribe_speech(speech.audio_file)
                speech.populate_gcs_results(result)
                # Print Loading Report
                speech.preprocess_and_save()
    except:
        print('An InvalidArgumentError occured trying to send audio to GCS.')

gcs = GCSWrapper()
for speech_id, speech in speeches.items():
    try:
        # Not already saved in prepocess cache
        cache_file = os.path.join(cache_directory,'{}_preprocess.p'.format(speech.speech_id)
        if not os.path.exists(cache_file)): 
            result = gcs.transcribe_speech(speech.audio_file)
            speech.populate_gcs_results(result)
            # Print Loading Report
            speech.preprocess_and_save()
    except:
        print('An InvalidArgumentError occured trying to send audio to GCS.')



In [10]:
import pickle 
speech_cache_paths = []
lexicon_cache_paths = []
for speaker_id, lexicon in lexicons.items():
    lexicon_cache_path = lexicon.preprocess_and_save()
    lexicon_cache_paths.append(lexicon_cache_path)
    for speech in lexicon.speeches:
        cache_path =speech.preprocess_and_save()
        speech_cache_paths.append({lexicon.name:cache_path})
pickle.dump((lexicon_cache_paths, speech_cache_paths), open('cache_paths_preprocess.p', 'wb'))

In [11]:
print(lexicon_cache_paths)

[None, None, None, None, None, None, None, None, None, None]


In [12]:
print(speech_cache_paths)

[{'AditiShankardass_2009I': '/src/lexicon/datacache/speech_objects/AditiShankardass_2009I_106_72_120_57_preprocess.p'}, {'AditiShankardass_2009I': '/src/lexicon/datacache/speech_objects/AditiShankardass_2009I_121_34_130_71_preprocess.p'}, {'AditiShankardass_2009I': '/src/lexicon/datacache/speech_objects/AditiShankardass_2009I_131_57_140_46_preprocess.p'}, {'AditiShankardass_2009I': '/src/lexicon/datacache/speech_objects/AditiShankardass_2009I_141_16_151_52_preprocess.p'}, {'AditiShankardass_2009I': '/src/lexicon/datacache/speech_objects/AditiShankardass_2009I_152_91_165_68_preprocess.p'}, {'AditiShankardass_2009I': '/src/lexicon/datacache/speech_objects/AditiShankardass_2009I_166_34_174_06_preprocess.p'}, {'AditiShankardass_2009I': '/src/lexicon/datacache/speech_objects/AditiShankardass_2009I_17_09_22_65_preprocess.p'}, {'AditiShankardass_2009I': '/src/lexicon/datacache/speech_objects/AditiShankardass_2009I_174_64_185_49_preprocess.p'}, {'AditiShankardass_2009I': '/src/lexicon/datacach

In [50]:
    def token_lookup():
        """
        Generate a dict to turn punctuation into a token.
        :return: Tokenize dictionary where the key is the punctuation and the value is the token
        """
        return {
            ',': '',
            '(1)': '',
            '(2)': '',
            '(3)': '',
            '(4)': '',
            '(5)': '',
            '(6)': '',
            '(7)': '',
            '(8)': '',
            '(9)': '',
            '"': '',
            ';': '',
            '!': '',
            '?': '',
            '*': '',
            '--': '',
            '{NOISE}': '',
            '{noise}': '',
            '{BREATH}': '',
            '{breath}': '',
            '{UH}': '',
            '{uh}': '',
            '{SMACK}': '',
            '{smack}': '',
            '{COUGH}': '',
            '{cough}': '',
            '<sil>': ','
        }

In [None]:
transcript_raw = u""
targets_raw = u""
    
for speech_dict in speech_cache_paths:
    speech_path = list(speech_dict.values())[0]
    if os.path.exists(speech_path):
        (_speech_id,
         _speaker_id,
         _source_file,
         _audio_file,
         _candidate_transcripts,
         _candidate_timestamps,
         _audio_type,
         _sample_rate, 
         _start_time, 
         _stop_time, 
         _ground_truth_transcript) = Speech.load_preprocess(speech_path)
        

        speech = Speech(_speaker_id,
         _speech_id,
         _source_file,
         _ground_truth_transcript,
         _start_time,
         _stop_time,
         _audio_type,
         _sample_rate)

        
        for canidate_transcript in speech.candidate_transcripts:
            transcript_raw += canidate_transcript["transcript"]+'\n'
            targets_raw += speech.ground_truth_transcript+'\n'
        
        for sent in corpus_raw.split('.'):
            transcript_raw += sent+'\n'
            targets_raw += sent+'\n'


corp_file = open(os.path.join(os.getcwd(),"source_corp.txt"), "w", encoding="utf-8")
transcript_raw = transcript_raw.encode('ascii', 'ignore')
transcript_raw = transcript_raw.decode("utf-8")

token_dict = token_lookup()
for key, token in token_dict.items():
    transcript_raw = transcript_raw.replace(key, ' {} '.format(token))

transcript_raw = transcript_raw.lower()


corp_file.write(transcript_raw)
corp_file.close



corp_file = open(os.path.join(os.getcwd(),"target_corp.txt"), "w", encoding="utf-8")
targets_raw = targets_raw.encode('ascii', 'ignore')
targets_raw = targets_raw.decode("utf-8")


token_dict = token_lookup()
for key, token in token_dict.items():
    targets_raw = targets_raw.replace(key, ' {} '.format(token))

targets_raw = targets_raw.lower()


corp_file.write(targets_raw)
corp_file.close

In [None]:
import helper
lex = list(lexicons.values())[0]
source_path = os.path.join(os.getcwd(),"source_corp.txt")
target_path = os.path.join(os.getcwd(),"target_corp.txt")
source_text = helper.load_data(source_path)
target_text = helper.load_data(target_path)

In [None]:
view_sentence_range = (0, 10)

"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in source_text.split()})))

sentences = source_text.split('\n')
word_counts = [len(sentence.split()) for sentence in sentences]
print('Number of sentences: {}'.format(len(sentences)))
print('Average number of words in a sentence: {}'.format(np.average(word_counts)))

print()
print('Transcript sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(source_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))
print()
print('Ground Truth sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(target_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

In [None]:
def text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int):
    """
    Convert source and target text to proper word ids
    :param source_text: String that contains all the source text.
    :param target_text: String that contains all the target text.
    :param source_vocab_to_int: Dictionary to go from the source words to an id
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :return: A tuple of lists (source_id_text, target_id_text)
    """
    # TODO: Implement Function
    
    
    # source_id_text and target_id_text are a list of lists where each list represent a line. 
    # That's why we use a first split('\n')] (not written in the statements)
    source_list = [sentence for sentence in source_text.split('\n')]
    target_list = [sentence for sentence in target_text.split('\n')]
    
    # Filling the lists
    source_id_text = list()
    target_id_text = list()
    for i in range(len(source_list)):
        source_id_text_temp = list()
        target_id_text_temp = list()
        for word in source_list[i].split():
            source_id_text_temp.append(source_vocab_to_int[word])
        for word in target_list[i].split():
            target_id_text_temp.append(target_vocab_to_int[word])
        # We need to add EOS for target    
        target_id_text_temp.append(target_vocab_to_int['<EOS>'])
        source_id_text.append(source_id_text_temp)
        target_id_text.append(target_id_text_temp)
              
    return source_id_text, target_id_text

In [None]:
import os
import pickle
import copy
import numpy as np
from tensorflow.python.layers.core import Dense
CODES = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3 }

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    """
    vocab = set(text.split())
    vocab_to_int = copy.copy(CODES)

    for v_i, v in enumerate(vocab, len(CODES)):
        vocab_to_int[v] = v_i

    int_to_vocab = {v_i: v for v, v_i in vocab_to_int.items()}

    return vocab_to_int, int_to_vocab


In [None]:
import helper

def preprocess_and_save_data(source_path, target_path, text_to_ids):
    source_text = helper.load_data(source_path)
    target_text = helper.load_data(target_path)

    source_text = source_text.lower()
    target_text = target_text.lower()

    source_vocab_to_int, source_int_to_vocab = create_lookup_tables(source_text)
    target_vocab_to_int, target_int_to_vocab = create_lookup_tables(target_text)

    source_text, target_text = text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int)

    # Save Data
    with open('preprocess.p', 'wb') as out_file:
        pickle.dump((
            (source_text, target_text),
            (source_vocab_to_int, target_vocab_to_int),
            (source_int_to_vocab, target_int_to_vocab)), out_file)

In [None]:
preprocess_and_save_data(source_path, target_path, text_to_ids)

In [None]:
import numpy as np
import helper

(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess()

In [None]:
def model_inputs():
    """
    Create TF Placeholders for input, targets, learning rate, and lengths of source and target sequences.
    :return: Tuple (input, targets, learning rate, keep probability, target sequence length,
    max target sequence length, source sequence length)
    """
    
    inputs = tf.placeholder(tf.int32,[None,None], name = "input")
    targets = tf.placeholder(tf.int32,[None,None], name = "target")
    learning_rate = tf.placeholder(tf.float32, name = "learning_rate")
    keep_probability = tf.placeholder(tf.float32, name = "keep_prob")
    target_sequence_length = tf.placeholder(tf.int32,[None], name = "target_sequence_length")
    max_target_sequence_length = tf.reduce_max(target_sequence_length, name = "max_target_len")
    source_sequence_length = tf.placeholder(tf.int32, [None], name = "source_sequence_length")
    return inputs, targets, learning_rate, keep_probability, target_sequence_length, max_target_sequence_length, source_sequence_length


In [None]:
def process_decoder_input(target_data, target_vocab_to_int, batch_size):
    """
    Preprocess target data for encoding
    :param target_data: Target Placehoder
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param batch_size: Batch Size
    :return: Preprocessed target data
    """
    # TODO: Implement Function
    
    #removing the last word id from each batch in target_data 
    print(target_data)
    target_data = tf.strided_slice(target_data,[0,0],[batch_size,-1],[1,1] )
    #target_data = tf.strided_slice(target_data,[0,0],[int(target_data.shape[0]),int(target_data.shape[1]-1)],[1,1] )
    
    # concat the GO ID to the begining of each batch
    decoder_input = tf.concat([tf.fill([batch_size,1],target_vocab_to_int['<GO>']),target_data],1)
        
    return decoder_input

In [None]:
from imp import reload

def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob, 
                   source_sequence_length, source_vocab_size, 
                   encoding_embedding_size):
    """
    Create encoding layer
    :param rnn_inputs: Inputs for the RNN
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param keep_prob: Dropout keep probability
    :param source_sequence_length: a list of the lengths of each sequence in the batch
    :param source_vocab_size: vocabulary size of source data
    :param encoding_embedding_size: embedding size of source data
    :return: tuple (RNN output, RNN state)
    """
    # TODO: Implement Function
    
    # Embed the encoder input using tf.contrib.layers.embed_sequence
    inputs_embeded = tf.contrib.layers.embed_sequence(
                                    ids = rnn_inputs,
                                    vocab_size = source_vocab_size,
                                    embed_dim = encoding_embedding_size)
    
    # Construct a stacked tf.contrib.rnn.LSTMCell wrapped in a tf.contrib.rnn.DropoutWrapper
    cell = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(num_layers) ])
    # cell_dropout = tf.contrib.rnn.DropoutWrapper(cell, keep_prob)
    
    # Pass cell and embedded input to tf.nn.dynamic_rnn()
    RNN_output, RNN_state = tf.nn.dynamic_rnn(
                                cell = cell,
                                inputs = inputs_embeded,
                                sequence_length = source_sequence_length,
                                dtype = tf.float32)
    
    return RNN_output, RNN_state

In [None]:

def decoding_layer_train(encoder_state, dec_cell, dec_embed_input, 
                         target_sequence_length, max_summary_length, 
                         output_layer, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embed_input: Decoder embedded input
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_summary_length: The length of the longest sequence in the batch
    :param output_layer: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing training logits and sample_id
    """
    # TODO: Implement Function
    
    # Create a tf.contrib.seq2seq.TrainingHelper
    training_helper = tf.contrib.seq2seq.TrainingHelper(
                                            inputs = dec_embed_input,
                                            sequence_length = target_sequence_length)
    
    # Create a tf.contrib.seq2seq.BasicDecoder
    basic_decoder = tf.contrib.seq2seq.BasicDecoder(
                                            cell = dec_cell,
                                            helper = training_helper,
                                            initial_state = encoder_state,
                                            output_layer = output_layer)
    
    # Obtain the decoder outputs from tf.contrib.seq2seq.dynamic_decode
    BasicDecoderOutput = tf.contrib.seq2seq.dynamic_decode(
                                            decoder = basic_decoder,
                                            impute_finished = True,
                                            maximum_iterations = max_summary_length 
                                            )

    return BasicDecoderOutput[0]

In [None]:
def decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id,
                         end_of_sequence_id, max_target_sequence_length,
                         vocab_size, output_layer, batch_size, keep_prob):
    """
    Create a decoding layer for inference
    :param encoder_state: Encoder state
    :param dec_cell: Decoder RNN Cell
    :param dec_embeddings: Decoder embeddings
    :param start_of_sequence_id: GO ID
    :param end_of_sequence_id: EOS Id
    :param max_target_sequence_length: Maximum length of target sequences
    :param vocab_size: Size of decoder/target vocabulary
    :param decoding_scope: TenorFlow Variable Scope for decoding
    :param output_layer: Function to apply the output layer
    :param batch_size: Batch size
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing inference logits and sample_id
    """
    # TODO: Implement Function
    
    # creates a new tensor by replicating start_of_sequence_id batch_size times.
    start_tokens = tf.tile(tf.constant([start_of_sequence_id],dtype = tf.int32),[batch_size], name = 'start_tokens' )
        
    # Create a tf.contrib.seq2seq.GreedyEmbeddingHelper
    embedding_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
                                embedding = dec_embeddings,
                                start_tokens = start_tokens, 
                                end_token = end_of_sequence_id)
    
    # Create a tf.contrib.seq2seq.BasicDecoder
    basic_decoder = tf.contrib.seq2seq.BasicDecoder(
                                                cell = dec_cell,
                                                helper = embedding_helper,
                                                initial_state = encoder_state,
                                                output_layer = output_layer)
    
    # Obtain the decoder outputs from tf.contrib.seq2seq.dynamic_decode
    BasicDecoderOutput = tf.contrib.seq2seq.dynamic_decode(
                                                decoder = basic_decoder,
                                                impute_finished = True,
                                                maximum_iterations = max_target_sequence_length)

    return BasicDecoderOutput[0]

In [None]:
def decoding_layer(dec_input, encoder_state,
                   target_sequence_length, max_target_sequence_length,
                   rnn_size,
                   num_layers, target_vocab_to_int, target_vocab_size,
                   batch_size, keep_prob, decoding_embedding_size):
    """
    Create decoding layer
    :param dec_input: Decoder input
    :param encoder_state: Encoder state
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_target_sequence_length: Maximum length of target sequences
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param target_vocab_size: Size of target vocabulary
    :param batch_size: The size of the batch
    :param keep_prob: Dropout keep probability
    :param decoding_embedding_size: Decoding embedding size
    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    """
    # TODO: Implement Function
    
    # Embed the target sequences
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)
    
    # Construct the decoder LSTM cell (just like you constructed the encoder cell above)
    cell = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(num_layers) ])
    cell_dropout = tf.contrib.rnn.DropoutWrapper(cell, keep_prob)
    
    # Create an output layer to map the outputs of the decoder to the elements of our vocabulary
    output_layer = Dense(target_vocab_size)
                        
    
    # Use the your decoding_layer_train(encoder_state, dec_cell, dec_embed_input, target_sequence_length, 
    # max_target_sequence_length, output_layer, keep_prob) function to get the training logits.
    with tf.variable_scope("decode"):
        Training_BasicDecoderOutput = decoding_layer_train(encoder_state, 
                                                       cell_dropout, 
                                                       dec_embed_input, 
                                                       target_sequence_length, 
                                                       max_target_sequence_length, 
                                                       output_layer, 
                                                       keep_prob)
    
    # Use your decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, 
    # end_of_sequence_id, max_target_sequence_length, vocab_size, output_layer, batch_size, keep_prob) 
    # function to get the inference logits.
    with tf.variable_scope("decode", reuse=True):
        Inference_BasicDecoderOutput = decoding_layer_infer(encoder_state, 
                                                        cell_dropout, 
                                                        dec_embeddings, 
                                                        target_vocab_to_int['<GO>'], 
                                                        target_vocab_to_int['<EOS>'],
                                                        max_target_sequence_length, 
                                                        target_vocab_size,
                                                        output_layer,
                                                        batch_size, 
                                                        keep_prob)
    return Training_BasicDecoderOutput, Inference_BasicDecoderOutput

In [None]:
def seq2seq_model(input_data, target_data, keep_prob, batch_size,
                  source_sequence_length, target_sequence_length,
                  max_target_sentence_length,
                  source_vocab_size, target_vocab_size,
                  enc_embedding_size, dec_embedding_size,
                  rnn_size, num_layers, target_vocab_to_int):
    """
    Build the Sequence-to-Sequence part of the neural network
    :param input_data: Input placeholder
    :param target_data: Target placeholder
    :param keep_prob: Dropout keep probability placeholder
    :param batch_size: Batch Size
    :param source_sequence_length: Sequence Lengths of source sequences in the batch
    :param target_sequence_length: Sequence Lengths of target sequences in the batch
    : max_target_sentence_length,
    :param source_vocab_size: Source vocabulary size
    :param target_vocab_size: Target vocabulary size
    :param enc_embedding_size: Decoder embedding size
    :param dec_embedding_size: Encoder embedding size
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    """
    # TODO: Implement Function
    
    # Encode the input using your encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob,  source_sequence_length, source_vocab_size, encoding_embedding_size).
    rnn_output , rnn_state = encoding_layer(input_data, 
                   rnn_size, 
                   num_layers, 
                   keep_prob, 
                   source_sequence_length, 
                   source_vocab_size, 
                   enc_embedding_size)
    
    # Process target data using your process_decoder_input(target_data, target_vocab_to_int, batch_size) function.
    decoder_input = process_decoder_input(target_data,
                                        target_vocab_to_int,
                                        batch_size)
    
    # Decode the encoded input using your decoding_layer(dec_input, enc_state, target_sequence_length, max_target_sentence_length, 
    # rnn_size, num_layers, target_vocab_to_int, target_vocab_size, batch_size, keep_prob, dec_embedding_size) function.
    Training_BasicDecoderOutput, Inference_BasicDecoderOutput = decoding_layer(
                                        decoder_input,
                                        rnn_state,
                                        target_sequence_length,
                                        max_target_sentence_length,
                                        rnn_size,
                                        num_layers,
                                        target_vocab_to_int,
                                        target_vocab_size,
                                        batch_size,
                                        keep_prob,
                                        dec_embedding_size)
    
    return Training_BasicDecoderOutput, Inference_BasicDecoderOutput

In [None]:
# Number of Epochs
epochs = 10
# Batch Size
batch_size = 64
# RNN Size
rnn_size = 512
# Number of Layers
num_layers = 1

encoding_embedding_size = 256
decoding_embedding_size = 256
# Learning Rate
learning_rate = 0.0005
# Dropout Keep Probability
keep_probability = 0.75
display_step = 100

In [None]:
save_path = 'checkpoints/dev'
(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess()
max_target_sentence_length = max([len(sentence) for sentence in source_int_text])

train_graph = tf.Graph()
with train_graph.as_default():
    input_data, targets, lr, keep_prob, target_sequence_length, max_target_sequence_length, source_sequence_length = model_inputs()

    #sequence_length = tf.placeholder_with_default(max_target_sentence_length, None, name='sequence_length')
    input_shape = tf.shape(input_data)

    train_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                   targets,
                                                   keep_prob,
                                                   batch_size,
                                                   source_sequence_length,
                                                   target_sequence_length,
                                                   max_target_sequence_length,
                                                   len(source_vocab_to_int),
                                                   len(target_vocab_to_int),
                                                   encoding_embedding_size,
                                                   decoding_embedding_size,
                                                   rnn_size,
                                                   num_layers,
                                                   target_vocab_to_int)


    training_logits = tf.identity(train_logits.rnn_output, name='logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')

    masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        # Monitor gradient
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)


In [None]:
def pad_sentence_batch(sentence_batch, pad_int):
    """Pad sentences with <PAD> so that each sentence of a batch has the same length"""
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [pad_int] * (max_sentence - len(sentence)) for sentence in sentence_batch]


def get_batches(sources, targets, batch_size, source_pad_int, target_pad_int):
    """Batch targets, sources, and the lengths of their sentences together"""
    for batch_i in range(0, len(sources)//batch_size):
        start_i = batch_i * batch_size

        # Slice the right amount for the batch
        sources_batch = sources[start_i:start_i + batch_size]
        targets_batch = targets[start_i:start_i + batch_size]

        # Pad
        pad_sources_batch = np.array(pad_sentence_batch(sources_batch, source_pad_int))
        pad_targets_batch = np.array(pad_sentence_batch(targets_batch, target_pad_int))

        # Need the lengths for the _lengths parameters
        pad_targets_lengths = []
        for target in pad_targets_batch:
            pad_targets_lengths.append(len(target))

        pad_source_lengths = []
        for source in pad_sources_batch:
            pad_source_lengths.append(len(source))

        yield pad_sources_batch, pad_targets_batch, pad_source_lengths, pad_targets_lengths


In [None]:
def get_accuracy(target, logits):
    """
    Calculate accuracy
    """
    max_seq = max(target.shape[1], logits.shape[1])
    if max_seq - target.shape[1]:
        target = np.pad(
            target,
            [(0,0),(0,max_seq - target.shape[1])],
            'constant')
    if max_seq - logits.shape[1]:
        logits = np.pad(
            logits,
            [(0,0),(0,max_seq - logits.shape[1])],
            'constant')

    return np.mean(np.equal(target, logits))

# Split data to training and validation sets
train_source = source_int_text[batch_size:]
train_target = target_int_text[batch_size:]
valid_source = source_int_text[:batch_size]
valid_target = target_int_text[:batch_size]
(valid_sources_batch, valid_targets_batch, valid_sources_lengths, valid_targets_lengths ) = next(get_batches(valid_source,
                                                                                                             valid_target,
                                                                                                             batch_size,
                                                                                                             source_vocab_to_int['<PAD>'],
                                                                                                             target_vocab_to_int['<PAD>']))                                                                                                  
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(epochs):
        for batch_i, (source_batch, target_batch, sources_lengths, targets_lengths) in enumerate(
                get_batches(train_source, train_target, batch_size,
                            source_vocab_to_int['<PAD>'],
                            target_vocab_to_int['<PAD>'])):
            _, loss = sess.run(
                [train_op, cost],
                {input_data: source_batch,
                 targets: target_batch,
                 lr: learning_rate,
                 target_sequence_length: targets_lengths,
                 source_sequence_length: sources_lengths,
                 keep_prob: keep_probability})


            if batch_i % display_step == 0 and batch_i > 0:


                batch_train_logits = sess.run(
                    inference_logits,
                    {input_data: source_batch,
                     source_sequence_length: sources_lengths,
                     target_sequence_length: targets_lengths,
                     keep_prob: 1.0})


                batch_valid_logits = sess.run(
                    inference_logits,
                    {input_data: valid_sources_batch,
                     source_sequence_length: valid_sources_lengths,
                     target_sequence_length: valid_targets_lengths,
                     keep_prob: 1.0})

                train_acc = get_accuracy(target_batch, batch_train_logits)
                valid_acc = get_accuracy(valid_targets_batch, batch_valid_logits)
                print('Epoch {:>3} Batch {:>4}/{} - Train Accuracy: {:>6.4f}, Validation Accuracy: {:>6.4f}, Loss: {:>6.4f}'
                      .format(epoch_i, batch_i, len(source_int_text) // batch_size, train_acc, valid_acc, loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_path)
    print('Model Trained and Saved')

In [None]:
# Save parameters for checkpoint
helper.save_params(save_path)

In [None]:
import tensorflow as tf
import numpy as np
import helper

_, (source_vocab_to_int, target_vocab_to_int), (source_int_to_vocab, target_int_to_vocab) = helper.load_preprocess()
load_path = helper.load_params()

In [None]:
def sentence_to_seq(sentence, vocab_to_int):
    """
    Convert a sentence to a sequence of ids
    :param sentence: String
    :param vocab_to_int: Dictionary to go from the words to an id
    :return: List of word ids
    """
    

    # TODO: Implement Function
    
    # Convert the sentence to lowercase and to list
    list_words = [word for word in sentence.lower().split() ]
    
    # Convert words into ids using vocab_to_int
    list_words_int = list()
    for word in list_words:
        # Convert words not in the vocabulary, to the <UNK> word id.
        if word not in vocab_to_int:
            list_words_int.append(vocab_to_int['<UNK>'])
        else:
            list_words_int.append(vocab_to_int[word])
    return list_words_int


### Build Test Set

In [52]:
# Use other TED speeches for building test set
test_speeches = {}
for segments in stm_segments[20:25]:
    for segment in segments:
        segment_key = "{0}_{1}_{2}".format(segment.speaker_id.strip(), str(segment.start_time).replace('.','_'),
                                          str(segment.stop_time).replace('.','_'))

        speech = None
        # If not already exist
        if segment.speaker_id not in test_speeches.keys():
            # Connect to Cloud API to get Candidate Transcripts
            source_file = os.path.join(os.getcwd(), 'TEDLIUM_release1', 'train','sph', '{}.sph'.format(segment.filename))
            speech = Speech(speaker_id=segment.speaker_id,
                                           speech_id = segment_key,
                                           source_file=source_file,
                                           ground_truth = ' '.join(segment.transcript.split()[:-1]),
                                           start = segment.start_time,
                                           stop = segment.stop_time,
                                           audio_type = 'LINEAR16')
        else:
            speech = test_speeches[segment.speaker_id.strip()]
            print('Already found speech in list at location: ', speech)
        
        
        
        test_speeches[segment_key] = speech

In [57]:
gcs = GCSWrapper()
cache_directory = os.path.join(os.getcwd(), 'datacache', 'speech_objects')
for speech_id, speech in test_speeches.items():
    # Not already saved in prepocess cache
    cache_file = os.path.join(cache_directory,'{}_preprocess.p'.format(speech.speech_id))
    if not os.path.exists(cache_file): 
        result = gcs.transcribe_speech(speech.audio_file)
        speech.populate_gcs_results(result)
        # Print Loading Report
        speech.preprocess_and_save()

### Evaluate Model

In [None]:
import nltk

token_dict = token_lookup()
cloud_speech_api_accuracy = []
custom_lang_model_accuracy = []


for speech in list(test_speeches.values()):
    gt_transcript = speech.ground_truth_transcript.lower()
    for key, token in token_dict.items():
        gt_transcript = gt_transcript.replace(key, ' {} '.format(token))
    

    loaded_graph = tf.Graph()
    with tf.Session(graph=loaded_graph) as sess:
        # Load saved model
        loader = tf.train.import_meta_graph(load_path + '.meta')
        loader.restore(sess, load_path)

        input_data = loaded_graph.get_tensor_by_name('input:0')
        logits = loaded_graph.get_tensor_by_name('predictions:0')
        target_sequence_length = loaded_graph.get_tensor_by_name('target_sequence_length:0')
        source_sequence_length = loaded_graph.get_tensor_by_name('source_sequence_length:0')
        keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')

        for candidate_transcript in speech.candidate_transcripts:
            transcription_sentence = sentence_to_seq(candidate_transcript["transcript"], source_vocab_to_int)

            transcription_logits = sess.run(logits, {input_data: [transcription_sentence]*batch_size,
                                                 target_sequence_length: [len(transcription_sentence)*2]*batch_size,
                                                 source_sequence_length: [len(transcription_sentence)]*batch_size,
                                                 keep_prob: 1.0})[0]
            prediction_transcript = " ".join([target_int_to_vocab[i] for i in transcription_logits])
            # Remove <EOS> Token
            prediction_transcript = prediction_transcript.replace('<EOS>','')

            print()
            # print('  Word Ids:      {}'.format([i for i in transcription_sentence]))
            print(' GCS Candidate Transcript: \n{}'.format(" ".join([source_int_to_vocab[i] for i in transcription_sentence])))
            # print('  Word Ids:      {}'.format([i for i in transcription_logits]))
            print('  Seq2Seq Model Prediction Transcript: \n{}'.format(prediction_transcript))
            print('  Ground Truth Transcript: \n{}'.format(gt_transcript))
            print()

            # Compute the Accuracy, based on the Levenshtein Distance (a.k.a. Edit Distance)
            gcs_ed = nltk.edit_distance(candidate_transcript["transcript"].lower(), speech.ground_truth_transcript.lower())
            gcs_upper_bound = max(len(candidate_transcript["transcript"]),len(gt_transcript))
            gcs_accuracy = (1.0 - gcs_ed/gcs_upper_bound)

            clm_ed = nltk.edit_distance(prediction_transcript.lower(), gt_transcript)
            clm_upper_bound = max(len(prediction_transcript),len(gt_transcript))
            clm_accuracy = (1.0 - clm_ed/clm_upper_bound)

            cloud_speech_api_accuracy.append(gcs_accuracy)
            custom_lang_model_accuracy.append(clm_accuracy)
        print('Average Candidate Transcript Accuracy:', np.mean(cloud_speech_api_accuracy))
        print('Average Seq2Seq Model Accuracy:', np.mean(custom_lang_model_accuracy))
        print()


INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
the <UNK> of the brain but we can also <UNK> the <UNK> in <UNK> <UNK> of <UNK> does <UNK> and <UNK> <UNK> <UNK> of oxygen that means that it's possible to map out the activity of the brain
  Seq2Seq Model Prediction Transcript: 
the thing is gorgeous they are the bible says that if two men are in a fight and the wife of my life because i decided to follow those and actually go through they were in my face 
  Ground Truth Transcript: 
the structure of the brain but we can    also measure      the difference    in magnetic properties of blood    that's      oxygenated    and   blood that's depleted of oxygen      that means that it's possible to map      out    the   activity   of the brain          


 GCS Candidate Transcript: 
the <UNK> of the brain but we can also <UNK> the <UNK> in <UNK> <UNK> of <UNK> that's <UNK> and <UNK> <UNK> <UNK> of oxygen that means that it's possible to map out the activi


 GCS Candidate Transcript: 
and friends to <UNK> and this is something that has been approximately 400 <UNK> so far just in the part of <UNK> that i come from that has been going to go and go to both of us in the past 4 years
  Seq2Seq Model Prediction Transcript: 
and it is my time every time for me is what they do that they would say well what can you dressed in the park and i started to realize the hundreds of little things that go right every day and 
  Ground Truth Transcript: 
in the forensic   case and   {um} this is something that    there's been      approximately four hundred cases so far    just in the {um} part of sweden that   i come from    that   has   been   undergoing virtual autopsies in the past four years   


 GCS Candidate Transcript: 
the friends to <UNK> and this is something that has been approximately 400 <UNK> so far just in the part of <UNK> that i come from that has been going to go and go to both of us in the past 4 years
  Seq2Seq Model Prediction Transc


 GCS Candidate Transcript: 
and you know when he <UNK> in i could just tell that he
  Seq2Seq Model Prediction Transcript: 
and you know you end up acting like a crazy person and stoning adulterers 
  Ground Truth Transcript: 
   and you know   {um} when he   walked in i could just   tell that   he   


 GCS Candidate Transcript: 
you know when he <UNK> in i could just tell that you
  Seq2Seq Model Prediction Transcript: 
but it's really interesting because i was doing my year of my 
  Ground Truth Transcript: 
   and you know   {um} when he   walked in i could just   tell that   he   

Average Candidate Transcript Accuracy: 0.545399143041
Average Seq2Seq Model Accuracy: 0.285686926143

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
the <UNK> have been burned to the <UNK> and so this woman this is young son <UNK> i <UNK> we will have <UNK> me of the most that i have learned about race and <UNK>
  Seq2Seq Model Prediction Transcript: 
the second 


 GCS Candidate Transcript: 
you can see that there is <UNK> or <UNK> on the <UNK> of metal in the <UNK> that said that's what does <UNK> are coming from
  Seq2Seq Model Prediction Transcript: 
and it is that they say in the rules that will get you into a little trouble in twenty one st century america and that 
  Ground Truth Transcript: 
you can   see that there is scattering of x rays {um} on      the teeth    the metal    in the teeth that's    where      those artifacts are   coming from      but


 GCS Candidate Transcript: 
you can see that there is <UNK> or <UNK> on the <UNK> of metal in the <UNK> that said that's what it does <UNK> are coming from
  Seq2Seq Model Prediction Transcript: 
and it is that they say in the rules that will get you into a little trouble in twenty one st century america and they focus on 
  Ground Truth Transcript: 
you can   see that there is scattering of x rays {um} on      the teeth    the metal    in the teeth that's    where      those artifacts 


 GCS Candidate Transcript: 
i'm just by changing the functions and i can decide what's going to be <UNK> and what's going to be visible i can look at the the skull <UNK> and i can see that okay this is where they open up the skeleton this woman and that's what i've been in
  Seq2Seq Model Prediction Transcript: 
my behavior changed my thoughts this was i was doing my year because i spent in the park and i started to realize the hundreds of arcane and obscure laws that are in my biblical clothing sandals and a white robe you know because again the outer 
  Ground Truth Transcript: 
and   just by changing the functions then i can   decide what's   going to be transparent    and what's going to be visible      i can      look at      the    skull structure    and   i can   see that okay this is where they opened up the skull    on   this woman    and that's where they went in     


 GCS Candidate Transcript: 
and just by changing the functions and i can decide what's going to be <UNK> a


 GCS Candidate Transcript: 
he's moving his pain at towards the heart and the heart is not beating in front of him so you can see how the <UNK> beating is taking the pain and is moving at two words to heart and he's <UNK> it on the heart and then it <UNK> the <UNK>
  Seq2Seq Model Prediction Transcript: 
they say jesus did talk a lot of the huge is there's my life for me so that if anyone men and they go through these amazing mental gymnastics to accomplish this and i will say though it's just this interaction and you know they were pebbles 
  Ground Truth Transcript: 
and   he   's   moving his   pen    towards   the heart    and the heart is now beating in front of him      so he   can see how the heart is beating    he      's taken the pen    and he   's moving it towards   the heart      and he   's   putting   it on   the heart and   then he   feels the heartbeats     


 GCS Candidate Transcript: 
he's moving his pain at towards the heart and the heart is not beating in front o

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
society in this is called <UNK> the <UNK>
  Seq2Seq Model Prediction Transcript: 
there's me reading the bible that's how i hailed taxi cabs 
  Ground Truth Transcript: 
in society and this is called    swallowing the bitterness


 GCS Candidate Transcript: 
society this is called <UNK> the <UNK>
  Seq2Seq Model Prediction Transcript: 
there's me reading the bible so 
  Ground Truth Transcript: 
in society and this is called    swallowing the bitterness


 GCS Candidate Transcript: 
society and this is called <UNK> the <UNK>
  Seq2Seq Model Prediction Transcript: 
there's me reading the bible that's how i hailed taxi cabs 
  Ground Truth Transcript: 
in society and this is called    swallowing the bitterness


 GCS Candidate Transcript: 
society in this is called a <UNK> the <UNK>
  Seq2Seq Model Prediction Transcript: 
there's me reading the bible that's how i hailed taxi cabs 
  Ground Truth Transc


 GCS Candidate Transcript: 
he would use a bell he had about because he had this <UNK> <UNK> thing that my issue was having <UNK> with a little <UNK> and they was <UNK> then each other that would be his reason i'm just talking about the <UNK> night
  Seq2Seq Model Prediction Transcript: 
and they range from the famous ones that i had heard of a cake versus another guy not wearing clothes of mixed fabrics would be quite in head of the bible and i became a little bit of a better person so 
  Ground Truth Transcript: 
oh he would use   a belt he had a    belt because   he   had this    warped perverted thing that myesha was having sex with her    little brother and they was    fondling   each other that would be his reason i'm   just      talking about the particular   night   


 GCS Candidate Transcript: 
oh he would use a bell he had about because he had this <UNK> <UNK> thing that my issue is having <UNK> with a little <UNK> and they was <UNK> then each other that would be his reason

  Seq2Seq Model Prediction Transcript: 
definitely take my projects seriously but {um} why are the best month of my year because i am a workaholic so having this one day where you cannot work it really 
  Ground Truth Transcript: 
in a more    clinical      situation    there's a youtube that      you can   download and   look at this if you want to convey the   information   to other people about virtual        


 GCS Candidate Transcript: 
any more action <UNK> up a <UNK> that's a youtube video that you can <UNK> and look at this if you want to <UNK> the information to other people about <UNK>
  Seq2Seq Model Prediction Transcript: 
saying that they distort all the data to fit their model and they go through these amazing mental gymnastics to accomplish this and i will say though 
  Ground Truth Transcript: 
in a more    clinical      situation    there's a youtube that      you can   download and   look at this if you want to convey the   information   to other people about virtual

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
we told the <UNK> that we had like lost
  Seq2Seq Model Prediction Transcript: 
there's me reading the bible that's how i hailed taxi cabs 
  Ground Truth Transcript: 
and      we    told a police that   we had like    lost her


 GCS Candidate Transcript: 
we told the <UNK> that we had life lost
  Seq2Seq Model Prediction Transcript: 
we went to the bible that's how i hailed taxi cabs 
  Ground Truth Transcript: 
and      we    told a police that   we had like    lost her


 GCS Candidate Transcript: 
we told the <UNK> that we had like loss
  Seq2Seq Model Prediction Transcript: 
there's me reading the bible that's how i hailed taxi cabs 
  Ground Truth Transcript: 
and      we    told a police that   we had like    lost her


 GCS Candidate Transcript: 
we told the <UNK> that we had life loss
  Seq2Seq Model Prediction Transcript: 
we destroyed the bible is my year as i 
  Ground Truth Transcript: 


 GCS Candidate Transcript: 
no i didn't give you the year but in <UNK> i thought that i was going to go around and find <UNK> <UNK> and pig <UNK> and people like that and i got <UNK> on race <UNK>
  Seq2Seq Model Prediction Transcript: 
but i've become increasingly interested in religion i do i met with creationists i went to me and i like to immerse myself in my topics {um} i just like my corners were so that 
  Ground Truth Transcript: 
you know   i didn't   give you the year but in seventy   nine i thought that i was going to go around and find bull riders and pig farmers and people like that and   i got sidetracked on race relations    finally    i did find a bull rider      two years   ago    and i've    been   going to the rodeos with   him and we've bonded   


 GCS Candidate Transcript: 
no i didn't give you the year but in <UNK> i thought that i was going to go around and find <UNK> <UNK> in pig <UNK> in people like that and i got <UNK> on race <UNK>
  Seq2Seq Model Predictio

Average Candidate Transcript Accuracy: 0.550173827365
Average Seq2Seq Model Accuracy: 0.275538302384

INFO:tensorflow:Restoring parameters from checkpoints/dev
Average Candidate Transcript Accuracy: 0.550173827365
Average Seq2Seq Model Accuracy: 0.275538302384

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
if i push a little bit <UNK> i'll go through the skin and i can feel it in the <UNK> <UNK> inside if i push even <UNK> i'll go through the <UNK> <UNK> <UNK> close to the ear where the <UNK> is very soft and then i can feel the rain inside in the <UNK>
  Seq2Seq Model Prediction Transcript: 
a movement so if you can 't you have a pamphlet that says here 's what jesus said about homosexuality and you open it up and there's nothing in the bible and i became a little bit of a better person and i became a little bit of a better person and 
  Ground Truth Transcript: 
if i push a little bit harder i'll go through the skin and   i can feel    the {um


 GCS Candidate Transcript: 
if i push a little bit <UNK> i'll go through the skin and i can feel it in the <UNK> <UNK> inside if i push even <UNK> i'll go through the <UNK> <UNK> <UNK> close to the ear where the <UNK> is very soft and then i can feel the rain inside in the <UNK>
  Seq2Seq Model Prediction Transcript: 
a movement so if you can 't you have a pamphlet that says here 's what jesus said about homosexuality and you open it up and there's nothing in the bible and i became a little bit of a better person and i became a little bit of a better person and 
  Ground Truth Transcript: 
if i push a little bit harder i'll go through the skin and   i can feel    the {um} bone structure inside    if i push    even harder    i'll go through the bone structure especially   close to the      ear   where   the bone is very soft      and then i can   feel the brain inside and      this   

Average Candidate Transcript Accuracy: 0.558181094564
Average Seq2Seq Model Accuracy: 0.275024981331



 GCS Candidate Transcript: 
<UNK> when they see him <UNK> it is at <UNK> functions like this and even there it is him who <UNK> them
  Seq2Seq Model Prediction Transcript: 
that they distort all these big people {um} why would be much of what they say also thou shalt not just the sabbath can be 
  Ground Truth Transcript: 
and      when they see him      physically it      is at    public    functions    like    this    and   even there      it    is    him {um} who advises them      we    have        


 GCS Candidate Transcript: 
and when they see him <UNK> it is at <UNK> functions like this and even there it is him who <UNK> them
  Seq2Seq Model Prediction Transcript: 
and they range from my life for me so she sat in every seat and they were not stupid people at all 
  Ground Truth Transcript: 
and      when they see him      physically it      is at    public    functions    like    this    and   even there      it    is    him {um} who advises them      we    have        


 GCS 

Average Candidate Transcript Accuracy: 0.539139219313
Average Seq2Seq Model Accuracy: 0.272326400955

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
just played all of the world <UNK> <UNK> for <UNK> to go to <UNK> they did not so there were <UNK> and <UNK> i be able to get if there was a second <UNK> order by george bush <UNK> and
  Seq2Seq Model Prediction Transcript: 
this is all of the last things about the year is that they say also the bible says that if they do i don't know but it's not a better person and stoning adulterers or here and they focus on 
  Ground Truth Transcript: 
was   played all over the world everybody thought the four cops would go to jail they did not so there were   riots    and    what a lot of people forget   is there was a second trial      ordered by george      bush      sr        and that


 GCS Candidate Transcript: 
just played all of the world <UNK> thought the for <UNK> to go to <UNK> they did not so there we


 GCS Candidate Transcript: 
i mean funny things like this happened i was in it i was in the <UNK> office last cat scan and there was a <UNK> <UNK> <UNK> <UNK> <UNK> seven ways to get lucky
  Seq2Seq Model Prediction Transcript: 
i undertook this for two reasons in religion in america so i was thinking i was thinking i was doing all these rituals these biblical rituals separating my life 
  Ground Truth Transcript: 
and   {um} i mean funny things like this    happen i was   in    a    doctor 's   office last   cat scan    and there was a reader 's   digest    october two thousand   and two    it was like seven ways to get lucky        


 GCS Candidate Transcript: 
i mean funny things like this happened i was in it i was in the <UNK> office last <UNK> and there was a <UNK> <UNK> <UNK> <UNK> sock seven ways to get lucky
  Seq2Seq Model Prediction Transcript: 
i undertook this for two reasons in religion in america so i was thinking i was doing my life because i couldn't believe in the b


 GCS Candidate Transcript: 
and so he put her in the <UNK> and i was in the <UNK> with the baby at four months before this happened before months before my <UNK> i thought i could really fix this
  Seq2Seq Model Prediction Transcript: 
and it was about it in the year i spent reading the encyclopedia britannica from a god there's my quest to learn everything in my book and i will say though it's not quite there 
  Ground Truth Transcript: 
and so he put her in the    bathtub {um} and i was in the bedroom with   the baby      and four months before   this happened    four months    before myesha died    i    thought i could really   fix this man


 GCS Candidate Transcript: 
and so he put her in the <UNK> and i was in the <UNK> with the baby at 4 months before this happened before months before my <UNK> i thought i could really fix this
  Seq2Seq Model Prediction Transcript: 
and it was about it in the year i spent reading the encyclopedia britannica from a god there's my quest to learn

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
came back with two two <UNK> going to <UNK> and two <UNK> <UNK> <UNK> i was at that <UNK>
  Seq2Seq Model Prediction Transcript: 
here were also very interested in religion because she sat in every seat in our apartment and i 
  Ground Truth Transcript: 
came back with two    cops    going to   jail and   two    cops declared innocent i was   at that trial       


 GCS Candidate Transcript: 
i'm back with two two <UNK> going to <UNK> and two <UNK> <UNK> <UNK> i was at that <UNK>
  Seq2Seq Model Prediction Transcript: 
i'm very proud because this was very offensive so she sat in every seat in our apartment and 
  Ground Truth Transcript: 
came back with two    cops    going to   jail and   two    cops declared innocent i was   at that trial       


 GCS Candidate Transcript: 
came back with two two <UNK> going to <UNK> in two <UNK> <UNK> <UNK> i was at that <UNK>
  Seq2Seq Model Prediction Transcrip


 GCS Candidate Transcript: 
all around the hair was just <UNK> and he was about two <UNK>
  Seq2Seq Model Prediction Transcript: 
and {um} the bible about how i hailed taxi cabs 
  Ground Truth Transcript: 
all around her    head was just   swollen her head was about two sizes   of


 GCS Candidate Transcript: 
all around her hair was just <UNK> and he was about two <UNK>
  Seq2Seq Model Prediction Transcript: 
and varied movement that {um} was a very job of laws that 
  Ground Truth Transcript: 
all around her    head was just   swollen her head was about two sizes   of

Average Candidate Transcript Accuracy: 0.512301758634
Average Seq2Seq Model Accuracy: 0.265179243473

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
and i still feel as though i've got to get <UNK> for my <UNK> <UNK> even though i've <UNK> most of them and she said and then <UNK> i don't know enough about the negative imagination in <UNK> <UNK> certainly <UNK> us but that's a w


 GCS Candidate Transcript: 
and i still feel as though i've got to get <UNK> for my <UNK> <UNK> even though i've <UNK> most of them and she said and then <UNK> i don't know enough about the negative imagination in <UNK> <UNK> certainly <UNK> us that that's a whole area we don't <UNK> so
  Seq2Seq Model Prediction Transcript: 
and linen and i would ask these religious people who do what do i disagree with hundreds of years and it's sort of evolved it's not about what they said you cannot say it really the idea of sacredness and that our rituals can be sacred the sabbath can be 
  Ground Truth Transcript: 
and   i still feel as   though i've      got to get coffee for   my male colleagues even though i've outlived most of them    and she said      and then      intellectually      i don't   know   enough about the negative imagination and    september eleventh    certainly taught us that   that's a whole area we don't investigate so   


 GCS Candidate Transcript: 
and i still feel as t


 GCS Candidate Transcript: 
10 years in time when i go to the <UNK> to buy my first <UNK> <UNK> it was a huge machine it had it was cabinets of <UNK> and <UNK> and everything
  Seq2Seq Model Prediction Transcript: 
and i've really interesting because i was the year as i was in my biblical clothing sandals and a white robe you know but again the outer 
  Ground Truth Transcript: 
ten years   in time when i got    the funding to buy my first graphics computer    it was a huge   machine    it was cabinets of processors and storage   and   everything        


 GCS Candidate Transcript: 
10 years in time when i go to the <UNK> to buy my first <UNK> <UNK> it was a huge machine it was cabinets of <UNK> and <UNK> and everything
  Seq2Seq Model Prediction Transcript: 
and i've really interesting because i was the year because i was doing to the bible and you know they were not stupid people at all things that 
  Ground Truth Transcript: 
ten years   in time when i got    the funding to buy my


 GCS Candidate Transcript: 
one of the <UNK> i'm really really happy to be able to show you here today is our go to an <UNK> table
  Seq2Seq Model Prediction Transcript: 
and they range from the time the year is that i almost pretended to be a better person and stoning adulterers or here 
  Ground Truth Transcript: 
one   of the things that i'm   {um} really really happy    to be able to show you here today is    our      virtual {um} autopsy table          

Average Candidate Transcript Accuracy: 0.533175980082
Average Seq2Seq Model Accuracy: 0.265049422567

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
so we've been <UNK> all the <UNK> that you can do them on the table and you can think of it as an as an <UNK>
  Seq2Seq Model Prediction Transcript: 
that they distort all the data to fit their model and they go through these amazing mental gymnastics to accomplish this and i will say though 
  Ground Truth Transcript: 
so we've    implemented 


 GCS Candidate Transcript: 
and they're <UNK> and human body
  Seq2Seq Model Prediction Transcript: 
oh my did was that i hailed taxi cabs 
  Ground Truth Transcript: 
quiet    and they're efficient    and there's    a voice on the train    you know   the    voice      was   a      human voice      you see in the      old days      we   


 GCS Candidate Transcript: 
and they're <UNK> and there's a human body
  Seq2Seq Model Prediction Transcript: 
and they wanted it to be a great job 
  Ground Truth Transcript: 
quiet    and they're efficient    and there's    a voice on the train    you know   the    voice      was   a      human voice      you see in the      old days      we   


 GCS Candidate Transcript: 
and they're <UNK>
  Seq2Seq Model Prediction Transcript: 
and you know they were in
  Ground Truth Transcript: 
quiet    and they're efficient    and there's    a voice on the train    you know   the    voice      was   a      human voice      you see in the      old days      


 GCS Candidate Transcript: 
i want <UNK> to think we were a <UNK> family i mean we had all the <UNK> list
  Seq2Seq Model Prediction Transcript: 
i had some very preconceived notions about for instance evangelical christianity and i found that it's such a wide 
  Ground Truth Transcript: 
i want everybody to think we were      a normal family   i mean we had all the materialistic


 GCS Candidate Transcript: 
i want <UNK> think we were a <UNK> family i mean we had all the <UNK> list
  Seq2Seq Model Prediction Transcript: 
i had some very preconceived notions about for instance evangelical christianity and i found that it's such a wide 
  Ground Truth Transcript: 
i want everybody to think we were      a normal family   i mean we had all the materialistic


 GCS Candidate Transcript: 
i <UNK> if i think we were a <UNK> family i mean we had all the material list
  Seq2Seq Model Prediction Transcript: 
i had some of these authors and editors over a couple of years ago i wrote an article 


 GCS Candidate Transcript: 
and it again is <UNK> <UNK> so you can <UNK> and you can look at things in real time and on the <UNK> here without saying too much about this case this is a <UNK> <UNK> <UNK> <UNK> that hit a woman
  Seq2Seq Model Prediction Transcript: 
and it is really interesting by an authors and editors over hundreds of years and they said about homosexuality and you open it up and there's nothing to talk a little bit of a better person and stoning adulterers or here 
  Ground Truth Transcript: 
   and    again it's fully interactive   so you can        rotate and   you can   look at things in real time    on        these systems here    without saying too much about this case this is a traffic accident {um} a    drunk    driver    hit a woman   


 GCS Candidate Transcript: 
again it's <UNK> <UNK> so you can <UNK> and you can look at things in real time and on the <UNK> here without saying too much about this case this is a <UNK> <UNK> <UNK> <UNK> that hit the woman
 

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
<UNK> king
  Seq2Seq Model Prediction Transcript: 
ewing who played 
  Ground Truth Transcript: 
they became     


 GCS Candidate Transcript: 
did <UNK>
  Seq2Seq Model Prediction Transcript: 
a year 
  Ground Truth Transcript: 
they became     


 GCS Candidate Transcript: 
<UNK> inn
  Seq2Seq Model Prediction Transcript: 
and so i 
  Ground Truth Transcript: 
they became     


 GCS Candidate Transcript: 
then became
  Seq2Seq Model Prediction Transcript: 
and i did 
  Ground Truth Transcript: 
they became     


 GCS Candidate Transcript: 
<UNK> <UNK>
  Seq2Seq Model Prediction Transcript: 
and you know they
  Ground Truth Transcript: 
they became     


 GCS Candidate Transcript: 
did the king
  Seq2Seq Model Prediction Transcript: 
that was fall in 
  Ground Truth Transcript: 
they became     


 GCS Candidate Transcript: 
did the kid
  Seq2Seq Model Prediction Transcript: 
that i did anyway 
 


 GCS Candidate Transcript: 
just about to close an advice without losing a <UNK> just because of like a country where they <UNK> <UNK>
  Seq2Seq Model Prediction Transcript: 
a movement so that are actually a lot of time so they answered my emails they answered my phone they argued 
  Ground Truth Transcript: 
and    they're just   about to   close      the pneumatic    doors    and that      voice    without      losing a    beat    says      because of late      entry we're   delayed thirty   seconds      just


 GCS Candidate Transcript: 
just about to close an advice without losing a bit just because of like a country where they like <UNK> seconds
  Seq2Seq Model Prediction Transcript: 
about it was a big lesson of a cake versus another guy not wearing clothes of mixed fabrics would the martians say well that 
  Ground Truth Transcript: 
and    they're just   about to   close      the pneumatic    doors    and that      voice    without      losing a    beat    says      because o

  Seq2Seq Model Prediction Transcript: 
and they range from the famous ones that i had heard of the bible so 
  Ground Truth Transcript: 
and   famine      although    they're part and   parcel    of    our    african   reality    they are not      the   only    reality      and   secondly    they are the smallest    reality     


 GCS Candidate Transcript: 
order that part and <UNK> of <UNK> reality they're not the only reality reality
  Seq2Seq Model Prediction Transcript: 
and they range from my year to live those of the bible is the 
  Ground Truth Transcript: 
and   famine      although    they're part and   parcel    of    our    african   reality    they are not      the   only    reality      and   secondly    they are the smallest    reality     


 GCS Candidate Transcript: 
order the part and <UNK> of <UNK> reality they are not the only reality reality
  Seq2Seq Model Prediction Transcript: 
so that the law in leviticus you cannot shave the corners of your beard i didn't kn


 GCS Candidate Transcript: 
kind of an area i would say and then many <UNK> that i have <UNK> knows that the los angeles <UNK> happened before because for <UNK> <UNK> up a black man man named <UNK> king was <UNK> on <UNK> technology
  Seq2Seq Model Prediction Transcript: 
where i hired a team of people in bangalore india to live my life and i started to realize the hundreds of arcane and obscure laws that are not stupid people who don't know but they focus on high 
  Ground Truth Transcript: 
it's a kind of an      aria      i would say and in many    tapes that   i have      everybody knows that   the los angeles riots happened    because      four cops      beat up a black    man named rodney king it was captured on      video tape      technology     


 GCS Candidate Transcript: 
it's kind of an area i would say and then many <UNK> that i have <UNK> knows that the los angeles <UNK> happened before because for <UNK> <UNK> up a black man man named <UNK> king was <UNK> on <UNK> techn


 GCS Candidate Transcript: 
<UNK> <UNK> and <UNK>
  Seq2Seq Model Prediction Transcript: 
you know that anyway 
  Ground Truth Transcript: 
korean   victims   and    other    victims   who


 GCS Candidate Transcript: 
and <UNK> and <UNK>
  Seq2Seq Model Prediction Transcript: 
and so i did anyway 
  Ground Truth Transcript: 
korean   victims   and    other    victims   who


 GCS Candidate Transcript: 
<UNK> and the <UNK>
  Seq2Seq Model Prediction Transcript: 
what you did a 
  Ground Truth Transcript: 
korean   victims   and    other    victims   who


 GCS Candidate Transcript: 
the <UNK> and <UNK>
  Seq2Seq Model Prediction Transcript: 
the next day 
  Ground Truth Transcript: 
korean   victims   and    other    victims   who


 GCS Candidate Transcript: 
<UNK> <UNK> and <UNK> who
  Seq2Seq Model Prediction Transcript: 
oh my life is a 
  Ground Truth Transcript: 
korean   victims   and    other    victims   who


 GCS Candidate Transcript: 
<UNK> and other <UNK> who
  Seq2Seq Mo


 GCS Candidate Transcript: 
is using <UNK> <UNK> and <UNK> <UNK> to scan of the brain or any part of the body is what we really getting out of this is information
  Seq2Seq Model Prediction Transcript: 
rule that was difficult to obey was the rules that will get you into a little trouble in twenty one st where you cannot work it really the lord is 
  Ground Truth Transcript: 
is using magnetic fields and radio frequencies to scan    the brain    or any part of the body    so what   we're   really   getting out of this      is      information       

Average Candidate Transcript Accuracy: 0.472445985899
Average Seq2Seq Model Accuracy: 0.254260311629

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
on the show my face up <UNK> had to <UNK> out my nose and they took these <UNK> and <UNK> them up
  Seq2Seq Model Prediction Transcript: 
on the thing is that are the first is that i was doing my year doing my corners were so i decided 
  Ground Truth T


 GCS Candidate Transcript: 
i would not have my new <UNK> for a <UNK> i would still be on my bad hip that was so <UNK> so i left his office and i was walking to the hospital and that's when i had my <UNK>
  Seq2Seq Model Prediction Transcript: 
i was able to stone one adulterer and editors over a better person and i work in my book because i am a workaholic so having this one day where you cannot work it really was that they don't know but 
  Ground Truth Transcript: 
i    would not have   my new   hip for   ted two thousand and eight    i would still be on   my bad hip    that was so disappointing    so i left his office    and i was      walking through the hospital    and that's when   i had my epiphany   


 GCS Candidate Transcript: 
i would not have my new have for a <UNK> i would still be on my bad hip that was so <UNK> so i left his office and i was walking to the hospital and that's when i had my <UNK>
  Seq2Seq Model Prediction Transcript: 
i got out by religion in religion 


 GCS Candidate Transcript: 
this is something that we could use as a <UNK> to really understand how the nearest working how the brain is working and we can do this very very high <UNK> <UNK> and very fast <UNK>
  Seq2Seq Model Prediction Transcript: 
this is started by this is a year to happiness for me so that we all the bible possibly tell us to be picking and choosing the key is the bigger so that 
  Ground Truth Transcript: 
   this is something that we could use   as a tool    to    really understand how the neurons are working how the brain is working    and   we can   do this with   very very high visual quality    and   very fast resolution        


 GCS Candidate Transcript: 
is this something that we could use as a <UNK> to really understand how the nearest work and how the brain is working and we can do this very very high <UNK> <UNK> and very fast <UNK>
  Seq2Seq Model Prediction Transcript: 
there's this is this is actually a key to happiness for me is to just the bible 


 GCS Candidate Transcript: 
<UNK> <UNK> a little bit of a challenge the challenge of <UNK> with data data that we have to deal with in a medical
  Seq2Seq Model Prediction Transcript: 
a movement or of the bible is a lot of people who are the bible be a little trouble in twenty one st century america and 
  Ground Truth Transcript: 
        start    by      posing a little bit of a challenge    the challenge    of    dealing with data        data   that we have   to deal with      in    medical       

Average Candidate Transcript Accuracy: 0.473742174569
Average Seq2Seq Model Accuracy: 0.256067529395

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
that would <UNK> to get 1 <UNK> of data that's eight hundred thousand books and <UNK> <UNK> of phone books does one patient one day that's it and this is what we have to deal with
  Seq2Seq Model Prediction Transcript: 
and they range from my year doing an amazing because i do that they wanted it to b


 GCS Candidate Transcript: 
this is really nice and they are to take that even further this is his heart and this is all so new to this fantastic new <UNK> that justin is <UNK> seconds so i can scan the whole heart
  Seq2Seq Model Prediction Transcript: 
this is started by a psychologist in virginia who says that you should never ever lie except maybe during poker and golf his only exceptions and it is what they say also thou shalt not just the only 
  Ground Truth Transcript: 
so this is really nice    and      to take that even further this is a    heart    and   this is also {um} due to these fantastic new   scanners that just in    zero point three seconds   i can   scan the whole heart     


 GCS Candidate Transcript: 
this is really nice and they are to take that even further this isn't heart and this is all so new to this fantastic new <UNK> that justin is <UNK> seconds so i can scan the whole heart
  Seq2Seq Model Prediction Transcript: 
this is started to a psychologist in v


 GCS Candidate Transcript: 
fantastic <UNK> that uses <UNK> x-ray <UNK> that are <UNK> very fast around the <UNK> text about <UNK> seconds to go through the whole machine
  Seq2Seq Model Prediction Transcript: 
jewish shalt not harmful but no one went hungry and there's them to be a little trouble in twenty one st century america and 
  Ground Truth Transcript: 
it's a fantastic device it uses x rays    x ray beams that are   rotating very fast around the human   body    it takes about thirty seconds   to go through the whole machine     


 GCS Candidate Transcript: 
fantastic <UNK> that uses <UNK> x-ray <UNK> that are <UNK> very fast around the human body takes about <UNK> seconds to go through the whole machine
  Seq2Seq Model Prediction Transcript: 
jewish shalt not harmful but {um} rituals by themselves are not to be dismissed and it is a movement that rule that 
  Ground Truth Transcript: 
it's a fantastic device it uses x rays    x ray beams that are   rotating very fast around


 GCS Candidate Transcript: 
do you think about the data that we can <UNK> today just don't know <UNK> mobile <UNK> if you translate that's to phone books it's about one <UNK> of phone books in the
  Seq2Seq Model Prediction Transcript: 
that you tell you change your mind and you know they expressed interest in the bible and you open it up and there's nothing in the bible and threw them at my face 
  Ground Truth Transcript: 
you think about the data      we can   handle today just   on {um} normal    mobile devices    if you translate that    to        phone books    it's about one meter of phone books in the {um}     


 GCS Candidate Transcript: 
do you think about the data that we can <UNK> today just don't know <UNK> mobile <UNK> if you translate that's to phone books it's about one meter of phone books in the
  Seq2Seq Model Prediction Transcript: 
that you tell you change your mind and you know they expressed interest in the bible and you open it up and there's nothing in the bi


 GCS Candidate Transcript: 
show me where you're living <UNK> and he can <UNK> how the heart is moving they can go in side ocean side of the heart every to see how the <UNK> are moving
  Seq2Seq Model Prediction Transcript: 
my behavior changed my thoughts this was one of these authors and editors over hundreds of years and they go through these amazing mental gymnastics to accomplish this and i will say though 
  Ground Truth Transcript: 
from the real living patient    then he can   examine how the heart is moving    he can      go inside      push inside of the heart    and really   feel how the valves are   moving          


 GCS Candidate Transcript: 
show me where you're living patient and he can <UNK> how the heart is moving it can go inside ocean side of the heart every to see how the <UNK> are moving
  Seq2Seq Model Prediction Transcript: 
my behavior changed my thoughts this was one of the huge lessons of the year is that they say also the idea of sacredness and they focus 


 GCS Candidate Transcript: 
now you can see how it <UNK> peeling off first you saw the body bag that the body came in that i'm peeling of the skin you can see the <UNK> and eventually you can see the <UNK> <UNK> all know this woman
  Seq2Seq Model Prediction Transcript: 
and they range from my year to give you know we do about homosexuality they have a pamphlet that says here 's what jesus said about homosexuality and you open it up and there's nothing in the bible and threw them at my face 
  Ground Truth Transcript: 
and   you can   see how i'm      gradually   peeling off first you saw the body bag that the body came in    then i'm peeling off the skin you can   see    the muscles and eventually   you can   see the bone structure    of {um} this    woman          


 GCS Candidate Transcript: 
and you can see how it <UNK> peeling off first you saw the body bag that the body came in that i'm peeling off the skin you can see the <UNK> and eventually you can see the <UNK> <UNK> all kn

Average Candidate Transcript Accuracy: 0.476926226659
Average Seq2Seq Model Accuracy: 0.256504696295

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
i wish that did i wish that
  Seq2Seq Model Prediction Transcript: 
and i wrote my life 
  Ground Truth Transcript: 
i    wish    that    i    wish    that     


 GCS Candidate Transcript: 
i wish that i wish that
  Seq2Seq Model Prediction Transcript: 
and i did anyway my 
  Ground Truth Transcript: 
i    wish    that    i    wish    that     


 GCS Candidate Transcript: 
i wish that that i wish that
  Seq2Seq Model Prediction Transcript: 
and i wrote my life for me 
  Ground Truth Transcript: 
i    wish    that    i    wish    that     


 GCS Candidate Transcript: 
i wish that didn't i wish that
  Seq2Seq Model Prediction Transcript: 
and i wrote my life for me 
  Ground Truth Transcript: 
i    wish    that    i    wish    that     


 GCS Candidate Transcript: 
i wish that too i wish that
  Seq

Average Candidate Transcript Accuracy: 0.471578566225
Average Seq2Seq Model Accuracy: 0.256569084356

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
i can see that this woman had a problem that was so she had a <UNK> up in the brain and that's been fixed with a little <UNK> metal <UNK> that's <UNK> up the <UNK>
  Seq2Seq Model Prediction Transcript: 
and i thought i'd tell you a little bit about just a year and these are the ultimate literalists and it was fascinating because they were not stupid people at all of the other 
  Ground Truth Transcript: 
and      i    can   see    that    this woman    had a    problem      she had a bleeding up in the brain    and that's been      fixed with a      little stent a metal clamp    that's tightening up    the vessel   


 GCS Candidate Transcript: 
i can see that this woman had a problem that was said she had a <UNK> up in the brain and that's been fixed with a little <UNK> metal <UNK> that's <UNK> up 


 GCS Candidate Transcript: 
seeing this at the beginning at technology <UNK> that's <UNK> right now is that we <UNK> to look at time result <UNK> as well so we getting the dynamics out of the body as well i just <UNK> that we will be <UNK> data doing 5 seconds
  Seq2Seq Model Prediction Transcript: 
my behavior changed the thoughts that are actually by the bible says you cannot wear clothes {um} and they said that jesus never have to follow those and they go through these amazing mental gymnastics to accomplish this and i will say though 
  Ground Truth Transcript: 
   we're      seeing this it's beginning a    technology trend that's happening   right now is that we're starting to      look at      time result situations as well    so we're getting the dynamics out of the body as well    and   just assume {um} that we will be collecting data during   five seconds     


 GCS Candidate Transcript: 
seeing this at the beginning at technology <UNK> that's <UNK> right now is that we <UNK


 GCS Candidate Transcript: 
my <UNK> <UNK> <UNK> was <UNK> right after a <UNK> at the <UNK> shop so naturally i had my <UNK> in my <UNK> i'd <UNK> them <UNK> over the <UNK> in the <UNK> office
  Seq2Seq Model Prediction Transcript: 
my son bedtime stories it was the best month of my life because i just sat back and i read books and watched movies the sabbath can be and they will say though 
  Ground Truth Transcript: 
my next surgeon 's   appointment was    coincidentally    right    after      a shift at the   gift shop    so naturally   i had my vest and my    identification    i draped them casually   over the chair in the doctor   's   office   


 GCS Candidate Transcript: 
my <UNK> <UNK> <UNK> was <UNK> right after a <UNK> at the <UNK> shop so naturally i had my <UNK> in my <UNK> i'd rate them <UNK> over the <UNK> in the <UNK> office
  Seq2Seq Model Prediction Transcript: 
my son bedtime stories it was the best month of my life because i just sat back and i read books and watche


 GCS Candidate Transcript: 
<UNK> <UNK>
  Seq2Seq Model Prediction Transcript: 
and you know they
  Ground Truth Transcript: 
     imitating      robots     


 GCS Candidate Transcript: 
<UNK>
  Seq2Seq Model Prediction Transcript: 
and you
  Ground Truth Transcript: 
     imitating      robots     


 GCS Candidate Transcript: 
<UNK> <UNK>
  Seq2Seq Model Prediction Transcript: 
and you know they
  Ground Truth Transcript: 
     imitating      robots     

Average Candidate Transcript Accuracy: 0.470213740725
Average Seq2Seq Model Accuracy: 0.25550859669

INFO:tensorflow:Restoring parameters from checkpoints/dev
Average Candidate Transcript Accuracy: 0.470213740725
Average Seq2Seq Model Accuracy: 0.25550859669

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
<UNK> because what <UNK> seeing is actually this is seeing his own brain
  Seq2Seq Model Prediction Transcript: 
there's another shot and this is a great job is whether 
  Ground Truth Tran

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
all the way up high <UNK>
  Seq2Seq Model Prediction Transcript: 
the {um} bible so i 
  Ground Truth Transcript: 
and    a    house    and      we    are  


 GCS Candidate Transcript: 
<UNK> high <UNK>
  Seq2Seq Model Prediction Transcript: 
and you know they 
  Ground Truth Transcript: 
and    a    house    and      we    are  


 GCS Candidate Transcript: 
all the way up high
  Seq2Seq Model Prediction Transcript: 
and you know that they 
  Ground Truth Transcript: 
and    a    house    and      we    are  


 GCS Candidate Transcript: 
<UNK> high <UNK>
  Seq2Seq Model Prediction Transcript: 
and you know they 
  Ground Truth Transcript: 
and    a    house    and      we    are  


 GCS Candidate Transcript: 
all the way up high <UNK>
  Seq2Seq Model Prediction Transcript: 
the {um} bible so i 
  Ground Truth Transcript: 
and    a    house    and      we    are  


 GCS Candidate Transcript: 
all


 GCS Candidate Transcript: 
oh that she still wants to know and she said well personally i still feel like i have to <UNK> when i see the president of my university
  Seq2Seq Model Prediction Transcript: 
and varied movement that it was very offensive so she sat in every seat and i work with here and they focus on the sabbath can be 
  Ground Truth Transcript: 
know   that   she still    wants to know   and she said well personally   i still feel like i have   to curtsey when   i see the president of my university   


 GCS Candidate Transcript: 
oh that she still wants to know jesus will personally i still feel like i have to <UNK> when i see the president of my university
  Seq2Seq Model Prediction Transcript: 
and varied movement that it was very offensive so i was praying giving these prayers of thanksgiving which was odd for an agnostic but 
  Ground Truth Transcript: 
know   that   she still    wants to know   and she said well personally   i still feel like i have   to curtsey 

and it was about i did about the rules that i was dressed in my biblical clothing sandals and a white robe 
  Ground Truth Transcript: 
   so    it's a very nice little device    so if you have   the   opportunity please    try it out    it's    really      a hands   on experience      so it's    gained some traction    and   we're trying to roll this out and trying to   use it    for    educational purposes but also perhaps in the future     


 GCS Candidate Transcript: 
<UNK> and i still <UNK> so if you have the opportunity please said try it out it's it's really a <UNK> experience
  Seq2Seq Model Prediction Transcript: 
and it was about i did about you know but you do that you end up acting like a crazy person and 
  Ground Truth Transcript: 
   so    it's a very nice little device    so if you have   the   opportunity please    try it out    it's    really      a hands   on experience      so it's    gained some traction    and   we're trying to roll this out and trying to   use i


 GCS Candidate Transcript: 
she was saying you going to make me look like an <UNK> cuz i've never been to <UNK> and i wouldn't be talking <UNK> anything i said would love to a woman talk to me for hours you know if i wasn't talking like you know she wanted me to talk i don't think she would even come out here
  Seq2Seq Model Prediction Transcript: 
there was also increasingly interested in religion that are started to be completely dismissed and i started to change my perspective and i started to change my perspective and i said well i am an adulterer are you going to stone me and i said well that would be great and {um} came down from on high 
  Ground Truth Transcript: 
and   she was   saying you're   going to make me look like an   idiot because   i've never been   to college and   i wouldn't be talking professional or   anything i said well look the woman talked to me for      four hours you know   if i wasn't   talking you know   like    you know   she wanted me to talk    i    d

Average Candidate Transcript Accuracy: 0.46611025272
Average Seq2Seq Model Accuracy: 0.255160999203

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
here we ask much to go to calculate <UNK> from 100 so it's going <UNK> <UNK> and then he's going <UNK> and you can see how little math <UNK> is working up here and explain
  Seq2Seq Model Prediction Transcript: 
where we are my projects cotton t shirts the bible says that if you do then again and they explained that they would be google it and they said well that would be great 
  Ground Truth Transcript: 
and        here we asked motts    to calculate backwards from one hundred so he   's   going   one hundred    ninety seven    ninety four and   then he   's   going   backwards    and you can   see how the little math processor is working up here in his   brain     


 GCS Candidate Transcript: 
here we ask much to go to calculate <UNK> from 100 so he's going <UNK> <UNK> and then he's going <UNK> an

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
the window at the hospital tiny <UNK>
  Seq2Seq Model Prediction Transcript: 
the museum is gorgeous they really did 
  Ground Truth Transcript: 
in the window    of the hospital    's   tiny gift shop     


 GCS Candidate Transcript: 
the window at the hospital tiny <UNK>
  Seq2Seq Model Prediction Transcript: 
the museum is gorgeous they really did 
  Ground Truth Transcript: 
in the window    of the hospital    's   tiny gift shop     


 GCS Candidate Transcript: 
in the window at the hospital tiny <UNK>
  Seq2Seq Model Prediction Transcript: 
the museum is gorgeous they really did 
  Ground Truth Transcript: 
in the window    of the hospital    's   tiny gift shop     


 GCS Candidate Transcript: 
a window at the hospital tiny <UNK>
  Seq2Seq Model Prediction Transcript: 
the museum is gorgeous they really did 
  Ground Truth Transcript: 
in the window    of the hospital    's   tiny gift shop

  Seq2Seq Model Prediction Transcript: 
and varied movement that the one i will get you know because if you cannot wear clothes made of mixed fibers so {um} i was doing my life for me 
  Ground Truth Transcript: 
off of a tape    and i title things because   i think people    speak in organic poems   and this is called a mirror to   her mouth and   this is    an      inmate      named paulette    jenkins        

Average Candidate Transcript Accuracy: 0.463950332637
Average Seq2Seq Model Accuracy: 0.255203700277

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
truth was the law was
  Seq2Seq Model Prediction Transcript: 
what did the bible 
  Ground Truth Transcript: 
the truth was the    law {um} was    that     


 GCS Candidate Transcript: 
the truth was the law was
  Seq2Seq Model Prediction Transcript: 
the clearest example of 
  Ground Truth Transcript: 
the truth was the    law {um} was    that     


 GCS Candidate Transcript: 
trust was t


 GCS Candidate Transcript: 
so what's inside of this <UNK> is what enables me to do the things that i'm doing with the medical data <UNK> what i'm doing is using these <UNK> little <UNK>
  Seq2Seq Model Prediction Transcript: 
so i thought i'd tell you a little man in america to be picking and choosing the bible says that we all these biblical rituals separating my 
  Ground Truth Transcript: 
so what's inside of this machine    is what enables   me to do the things that i'm   doing      with the medical   data        so {um} really what   i'm   doing    is using these fantastic    little devices   and   you know        going   back    maybe   

Average Candidate Transcript Accuracy: 0.460158789241
Average Seq2Seq Model Accuracy: 0.255000268495

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
put up <UNK> images <UNK> like there's <UNK> in there is the problem they can't do that anymore that's impossible and so we have to do something that's a li

Average Candidate Transcript Accuracy: 0.46071224889
Average Seq2Seq Model Accuracy: 0.255232542625

INFO:tensorflow:Restoring parameters from checkpoints/dev

 GCS Candidate Transcript: 
or anything
  Seq2Seq Model Prediction Transcript: 
and they 
  Ground Truth Transcript: 
for        anything     


 GCS Candidate Transcript: 
for anything
  Seq2Seq Model Prediction Transcript: 
and you know they
  Ground Truth Transcript: 
for        anything     


 GCS Candidate Transcript: 
anything
  Seq2Seq Model Prediction Transcript: 
and you
  Ground Truth Transcript: 
for        anything     


 GCS Candidate Transcript: 
four anything
  Seq2Seq Model Prediction Transcript: 
and they 
  Ground Truth Transcript: 
for        anything     


 GCS Candidate Transcript: 
4 anything
  Seq2Seq Model Prediction Transcript: 
and they 
  Ground Truth Transcript: 
for        anything     


 GCS Candidate Transcript: 
your anything
  Seq2Seq Model Prediction Transcript: 
you know you're old
  Ground


 GCS Candidate Transcript: 
i needed something i'm <UNK> next thing you know i have my bright <UNK> <UNK> best i had my photo <UNK> and i was <UNK> trained by my 89 year old <UNK> i worked alone
  Seq2Seq Model Prediction Transcript: 
i thought i'd tell you a little about what i was able to make generalizations about it so i thought they were not stupid people and they focus on high 
  Ground Truth Transcript: 
they needed   some    young blood    so    next thing you know   i had my bright blue volunteer vest i had my      photo id      and i was      fully trained by my      eighty nine year old boss    i worked alone        


 GCS Candidate Transcript: 
i needed somebody i'm <UNK> so next thing you know i have my bright <UNK> <UNK> best i had my photo <UNK> and i was <UNK> trained by my 89 year old <UNK> i worked alone
  Seq2Seq Model Prediction Transcript: 
i do not recommend you know you know but it's a book that was like you know but it's a couple of my life because i decided t


 GCS Candidate Transcript: 
golden <UNK> mean <UNK> <UNK> <UNK> me you know i mean it's like my wife <UNK> her <UNK> always saying you know you ever think he's just a <UNK> <UNK> seems like he had so much bad <UNK> and i'll be there
  Seq2Seq Model Prediction Transcript: 
my argument is what was actually an elderly man mid seventies just so you know but he 's still an adulterer and still quite angry he grabbed them out of my hand and threw them at my face 
  Ground Truth Transcript: 
well i'm   an   optimist i mean basically    i'm an optimist i mean you know   i mean it's like my wife jolene her    family   's   always saying you know      you    ever think he 's   just   a born loser it seems like he has so much    bad luck you know   but then     


 GCS Candidate Transcript: 
<UNK> mean <UNK> <UNK> <UNK> me you know i mean it's like my wife <UNK> her family is always saying you know you ever think he's just a <UNK> <UNK> seems like he has so much bad <UNK> and i'll be there
  Seq2


 GCS Candidate Transcript: 
this is a good thing but there's certain other <UNK> to the word <UNK> the same thing about the word <UNK> what is <UNK> <UNK> was wonderful <UNK> who
  Seq2Seq Model Prediction Transcript: 
this is a type of experiments so is a god there's something important and beautiful about the idea of sacredness and that our rituals can be sacred the sabbath can be 
  Ground Truth Transcript: 
     like    this as      a    good    thing    but there are   certain    other connotations to the word risk and the same thing about the word nature what is nature    maxine greene who 's   a    wonderful    philosopher who 's  


 GCS Candidate Transcript: 
this is a good thing but there's certain other <UNK> to the word risks in the same thing about the word <UNK> what is <UNK> <UNK> was wonderful <UNK> who
  Seq2Seq Model Prediction Transcript: 
this is a type of experiments so is a god there's something important and beautiful about the idea of sacredness and that our ri


 GCS Candidate Transcript: 
is not the case and <UNK> and this is also again showing us what we can do it's very easy to look at metal <UNK> that we can show inside of the body
  Seq2Seq Model Prediction Transcript: 
this is all the data to fit their model and they go through these amazing mental gymnastics to accomplish this and i will say though it's not quite so that 
  Ground Truth Transcript: 
   here 's    another case a   knifing and      this is also again showing us what   we can   do    it's very    easy to look at metal    artifacts that we can   show inside of the body   


 GCS Candidate Transcript: 
there's another case an <UNK> and this is also again showing us what we can do it's very easy to look at metal <UNK> that we can show inside of the body
  Seq2Seq Model Prediction Transcript: 
they say jesus did talk about one of years and they said you know but it's about what jesus said about homosexuality and you open it up and there's nothing in it so 
  Ground Truth Tran