# A Glance at CTC



### Key Word List :
- CTC
- Warp-CTC
- CTC-loss-layer

### Key Paper Stack
- [Deep Speech 2](https://arxiv.org/pdf/1512.02595v1.pdf)
- [Connectionist Temporal Classification]()

### Reference 
- [Example : bdlstm_train_sample](https://github.com/jonrein/tensorflow_CTC_example/blob/master/bdlstm_train_sample.py)
- [Example TF c++ implementation](https://github.com/tensorflow/tensorflow/blob/d42facc3cc9611f0c9722c81551a7404a0bd3f6b/tensorflow/core/kernels/ctc_loss_op.cc)
- [Keras Issue :  383](https://github.com/fchollet/keras/issues/383)
- [Project : keras CTC](https://github.com/david-leon/Keras_CTC)
- [Project : ctc-loss-op-test in TF](https://github.com/tensorflow/tensorflow/blob/679f95e9d8d538c3c02c0da45606bab22a71420e/tensorflow/python/kernel_tests/ctc_loss_op_test.py)
- [Project : rnn_ctc](https://github.com/rakeshvar/rnn_ctc)

### Problem Target : 
- **<font color='blue'>Temporal Classification</font>** :  Labelling, i.e. predicting label sequences for, noisy and unsegmented sequence data such as a spectrogram or waveform (see Fig. 1).
 - Framewise Temporal Classification (Segmented Sequential Data)
 - Connectionist Temporal Classification

### Fig1 

<img src='image/Glance-03-ConnectionistTemporalClassification-01.png'/>

### Temporal Classification in Math Expression 
#####  The Label Error Rate (LER) is based on the [Edit Distance Algorithm](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.19.7158&rep=rep1&type=pdf), also called the Levenshtein distance.

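- $ LER(h,S') = \frac{1}{Z} \sum_{(x,z) \in S'} ED(h(x), z) $, where $Z$ is the total number of target labels in $S'$ and $ED(p, q)$ is the minimum number of the following operations needed to turn $p$ into $q$: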
 - **Insertion** of a single symbol. If a = uv, then inserting the symbol x produces uxv. This can also be denoted ε→x, using ε to denote the empty string.
 - **Deletion** of a single symbol changes uxv to uv (x→ε).
 - **Substitution** of a single symbol x for a symbol y ≠ x changes uxv to uyv (x→y).
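
As a minimal sketch of how the label error rate could be computed, the snippet below sums unit-cost edit distances over a test set and divides by the total number of target labels. The names `edit_distance` and `label_error_rate`, and the toy strings, are illustrative assumptions and not taken from any library.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, unit cost per operation."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ai in enumerate(a, start=1):
        curr = [i]
        for j, bj in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ai != bj)))  # match or substitution
        prev = curr
    return prev[-1]


def label_error_rate(predictions, targets):
    """Summed edit distance divided by Z, the total number of target labels."""
    total_edits = sum(edit_distance(p, z) for p, z in zip(predictions, targets))
    Z = sum(len(z) for z in targets)
    return total_edits / Z


# Hypothetical predictions h(x) and targets z
predictions = ["kitten", "abc"]
targets = ["sitting", "abc"]
print(edit_distance("kitten", "sitting"))      # 3
print(label_error_rate(predictions, targets))  # 3 edits / 10 labels = 0.3
```

Strings are used here only for readability; any sequences of comparable labels (for example lists of integer class indices) work the same way.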
 
 ***
 
#### Edit distance: a reference implementation

In [72]:
import numpy as np

def ED(x, y, plot_func=None):
    """Levenshtein distance between the sequences x and y.

    Insertion, deletion and substitution each cost 1 here; a common
    alternative charges 2 for a substitution. If plot_func is given,
    it is called with the filled DP table.
    """
    # memo[i][j] = edit distance between x[:i] and y[:j]
    memo = np.zeros((len(x) + 1, len(y) + 1), dtype=int)
    memo[:, 0] = np.arange(len(x) + 1)  # delete every symbol of x[:i]
    memo[0, :] = np.arange(len(y) + 1)  # insert every symbol of y[:j]

    # DP
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            sub_cost = 0 if x[i - 1] == y[j - 1] else 1
            memo[i][j] = min(memo[i - 1][j] + 1,             # deletion
                             memo[i][j - 1] + 1,             # insertion
                             memo[i - 1][j - 1] + sub_cost)  # match / substitution
    if plot_func:
        plot_func(memo)
    return int(memo[len(x)][len(y)])


### Features
- CTC is a method for labelling sequence data with RNNs. It removes the need for pre-segmented training data and post-processed outputs, and models all aspects of the sequence within a single network architecture.

### Core Algorithm
* ctc_loss
* ctc_greedy_decoder
* ctc_beam_search_decoder
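
To illustrate the simplest of these, here is a hedged NumPy sketch of greedy (best-path) decoding: take the most probable class at every timestep, merge repeated labels, then drop blanks. It is not TensorFlow's implementation; the function names are hypothetical, and the blank is assumed to be the last class index.

```python
import numpy as np


def ctc_collapse(path, blank):
    """Merge consecutive repeated labels, then remove blanks."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(int(label))
        prev = label
    return out


def greedy_decode(probs, blank=None):
    """probs: array of shape [timesteps, num_classes] for one sequence.

    Assumption: the blank label is the last class unless given explicitly.
    """
    probs = np.asarray(probs)
    if blank is None:
        blank = probs.shape[1] - 1
    best_path = probs.argmax(axis=1)  # most probable class per timestep
    return ctc_collapse(best_path, blank)


# Toy example with classes 0, 1 and blank = 2.
# One-hot rows make the best path equal to [0, 0, 2, 0, 1, 1, 2].
probs = np.eye(3)[[0, 0, 2, 0, 1, 1, 2]]
print(greedy_decode(probs))  # [0, 0, 1]
```

The blank between the two 0 labels is what lets the decoder output a genuinely repeated label instead of merging both into one.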

In [None]:
import tensorflow as tf
import editdistance

# training pair = (x, z)
# training set  = S ~ distribution of (X, Z)
# -----------------------------------------------------------
# The CTC loss applies the softmax itself, so the network
# should output unnormalised logits.
# The CTC loss also requires a time-major input of shape
# [max_timesteps, batch_size, num_classes], whereas most other
# TensorFlow code is batch-major by default.

class ConnectionistTemporal(object):
    def __init__(self, max_length, batch_size):
        # factory
        self.batch_size = batch_size
        self.max_length = max_length

    def loss(self):
        # customized loss (not implemented yet)
        pass

    def encoding(self):
        # pre-process & post-process pair (not implemented yet)
        pass

    def greedy_decoding(self):
        # pre-process & post-process pair (not implemented yet)
        pass

    def beam_search_decoding(self):
        # pre-process & post-process pair (not implemented yet)
        pass

    def validate(self, tensor):
        # expects a rank-3 tensor: [batch, length, label_vector]
        assert tensor.get_shape().ndims == 3

    def label_error_rate(self, truY, preY):
        # truY, preY: one sequence of label indices per batch element.
        # Each edit distance is normalised by the length of its target
        # sequence, and the results are averaged over the batch.
        loss = 0.0
        for iid in range(self.batch_size):
            Z = len(truY[iid])
            loss += self.eD(truY[iid], preY[iid]) / float(Z)
        return loss / float(self.batch_size)

    def eD(self, x, y):
        # edit distance between two label sequences
        return editdistance.eval(x, y)


In [3]:
# http://www.swharden.com/wp/2016-07-19-realtime-audio-visualization-in-python/
import pyaudio
import numpy as np

CHUNK = 2**11
RATE = 44100

p=pyaudio.PyAudio()
stream=p.open(format=pyaudio.paInt16,channels=1,rate=RATE,input=True,
              frames_per_buffer=CHUNK)

for i in range(int(10*RATE/CHUNK)): # record for about 10 seconds
    data = np.frombuffer(stream.read(CHUNK),dtype=np.int16)
    peak=np.average(np.abs(data))*2
    bars="#"*int(50*peak/2**16)
    print("%04d %05d %s"%(i,peak,bars))

stream.stop_stream()
stream.close()
p.terminate()

0000 01049 
0001 00147 
0002 00116 
0003 00101 
0004 00187 
0005 00327 
0006 00252 
0007 00310 
0008 00355 
0009 00153 
0010 00079 
0011 00077 
0012 00080 
0013 00058 
0014 00046 
0015 00077 
0016 00202 
0017 00295 
0018 00249 
0019 00349 
0020 01069 
0021 02008 #
0022 03153 ##
0023 03219 ##
0024 05315 ####
0025 06513 ####
0026 03724 ##
0027 02934 ##
0028 04062 ###
0029 05777 ####
0030 06264 ####
0031 05301 ####
0032 02421 #
0033 01971 #
0034 03515 ##
0035 05433 ####
0036 06329 ####
0037 05076 ###
0038 01939 #
0039 03158 ##
0040 04187 ###
0041 04066 ###
0042 03215 ##
0043 03615 ##
0044 03204 ##
0045 03132 ##
0046 05395 ####
0047 03521 ##
0048 02088 #
0049 01408 #
0050 04797 ###
0051 05172 ###
0052 04234 ###
0053 04315 ###
0054 01240 
0055 00395 
0056 01692 #
0057 05518 ####
0058 06917 #####
0059 06400 ####
0060 03772 ##
0061 02289 #
0062 02860 ##
0063 04328 ###
0064 04857 ###
0065 04689 ###
0066 04307 ###
0067 03509 ##
0068 02984 ##
0069 02887 ##
0070 02765 ##
0071 02502 #
0072 02333 #

### TensorFlow

In TensorFlow, the CTC loss is implemented in C++ and is called from Python as
ctc_ops.ctc_loss(logits, targets, seq_len)
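
To make concrete what a CTC loss computes, the following is a minimal NumPy sketch of the forward (alpha) recursion for a single sequence. It is not TensorFlow's implementation: the function name is hypothetical, the blank is assumed to be the last class index, and probabilities are multiplied directly rather than in log space, so it will underflow on long sequences.

```python
import numpy as np


def ctc_neg_log_likelihood(logits, target, blank=None):
    """Negative log-likelihood of `target` under CTC for one sequence.

    logits: [timesteps, num_classes] unnormalised scores.
    target: list of label indices, without blanks.
    Assumption: blank is the last class unless given explicitly.
    """
    logits = np.asarray(logits, dtype=np.float64)
    T, num_classes = logits.shape
    if blank is None:
        blank = num_classes - 1

    # Softmax over classes at every timestep
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    # Extended target: a blank before, between and after the labels
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)

    # alpha[t, s]: total probability of all length-(t+1) path prefixes
    # that correspond to ext[:s+1]
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1, s]            # stay on the same symbol
            if s >= 1:
                total += alpha[t - 1, s - 1]   # advance by one symbol
            # Skipping a blank is allowed only between different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[t - 1, s - 2]
            alpha[t, s] = total * probs[t, ext[s]]

    # Valid paths end on the last label or on the trailing blank
    likelihood = alpha[T - 1, S - 1]
    if S > 1:
        likelihood += alpha[T - 1, S - 2]
    return -np.log(likelihood)


# Two timesteps, classes {0, blank}, uniform scores: the paths
# (0,0), (0,blank) and (blank,0) all collapse to [0], so p = 3 * 0.25.
print(ctc_neg_log_likelihood(np.zeros((2, 2)), [0]))  # -log(0.75), about 0.2877
```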

In [18]:
import tensorflow as tf 
import editdistance

In [16]:
a = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], shape=[3, 4])
tf.global_variables_initializer()
with tf.Session() as sess:
    print(a.values)  # prints the symbolic tensor, not its evaluated values

Tensor("SparseTensor_6/values:0", shape=(2,), dtype=int32)


In [24]:
int(editdistance.eval('abc', 'bca'))

2

[TF Doc :: Links](https://www.tensorflow.org/versions/r0.11/api_docs/python/nn/conectionist_temporal_classification__ctc_#ctc_loss)

The inputs Tensor's innermost dimension size, num_classes, represents num_labels + 1 classes, where num_labels is the number of true labels, and the largest value (num_classes - 1) is reserved for the blank label.
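
As a hedged illustration of this convention, the sketch below derives the blank index from the number of true labels and converts a batch of variable-length label sequences into the (indices, values, shape) triple that a sparse label tensor is built from. The helper names and the a-z alphabet are assumptions made for the example, not part of the TensorFlow API.

```python
import numpy as np


def blank_index(num_labels):
    """Blank is the largest class index: num_classes - 1 == num_labels."""
    num_classes = num_labels + 1
    return num_classes - 1


def dense_to_sparse(label_seqs):
    """Convert a list of label sequences to (indices, values, shape)."""
    indices, values = [], []
    for b, seq in enumerate(label_seqs):
        for t, label in enumerate(seq):
            indices.append([b, t])  # (batch index, position in sequence)
            values.append(label)
    max_len = max((len(s) for s in label_seqs), default=0)
    return (np.array(indices, dtype=np.int64).reshape(-1, 2),
            np.array(values, dtype=np.int32),
            np.array([len(label_seqs), max_len], dtype=np.int64))


# Hypothetical alphabet: 'a'..'z' map to 0..25, so blank = 26 and num_classes = 27
print(blank_index(26))  # 26
labels = [[ord(c) - ord('a') for c in word] for word in ["cab", "h"]]
indices, values, shape = dense_to_sparse(labels)
print(values.tolist())  # [2, 0, 1, 7]
print(shape.tolist())   # [2, 3]
```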

### Ref 
1. [Edit Distance slides from Stanford](https://web.stanford.edu/class/cs124/lec/med.pdf)