Title: LSTM neural network for sequence learning
Date: 2017-11-19 22:00
Tags: LSTM, artificial intelligence, jupyter, tensorflow
Slug: My-first-LSTM
Authors: Dinne Bosman
Lang:en
Summary: My first attempt at an LSTM for sequence prediction

In 1996, during my last year in high school, I borrowed a book about neural networks from a friend. It explained how a two-layer perceptron network could learn the XOR function. I tried implementing the formulas and was able to do the feed-forward calculations. The training algorithm, however, still eluded me. Being able to perform forward calculations was already very exciting. I created a Windows 95 screen saver which would fill the screen with the output of a randomized neural network. The output images were very interesting, especially when replacing the activation functions of the network with exotic ones such as sin(x) or abs(x). (Although I lost the source code, you can still download it [here](http://www.free-downloads-center.com/download/neural-screen-saver-v1-0-11252.html))

At the time it seemed that neural networks were just another statistical method to interpolate data. Furthermore, limited training data and the problem of vanishing gradients limited their usefulness. Fast forward to 2017: massive amounts of training data and computing power are available. A number of relatively small improvements in the basic neural network algorithms have made it possible to train networks consisting of many more layers. These so-called deep neural networks have fueled progress and interest in artificial intelligence.

One particular innovation that caught my attention is the LSTM neural network architecture. Its gated cell state was designed to mitigate the vanishing gradient problem in recurrent networks. LSTM networks are especially suited to analyzing sequences and time series. Some interesting links:
  * article about text generation kernel code
  * fake news generator
  * LSTM architecture
  * Voice synthesis (non LSTM)
  * Audio generation

The last topic was especially inspiring. Think about the possibilities of voice synthesis and music generation!

In this article I will discuss my experience using TensorFlow and LSTM networks. I used the following documentation sources:
  * TensorFlow (this is API reference documentation and will not help you understand how to apply LSTM networks)
  * Keras LSTM example
  * TensorFlow example
  * https://arxiv.org/pdf/1506.00019.pdf
  * http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  * https://distill.pub/2016/augmented-rnns/
 
The first test I wanted to implement is to see whether I could build a sine wave predictor. It's a limited example:
  * The network is trained using mini-batches. Due to the periodic nature of the sine wave, the train, dev, and test sets overlap.
  * I train the network in batches, but want to generate predictions one sample at a time.
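
The overlap mentioned in the first bullet can be made concrete. This is a minimal sketch, assuming the noise-free signal sin(2πt) sampled every 0.01 s, so that one period spans 100 samples. The helper name `window` is made up for this illustration and is not part of the notebook.

```python
import math

TIME_PER_SAMPLE = 0.01   # seconds per sample, as in the training data cell
SEQUENCE_LENGTH = 50     # window length used later in this post

def window(start, length=SEQUENCE_LENGTH):
    """Noise-free sine samples for indices start .. start+length-1."""
    return [math.sin(2 * math.pi * TIME_PER_SAMPLE * i)
            for i in range(start, start + length)]

# One period is 1 / TIME_PER_SAMPLE = 100 samples, so windows whose start
# indices differ by a multiple of 100 contain the same values.
a = window(7)
b = window(107)
max_abs_diff = max(abs(x - y) for x, y in zip(a, b))
print(max_abs_diff)   # only floating-point rounding error remains
```

With randomly shuffled start indices, almost every dev or test window therefore has a near-identical twin in the training set (up to the added noise), so the dev loss here mostly measures fit and says little about generalization.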
  

In [22]:
import plotly
from plotly.graph_objs import Scatter, Layout
import numpy as np
import tensorflow as tf
import sys
plotly.offline.init_notebook_mode(connected=True)
import IPython.display

## Training data
The following cell generates the training data. I decided to add some noise to the sine wave, which acts as a mild form of regularization.

In [1]:
sample_length = 50001
time_per_sample = 0.01
# sample i is taken at t = i * time_per_sample (linspace with this stop value gives a slightly larger step)
signal_time = np.arange(sample_length) * time_per_sample
signal_amp = np.sin(signal_time*2*np.pi) + np.random.normal(size=sample_length)*0.02
    #np.sin(2+signal_time*1.7*np.pi)*0.5 + \
    #np.sin(1+signal_time*2.2*np.pi) + \
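
One consequence of the added noise is worth spelling out. The noise in a target sample is independent of the inputs, so no predictor can push the mean squared error per sample below the noise variance, 0.02² = 0.0004. Below is a small sketch using only the standard library; the variable names are illustrative and not taken from the notebook.

```python
import random

random.seed(0)
NOISE_STD = 0.02   # same scale factor as in the cell above
N = 50001          # same number of samples as sample_length

# Noise added to an otherwise perfectly known clean signal.
noise = [random.gauss(0.0, NOISE_STD) for _ in range(N)]

# A predictor that outputs the clean sine exactly still misses each noisy
# target by the noise term, so its mean squared error equals the noise power.
mse_of_perfect_predictor = sum(e * e for e in noise) / N
print(mse_of_perfect_predictor)   # expected to be close to NOISE_STD ** 2 = 0.0004
```

This gives a useful yardstick: a per-example loss near 0.0004 means the network is about as good as it can get on this data.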
    

In [3]:
s_i = 0
e_i = s_i + 100
x = plotly.offline.iplot({
    "data": [Scatter(x=signal_time[s_i:e_i],y=signal_amp[s_i:e_i])],
    "layout": Layout(title="")
    
})



In [4]:
sequence_length = 50
prediction_length = 1
input_feature_count = 1
output_feature_count = 1
hidden_count_per_layer = [16,16,16]

tf.reset_default_graph()

inputs = tf.placeholder(tf.float32, [None, sequence_length, input_feature_count], name = 'inputs')
targets = tf.placeholder(tf.float32, [None, output_feature_count], name = 'targets')
keep_prob = tf.placeholder(tf.float32, name = 'keep')
learning_rate = tf.placeholder(tf.float32, name = 'learning_rate')


## Defining the LSTM multi layer network

In [5]:
layers = []


for hidden_count in hidden_count_per_layer:
    layer =  tf.nn.rnn_cell.LSTMCell(hidden_count, state_is_tuple=True)
    layer_with_dropout = tf.nn.rnn_cell.DropoutWrapper(layer,
                                          input_keep_prob=keep_prob,
                                          output_keep_prob=1.0)
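    # NOTE: the unwrapped layer is appended, so layer_with_dropout is never used and
    # dropout is effectively disabled. Appending layer_with_dropout instead would
    # require feeding keep_prob in every sess.run call below.
    layers.append(layer)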
hidden_network = tf.nn.rnn_cell.MultiRNNCell(layers, state_is_tuple=True)   
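
For intuition about what each `LSTMCell` computes, here is a minimal NumPy sketch of one step of a standard LSTM cell. It is not TensorFlow's implementation, which differs in details such as the forget-gate bias and the weight layout. The function `lstm_step`, the weight layout and the gate order are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One time step of a standard LSTM cell.

    x: (batch, input_size); h_prev and c_prev: (batch, hidden)
    W: (input_size + hidden, 4 * hidden); b: (4 * hidden,)
    Gate order in W and b (an arbitrary choice here):
    input, forget, candidate, output.
    """
    z = np.concatenate([x, h_prev], axis=1) @ W + b
    i, f, g, o = np.split(z, 4, axis=1)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)   # new cell state
    h = sigmoid(o) * np.tanh(c)                          # new hidden state / output
    return h, c

rng = np.random.default_rng(0)
batch, input_size, hidden = 4, 1, 16
W = rng.normal(scale=0.1, size=(input_size + hidden, 4 * hidden))
b = np.zeros(4 * hidden)
h = np.zeros((batch, hidden))
c = np.zeros((batch, hidden))
for t in range(50):   # feed a sequence of 50 samples, one per step
    x_t = np.full((batch, input_size), np.sin(2 * np.pi * 0.01 * t))
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

The pair `(c, h)` is the per-layer state that TensorFlow returns as an `LSTMStateTuple`. With three layers of 16 units that amounts to 3 × (16 + 16) = 96 state values per example.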



## Packing/Unpacking the LSTM network state
In order to use the LSTM network to generate a predicted sequence of arbitrary length, you need to store the state of the network. The output state after predicting a sample should be fed back into the network when predicting the next sample.

Unfortunately the LSTM implementation in TensorFlow uses an LSTMStateTuple(c,h) data structure per layer, which is not very convenient to work with. The idea is to pack these LSTMStateTuple(c,h) structures into a single vector per example, giving a (batch_size, network_state_size) matrix.

I'm not happy with these functions as they use the batch_size. This complicates things when using the network to generate predictions.

There is a pointer on how to use dynamic batch sizes and packing/unpacking states [here](https://stackoverflow.com/questions/40438107/tensorflow-changing-batch-size-for-rnn-during-text-generation). The implementation of pack/unpack there, however, doesn't work with my organisation of hidden_network.
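
The packing idea can be sketched without TensorFlow. The example below works on plain Python lists for a single example (no batch dimension). `StateTuple`, `pack` and `unpack` are stand-ins invented for this illustration; they only mirror the structure of the TensorFlow functions that follow.

```python
from collections import namedtuple

# Stand-in for tf.nn.rnn_cell.LSTMStateTuple, for illustration only.
StateTuple = namedtuple("StateTuple", ["c", "h"])

def pack(state):
    """Flatten a tuple of StateTuple(c, h) into one flat list."""
    flat = []
    for layer in state:
        flat.extend(layer.c)
        flat.extend(layer.h)
    return flat

def unpack(flat, sizes):
    """Inverse of pack; sizes is a tuple of StateTuple(c_size, h_size)."""
    state, offset = [], 0
    for size in sizes:
        c = flat[offset:offset + size.c]
        offset += size.c
        h = flat[offset:offset + size.h]
        offset += size.h
        state.append(StateTuple(c, h))
    return tuple(state)

sizes = (StateTuple(16, 16),) * 3        # three layers of 16 units
total = sum(s.c + s.h for s in sizes)    # number of state values per example
flat = list(range(total))
round_trip = pack(unpack(flat, sizes))
print(total, round_trip == flat)         # 96 True
```

Filling the flat vector with distinct values, as done here with `range`, makes any ordering mistake between `c` and `h` or between layers show up immediately in the round trip.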

In [6]:
def get_network_state_size(network):
    """Returns the number of state variables in the network"""
    states = 0
    # use the network passed in as argument, not the global hidden_network
    for layer_size in network.state_size:
        states += layer_size[0]
        states += layer_size[1]
    return states


In [7]:
#based on https://stackoverflow.com/questions/40438107/tensorflow-changing-batch-size-for-rnn-during-text-generation
def pack_state_tuple(state_tuple, indent=0):
    """Returns a (batch_size,network_state_size) matrix of the states in the network
        state_tuple = the states obtained from  _, state = tf.nn.dynamic_rnn(...)
    """
    if isinstance(state_tuple, tf.Tensor) or not hasattr(state_tuple, '__iter__'):
        #Base case: a single Tensor (the c or h of an LSTMStateTuple), returned unchanged
        return state_tuple
    else:
        l = []
        #an unpacked LSTM network state is a tuple with one element per layer; each element is an LSTMStateTuple
        #state_tuple is either the tuple of LSTMStateTuples or it is an LSTMStateTuple (via recursive call)
        for item in state_tuple:
            # item is either an LSTMStateTuple (top level call)
            # or it is an element of the LSTMStateTuple (first recursive call)
            i = pack_state_tuple(item, indent+2)
            l.append(i)
        
        #convert the list of [Tensor(bsz,a), Tensor(bsz,b), ...] into one long Tensor (bsz, a+b+c+...)
        return tf.concat(l,1)
    

In [8]:
def unpack_state_tuple(state_tensor, sizes):
    """The inverse of pack, given a packed_states vector of (batch_size,x) return the LSTMStateTuple 
    datastructure that can be used as initial state for tf.nn.dynamic_rnn(...) 
        sizes is the network state size list (cell.state_size)
    """

    def _unpack_state_tuple( sizes_, offset_, indent):
        if isinstance(sizes_, tf.Tensor) or not hasattr(sizes_, '__iter__'): 
            #get a small part (batch size, c or h size of LSTMStateTuple) of the packed state vector of shape (batch size, network states)
            return tf.reshape(state_tensor[:, offset_ : (offset_ + sizes_) ], (-1, sizes_)), offset_ + sizes_
        else:
            result = []
            #Top level: sizes is a tuple of size network layers, each element of the tuple is an LSTMStateTuple(c size, h size)
            #Recursive call: sizes_ is a LSTMStateTuple
            for size in sizes_:
                #size is an LSTMStateTuple (toplevel)
                #or size is c size or h size (recursive call)
                s, offset_ = _unpack_state_tuple( size, offset_, indent+2)
                result.append(s)
            if isinstance(sizes_, tf.nn.rnn_cell.LSTMStateTuple):
                #end of recursive call
                #Build a LSTMStateTuple using the c size and h size elements in the result list
                return tf.nn.rnn_cell.LSTMStateTuple(*result), offset_
            else:
                # end of toplevel call
                # create a tuple of size network layers. Result is a list of LSTMStateTuple
                return tuple(result), offset_
    return _unpack_state_tuple( sizes, 0,0)[0]

### Testing the packing/unpacking functions
In the following cell I check whether the pack and unpack functions are indeed each other's inverse. The vectors should be packed/unpacked in the correct order.

In [9]:
#Test pack and unpack

#create a placeholder in which we can feed a packed initial_state
state_packed_in = tf.placeholder(
    tf.float32, 
    (None,get_network_state_size(hidden_network)), 
    name="state_packed_1")


#Unpack the packed states
state_unpacked_out = unpack_state_tuple(state_packed_in,hidden_network.state_size)
#Repack the unpacked states
state_packed_out = pack_state_tuple(state_unpacked_out)


inputs_batch_size = 4
a_batch_of_inputs = np.zeros((inputs_batch_size, sequence_length, input_feature_count))

#create an initial state vector and fill it with test data
an_initial_state = np.zeros((inputs_batch_size*get_network_state_size(hidden_network),1))
an_initial_state[:,0] = np.linspace(start=0,stop=an_initial_state.shape[0]-1,num=an_initial_state.shape[0])
#reshape it as a packed state
an_initial_state_packed = np.reshape(an_initial_state, (inputs_batch_size,get_network_state_size(hidden_network)))


init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    up,p = sess.run([state_unpacked_out, state_packed_out],  feed_dict={state_packed_in: an_initial_state_packed})
    # compare the original packed states with the ones that were unpacked and then repacked
    diff = an_initial_state_packed - p
    # should return 0
    print("diff",np.sum(np.abs(diff)))

diff 0.0


In [10]:
sz = get_network_state_size(hidden_network)
print("states in network", sz)

#pl_batch_size = tf.placeholder(tf.int32, name = 'bsz')


#initial_state and zero_state are both packed versions of the network state

zero_state = pack_state_tuple(hidden_network.zero_state(10, tf.float32))
print(zero_state.shape)

initial_state_packed = tf.placeholder_with_default(
    zero_state, 
    (None,sz), 
    name="initial_state")

print(initial_state_packed.shape)
state_unpacked = unpack_state_tuple(initial_state_packed,hidden_network.state_size)
print(state_unpacked)

states in network 96
(10, 96)
(?, 96)
(LSTMStateTuple(c=<tf.Tensor 'Reshape_6:0' shape=(?, 16) dtype=float32>, h=<tf.Tensor 'Reshape_7:0' shape=(?, 16) dtype=float32>), LSTMStateTuple(c=<tf.Tensor 'Reshape_8:0' shape=(?, 16) dtype=float32>, h=<tf.Tensor 'Reshape_9:0' shape=(?, 16) dtype=float32>), LSTMStateTuple(c=<tf.Tensor 'Reshape_10:0' shape=(?, 16) dtype=float32>, h=<tf.Tensor 'Reshape_11:0' shape=(?, 16) dtype=float32>))


## Forward propagation

In [11]:
#out_weights=tf.Variable(tf.random_normal([hidden_count_per_layer[-1],output_feature_count]))
#out_bias=tf.Variable(tf.random_normal([output_feature_count]))
print("inputs ",inputs.shape)
outputs, state_unpacked_network_out = tf.nn.dynamic_rnn(hidden_network, inputs, initial_state = state_unpacked, dtype=tf.float32) #, initial_state=rnn_tuple_state, )
state_packed_network_out = pack_state_tuple(state_unpacked_network_out)
print("packed state", state_packed_network_out.shape)
print("outputs before transpose", outputs.shape)
outputs = tf.transpose(outputs, [1, 0, 2])
print("outputs after transpose", outputs.shape)
#last_output = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)
last_output =  outputs[outputs.shape[0]-1,:,:]
print("last output", last_output.shape)
                                   
#---------------------------------------------    
# Create the cells for the RNN network
#lstm = tf.nn.rnn_cell.BasicLSTMCell(128)

# Get the output and state from dynamic rnn
#output, state = tf.nn.dynamic_rnn(lstm, sequence, dtype=tf.float32, sequence_length = seqlen)

# Convert output to a tensor and reshape it
#outputs = tf.reshape(tf.pack(output), [-1, lstm.output_size])

# Set partitions to 2
#num_partitions = 2

# The partitions argument is a tensor which is already fed to a placeholder.
# It is a 1-D tensor with the length of batch_size * max_sequence_length.
# In this partitions tensor, you need to set the last output idx for each seq to 1 and 
# others remain 0, so that the result could be separated to two parts,
# one is the last outputs and the other one is the non-last outputs.
#res_out = tf.dynamic_partition(outputs, partitions, num_partitions)

# prediction
#preds = tf.matmul(res_out[1], weights) + bias
#-------------------------------------------------------   
    
#out_size = target.get_shape()[2].value
predictions = tf.contrib.layers.fully_connected(last_output, output_feature_count, activation_fn=None)
print("prediction", predictions.shape)
print("targets", targets.shape)
#prediction = tf.nn.softmax(logit)
#loss = tf.losses.softmax_cross_entropy(target, logit)


inputs  (?, 50, 1)
packed state (?, 96)
outputs before transpose (?, 50, 16)
outputs after transpose (50, ?, 16)
last output (?, 16)
prediction (?, 1)
targets (?, 1)
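
The shape bookkeeping above can be replayed in NumPy. This is a sketch with random stand-in data, not the trained network. It shows that transposing and then taking the last time step gives the same result as indexing the time axis directly, and that the final layer is a plain linear read-out. The weights `W` and `b` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, hidden, out_features = 4, 50, 16, 1
outputs = rng.normal(size=(batch, steps, hidden))   # stand-in for the dynamic_rnn outputs

# Route taken in the cell above: move time to the front, then take the last step.
last_via_transpose = np.transpose(outputs, (1, 0, 2))[steps - 1, :, :]
# Equivalent shortcut: index the time axis directly.
last_direct = outputs[:, -1, :]

# Linear read-out (what a fully connected layer without activation computes).
W = rng.normal(size=(hidden, out_features))   # hypothetical weights
b = np.zeros(out_features)                    # hypothetical bias
predictions = last_direct @ W + b
print(np.array_equal(last_via_transpose, last_direct), predictions.shape)   # True (4, 1)
```

Only the output of the last time step reaches the read-out, so the network has to summarize the whole 50-sample window in that final 16-value vector.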


## Backward pass, training

In [12]:
loss = tf.reduce_sum(tf.squared_difference(predictions, targets))

In [13]:
opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
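
Because `tf.reduce_sum` adds up the squared errors, the loss value (and the gradient) grows with the batch size. A small standard-library sketch of the difference between the summed and the averaged loss follows; both function names are illustrative.

```python
def sum_squared_error(predictions, targets):
    """Same reduction as the loss above: the sum over all examples in the batch."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))

def mean_squared_error(predictions, targets):
    """Sum of squared errors divided by the number of examples."""
    return sum_squared_error(predictions, targets) / len(targets)

# Every prediction is off by 0.1; only the batch size differs.
small = ([0.1] * 32, [0.0] * 32)
large = ([0.1] * 128, [0.0] * 128)
print(sum_squared_error(*small), sum_squared_error(*large))     # grows with batch size
print(mean_squared_error(*small), mean_squared_error(*large))   # stays the same
```

This is why the per-batch losses printed during training are roughly 128 times larger than the per-example epoch losses.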

## Defining the train, dev and test set

In [None]:

# One start index per training window: 0, 1, ..., N-1
start_indices = np.arange(
    sample_length-sequence_length-prediction_length-1,
    dtype=np.int32)

dev_size_perc = 0.20
test_size_perc = 0.20
batch_size = 128 #512 

dev_size = int(np.floor(start_indices.shape[0] * dev_size_perc))
test_size  = int(np.floor(start_indices.shape[0] * test_size_perc))
train_size = start_indices.shape[0] - test_size - dev_size
train_batch_count = int(np.floor(train_size / batch_size))
dev_batch_count = int(np.floor(dev_size / batch_size))
test_batch_count = int(np.floor(test_size / batch_size))

print("dataset size %d" %(start_indices.shape[0]))
print("%d Examples (%d batches) in train set" %(train_size, train_batch_count))
print("%d Examples (%d batches) in dev set" %(dev_size,dev_batch_count))
print("%d Examples (%d batches) in test set" %(test_size,test_batch_count))
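
The cell above appears not to have been executed in this export (`In [None]`), so its printed sizes are missing. They follow directly from the constants, as this standard-library sketch shows (variable names are chosen for the illustration):

```python
import math

sample_length, sequence_length, prediction_length = 50001, 50, 1
batch_size = 128
dev_fraction = test_fraction = 0.20

n_windows = sample_length - sequence_length - prediction_length - 1
dev_size = math.floor(n_windows * dev_fraction)
test_size = math.floor(n_windows * test_fraction)
train_size = n_windows - dev_size - test_size

# Integer division drops the incomplete final batch of each set.
train_batches = train_size // batch_size
dev_batches = dev_size // batch_size
test_batches = test_size // batch_size
print(n_windows, train_size, dev_size, test_size)    # 49949 29971 9989 9989
print(train_batches, dev_batches, test_batches)      # 234 78 78
```

Because only whole batches are used, a few examples at the end of each set are never visited (for instance 29971 - 234 × 128 = 19 training examples per epoch).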



In [15]:
np.random.shuffle (start_indices)
train_indices = start_indices[0:int(train_size)]
dev_indices= start_indices[int(train_size):int(train_size+dev_size)]
test_indices = start_indices[int(train_size+dev_size):int(train_size+dev_size+test_size)]

def get_batch(batch_index, indexes, size=batch_size):
    batch_start_indexes = indexes[batch_index*size:batch_index*size+size]
    batch_inputs = np.zeros((size,sequence_length, input_feature_count))
    batch_targets = np.zeros((size,prediction_length))
    for i in range(size):
        se = batch_start_indexes[i]
        part = signal_amp[se:se+sequence_length]
        batch_inputs[i,0:sequence_length,0] = part
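        # Target: the sample immediately after the input window
        # (index se+sequence_length; an extra +1 would skip one sample)
        batch_targets[i,0] = signal_amp[se+sequence_length]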

    return batch_inputs,batch_targets

batch_inputs,batch_targets = get_batch(train_batch_count-1,train_indices)
print(batch_inputs.shape,batch_targets.shape)

example_inputs = batch_inputs[0,:,:]
example_targets =  batch_targets[0,:]
print(example_inputs.shape)
#b_i = 1
#b_s = batch_inputs[b_i,0:sequence_length,0]
#plotly.offline.iplot({
#    "data": [Scatter(y=b_s)],
#    "layout": Layout(title="")
#})

(128, 50, 1) (128, 1)
(50, 1)
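
To check the alignment between an input window and its target, a toy signal whose values equal their indices is handy. This is a minimal sketch, assuming the goal is to predict the sample that directly follows the window. `make_example` is a hypothetical helper, not part of the notebook.

```python
def make_example(signal, start, sequence_length):
    """Return (window, target) for one-step-ahead prediction.

    The window covers signal[start : start + sequence_length]; the target is
    the sample immediately after the window.
    """
    window = signal[start:start + sequence_length]
    target = signal[start + sequence_length]
    return window, target

signal = list(range(10))   # toy signal: each value equals its index
window, target = make_example(signal, start=2, sequence_length=5)
print(window, target)      # [2, 3, 4, 5, 6] 7
```

The same trick, feeding `np.arange(sample_length)` in place of `signal_amp`, would reveal any off-by-one in `get_batch` at a glance.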


## Test training using a single batch
In the next cell I check whether I can train the network on a single batch, just to verify that the optimizer indeed trains the network. In the output you will see the loss decreasing overall (first column).

In [19]:
np.random.shuffle (start_indices)
train_indices = start_indices[0:int(train_size)]
dev_indices= start_indices[int(train_size):int(train_size+dev_size)]
test_indices = start_indices[int(train_size+dev_size):int(train_size+dev_size+test_size)]

zero_state_packed = np.zeros((batch_size, get_network_state_size(hidden_network)))


init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

    np.random.shuffle (train_indices)
    
    batch_inputs,batch_targets = get_batch(0, train_indices)
    print("batch input shape", batch_inputs.shape)
    #v_outputs, v_state = sess.run([outputs,state], feed_dict={inputs: batch_inputs, targets: batch_targets})
    v_predictions, v_state_unpacked = sess.run([predictions, state_unpacked_network_out], 
                                      feed_dict={
                                          inputs: batch_inputs, 
                                          targets: batch_targets,
                                          initial_state_packed: zero_state_packed
                                      })
    print(v_predictions.shape)
    print(v_predictions[0],batch_targets[0])
    for i in range(0,120):
        v_predictions, v_outputs, v_state_unpacked, v_loss, v_opt = sess.run(
            [predictions, outputs, state_unpacked_network_out, loss, opt], 
            feed_dict={
                learning_rate: 0.02, 
                inputs: batch_inputs, 
                targets: batch_targets,
                state_unpacked: v_state_unpacked
            }) #})
        print(v_loss,v_predictions[0],batch_targets[0])
 

    
    

batch input shape (128, 50, 1)
(128, 1)
[-0.00938506] [ 0.73128641]
52.2535 [-0.00946718] [ 0.73128641]
45.2883 [ 0.15765706] [ 0.73128641]
24.7474 [ 0.20230854] [ 0.73128641]
16.1199 [ 0.39142293] [ 0.73128641]
7.70782 [ 0.82317805] [ 0.73128641]
41.3318 [ 1.06179476] [ 0.73128641]
5.10417 [ 0.46416807] [ 0.73128641]
16.4105 [ 0.05707151] [ 0.73128641]
5.75008 [ 0.37633395] [ 0.73128641]
7.85079 [ 0.6688652] [ 0.73128641]
7.10021 [ 0.77954197] [ 0.73128641]
3.02173 [ 0.81083065] [ 0.73128641]
2.9519 [ 0.78214532] [ 0.73128641]
3.8564 [ 0.80796784] [ 0.73128641]
1.6532 [ 0.86031741] [ 0.73128641]
1.148 [ 0.87650692] [ 0.73128641]
1.74538 [ 0.87335992] [ 0.73128641]
1.68153 [ 0.85683346] [ 0.73128641]
0.98539 [ 0.83094937] [ 0.73128641]
0.796153 [ 0.80023313] [ 0.73128641]
1.29375 [ 0.77302444] [ 0.73128641]
1.42221 [ 0.75979626] [ 0.73128641]
0.935665 [ 0.76170456] [ 0.73128641]
0.636474 [ 0.7716862] [ 0.73128641]
0.770781 [ 0.78375113] [ 0.73128641]
0.846203 [ 0.79437506] [ 0.73128641]

## Training and Testing



In [21]:
np.random.shuffle (start_indices)
train_indices = start_indices[0:int(train_size)]
dev_indices= start_indices[int(train_size):int(train_size+dev_size)]
test_indices = start_indices[int(train_size+dev_size):int(train_size+dev_size+test_size)]

batch_zero_state_packed = np.zeros((batch_size, get_network_state_size(hidden_network)))


epoch_count = 2

loss_results = np.zeros((epoch_count,2))

def get_dev_loss():
    epoch_dev_loss = 0.0
    for devi in range(dev_batch_count):
        batch_inputs,batch_targets = get_batch(devi, dev_indices)

        batch_dev_loss = sess.run(loss,feed_dict={
            inputs:batch_inputs,
            targets:batch_targets,
            initial_state_packed: batch_zero_state_packed
        })
        if devi % 20 == 0:
            print("  Dev results batch %d, loss %s" %(  devi, str(batch_dev_loss)))  

        epoch_dev_loss += batch_dev_loss
        #sys.stdout.write('.')
        #sys.stdout.flush()
    return epoch_dev_loss / dev_size

def generate_graph(graph_size=200):
    prime_size = 20
    
    prime_signal_start_i = 0
    
    tmp_signal = np.zeros((graph_size,1))
    tmp_signal[0:prime_size,0] = signal_amp[prime_signal_start_i:(prime_signal_start_i+prime_size)]
    #tmp_signal[0:prime_size,0] = np.random.normal(size=prime_size)*0.6+0.1
    seq = np.zeros((1,sequence_length,1))
    
    seq_state_packed = np.zeros((1, get_network_state_size(hidden_network)))
    
    _state_unpacked = None
    for end in range(prime_size, graph_size):
        #end = prime_size
        seq[0,:,0] = tmp_signal.take(range((end-sequence_length),end), mode='wrap')
        seq_state_packed , _prediction = sess.run(
            [state_packed_network_out, predictions[0,0]], 
            feed_dict={
                learning_rate: 0.02, 
                initial_state_packed: seq_state_packed,
                inputs: seq})
            
        #print(_prediction)
        tmp_signal[end,0] = _prediction
        sys.stdout.write('.')
        sys.stdout.flush()
    print("")
    plotly.offline.iplot({
       "data": [Scatter(y=tmp_signal[:,0])],
       "layout": Layout(title="")})


init=tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

    epoch_dev_loss = get_dev_loss()    

    #print("")            
    print("Dev results epoch start, loss %s" %(  str(epoch_dev_loss),))  

    for epoch in range(0,epoch_count):
        np.random.shuffle (train_indices)
        epoch_train_loss = 0.0
        for ti in range(train_batch_count):
            batch_inputs,batch_targets = get_batch(ti, train_indices)

            batch_train_loss, _ = sess.run([loss, opt], 
                                           feed_dict={
                                               learning_rate: 0.002, 
                                               inputs: batch_inputs, 
                                               targets: batch_targets,
                                               initial_state_packed: batch_zero_state_packed
                                           })
            if ti % 20 == 0:
                print("  Train results batch %d, loss %s" %(  ti, str(batch_train_loss)))  
            epoch_train_loss += batch_train_loss
            #sys.stdout.write('.')
            #sys.stdout.flush()
        #print("")
        epoch_train_loss = epoch_train_loss / train_size
        print("Training results epoch %d, loss %s" %( epoch, str(epoch_train_loss)))
        epoch_dev_loss = get_dev_loss()    
        #print("")            
        print("Dev results epoch %d, loss %s" %( epoch, str(epoch_dev_loss)))  
        loss_results[epoch,0] = epoch_train_loss
        loss_results[epoch,1] = epoch_dev_loss
        ti += 1
        generate_graph()
    generate_graph(graph_size=1000)
        

  Dev results batch 0, loss 64.6853
  Dev results batch 20, loss 59.924
  Dev results batch 40, loss 64.3831
  Dev results batch 60, loss 60.0177
Dev results epoch start, loss 0.501490246296
  Train results batch 0, loss 63.4884
  Train results batch 20, loss 1.51249
  Train results batch 40, loss 1.53891
  Train results batch 60, loss 0.148184
  Train results batch 80, loss 0.0676131
  Train results batch 100, loss 0.0605076
  Train results batch 120, loss 0.0715172
  Train results batch 140, loss 0.0435619
  Train results batch 160, loss 0.0453466
  Train results batch 180, loss 0.0579558
  Train results batch 200, loss 0.0738496
  Train results batch 220, loss 0.0640711
Training results epoch 0, loss 0.0236120101267
  Dev results batch 0, loss 0.068854
  Dev results batch 20, loss 0.0649479
  Dev results batch 40, loss 0.0810422
  Dev results batch 60, loss 0.072207
Dev results epoch 0, loss 0.000520994107499
..........................................................................

  Train results batch 0, loss 0.0627837
  Train results batch 20, loss 0.0526446
  Train results batch 40, loss 0.0605808
  Train results batch 60, loss 0.0548361
  Train results batch 80, loss 0.0673049
  Train results batch 100, loss 0.0400483
  Train results batch 120, loss 0.072192
  Train results batch 140, loss 0.0751223
  Train results batch 160, loss 0.0570399
  Train results batch 180, loss 0.0579671
  Train results batch 200, loss 0.0901873
  Train results batch 220, loss 0.0851254
Training results epoch 1, loss 0.000498810676817
  Dev results batch 0, loss 0.0624524
  Dev results batch 20, loss 0.0612139
  Dev results batch 40, loss 0.0738324
  Dev results batch 60, loss 0.0611263
Dev results epoch 1, loss 0.000493027050552
....................................................................................................................................................................................


.................... [remaining progress dots trimmed]
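
The generation loop in `generate_graph` is an autoregressive rollout: every prediction is written back into the signal and becomes input for the next step. The sketch below isolates that idea using only the standard library. The trained network is replaced by a stand-in `predict_next` that uses the exact two-sample recurrence of a noise-free sine, so this illustrates the loop only and does not reproduce the results above. `generate` and `predict_next` are names made up for this example.

```python
import math

def generate(predict_next, primer, total_length, window_length):
    """Autoregressive rollout: each prediction is appended to the signal and
    becomes part of the input window for the next step."""
    signal = list(primer)
    while len(signal) < total_length:
        window = signal[-window_length:]
        signal.append(predict_next(window))
    return signal

# Stand-in for the trained network: a noise-free sine sampled at a fixed rate
# obeys x[n+1] = 2*cos(w)*x[n] - x[n-1], so two past samples suffice.
w = 2 * math.pi * 0.01

def predict_next(window):
    return 2 * math.cos(w) * window[-1] - window[-2]

primer = [math.sin(w * n) for n in range(20)]
generated = generate(predict_next, primer, total_length=200, window_length=50)
max_error = max(abs(x - math.sin(w * n)) for n, x in enumerate(generated))
print(len(generated), max_error)
```

With a learned predictor, small errors feed back into later inputs, so a long rollout such as the 1000-sample graph is a stricter test than the one-step dev loss.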


## Conclusion, next steps

  * Implement a dynamic batch size
  * Apply on raw audio
      * Sample the microphone via Web Audio, send the samples to the notebook via WebSocket, analyze them, and feed the result back
  * Implement a phase vocoder so that frequency features, instead of raw audio, serve as input
  * Achieve something like [this](https://deepmind.com/blog/wavenet-generative-model-raw-audio/) :)