In [0]:
!pip install tensorflow==1.1.0



In [0]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### What does our data look like?


In [0]:
!cd /content/drive/My\ Drive/WORKSHOP
# get the dataset from this link: https://docs.google.com/file/d/0B04GJPshIjmPRnZManQwWEdTZjg/edit
# put the files in CharLSTM/datasets

PATH = '/content/drive/My Drive/WORKSHOP'
train_set = '{}/CharLSTM/datasets/training.1600000.processed.noemoticon.csv'.format(PATH)
test_set = '{}/CharLSTM/datasets/testdata.manual.2009.06.14.csv'.format(PATH)

with open(train_set, 'r') as f:
    file = f.readlines()
    print('length training set: {}'.format(len(file)))
    print(file[0])

with open(test_set, 'r') as f:
    file = f.readlines()
    print('length test set: {}'.format(len(file)))
    print(file[0])

length training set: 1600000
"0","1467810369","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","_TheSpecialOne_","@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer.  You shoulda got David Carr of Third Day to do it. ;D"

length test set: 498
"4","3","Mon May 11 03:17:40 UTC 2009","kindle2","tpryan","@stellargirl I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantastic in its own right."



## Creating the Datasets / Preprocessing (Optional) 

The datasets have already been created for you, so you don't need to run this cell; keep the function for future reference. Here's what it does:

i. Shuffle the dataset

ii. Reshape the lines as tuples: (sentiment ('0' or '4'), sentence)

iii. Remove sentences that cannot be tokenized

TODO: Remove links, hashtags and references to other twitter users
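The TODO above could be handled with a few regular expressions. The sketch below is one possible approach, not part of the original pipeline: `clean_tweet` and the three patterns are hypothetical names, and the rules are deliberately simple (for example, any `@word` is treated as a mention).

```python
import re

# Hypothetical patterns for this sketch; real tweets may need stricter rules
URL_RE = re.compile(r'https?://\S+|www\.\S+')
MENTION_RE = re.compile(r'@\w+')
HASHTAG_RE = re.compile(r'#\w+')


def clean_tweet(text):
    """Remove links, user mentions and hashtags, then collapse whitespace."""
    text = URL_RE.sub(' ', text)
    text = MENTION_RE.sub(' ', text)
    text = HASHTAG_RE.sub(' ', text)
    return ' '.join(text.split())


print(clean_tweet("@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. #sad"))
```

Whether hashtags should be dropped entirely or only stripped of the `#` sign is a design choice; the word inside a hashtag often carries sentiment.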

In [0]:
!cd /content/drive/My\ Drive/WORKSHOP
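import random
import csv

# word_tokenize is used below to filter out lines that cannot be tokenized.
# It needs the 'punkt' data: run nltk.download('punkt') once beforehand.
from nltk.tokenize import word_tokenize

VALID_PERC = 0.05


def reshape_lines(lines):
    data = []
    for l in lines:
        # Raw lines look like "label","id","date","query","user","text"
        split = l.split('","')
        # Keep (label, text): drop the leading quote and the trailing quote + newline
        content = (split[0][1:], split[-1][:-2])
        try:
            word_tokenize(content[1])
            data.append(content)
        except Exception:
            # Skip sentences that the tokenizer cannot read
            pass
    return data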


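def save_csv(out_file, data):
    # 'wb' is what Python 2's csv module expects; on Python 3, use
    # open(out_file, 'w', newline='') instead
    with open(out_file, 'wb') as f:
        writer = csv.writer(f)
        writer.writerows(data)
    print('Data saved to file: %s' % out_file)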


def shuffle_datasets(valid_perc=VALID_PERC):
    """ Shuffle the dataset """

    # Create training and validation set
    print('Creating training & validation set...')

    with open(train_set, 'r') as f:
        lines = f.readlines()
        random.shuffle(lines)
        lines_train = lines[:int(len(lines) * (1 - valid_perc))]
        lines_valid = lines[int(len(lines) * (1 - valid_perc)):]

    
    

    save_csv('valid_set.csv', reshape_lines(lines_valid))
    save_csv('train_set.csv', reshape_lines(lines_train))

    print('Creating testing set...')

    with open(test_set, 'r') as f:
        lines = f.readlines()
        random.shuffle(lines)

        save_csv('test_set.csv', reshape_lines(lines))
    print('All datasets have been created!')

# shuffle_datasets()
# Once the files are created, download them and put them in /content/drive/My\ Drive/WORKSHOP
# We have to do it manually: `!cd` runs in a subshell and does not change the notebook's
# working directory, so the CSV files are written to the current directory, not to Drive.

# Building the Embedding

3 Main Advantages of Working with Character-Level Embeddings:

i. The model will be much smaller (roughly 50-100 MB for the whole model, compared with over 3 GB for Google's classic Word2Vec; that figure counts only the embedding, not the model built on top of it);

ii. The model can understand the underlying emotion of repetitive letters (e.g. hellooooo!!);

iii. The model is almost immune to typos.
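To put rough numbers on the first point, here is a back-of-the-envelope calculation. It assumes the Google News Word2Vec release (3,000,000 vectors of 300 float32 values) and uses the kernel widths and filter counts that this notebook uses for its character CNN later on. It counts only the character CNN, not the highway or LSTM layers.

```python
# Assumption: 3,000,000 words x 300 dimensions x 4 bytes (float32)
word2vec_bytes = 3000000 * 300 * 4

kernels = [1, 2, 3, 4, 5, 6, 7]
kernel_features = [25, 50, 75, 100, 125, 150, 175]
alphabet_length = 94

# A kernel of width k has k * alphabet_length weights per filter, plus one bias per filter
char_cnn_params = sum(k * alphabet_length * f + f
                      for k, f in zip(kernels, kernel_features))
char_cnn_bytes = char_cnn_params * 4

print('Word2Vec table: %.1f GB' % (word2vec_bytes / 1e9))
print('Character CNN:  %d parameters, %.1f MB' % (char_cnn_params, char_cnn_bytes / 1e6))
```

The one-hot character encoding itself has no learned parameters, which is why the word-embedding stage stays so small.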

## I. Defining an Alphabet

In [0]:
import numpy as np

# We're going to embed characters by their position in the ascii 
# table

characters = 'abcdefghijklmnopqrstuvwxyz0123456789-,;' \
                 '.!?:\'"/\\|_@#$%^&*~`+-=<>()[]{} '
characters_ascii = np.frombuffer(np.array(list(characters)),
                                           dtype=np.uint8) - 32

print('The 70 most common characters can be represented as:')
print(['{}: {}'.format(characters[i], characters_ascii[i]) for i in range(len(characters))])

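# You can then one-hot encode characters using the following code (in this example, we encode
# the 70 characters listed above).
# Note: np.arange(alphabet_length) covers indices 0..93, so '~' (index 94) gets an
# all-zero row; use np.max(characters_ascii) + 1 to give it its own column.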
alphabet_length = np.max(characters_ascii)
embedding = (characters_ascii[:, None] == np.arange(alphabet_length)).astype(int)

print('\n')
print('Creating one-hot vector from the 70 most common characters:')
print(embedding)
print('one-hot vector shape: {}'.format(embedding.shape))

The 70 most common characters can be represented as:
['a: 65', 'b: 66', 'c: 67', 'd: 68', 'e: 69', 'f: 70', 'g: 71', 'h: 72', 'i: 73', 'j: 74', 'k: 75', 'l: 76', 'm: 77', 'n: 78', 'o: 79', 'p: 80', 'q: 81', 'r: 82', 's: 83', 't: 84', 'u: 85', 'v: 86', 'w: 87', 'x: 88', 'y: 89', 'z: 90', '0: 16', '1: 17', '2: 18', '3: 19', '4: 20', '5: 21', '6: 22', '7: 23', '8: 24', '9: 25', '-: 13', ',: 12', ';: 27', '.: 14', '!: 1', '?: 31', ':: 26', "': 7", '": 2', '/: 15', '\\: 60', '|: 92', '_: 63', '@: 32', '#: 3', '$: 4', '%: 5', '^: 62', '&: 6', '*: 10', '~: 94', '`: 64', '+: 11', '-: 13', '=: 29', '<: 28', '>: 30', '(: 8', '): 9', '[: 59', ']: 61', '{: 91', '}: 93', ' : 0']


Creating one-hot vector from the 70 most common characters:
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 1 0 0]
 [0 0 0 ... 0 0 1]
 [1 0 0 ... 0 0 0]]
one-hot vector shape: (70, 94)
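
The `np.frombuffer` trick above relies on each character being stored as a single byte. A hedged alternative that does not depend on the string's in-memory layout is to use `ord`; `char_indices` and `one_hot` are hypothetical helper names for this sketch. It also shows the edge case at the top of the alphabet: with a depth of 94, `~` (index 94) has no column.

```python
import numpy as np


def char_indices(text, offset=32):
    """Map each character to its ASCII code minus `offset` (space -> 0)."""
    return np.array([ord(c) - offset for c in text], dtype=np.int64)


def one_hot(indices, depth):
    """One row per character, one column per possible index in [0, depth)."""
    return (indices[:, None] == np.arange(depth)).astype(int)


idx = char_indices('hi~')
print(idx)                               # [72 73 94]
print(one_hot(idx, 94).sum(axis=1))      # '~' falls outside 0..93 -> empty row
print(one_hot(idx, 95).sum(axis=1))      # with depth 95 every character has a column
```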


## II. Encoding a Sentence

We need to convert sentences to a tensor of shape `(sentence_length, max_word_length, alphabet_length)`. 

PS: We want `max_word_length` to be the same for every sentence in our batch; otherwise we can't feed the batch to our TensorFlow model, because the input placeholder and the pooling windows are defined for a fixed word length.

In [0]:
!cd /content/drive/My\ Drive/WORKSHOP
from nltk.tokenize import word_tokenize
import nltk

nltk.download('punkt')

# word_tokenize('Hello, how are you?') -> ['Hello', ',', 'how', 'are', 'you', '?']

TRAIN_SET = '/content/drive/My Drive/WORKSHOP/CharLSTM/datasets/train_set.csv'

# We set the maximal number of characters per word to 16
max_word_length = 16

with open(TRAIN_SET, 'r') as f:
    lines = f.readlines()
    
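    # print the second line (index 1)
    print(lines[1])
    
    # Each line looks like: label,sentence
    # Note: split(',')[1] keeps only the text up to the first comma inside the sentence
    sentence = lines[1].split(',')[1]
    
    # word_tokenize converts a sentence to a list of tokens:
    # 'hello, there' -> ['hello', ',', 'there']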
    sentence = word_tokenize(sentence)
    
    # We store the embedding in sentence_emb
    sentence_emb = np.zeros(shape=(len(sentence), max_word_length, alphabet_length))
    print(sentence)


    for i, word in enumerate(sentence):

        # Transform word to list: 'hi' -> ['h', 'i']
        word_list = np.array(list(word))

        # Get the index of each character in our dictionary: ['h', 'i'] -> [72, 73]
        word_ints = np.frombuffer(word_list, dtype=np.uint8) - 32

        # Transform word to one-hot vector of shape (word_length, alphabet_size)
        word_emb = (word_ints[:, None] == np.arange(alphabet_length)).astype(int)[:max_word_length]

        # Add the embedded word to the embedded sentence tensor of shape (sentence_length, max_word_length, alphabet_size)
        # sentence_emb[0] = [0, 0, 0, 0, 0, ..., 0] (shape=(16, 94)) word_emb = [0, 0, 0, 1, ..., 0] (shape=(word_len, 94))
        # where word_length <= 16
        sentence_emb[i, 0:len(word_list), :] = word_emb
        
    print(sentence_emb[0, 0])
    print(sentence_emb.shape)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
0,@Qdakid718 I want to come home....  

['@', 'Qdakid718', 'I', 'want', 'to', 'come', 'home', '...', '.']
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
(9, 16, 94)


# Creating the Data Reader

## I. One-Hot Encoding for Sentences 

This is the same code as above, wrapped in a function.

In [0]:
alphabet_length = 94

def encode_one_hot(sentence, max_word_length=16):
    sentence = word_tokenize(sentence)
    sentence_emb = np.zeros(shape=(len(sentence), max_word_length, alphabet_length))

    for i, word in enumerate(sentence):

        # Transform word to list: 'hi' -> ['h', 'i']
        word_list = np.array(list(word))

        # Get the index of each character in our dictionary: ['h', 'i'] -> [72, 73]
        word_ints = np.frombuffer(word_list, dtype=np.uint8) - 32

        # Transform word to one-hot vector of shape (word_length, alphabet_size)
        word_emb = (word_ints[:, None] == np.arange(alphabet_length)).astype(int)[:max_word_length]

        # Add the embedded word to the embedded sentence tensor of shape (sentence_length, max_word_length, alphabet_size)
        sentence_emb[i, 0:len(word_list), :] = word_emb
    
    return sentence_emb, len(sentence)

## II. Creating Minibatches

`make_minibatch` takes as input the raw sentences from our dataset. It calls `encode_one_hot` to encode them into one-hot vectors and then inserts them into a tensor of shape `(minibatch_size, max_sentence_length, max_word_length, alphabet_length)`.

In [0]:
def make_minibatch(sentences, minibatch_size=64, max_word_length=16):
    
    assert len(sentences) == minibatch_size
    
    # The cross-entropy loss takes 2 inputs: y_true (labels) of shape (minibatch_size, n_classes (2))
    # and y_hat (predictions) of shape (minibatch_size, n_classes (2))
    minibatch_y = np.zeros(shape=(minibatch_size, 2))
    x = []
    sentence_lengths = []
    
    for i, sentence in enumerate(sentences):
        # Note: split(',') cuts the sentence at its first comma, if it has one
        line = sentence.split(',')
        label, s = line[0], line[1]
        
        # '0' -> negative, '4' -> positive; we convert these labels to one-hot vectors:
        # [0, 1] for negative and [1, 0] for positive
        minibatch_y[i, :] = np.array([0, 1]) if sentence[:1] == '0' else np.array([1, 0])
        res = encode_one_hot(s)
        x.append(res[0])
        sentence_lengths.append(res[1])
    
    
    # Create the minibatch for the sentences
    max_sentence_length = np.max(sentence_lengths)
    minibatch_x = np.zeros(shape=(minibatch_size, max_sentence_length, max_word_length, alphabet_length))
    for i, emb_s in enumerate(x):
        minibatch_x[i, 0:len(emb_s)] = emb_s

    return minibatch_x, minibatch_y

In [0]:
# Testing
sentences = lines[:64]
m_x, m_y = make_minibatch(sentences)
print(m_x.shape)

(64, 32, 16, 94)
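
`train_set.csv` was written with `csv.writer`, which wraps a field in quotes when it contains a comma. Splitting on `','` and taking element 1 therefore keeps only the start of such sentences. A minimal sketch of a more robust parser, using the standard `csv` module (`parse_row` is a hypothetical helper, not part of the notebook):

```python
import csv


def parse_row(line):
    """Split one 'label,sentence' row, honoring CSV quoting."""
    label, sentence = next(csv.reader([line]))
    return label, sentence


row = '4,"well, that was fun"\n'
print(row.split(',')[1])   # naive split: cut at the first comma, stray quote kept
print(parse_row(row))      # csv-aware split: the full sentence
```

Swapping this into `make_minibatch` would change sentence lengths, so the batch shapes printed in this notebook could differ.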


## III. Building the Iterator

The iterator function lets you loop through your dataset and creates the minibatches that you will feed to your model during training.

In [0]:
def iterate_minibatch(dataset=TRAIN_SET, minibatch_size=64, max_word_length=16):
    with open(dataset, 'r') as f:
        lines = f.readlines()
        n_batch = int(len(lines) // minibatch_size)
        
        for i in range(n_batch):
            
            sentences = lines[i * minibatch_size: i * minibatch_size + minibatch_size]
            m_x, m_y = make_minibatch(sentences, minibatch_size=minibatch_size,
                                     max_word_length=max_word_length)
            yield m_x, m_y

In [0]:
# Testing
test = next(iterate_minibatch())
print(test[0].shape)
print(test[1].shape)

(64, 32, 16, 94)
(64, 2)
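
One consequence of `len(lines) // minibatch_size` is that lines left over after the last full minibatch are never used. The sketch below reproduces the same slicing on a plain list; `full_batches` and `count_batches` are hypothetical names used only for illustration.

```python
def full_batches(items, size):
    """Yield consecutive slices of exactly `size` items; the remainder is dropped."""
    for i in range(len(items) // size):
        yield items[i * size: i * size + size]


def count_batches(n_lines, size):
    """Return (number of full batches, number of unused lines)."""
    return n_lines // size, n_lines % size


print(list(full_batches(list(range(10)), 4)))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(count_batches(498, 64))                   # (7, 50)
```

For a file of 498 lines and minibatches of 64, that is 7 batches with 50 lines left out of every pass.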


# Building the Model


![](https://drive.google.com/uc?id=14QbvpXnYAhZ_DqWXAHbF3Y7YlJltc1hy)



## I. Graph Inputs

TensorFlow takes NumPy arrays as inputs. Every time you want to evaluate a tensor in the graph, you first need to open a session and then feed your NumPy arrays to the graph's placeholders.


In [0]:
import tensorflow as tf

max_word_length = 16
alphabet_length = 94
minibatch_size = 64

# Placeholder for the embedded sentences shape: (minibatch_size, max_sentence_length, max_word_length, alphabet_length)
# Remember that max_sentence_length changes with each minibatch
x = tf.placeholder('float32', shape=[None, None, max_word_length, alphabet_length], name='X')

# Placeholder for the embedded labels (0, 1) -> negative (1, 0) -> positive
y = tf.placeholder('float32', shape=[None, 2], name='Y')

In [0]:
# Testing
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    x_, y_ = sess.run([x, y], feed_dict={x: test[0], y: test[1]})
    print(x_.shape)
    print(y_.shape)

(64, 32, 16, 94)
(64, 2)


## II. Time-Delay Neural Network

The CharCNN model is composed of multiple convolutional layers with different kernel widths (in this implementation, I use 25 kernels of width 1, 50 kernels of width 2, …, 175 kernels of width 7; each kernel spans the full character alphabet). Each of them takes as input ONE word of the sentence at a time: the CharCNN acts as an embedding of the words, which are then fed to the LSTM, where each time step is one word of the sentence.

After the convolution, we apply max pooling to each kernel over the resulting width of the convolution. This keeps, for every kernel, its strongest response anywhere in the word, which summarizes the most important character n-gram features of the word. For example, a kernel with a width of 4 might have learned to detect repeated characters and would “fire” when it sees “oooo” in “helloooo!”.

You can learn more about this model [here](https://arxiv.org/pdf/1508.06615.pdf).
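
To make the convolution and max-over-positions pooling concrete, here is a NumPy sketch for a single word. It is an illustration under simplifying assumptions, not the TensorFlow implementation: the weights are random stand-ins for learned filters, biases are omitted, and `tdnn_numpy` is a hypothetical name.

```python
import numpy as np


def tdnn_numpy(word, kernels, kernel_features, rng):
    """word: one-hot matrix of shape (max_word_length, alphabet_length)."""
    max_word_length, alphabet_length = word.shape
    pooled = []
    for k, n_filters in zip(kernels, kernel_features):
        # Random weights stand in for the learned filters (illustration only)
        w = 0.1 * rng.randn(k, alphabet_length, n_filters)
        reduced_length = max_word_length - k + 1
        # Slide the kernel over every window of k consecutive characters
        conv = np.stack([
            np.tensordot(word[t:t + k], w, axes=([0, 1], [0, 1]))
            for t in range(reduced_length)
        ])  # shape: (reduced_length, n_filters)
        # Max over positions: keep each filter's strongest response in the word
        pooled.append(np.tanh(conv).max(axis=0))
    return np.concatenate(pooled)


rng = np.random.RandomState(0)
word = np.zeros((16, 94))
for pos, ch in enumerate('helloooo!'):
    word[pos, ord(ch) - 32] = 1.0

features = tdnn_numpy(word, [1, 2, 3, 4, 5, 6, 7],
                      [25, 50, 75, 100, 125, 150, 175], rng)
print(features.shape)  # (700,)
```

The output length is the sum of the filter counts (25 + 50 + … + 175 = 700), independent of the word's length.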

![](https://drive.google.com/uc?id=1IpdS5nAw6_2_Z3UXMnPbhs_EwEzzi0yI)

In [0]:
def conv2d(input_, output_dim, k_h, k_w, name="conv2d"):
    with tf.variable_scope(name):

        w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim])
        b = tf.get_variable('b', [output_dim])

    return tf.nn.conv2d(input_, w, strides=[1, 1, 1, 1], padding='VALID') + b

kernels=[1, 2, 3, 4, 5, 6, 7]
kernel_features=[25, 50, 75, 100, 125, 150, 175]

def tdnn(input_, kernels, kernel_features, scope='TDNN'):
    """ Time Delay Neural Network
    :input:           input float tensor of shape [(batch_size*num_unroll_steps) x max_word_length x embed_size]
    :kernels:         array of kernel sizes
    :kernel_features: array of kernel feature sizes (parallel to kernels)
    """
    assert len(kernels) == len(kernel_features), 'Kernel and Features must have the same size'

    # input_ is a tensor of shape (batch_size, sentence_length, max_word_length, alphabet_length);
    # we reshape it to (batch_size * sentence_length, 1, max_word_length, alphabet_length)
    # to use conv2d
    input_ = tf.reshape(input_, [-1, max_word_length, alphabet_length])
    input_ = tf.expand_dims(input_, 1)

    layers = []
    with tf.variable_scope(scope):
        for kernel_size, kernel_feature_size in zip(kernels, kernel_features):
            reduced_length = max_word_length - kernel_size + 1

            # [batch_size * sentence_length x 1 x reduced_length x kernel_feature_size]
            conv = conv2d(input_, kernel_feature_size, 1, kernel_size, name="kernel_%d" % kernel_size)

            # [batch_size * sentence_length x 1 x 1 x kernel_feature_size]
            pool = tf.nn.max_pool(tf.tanh(conv), [1, 1, reduced_length, 1], [1, 1, 1, 1], 'VALID')

            layers.append(tf.squeeze(pool, [1, 2]))

        if len(kernels) > 1:
            output = tf.concat(layers, 1)
        else:
            output = layers[0]

    return output


# TDNN outputs a tensor of shape [minibatch_size * max_sentence_length, n_kernels]
tdnn_output = tdnn(x, kernels=kernels, kernel_features=kernel_features)

In [0]:
# Testing
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    t = sess.run([tdnn_output], feed_dict={x: test[0], y: test[1]})
    print(t[0].shape)
    print(test[0].shape[0] * test[0].shape[1])

(2048, 700)
2048


## III. Highway Network

To understand highway networks, it helps to first understand residual blocks. A residual block is a basic MLP layer with a skip connection: we add the layer's input to the output of its non-linear transformation. Residual blocks are very popular because they help the gradient propagate during training. You can read more about them [here](https://blog.waya.ai/deep-residual-learning-9610bb62c355).

Highway networks extend residual blocks with a learned gate `t` (the "transform gate") that controls how much of the output comes from the non-linear transformation and how much is carried over unchanged from the input.

Here's what they look like in math:

```
# MLP
y = f(Wx + b)

# Residual Block
y = f(Wx + b) + x

# Highway Network
y = t * f(Wx + b) + (1 - t) * x,   t = sigmoid(W_t x + b_t) in [0, 1]

```
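
A single highway layer can be sketched in NumPy as follows. This is an illustration with hand-picked weights, not the TensorFlow code used in this notebook; `highway_layer` and its argument names are hypothetical. With all weights at zero and a gate bias of -2, the gate is sigmoid(-2) ≈ 0.12, so the layer mostly carries its input through.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def highway_layer(x, w_h, b_h, w_t, b_t):
    g = np.maximum(0.0, np.dot(x, w_h) + b_h)   # candidate output: relu(W x + b)
    t = sigmoid(np.dot(x, w_t) + b_t)           # transform gate, between 0 and 1
    return t * g + (1.0 - t) * x                # blend candidate and carried input


size = 4
x = np.array([[1.0, -2.0, 0.5, 3.0]])
zeros = np.zeros((size, size))
out = highway_layer(x, zeros, np.zeros(size), zeros, np.full(size, -2.0))
print(out / x)   # every entry equals 1 - sigmoid(-2)
```

A negative gate bias is a common way to start training close to the identity mapping.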

![](https://drive.google.com/uc?id=1i7mTIJ6MdxmwB3p0B5xYuysMFqOOIUrr)

In [0]:
def linear(input_, output_size, scope=None):
    shape = input_.get_shape().as_list()
    if len(shape) != 2:
        raise ValueError("Linear is expecting 2D arguments: %s" % str(shape))
    if not shape[1]:
        raise ValueError("Linear expects shape[1] of arguments: %s" % str(shape))
    input_size = shape[1]

    with tf.variable_scope(scope or "SimpleLinear"):
        matrix = tf.get_variable("Matrix", [output_size, input_size], dtype=input_.dtype)
        bias_term = tf.get_variable("Bias", [output_size], dtype=input_.dtype)

    return tf.matmul(input_, tf.transpose(matrix)) + bias_term


n_kernels = 700

def highway(input_, size, num_layers=1, bias=-2.0, f=tf.nn.relu, scope='Highway'):
    """Highway Network (cf. http://arxiv.org/abs/1505.00387).
    t = sigmoid(Wy + b)
    z = t * g(Wy + b) + (1 - t) * y
    where g is a nonlinearity, t is the transform gate, and (1 - t) is the carry gate.
    """

    with tf.variable_scope(scope):
        for idx in range(num_layers):
            g = f(linear(input_, size, scope='highway_lin_%d' % idx))

            t = tf.sigmoid(linear(input_, size, scope='highway_gate_%d' % idx) + bias)

            output = t * g + (1. - t) * input_
            input_ = output

    return output

hw_network = highway(tdnn_output, n_kernels)

In [0]:
# Testing
# The output of our highway network is basically the embedding of the words in our sentence
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    t = sess.run([hw_network], feed_dict={x: test[0], y: test[1]})
    print(t[0].shape)

(2048, 700)


## IV. Long Short-Term Memory Network

Now that we have an embedding for each word, we can feed the words to our LSTM, one word per time step.

![](https://drive.google.com/uc?id=1iyk1h6nnr-nLGwpNhmAiSoFR8oEI7mej)

In [0]:
from tensorflow.contrib import rnn


# tdnn() returns a tensor of shape [batch_size * sentence_length, kernel_features]
# highway() returns a tensor of shape [batch_size * sentence_length, size]
# To use TensorFlow's dynamic_rnn, we need to reshape it to [batch_size, sentence_length, size]

minibatch_size = 64

# Number of neurons in the LSTM layer
rnn_size = 650

lstm_input = tf.reshape(hw_network, [minibatch_size, -1, n_kernels])
cell = rnn.BasicLSTMCell(rnn_size, state_is_tuple=True, forget_bias=0.0, reuse=False)
initial_rnn_state = cell.zero_state(minibatch_size, dtype='float32')
outputs, final_rnn_state = tf.nn.dynamic_rnn(cell, lstm_input,
                                             initial_state=initial_rnn_state,
                                             dtype=tf.float32)

# In this implementation, we only care about the last outputs of the RNN
# i.e. the output at the end of the sentence
outputs = tf.transpose(outputs, [1, 0, 2])
last = outputs[-1]

In [0]:
# Testing
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    lstm_input_, outputs_, last_ = sess.run([lstm_input, outputs, last],
                                            feed_dict={x: test[0], y: test[1]})
    
    # lstm_input shape = (minibatch_size, max_sentence_length, n_kernels)
    print(lstm_input_.shape)
    
    # outputs shape = (max_sentence_length / # steps for the lstm, minibatch_size, rnn_size)
    print(outputs_.shape)
    
    # If we take only the last output
    # last shape = (minibatch_size, rnn_size) 
    print(last_.shape)

(64, 32, 700)
(32, 64, 650)
(64, 650)
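
Because shorter sentences are zero-padded up to the longest one in the minibatch, `outputs[-1]` is taken after the LSTM has also processed the padding steps. One possible refinement, not used in this notebook, is to pick the output at each sentence's true last word. The NumPy sketch below assumes the per-sentence lengths computed in `make_minibatch` were returned alongside the batch; `last_valid_output` is a hypothetical helper.

```python
import numpy as np


def last_valid_output(outputs, lengths):
    """outputs: (batch, time, units); lengths: true number of words per sentence."""
    batch_idx = np.arange(outputs.shape[0])
    return outputs[batch_idx, np.asarray(lengths) - 1]


# Toy example: 2 sentences, 3 time steps, 2 units
toy_outputs = np.arange(12).reshape(2, 3, 2)
print(last_valid_output(toy_outputs, [1, 3]))
# [[ 0  1]
#  [10 11]]
```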


## V. Prediction

Finally, our prediction is simply a fully connected layer with a softmax output: each output lies in [0, 1] and the two outputs sum to 1, giving the probability that a tweet is positive or negative.
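
For reference, the softmax itself can be written in a few lines of NumPy. This is a sketch for intuition (`softmax_np` is a hypothetical name); subtracting the row maximum before exponentiating is a standard trick to avoid overflow.

```python
import numpy as np


def softmax_np(logits):
    """Row-wise softmax over an array of shape (batch, n_classes)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


probs = softmax_np(np.array([[2.0, 0.0], [0.0, 0.0], [1000.0, 1001.0]]))
print(probs)
print(probs.sum(axis=1))   # each row sums to 1
```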

In [0]:
def softmax(input_, out_dim, scope=None):
    with tf.variable_scope(scope or 'softmax'):
        W = tf.get_variable('W', [input_.get_shape()[1], out_dim])
        b = tf.get_variable('b', [out_dim])

    return tf.nn.softmax(tf.matmul(input_, W) + b)

# Probability of each class (index 0: positive, index 1: negative) for every tweet
# in the minibatch
prediction = softmax(last, 2)

In [0]:
# Testing
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    p_ = sess.run([prediction], feed_dict={x: test[0], y: test[1]})
    
    # prediction shape = (minibatch_size, # classes)
    print(p_[0].shape)

(64, 2)


# Training the Model

## I. Defining our Loss Function

In [0]:
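# loss is a scalar: the cross-entropy summed over the minibatch
loss = - tf.reduce_sum(y * tf.log(tf.clip_by_value(prediction, 1e-10, 1.0)))

# Computing the accuracy: `predictions` holds, for each tweet, whether the
# predicted class matches the label
predictions = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
acc = tf.reduce_mean(tf.cast(predictions, 'float32'))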
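
As a sanity check on these two definitions, here is a NumPy version applied to made-up inputs. The numbers are assumptions chosen to resemble an untrained model that gives every tweet nearly the same probabilities: 64 tweets, 26 of them labeled positive, all predicted as `[0.5916, 0.4084]`. `cross_entropy_sum` and `accuracy` are hypothetical names.

```python
import numpy as np


def cross_entropy_sum(y_true, y_pred, eps=1e-10):
    """Cross-entropy summed (not averaged) over the minibatch."""
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))


def accuracy(y_true, y_pred):
    """Fraction of rows whose most probable class matches the label."""
    return np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))


# Assumed inputs: identical predictions, 26 positive ([1, 0]) and 38 negative ([0, 1]) labels
y_pred = np.tile([0.5916, 0.4084], (64, 1))
y_true = np.array([[1, 0]] * 26 + [[0, 1]] * 38)

print('loss: %.2f' % cross_entropy_sum(y_true, y_pred))
print('accuracy: %.5f' % accuracy(y_true, y_pred))
```

Because the loss is a sum, its magnitude scales with the minibatch size; dividing by the number of tweets gives a per-example value that is easier to compare across batch sizes.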

In [0]:
# Testing
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    l_, a_, preds_, p_ = sess.run([loss, acc, predictions, prediction],
                                  feed_dict={x: test[0], y: test[1]})
    
    # Loss and accuracy of the untrained model, plus a few raw predictions
    print('loss: {}'.format(l_))
    print('accuracy: {}'.format(a_))
    for i, s in enumerate(sentences[:5]):
        print('Sentence: "{}" -- prediction: {}'.format(s, p_[i]))

loss: 47.6570968628
accuracy: 0.40625
Sentence: "4,@siltoso hi girl sdo u remember me?? we meet us in team matt's chat .. jiji.. 
" -- prediction: [0.5915477 0.4084522]
Sentence: "0,@Qdakid718 I want to come home....  
" -- prediction: [0.59157836 0.40842173]
Sentence: "4,@NikiScherzinger This will be awesome for sure. I still have the Eden's Crush cd!! 
" -- prediction: [0.5915785  0.40842146]
Sentence: "4,@BBLucia alrighty.  who else is going?
" -- prediction: [0.59157836 0.40842164]
Sentence: "0,I miss Vancouver already  so many fun things to do!
" -- prediction: [0.5915785  0.40842152]


## II. Defining the Optimizer

In [0]:
learning_rate = 0.0001

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

## III. Training Loop

In [0]:
TEST_SET = '/content/drive/My Drive/WORKSHOP/CharLSTM/datasets/test_set.csv'
VALID_SET = '/content/drive/My Drive/WORKSHOP/CharLSTM/datasets/valid_set.csv'
TRAIN_SET = '/content/drive/My Drive/WORKSHOP/CharLSTM/datasets/train_set.csv'
LOGGING_PATH = 'log.txt'
# Where checkpoints are written (assumed location, adjust as needed)
SAVE_PATH = './model.ckpt'

n_epochs = 100
n_batch = 23695

# Variable to save our model
saver = tf.train.Saver()

# Parameter for early stopping
initial_patience = 100000
patience = initial_patience

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    best_acc = 0.0
    epoch = 0
    done = False  # set to True when patience runs out (early stopping)

    while epoch < n_epochs and not done:

        total_loss = 0.0
        batch = 1
        epoch += 1

        for minibatch in iterate_minibatch(dataset=TRAIN_SET):
            batch_x, batch_y = minibatch
            _, l, a = sess.run([optimizer, loss, acc], feed_dict={x: batch_x, y: batch_y})
            total_loss += l

            # Global step: number of minibatches seen since the start of training
            step = (epoch - 1) * n_batch + batch

            if batch % 100 == 0:
                # Print the running loss and the accuracy on the current training minibatch
                print('Epoch: %5d/%5d -- batch: %5d/%5d -- Loss: %.4f -- Train Accuracy: %.4f' %
                      (epoch, n_epochs, batch, n_batch, total_loss/batch, a))
                # Write loss and accuracy to the log file
                with open(LOGGING_PATH, 'a') as log:
                    log.write('%s, %6d, %.5f, %.5f \n' % ('train', step, total_loss/batch, a))

            # Compute accuracy on held-out data, check if it has improved,
            # save model, etc
            if batch % 50 == 0:
                accuracy = []

                # Validation set is very large, so accuracy is computed on testing set
                # instead of valid set, change TEST_SET to VALID_SET to compute accuracy on valid set
                for mb in iterate_minibatch(dataset=TEST_SET):
                    valid_x, valid_y = mb
                    valid_a = sess.run(acc, feed_dict={x: valid_x, y: valid_y})
                    accuracy.append(valid_a)
                mean_acc = np.mean(accuracy)

                # if accuracy has improved, save model and reset patience
                if mean_acc > best_acc:
                    best_acc = mean_acc
                    save_path = saver.save(sess, SAVE_PATH)
                    patience = initial_patience
                    print('Model saved in file: %s' % save_path)
                # else reduce patience and stop training if it runs out
                else:
                    patience -= 500
                    if patience <= 0:
                        done = True
                        break

                print('Epoch: %5d/%5d -- batch: %5d/%5d -- Valid Accuracy: %.4f' %
                      (epoch, n_epochs, batch, n_batch, mean_acc))

                # Write validation accuracy to log file
                with open(LOGGING_PATH, 'a') as log:
                    log.write('%s, %6d, %.5f \n' % ('valid', step, mean_acc))
            batch += 1

KeyboardInterrupt: ignored
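
The patience bookkeeping in the loop above can be isolated into a small helper, which makes the stopping rule easier to test on its own. This is a sketch of the same idea, not code from the notebook; the class name `EarlyStopping` and its `penalty` argument are hypothetical.

```python
class EarlyStopping(object):
    """Stop when the score has not improved for too many consecutive checks."""

    def __init__(self, initial_patience=100000, penalty=500):
        self.initial_patience = initial_patience
        self.penalty = penalty
        self.patience = initial_patience
        self.best = 0.0

    def update(self, score):
        """Record a new score; return True when training should stop."""
        if score > self.best:
            self.best = score
            self.patience = self.initial_patience   # improvement: reset patience
            return False
        self.patience -= self.penalty               # no improvement: spend patience
        return self.patience <= 0


stopper = EarlyStopping(initial_patience=1000, penalty=500)
for score in [0.5, 0.4, 0.6, 0.6, 0.1]:
    print(score, stopper.update(score))
```

With the notebook's values (an initial patience of 100000 and a penalty of 500), training stops after 200 consecutive evaluations without improvement.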