### Based on: A ten-minute introduction to sequence-to-sequence learning in Keras [ten_minute]


[ten_minute]: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html

#### 1. Trivial [addnum][trivial_addnum]

[trivial_addnum]: https://github.com/keras-team/keras/blob/master/examples/addition_rnn.py


In [2]:
# -*- coding: utf-8 -*-
'''An implementation of sequence to sequence learning for performing addition
Input: "535+61"
Output: "596"
Padding is handled by using a repeated sentinel character (space)
Input may optionally be reversed, shown to increase performance in many tasks in:
"Learning to Execute"
http://arxiv.org/abs/1410.4615
and
"Sequence to Sequence Learning with Neural Networks"
http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf
Theoretically it introduces shorter term dependencies between source and target.
Two digits reversed:
+ One layer LSTM (128 HN), 5k training examples = 99% train/test accuracy in 55 epochs
Three digits reversed:
+ One layer LSTM (128 HN), 50k training examples = 99% train/test accuracy in 100 epochs
Four digits reversed:
+ One layer LSTM (128 HN), 400k training examples = 99% train/test accuracy in 20 epochs
Five digits reversed:
+ One layer LSTM (128 HN), 550k training examples = 99% train/test accuracy in 30 epochs
'''  # noqa

from __future__ import print_function
from keras.models import Sequential
from keras import layers
import numpy as np
from six.moves import range


class CharacterTable(object):
    """Given a set of characters:
    + Encode them to a one-hot integer representation
    + Decode the one-hot or integer representation to their character output
    + Decode a vector of probabilities to their character output
    """
    def __init__(self, chars):
        """Initialize character table.
        # Arguments
            chars: Characters that can appear in the input.
        """
        self.chars = sorted(set(chars))
        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))

    def encode(self, C, num_rows):
        """One-hot encode given string C.
        # Arguments
            C: string, to be encoded.
            num_rows: Number of rows in the returned one-hot encoding. This is
                used to keep the # of rows for each data the same.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x

    def decode(self, x, calc_argmax=True):
        """Decode the given vector or 2D array to their character output.
        # Arguments
            x: A vector or a 2D array of probabilities or one-hot representations;
                or a vector of character indices (used with `calc_argmax=False`).
            calc_argmax: Whether to find the character index with maximum
                probability, defaults to `True`.
        """
        if calc_argmax:
            x = x.argmax(axis=-1)
        return ''.join(self.indices_char[i] for i in x)


class colors:
    ok = '\033[92m'
    fail = '\033[91m'
    close = '\033[0m'


Using TensorFlow backend.
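
To make the encode/decode round trip concrete, here is a minimal sketch in plain NumPy rather than the `CharacterTable` class itself. The helper names `one_hot` and `from_one_hot` are illustrative and do not appear in the notebook; the alphabet is assumed to be the same `'0123456789+ '` string used in the cells that follow.

```python
import numpy as np

# Same alphabet as the notebook: digits, plus sign, and space for padding.
alphabet = sorted(set('0123456789+ '))
char_to_index = {c: i for i, c in enumerate(alphabet)}
index_to_char = {i: c for i, c in enumerate(alphabet)}


def one_hot(text, num_rows):
    """Hypothetical helper: one row per character, one column per symbol."""
    x = np.zeros((num_rows, len(alphabet)))
    for row, c in enumerate(text):
        x[row, char_to_index[c]] = 1
    return x


def from_one_hot(x):
    """Hypothetical helper: pick the highest-scoring symbol in each row."""
    return ''.join(index_to_char[i] for i in x.argmax(axis=-1))


encoded = one_hot('12+345 ', 7)
decoded = from_one_hot(encoded)
print(encoded.shape)
print(repr(decoded))
```

One detail worth noticing: the space character sorts first, so it gets index 0, and a row left as all zeros decodes to a space.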


In [3]:
# Parameters for the model and dataset.
TRAINING_SIZE = 50000
DIGITS = 3
REVERSE = True

# Maximum length of input is 'int + int' (e.g., '345+678'). Maximum length of
# int is DIGITS.
MAXLEN = DIGITS + 1 + DIGITS

# All the numbers, plus sign and space for padding.
chars = '0123456789+ '
ctable = CharacterTable(chars)

questions = []
expected = []
seen = set()
print('Generating data...')
while len(questions) < TRAINING_SIZE:
    f = lambda: int(''.join(np.random.choice(list('0123456789'))
                    for i in range(np.random.randint(1, DIGITS + 1))))
    a, b = f(), f()
    # Skip any addition questions we've already seen
    # Also skip any such that x+y == y+x (hence the sorting).
    key = tuple(sorted((a, b)))
    if key in seen:
        continue
    seen.add(key)
    # Pad the data with spaces such that it is always MAXLEN.
    q = '{}+{}'.format(a, b)
    query = q + ' ' * (MAXLEN - len(q))
    ans = str(a + b)
    # Answers can be of maximum size DIGITS + 1.
    ans += ' ' * (DIGITS + 1 - len(ans))
    if REVERSE:
        # Reverse the query, e.g., '12+345 ' becomes ' 543+21'. (Note the
        # space used for padding.)
        query = query[::-1]
    questions.append(query)
    expected.append(ans)
print('Total addition questions:', len(questions))


Generating data...
Total addition questions: 50000
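
The padding and reversal steps are easier to follow on a single pair of numbers. The sketch below wraps them in a function named `make_example`, which is a hypothetical helper and not part of the notebook; the defaults are assumed to match `DIGITS = 3` and `REVERSE = True`.

```python
def make_example(a, b, digits=3, reverse=True):
    """Hypothetical helper mirroring the padding and reversal steps."""
    maxlen = digits + 1 + digits
    q = '{}+{}'.format(a, b)
    # Pad the question on the right so every query has length maxlen.
    query = q + ' ' * (maxlen - len(q))
    ans = str(a + b)
    # Answers are padded to digits + 1 characters.
    ans += ' ' * (digits + 1 - len(ans))
    if reverse:
        query = query[::-1]
    return query, ans


query, answer = make_example(12, 345)
print(repr(query), repr(answer))
```

After reversal the padding ends up at the front of the query, while the answer keeps its padding at the end.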


In [4]:
# questions

In [5]:
print('Vectorization...')
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=bool)
for i, sentence in enumerate(questions):
    x[i] = ctable.encode(sentence, MAXLEN)
for i, sentence in enumerate(expected):
    y[i] = ctable.encode(sentence, DIGITS + 1)

# Shuffle (x, y) in unison as the later parts of x will almost all be larger
# digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

Vectorization...
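
Shuffling both arrays with one shared index permutation is what keeps each question paired with its answer. A small sketch on toy arrays, where `x_toy`, `y_toy` and the fixed seed are assumptions made for illustration:

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, assumed here for repeatability

# Toy stand-ins for the one-hot arrays: row i of x_toy pairs with row i of y_toy.
x_toy = np.arange(5).reshape(5, 1)
y_toy = x_toy * 10

# One permutation of row indices, applied to both arrays.
indices = np.arange(len(y_toy))
rng.shuffle(indices)
x_shuf = x_toy[indices]
y_shuf = y_toy[indices]

# Each shuffled input still lines up with its own target.
print(np.array_equal(y_shuf, x_shuf * 10))
```

Shuffling `x` and `y` with two separate calls to `shuffle` would break this pairing.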


In [6]:
# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print('Training Data:')
print(x_train.shape)
print(y_train.shape)

print('Validation Data:')
print(x_val.shape)
print(y_val.shape)


Training Data:
(45000, 7, 12)
(45000, 4, 12)
Validation Data:
(5000, 7, 12)
(5000, 4, 12)


In [7]:
# Try replacing LSTM with GRU or SimpleRNN.
RNN = layers.LSTM
HIDDEN_SIZE = 128
BATCH_SIZE = 128
LAYERS = 1


In [8]:
print('Build model...')
model = Sequential()
# "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
# Note: In a situation where your input sequences have a variable length,
# use input_shape=(None, num_feature).
model.add(RNN(HIDDEN_SIZE, input_shape=(MAXLEN, len(chars))))
# As the decoder RNN's input, repeatedly provide the last output of the
# encoder RNN, once per time step. Repeat 'DIGITS + 1' times as that's the
# maximum length of output, e.g., when DIGITS=3, max output is 999+999=1998.
model.add(layers.RepeatVector(DIGITS + 1))
# The decoder RNN could be multiple layers stacked or a single layer.
for _ in range(LAYERS):
    # By setting return_sequences to True, return not only the last output but
    # all the outputs so far in the form of (num_samples, timesteps,
    # output_dim). This is necessary because the TimeDistributed layer below
    # expects the dimension after the batch dimension to be the timesteps.
    model.add(RNN(HIDDEN_SIZE, return_sequences=True))

Build model...
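
The shape change performed by `RepeatVector` can be imitated in NumPy. This is only an illustration of the shapes involved, not how Keras implements the layer; the batch size of 2 and the random values are assumptions.

```python
import numpy as np

batch, hidden, steps = 2, 128, 4  # steps corresponds to DIGITS + 1

# Stand-in for the encoder's final output: one vector per sample.
encoded = np.random.rand(batch, hidden)

# Copy that vector once per output time step:
# (batch, hidden) -> (batch, steps, hidden).
repeated = np.repeat(encoded[:, np.newaxis, :], steps, axis=1)

print(repeated.shape)
```

Every time step of the decoder therefore receives the same encoded summary of the question as its input.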


In [9]:
len(chars)

12

In [10]:
layers

<module 'keras.layers' from '/Users/jmpoulin/miniconda3/lib/python3.6/site-packages/keras/layers/__init__.py'>

In [11]:
# Apply a dense layer to every temporal slice of the input. For each step
# of the output sequence, decide which character should be chosen.
model.add(layers.TimeDistributed(layers.Dense(len(chars), activation='softmax')))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 128)               72192     
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 4, 128)            0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 4, 128)            131584    
_________________________________________________________________
time_distributed_1 (TimeDist (None, 4, 12)             1548      
=================================================================
Total params: 205,324
Trainable params: 205,324
Non-trainable params: 0
_________________________________________________________________
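
The parameter counts in the summary can be checked by hand. The sketch below assumes a standard LSTM with four gates and a bias per gate, and a dense layer with a bias; the function names are illustrative.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output."""
    return input_dim * units + units


encoder = lstm_params(input_dim=12, units=128)    # lstm_1
decoder = lstm_params(input_dim=128, units=128)   # lstm_2
output = dense_params(input_dim=128, units=12)    # dense layer in time_distributed_1
total = encoder + decoder + output
print(encoder, decoder, output, total)
```

These values agree with the `Param #` column above. The `TimeDistributed` wrapper adds no parameters of its own because the same dense layer is shared across the four output steps.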


In [13]:
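# Workaround for the OMP Error #15 raised when two copies of the OpenMP
# runtime are loaded. The OMP hint itself calls this unsafe. As per:
# https://stackoverflow.com/questions/53014306/error-15-initializing-libiomp5-dylib-but-found-libiomp5-dylib-already-initial

import os

os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'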

In [14]:
# Train the model each generation and show predictions against the validation
# dataset.
for iteration in range(1, 200):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(x_train, y_train,
              batch_size=BATCH_SIZE,
              epochs=1,
              validation_data=(x_val, y_val))
    # Select 10 samples from the validation set at random so we can visualize
    # errors.
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
        # Take the most probable character index at each output step.
        preds = np.argmax(model.predict(rowx, verbose=0), axis=-1)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        guess = ctable.decode(preds[0], calc_argmax=False)
        print('Q', q[::-1] if REVERSE else q, end=' ')
        print('T', correct, end=' ')
        if correct == guess:
            print(colors.ok + '☑' + colors.close, end=' ')
        else:
            print(colors.fail + '☒' + colors.close, end=' ')
        print(guess)


--------------------------------------------------
Iteration 1
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 806+8   T 814  ☒ 108
Q 716+738 T 1454 ☒ 107
Q 38+60   T 98   ☒ 471
Q 61+856  T 917  ☒ 107
Q 53+205  T 258  ☒ 533
Q 186+920 T 1106 ☒ 108
Q 78+531  T 609  ☒ 107
Q 282+722 T 1004 ☒ 103
Q 584+69  T 653  ☒ 100
Q 3+813   T 816  ☒ 47

--------------------------------------------------
Iteration 2
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 201+988 T 1189 ☒ 102
Q 577+493 T 1070 ☒ 1358
Q 32+39   T 71   ☒ 232
Q 26+961  T 987  ☒ 702
Q 56+628  T 684  ☒ 776
Q 454+15  T 469  ☒ 555
Q 37+778  T 815  ☒ 808
Q 492+557 T 1049 ☒ 102
Q 772+923 T 1695 ☒ 1388
Q 67+92   T 159  ☒ 108

--------------------------------------------------
Iteration 3
Train on 45000 samples, valida

Q 62+350  T 412  ☑ 412
Q 56+622  T 678  ☑ 678
Q 453+130 T 583  ☑ 583
Q 58+412  T 470  ☑ 470
Q 800+32  T 832  ☑ 832
Q 52+85   T 137  ☑ 137
Q 294+69  T 363  ☑ 363
Q 451+936 T 1387 ☑ 1387
Q 481+452 T 933  ☑ 933
Q 591+85  T 676  ☑ 676

--------------------------------------------------
Iteration 16
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 108+60  T 168  ☑ 168
Q 865+248 T 1113 ☑ 1113
Q 42+53   T 95   ☒ 96
Q 96+167  T 263  ☑ 263
Q 79+44   T 123  ☑ 123
Q 511+43  T 554  ☑ 554
Q 149+403 T 552  ☑ 552
Q 307+36  T 343  ☑ 343
Q 149+89  T 238  ☑ 238
Q 232+71  T 303  ☑ 303

--------------------------------------------------
Iteration 17
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 650+2   T 652  ☑ 652
Q 53+205  T 258  ☑ 258
Q 461+47  T 508  ☑

Q 19+658  T 677  ☑ 677
Q 820+0   T 820  ☑ 820
Q 0+67    T 67   ☑ 67
Q 965+909 T 1874 ☑ 1874
Q 49+639  T 688  ☑ 688
Q 80+176  T 256  ☑ 256
Q 875+228 T 1103 ☑ 1103
Q 936+32  T 968  ☑ 968
Q 918+35  T 953  ☑ 953
Q 8+309   T 317  ☑ 317

--------------------------------------------------
Iteration 30
Train on 45000 samples, validate on 5000 samples
Epoch 1/1
Q 1+696   T 697  ☑ 697
Q 133+65  T 198  ☑ 198
Q 55+938  T 993  ☑ 993
Q 309+60  T 369  ☑ 369
Q 12+58   T 70   ☑ 70
Q 952+65  T 1017 ☑ 1017
Q 466+528 T 994  ☑ 994
Q 426+719 T 1145 ☑ 1145
Q 34+120  T 154  ☑ 154
Q 22+28   T 50   ☑ 50

--------------------------------------------------
Iteration 31
Train on 45000 samples, validate on 5000 samples
Epoch 1/1

KeyboardInterrupt: 
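
The ☑/☒ marks above compare whole answers, whereas the `accuracy` metric passed to `compile` is, as far as I understand it, averaged over individual characters. The two can differ, as this NumPy sketch on toy data suggests. The function names and the toy arrays are assumptions for illustration, not part of the notebook.

```python
import numpy as np


def exact_match_accuracy(y_true, y_prob):
    """Hypothetical helper: fraction of samples whose every character matches."""
    true_idx = y_true.argmax(axis=-1)
    pred_idx = y_prob.argmax(axis=-1)
    return float(np.mean(np.all(true_idx == pred_idx, axis=-1)))


def per_char_accuracy(y_true, y_prob):
    """Hypothetical helper: fraction of individual characters predicted correctly."""
    return float(np.mean(y_true.argmax(axis=-1) == y_prob.argmax(axis=-1)))


# Two toy samples, two time steps, three classes.
y_true = np.array([[[1, 0, 0], [0, 1, 0]],
                   [[0, 0, 1], [0, 1, 0]]])
y_prob = np.array([[[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]],   # both characters right
                   [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]])  # second character wrong

print(per_char_accuracy(y_true, y_prob), exact_match_accuracy(y_true, y_prob))
```

On the real data, the same functions could be applied to `y_val` and `model.predict(x_val)` to measure how often the complete sum is right.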

2018-11-20 14:25:11.915648: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX  
2018-11-20 14:25:11.915971: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.  
OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.  
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.


https://stackoverflow.com/questions/53014306/error-15-initializing-libiomp5-dylib-but-found-libiomp5-dylib-already-initial

https://github.com/dmlc/xgboost/issues/1715