## RNN in ``tf`` 

LSTM model in ``tf``:

<center>
<img src="https://github.com/DataScienceUB/DeepLearningfromScratch2018/blob/master/images/tf-lstm.png?raw=true" alt="" style="width: 600px;"/> 
Source: https://cloud.google.com/blog/big-data/2017/01/learn-tensorflow-and-deep-learning-without-a-phd
</center>

GRU model in ``tf``:

<center>
<img src="https://github.com/DataScienceUB/DeepLearningfromScratch2018/blob/master/images/tf-gru.png?raw=true" alt="" style="width: 600px;"/> 
Source: https://cloud.google.com/blog/big-data/2017/01/learn-tensorflow-and-deep-learning-without-a-phd
</center>

## Learning to add in ``tf``

Source: http://projects.rajivshah.com/blog/2016/04/05/rnn_addition/ 

The objective of this code, developed by Rajiv Shah, is to train an RNN to add a sequence of integers. In this case the whole sequence must be processed before a result can be produced. We will use the ``tf`` implementation of the ``seq2seq`` architecture, originally proposed by Sutskever, Vinyals and Le in 2014. 

The architecture diagram from their paper is:

<center>
<img src="https://github.com/DataScienceUB/DeepLearningfromScratch2018/blob/master/images/seq2seq.png?raw=true" alt="" style="width: 800px;"/> 
Source: https://arxiv.org/abs/1409.3215
</center>

Rectangles are recurrent layers. The encoder receives the sequence ``[A, B, C]`` as input. We don't care about the encoder's outputs, only about the hidden state it accumulates while reading the sequence. 

After the input sequence ends, the encoder passes its final state to the decoder, which receives ``[<EOS>, W, X, Y, Z]`` and is trained to output ``[W, X, Y, Z, <EOS>]``. 
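The shifted decoder input and target described above can be sketched in a few lines. This is only an illustration: the helper name `make_decoder_pair` and the literal `<EOS>` token string are assumptions made here, not part of the ``tf`` implementation.

```python
EOS = "<EOS>"  # assumed end-of-sequence marker, used only for this illustration


def make_decoder_pair(target_tokens):
    """Return (decoder_input, decoder_target) for one training example.

    The decoder input is the target sequence prefixed with <EOS>;
    the decoder target is the same sequence followed by <EOS>.
    """
    decoder_input = [EOS] + list(target_tokens)
    decoder_target = list(target_tokens) + [EOS]
    return decoder_input, decoder_target


dec_in, dec_out = make_decoder_pair(["W", "X", "Y", "Z"])
print(dec_in)   # ['<EOS>', 'W', 'X', 'Y', 'Z']
print(dec_out)  # ['W', 'X', 'Y', 'Z', '<EOS>']
```

At every position the target is simply the next element of the input, which is what lets the decoder be trained as a next-token predictor.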

The ``tf`` implementation of ``seq2seq`` allows sequences of different lengths.

Sequence-to-sequence, or *Seq2Seq*, is a relatively new paradigm made up of two recurrent networks: an *encoder* which takes the model's input sequence as input and encodes it into a fixed-size *context vector*, and a *decoder*, which uses the context vector from above as a *seed* from which to generate an output sequence.

For this reason, Seq2Seq models are often referred to as *encoder-decoder* models.

The encoder network's job is to read the input sequence and generate a fixed-dimensional context vector $C$ for the sequence. 

To do so, the encoder will use a recurrent neural network cell -- usually an LSTM -- to read the input tokens one at a time. The final hidden state of the cell will then become $C$. 

However, because it's so difficult to compress an arbitrary-length sequence into a single fixed-size vector, the encoder will usually consist of *stacked* LSTMs. The *final* layer's LSTM hidden state will be used as $C$.

The decoder is also an LSTM network, but its usage is a little more complex than the encoder network. Essentially, we'd like to use it as a language model that's "aware" of the words that it's generated so far *and* of the input. 

To that end, we'll keep the "stacked" LSTM architecture from the encoder, but we'll initialize the hidden state of our first layer with the context vector from above; the decoder will literally use the context of the input to generate an output.
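The hand-off from encoder to decoder can be sketched without any deep learning library. The example below is a simplification under stated assumptions: it uses a plain tanh RNN cell instead of an LSTM, a single layer instead of a stack, random untrained weights, and a zero placeholder as decoder input. All function names are hypothetical.

```python
import math
import random


def rnn_step(x, h, W, V, b):
    """One tanh RNN step on plain lists: h_new = tanh(W x + V h + b)."""
    return [
        math.tanh(
            sum(W[i][j] * x[j] for j in range(len(x)))
            + sum(V[i][k] * h[k] for k in range(len(h)))
            + b[i]
        )
        for i in range(len(h))
    ]


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode(inputs, W, V, b, hidden_size):
    """Read the inputs one at a time; the final hidden state is the context C."""
    h = [0.0] * hidden_size
    for x in inputs:
        h = rnn_step(x, h, W, V, b)
    return h


def decode(context, steps, W, V, b, input_size):
    """Run the decoder cell for `steps` steps, starting from the context vector."""
    h = context
    x = [0.0] * input_size  # placeholder input; a real decoder feeds tokens here
    states = []
    for _ in range(steps):
        h = rnn_step(x, h, W, V, b)
        states.append(h)
    return states


rng = random.Random(0)
input_size, hidden_size = 3, 4
enc_W = random_matrix(hidden_size, input_size, rng)
enc_V = random_matrix(hidden_size, hidden_size, rng)
dec_W = random_matrix(hidden_size, input_size, rng)
dec_V = random_matrix(hidden_size, hidden_size, rng)
bias = [0.0] * hidden_size

# One-hot encodings of the tokens A, B, C.
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
C = encode(sequence, enc_W, enc_V, bias, hidden_size)
decoder_states = decode(C, 4, dec_W, dec_V, bias, input_size)
print(len(C), len(decoder_states))
```

Whatever the length of the input sequence, `C` always has `hidden_size` entries, which is the fixed-size bottleneck discussed above.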

When both input sequences and output sequences have the same length, you can implement such models simply with a Keras LSTM or GRU layer (or stack thereof).

One caveat of this approach is that it assumes that it is possible to generate ``target[...t]`` given ``input[...t]``. That works in some cases (e.g. adding strings of digits) but does not work for most use cases. In the general case, information about the entire input sequence is necessary in order to start generating the target sequence.


In [12]:
'''An implementation of sequence to sequence learning for performing addition
Input: "535+61 "
Output:"596 "
Padding is handled by using a repeated sentinel character (space)
'''

from __future__ import print_function
from keras.models import Sequential
from keras import layers
import numpy as np
from six.moves import range


class CharacterTable(object):
    """Given a set of characters:
    + Encode them to a one hot integer representation
    + Decode the one hot integer representation to their character output
    + Decode a vector of probabilities to their character output
    """
    def __init__(self, chars):
        """Initialize character table.
        # Arguments
            chars: Characters that can appear in the input.
        """
        self.chars = sorted(set(chars))
        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))

    def encode(self, C, num_rows):
        """One hot encode given string C.
        # Arguments
            num_rows: Number of rows in the returned one hot encoding. This is
                used to keep the # of rows for each data the same.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x

    def decode(self, x, calc_argmax=True):
        if calc_argmax:
            x = x.argmax(axis=-1)
        return ''.join(self.indices_char[i] for i in x)


class colors:
    ok = '\033[92m'
    fail = '\033[91m'
    close = '\033[0m'

# Parameters for the model and dataset.
TRAINING_SIZE = 50000
DIGITS = 3
REVERSE = False  # questions are stored unreversed, so print them as they are

# Maximum length of input is 'int + int' (e.g., '345+678'). Maximum length of
# int is DIGITS.
MAXLEN = DIGITS + 1 + DIGITS

# All the numbers, plus sign and space for padding.
chars = '0123456789+ '
ctable = CharacterTable(chars)

questions = []
expected = []
seen = set()
print('Generating data...')
while len(questions) < TRAINING_SIZE:
    f = lambda: int(''.join(np.random.choice(list('0123456789'))
                    for i in range(np.random.randint(1, DIGITS + 1))))
    a, b = f(), f()
    # Skip any addition questions we've already seen
    # Also skip any such that x+Y == Y+x (hence the sorting).
    key = tuple(sorted((a, b)))
    if key in seen:
        continue
    seen.add(key)
    # Pad the data with spaces such that it is always MAXLEN.
    q = '{}+{}'.format(a, b)
    query = q + ' ' * (MAXLEN - len(q))
    ans = str(a + b)
    # Answers can be of maximum size DIGITS + 1.
    ans += ' ' * (DIGITS + 1 - len(ans))
    questions.append(query)
    expected.append(ans)
print('Total addition questions:', len(questions))
print(questions[3], expected[3])

Generating data...
Total addition questions: 50000
642+1   643 


In [13]:
print('Vectorization...')
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=bool)
for i, sentence in enumerate(questions):
    x[i] = ctable.encode(sentence, MAXLEN)
for i, sentence in enumerate(expected):
    y[i] = ctable.encode(sentence, DIGITS + 1)

# Shuffle (x, y) 
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print('Training Data:')
print(x_train.shape)
print(y_train.shape)

print('Validation Data:')
print(x_val.shape)
print(y_val.shape)

# Try replacing GRU, or SimpleRNN.
RNN = layers.LSTM
HIDDEN_SIZE = 128
BATCH_SIZE = 128
LAYERS = 1

print('Build model...')
model = Sequential()

# "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
# Note: In a situation where your input sequences have a variable length,
# use input_shape=(None, num_feature).
# Returns the last output in the output sequence
model.add(RNN(HIDDEN_SIZE, input_shape=(MAXLEN, len(chars))))

# As the decoder RNN's input, repeatedly provide with the last hidden state of
# RNN for each time step. Repeat 'DIGITS + 1' times as that's the maximum
# length of output, e.g., when DIGITS=3, max output is 999+999=1998.
model.add(layers.RepeatVector(DIGITS + 1))

# The decoder RNN could be multiple layers stacked or a single layer.
for _ in range(LAYERS):
    # By setting return_sequences to True, return not only the last output but
    # all the outputs so far in the form of (num_samples, timesteps,
    # output_dim). This is necessary as TimeDistributed in the below expects
    # the first dimension to be the timesteps.
    model.add(RNN(HIDDEN_SIZE, return_sequences=True))

# Apply a dense layer to every temporal slice of the input. For each step
# of the output sequence, decide which character should be chosen.
model.add(layers.TimeDistributed(layers.Dense(len(chars))))
model.add(layers.Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()

# Train the model each generation and show predictions against the validation
# dataset.
for iteration in range(1, 200):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(x_train, y_train,
              batch_size=BATCH_SIZE,
              epochs=1,
              validation_data=(x_val, y_val))
    # Select 10 samples from the validation set at random so we can visualize
    # errors.
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
        preds = model.predict(rowx, verbose=0).argmax(axis=-1)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        guess = ctable.decode(preds[0], calc_argmax=False)
        print('Q', q[::-1] if REVERSE else q, end=' ')
        print('T', correct, end=' ')
        if correct == guess:
            print(colors.ok + '☑' + colors.close, end=' ')
        else:
            print(colors.fail + '☒' + colors.close, end=' ')
        print(guess)

Vectorization...
Training Data:
(45000, 7, 12)
(45000, 4, 12)
Validation Data:
(5000, 7, 12)
(5000, 4, 12)
Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_3 (LSTM)                (None, 128)               72192     
_________________________________________________________________
repeat_vector_2 (RepeatVecto (None, 4, 128)            0         
_________________________________________________________________
lstm_4 (LSTM)                (None, 4, 128)            131584    
_________________________________________________________________
time_distributed_2 (TimeDist (None, 4, 12)             1548      
_________________________________________________________________
activation_2 (Activation)    (None, 4, 12)             0         
Total params: 205,324
Trainable params: 205,324
Non-trainable params: 0
_________________________________________________________________


## Example

A Recurrent Neural Network (LSTM) implementation example using TensorFlow library.

This example uses the MNIST database of handwritten digits (http://yann.lecun.com/exdb/mnist/).
Long Short-Term Memory paper: http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf

Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/

In [14]:
from __future__ import print_function

import tensorflow as tf
from tensorflow.contrib import rnn

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


To classify images using a recurrent neural network, we treat every image as a sequence of pixel rows. 

Because the MNIST image shape is $28 \times 28$ px, every sample becomes one sequence of 28 time steps, where each step is a row of 28 pixels.
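One way to picture this reshaping is shown below, using a synthetic 784-value vector in place of a real MNIST image (an assumption for illustration). The flat vector is cut into 28 rows, and each row is what the network sees at one time step.

```python
timesteps, num_input = 28, 28  # same values as the network parameters of this example

# Stand-in for one flattened MNIST image: 784 values in row-major order.
flat_image = [float(i) for i in range(timesteps * num_input)]

# Cut the flat vector into 28 consecutive rows of 28 pixels each.
sequence = [
    flat_image[t * num_input:(t + 1) * num_input]
    for t in range(timesteps)
]

print(len(sequence), len(sequence[0]))  # 28 28
```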

In [15]:
# Training Parameters
learning_rate = 0.001
training_steps = 10000
batch_size = 128
display_step = 200

# Network Parameters
num_input = 28 # MNIST data input (img shape: 28*28)
timesteps = 28 # timesteps
num_hidden = 128 # hidden layer num of features
num_classes = 10 # MNIST total classes (0-9 digits)

In [16]:
# tf Graph input
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

# Define weights
weights = {
    'out': tf.Variable(tf.random_normal([num_hidden, num_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([num_classes]))
}

In [17]:
def RNN(x, weights, biases):

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, timesteps, n_input)
    # Required shape: 'timesteps' tensors list of shape (batch_size, n_input)

    # Unstack to get a list of 'timesteps' tensors of shape (batch_size, n_input)
    x = tf.unstack(x, timesteps, 1)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)

    # Get lstm cell output
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

logits = RNN(X, weights, biases)
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate model: fraction of samples whose predicted class matches the label
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.



In [18]:
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))

    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))

Step 1, Minibatch Loss= 2.5969, Training Accuracy= 0.102
Step 200, Minibatch Loss= 2.0649, Training Accuracy= 0.344
Step 400, Minibatch Loss= 2.0016, Training Accuracy= 0.289



## Bidirectional LSTM

So far, we have focused on RNNs that look into the past to predict future values in the sequence. Why not also make predictions based on future values, by reading through the sequence backwards?

**Bidirectional deep neural networks** maintain, at each time step $t$, two hidden layers: one for the left-to-right propagation and another for the right-to-left propagation (hence consuming twice as much memory).

The final classification result, $\hat{y}$, is generated through combining the score results produced by both RNN hidden layers.

<img src = "https://github.com/DataScienceUB/DeepLearningfromScratch2018/blob/master/images/t9.png?raw=true"  width = "600">

The equations are as follows (arrows denote left-to-right and right-to-left tensors):

$$
	\overrightarrow{h}_t = f(\overrightarrow{W} x_t + \overrightarrow{V} \overrightarrow{h}_{t-1} + \overrightarrow{b})
$$
$$
	\overleftarrow{h}_t = f(\overleftarrow{W} x_t + \overleftarrow{V} \overleftarrow{h}_{t+1} + \overleftarrow{b})
$$
$$
	\hat{y}_t = g(U h_t + c) = g(U [\overrightarrow{h}_t; \overleftarrow{h}_t] + c)
$$

$[\overrightarrow{h}_t; \overleftarrow{h}_t]$ summarizes the past and future of a single element of the sequence.
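The two recurrences can be sketched with scalar inputs and scalar hidden states. This is a minimal illustration under assumptions: $f$ is taken to be tanh, the weights are arbitrary untrained numbers, and the output layer $g$ is left out. The function name `bidirectional_states` is hypothetical.

```python
import math


def bidirectional_states(xs, fwd, bwd):
    """Return [(h_fwd_t, h_bwd_t)] for a scalar-input, scalar-state RNN.

    fwd and bwd are (W, V, b) triples for the two directions; f is tanh.
    """
    Wf, Vf, bf = fwd
    Wb, Vb, bb = bwd
    n = len(xs)
    h_fwd = [0.0] * n
    h_bwd = [0.0] * n

    prev = 0.0
    for t in range(n):  # left to right: uses h_{t-1}
        prev = math.tanh(Wf * xs[t] + Vf * prev + bf)
        h_fwd[t] = prev

    nxt = 0.0
    for t in reversed(range(n)):  # right to left: uses h_{t+1}
        nxt = math.tanh(Wb * xs[t] + Vb * nxt + bb)
        h_bwd[t] = nxt

    return list(zip(h_fwd, h_bwd))


xs = [0.5, -1.0, 2.0, 0.0]
states = bidirectional_states(xs, fwd=(0.8, 0.5, 0.0), bwd=(0.6, -0.4, 0.1))
for t, (hf, hb) in enumerate(states):
    print(t, round(hf, 3), round(hb, 3))
```

Changing the last input leaves the forward state at $t = 0$ untouched but changes the backward state at $t = 0$, which is the sense in which each pair summarizes both past and future.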

Bidirectional RNNs can be stacked as usual!

## Name generation with LSTM

We are going to train RNN "character-level" language models. 

That is, we’ll give the RNN a huge chunk of text and ask it to model the probability distribution of the next character in the sequence given a sequence of previous characters. This will then allow us to generate new text one character at a time.

We will encode each character into a vector using ``1-of-k`` encoding (i.e. all zero except for a single one at the index of the character in the vocabulary), and feed them into the RNN one at a time. 

At test time, we will feed a character into the RNN and get a distribution over what characters are likely to come next. We sample from this distribution, and feed it right back in to get the next letter. Repeat this process and you’re sampling text!
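The sample-and-feed-back loop can be illustrated without an RNN. The sketch below swaps the network for a much simpler count-based bigram model (the next character depends only on the previous one), trained on a tiny made-up corpus. Both the model and the corpus are assumptions chosen to keep the example self-contained.

```python
import random
from collections import Counter, defaultdict


def fit_bigram_counts(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def sample_next(counts, prev, rng):
    """Sample the next character from the empirical distribution after `prev`."""
    chars = sorted(counts[prev])
    weights = [counts[prev][c] for c in chars]
    return rng.choices(chars, weights=weights, k=1)[0]


def generate(counts, seed_char, length, rng):
    """Generate text one character at a time, feeding each sample back in."""
    out = seed_char
    for _ in range(length):
        out += sample_next(counts, out[-1], rng)
    return out


corpus = "maria\nanna\nmarina\nmarta\nana\n"  # tiny stand-in corpus
counts = fit_bigram_counts(corpus)
print(repr(generate(counts, "m", 20, random.Random(0))))
```

An RNN plays the role of `sample_next` here, with the difference that its distribution is conditioned on the whole sequence seen so far rather than on a single character.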

We can also play with the temperature of the Softmax during sampling. Decreasing the temperature from 1 to some lower number (e.g. 0.5) makes the RNN more confident, but also more conservative in its samples. Conversely, higher temperatures give more diversity, but at the cost of more mistakes.

> Reminder: **Softmax**
> $$
> P(y = j | \mathbf{x}) = \frac{\exp(\mathbf{x}^T \mathbf{w}_j/\tau)}{\sum_{k=1}^K \exp(\mathbf{x}^T \mathbf{w}_k/\tau)}
> $$
> $\tau$ is called *temperature*. For high temperatures ( $ \tau \to \infty $ ), all $y$ have nearly the same probability. For a low temperature ( $ \tau \to 0^{+}$), the probability of the most probable $y$ tends to 1.
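The effect of the temperature is easy to check numerically. The sketch below divides the scores by $\tau$ before exponentiating. The score values are illustrative stand-ins for $\mathbf{x}^T \mathbf{w}_j$, and the function name is hypothetical.

```python
import math


def softmax_with_temperature(scores, temperature=1.0):
    """Softmax over scores divided by the temperature (which must be positive)."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # illustrative values of x^T w_j
for tau in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(scores, tau)
    print(tau, [round(p, 3) for p in probs])
```

With $\tau = 0.1$ almost all the mass sits on the highest score, while with $\tau = 10$ the three probabilities are close to $1/3$ each.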

In order to process sequences of symbols with RNN we need to represent these symbols by numbers.

Let's suppose we have $|V|$ different symbols. The simplest representation is the **one-hot vector**: represent every symbol as an $\mathbb{R}^{|V|\times1}$ vector with all $0$s and a single $1$ at the index of that symbol in the sorted vocabulary. Symbol vectors in this encoding look as follows:

$$w^{s_1} = \left[ \begin{array}{c} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{array} \right], w^{s_2} = \left[ \begin{array}{c} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{array} \right], w^{s_3} = \left[ \begin{array}{c} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{array} \right], \cdots 
w^{s_{|V|}} = \left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{array} \right] $$

We represent each symbol as a completely independent entity. This representation does not directly give us any notion of similarity between symbols.
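A small sketch makes the lack of similarity concrete: the dot product of two one-hot vectors is 1 for identical symbols and 0 for any two different ones. The vocabulary below is built from a made-up string, and the helper names are hypothetical.

```python
def one_hot(symbol, vocabulary):
    """Return the one-hot vector of `symbol` for a sorted vocabulary list."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(symbol)] = 1
    return vector


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


vocabulary = sorted(set("maria anna"))  # [' ', 'a', 'i', 'm', 'n', 'r']
a = one_hot("a", vocabulary)
n = one_hot("n", vocabulary)
print(a)  # [0, 1, 0, 0, 0, 0]
print(n)  # [0, 0, 0, 0, 1, 0]
print(dot(a, a), dot(a, n))  # 1 0
```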

To train our model we need a large dataset of names to learn from. Fortunately, we don’t need any labels to train a language model, just raw text.

Place names: you can download 52,700 Catalan names from the dataset available at http://territori.gencat.cat/ca/01_departament/11_normativa_i_documentacio/03_documentacio/02_territori_i_mobilitat/cartografia/nomenclator_oficial_de_toponimia_de_catalunya/

In [3]:
!wget -O - 'https://raw.githubusercontent.com/DataScienceUB/DeepLearningfromScratch2018/master/files/womennamesbarcelona.txt' > womennamesbarcelona.txt

--2018-06-13 18:10:32--  https://raw.githubusercontent.com/DataScienceUB/DeepLearningfromScratch2018/master/files/womennamesbarcelona.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.132.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.132.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 47544 (46K) [text/plain]
Saving to: 'STDOUT'


2018-06-13 18:10:33 (1.30 MB/s) - written to stdout [47544/47544]



In [4]:
from __future__ import print_function
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop
import numpy as np
import random
import sys

import codecs
# Read the names as UTF-8 text and lowercase them.
with codecs.open('womennamesbarcelona.txt', "r", "utf-8") as f:
    text = f.read().lower()

# text = text.replace("\n", " ")
    
print('corpus length:', len(text))

chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 20
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

corpus length: 47527
total chars: 30
nb sequences: 15836
Vectorization...


Using TensorFlow backend.


In [6]:
# build the model

print('Build model...')
model = Sequential()
model.add(LSTM(64, 
               input_shape=(maxlen, len(chars)), 
               dropout=0.2, 
               recurrent_dropout=0.2))

model.add(Dense(len(chars)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Activation('softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

Build model...


The simplest way to use the Keras LSTM model to make predictions is to first start off with a seed sequence as input, generate the next character then update the seed sequence to add the generated character on the end and trim off the first character. 

This process is repeated for as long as we want to predict new characters.

In [7]:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

# train the model, output generated text after each iteration
for iteration in range(1, 60):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(X, y, batch_size=256, epochs=1)

    start_index = random.randint(0, len(text) - maxlen - 1)
    generated = ''
    sentence = text[start_index: start_index + maxlen]
    generated += sentence
    print('----- Generating with seed: "' + sentence.replace("\n", " ") + '"')
        
    for diversity in [0.5, 1.0, 1.2]:
        print()
        print('----- diversity:', diversity)
        for i in range(50):
            
            x = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x[0, t, char_indices[char]] = 1.

            preds = model.predict(x, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()


--------------------------------------------------
Iteration 1




Epoch 1/1
----- Generating with seed: "elena gabriela elena"

----- diversity: 0.5
 eria
a
çeria arña
maria
pene
a
qjara
acia
anañja


----- diversity: 1.0
yxa sçebln
slariana
sfrj eanenh
ada
eezqyyazuba
hj

----- diversity: 1.2

w oaqdqqiy
xtonzoza
velia wbsz 
ç
icgckionç nivar

--------------------------------------------------
Iteration 2
Epoch 1/1
----- Generating with seed: " maria alba marina a"

----- diversity: 0.5
rina
waria
ryska
saria gelina
aria licia
vinelna
ñ

----- diversity: 1.0
irian
tea
hvia kazgnae
exjza zñib
xlaç
drgaeçxa
ñx

----- diversity: 1.2
a
sgninnñrkalaed
ytñpiyfknçgina
rlydz g
 umefsa
ni

--------------------------------------------------
Iteration 3
Epoch 1/1
----- Generating with seed: "des carmen montserra"

----- diversity: 0.5

rkelia
anzisa
saria
laciea
ana
arenia
agatia
esa 

----- diversity: 1.0
gñuhika o
e pfrawfa
skviawsaban
narfa anaeqcaduina

----- diversity: 1.2

dzfiyensrçwangghañygsvprtakullazw

vwitejzlparraw


In [None]:
'lianda' in text

In the case of places, after several hours you can generate names such as:

+ Alzinetes, torrent de les
+ Alzinetes, vall de les
+ **Alzinó, Mas d'**
+ Alzinosa, collada de l'
+ Alzinosa, font de l'

-

+ Benavent, roc de
+ Benaviure, Cal
+ **Benca**
+ Bendiners, pla de
+ Benedi, roc del

-

+ Fiola, la
+ Fiola, puig de la
+ **Fiper, Granja del**
+ Firassa, Finca
+ Firell

-

+ Regueret, lo
+ Regueret, lo
+ **Regueró**
+ Reguerols, els
+ Reguerons, els

-

+ Vallverdú, Mas de
+ Vallverdú, serrat de
+ **Vallvicamanyà**
+ Vallvidrera
+ Vallvidrera, riera de

-

+ Terraubella, Corral de
+ Terraubes
+ **Terravanca**
+ Terrer Nou, Can
+ Terrer Roig, lo

where names in **bold** are generated and other names are the nearest neighbours (in the training dataset) of the generated name.