# Character-Level LSTM

In [1]:
import torch
import torch.nn.functional as F
from torch import nn, optim

In [2]:
import numpy as np

In [3]:
import os

In [4]:
%matplotlib inline
import matplotlib.pyplot as plt

In [5]:
np.random.seed(42)
torch.manual_seed(42)

<torch._C.Generator at 0x7fae7836b590>

### Device

In [6]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu


## Data

First things first: we have to load the training data. To train our character-level LSTM network we use the book _Beyond Good and Evil_ by Friedrich Nietzsche, as released by [Project Gutenberg](https://www.gutenberg.org).

In [7]:
fname = "data/Beyond-Good-and-Evil.txt"

In [8]:
# Load file in one string
with open(fname, "r") as f:
    text = f.read()
    
# Remove header
text = text[328:]
    
# Print the first 500 characters of the text
print(text[:500])

CHAPTER I. PREJUDICES OF PHILOSOPHERS


1. The Will to Truth, which is to tempt us to many a hazardous
enterprise, the famous Truthfulness of which all philosophers have
hitherto spoken with respect, what questions has this Will to Truth not
laid before us! What strange, perplexing, questionable questions! It is
already a long story; yet it seems as if it were hardly commenced. Is
it any wonder if we at last grow distrustful, lose patience, and turn
impatiently away? That this Sphinx teaches us 


In [9]:
print(f"Total number of characters: {len(text)}")

Total number of characters: 373094


### Tokens

Our text is composed of characters, and a character-level LSTM network produces new text character by character. We can extract all the unique characters in the text to see the number of tokens our model will work with:

In [10]:
tokens = sorted(set(text))
n_tokens =  len(tokens)
print(f"Unique characters: {n_tokens}")

Unique characters: 78


Now we want to map each character (token) to a unique integer that can be understood by the model:

In [11]:
char2int = {c : i for i, c in enumerate(tokens)}
print(char2int)

{'\n': 0, ' ': 1, '!': 2, '"': 3, "'": 4, '(': 5, ')': 6, ',': 7, '-': 8, '.': 9, '0': 10, '1': 11, '2': 12, '3': 13, '4': 14, '5': 15, '6': 16, '7': 17, '8': 18, '9': 19, ':': 20, ';': 21, '?': 22, 'A': 23, 'B': 24, 'C': 25, 'D': 26, 'E': 27, 'F': 28, 'G': 29, 'H': 30, 'I': 31, 'J': 32, 'K': 33, 'L': 34, 'M': 35, 'N': 36, 'O': 37, 'P': 38, 'Q': 39, 'R': 40, 'S': 41, 'T': 42, 'U': 43, 'V': 44, 'W': 45, 'X': 46, 'Y': 47, 'Z': 48, '[': 49, ']': 50, '_': 51, 'a': 52, 'b': 53, 'c': 54, 'd': 55, 'e': 56, 'f': 57, 'g': 58, 'h': 59, 'i': 60, 'j': 61, 'k': 62, 'l': 63, 'm': 64, 'n': 65, 'o': 66, 'p': 67, 'q': 68, 'r': 69, 's': 70, 't': 71, 'u': 72, 'v': 73, 'w': 74, 'x': 75, 'y': 76, 'z': 77}


In order to revert the integer encoding to the original characters we also need to define the inverse mapping:

In [12]:
int2char = {i: c for c, i in char2int.items()}
print(int2char)

{0: '\n', 1: ' ', 2: '!', 3: '"', 4: "'", 5: '(', 6: ')', 7: ',', 8: '-', 9: '.', 10: '0', 11: '1', 12: '2', 13: '3', 14: '4', 15: '5', 16: '6', 17: '7', 18: '8', 19: '9', 20: ':', 21: ';', 22: '?', 23: 'A', 24: 'B', 25: 'C', 26: 'D', 27: 'E', 28: 'F', 29: 'G', 30: 'H', 31: 'I', 32: 'J', 33: 'K', 34: 'L', 35: 'M', 36: 'N', 37: 'O', 38: 'P', 39: 'Q', 40: 'R', 41: 'S', 42: 'T', 43: 'U', 44: 'V', 45: 'W', 46: 'X', 47: 'Y', 48: 'Z', 49: '[', 50: ']', 51: '_', 52: 'a', 53: 'b', 54: 'c', 55: 'd', 56: 'e', 57: 'f', 58: 'g', 59: 'h', 60: 'i', 61: 'j', 62: 'k', 63: 'l', 64: 'm', 65: 'n', 66: 'o', 67: 'p', 68: 'q', 69: 'r', 70: 's', 71: 't', 72: 'u', 73: 'v', 74: 'w', 75: 'x', 76: 'y', 77: 'z'}


In [13]:
# Test the conversion between char and ints
for t in tokens:
    assert t == int2char[char2int[t]]

With the `char2int` mapping we can finally encode the whole text (a sequence of characters) into an array of integers.

In [14]:
# Encode text mapping characters to integers
encodedtext = np.array([char2int[char] for char in text])

print(encodedtext[:250])

[25 30 23 38 42 27 40  1 31  9  1 38 40 27 32 43 26 31 25 27 41  1 37 28
  1 38 30 31 34 37 41 37 38 30 27 40 41  0  0  0 11  9  1 42 59 56  1 45
 60 63 63  1 71 66  1 42 69 72 71 59  7  1 74 59 60 54 59  1 60 70  1 71
 66  1 71 56 64 67 71  1 72 70  1 71 66  1 64 52 65 76  1 52  1 59 52 77
 52 69 55 66 72 70  0 56 65 71 56 69 67 69 60 70 56  7  1 71 59 56  1 57
 52 64 66 72 70  1 42 69 72 71 59 57 72 63 65 56 70 70  1 66 57  1 74 59
 60 54 59  1 52 63 63  1 67 59 60 63 66 70 66 67 59 56 69 70  1 59 52 73
 56  0 59 60 71 59 56 69 71 66  1 70 67 66 62 56 65  1 74 60 71 59  1 69
 56 70 67 56 54 71  7  1 74 59 52 71  1 68 72 56 70 71 60 66 65 70  1 59
 52 70  1 71 59 60 70  1 45 60 63 63  1 71 66  1 42 69 72 71 59  1 65 66
 71  0 63 52 60 55  1 53 56 57]
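
As a quick sanity check of the two mappings, the following minimal sketch encodes and decodes a short toy string. The string and the `toy_*` names are assumptions made for illustration and are not part of the original notebook; the mappings are built in the same way as `char2int` and `int2char`.

```python
# Toy text standing in for the book (assumption for illustration)
toy_text = "hello world"

# Same construction as above: sorted unique characters -> integer ids
toy_tokens = sorted(set(toy_text))
toy_char2int = {c: i for i, c in enumerate(toy_tokens)}
toy_int2char = {i: c for c, i in toy_char2int.items()}

# Encode to integers, then decode back to characters
encoded = [toy_char2int[c] for c in toy_text]
decoded = "".join(toy_int2char[i] for i in encoded)

print(encoded)
print(decoded)
```

Because the tokens are sorted before being enumerated, the integer assigned to each character is reproducible across runs for the same text.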


### One-Hot Encoding

The LSTM model will take input in one-hot encoded form and therefore we need to write a function to transform our integer-encoded tokens into one-hot encoded tokens:

In [15]:
def one_hot_encoder(data, num_labels):
    """
    One hot encoding of integer-encoded data.
    """
    
    # Transform data to numpy array
    data = np.asarray(data)
    
    # Initialize one-hot encoding vector
    # PyTorch standard type is torch.float32
    # Declare numpy array as np.float32 to avoid conversion errors
    one_hot = np.zeros((data.size, num_labels), dtype=np.float32)
    
    # Row indices for hot elements (all rows)
    row_idx = np.arange(one_hot.shape[0])
    
    # Data contains integer-encoded characters
    # Hot element column indices correspond to their value
    col_idx = data.flatten()
    
    # Perform one-hot encoding
    one_hot[row_idx,col_idx] = 1.0
    
    # Reshape one-hot encoding with original data shape
    # An additional dimension for the one-hot encoding is added
    one_hot = one_hot.reshape((*data.shape, num_labels))
    
    return one_hot

We can finally test that `one_hot_encoder` performs as expected:

In [16]:
# Number of elements to one-hot encode in this test
n = 5

# Perform one-hot encoding
one_hot_test = one_hot_encoder(encodedtext[:n], n_tokens)

assert one_hot_test.shape == (n, n_tokens)

for idx, e in enumerate(encodedtext[:n]):
    assert one_hot_test[idx, e] == 1
    
print(encodedtext[:n])
print(one_hot_test)

[25 30 23 38 42]
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 
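
As an alternative to the NumPy implementation, PyTorch provides `torch.nn.functional.one_hot`. The sketch below is not what the notebook uses; it only shows, on a hypothetical toy array, that the same encoding can be obtained directly on tensors. Note that `F.one_hot` works with integer (`int64`) tensors, so the result has to be cast to `float32` before it is fed to an LSTM.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Hypothetical toy input: 2 sequences of 3 integer-encoded tokens
encoded = np.array([[0, 2, 1], [3, 3, 0]])
num_labels = 4

# F.one_hot expects an int64 (long) tensor and returns an int64 tensor
one_hot = F.one_hot(torch.from_numpy(encoded).long(), num_classes=num_labels)

# Cast to float32, the default floating point type of PyTorch layers
one_hot = one_hot.float()

# A new trailing dimension of size num_labels is added
print(one_hot.shape)
```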

### Mini Batches

As usual, we want to create mini-batches for training. With text, we split the data into sequences of length `len_sequence` and group them into batches that are fed to the model simultaneously. Each batch therefore contains `num_char_batch = batch_size * len_sequence` characters: `batch_size` is the number of sequences in a batch, not the number of characters (tokens). To keep things simple we discard the last partial batch and retain `num_chars // num_char_batch` batches, where `num_chars` is the total number of characters in the training text. Once the data has been trimmed, we reshape it into `batch_size` rows.

In [17]:
len_sequence_test = 3 # Test sequence length
batch_size_test = 4 # Test batch size (number of sequences per batch)

print(f"Number of characters in a sequence: {len_sequence_test}")
print(f"Number of sequences in a batch: {batch_size_test}")

num_characters = 42 # Total number of characters 

# Number of characters per batch
num_char_batch_test = len_sequence_test * batch_size_test

# Actual number of batches
# Total number of characters divided by the number of characters per batch
# Integer division is performed with //
num_batches_test = num_characters // num_char_batch_test

print(f"Number of batches: {num_batches_test}")

# Create fictitious data with different numbers for characters of different sequences
data, sequence_idx = [], 0
for idx in range(num_characters):
    if idx % len_sequence_test == 0:
        sequence_idx += 1
        
    data.append(sequence_idx)
        
data = np.array(data)
print(f"Raw data:\n{data}")

# Trim data to have only complete batches
data = data[:num_batches_test * num_char_batch_test]
print(f"Trimmed data:\n{data}")

data = data.reshape(batch_size_test, -1)
print(f"Reshaped data:\n{data}")

assert data.shape == (batch_size_test, num_batches_test * len_sequence_test)

Number of characters in a sequence: 3
Number of sequences in a batch: 4
Number of batches: 3
Raw data:
[ 1  1  1  2  2  2  3  3  3  4  4  4  5  5  5  6  6  6  7  7  7  8  8  8
  9  9  9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14]
Trimmed data:
[ 1  1  1  2  2  2  3  3  3  4  4  4  5  5  5  6  6  6  7  7  7  8  8  8
  9  9  9 10 10 10 11 11 11 12 12 12]
Reshaped data:
[[ 1  1  1  2  2  2  3  3  3]
 [ 4  4  4  5  5  5  6  6  6]
 [ 7  7  7  8  8  8  9  9  9]
 [10 10 10 11 11 11 12 12 12]]


After reshaping, the data has `batch_size` rows, which is the number of sequences in one batch (not the number of batches). The `num_batches` batches lie side by side along the columns, each spanning `len_sequence` columns. Therefore, we can slide a window of width `len_sequence` over the columns to create `num_batches` batches (of `len_sequence * batch_size` characters each) for the input and target variables. The inputs are simply the window of shape `(batch_size, len_sequence)` sliding over the data in steps of `len_sequence`; the targets are the same window shifted by one column. Care must be taken with the last column of the last batch, which has no successor and therefore wraps around to the first column.

In [18]:
# Show structure of different batches
for idx, n in enumerate(range(0, num_batches_test * len_sequence_test, len_sequence_test)):
    data[:,n:n + len_sequence_test] = idx

print(f"Batched data:\n{data}")

Batched data:
[[0 0 0 1 1 1 2 2 2]
 [0 0 0 1 1 1 2 2 2]
 [0 0 0 1 1 1 2 2 2]
 [0 0 0 1 1 1 2 2 2]]
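
To make the arithmetic concrete, here is the same calculation for the real data. The character count is the one printed earlier in this notebook, while `batch_size = 128` and `len_sequence = 100` are the values used for training later on; treat this as a worked example rather than part of the pipeline.

```python
num_chars = 373094   # total number of characters in the text
batch_size = 128     # sequences per batch (value used for training)
len_sequence = 100   # characters per sequence (value used for training)

# Number of characters consumed by one batch
num_char_batch = batch_size * len_sequence

# Number of complete batches (integer division)
num_batches = num_chars // num_char_batch

# Characters discarded because they do not fill a batch
num_discarded = num_chars - num_batches * num_char_batch

# Number of columns of the data after reshaping into batch_size rows
num_columns = num_batches * len_sequence

print(num_char_batch, num_batches, num_discarded, num_columns)
```

With these values each epoch consists of 29 batches of 12800 characters, and the last 1894 characters of the text are never seen during training.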


We can encode this batching process in a generator function that yields one batch at a time:

In [19]:
def iterbatches(data, batch_size, len_sequence):
    
    # Number of characters per batch
    num_char_batch = batch_size * len_sequence
    
    # Total number of characters
    num_chars = len(data)
    
    # Total number of full batches
    # // performs integer division
    num_batches = num_chars // num_char_batch
    
    # Discard trailing characters that do not fill a batch
    data = data[:num_batches * num_char_batch]
    
    # Reshape into batch_size rows
    data = data.reshape((batch_size, -1))
    
    assert data.shape[1] == num_batches * len_sequence
    
    for n in range(0, num_batches * len_sequence, len_sequence):
        
        # Input features
        inputs = data[:,n:n + len_sequence]
        
        # Target features
        # Input features shifted by one
        targets = np.zeros_like(inputs)
        targets[:,:-1] = inputs[:,1:] # Shift input by one
        try:
            targets[:,-1] = data[:,n + len_sequence] # Add last element
        except IndexError: # Last batch, wrap around 
            targets[:,-1] = data[:,0]
        
        # Yield the current batch
        yield inputs, targets

We can now test the `iterbatches` function to make sure that its output is what we expect:

In [20]:
testbatches = iterbatches(encodedtext, batch_size=3, len_sequence=10)

inputs, targets = next(testbatches)

assert inputs.shape == (3, 10)
assert targets.shape == (3, 10)

print(f"Input Sequence (0):\n{inputs}")
print(f"Target Sequence (0):\n{targets}")

inputs, targets = next(testbatches)

assert inputs.shape == (3, 10)
assert targets.shape == (3, 10)

print(f"Input Sequence (1):\n{inputs}")
print(f"Target Sequence (1):\n{targets}")

Input Sequence (0):
[[25 30 23 38 42 27 40  1 31  9]
 [70 71 60 54  1 66 57  1 71 59]
 [ 1 66 57  1 71 59 56  1 71 69]]
Target Sequence (0):
[[30 23 38 42 27 40  1 31  9  1]
 [71 60 54  1 66 57  1 71 59 56]
 [66 57  1 71 59 56  1 71 69 72]]
Input Sequence (1):
[[ 1 38 40 27 32 43 26 31 25 27]
 [56  1 71 76 67 56  1  3 57 69]
 [72 71 59 57 72 63  8  8 71 59]]
Target Sequence (1):
[[38 40 27 32 43 26 31 25 27 41]
 [ 1 71 76 67 56  1  3 57 69 56]
 [71 59 57 72 63  8  8 71 59 56]]


We see that the target sequence is the input sequence shifted by one along the `len_sequence` dimension (`axis=1`). The last element of each target sequence is the first element of the corresponding input sequence in the next batch.
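
The shift-by-one relation, including the wrap-around in the last batch, can also be expressed with `np.roll`. The function below is a hypothetical, more compact variant written for illustration (the name `iterbatches_roll` is an assumption); it is meant to produce the same inputs and targets as the batching described above, and is checked here on toy data where the expected values are easy to read.

```python
import numpy as np

def iterbatches_roll(data, batch_size, len_sequence):
    """Yield (inputs, targets) batches; targets are inputs shifted by one."""
    num_char_batch = batch_size * len_sequence
    num_batches = len(data) // num_char_batch

    # Trim to complete batches and reshape into batch_size rows
    data = data[:num_batches * num_char_batch].reshape((batch_size, -1))

    # Shift every row left by one; the last column wraps around to the first
    shifted = np.roll(data, -1, axis=1)

    for n in range(0, data.shape[1], len_sequence):
        yield data[:, n:n + len_sequence], shifted[:, n:n + len_sequence]

# Toy data: consecutive integers make the shift easy to see
toy = np.arange(26)
batches = list(iterbatches_roll(toy, batch_size=2, len_sequence=4))

first_inputs, first_targets = batches[0]
last_inputs, last_targets = batches[-1]

print(first_inputs, first_targets, sep="\n")
print(last_inputs, last_targets, sep="\n")
```

In the last batch the final target of each row is the first element of that same row, which is the wrap-around behaviour described earlier.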

## LSTM Architecture

We can now define our character-level LSTM network. It is composed of an `nn.LSTM` module (with `n_layers` LSTM layers and a hidden state of size `n_hidden`) that takes `n_tokens` input features (the one-hot encoding of the integer-encoded tokens), followed by a dropout layer and a fully connected layer that maps the `n_hidden` LSTM outputs back to `n_tokens` scores, one per character.

In [21]:
class CharLSTM(nn.Module):
    
    def __init__(self, n_tokens, n_hidden=256, n_layers=2, pdrop=0.5):
        super().__init__()
        
        # Number of features in  the LSTM hidden state
        self.n_hidden = n_hidden
        
        # Number of LSTM hidden layers
        self.n_layers = n_layers
        
        # Dropout probability
        self.pdrop = pdrop
        
        # Define LSTM layers
        self.lstm = nn.LSTM(
            n_tokens, 
            n_hidden, 
            n_layers, 
            dropout=pdrop, # LSTM dropout
            batch_first=True # Batch dimension is first
        )
        
        # Dropout layer for the input of the fully connected layer
        self.dropout = nn.Dropout(pdrop)
        
        # Fully connected layer mapping hidden features to token scores
        self.fc = nn.Linear(n_hidden, n_tokens)
        
    def forward(self, x, hidden):
        
        # Forward pass in LSTM
        output, hidden = self.lstm(x, hidden)
        
        # Dropout
        output = self.dropout(output)
        
        # Stack LSTM outputs: (batch_size * len_sequence, n_hidden)
        # reshape (unlike view) also handles non-contiguous tensors
        output = output.reshape(-1, self.n_hidden)
        
        # Forward pass through fully connected layer
        output = self.fc(output)
        
        # Return log probabilities
        output = F.log_softmax(output, dim=1)
        
        # Return output and hidden state
        return output, hidden

### Test Forward Pass

In order to make sure that the architecture works correctly, we can test a single forward pass:

In [22]:
# Instantiate CharLSTM with a hidden output size of 64
n_layers = 2
testlstm = CharLSTM(n_tokens, n_hidden=64, n_layers=n_layers)

# Get one batch of integer-encoded data
testbatches = iterbatches(encodedtext, batch_size=3, len_sequence=10)
inputs, targets = next(testbatches)

# Perform one-hot encoding and transform to PyTorch tensor
inputs = one_hot_encoder(inputs, n_tokens)
inputs = torch.from_numpy(inputs)

# Forward pass
output, hidden = testlstm(inputs, None)

# Check output shape: (batch_size * len_sequence, n_tokens)
assert output.shape == (3 * 10, n_tokens) 

# Hidden is a tuple (h_n, c_n) with the final hidden and cell states
assert len(hidden) == 2

for h in hidden:
    # Both states have shape (n_layers, batch_size, n_hidden)
    assert h.shape == (n_layers, 3, 64) 
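
The shapes returned by `nn.LSTM` are easy to confuse, so the following minimal sketch prints them for a bare `nn.LSTM` module. All sizes are hypothetical and chosen so that no two dimensions coincide. The second return value is a pair of tensors (final hidden state and final cell state), and these keep the layer dimension first even when `batch_first=True`.

```python
import torch
from torch import nn

# Hypothetical sizes chosen so that no two dimensions coincide
batch_size, len_sequence, n_features = 4, 5, 8
n_hidden, n_layers = 16, 3

lstm = nn.LSTM(n_features, n_hidden, n_layers, batch_first=True)
x = torch.zeros(batch_size, len_sequence, n_features)

# Omitting the initial state is equivalent to starting from zeros
output, (h_n, c_n) = lstm(x)

print(output.shape)  # (batch_size, len_sequence, n_hidden)
print(h_n.shape)     # (n_layers, batch_size, n_hidden)
print(c_n.shape)     # (n_layers, batch_size, n_hidden)
```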

## Training

We can finally define our training loop as usual:

In [23]:
def train(model, 
          optimizer, 
          loss_function, 
          data,
          n_tokens,
          epochs=10, 
          batch_size=10,
          len_sequence=50, 
          clip=5, 
          print_every=5,
          device=device):
    
    import time
    
    # Move model to device
    model.to(device)
    
    # Set model in training mode
    model.train()
    for epoch in range(epochs):
        
        epoch_loss = 0
        
        start_time = time.time()
        
        # Initialize hidden state
        hidden = None
    
        for inputs, targets in iterbatches(data, batch_size, len_sequence):
            
            assert inputs.shape == targets.shape == (batch_size, len_sequence)
            
            inputs = one_hot_encoder(inputs, n_tokens)
            
            # Initialise tensors and move to device
            inputs = torch.from_numpy(inputs).to(device)
            targets = torch.from_numpy(targets).to(device)
            
            output, hidden = model(inputs, hidden)
            
            # Detach all hidden states from the computational graph
            # Avoid backpropagation through the entire history
            # Hidden is a tuple (h_n, c_n) of hidden and cell states
            hidden = tuple(h.detach() for h in hidden)
            
            # Reset gradients
            optimizer.zero_grad()
            
            # Compute loss
            loss = loss_function(output, targets.view(batch_size * len_sequence))
            
            # Perform backpropagation
            loss.backward()
            
            # Accumulate epoch loss
            epoch_loss += loss.item()
            
            # Clip gradients norm
            # Prevents the exploding gradient problem
            nn.utils.clip_grad_norm_(model.parameters(), clip)
            
            # Optimise model parameters
            optimizer.step()
        else:
            stop_time = time.time()
            
            num_batches = data.size // (batch_size * len_sequence)
            
            print(f"--- Epoch {epoch:2}/{epochs:2} ---")
            print(f"Loss: {epoch_loss/num_batches:.5f}")
            print(f"Time: {stop_time-start_time:.2f} s")
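
The training loop clips the gradient norm before each optimizer step. The sketch below illustrates what `nn.utils.clip_grad_norm_` does on a hypothetical tiny linear model with deliberately large inputs (model, data and loss are assumptions for illustration only): the function returns the total gradient norm measured before clipping and rescales the gradients in place so that their total norm does not exceed `max_norm`.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical tiny model and data, only used to produce large gradients
model = nn.Linear(10, 1)
x = 100.0 * torch.randn(32, 10)
y = torch.randn(32, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

max_norm = 5.0

# Rescales gradients in place; returns the total norm before clipping
norm_before = nn.utils.clip_grad_norm_(model.parameters(), max_norm)

# Recompute the total norm of all gradients after clipping
norm_after = torch.norm(
    torch.stack([p.grad.norm(2) for p in model.parameters()]), 2
)

print(f"Before clipping: {norm_before.item():.2f}")
print(f"After clipping:  {norm_after.item():.2f}")
```

Clipping changes the length of the gradient vector but not its direction, which is why it is a common remedy for exploding gradients in recurrent networks.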

Finally we can train the model:

In [24]:
n_hidden = 512
n_layers = 2

model = CharLSTM(n_tokens, n_hidden, n_layers)

print(model)

CharLSTM(
  (lstm): LSTM(78, 512, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5, inplace=False)
  (fc): Linear(in_features=512, out_features=78, bias=True)
)


In [25]:
batch_size = 128
len_sequence = 100
epochs = 20

#  Optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Loss function
# NLLLoss + LogSoftmax output is equivalent to CrossEntropyLoss
loss_function = nn.NLLLoss()

#  Train model
train(model, optimizer, loss_function, encodedtext, n_tokens, 
      epochs=epochs, batch_size=batch_size, len_sequence=len_sequence)

--- Epoch  0/20 ---
Loss: 3.39770
Time: 56.45 s
--- Epoch  1/20 ---
Loss: 3.17779
Time: 57.29 s
--- Epoch  2/20 ---
Loss: 3.16591
Time: 58.42 s
--- Epoch  3/20 ---
Loss: 3.10668
Time: 59.34 s
--- Epoch  4/20 ---
Loss: 2.93153
Time: 52.65 s
--- Epoch  5/20 ---
Loss: 2.66057
Time: 51.18 s
--- Epoch  6/20 ---
Loss: 2.52134
Time: 51.73 s
--- Epoch  7/20 ---
Loss: 2.42938
Time: 51.04 s
--- Epoch  8/20 ---
Loss: 2.34566
Time: 51.55 s
--- Epoch  9/20 ---
Loss: 2.27707
Time: 51.82 s
--- Epoch 10/20 ---
Loss: 2.21662
Time: 52.12 s
--- Epoch 11/20 ---
Loss: 2.16436
Time: 52.18 s
--- Epoch 12/20 ---
Loss: 2.11680
Time: 52.27 s
--- Epoch 13/20 ---
Loss: 2.07256
Time: 52.02 s
--- Epoch 14/20 ---
Loss: 2.03249
Time: 51.68 s
--- Epoch 15/20 ---
Loss: 1.99457
Time: 52.59 s
--- Epoch 16/20 ---
Loss: 1.95888
Time: 53.07 s
--- Epoch 17/20 ---
Loss: 1.92814
Time: 52.38 s
--- Epoch 18/20 ---
Loss: 1.89683
Time: 52.68 s
--- Epoch 19/20 ---
Loss: 1.87209
Time: 52.13 s
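
The comment in the training cell states that `NLLLoss` applied to a `LogSoftmax` output is equivalent to `CrossEntropyLoss` applied to the raw scores. This can be checked numerically with a minimal sketch on random data (the tensors below are hypothetical and unrelated to the trained model):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical raw scores for 6 predictions over 78 tokens
logits = torch.randn(6, 78)
targets = torch.randint(0, 78, (6,))

# Negative log-likelihood on log-probabilities (what the notebook uses)
loss_nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Cross entropy computed directly on the raw scores
loss_ce = F.cross_entropy(logits, targets)

print(loss_nll.item(), loss_ce.item())
```

Returning log-probabilities from the model, as done here, has the advantage that the probabilities needed for sampling can be recovered with a single `torch.exp`.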


### Save Model

After training we can save the model for later use:

In [26]:
import os

# Make directory for models (no error if it already exists)
os.makedirs("models", exist_ok=True)

checkpoint = {
    "n_hidden": model.n_hidden,
    "n_layers": model.n_layers,
    "state_dict": model.state_dict(),
}
        
torch.save(checkpoint, "models/CharLSTM.pth")
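
For completeness, here is a sketch of how such a checkpoint could be loaded again. To keep the example self-contained it uses `TinyCharLSTM`, a hypothetical stand-in with the same layer names as `CharLSTM` (its forward pass is omitted), and a temporary file instead of `models/CharLSTM.pth`. Since the checkpoint does not store `n_tokens`, the sketch recovers it from the shape of the output layer weights.

```python
import os
import tempfile

import torch
from torch import nn


class TinyCharLSTM(nn.Module):
    """Stand-in with the same layer names as CharLSTM (forward omitted)."""

    def __init__(self, n_tokens, n_hidden=256, n_layers=2, pdrop=0.5):
        super().__init__()
        self.n_hidden = n_hidden
        self.n_layers = n_layers
        self.lstm = nn.LSTM(
            n_tokens, n_hidden, n_layers, dropout=pdrop, batch_first=True
        )
        self.dropout = nn.Dropout(pdrop)
        self.fc = nn.Linear(n_hidden, n_tokens)


# Save a checkpoint with the same keys as above (temporary path)
model = TinyCharLSTM(n_tokens=78, n_hidden=32, n_layers=2)
path = os.path.join(tempfile.mkdtemp(), "CharLSTM.pth")
checkpoint = {
    "n_hidden": model.n_hidden,
    "n_layers": model.n_layers,
    "state_dict": model.state_dict(),
}
torch.save(checkpoint, path)

# Load on CPU regardless of the device used for training
loaded = torch.load(path, map_location="cpu")

# n_tokens is not stored: recover it from the output layer weights
n_tokens = loaded["state_dict"]["fc.weight"].shape[0]

# Rebuild the model and restore the trained parameters
restored = TinyCharLSTM(n_tokens, loaded["n_hidden"], loaded["n_layers"])
restored.load_state_dict(loaded["state_dict"])
restored.eval()
```

The character mappings are not part of the checkpoint either, so `tokens`, `char2int` and `int2char` have to be rebuilt from the same text before generating new samples.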

## Predictions

A character-level LSTM network gives a probability distribution for the next character in a sequence (over all possible characters), given the previous character and a hidden state (the memory of the network). We can therefore define a helper function that, given a character and a hidden state, predicts the next character from this distribution. Since the network returns log-probabilities (`F.log_softmax`), the probability distribution is obtained by exponentiating its output. Instead of always picking the most probable character, we use a `top_k` policy, where the next character is randomly selected among the $k$ most probable ones (with probabilities proportional to their predicted probabilities).

In [27]:
def predict(char, hidden, model, tokens, top_k=3):
    
    # Evaluation mode
    model.eval()
    
    # Transform char to integer encoding
    inputs = np.array([[char2int[char]]])
    
    # One-hot encode input
    n_tokens = len(tokens)
    inputs = one_hot_encoder(inputs, n_tokens)
    
    # Transform numpy array to torch tensor
    inputs = torch.from_numpy(inputs).to(device)
    
    with torch.no_grad():
        
        # Propagation  through the network
        output, hidden = model(inputs, hidden)

        # Get probability distribution for next character
        # First dimension is batches
        # Network output is LogSoftmax
        probabilities = torch.exp(output)
        
        # Get top characters
        p, top_char = probabilities.topk(top_k)
        top_char = top_char.cpu().numpy().squeeze()
        
        # Select next character among top_k most probable
        # Assign probabilities proportional to predicted probability
        p = p.cpu().numpy().squeeze()
        nextchar = np.random.choice(top_char, p=p/p.sum())
        
        # Return predicted char and hidden state
        return int2char[nextchar], hidden
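
The top-$k$ selection can be illustrated in isolation on a hand-written distribution. The four probabilities below are hypothetical and stand in for the network output; the sketch follows the same steps as `predict` (exponentiate, take the top $k$, renormalise, sample), with a seeded NumPy generator used only to make the example reproducible.

```python
import numpy as np
import torch

rng = np.random.default_rng(0)

# Hypothetical log-probabilities over 4 tokens (batch of size 1)
log_probs = torch.log(torch.tensor([[0.05, 0.5, 0.15, 0.3]]))

# Back to probabilities, then keep the 3 most probable tokens
probabilities = torch.exp(log_probs)
p, top_idx = probabilities.topk(3)

top_idx = top_idx.numpy().squeeze()
p = p.numpy().squeeze().astype(np.float64)

# Renormalise so that the retained probabilities sum to one
p_norm = p / p.sum()

# Draw many samples to see which tokens can be selected
samples = rng.choice(top_idx, size=1000, p=p_norm)

print(top_idx)
print(np.unique(samples))
```

The least probable token (index 0 here) is never sampled, which removes unlikely characters at the price of less diverse text; `top_k=1` would reduce to always choosing the most probable character.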

### Sampling

Finally we can generate new text. We start with a prime input that is used to initialise the hidden state of the network, then new characters are sampled using the LSTM network prediction:

In [28]:
def sample(model, length, prime, top_k=3):
    
    # List of prime characters
    chars = list(prime)
    
    # Initialise hidden state
    hidden = None
    
    # Run on prime
    for char in chars:
        char, hidden = predict(char, hidden, model, tokens, top_k=top_k)

    # Append first prediction after prime
    chars.append(char)
    
    # Use previous prediction to obtain a new prediction
    for _ in range(length):
        char, hidden = predict(chars[-1], hidden, model, tokens, top_k=top_k)
        chars.append(char)

    return "".join(chars)

In [29]:
generatedtext = sample(model, 2000, "As a matter of fact,")
print(generatedtext)

As a matter of fact, the manterents
the more, the stint and the serest, and allose the ponered and the sore the stiling, to the profount to bat the moral at allost an and and an the store, the
constrection of the most along there
as allow an is andest a precies the
postined, the perhaps and to the more also of things and so a prosound of the most of an a could this
the soul of the possed," and
alse stan that and some a some of the spiest and a moral of the stile to beloode a sente ther are and self-to beligetion tastery, and stours, and the sour of self a soul to the meant of
the sente of the senter and to that an a sould to the more the posers and the still to that in to sees, the some and senter, the man only in alout that the prosounes to be all allowing of the precient of the
strongion, and along, to the some as the more of the presance, which the prose of the man a something
of that the still and allatity, and and senses of the sporiting and the self an incentions of a simplices t

The generated text is far from perfect, as expected from a character-level LSTM network. However, it could probably be improved by tuning hyperparameters such as `n_hidden`, `n_layers`, `len_sequence`, the dropout probability or the number of training epochs.