## Character-Level LSTM in PyTorch
In this notebook, I'll construct a character-level LSTM with PyTorch. The network trains on some text one character at a time, then generates new text one character at a time. As an example, I'll train on Anna Karenina, so the model will be able to generate new text based on the text of the book.

This network is based on Andrej Karpathy's post on RNNs and his implementation in Torch. Below is the general architecture of the character-wise RNN.

![](images/charseq.jpeg)
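
One way to read the architecture is as next-character prediction: at each step the network sees one character and is trained to predict the character that follows it. The snippet below is a minimal sketch of how such input/target pairs line up, assuming the targets are simply the inputs shifted by one position. `next_char_pairs` is a hypothetical helper written for illustration and is not part of this notebook.

```python
# Hypothetical helper (an assumption for illustration, not from the notebook):
# pair each character with the character that follows it.
def next_char_pairs(sequence):
    """Return (inputs, targets), where targets is inputs shifted by one character."""
    inputs = sequence[:-1]   # every character except the last
    targets = sequence[1:]   # every character except the first
    return inputs, targets

inputs, targets = next_char_pairs("hello")

# Show which character the network would be asked to predict at each step.
for x, y in zip(inputs, targets):
    print(f"input {x!r} -> target {y!r}")
```

For the string `"hello"`, the inputs are `"hell"` and the targets are `"ello"`, so the last character is never an input and the first is never a target.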


First, let's import the resources we need for data loading and model creation.

In [1]:
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F


## Load in Data

In [3]:
# open text file and read in data as `text`
with open('data/anna.txt', 'r') as f:
    text = f.read()

In [4]:
# Check the first 100 characters to make sure the text loaded correctly.
# According to the American Book Review, this is the 6th best first line of a book ever.
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

### Tokenization
In the cells below, I create a couple of dictionaries to convert the characters to and from integers. Encoding the characters as integers makes them easier to use as input to the network.

In [6]:
# encode the text and map each character to an integer and vice versa

# we create two dictionaries:
# 1. int2char, which maps integers to characters
# 2. char2int, which maps characters to unique integers
chars = tuple(set(text))
int2char = dict(enumerate(chars))
char2int = {ch: ii for ii, ch in int2char.items()}

# encode the text
encoded = np.array([char2int[ch] for ch in text])
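
As a quick sanity check on this mapping, here is a minimal sketch that runs on a short sample string rather than the book. It builds the same two dictionaries, encodes the string, and decodes it back. Two things here are assumptions of the sketch and not what the cell above does: the character set is sorted first, so the integer ids stay the same from run to run (the iteration order of a `set` of strings can differ between Python sessions), and a plain list stands in for the NumPy array.

```python
sample = "happy families"  # short stand-in for the book text

# Sorting is an assumption added here for reproducible ids;
# the notebook cell above uses tuple(set(text)) directly.
chars = tuple(sorted(set(sample)))
int2char = dict(enumerate(chars))
char2int = {ch: ii for ii, ch in int2char.items()}

# Encode to integers, then decode back to confirm nothing is lost.
encoded = [char2int[ch] for ch in sample]
decoded = "".join(int2char[ii] for ii in encoded)

print(chars)
print(encoded)
print(decoded == sample)
```

If the round trip returns the original string, the two dictionaries are consistent inverses of each other, which is what text generation relies on later when predicted integers are turned back into characters.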



In [7]:
# The same characters from above, now encoded as integers.

In [8]:
encoded[:100]

array([54, 30, 80, 74, 49, 17, 31, 12, 42, 60, 60, 60, 47, 80, 74, 74, 29,
       12, 76, 80, 59, 36, 81, 36, 17,  8, 12, 80, 31, 17, 12, 80, 81, 81,
       12, 80, 81, 36, 61, 17, 34, 12, 17, 38, 17, 31, 29, 12, 75, 13, 30,
       80, 74, 74, 29, 12, 76, 80, 59, 36, 81, 29, 12, 36,  8, 12, 75, 13,
       30, 80, 74, 74, 29, 12, 36, 13, 12, 36, 49,  8, 12, 26, 28, 13, 60,
       28, 80, 29, 22, 60, 60,  2, 38, 17, 31, 29, 49, 30, 36, 13])