# Text Generation with an LSTM in Keras

## Load the data
We will use an excerpt from Moby Dick (four chapters) as the training text.

In [2]:
def read_file(filepath):
    """Read file.
    Simple function to read all the text from a file.
    Do not use it with large text files."""
    with open(filepath) as f:
        str_text = f.read()
    return str_text
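
The docstring warns against using `read_file` on large files because it loads the whole text into memory at once. As a rough sketch of one alternative, the generator below reads a file in fixed-size pieces. The name `iter_file_chunks`, the `chunk_size` default and the UTF-8 encoding are assumptions made for illustration; none of them come from the original notebook.

```python
import os
import tempfile


def iter_file_chunks(filepath, chunk_size=64 * 1024, encoding="utf-8"):
    """Yield successive pieces of a text file, each at most chunk_size characters.

    Hypothetical helper: not part of the original notebook.
    """
    with open(filepath, encoding=encoding) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty string signals end of file
                break
            yield chunk


# Demo on a throwaway file so the sketch runs without the Moby Dick dataset.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Call me Ishmael. " * 1000)
    demo_path = tmp.name

chunks = list(iter_file_chunks(demo_path, chunk_size=4096))
total_chars = sum(len(c) for c in chunks)
os.remove(demo_path)

print(len(chunks), "chunks,", total_chars, "characters in total")
```

For the four-chapter excerpt used here, reading everything at once is fine; chunked reading only matters when the file no longer fits comfortably in memory.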

In [9]:
file_path = "../../datasets/moby_dick_four_chapters.txt"
corpus = read_file(file_path)
print(corpus[:500], "...")

Call me Ishmael.  Some years ago--never mind how long
precisely--having little or no money in my purse, and nothing
particular to interest me on shore, I thought I would sail about a
little and see the watery part of the world.  It is a way I have of
driving off the spleen and regulating the circulation.  Whenever I
find myself growing grim about the mouth; whenever it is a damp,
drizzly November in my soul; whenever I find myself involuntarily
pausing before coffin warehouses, and bringing up t ...


Let's import spaCy, disabling the pipeline components we do not need.

Remember that you need to download the spaCy model data first:
* https://spacy.io/
* https://spacy.io/usage/models

In [10]:
!python -m spacy download en_core_web_sm

Collecting en_core_web_sm==2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
Building wheels for collected packages: en-core-web-sm
  Building wheel for en-core-web-sm (setup.py) ... done
  Created wheel for en-core-web-sm: filename=en_core_web_sm-2.0.0-cp37-none-any.whl size=37405978 sha256=7b6216dac1ea9a7f7576a1b0db729199dba80b1e42ed7152008c4a0d7b138933
  Stored in directory: /private/var/folders/7h/th34yqr102n5jz073xl10zc80000gn/T/pip-ephem-wheel-cache-_6d_otuz/wheels/54/7c/d8/f86364af8fbba7258e14adae115f18dd2c91552406edc3fdaa
Successfully built en-core-web-sm
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-2.0.0

    Linking successful
    /Users/OhtarMac/anaconda3/envs/nlp_training/lib/python3

In [12]:
import spacy

nlp = spacy.load('en_core_web_sm', disable=['parser', 'tagger', 'ner'])
# Raise spaCy's limit on input length (in characters) in case we want to process a bigger text file
nlp.max_length = 1198623

Let's clean the text a little by removing punctuation and whitespace tokens and lowercasing the rest.

In [13]:
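def separate_punctuation(doc_text, black_list='\n\n \n\n\n!"-#$%&()--.*+,-/:;<=>?@[\\]^_`{|}~\t\n '):
    # Tokenize with spaCy, drop unwanted tokens, and lowercase what is left.
    # black_list is a string, so `not in` is a substring test: any token that occurs inside it is dropped.
    return [token.text.lower() for token in nlp(doc_text) if token.text not in black_list]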
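
Because `black_list` is a plain string, `token.text not in black_list` checks whether the token occurs anywhere inside that string. The sketch below contrasts that with an explicit set of tokens to drop. The `sample` list is hand-written rather than real spaCy output, and `drop_tokens` is a hypothetical alternative that does not appear in the notebook.

```python
# The default black_list from the function above, reproduced verbatim.
black_list = '\n\n \n\n\n!"-#$%&()--.*+,-/:;<=>?@[\\]^_`{|}~\t\n '

# Hypothetical alternative: an explicit set of tokens to drop.
# Its members are illustrative, not taken from the notebook.
drop_tokens = set('!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~') | {
    "--", "\n", "\n\n", "\n\n\n", "\t", " ",
}

# Hand-written sample tokens (not produced by spaCy).
sample = ["call", "me", "ishmael", ".", "--", ",-", "\n\n"]

# Substring test: drops every token that appears anywhere inside the string,
# including ",-", which only matches because "," and "-" happen to be adjacent.
kept_by_substring = [t for t in sample if t not in black_list]

# Set membership: drops only tokens that are listed explicitly.
kept_by_set = [t for t in sample if t not in drop_tokens]

print(kept_by_substring)
print(kept_by_set)
```

The substring approach is compact and works for this corpus, but it can silently drop multi-character tokens that happen to match part of the string. A set makes the list of dropped tokens explicit.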

In [14]:
tokens = separate_punctuation(corpus)
tokens

['call',
 'me',
 'ishmael',
 'some',
 'years',
 'ago',
 'never',
 'mind',
 'how',
 'long',
 'precisely',
 'having',
 'little',
 'or',
 'no',
 'money',
 'in',
 'my',
 'purse',
 'and',
 'nothing',
 'particular',
 'to',
 'interest',
 'me',
 'on',
 'shore',
 'i',
 'thought',
 'i',
 'would',
 'sail',
 'about',
 'a',
 'little',
 'and',
 'see',
 'the',
 'watery',
 'part',
 'of',
 'the',
 'world',
 'it',
 'is',
 'a',
 'way',
 'i',
 'have',
 'of',
 'driving',
 'off',
 'the',
 'spleen',
 'and',
 'regulating',
 'the',
 'circulation',
 'whenever',
 'i',
 'find',
 'myself',
 'growing',
 'grim',
 'about',
 'the',
 'mouth',
 'whenever',
 'it',
 'is',
 'a',
 'damp',
 'drizzly',
 'november',
 'in',
 'my',
 'soul',
 'whenever',
 'i',
 'find',
 'myself',
 'involuntarily',
 'pausing',
 'before',
 'coffin',
 'warehouses',
 'and',
 'bringing',
 'up',
 'the',
 'rear',
 'of',
 'every',
 'funeral',
 'i',
 'meet',
 'and',
 'especially',
 'whenever',
 'my',
 'hypos',
 'get',
 'such',
 'an',
 'upper',
 'hand',
 '