<h1>Overview<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Spooky-Author-Classification" data-toc-modified-id="Spooky-Author-Classification-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Spooky Author Classification</a></span><ul class="toc-item"><li><span><a href="#Generate-Tokenizer-Input-Data" data-toc-modified-id="Generate-Tokenizer-Input-Data-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Generate Tokenizer Input Data</a></span></li><li><span><a href="#Generate-Tokenizer" data-toc-modified-id="Generate-Tokenizer-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Generate Tokenizer</a></span></li><li><span><a href="#Tokenize-texts" data-toc-modified-id="Tokenize-texts-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Tokenize texts</a></span></li><li><span><a href="#Add-a-Regular-Tokenizer-to-compare" data-toc-modified-id="Add-a-Regular-Tokenizer-to-compare-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Add a Regular Tokenizer to compare</a></span></li></ul></li><li><span><a href="#Estimate-a-reasonable-sequence's-max_length" data-toc-modified-id="Estimate-a-reasonable-sequence's-max_length-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Estimate a reasonable sequence's max_length</a></span><ul class="toc-item"><li><span><a href="#Enforce-a-sequence's-max_length" data-toc-modified-id="Enforce-a-sequence's-max_length-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Enforce a sequence's max_length</a></span></li><li><span><a href="#Create-train-/test-sets" data-toc-modified-id="Create-train-/test-sets-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Create train-/test sets</a></span></li><li><span><a href="#Create-Model" data-toc-modified-id="Create-Model-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Create Model</a></span></li><li><span><a href="#Fit-Model" data-toc-modified-id="Fit-Model-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Fit Model</a></span></li><li><span><a href="#Evaluate-Model" 
data-toc-modified-id="Evaluate-Model-2.5"><span class="toc-item-num">2.5&nbsp;&nbsp;</span>Evaluate Model</a></span></li></ul></li></ul></div>

In [1]:
import os
import re
import sys
import numpy as np
import pandas as pd

from keras import backend as K
from keras.layers import Dense, Conv1D, LSTM, Dropout, Embedding, Layer
from keras.models import Sequential as SequentialModel
from keras.utils import to_categorical
from keras.preprocessing.text import Tokenizer as KerasTokenizer

%load_ext autoreload
%autoreload 2

sys.path.insert(0, '../ct')

import load
from preprocess import preprocess
from preprocess import Tokenizer
from preprocess.preprocess import separator_samples

Using TensorFlow backend.


In [2]:
# display(f'current dir: {os.getcwd()}')
# df = pd.read_pickle('../data/processed/treebank.pickle')

# if not os.path.exists('../data/processed/treebank'):
#     os.mkdir('../data/processed/treebank')

# for i, text in enumerate(df.text):
#     with open(f'../data/processed/treebank/{i + 1}', 'w') as file:
#         file.write(text)
        
# with open('../data/processed/treebank.txt', 'w', ) as file:
#     file.write(text)

# tokenizer = Tokenizer(input_paths=['../data/input/treebank/raw_txt/treebank-utf8.txt'],
#                       tokens_output_dir='../data/processed/treebank/',
#                       tokenizer_output_path='../data/tokenizer/treebank',
#                       lowercase=True,
#                       vocab_size=5000)
# display(tokenizer)

# Spooky Author Classification

## Generate Tokenizer Input Data

In [37]:
spooky_train = pd.read_csv('../data/input/spooky-author/train.csv')
spooky_test = pd.read_csv('../data/input/spooky-author/test.csv')

In [38]:
spooky_train

Unnamed: 0,id,text,author
0,id26305,"This process, however, afforded me no means of...",EAP
1,id17569,It never once occurred to me that the fumbling...,HPL
2,id11008,"In his left hand was a gold snuff box, from wh...",EAP
3,id27763,How lovely is spring As we looked from Windsor...,MWS
4,id12958,"Finding nothing else, not even gold, the Super...",HPL
...,...,...,...
19574,id17718,"I could have fancied, while I looked at it, th...",EAP
19575,id08973,The lids clenched themselves together as if in...,EAP
19576,id05267,"Mais il faut agir that is to say, a Frenchman ...",EAP
19577,id17513,"For an item of news like this, it strikes us i...",EAP


In [39]:
# Make sure the output directory for processed data exists.
os.makedirs('../data/processed/spooky-author', exist_ok=True)

# Concatenate all training texts, separated by the sample separator.
x_train = separator_samples.join(spooky_train.text)
# Round-trip through UTF-8 so that unencodable characters become backslash escapes.
x_train = x_train.encode('utf-8', 'backslashreplace').decode('utf-8', 'backslashreplace')

with open('../data/input/spooky-author/train.txt', 'w', encoding="utf-8") as file:
    file.write(x_train)

## Generate Tokenizer

In [40]:
tokenizer = Tokenizer(input_paths=['../data/input/spooky-author/train.txt'],
                      tokenizer_output_path='../data/tokenizer/spooky-author',
                      lowercase=True,
                      vocab_size=4000)
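
The `Ġ` prefix in the token output further down is the marker that byte-level BPE tokenizers commonly use for a leading space, which suggests (but does not prove) that the custom `Tokenizer` learns byte-pair-encoding merges. Its internals are not shown here, so the following is only a toy sketch of how BPE merge rules can be learned in general; `learn_bpe_merges` is a hypothetical helper, not part of the `preprocess` package.

```python
from collections import Counter


def learn_bpe_merges(words, num_merges):
    """Learn up to num_merges merge rules from a list of words (toy BPE)."""
    # Every word starts as a tuple of single characters, weighted by frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs in the corpus.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # nothing left to merge
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word with the most frequent pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


# Toy corpus (hypothetical): 't'+'h' is the most frequent pair, then 'th'+'e'.
print(learn_bpe_merges(["the", "the", "this"], num_merges=2))
```

A larger `vocab_size` allows more merges, so frequent words end up as single tokens instead of many short fragments.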

## Tokenize texts

In [41]:
spooky_train['encoding'] = tokenizer.encode_batch(spooky_train.text.tolist())
spooky_test['encoding'] = tokenizer.encode_batch(spooky_test.text.tolist())

spooky_train['tokens'] = spooky_train.encoding.apply(lambda e: e.tokens)
spooky_train['token_ids'] = spooky_train.encoding.apply(lambda e: e.ids)

spooky_test['tokens'] = spooky_test.encoding.apply(lambda e: e.tokens)
spooky_test['token_ids'] = spooky_test.encoding.apply(lambda e: e.ids)

author_to_id = {'EAP': 0, 'HPL': 1, 'MWS': 2}
spooky_train['author_id'] = spooky_train.author.apply(lambda a: author_to_id[a])

In [8]:
for i, row in spooky_train.head(20).iterrows():
    print(row.text)
    display(row.tokens)
    print()

This process, however, afforded me no means of ascertaining the dimensions of my dungeon; as I might make its circuit, and return to the point whence I set out, without being aware of the fact; so perfectly uniform seemed the wall.


['th',
 'is',
 'Ġpro',
 'c',
 'ess',
 ',',
 'Ġh',
 'ow',
 'e',
 'ver',
 ',',
 'Ġa',
 'ff',
 'or',
 'd',
 'ed',
 'Ġme',
 'Ġno',
 'Ġme',
 'an',
 's',
 'Ġof',
 'Ġas',
 'c',
 'er',
 't',
 'ain',
 'ing',
 'Ġthe',
 'Ġd',
 'im',
 'en',
 's',
 'ion',
 's',
 'Ġof',
 'Ġmy',
 'Ġd',
 'u',
 'n',
 'ge',
 'on',
 ';',
 'Ġas',
 'Ġi',
 'Ġm',
 'ight',
 'Ġm',
 'a',
 'ke',
 'Ġit',
 's',
 'Ġc',
 'ir',
 'c',
 'u',
 'it',
 ',',
 'Ġand',
 'Ġre',
 't',
 'ur',
 'n',
 'Ġto',
 'Ġthe',
 'Ġpo',
 'in',
 't',
 'Ġwhen',
 'ce',
 'Ġi',
 'Ġs',
 'et',
 'Ġo',
 'ut',
 ',',
 'Ġwith',
 'out',
 'Ġbe',
 'ing',
 'Ġa',
 'w',
 'a',
 're',
 'Ġof',
 'Ġthe',
 'Ġf',
 'a',
 'ct',
 ';',
 'Ġso',
 'Ġper',
 'f',
 'ect',
 'ly',
 'Ġun',
 'if',
 'or',
 'm',
 'Ġse',
 'em',
 'ed',
 'Ġthe',
 'Ġw',
 'all',
 '.']


It never once occurred to me that the fumbling might be a mere mistake.


['it',
 'Ġne',
 'ver',
 'Ġon',
 'ce',
 'Ġo',
 'c',
 'c',
 'ur',
 'red',
 'Ġto',
 'Ġme',
 'Ġthat',
 'Ġthe',
 'Ġf',
 'um',
 'b',
 'l',
 'ing',
 'Ġm',
 'ight',
 'Ġbe',
 'Ġa',
 'Ġme',
 're',
 'Ġm',
 'ist',
 'a',
 'ke',
 '.']


In his left hand was a gold snuff box, from which, as he capered down the hill, cutting all manner of fantastic steps, he took snuff incessantly with an air of the greatest possible self satisfaction.


['in',
 'Ġhis',
 'Ġle',
 'f',
 't',
 'Ġha',
 'nd',
 'Ġwas',
 'Ġa',
 'Ġg',
 'o',
 'ld',
 'Ġs',
 'n',
 'u',
 'ff',
 'Ġb',
 'o',
 'x',
 ',',
 'Ġfrom',
 'Ġwhich',
 ',',
 'Ġas',
 'Ġhe',
 'Ġc',
 'a',
 'pe',
 'red',
 'Ġd',
 'ow',
 'n',
 'Ġthe',
 'Ġh',
 'ill',
 ',',
 'Ġc',
 'ut',
 't',
 'ing',
 'Ġall',
 'Ġman',
 'ne',
 'r',
 'Ġof',
 'Ġf',
 'ant',
 'ast',
 'ic',
 'Ġst',
 'e',
 'p',
 's',
 ',',
 'Ġhe',
 'Ġto',
 'o',
 'k',
 'Ġs',
 'n',
 'u',
 'ff',
 'Ġin',
 'c',
 'ess',
 'ant',
 'ly',
 'Ġwith',
 'Ġan',
 'Ġa',
 'ir',
 'Ġof',
 'Ġthe',
 'Ġg',
 'reat',
 'est',
 'Ġpo',
 'ss',
 'i',
 'ble',
 'Ġse',
 'lf',
 'Ġs',
 'at',
 'is',
 'f',
 'a',
 'ct',
 'ion',
 '.']


How lovely is spring As we looked from Windsor Terrace on the sixteen fertile counties spread beneath, speckled by happy cottages and wealthier towns, all looked as in former years, heart cheering and fair.


['h',
 'ow',
 'Ġlo',
 've',
 'ly',
 'Ġis',
 'Ġsp',
 'r',
 'ing',
 'Ġas',
 'Ġwe',
 'Ġlo',
 'o',
 'k',
 'ed',
 'Ġfrom',
 'Ġw',
 'ind',
 's',
 'or',
 'Ġt',
 'er',
 'ra',
 'ce',
 'Ġon',
 'Ġthe',
 'Ġs',
 'i',
 'x',
 't',
 'e',
 'en',
 'Ġf',
 'er',
 't',
 'i',
 'le',
 'Ġc',
 'ou',
 'nt',
 'i',
 'es',
 'Ġsp',
 're',
 'ad',
 'Ġbe',
 'ne',
 'at',
 'h',
 ',',
 'Ġs',
 'pe',
 'c',
 'k',
 'led',
 'Ġby',
 'Ġha',
 'pp',
 'y',
 'Ġc',
 'ot',
 't',
 'ag',
 'es',
 'Ġand',
 'Ġwe',
 'al',
 'th',
 'i',
 'er',
 'Ġto',
 'w',
 'n',
 's',
 ',',
 'Ġall',
 'Ġlo',
 'o',
 'k',
 'ed',
 'Ġas',
 'Ġin',
 'Ġfor',
 'm',
 'er',
 'Ġy',
 'e',
 'ar',
 's',
 ',',
 'Ġhe',
 'art',
 'Ġc',
 'he',
 'er',
 'ing',
 'Ġand',
 'Ġf',
 'a',
 'ir',
 '.']


Finding nothing else, not even gold, the Superintendent abandoned his attempts; but a perplexed look occasionally steals over his countenance as he sits thinking at his desk.


['f',
 'ind',
 'ing',
 'Ġnot',
 'h',
 'ing',
 'Ġe',
 'l',
 'se',
 ',',
 'Ġnot',
 'Ġe',
 'ven',
 'Ġg',
 'o',
 'ld',
 ',',
 'Ġthe',
 'Ġsu',
 'per',
 'in',
 't',
 'e',
 'nd',
 'ent',
 'Ġab',
 'and',
 'o',
 'ned',
 'Ġhis',
 'Ġat',
 't',
 'em',
 'pt',
 's',
 ';',
 'Ġbut',
 'Ġa',
 'Ġper',
 'ple',
 'x',
 'ed',
 'Ġlo',
 'o',
 'k',
 'Ġo',
 'c',
 'c',
 'as',
 'ion',
 'all',
 'y',
 'Ġst',
 'e',
 'al',
 's',
 'Ġo',
 'ver',
 'Ġhis',
 'Ġc',
 'ou',
 'nt',
 'en',
 'an',
 'ce',
 'Ġas',
 'Ġhe',
 'Ġs',
 'it',
 's',
 'Ġth',
 'in',
 'k',
 'ing',
 'Ġat',
 'Ġhis',
 'Ġd',
 'es',
 'k',
 '.']


A youth passed in solitude, my best years spent under your gentle and feminine fosterage, has so refined the groundwork of my character that I cannot overcome an intense distaste to the usual brutality exercised on board ship: I have never believed it to be necessary, and when I heard of a mariner equally noted for his kindliness of heart and the respect and obedience paid to him by his crew, I felt myself peculiarly fortunate in being able to secure his services.


['a',
 'Ġyou',
 'th',
 'Ġp',
 'as',
 's',
 'ed',
 'Ġin',
 'Ġso',
 'l',
 'it',
 'u',
 'd',
 'e',
 ',',
 'Ġmy',
 'Ġb',
 'est',
 'Ġy',
 'e',
 'ar',
 's',
 'Ġsp',
 'ent',
 'Ġu',
 'nd',
 'er',
 'Ġyou',
 'r',
 'Ġg',
 'ent',
 'le',
 'Ġand',
 'Ġfe',
 'm',
 'in',
 'in',
 'e',
 'Ġf',
 'o',
 'st',
 'er',
 'a',
 'ge',
 ',',
 'Ġha',
 's',
 'Ġso',
 'Ġre',
 'f',
 'in',
 'ed',
 'Ġthe',
 'Ġg',
 'r',
 'ound',
 'w',
 'or',
 'k',
 'Ġof',
 'Ġmy',
 'Ġc',
 'ha',
 'ra',
 'ct',
 'er',
 'Ġthat',
 'Ġi',
 'Ġc',
 'an',
 'n',
 'ot',
 'Ġo',
 'ver',
 'c',
 'ome',
 'Ġan',
 'Ġint',
 'en',
 'se',
 'Ġdis',
 't',
 'ast',
 'e',
 'Ġto',
 'Ġthe',
 'Ġu',
 's',
 'u',
 'al',
 'Ġb',
 'r',
 'ut',
 'al',
 'ity',
 'Ġex',
 'er',
 'c',
 'is',
 'ed',
 'Ġon',
 'Ġb',
 'o',
 'ard',
 'Ġsh',
 'i',
 'p',
 ':',
 'Ġi',
 'Ġhave',
 'Ġne',
 'ver',
 'Ġbe',
 'l',
 'ie',
 'ved',
 'Ġit',
 'Ġto',
 'Ġbe',
 'Ġne',
 'c',
 'ess',
 'ar',
 'y',
 ',',
 'Ġand',
 'Ġwhen',
 'Ġi',
 'Ġhe',
 'ard',
 'Ġof',
 'Ġa',
 'Ġm',
 'ar',
 'in',
 'er',
 'Ġe',
 'qu',
 'all',



The astronomer, perhaps, at this point, took refuge in the suggestion of non luminosity; and here analogy was suddenly let fall.


['the',
 'Ġa',
 'st',
 'r',
 'on',
 'om',
 'er',
 ',',
 'Ġper',
 'ha',
 'p',
 's',
 ',',
 'Ġat',
 'Ġthis',
 'Ġpo',
 'in',
 't',
 ',',
 'Ġto',
 'o',
 'k',
 'Ġre',
 'f',
 'u',
 'ge',
 'Ġin',
 'Ġthe',
 'Ġsu',
 'g',
 'g',
 'est',
 'ion',
 'Ġof',
 'Ġn',
 'on',
 'Ġl',
 'um',
 'in',
 'o',
 's',
 'ity',
 ';',
 'Ġand',
 'Ġhe',
 're',
 'Ġan',
 'al',
 'o',
 'g',
 'y',
 'Ġwas',
 'Ġsu',
 'd',
 'd',
 'en',
 'ly',
 'Ġle',
 't',
 'Ġf',
 'all',
 '.']


The surcingle hung in ribands from my body.


['the',
 'Ġs',
 'ur',
 'c',
 'ing',
 'le',
 'Ġh',
 'u',
 'n',
 'g',
 'Ġin',
 'Ġr',
 'i',
 'b',
 'and',
 's',
 'Ġfrom',
 'Ġmy',
 'Ġb',
 'od',
 'y',
 '.']


I knew that you could not say to yourself 'stereotomy' without being brought to think of atomies, and thus of the theories of Epicurus; and since, when we discussed this subject not very long ago, I mentioned to you how singularly, yet with how little notice, the vague guesses of that noble Greek had met with confirmation in the late nebular cosmogony, I felt that you could not avoid casting your eyes upward to the great nebula in Orion, and I certainly expected that you would do so.


['i',
 'Ġk',
 'new',
 'Ġthat',
 'Ġyou',
 'Ġcould',
 'Ġnot',
 'Ġs',
 'ay',
 'Ġto',
 'Ġyou',
 'r',
 'se',
 'lf',
 'Ġ',
 "'",
 'st',
 'e',
 're',
 'ot',
 'om',
 'y',
 "'",
 'Ġwith',
 'out',
 'Ġbe',
 'ing',
 'Ġb',
 'r',
 'ou',
 'ght',
 'Ġto',
 'Ġth',
 'in',
 'k',
 'Ġof',
 'Ġat',
 'om',
 'i',
 'es',
 ',',
 'Ġand',
 'Ġth',
 'us',
 'Ġof',
 'Ġthe',
 'Ġthe',
 'or',
 'i',
 'es',
 'Ġof',
 'Ġe',
 'p',
 'ic',
 'ur',
 'us',
 ';',
 'Ġand',
 'Ġs',
 'in',
 'ce',
 ',',
 'Ġwhen',
 'Ġwe',
 'Ġdis',
 'c',
 'u',
 'ss',
 'ed',
 'Ġthis',
 'Ġsu',
 'b',
 'j',
 'ect',
 'Ġnot',
 'Ġ',
 'very',
 'Ġl',
 'ong',
 'Ġa',
 'g',
 'o',
 ',',
 'Ġi',
 'Ġm',
 'ent',
 'i',
 'o',
 'ned',
 'Ġto',
 'Ġyou',
 'Ġh',
 'ow',
 'Ġs',
 'ing',
 'ul',
 'ar',
 'ly',
 ',',
 'Ġy',
 'et',
 'Ġwith',
 'Ġh',
 'ow',
 'Ġl',
 'it',
 't',
 'le',
 'Ġnot',
 'ic',
 'e',
 ',',
 'Ġthe',
 'Ġv',
 'ag',
 'u',
 'e',
 'Ġg',
 'u',
 'ess',
 'es',
 'Ġof',
 'Ġthat',
 'Ġno',
 'ble',
 'Ġg',
 're',
 'e',
 'k',
 'Ġhad',
 'Ġme',
 't',
 'Ġwith',
 'Ġcon',
 'f',
 'ir',
 'm


I confess that neither the structure of languages, nor the code of governments, nor the politics of various states possessed attractions for me.


['i',
 'Ġcon',
 'f',
 'ess',
 'Ġthat',
 'Ġne',
 'it',
 'her',
 'Ġthe',
 'Ġst',
 'r',
 'u',
 'ct',
 'ure',
 'Ġof',
 'Ġl',
 'an',
 'g',
 'u',
 'ag',
 'es',
 ',',
 'Ġn',
 'or',
 'Ġthe',
 'Ġc',
 'od',
 'e',
 'Ġof',
 'Ġg',
 'o',
 'ver',
 'n',
 'm',
 'ent',
 's',
 ',',
 'Ġn',
 'or',
 'Ġthe',
 'Ġp',
 'ol',
 'it',
 'ic',
 's',
 'Ġof',
 'Ġv',
 'ar',
 'i',
 'ous',
 'Ġst',
 'at',
 'es',
 'Ġpo',
 'ss',
 'ess',
 'ed',
 'Ġat',
 't',
 'ra',
 'ct',
 'ion',
 's',
 'Ġfor',
 'Ġme',
 '.']


He shall find that I can feel my injuries; he shall learn to dread my revenge" A few days after he arrived.


['he',
 'Ġs',
 'ha',
 'll',
 'Ġf',
 'ind',
 'Ġthat',
 'Ġi',
 'Ġc',
 'an',
 'Ġfe',
 'e',
 'l',
 'Ġmy',
 'Ġin',
 'j',
 'ur',
 'i',
 'es',
 ';',
 'Ġhe',
 'Ġs',
 'ha',
 'll',
 'Ġle',
 'ar',
 'n',
 'Ġto',
 'Ġd',
 're',
 'ad',
 'Ġmy',
 'Ġre',
 'ven',
 'ge',
 '"',
 'Ġa',
 'Ġfe',
 'w',
 'Ġd',
 'ay',
 's',
 'Ġa',
 'f',
 'ter',
 'Ġhe',
 'Ġar',
 'ri',
 'ved',
 '.']


Here we barricaded ourselves, and, for the present were secure.


['he',
 're',
 'Ġwe',
 'Ġb',
 'ar',
 'r',
 'ic',
 'ad',
 'ed',
 'Ġo',
 'ur',
 'se',
 'l',
 'v',
 'es',
 ',',
 'Ġand',
 ',',
 'Ġfor',
 'Ġthe',
 'Ġpre',
 's',
 'ent',
 'Ġwere',
 'Ġse',
 'c',
 'ure',
 '.']


Herbert West needed fresh bodies because his life work was the reanimation of the dead.


['her',
 'b',
 'er',
 't',
 'Ġw',
 'est',
 'Ġne',
 'ed',
 'ed',
 'Ġf',
 're',
 's',
 'h',
 'Ġb',
 'od',
 'i',
 'es',
 'Ġbe',
 'c',
 'a',
 'u',
 'se',
 'Ġhis',
 'Ġl',
 'if',
 'e',
 'Ġwor',
 'k',
 'Ġwas',
 'Ġthe',
 'Ġre',
 'an',
 'im',
 'ation',
 'Ġof',
 'Ġthe',
 'Ġde',
 'ad',
 '.']


The farm like grounds extended back very deeply up the hill, almost to Wheaton Street.


['the',
 'Ġf',
 'ar',
 'm',
 'Ġl',
 'i',
 'ke',
 'Ġg',
 'r',
 'ound',
 's',
 'Ġex',
 't',
 'e',
 'nd',
 'ed',
 'Ġb',
 'ac',
 'k',
 'Ġ',
 'very',
 'Ġde',
 'e',
 'p',
 'ly',
 'Ġup',
 'Ġthe',
 'Ġh',
 'ill',
 ',',
 'Ġal',
 'm',
 'o',
 'st',
 'Ġto',
 'Ġwhe',
 'at',
 'on',
 'Ġst',
 're',
 'et',
 '.']


But a glance will show the fallacy of this idea.


['b',
 'ut',
 'Ġa',
 'Ġg',
 'l',
 'an',
 'ce',
 'Ġw',
 'ill',
 'Ġsh',
 'ow',
 'Ġthe',
 'Ġf',
 'all',
 'ac',
 'y',
 'Ġof',
 'Ġthis',
 'Ġi',
 'd',
 'e',
 'a',
 '.']


He had escaped me, and I must commence a destructive and almost endless journey across the mountainous ices of the ocean, amidst cold that few of the inhabitants could long endure and which I, the native of a genial and sunny climate, could not hope to survive.


['he',
 'Ġhad',
 'Ġ',
 'es',
 'c',
 'a',
 'p',
 'ed',
 'Ġme',
 ',',
 'Ġand',
 'Ġi',
 'Ġm',
 'ust',
 'Ġcom',
 'm',
 'en',
 'ce',
 'Ġa',
 'Ġd',
 'est',
 'r',
 'u',
 'ct',
 'ive',
 'Ġand',
 'Ġal',
 'm',
 'o',
 'st',
 'Ġe',
 'nd',
 'le',
 'ss',
 'Ġj',
 'our',
 'ne',
 'y',
 'Ġa',
 'c',
 'ro',
 'ss',
 'Ġthe',
 'Ġm',
 'ou',
 'nt',
 'ain',
 'ous',
 'Ġi',
 'c',
 'es',
 'Ġof',
 'Ġthe',
 'Ġo',
 'ce',
 'an',
 ',',
 'Ġa',
 'm',
 'id',
 'st',
 'Ġc',
 'o',
 'ld',
 'Ġthat',
 'Ġfe',
 'w',
 'Ġof',
 'Ġthe',
 'Ġin',
 'ha',
 'b',
 'it',
 'ant',
 's',
 'Ġcould',
 'Ġl',
 'ong',
 'Ġe',
 'nd',
 'ure',
 'Ġand',
 'Ġwhich',
 'Ġi',
 ',',
 'Ġthe',
 'Ġn',
 'at',
 'ive',
 'Ġof',
 'Ġa',
 'Ġg',
 'en',
 'i',
 'al',
 'Ġand',
 'Ġsu',
 'n',
 'n',
 'y',
 'Ġc',
 'l',
 'im',
 'ate',
 ',',
 'Ġcould',
 'Ġnot',
 'Ġh',
 'o',
 'pe',
 'Ġto',
 'Ġs',
 'ur',
 'v',
 'ive',
 '.']


To these speeches they gave, of course, their own interpretation; fancying, no doubt, that at all events I should come into possession of vast quantities of ready money; and provided I paid them all I owed, and a trifle more, in consideration of their services, I dare say they cared very little what became of either my soul or my carcass.


['t',
 'o',
 'Ġthe',
 'se',
 'Ġs',
 'pe',
 'e',
 'c',
 'he',
 's',
 'Ġthe',
 'y',
 'Ġg',
 'a',
 've',
 ',',
 'Ġof',
 'Ġc',
 'our',
 'se',
 ',',
 'Ġthe',
 'ir',
 'Ġo',
 'w',
 'n',
 'Ġin',
 'ter',
 'p',
 're',
 't',
 'ation',
 ';',
 'Ġf',
 'an',
 'c',
 'y',
 'ing',
 ',',
 'Ġno',
 'Ġd',
 'ou',
 'b',
 't',
 ',',
 'Ġthat',
 'Ġat',
 'Ġall',
 'Ġe',
 'v',
 'ent',
 's',
 'Ġi',
 'Ġsh',
 'ould',
 'Ġc',
 'ome',
 'Ġint',
 'o',
 'Ġpo',
 'ss',
 'ess',
 'ion',
 'Ġof',
 'Ġv',
 'ast',
 'Ġ',
 'qu',
 'ant',
 'it',
 'i',
 'es',
 'Ġof',
 'Ġre',
 'ad',
 'y',
 'Ġmo',
 'ne',
 'y',
 ';',
 'Ġand',
 'Ġpro',
 'v',
 'id',
 'ed',
 'Ġi',
 'Ġp',
 'a',
 'id',
 'Ġthe',
 'm',
 'Ġall',
 'Ġi',
 'Ġo',
 'w',
 'ed',
 ',',
 'Ġand',
 'Ġa',
 'Ġt',
 'ri',
 'f',
 'le',
 'Ġm',
 'ore',
 ',',
 'Ġin',
 'Ġcon',
 's',
 'id',
 'er',
 'ation',
 'Ġof',
 'Ġthe',
 'ir',
 'Ġs',
 'er',
 'v',
 'ic',
 'es',
 ',',
 'Ġi',
 'Ġd',
 'a',
 're',
 'Ġs',
 'ay',
 'Ġthe',
 'y',
 'Ġc',
 'a',
 'red',
 'Ġ',
 'very',
 'Ġl',
 'it',
 't',
 'le',
 'Ġw',
 'hat',



Her native sprightliness needed no undue excitement, and her placid heart reposed contented on my love, the well being of her children, and the beauty of surrounding nature.


['her',
 'Ġn',
 'at',
 'ive',
 'Ġsp',
 'r',
 'ight',
 'l',
 'in',
 'ess',
 'Ġne',
 'ed',
 'ed',
 'Ġno',
 'Ġu',
 'nd',
 'u',
 'e',
 'Ġex',
 'c',
 'it',
 'em',
 'ent',
 ',',
 'Ġand',
 'Ġher',
 'Ġpl',
 'ac',
 'id',
 'Ġhe',
 'art',
 'Ġre',
 'p',
 'o',
 's',
 'ed',
 'Ġcon',
 't',
 'ent',
 'ed',
 'Ġon',
 'Ġmy',
 'Ġlo',
 've',
 ',',
 'Ġthe',
 'Ġwe',
 'll',
 'Ġbe',
 'ing',
 'Ġof',
 'Ġher',
 'Ġc',
 'h',
 'i',
 'ld',
 're',
 'n',
 ',',
 'Ġand',
 'Ġthe',
 'Ġbe',
 'a',
 'ut',
 'y',
 'Ġof',
 'Ġs',
 'ur',
 'r',
 'ound',
 'ing',
 'Ġn',
 'at',
 'ure',
 '.']


I even went so far as to speak of a slightly hectic cough with which, at one time, I had been troubled of a chronic rheumatism of a twinge of hereditary gout and, in conclusion, of the disagreeable and inconvenient, but hitherto carefully concealed, weakness of my eyes.


['i',
 'Ġe',
 'ven',
 'Ġw',
 'ent',
 'Ġso',
 'Ġf',
 'ar',
 'Ġas',
 'Ġto',
 'Ġs',
 'pe',
 'a',
 'k',
 'Ġof',
 'Ġa',
 'Ġs',
 'l',
 'ight',
 'ly',
 'Ġhe',
 'ct',
 'ic',
 'Ġc',
 'ough',
 'Ġwith',
 'Ġwhich',
 ',',
 'Ġat',
 'Ġone',
 'Ġt',
 'im',
 'e',
 ',',
 'Ġi',
 'Ġhad',
 'Ġbeen',
 'Ġt',
 'r',
 'ou',
 'ble',
 'd',
 'Ġof',
 'Ġa',
 'Ġc',
 'h',
 'r',
 'on',
 'ic',
 'Ġr',
 'he',
 'um',
 'at',
 'is',
 'm',
 'Ġof',
 'Ġa',
 'Ġt',
 'w',
 'ing',
 'e',
 'Ġof',
 'Ġhe',
 'red',
 'it',
 'ar',
 'y',
 'Ġg',
 'out',
 'Ġand',
 ',',
 'Ġin',
 'Ġcon',
 'c',
 'l',
 'us',
 'ion',
 ',',
 'Ġof',
 'Ġthe',
 'Ġdis',
 'ag',
 're',
 'e',
 'able',
 'Ġand',
 'Ġin',
 'c',
 'on',
 'ven',
 'i',
 'ent',
 ',',
 'Ġbut',
 'Ġh',
 'it',
 'her',
 't',
 'o',
 'Ġc',
 'a',
 're',
 'f',
 'u',
 'll',
 'y',
 'Ġcon',
 'ce',
 'a',
 'led',
 ',',
 'Ġwe',
 'a',
 'k',
 'ne',
 'ss',
 'Ġof',
 'Ġmy',
 'Ġe',
 'y',
 'es',
 '.']


His facial aspect, too, was remarkable for its maturity; for though he shared his mother's and grandfather's chinlessness, his firm and precociously shaped nose united with the expression of his large, dark, almost Latin eyes to give him an air of quasi adulthood and well nigh preternatural intelligence.


['h',
 'is',
 'Ġf',
 'ac',
 'i',
 'al',
 'Ġas',
 'pe',
 'ct',
 ',',
 'Ġto',
 'o',
 ',',
 'Ġwas',
 'Ġre',
 'm',
 'ar',
 'k',
 'able',
 'Ġfor',
 'Ġit',
 's',
 'Ġm',
 'at',
 'ur',
 'ity',
 ';',
 'Ġfor',
 'Ġth',
 'ough',
 'Ġhe',
 'Ġs',
 'ha',
 'red',
 'Ġhis',
 'Ġm',
 'ot',
 'her',
 "'s",
 'Ġand',
 'Ġg',
 'ra',
 'nd',
 'f',
 'at',
 'her',
 "'s",
 'Ġc',
 'h',
 'in',
 'le',
 'ss',
 'ne',
 'ss',
 ',',
 'Ġhis',
 'Ġf',
 'ir',
 'm',
 'Ġand',
 'Ġpre',
 'c',
 'o',
 'c',
 'i',
 'ous',
 'ly',
 'Ġs',
 'ha',
 'p',
 'ed',
 'Ġno',
 'se',
 'Ġun',
 'it',
 'ed',
 'Ġwith',
 'Ġthe',
 'Ġex',
 'p',
 're',
 'ss',
 'ion',
 'Ġof',
 'Ġhis',
 'Ġl',
 'ar',
 'ge',
 ',',
 'Ġd',
 'ar',
 'k',
 ',',
 'Ġal',
 'm',
 'o',
 'st',
 'Ġl',
 'at',
 'in',
 'Ġe',
 'y',
 'es',
 'Ġto',
 'Ġg',
 'ive',
 'Ġhim',
 'Ġan',
 'Ġa',
 'ir',
 'Ġof',
 'Ġ',
 'qu',
 'as',
 'i',
 'Ġa',
 'd',
 'ul',
 'th',
 'o',
 'od',
 'Ġand',
 'Ġwe',
 'll',
 'Ġn',
 'i',
 'gh',
 'Ġpre',
 'ter',
 'n',
 'at',
 'ur',
 'al',
 'Ġint',
 'e',
 'll',
 'ig',
 'en',
 'ce',
 




In [43]:
a = [tokenizer.encode(f'{separator_samples}{t}').tokens for t in spooky_train.text]

In [44]:
a[0]

['############<',
 'new',
 '_',
 'sample',
 '>############',
 'this',
 'Ġpro',
 'cess',
 ',',
 'Ġhowever',
 ',',
 'Ġafforded',
 'Ġme',
 'Ġno',
 'Ġmeans',
 'Ġof',
 'Ġasc',
 'ertain',
 'ing',
 'Ġthe',
 'Ġdimens',
 'ions',
 'Ġof',
 'Ġmy',
 'Ġdun',
 'ge',
 'on',
 ';',
 'Ġas',
 'Ġi',
 'Ġmight',
 'Ġmake',
 'Ġits',
 'Ġcirc',
 'uit',
 ',',
 'Ġand',
 'Ġreturn',
 'Ġto',
 'Ġthe',
 'Ġpoint',
 'Ġwhen',
 'ce',
 'Ġi',
 'Ġset',
 'Ġout',
 ',',
 'Ġwithout',
 'Ġbeing',
 'Ġaware',
 'Ġof',
 'Ġthe',
 'Ġfact',
 ';',
 'Ġso',
 'Ġperfect',
 'ly',
 'Ġun',
 'if',
 'orm',
 'Ġseemed',
 'Ġthe',
 'Ġwall',
 '.']

## Add a Regular Tokenizer to compare

In [11]:
# Word-level Keras tokenizer as a baseline.
# Note: this rebinds `tokenizer`; re-run the "Generate Tokenizer" cell to switch back to the custom one.
tokenizer = KerasTokenizer(num_words=15000,
                           lower=True,
                           char_level=False)
tokenizer.fit_on_texts(spooky_train.text)

# Clip ids to at most num_words (texts_to_sequences already keeps only ids below num_words).
spooky_train['keras_token_ids'] = tokenizer.texts_to_sequences(spooky_train.text)
spooky_train.keras_token_ids = [[min(s, tokenizer.num_words) for s in seq] for seq in spooky_train.keras_token_ids]

spooky_test['keras_token_ids'] = tokenizer.texts_to_sequences(spooky_test.text)
spooky_test.keras_token_ids = [[min(s, tokenizer.num_words) for s in seq] for seq in spooky_test.keras_token_ids]

# Estimate a reasonable sequence's max_length

In [46]:
lengths = pd.Series([len(t) for t in spooky_train.token_ids])

lengths.quantile(np.linspace(0, 1, num=10))

0.000000       5.0
0.111111      14.0
0.222222      20.0
0.333333      25.0
0.444444      31.0
0.555556      37.0
0.666667      43.0
0.777778      53.0
0.888889      68.0
1.000000    1081.0
dtype: float64
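
According to these quantiles, a cutoff of 50 tokens lies between the 0.667 quantile (43) and the 0.778 quantile (53), while the longest sample has 1081 tokens. A minimal sketch of how the coverage of a candidate cutoff could be checked directly; `coverage` and the toy list are illustrative and not part of the notebook.

```python
def coverage(lengths, max_length):
    """Return the fraction of sequences whose length is <= max_length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    return sum(1 for n in lengths if n <= max_length) / len(lengths)


# Toy data (hypothetical), standing in for the real per-sample token counts.
toy_lengths = [5, 14, 20, 25, 31, 37, 43, 53, 68, 1081]
for candidate in (25, 50, 100):
    print(candidate, coverage(toy_lengths, candidate))
```

In the notebook, the same check could be run on `lengths.tolist()` to see how many samples a given `max_length` would truncate.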

## Enforce a sequence's max_length

In [48]:
# Custom `Tokenizer`: right-pad with 0 and truncate to max_length
max_length = 50

spooky_train.token_ids = spooky_train.token_ids.apply(lambda a: (a + [0]*(max_length - len(a)))[:max_length])
spooky_test.token_ids = spooky_test.token_ids.apply(lambda a: (a + [0]*(max_length - len(a)))[:max_length])
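
The lambda above pads on the right with zeros and then cuts the list to `max_length`. The same logic as a named helper, as a sketch; `pad_or_truncate` and its `pad_id` parameter are hypothetical names. Whether id 0 is reserved for padding in the custom tokenizer's vocabulary is not shown here; if it maps to a real token, a dedicated padding id might be safer.

```python
def pad_or_truncate(ids, max_length, pad_id=0):
    """Right-pad with pad_id or cut off so the result has exactly max_length items."""
    ids = list(ids)[:max_length]
    return ids + [pad_id] * (max_length - len(ids))


print(pad_or_truncate([7, 8, 9], 5))           # [7, 8, 9, 0, 0]
print(pad_or_truncate([1, 2, 3, 4, 5, 6], 5))  # [1, 2, 3, 4, 5]
```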

In [None]:
# Keras tokenizer: right-pad with 0 and truncate to max_length
max_length = 50

lengths = pd.Series([len(tokens) for tokens in spooky_train.keras_token_ids])
spooky_train.keras_token_ids = spooky_train.keras_token_ids.apply(lambda a: (a + [0]*(max_length - len(a)))[:max_length])

# Pad/truncate the test sequences the same way.
spooky_test.keras_token_ids = spooky_test.keras_token_ids.apply(lambda a: (a + [0]*(max_length - len(a)))[:max_length])

## Create train-/test sets

In [49]:
token_column = 'token_ids'

assert token_column in ['token_ids', 'keras_token_ids']

In [50]:
x_train = np.array(spooky_train[token_column].tolist())
y_train = to_categorical(np.array(spooky_train.author_id.tolist()))

x_test = np.array(spooky_test[token_column].tolist())
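
`to_categorical` turns the integer author ids into one-hot rows, which is the target format that `categorical_crossentropy` expects. A plain-Python sketch of the same idea; `one_hot` is an illustrative stand-in, not the Keras implementation.

```python
def one_hot(class_ids, num_classes=None):
    """Return one row per class id, with 1.0 at the id's position and 0.0 elsewhere."""
    if num_classes is None:
        # Infer the number of classes from the largest id.
        num_classes = max(class_ids) + 1
    return [[1.0 if j == c else 0.0 for j in range(num_classes)] for c in class_ids]


# 0 = EAP, 1 = HPL, 2 = MWS (the author_to_id mapping used in this notebook)
print(one_hot([0, 2, 1]))
```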

In [51]:
permutation = np.random.permutation(len(x_train))
x_train = x_train[permutation]
y_train = y_train[permutation]
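
Shuffling matters here because Keras' `validation_split` takes the validation samples from the end of the arrays, before any per-epoch shuffling. A small sketch of both steps in plain Python; `shuffle_together` and `split_tail` are hypothetical helpers, and the toy example uses a split of 0.25 instead of the notebook's 0.3.

```python
import random


def shuffle_together(xs, ys, seed=None):
    """Shuffle two equally long lists with the same permutation."""
    rng = random.Random(seed)
    order = list(range(len(xs)))
    rng.shuffle(order)
    return [xs[i] for i in order], [ys[i] for i in order]


def split_tail(xs, ys, validation_split):
    """Hold out the last fraction of the data, as validation_split does."""
    split_at = int(len(xs) * (1 - validation_split))
    return (xs[:split_at], ys[:split_at]), (xs[split_at:], ys[split_at:])


# Toy data (hypothetical): sample i carries label i * 10.
xs, ys = shuffle_together(list(range(20)), [i * 10 for i in range(20)], seed=0)
(train_x, train_y), (val_x, val_y) = split_tail(xs, ys, validation_split=0.25)
print(len(train_x), len(val_x))
```

With 19579 training rows and `validation_split=0.3`, this rule yields 13705 training and 5874 validation samples.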

## Create Model

In [63]:
num_words = tokenizer.get_vocab_size() if not isinstance(tokenizer, KerasTokenizer) else tokenizer.num_words

# create model
model = SequentialModel()
model.add(Embedding(input_dim=num_words, output_dim=200))
model.add(LSTM(units=128, dropout=0.2, recurrent_dropout=0.15))
model.add(Dense(100, activation='relu'))  # without an activation this layer would be purely linear
model.add(Dense(3, activation='softmax'))

model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
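
As a sanity check on model size, the parameter counts of these layers can be worked out by hand. The sketch below assumes the vocabulary really has 4000 entries (the `vocab_size` requested earlier; the value returned by `get_vocab_size()` is not shown) and uses the standard formulas for `Embedding`, `LSTM` and `Dense` layers with biases. The helper names are illustrative.

```python
def embedding_params(input_dim, output_dim):
    """One output_dim-sized vector per vocabulary entry."""
    return input_dim * output_dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


num_words = 4000  # assumption, see above
layer_params = {
    "embedding": embedding_params(num_words, 200),
    "lstm": lstm_params(200, 128),
    "dense_100": dense_params(128, 100),
    "dense_3": dense_params(100, 3),
}
for name, count in layer_params.items():
    print(name, count)
print("total", sum(layer_params.values()))
```

Under this assumption the embedding accounts for most of the parameters, so the vocabulary size is the main lever on model size. `model.summary()` reports the actual numbers.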

## Fit Model

In [60]:
# fit model
model.fit(x=x_train, 
          y=y_train,
          validation_split=0.3,
          batch_size=32,
          epochs=10)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 13705 samples, validate on 5874 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x23330860cc0>

## Evaluate Model

In [None]:
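# `y_test` is never defined (no author labels are loaded for `spooky_test`), so evaluate on the
# validation slice instead: validation_split=0.3 holds out the last 30% of the shuffled training arrays.
split_at = int(len(x_train) * (1 - 0.3))
model.evaluate(x_train[split_at:], y_train[split_at:], batch_size=32)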