## PREP: DOWNLOAD SPACY FOR ENGLISH

We need to install spaCy, a library that is fantastic at quickly parsing text. Then we'll download its English language package so that spaCy knows which language it is working with. This is very cool, but it's going to take a while...

In [1]:
# !conda install -c conda-forge spacy

In [2]:
# IF THIS CELL RUNS FINE, THE INSTALL WORKED!
# import spacy
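
If you would rather check for the install without risking an `ImportError`, the standard library can look a package up without importing it. This is only a sketch; `is_installed` is a helper name made up for this example.

```python
import importlib.util

def is_installed(package_name):
    """Return True if the package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None

# Prints True once the conda install above has finished successfully.
print("spacy installed:", is_installed("spacy"))
```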

# NOW WE NEED TO DOWNLOAD THE ENGLISH LANGUAGE MODEL

In [3]:
# !python -m spacy download en

# BUILDING OUR EXQUISITE CORPSE

In [4]:
import pandas
pandas.set_option('display.max_colwidth', 300)

train_lines = pandas.read_csv('data/sentences.csv', encoding='utf-8')
train_lines[100:105]


Unnamed: 0,line
100,"Curiosity, earnest research to learn the hidden laws of nature, gladness akin to rapture, as they were unfolded to me, are among the earliest sensations I can remember."
101,How pleased you would be to remark the improvement of our Ernest.
102,"The possession of these treasures gave me extreme delight; I now continually studied and exercised my mind upon these histories, whilst my friends were employed in their ordinary occupations."
103,"I read of men concerned in public affairs, governing or massacring their species."
104,Will you hear it? Most willingly.


In [5]:
import Helpers
helper = Helpers.Exquisite_Corpse

In [6]:
import spacy
encoder = spacy.load('en')

train_lines['tokens'] = helper.text_to_tokens(train_lines['line'], encoder)
train_lines[['line','tokens']][100:105]

Unnamed: 0,line,tokens
100,"Curiosity, earnest research to learn the hidden laws of nature, gladness akin to rapture, as they were unfolded to me, are among the earliest sensations I can remember.","[ , curiosity, ,, earnest, research, to, learn, the, hidden, laws, of, nature, ,, gladness, akin, to, rapture, ,, as, they, were, unfolded, to, me, ,, are, among, the, earliest, sensations, i, can, remember, .]"
101,How pleased you would be to remark the improvement of our Ernest.,"[how, pleased, you, would, be, to, remark, the, improvement, of, our, ernest, .]"
102,"The possession of these treasures gave me extreme delight; I now continually studied and exercised my mind upon these histories, whilst my friends were employed in their ordinary occupations.","[ , the, possession, of, these, treasures, gave, me, extreme, delight, ;, i, now, continually, studied, and, exercised, my, mind, upon, these, histories, ,, whilst, my, friends, were, employed, in, their, ordinary, occupations, .]"
103,"I read of men concerned in public affairs, governing or massacring their species.","[i, read, of, men, concerned, in, public, affairs, ,, governing, or, massacring, their, species, .]"
104,Will you hear it? Most willingly.,"[will, you, hear, it, ?, , most, willingly, .]"
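
The code for `helper.text_to_tokens` is not shown here, so the following is only a rough stand-in to illustrate the idea: lowercase the text and split it into word and punctuation tokens. It uses a regular expression rather than spaCy, so it will not match spaCy's tokenization in every case (it never emits the whitespace tokens visible above, for instance). The name `simple_tokens` is made up for this sketch.

```python
import re

def simple_tokens(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    # \w+ grabs runs of word characters; [^\w\s] grabs one punctuation mark at a time.
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(simple_tokens("Will you hear it? Most willingly."))
```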


In [7]:
import pickle
import os

lexicon = helper.make_lexicon(token_seqs=train_lines['tokens'], min_freq=2)

filename = 'data/sentences_lexicon.pkl'

# Opening in 'wb' mode creates the file if it does not exist yet.
with open(filename, 'wb') as f:
    pickle.dump(lexicon, f)

lexicon sample (4221 total items):
[('actually', 206), ('thrush', 1270), ('supple', 2318), ('longer', 2319), ('private', 2320)]
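
`helper.make_lexicon` lives in the `Helpers` module and is not shown, so here is a minimal sketch of what a frequency-filtered lexicon could look like. It assumes that id 0 is left free for padding and id 1 is reserved for unknown words, which is consistent with the outputs in this notebook but is still a guess about the real helper. `build_lexicon` and the `<UNK>` key are names invented for the example.

```python
from collections import Counter

def build_lexicon(token_seqs, min_freq=1):
    """Map each token seen at least min_freq times to an integer id.

    Assumption: id 0 is kept for padding and id 1 for unknown words,
    so regular tokens start at id 2.
    """
    counts = Counter(token for seq in token_seqs for token in seq)
    lexicon = {"<UNK>": 1}
    for token, count in sorted(counts.items()):
        if count >= min_freq:
            lexicon[token] = len(lexicon) + 1
    return lexicon

toy_seqs = [["the", "rain", "continued", "."], ["the", "rain", "stopped", "."]]
toy_lexicon = build_lexicon(toy_seqs, min_freq=2)
print(toy_lexicon)
```

With `min_freq=2`, the words "continued" and "stopped" appear only once and are left out, so they would later be encoded as the unknown id.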


In [8]:

filename = 'data/sentence_lexicon.pkl'

# The with block closes the file on exit, so no explicit f.close() is needed.
with open(filename, 'wb') as f:
    pickle.dump(lexicon, f)

In [9]:
lexicon["fellows"]

1640
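
In a later session, the pickled lexicon can be read back and queried in the same way. The sketch below round-trips a toy dictionary through a temporary file so that it does not touch anything under `data/`; the toy entries are for illustration only.

```python
import os
import pickle
import tempfile

toy_lexicon = {"fellows": 1640, ".": 262}

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_lexicon.pkl")
    with open(path, "wb") as f:
        pickle.dump(toy_lexicon, f)
    # 'rb' is needed for reading, mirroring 'wb' for writing.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored["fellows"])
```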

In [10]:
print(type(lexicon))
print(len(lexicon))

<class 'dict'>
4221


In [11]:
lexicon_lookup = helper.get_lexicon_lookup(lexicon)
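
`helper.get_lexicon_lookup` is not shown either. A plausible reading, given its name, is that it inverts the lexicon so that ids can be turned back into words. The sketch below does that with a dict comprehension; `invert_lexicon` and `<UNK>` are hypothetical names.

```python
def invert_lexicon(lexicon):
    """Return a dict that maps each integer id back to its token."""
    return {idx: token for token, idx in lexicon.items()}

toy_lexicon = {"<UNK>": 1, ".": 262, "fellows": 1640}
toy_lookup = invert_lexicon(toy_lexicon)
print(toy_lookup[1640])
```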

In [12]:
train_lines['line_ids'] = helper.tokens_to_ids(all_tokens=train_lines['tokens'], lexicon=lexicon)
train_lines[['tokens','line_ids']][500:510]

Unnamed: 0,tokens,line_ids
500,"[everything, nourishes, what, is, strong, already, .]","[185, 1, 3881, 4058, 748, 351, 262]"
501,"[i, covered, it, carefully, with, dry, wood, and, leaves, and, placed, wet, branches, upon, it, ;, and, then, ,, spreading, my, cloak, ,, i, lay, on, the, ground, and, sank, into, sleep, .]","[868, 1702, 1204, 859, 1189, 1070, 561, 4070, 2577, 4070, 1389, 3865, 3092, 3978, 1204, 691, 4070, 2681, 1330, 1, 2921, 150, 1330, 868, 1114, 3510, 1896, 2962, 4070, 1277, 2839, 1759, 262]"
502,"[ , he, made, no, answer, ,, and, they, were, again, silent, till, they, had, gone, down, the, dance, ,, when, he, asked, her, if, she, and, her, sisters, did, not, very, often, walk, to, meryton, .]","[2275, 2294, 2500, 3128, 1777, 1330, 4070, 3863, 3124, 2137, 3391, 2628, 3863, 2284, 1018, 1900, 1896, 3118, 1330, 3053, 2294, 501, 1463, 1168, 2680, 4070, 1463, 2196, 2810, 281, 2034, 1602, 3168, 3700, 3307, 262]"
503,"[the, rain, continued, the, whole, evening, without, intermission, ;, jane, certainly, could, not, come, back, .]","[1896, 1453, 3572, 1896, 2508, 2217, 3579, 1, 691, 741, 2207, 2052, 281, 2748, 904, 262]"
504,"[in, a, doleful, voice, , bennet, began, the, projected, conversation, :, oh, .]","[1929, 873, 1, 3613, 2275, 267, 3574, 1896, 1, 1316, 2607, 319, 262]"
505,"[i, shall, tell, colonel, forster, it, will, be, quite, a, shame, if, he, does, not, .]","[868, 3519, 3915, 3729, 524, 1204, 1433, 1575, 372, 873, 2768, 1168, 2294, 3798, 281, 262]"
506,"[and, you, may, be, certain, when, i, have, the, honour, of, seeing, her, again, ,, i, shall, speak, in, the, very, highest, terms, of, your, modesty, ,, economy, ,, and, other, amiable, qualification, .]","[4070, 1633, 1926, 1575, 1339, 3053, 868, 3921, 1896, 4007, 1954, 318, 1463, 2137, 1330, 868, 3519, 1635, 1929, 1896, 2034, 184, 2707, 1954, 3850, 3048, 1330, 1, 1330, 4070, 293, 1308, 1, 262]"
507,"[their, eyes, were, immediately, wandering, up, in, the, street, in, quest, of, the, officers, ,, and, nothing, less, than, a, very, smart, bonnet, indeed, ,, or, a, really, new, muslin, in, a, shop, window, ,, could, recall, them, .]","[2098, 3512, 3124, 2048, 4066, 43, 1929, 1896, 3516, 1929, 2130, 1954, 1896, 4048, 1330, 4070, 1017, 785, 2389, 873, 2034, 1501, 1, 609, 1330, 4135, 873, 3583, 2993, 1, 1929, 873, 800, 2687, 1330, 2052, 509, 1208, 262]"
508,"[ , the, astonishment, which, i, had, at, first, experienced, on, this, discovery, soon, gave, place, to, delight, and, rapture, .]","[2275, 1896, 223, 1892, 868, 2284, 2926, 2392, 2859, 3510, 1948, 3576, 3781, 1634, 2955, 3700, 2967, 4070, 3421, 262]"
509,"[yet, she, appeared, confident, in, innocence, and, did, not, tremble, ,, although, gazed, on, and, execrated, by, thousands, ,, for, all, the, kindness, which, her, beauty, might, otherwise, have, excited, was, obliterated, in, the, minds, of, the, spectators, by, the, imagination, of, the, eno...","[1587, 2680, 1185, 40, 1929, 3675, 4070, 2810, 281, 1866, 1330, 2703, 337, 3510, 4070, 1, 1003, 540, 1330, 2988, 2506, 1896, 672, 1892, 1463, 16, 465, 4095, 3921, 2911, 991, 3849, 1929, 1896, 1652, 1954, 1896, 1, 1003, 1896, 2910, 1954, 1896, 1834, 2680, 991, 4122, 3700, 3921, 2544, 262]"
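
Notice the many 1s in `line_ids`: words that did not make it into the lexicon (such as "nourishes" in row 500) all share that id. A minimal sketch of this encoding step, assuming the real `helper.tokens_to_ids` falls back to id 1 for unknown words, might look like the following. `encode_tokens` is a made-up name, and the toy lexicon copies ids from row 500 above.

```python
def encode_tokens(tokens, lexicon, unknown_id=1):
    """Replace each token with its id, using unknown_id for out-of-lexicon words."""
    return [lexicon.get(token, unknown_id) for token in tokens]

# Ids copied from row 500 of the output above; "nourishes" is deliberately missing.
toy_lexicon = {"everything": 185, "what": 3881, "is": 4058,
               "strong": 748, "already": 351, ".": 262}
tokens = ["everything", "nourishes", "what", "is", "strong", "already", "."]
print(encode_tokens(tokens, toy_lexicon))
```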


In [13]:
from keras.preprocessing.sequence import pad_sequences

max_length = max(len(ids) for ids in train_lines['line_ids'])

train_padded_ids = pad_sequences(train_lines['line_ids'], maxlen=max_length)
print(train_padded_ids)

print("SHAPE:", train_padded_ids.shape)

Using TensorFlow backend.


[[   0    0    0 ..., 1463 3681  262]
 [   0    0    0 ..., 1896 2754  262]
 [   0    0    0 ...,  745  636  262]
 ..., 
 [   0    0    0 ..., 3052 1463  262]
 [   0    0    0 ...,    1  344  262]
 [   0    0    0 ..., 1896 2007  262]]
SHAPE: (4000, 142)
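
Every row starts with zeros because `pad_sequences` pads at the front of each sequence by default. To make that concrete, here is a plain-Python sketch of the same left-padding idea. `pad_left` is a name invented for this example and returns lists rather than a NumPy array.

```python
def pad_left(sequences, pad_value=0):
    """Pad every sequence at the front so all have the length of the longest one."""
    max_length = max(len(seq) for seq in sequences)
    return [[pad_value] * (max_length - len(seq)) + list(seq) for seq in sequences]

for row in pad_left([[185, 1, 262], [868, 3519, 3915, 3729, 262]]):
    print(row)
```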


In [14]:
pandas.DataFrame( list(zip(["-"] + train_lines['tokens'].loc[0], 
                      train_lines['tokens'].loc[0])),
                 columns=['input word', 'output word'])

Unnamed: 0,input word,output word
0,-,but
1,but,still
2,still,he
3,he,would
4,would,be
5,be,her
6,her,husband
7,husband,.


In [15]:
print(pandas.DataFrame(list(zip(train_padded_ids[0,:-1], train_padded_ids[0, 1:])), columns=['input words','output words']))

     input words  output words
0              0             0
1              0             0
2              0             0
3              0             0
4              0             0
5              0             0
6              0             0
7              0             0
8              0             0
9              0             0
10             0             0
11             0             0
12             0             0
13             0             0
14             0             0
15             0             0
16             0             0
17             0             0
18             0             0
19             0             0
20             0             0
21             0             0
22             0             0
23             0             0
24             0             0
25             0             0
26             0             0
27             0             0
28             0             0
29             0             0
..           ...           ...
111     
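
The pairing above is simply the sequence lined up against itself shifted by one position: at each step the model sees one word id and is asked for the next. A small sketch with a short, made-up padded row shows the same thing without the long run of zeros; `shift_pairs` is a hypothetical helper.

```python
def shift_pairs(ids):
    """Pair each id with the id that follows it."""
    return list(zip(ids[:-1], ids[1:]))

toy_padded = [0, 0, 185, 1, 262]
for input_id, output_id in shift_pairs(toy_padded):
    print(input_id, "->", output_id)
```

Pairs whose input is the padding id 0 carry no information, which is what the `mask_zero=True` setting in the embedding layer of the model takes care of.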

In [16]:
from keras.models import Model
from keras.layers import Input, Dense, TimeDistributed, Embedding, GRU

def create_model(seq_input_len, n_input_nodes, n_embedding_nodes, n_hidden_nodes, stateful=False, batch_size=None):

    # Each input is a sequence of word ids of length seq_input_len
    input_layer = Input(batch_shape=(batch_size, seq_input_len), name='input_layer')

    # Map each word id to a dense vector; mask_zero=True tells later layers to skip the 0 padding
    embedding_layer = Embedding(input_dim=n_input_nodes,
                               output_dim=n_embedding_nodes,
                               mask_zero=True, name='embedding_layer')(input_layer)

    # Two stacked GRU layers; return_sequences=True keeps an output for every time step
    gru_layer1 = GRU(n_hidden_nodes,
                    return_sequences=True,
                    stateful=stateful,
                    name='hidden_layer1')(embedding_layer)

    gru_layer2 = GRU(n_hidden_nodes,
                    return_sequences=True,
                    stateful=stateful,
                    name='hidden_layer2')(gru_layer1)

    # At every time step, output a probability distribution over the vocabulary
    output_layer = TimeDistributed(Dense(n_input_nodes, activation="softmax"),
                                  name='output_layer')(gru_layer2)

    model = Model(inputs=input_layer, outputs=output_layer)

    # The targets are integer word ids, hence the sparse variant of categorical cross-entropy
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

    return model

In [17]:
model = create_model(seq_input_len=train_padded_ids.shape[-1] - 1,
                    n_input_nodes = len(lexicon) + 1,
                    n_embedding_nodes = 250,
                    n_hidden_nodes = 400)

In [18]:
len(lexicon)

4221

In [19]:
model.fit(x=train_padded_ids[:, :-1], 
          y=train_padded_ids[:, 1:, None], 
          epochs=10,
          batch_size=20)

model.save_weights('corpse_weights.h5')


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
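
The model is compiled with `sparse_categorical_crossentropy`, so during these epochs it is being pushed to give high probability to the true next word at every time step. For a single time step, that loss is the negative log of the probability assigned to the correct word id. Here is a minimal sketch of that calculation with made-up probabilities; it is not the Keras implementation, which also averages over time steps and batches.

```python
import math

def sparse_categorical_crossentropy(probabilities, target_id):
    """Loss for one time step: negative log of the probability of the true word id."""
    return -math.log(probabilities[target_id])

# A toy distribution over a three-word vocabulary.
probs = [0.1, 0.2, 0.7]
loss_confident = sparse_categorical_crossentropy(probs, 2)  # true word got 0.7
loss_surprised = sparse_categorical_crossentropy(probs, 0)  # true word got 0.1
print(round(loss_confident, 3), round(loss_surprised, 3))
```

The loss is small when the model already favors the right word and large when it does not.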
