In [94]:
from gensim.models import Word2Vec
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, SimpleRNN, Dropout, LSTM, GRU
from tensorflow.keras.callbacks import EarlyStopping, TensorBoard
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Text prediction using SimpleRNN

In [2]:
corpus="""
[ nominal delivery draft, 6 August 2014 ]

Cybersecurity as Realpolitik
Dan Geer


Good morning and thank you for the invitation to speak with you
today.  The plaintext of this talk has been made available to the
organizers.  While I will not be taking questions today, you are
welcome to contact me later and I will do what I can to reply.  For
simple clarity, let me repeat the abstract for this talk:

   Power exists to be used.  Some wish for cyber safety, which they
   will not get.  Others wish for cyber order, which they will not
   get.  Some have the eye to discern cyber policies that are "the
   least worst thing;" may they fill the vacuum of wishful thinking.

There are three professions that beat their practitioners into a
state of humility: farming, weather forecasting, and cyber security.
I practice two of those, and, as such, let me assure you that the
recommendations which follow are presented in all humility.  Humility
does not mean timidity.  Rather, it means that when a strongly held
belief is proven wrong, that the humble person changes their mind.
I expect that my proposals will result in considerable push-back,
and changing my mind may well follow.  Though I will say it again
later, this speech is me talking for myself.

As if it needed saying, cyber security is now a riveting concern,
a top issue in many venues more important than this one.  This is
not to insult Black Hat; rather it is to note that every speaker,
every writer, every practitioner in the field of cyber security who
has wished that its topic, and us with it, were taken seriously has
gotten their wish.  Cyber security *is* being taken seriously,
which, as you well know is not the same as being taken usefully,
coherently, or lastingly.  Whether we are talking about laws like
the Digital Millenium Copyright Act or the Computer Fraud and Abuse
Act, or the non-lawmaking but perhaps even more significant actions
that the Executive agencies are undertaking, "we" and the cyber
security issue have never been more at the forefront of policy.
And you ain't seen nothing yet.

I wish that I could tell you that it is still possible for one
person to hold the big picture firmly in their mind's eye, to track
everything important that is going on in our field, to make few if
any sins of omission.  It is not possible; that phase passed sometime
in the last six years.  I have certainly tried to keep up but I
would be less than candid if I were not to say that I know that I
am not keeping up, not even keeping up with what is going on in my
own country much less all countries.  Not only has cybersecurity
reached the highest levels of attention, it has spread into nearly
every corner.  If area is the product of height and width, then the
footprint of cybersecurity has surpassed the grasp of any one of us.

The rate of technological change is certainly a part of it.  When
younger people ask my advice on what they should do or study to
make a career in cyber security, I can only advise specialization.
Those of us who were in the game early enough and who have managed
to retain an over-arching generalist knowledge can't be replaced
very easily because while absorbing most new information most of
the time may have been possible when we began practice, no person
starting from scratch can do that now.  Serial specialization is
now all that can be done in any practical way.  Just looking at the
Black Hat program will confirm that being really good at any one
of the many topics presented here all but requires shutting out the
demands of being good at any others.

Why does that matter?  Speaking for myself, I am not interested in
the advantages or disadvantages of some bit of technology unless I
can grasp how it is that that technology works.  Whenever I see
marketing material that tells me all the good things that adopting
this or that technology makes possible, I remember what George
Santayana said, that "Scepticism is the chastity of the intellect;
it is shameful to give it up too soon, or to the first comer." I
suspect that a majority of you have similar skepticism -- "It's
magic!" is not the answer a security person will ever accept.  By
and large, I can tell *what* something is good for once I know *how*
it works.  Tell me how it works and then, but only then, tell me
why you have chosen to use those particular mechanisms for the
things you have chosen to use them for.

Part of my feeling stems from a long-held and well-substantiated
belief that all cyber security technology is dual use.  Perhaps
dual use is a truism for any and all tools from the scalpel to the
hammer to the gas can -- they can be used for good or ill -- but I
know that dual use is inherent in cyber security tools.  If your
definition of "tool" is wide enough, I suggest that the cyber
security tool-set favors offense these days.  Chris Inglis, recently
retired NSA Deputy Director, remarked that if we were to score cyber
the way we score soccer, the tally would be 462-456 twenty minutes
into the game,[CI] i.e., all offense.  I will take his comment as
confirming at the highest level not only the dual use nature of
cybersecurity but also confirming that offense is where the innovations
that only States can afford is going on.

Nevertheless, this essay is an outgrowth from, an extension of,
that increasing importance of cybersecurity.  With the humility of
which I spoke, I do not claim that I have the last word.  What I
do claim is that when we speak about cybersecurity policy we are
no longer engaging in some sort of parlor game.  I claim that policy
matters are now the most important matters, that once a topic area,
like cybersecurity, becomes interlaced with nearly every aspect of
life for nearly everybody, the outcome differential between good
policies and bad policies broadens, and the ease of finding answers
falls.  As H.L. Mencken so trenchantly put it, "For every complex
problem there is a solution that is clear, simple, and wrong."

The four verities of government are these:
. Most important ideas are unappealing
. Most appealing ideas are unimportant
. Not every problem has a good solution
. Every solution has side effects
"""
corpus=corpus.replace('\n',' ')

In [3]:
from nltk.tokenize import sent_tokenize
sentences=sent_tokenize(corpus)
sentences

[' [ nominal delivery draft, 6 August 2014 ]  Cybersecurity as Realpolitik Dan Geer   Good morning and thank you for the invitation to speak with you today.',
 'The plaintext of this talk has been made available to the organizers.',
 'While I will not be taking questions today, you are welcome to contact me later and I will do what I can to reply.',
 'For simple clarity, let me repeat the abstract for this talk:     Power exists to be used.',
 'Some wish for cyber safety, which they    will not get.',
 'Others wish for cyber order, which they will not    get.',
 'Some have the eye to discern cyber policies that are "the    least worst thing;" may they fill the vacuum of wishful thinking.',
 'There are three professions that beat their practitioners into a state of humility: farming, weather forecasting, and cyber security.',
 'I practice two of those, and, as such, let me assure you that the recommendations which follow are presented in all humility.',
 'Humility does not mean timidity

In [4]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
len(tokenizer.word_index),tokenizer.word_index

(457,
 {'the': 1,
  'that': 2,
  'of': 3,
  'i': 4,
  'is': 5,
  'to': 6,
  'and': 7,
  'not': 8,
  'for': 9,
  'it': 10,
  'in': 11,
  'cyber': 12,
  'a': 13,
  'are': 14,
  'you': 15,
  'security': 16,
  'will': 17,
  'can': 18,
  'have': 19,
  'good': 20,
  'has': 21,
  'all': 22,
  'every': 23,
  'or': 24,
  'cybersecurity': 25,
  'as': 26,
  'this': 27,
  'be': 28,
  'me': 29,
  'we': 30,
  'what': 31,
  'if': 32,
  'but': 33,
  'any': 34,
  'use': 35,
  'with': 36,
  'do': 37,
  'which': 38,
  'they': 39,
  'my': 40,
  'at': 41,
  'only': 42,
  'most': 43,
  'some': 44,
  'wish': 45,
  'their': 46,
  'humility': 47,
  'when': 48,
  'person': 49,
  'now': 50,
  'important': 51,
  'one': 52,
  'were': 53,
  'being': 54,
  'know': 55,
  'tell': 56,
  'possible': 57,
  'on': 58,
  'up': 59,
  'from': 60,
  'technology': 61,
  'dual': 62,
  'been': 63,
  'policies': 64,
  'may': 65,
  'into': 66,
  'those': 67,
  'well': 68,
  'more': 69,
  'who': 70,
  'us': 71,
  'taken': 72,
  'pol

In [5]:
sentences[0]

' [ nominal delivery draft, 6 August 2014 ]  Cybersecurity as Realpolitik Dan Geer   Good morning and thank you for the invitation to speak with you today.'

In [6]:
print(tokenizer.texts_to_sequences([sentences[0]])[0])

[148, 149, 150, 151, 152, 153, 25, 26, 154, 155, 156, 20, 157, 7, 158, 15, 9, 1, 159, 6, 84, 36, 15, 85]
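
The integers above come from a frequency-ranked vocabulary: the most frequent word gets index 1, and index 0 is left free for padding. A minimal, framework-free sketch of that idea follows. `build_word_index` and `encode` are hypothetical helpers, the regex tokenization is a simplification of what the Keras `Tokenizer` does, and the toy sentences are made up for illustration.

```python
import re
from collections import Counter

def build_word_index(texts):
    """Rank words by descending frequency; indices start at 1 (0 is kept for padding)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    # most_common() keeps first-seen order for words with equal counts
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def encode(text, word_index):
    """Map a text to a list of indices, silently skipping unknown words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word_index[w] for w in words if w in word_index]

toy_texts = ["the cat sat", "the cat ran", "the dog sat"]
word_index = build_word_index(toy_texts)
encoded = encode("the dog ran", word_index)
print(word_index)
print(encoded)
```

Because unknown words are skipped rather than mapped to a placeholder, an encoded prompt can come out shorter than the prompt itself, which matters for any code that later reshapes the result to a fixed length.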


In [7]:
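encodedSentences = []
for sentence in sentences:
    encoded = tokenizer.texts_to_sequences([sentence])[0]
    # Create input-output pairs for next-word prediction from the start of each
    # sentence: prefixes of length 1 to 5, each paired with the word that follows.
    # The bound len(encoded) - 1 means the final word is never used as a target.
    for i in range(1, min(len(encoded) - 1, 6)):
        seq = encoded[:i]          # input sequence (prefix)
        target = encoded[i]        # next word to predict
        encodedSentences.append((seq, target))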

encodedSentences

[([148], 149),
 ([148, 149], 150),
 ([148, 149, 150], 151),
 ([148, 149, 150, 151], 152),
 ([148, 149, 150, 151, 152], 153),
 ([1], 160),
 ([1, 160], 3),
 ([1, 160, 3], 27),
 ([1, 160, 3, 27], 86),
 ([1, 160, 3, 27, 86], 21),
 ([87], 4),
 ([87, 4], 17),
 ([87, 4, 17], 8),
 ([87, 4, 17, 8], 28),
 ([87, 4, 17, 8, 28], 164),
 ([9], 89),
 ([9, 89], 169),
 ([9, 89, 169], 90),
 ([9, 89, 169, 90], 29),
 ([9, 89, 169, 90, 29], 170),
 ([44], 45),
 ([44, 45], 9),
 ([44, 45, 9], 12),
 ([44, 45, 9, 12], 174),
 ([44, 45, 9, 12, 174], 38),
 ([93], 45),
 ([93, 45], 9),
 ([93, 45, 9], 12),
 ([93, 45, 9, 12], 175),
 ([93, 45, 9, 12, 175], 38),
 ([44], 19),
 ([44, 19], 1),
 ([44, 19, 1], 94),
 ([44, 19, 1, 94], 6),
 ([44, 19, 1, 94, 6], 176),
 ([95], 14),
 ([95, 14], 184),
 ([95, 14, 184], 185),
 ([95, 14, 184, 185], 2),
 ([95, 14, 184, 185, 2], 186),
 ([4], 96),
 ([4, 96], 192),
 ([4, 96, 192], 3),
 ([4, 96, 192, 3], 67),
 ([4, 96, 192, 3, 67], 7),
 ([47], 99),
 ([47, 99], 8),
 ([47, 99, 8], 196),
 ([1

In [8]:
x = []
y = []
for seq, target in encodedSentences:
    x.append(seq)
    y.append(target)
x[:10], y[:10]

([[148],
  [148, 149],
  [148, 149, 150],
  [148, 149, 150, 151],
  [148, 149, 150, 151, 152],
  [1],
  [1, 160],
  [1, 160, 3],
  [1, 160, 3, 27],
  [1, 160, 3, 27, 86]],
 [149, 150, 151, 152, 153, 160, 3, 27, 86, 21])
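
The pairs above only cover the opening words of each sentence. One possible alternative, not what this notebook does, is a sliding window that turns every position in a sentence into a training example. The sketch below assumes a hypothetical helper `sliding_window_pairs` and a made-up integer sequence.

```python
def sliding_window_pairs(encoded, window=5):
    """Return (context, target) pairs for every position after the first."""
    pairs = []
    for t in range(1, len(encoded)):
        context = encoded[max(0, t - window):t]  # up to `window` preceding words
        pairs.append((context, encoded[t]))
    return pairs

pairs = sliding_window_pairs([1, 2, 3, 4, 5, 6, 7], window=3)
for context, target in pairs:
    print(context, "=>", target)
```

On a corpus this small the extra pairs would enlarge the training set considerably, at the cost of more contexts that do not start at a sentence boundary.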

In [9]:
import numpy as np
x = np.array(pad_sequences(x, maxlen=5, padding='pre'))
y=np.array(y)
x

array([[  0,   0,   0,   0, 148],
       [  0,   0,   0, 148, 149],
       [  0,   0, 148, 149, 150],
       ...,
       [  0,   0,   0,   0,  23],
       [  0,   0,   0,  23,  83],
       [  0,   0,  23,  83,  21]])

In [10]:
x.shape

(239, 5)
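
Pre-padding places the zeros on the left so that the most recent word always sits in the last time step. A plain-Python sketch of the same behaviour, assuming a hypothetical helper `pad_pre` and `maxlen >= 1`:

```python
def pad_pre(seq, maxlen, value=0):
    """Left-pad (or left-truncate) a list of ints to exactly maxlen items."""
    trimmed = seq[-maxlen:]                       # keep the most recent maxlen items
    return [value] * (maxlen - len(trimmed)) + trimmed

padded_short = pad_pre([148, 149], maxlen=5)
padded_long = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(padded_short)
print(padded_long)
```

Truncating from the left mirrors the default `truncating='pre'` of `pad_sequences`, although no sequence in this notebook is longer than five items.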

In [11]:
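vocab_size = len(tokenizer.word_index) + 1  # +1 because index 0 is reserved for padding
model = Sequential([
    # input_length is omitted: Keras 3 deprecates it, and model.build() below sets the shape
    Embedding(input_dim=vocab_size, output_dim=10),
    SimpleRNN(32, activation='relu'),
    Dropout(0.05),
    Dense(vocab_size, activation='softmax')
])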

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.build(input_shape=(None, 5))
model.summary()
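
Since the summary output is not reproduced here, the parameter counts can be estimated by hand. The sketch below uses the standard formulas for each layer type and assumes the 457-word index printed earlier; the helper names are hypothetical and the figures are computed, not copied from `model.summary()`.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim                  # one dim-sized vector per index

def simple_rnn_params(input_dim, units):
    return units * (input_dim + units + 1)   # input weights, recurrent weights, bias

def dense_params(input_dim, units):
    return (input_dim + 1) * units           # weights plus bias

vocab_size = 457 + 1   # 457 indexed words plus the padding index 0
layer_params = {
    "embedding": embedding_params(vocab_size, 10),
    "simple_rnn": simple_rnn_params(10, 32),
    "dense": dense_params(32, vocab_size),
}
total_params = sum(layer_params.values())
print(layer_params, total_params)
```

Dropout adds no parameters. Most of the weights sit in the softmax layer, which is typical when the vocabulary is much larger than the hidden state.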



In [12]:
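# No validation data is passed to fit(), so 'val_loss' is never logged and the
# callback cannot trigger; monitor the training loss instead.
earlystopping=EarlyStopping(monitor='loss',patience=20,restore_best_weights=True)
model.fit(x, y, batch_size=32, epochs=200, callbacks=[earlystopping])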

Epoch 1/200
8/8 ━━━━━━━━━━━━━━━━━━━━ 2s 16ms/step - accuracy: 0.0202 - loss: 6.1261
Epoch 2/200
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.0559 - loss: 6.1138
Epoch 3/200
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.0515 - loss: 6.0988
Epoch 4/200
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.0388 - loss: 6.0742
Epoch 5/200
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.0554 - loss: 6.0106
Epoch 6/200
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.0518 - loss: 5.8455
Epoch 7/200
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.0327 - loss: 5.4691
Epoch 8/200
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.0389 - loss: 5.1182
Epoch 9/200
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.0344 - loss: 4.9345
Epoch 10/200
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.0413 - loss: 4.9472
Epoch 11/200
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.0639 - loss: 4.8043
Epoch 12/200
8/8 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x168bb846ab0>
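
A useful sanity check on the first epochs is the loss of a model that spreads probability evenly over all output classes: the cross-entropy is then the natural log of the class count. The sketch below assumes the 458 output units implied by the 457-word index.

```python
import math

vocab_size = 457 + 1                  # output units of the softmax layer
uniform_loss = math.log(vocab_size)   # cross-entropy of a uniform prediction
uniform_accuracy = 1 / vocab_size     # chance of guessing the right class

print(f"uniform loss     ~ {uniform_loss:.3f}")
print(f"uniform accuracy ~ {uniform_accuracy:.4f}")
```

The first-epoch loss of 6.1261 in the log above sits almost exactly at this baseline, which suggests the network starts out close to a uniform guess before training pulls the loss down.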

- Those of us who are backing out our remaining dependencies on digital goods and services are being entirely rational and are likely to survive.
- I say that because the root cause of risk is dependence, and most especially dependence on expectations of system state.
- If I don't use my trademark, then my rights go over to those who use what was and could have remained mine.
- For better or poorer, the only two products not covered by product liability today are religion and software, and software should not escape for much longer.

<br><br>

- There are three professions that beat their practitioners into a state of humility: farming, weather forecasting, and cyber security. I practice two of those, and, as such, let me assure you that the recommendations which follow are presented in all humility.  Humility does not mean timidity.  Rather, it means that when a strongly held belief is proven wrong, that the humble person changes their mind. I expect that my proposals will result in considerable push-back, and changing my mind may well follow.  Though I will say it again later, this speech is me talking for myself.

In [13]:
prompts=["Whether we are talking about laws","Whether we are talking about","Whether we are talking","Whether we are","Whether we","Whether"]
for line in prompts:   # renamed from `input` to avoid shadowing the built-in
    print('\n',line)
    words=line.split(' ')
    if len(words) >=5:
        # Keep only the last five words, the window size used in training
        encoded=tokenizer.texts_to_sequences([words[-5:]])[0]
        print(encoded)
        model_input=np.array(encoded).reshape(1,5)
    else:
        # Shorter prompts are left-padded with zeros, as in training
        encoded=tokenizer.texts_to_sequences([words])[0]
        model_input=pad_sequences([encoded], maxlen=5, padding='pre')
        print(model_input)
    prediction = model.predict(model_input)
    top_n = 5
    top_indices = prediction[0].argsort()[-top_n:][::-1]
    top_words = [(tokenizer.index_word.get(i, "<UNK>"), prediction[0][i]) for i in top_indices]

    print("Top predictions:")
    for word, score in top_words:
        print(f"{word}: {score:.4f}")


 Whether we are talking about laws
[30, 14, 106, 116, 232]
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 126ms/step
Top predictions:
that: 0.9517
a: 0.0482
tell: 0.0000
advice: 0.0000
seriously: 0.0000

 Whether we are talking about
[231, 30, 14, 106, 116]
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step
Top predictions:
laws: 0.9986
to: 0.0004
be: 0.0002
taken: 0.0002
can: 0.0002

 Whether we are talking
[[  0 231  30  14 106]]
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step
Top predictions:
about: 0.9957
my: 0.0030
a: 0.0009
to: 0.0002
an: 0.0001

 Whether we are
[[  0   0 231  30  14]]
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 32ms/step
Top predictions:
talking: 0.9829
has: 0.0080
his: 0.0052
mean: 0.0018
my: 0.0007

 Whether we
[[  0   0   0 231  30]]
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 30ms/step
Top predictions:
are: 0.8230
i: 0.0453
trenchantly: 0.0334
has: 0.0200
for: 0.
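
The ranking step in the loop above sorts the softmax output and keeps the highest-scoring indices. A framework-free sketch of the same selection follows; `top_k` is a hypothetical helper, and the probabilities and vocabulary are made up.

```python
def top_k(probs, index_word, k=5):
    """Return the k most probable (word, probability) pairs, best first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Indices without a vocabulary entry (such as the padding index 0) show up as <UNK>
    return [(index_word.get(i, "<UNK>"), probs[i]) for i in order]

index_word = {1: "that", 2: "a", 3: "tell"}   # index 0 is padding, so it has no entry
probs = [0.05, 0.60, 0.25, 0.10]
ranked = top_k(probs, index_word, k=4)
for word, score in ranked:
    print(f"{word}: {score:.4f}")
```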

<br><br><br><br>

# Without Embedding Layer

In [37]:
from nltk.tokenize import sent_tokenize
sentences=sent_tokenize(corpus)
sentences

[' [ nominal delivery draft, 6 August 2014 ]  Cybersecurity as Realpolitik Dan Geer   Good morning and thank you for the invitation to speak with you today.',
 'The plaintext of this talk has been made available to the organizers.',
 'While I will not be taking questions today, you are welcome to contact me later and I will do what I can to reply.',
 'For simple clarity, let me repeat the abstract for this talk:     Power exists to be used.',
 'Some wish for cyber safety, which they    will not get.',
 'Others wish for cyber order, which they will not    get.',
 'Some have the eye to discern cyber policies that are "the    least worst thing;" may they fill the vacuum of wishful thinking.',
 'There are three professions that beat their practitioners into a state of humility: farming, weather forecasting, and cyber security.',
 'I practice two of those, and, as such, let me assure you that the recommendations which follow are presented in all humility.',
 'Humility does not mean timidity

In [38]:
sentences[0]

' [ nominal delivery draft, 6 August 2014 ]  Cybersecurity as Realpolitik Dan Geer   Good morning and thank you for the invitation to speak with you today.'

In [39]:
from nltk import word_tokenize
wordTokenizedSentence=[]
for sentence in sentences:
    wordTokenizedSentence.append(word_tokenize(sentence))

x=[]
y=[]
for sentence in wordTokenizedSentence:
    print(sentence)
    for i in range(len(sentence)):
        # Targets are limited to positions 1..5, so only the first six
        # tokens of each sentence produce training pairs
        for j in range(i+1, min(len(sentence), 6)):
            x.append(sentence[i:j])   # context words
            y.append(sentence[j])     # next word to predict
            # Print the pair exactly as it was stored
            print(sentence[i:j],'=>',sentence[j])


['[', 'nominal', 'delivery', 'draft', ',', '6', 'August', '2014', ']', 'Cybersecurity', 'as', 'Realpolitik', 'Dan', 'Geer', 'Good', 'morning', 'and', 'thank', 'you', 'for', 'the', 'invitation', 'to', 'speak', 'with', 'you', 'today', '.']
['['] => nominal
['[', 'nominal'] => delivery
['[', 'nominal', 'delivery'] => draft
['[', 'nominal', 'delivery', 'draft'] => ,
['[', 'nominal', 'delivery', 'draft', ','] => 6
['nominal'] => delivery
['nominal', 'delivery'] => draft
['nominal', 'delivery', 'draft'] => ,
['nominal', 'delivery', 'draft', ','] => 6
['delivery'] => draft
['delivery', 'draft'] => ,
['delivery', 'draft', ','] => 6
['draft'] => ,
['draft', ','] => 6
[','] => 6
['The', 'plaintext', 'of', 'this', 'talk', 'has', 'been', 'made', 'available', 'to', 'the', 'organizers', '.']
['The'] => plaintext
['The', 'plaintext'] => of
['The', 'plaintext', 'of'] => this

In [40]:
x[:5],y[:5]

([['['],
  ['[', 'nominal'],
  ['[', 'nominal', 'delivery'],
  ['[', 'nominal', 'delivery', 'draft'],
  ['[', 'nominal', 'delivery', 'draft', ',']],
 ['nominal', 'delivery', 'draft', ',', '6'])

In [41]:
emeddingModel=Word2Vec(wordTokenizedSentence, vector_size=15, window=10, min_count=1, workers=4) 

In [42]:
emeddingModel.wv['[']

array([ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
       -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
        0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686],
      dtype=float32)
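
A single vector like the one above says little on its own; embeddings are usually judged by comparing them. A minimal sketch of cosine similarity in plain Python follows, with a hypothetical helper `cosine_similarity` and made-up vectors. With a training corpus this small, similarities between the real vectors are likely to be noisy.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```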

In [43]:
import numpy as np
embeddings_x = []
embeddings_y = []

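for sentence in x:
    embeddings = []
    for word in sentence:
        try:
            embeddings.append(emeddingModel.wv[word])  # Get the word2vec embedding
        except KeyError:
            # Word missing from the Word2Vec vocabulary; not expected here,
            # since min_count=1 keeps every word seen in training
            continue
    embeddings_x.append(np.array(embeddings))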

embeddings_x[0]

array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
        -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
         0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686]],
      dtype=float32)

In [44]:
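# Fit the tokenizer on the target words so every label gets an integer index.
# This extends the tokenizer fitted earlier, so the indices differ from the first model's.
tokenizer.fit_on_texts(y)
print(len(tokenizer.word_index),tokenizer.word_index)
for word in y:
    seq=tokenizer.texts_to_sequences([word])[0]
    # Punctuation-only targets such as ',' are filtered out by the tokenizer
    # and yield an empty sequence; they are all mapped to label 0.
    embeddings_y.append(seq[0] if seq else 0)
embeddings_y=np.array(embeddings_y)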

459 {'that': 1, 'of': 2, 'the': 3, 'is': 4, 'i': 5, 'are': 6, 'it': 7, 'to': 8, 'a': 9, 'not': 10, 'has': 11, 'and': 12, 'will': 13, 'for': 14, 'cyber': 15, 'in': 16, 'my': 17, 'security': 18, 'black': 19, 'me': 20, 'can': 21, 'good': 22, 'as': 23, 'this': 24, 'all': 25, 'be': 26, 'when': 27, 'being': 28, 'tell': 29, 'from': 30, 'cybersecurity': 31, 'which': 32, 'have': 33, 'an': 34, 'you': 35, 'hat': 36, 'were': 37, 'am': 38, 'tool': 39, '6': 40, 'taking': 41, 'discern': 42, 'beat': 43, 'those': 44, 'again': 45, 'laws': 46, 'nothing': 47, 'keep': 48, 'advice': 49, 'works': 50, 'truism': 51, 'nsa': 52, 'claim': 53, 'talk': 54, 'let': 55, 'wish': 56, 'now': 57, 'every': 58, 'about': 59, 'possible': 60, 'use': 61, 'matters': 62, 'ideas': 63, 'do': 64, 'safety': 65, 'order': 66, 'timidity': 67, 'proposals': 68, 'saying': 69, 'insult': 70, 'who': 71, 'we': 72, 'at': 73, 'policy': 74, 'seen': 75, 'could': 76, 'reached': 77, 'product': 78, 'change': 79, 'material': 80, 'majority': 81, 'stems

In [45]:
len(embeddings_y)

743

In [46]:
embeddings_x[0]

array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
        -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
         0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686]],
      dtype=float32)

In [47]:
embeddings_x[1]

array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
        -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
         0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686],
       [-0.05683396,  0.05229538, -0.05341402,  0.0306057 ,  0.06263665,
         0.03840695,  0.04100427,  0.05531496,  0.02296231, -0.0471125 ,
         0.03942722, -0.05425759, -0.03720862,  0.03856581, -0.05377772]],
      dtype=float32)

In [48]:
embeddings_x[2]

array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
        -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
         0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686],
       [-0.05683396,  0.05229538, -0.05341402,  0.0306057 ,  0.06263665,
         0.03840695,  0.04100427,  0.05531496,  0.02296231, -0.0471125 ,
         0.03942722, -0.05425759, -0.03720862,  0.03856581, -0.05377772],
       [-0.03969079, -0.02383095,  0.02410348,  0.03382367,  0.01345079,
         0.0347387 , -0.01769079,  0.02956779, -0.04704267, -0.01040189,
        -0.05036606,  0.00243362,  0.03236112,  0.03101156, -0.03196787]],
      dtype=float32)

In [49]:
embeddings_x[3],embeddings_y[3]

(array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
         -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
          0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686],
        [-0.05683396,  0.05229538, -0.05341402,  0.0306057 ,  0.06263665,
          0.03840695,  0.04100427,  0.05531496,  0.02296231, -0.0471125 ,
          0.03942722, -0.05425759, -0.03720862,  0.03856581, -0.05377772],
        [-0.03969079, -0.02383095,  0.02410348,  0.03382367,  0.01345079,
          0.0347387 , -0.01769079,  0.02956779, -0.04704267, -0.01040189,
         -0.05036606,  0.00243362,  0.03236112,  0.03101156, -0.03196787],
        [ 0.01106247,  0.05192446, -0.03524574, -0.02650766, -0.00426547,
         -0.03063049,  0.05779214,  0.05081302,  0.06144317,  0.06326098,
         -0.05483432, -0.0015293 , -0.04956562, -0.03316838, -0.06006755]],
       dtype=float32),
 0)
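
The label `0` above belongs to the target `','`: the tokenizer strips punctuation, so a punctuation-only token leaves nothing to look up. The sketch below imitates that filtering in plain Python, under the assumption that the tokenizer uses the Keras default filter string; `label_for` and the toy `word_index` are hypothetical.

```python
# Assumed to match the default `filters` argument of the Keras Tokenizer
KERAS_DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def label_for(word, word_index):
    """Return the index of the first known piece of `word`, or 0 if none is left."""
    table = str.maketrans({ch: " " for ch in KERAS_DEFAULT_FILTERS})
    pieces = word.lower().translate(table).split()
    ids = [word_index[p] for p in pieces if p in word_index]
    return ids[0] if ids else 0

toy_index = {"push": 7, "back": 9, "draft": 3}
print(label_for(",", toy_index))          # punctuation only
print(label_for("Push-back", toy_index))  # hyphen splits the token in two
print(label_for("draft", toy_index))
```

One consequence is that every punctuation target shares label 0 with the padding index, so the model cannot tell a comma from a full stop, and such predictions are displayed as `<UNK>`.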

In [50]:
# Longest context length across all training examples
max(e.shape[0] for e in embeddings_x)

5

In [51]:
embeddings_x[0].shape

(1, 15)

In [52]:
embedding_sequences_padded_x = pad_sequences(
    embeddings_x,
    maxlen=5,
    dtype='float32',
    padding='pre'
)
embedding_sequences_padded_x

array([[[ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.06246194, -0.049209  ,  0.00407102, ...,  0.01779463,
          0.06369314,  0.00115686]],

       [[ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.06246194, -0.049209  ,  0.00407102, ...,  0.01779463,
          0.06369314,  0.00115686],
        [-0.05683396,  0.05229538, -0.05341402, ..., -0.03720862,
          0.03

In [53]:
for i in range(len(embedding_sequences_padded_x)):
    print(embeddings_x[i].shape, embedding_sequences_padded_x[i].shape)

(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(5, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(1, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(5, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(1, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(5, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(1, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(5, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(1, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) 

In [54]:
emeddingModel.corpus_total_words,embedding_sequences_padded_x.shape

(1229, (743, 5, 15))

In [64]:
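num_classes = len(tokenizer.word_index) + 1  # labels run from 0 to len(tokenizer.word_index)
model = Sequential([
    SimpleRNN(64,activation='relu', input_shape=(5, 15),return_sequences=True),
    Dropout(0.05),
    SimpleRNN(32,activation='relu'),
    # One output unit per label, rather than a hard-coded 744 (the number of training pairs plus one)
    Dense(num_classes,activation='softmax')
])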

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

In [58]:
embedding_sequences_padded_x.shape,embeddings_y.shape

((743, 5, 15), (743,))
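
With `sparse_categorical_crossentropy`, every integer label must be smaller than the number of softmax units, so the output width depends on the largest label rather than on the number of training pairs. A small check along these lines, using hypothetical helpers and made-up labels:

```python
def required_output_units(labels):
    """Smallest softmax width that can represent every label."""
    return max(labels) + 1

def spare_output_units(labels, units):
    """Return how many output units can never be a target; raise if too few."""
    needed = required_output_units(labels)
    if units < needed:
        raise ValueError(f"need at least {needed} output units, got {units}")
    return units - needed

toy_labels = [0, 3, 1, 459]   # assumes a 459-word index plus label 0
print(required_output_units(toy_labels))
print(spare_output_units(toy_labels, 744))
print(spare_output_units(toy_labels, 460))
```

Spare units do not break training, but each one adds a row of weights that can only ever learn to output a probability near zero.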

In [60]:
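# No validation data is passed to fit(), so 'val_loss' is never logged and the
# callback cannot trigger; monitor the training loss instead.
earlystopping=EarlyStopping(monitor='loss',patience=20,restore_best_weights=True)
model.fit(embedding_sequences_padded_x, embeddings_y , batch_size=32, epochs=200, callbacks=[earlystopping])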

Epoch 1/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7212 - loss: 0.9957
Epoch 2/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7302 - loss: 0.9521
Epoch 3/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7098 - loss: 1.0045
Epoch 4/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7053 - loss: 0.9860
Epoch 5/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6969 - loss: 0.9433
Epoch 6/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7265 - loss: 0.9127
Epoch 7/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7379 - loss: 0.9094
Epoch 8/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7290 - loss: 0.9110
Epoch 9/200
24/24 ━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x168bb221430>
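
Early stopping with a patience value boils down to a counter that resets whenever the monitored metric improves. The sketch below is a simplified stand-in for that logic, not the Keras implementation: it ignores options such as `min_delta`, the helper name is hypothetical, and the loss values are made up.

```python
def early_stopping_trace(losses, patience):
    """Return (epoch at which training stops, epoch with the best loss)."""
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0   # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch              # patience exhausted
    return len(losses) - 1, best_epoch                # ran through every epoch

stop_epoch, best_epoch = early_stopping_trace([1.0, 0.8, 0.9, 0.85, 0.7], patience=2)
print(stop_epoch, best_epoch)
```

In this toy run training stops before the later, better value 0.7 is reached, which illustrates why a noisy loss curve such as the one above calls for a generous patience.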

- Those of us who are backing out our remaining dependencies on digital goods and services are being entirely rational and are likely to survive.
- I say that because the root cause of risk is dependence, and most especially dependence on expectations of system state.
- If I don't use my trademark, then my rights go over to those who use what was and could have remained mine.
- For better or poorer, the only two products not covered by product liability today are religion and software, and software should not escape for much longer.

<br><br>

- There are three professions that beat their practitioners into a state of humility: farming, weather forecasting, and cyber security. I practice two of those, and, as such, let me assure you that the recommendations which follow are presented in all humility.  Humility does not mean timidity.  Rather, it means that when a strongly held belief is proven wrong, that the humble person changes their mind. I expect that my proposals will result in considerable push-back, and changing my mind may well follow.  Though I will say it again later, this speech is me talking for myself.

In [62]:
prompts=["Whether we are talking about laws","Whether we are talking about","Whether we are talking","Whether we are","Whether we","Whether","like the Digital Millenium"]
for line in prompts:   # renamed from `input` to avoid shadowing the built-in
    print('\n',line)
    words=line.split(' ')[-5:]   # keep at most the last five words
    # Look up each word's Word2Vec vector; a word outside the vocabulary raises KeyError
    embeddings=np.array([emeddingModel.wv[word] for word in words])

    if embeddings.shape[0] < 5:  # e.g. shape (3, 15): left-pad with zero vectors
        diff=5-embeddings.shape[0]
        arr=np.zeros((diff,15))
        embeddings=np.vstack((arr,embeddings))

    embeddings=embeddings.reshape(1, 5, 15)  # match the training shape (743, 5, 15) of embedding_sequences_padded_x

    prediction = model.predict(embeddings)
    top_n = 5
    top_indices = prediction[0].argsort()[-top_n:][::-1]
    top_words = [(tokenizer.index_word.get(i, "<UNK>"), prediction[0][i]) for i in top_indices]

    print("Top predictions:")
    for word, score in top_words:
        print(f"{word}: {score:.4f}")


 Whether we are talking about laws
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 59ms/step
Top predictions:
are: 0.9999
of: 0.0000
a: 0.0000
i: 0.0000
it: 0.0000

 Whether we are talking about
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step
Top predictions:
laws: 1.0000
<UNK>: 0.0000
does: 0.0000
that: 0.0000
put: 0.0000

 Whether we are talking
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step
Top predictions:
about: 1.0000
<UNK>: 0.0000
that: 0.0000
of: 0.0000
i: 0.0000

 Whether we are
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step
Top predictions:
talking: 0.9955
unimportant: 0.0026
unappealing: 0.0016
now: 0.0002
three: 0.0001

 Whether we
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step
Top predictions:
are: 0.9168
my: 0.0356
the: 0.0223
of: 0.0113
this: 0.0053

 Whether
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step
Top predictions:
we: 0.6891
i
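
The prompts above only work because every word happens to be in the Word2Vec vocabulary; an unseen word would raise `KeyError`. One way to make the lookup tolerant is to skip unknown words and pad what remains, sketched below with a plain dict standing in for the trained vectors. `embed_prompt` and the toy vectors are hypothetical; with gensim, the membership test could be made against `wv.key_to_index` instead.

```python
def embed_prompt(words, vectors, dim, window=5):
    """Embed the last `window` words, skipping unknown ones and left-padding with zeros."""
    known = [vectors[w] for w in words[-window:] if w in vectors]
    padding = [[0.0] * dim for _ in range(window - len(known))]
    return padding + known

toy_vectors = {"we": [1.0, 2.0], "are": [3.0, 4.0]}
embedded = embed_prompt(["Whether", "we", "are"], toy_vectors, dim=2, window=4)
for row in embedded:
    print(row)
```

Skipping a word shifts the remaining context to the right, so the model sees a prompt it was never trained on; whether that is better than failing loudly depends on how the predictions are used.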

<br><br><br><br>

# LSTM

In [65]:
from nltk.tokenize import sent_tokenize
sentences=sent_tokenize(corpus)
sentences

[' [ nominal delivery draft, 6 August 2014 ]  Cybersecurity as Realpolitik Dan Geer   Good morning and thank you for the invitation to speak with you today.',
 'The plaintext of this talk has been made available to the organizers.',
 'While I will not be taking questions today, you are welcome to contact me later and I will do what I can to reply.',
 'For simple clarity, let me repeat the abstract for this talk:     Power exists to be used.',
 'Some wish for cyber safety, which they    will not get.',
 'Others wish for cyber order, which they will not    get.',
 'Some have the eye to discern cyber policies that are "the    least worst thing;" may they fill the vacuum of wishful thinking.',
 'There are three professions that beat their practitioners into a state of humility: farming, weather forecasting, and cyber security.',
 'I practice two of those, and, as such, let me assure you that the recommendations which follow are presented in all humility.',
 'Humility does not mean timidity

In [66]:
sentences[0]

' [ nominal delivery draft, 6 August 2014 ]  Cybersecurity as Realpolitik Dan Geer   Good morning and thank you for the invitation to speak with you today.'

In [67]:
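from nltk import word_tokenize

# Tokenize every sentence into a list of word and punctuation tokens.
wordTokenizedSentence=[]
for sentence in sentences:
    wordTokenizedSentence.append(word_tokenize(sentence))

# Build (context, next-token) pairs. Because j starts at i+1 and stops at 5,
# contexts come only from the first five tokens of each sentence.
x=[]
y=[]
for sentence in wordTokenizedSentence:
    print(sentence)
    for i in range(len(sentence)):
        j=i+1
        while j <= 5:
            try:
                y.append(sentence[j])    # target: the token that follows the context
                x.append(sentence[i:j])  # context: tokens i..j-1
                j+=1
                # Printed after j is incremented, so this shows the next pair.
                print(sentence[i:j],'=>',sentence[j])
            except IndexError:
                # Ran past the end of a short sentence.
                j+=1
                continue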


['[', 'nominal', 'delivery', 'draft', ',', '6', 'August', '2014', ']', 'Cybersecurity', 'as', 'Realpolitik', 'Dan', 'Geer', 'Good', 'morning', 'and', 'thank', 'you', 'for', 'the', 'invitation', 'to', 'speak', 'with', 'you', 'today', '.']
['[', 'nominal'] => delivery
['[', 'nominal', 'delivery'] => draft
['[', 'nominal', 'delivery', 'draft'] => ,
['[', 'nominal', 'delivery', 'draft', ','] => 6
['[', 'nominal', 'delivery', 'draft', ',', '6'] => August
['nominal', 'delivery'] => draft
['nominal', 'delivery', 'draft'] => ,
['nominal', 'delivery', 'draft', ','] => 6
['nominal', 'delivery', 'draft', ',', '6'] => August
['delivery', 'draft'] => ,
['delivery', 'draft', ','] => 6
['delivery', 'draft', ',', '6'] => August
['draft', ','] => 6
['draft', ',', '6'] => August
[',', '6'] => August
['The', 'plaintext', 'of', 'this', 'talk', 'has', 'been', 'made', 'available', 'to', 'the', 'organizers', '.']
['The', 'plaintext'] => of
['The', 'plaintext', 'of'] => this
['The', 'plaintext', 'of', 'this']

In [68]:
x[:5],y[:5]

([['['],
  ['[', 'nominal'],
  ['[', 'nominal', 'delivery'],
  ['[', 'nominal', 'delivery', 'draft'],
  ['[', 'nominal', 'delivery', 'draft', ',']],
 ['nominal', 'delivery', 'draft', ',', '6'])
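
The pair-building loop above can also be written as an explicit sliding window. The following is a minimal sketch rather than the notebook's code: `make_pairs` and `max_context` are hypothetical names, and unlike the loop above it slides over the whole sentence instead of only its first few tokens.

```python
def make_pairs(tokens, max_context=5):
    """Return (context, target) pairs where each context holds between 1 and
    max_context tokens immediately preceding the target token."""
    pairs = []
    for target_idx in range(1, len(tokens)):
        start = max(0, target_idx - max_context)
        # Emit every context length from 1 up to max_context ending at target_idx.
        for ctx_start in range(target_idx - 1, start - 1, -1):
            pairs.append((tokens[ctx_start:target_idx], tokens[target_idx]))
    return pairs

tokens = ["The", "plaintext", "of", "this", "talk"]
for context, target in make_pairs(tokens, max_context=3):
    print(context, "=>", target)
```

Sliding over the full sentence would produce more training pairs than the 743 used in this notebook, so results would not be directly comparable.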

In [69]:
emeddingModel=Word2Vec(wordTokenizedSentence, vector_size=15, window=10, min_count=1, workers=4) 

In [70]:
emeddingModel.wv['[']

array([ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
       -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
        0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686],
      dtype=float32)

In [71]:
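import numpy as np
embeddings_x = []
embeddings_y = []

# Replace every context word with its 15-dimensional Word2Vec vector.
for sentence in x:
    embeddings = []
    for word in sentence:
        try:
            embeddings.append(emeddingModel.wv[word])  # Get the word2vec embedding
        except KeyError:
            # Word missing from the Word2Vec vocabulary; skip it.
            continue
    embeddings_x.append(np.array(embeddings))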

embeddings_x[0]

array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
        -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
         0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686]],
      dtype=float32)

In [72]:
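# `tokenizer` was created in an earlier cell; fit it on the target words.
tokenizer.fit_on_texts(y)
print(len(tokenizer.word_index),tokenizer.word_index)
for word in y:
    try:
        e=tokenizer.texts_to_sequences([word])[0][0]
    except IndexError:
        # Tokens the tokenizer filters out (e.g. punctuation) give an empty sequence; use class 0.
        e=0
    embeddings_y.append(e)
embeddings_y=np.array(embeddings_y)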

459 {'that': 1, 'of': 2, 'is': 3, 'the': 4, 'i': 5, 'are': 6, 'it': 7, 'a': 8, 'to': 9, 'not': 10, 'has': 11, 'will': 12, 'and': 13, 'my': 14, 'cyber': 15, 'for': 16, 'in': 17, 'black': 18, 'security': 19, 'me': 20, 'can': 21, 'good': 22, 'as': 23, 'this': 24, 'all': 25, 'be': 26, 'when': 27, 'being': 28, 'tell': 29, 'from': 30, 'an': 31, 'which': 32, 'hat': 33, 'am': 34, 'tool': 35, '6': 36, 'cybersecurity': 37, 'taking': 38, 'discern': 39, 'beat': 40, 'again': 41, 'were': 42, 'laws': 43, 'nothing': 44, 'keep': 45, 'advice': 46, 'truism': 47, 'nsa': 48, 'have': 49, 'those': 50, 'works': 51, 'claim': 52, 'talk': 53, 'let': 54, 'about': 55, 'matters': 56, 'ideas': 57, 'you': 58, 'wish': 59, 'safety': 60, 'order': 61, 'timidity': 62, 'proposals': 63, 'saying': 64, 'now': 65, 'insult': 66, 'seen': 67, 'could': 68, 'possible': 69, 'reached': 70, 'product': 71, 'change': 72, 'material': 73, 'majority': 74, 'stems': 75, 'retired': 76, 'comment': 77, 'government': 78, 'unappealing': 79, 'unim

In [73]:
len(embeddings_y)

743
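
The label step above maps each target word to an integer class and falls back to 0 when no id is available. As a rough illustration of that idea only (it does not reproduce the Keras `Tokenizer` exactly), here is a small sketch in plain Python; `build_label_index` and `encode` are hypothetical helpers.

```python
from collections import Counter

def build_label_index(targets):
    """Map each distinct alphanumeric target (lowercased) to an id >= 1,
    most frequent first; id 0 is reserved for everything else."""
    counts = Counter(t.lower() for t in targets if t.isalnum())
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(), start=1)}

def encode(targets, index):
    """Look up each target's id, using 0 for tokens without one."""
    return [index.get(t.lower(), 0) for t in targets]

targets = ["nominal", "delivery", "draft", ",", "6", "draft"]
index = build_label_index(targets)
labels = encode(targets, index)
print(index)
print(labels)
```

Reserving 0 for punctuation means several different tokens share one class, which is worth keeping in mind when reading the `<UNK>` entries in the prediction output.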

In [74]:
embeddings_x[0]

array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
        -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
         0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686]],
      dtype=float32)

In [75]:
embeddings_x[1]

array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
        -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
         0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686],
       [-0.05683396,  0.05229538, -0.05341402,  0.0306057 ,  0.06263665,
         0.03840695,  0.04100427,  0.05531496,  0.02296231, -0.0471125 ,
         0.03942722, -0.05425759, -0.03720862,  0.03856581, -0.05377772]],
      dtype=float32)

In [76]:
embeddings_x[2]

array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
        -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
         0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686],
       [-0.05683396,  0.05229538, -0.05341402,  0.0306057 ,  0.06263665,
         0.03840695,  0.04100427,  0.05531496,  0.02296231, -0.0471125 ,
         0.03942722, -0.05425759, -0.03720862,  0.03856581, -0.05377772],
       [-0.03969079, -0.02383095,  0.02410348,  0.03382367,  0.01345079,
         0.0347387 , -0.01769079,  0.02956779, -0.04704267, -0.01040189,
        -0.05036606,  0.00243362,  0.03236112,  0.03101156, -0.03196787]],
      dtype=float32)

In [77]:
embeddings_x[3],embeddings_y[3]

(array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
         -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
          0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686],
        [-0.05683396,  0.05229538, -0.05341402,  0.0306057 ,  0.06263665,
          0.03840695,  0.04100427,  0.05531496,  0.02296231, -0.0471125 ,
          0.03942722, -0.05425759, -0.03720862,  0.03856581, -0.05377772],
        [-0.03969079, -0.02383095,  0.02410348,  0.03382367,  0.01345079,
          0.0347387 , -0.01769079,  0.02956779, -0.04704267, -0.01040189,
         -0.05036606,  0.00243362,  0.03236112,  0.03101156, -0.03196787],
        [ 0.01106247,  0.05192446, -0.03524574, -0.02650766, -0.00426547,
         -0.03063049,  0.05779214,  0.05081302,  0.06144317,  0.06326098,
         -0.05483432, -0.0015293 , -0.04956562, -0.03316838, -0.06006755]],
       dtype=float32),
 0)

In [78]:
maxsize=[]
for i in embeddings_x:
    maxsize.append(i.shape[0])
max(maxsize)

5

In [79]:
embeddings_x[0].shape

(1, 15)

In [80]:
embedding_sequences_padded_x = pad_sequences(
    embeddings_x,
    maxlen=5,
    dtype='float32',
    padding='pre'
)
embedding_sequences_padded_x

array([[[ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.06246194, -0.049209  ,  0.00407102, ...,  0.01779463,
          0.06369314,  0.00115686]],

       [[ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.06246194, -0.049209  ,  0.00407102, ...,  0.01779463,
          0.06369314,  0.00115686],
        [-0.05683396,  0.05229538, -0.05341402, ..., -0.03720862,
          0.03

In [81]:
for i in range(len(embedding_sequences_padded_x)):
    print(embeddings_x[i].shape, embedding_sequences_padded_x[i].shape)

(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(5, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(1, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(5, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(1, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(5, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(1, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(5, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(1, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) 

In [82]:
emeddingModel.corpus_total_words,embedding_sequences_padded_x.shape

(1229, (743, 5, 15))
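
To make the padding step concrete, the sketch below left-pads variable-length embedding sequences with zero rows using only NumPy. It is an illustration under assumptions, not the `pad_sequences` implementation: `pad_pre` is a hypothetical helper and the toy sequences are filled with ones.

```python
import numpy as np

def pad_pre(sequences, maxlen, dim):
    """Left-pad each (length, dim) array with zero rows up to maxlen rows,
    keeping only the last maxlen rows of longer sequences."""
    out = np.zeros((len(sequences), maxlen, dim), dtype="float32")
    for row, seq in enumerate(sequences):
        seq = np.asarray(seq, dtype="float32")[-maxlen:]
        if len(seq):
            out[row, maxlen - len(seq):] = seq
    return out

seqs = [np.ones((1, 15)), np.ones((3, 15)), np.ones((5, 15))]
padded = pad_pre(seqs, maxlen=5, dim=15)
print(padded.shape)
```

Padding on the left keeps the most recent word in the final time step, which is the position a recurrent layer reads last.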

In [86]:
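model = Sequential([
    # First recurrent layer returns the full sequence so it can feed a second LSTM.
    LSTM(64,activation='relu', input_shape=(5, 15),return_sequences=True),
    Dropout(0.05),
    LSTM(32,activation='relu'),
    # One softmax unit per class id; 744 is more than the 460 label values (0..459) actually used.
    Dense(744,activation='softmax')
])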

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

  super().__init__(**kwargs)


In [87]:
embedding_sequences_padded_x.shape,embeddings_y.shape

((743, 5, 15), (743,))

In [90]:
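# No validation data is passed to fit(), so 'val_loss' is never available; monitor the training loss instead.
earlystopping=EarlyStopping(monitor='loss',patience=20,restore_best_weights=True)
model.fit(embedding_sequences_padded_x, embeddings_y , batch_size=32, epochs=200, callbacks=[earlystopping])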

Epoch 1/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8001 - loss: 0.5974
Epoch 2/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7887 - loss: 0.6278
Epoch 3/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7627 - loss: 0.6285
Epoch 4/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7955 - loss: 0.5968
Epoch 5/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7757 - loss: 0.6260
Epoch 6/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7982 - loss: 0.6346
Epoch 7/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7883 - loss: 0.6250
Epoch 8/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7957 - loss: 0.6392
Epoch 9/200
24/24 ━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x168bbb1b7d0>
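
The `EarlyStopping` callback used above stops training once the monitored quantity has gone `patience` epochs without improving. The sketch below simulates that rule on a list of loss values in plain Python. It is a simplified illustration, not Keras's implementation; `early_stop` and the sample losses are made up for the example.

```python
def early_stop(losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based. stop_epoch is None
    if training would run through every epoch in `losses`."""
    best = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

losses = [0.60, 0.55, 0.57, 0.56, 0.58, 0.50]
print(early_stop(losses, patience=3))
print(early_stop(losses, patience=5))
```

With `restore_best_weights=True`, the weights kept after stopping would correspond to `best_epoch` rather than to the last epoch run.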

- Those of us who are backing out our remaining dependencies on digital goods and services are being entirely rational and are likely to survive.
- I say that because the root cause of risk is dependence, and most especially dependence on expectations of system state.
- If I don't use my trademark, then my rights go over to those who use what was and could have remained mine.
- For better or poorer, the only two products not covered by product liability today are religion and software, and software should not escape for much longer.

<br><br>

- There are three professions that beat their practitioners into a state of humility: farming, weather forecasting, and cyber security. I practice two of those, and, as such, let me assure you that the recommendations which follow are presented in all humility.  Humility does not mean timidity.  Rather, it means that when a strongly held belief is proven wrong, that the humble person changes their mind. I expect that my proposals will result in considerable push-back, and changing my mind may well follow.  Though I will say it again later, this speech is me talking for myself.

In [92]:
# Prompts to complete; each is reduced to its last five words before prediction.
prompts=["Whether we are talking about laws","Whether we are talking about","Whether we are talking","Whether we are","Whether we","Whether","like the Digital Millenium","like the Digital"]
for line in prompts:
    print('\n',line)
    words=line.split(' ')
    # Look up the Word2Vec vector for (at most) the last five words.
    embeddings=np.array([emeddingModel.wv[word] for word in words[-5:]])

    if embeddings.shape[0] < 5:  # e.g. shape (3, 15): left-pad with zero rows, matching padding='pre'
        diff=5-embeddings.shape[0]
        arr=np.zeros((diff,15))
        embeddings=np.vstack((arr,embeddings))

    embeddings=embeddings.reshape(1, 5, 15)  # Add a batch axis; the training input had shape (743, 5, 15)

    prediction = model.predict(embeddings)
    top_n = 5
    top_indices = prediction[0].argsort()[-top_n:][::-1]  # five most probable class ids, highest first
    top_words = [(tokenizer.index_word.get(i, "<UNK>"), prediction[0][i]) for i in top_indices]

    print("Top predictions:")
    for word, score in top_words:
        print(f"{word}: {score:.4f}")


 Whether we are talking about laws
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 64ms/step
Top predictions:
a: 0.5678
it: 0.3311
those: 0.1011
laws: 0.0000
<UNK>: 0.0000

 Whether we are talking about
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step
Top predictions:
laws: 1.0000
not: 0.0000
those: 0.0000
a: 0.0000
two: 0.0000

 Whether we are talking
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step
Top predictions:
about: 0.9997
<UNK>: 0.0003
are: 0.0000
laws: 0.0000
ideas: 0.0000

 Whether we are
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step
Top predictions:
talking: 0.9959
unappealing: 0.0028
three: 0.0008
majority: 0.0004
unimportant: 0.0002

 Whether we
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step
Top predictions:
are: 0.9526
a: 0.0123
of: 0.0098
is: 0.0067
this: 0.0057

 Whether
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step
Top predictions:


In [93]:
model.input_shape

(None, 5, 15)
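
The ranking step in the prediction loop (argsort, keep the last `n`, reverse) can be tried on its own with a toy probability vector. This is a small sketch under assumptions: `top_n_words` is a hypothetical helper, and the `index_word` mapping and probabilities below are invented for illustration, not taken from the trained model.

```python
import numpy as np

def top_n_words(probs, index_word, n=5, unknown="<UNK>"):
    """Return the n most probable (word, probability) pairs, highest first.
    Class ids missing from index_word are shown as `unknown`."""
    probs = np.asarray(probs)
    order = np.argsort(probs)[-n:][::-1]
    return [(index_word.get(int(i), unknown), float(probs[i])) for i in order]

index_word = {1: "that", 2: "of", 3: "is"}  # toy mapping, not the notebook's vocabulary
probs = [0.05, 0.10, 0.60, 0.25]            # toy softmax output over class ids 0..3
for word, score in top_n_words(probs, index_word, n=3):
    print(f"{word}: {score:.4f}")
```

Class id 0 has no entry in `index_word`, which is why it surfaces as `<UNK>` whenever it ranks among the top predictions.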

<br><br><br><br>

# GRU

In [95]:
from nltk.tokenize import sent_tokenize
sentences=sent_tokenize(corpus)
sentences

[' [ nominal delivery draft, 6 August 2014 ]  Cybersecurity as Realpolitik Dan Geer   Good morning and thank you for the invitation to speak with you today.',
 'The plaintext of this talk has been made available to the organizers.',
 'While I will not be taking questions today, you are welcome to contact me later and I will do what I can to reply.',
 'For simple clarity, let me repeat the abstract for this talk:     Power exists to be used.',
 'Some wish for cyber safety, which they    will not get.',
 'Others wish for cyber order, which they will not    get.',
 'Some have the eye to discern cyber policies that are "the    least worst thing;" may they fill the vacuum of wishful thinking.',
 'There are three professions that beat their practitioners into a state of humility: farming, weather forecasting, and cyber security.',
 'I practice two of those, and, as such, let me assure you that the recommendations which follow are presented in all humility.',
 'Humility does not mean timidity

In [96]:
sentences[0]

' [ nominal delivery draft, 6 August 2014 ]  Cybersecurity as Realpolitik Dan Geer   Good morning and thank you for the invitation to speak with you today.'

In [97]:
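from nltk import word_tokenize

# Tokenize every sentence into a list of word and punctuation tokens.
wordTokenizedSentence=[]
for sentence in sentences:
    wordTokenizedSentence.append(word_tokenize(sentence))

# Build (context, next-token) pairs. Because j starts at i+1 and stops at 5,
# contexts come only from the first five tokens of each sentence.
x=[]
y=[]
for sentence in wordTokenizedSentence:
    print(sentence)
    for i in range(len(sentence)):
        j=i+1
        while j <= 5:
            try:
                y.append(sentence[j])    # target: the token that follows the context
                x.append(sentence[i:j])  # context: tokens i..j-1
                j+=1
                # Printed after j is incremented, so this shows the next pair.
                print(sentence[i:j],'=>',sentence[j])
            except IndexError:
                # Ran past the end of a short sentence.
                j+=1
                continue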


['[', 'nominal', 'delivery', 'draft', ',', '6', 'August', '2014', ']', 'Cybersecurity', 'as', 'Realpolitik', 'Dan', 'Geer', 'Good', 'morning', 'and', 'thank', 'you', 'for', 'the', 'invitation', 'to', 'speak', 'with', 'you', 'today', '.']
['[', 'nominal'] => delivery
['[', 'nominal', 'delivery'] => draft
['[', 'nominal', 'delivery', 'draft'] => ,
['[', 'nominal', 'delivery', 'draft', ','] => 6
['[', 'nominal', 'delivery', 'draft', ',', '6'] => August
['nominal', 'delivery'] => draft
['nominal', 'delivery', 'draft'] => ,
['nominal', 'delivery', 'draft', ','] => 6
['nominal', 'delivery', 'draft', ',', '6'] => August
['delivery', 'draft'] => ,
['delivery', 'draft', ','] => 6
['delivery', 'draft', ',', '6'] => August
['draft', ','] => 6
['draft', ',', '6'] => August
[',', '6'] => August
['The', 'plaintext', 'of', 'this', 'talk', 'has', 'been', 'made', 'available', 'to', 'the', 'organizers', '.']
['The', 'plaintext'] => of
['The', 'plaintext', 'of'] => this
['The', 'plaintext', 'of', 'this']

In [98]:
x[:5],y[:5]

([['['],
  ['[', 'nominal'],
  ['[', 'nominal', 'delivery'],
  ['[', 'nominal', 'delivery', 'draft'],
  ['[', 'nominal', 'delivery', 'draft', ',']],
 ['nominal', 'delivery', 'draft', ',', '6'])

In [99]:
emeddingModel=Word2Vec(wordTokenizedSentence, vector_size=15, window=10, min_count=1, workers=4) 

In [100]:
emeddingModel.wv['[']

array([ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
       -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
        0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686],
      dtype=float32)

In [101]:
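import numpy as np
embeddings_x = []
embeddings_y = []

# Replace every context word with its 15-dimensional Word2Vec vector.
for sentence in x:
    embeddings = []
    for word in sentence:
        try:
            embeddings.append(emeddingModel.wv[word])  # Get the word2vec embedding
        except KeyError:
            # Word missing from the Word2Vec vocabulary; skip it.
            continue
    embeddings_x.append(np.array(embeddings))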

embeddings_x[0]

array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
        -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
         0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686]],
      dtype=float32)

In [102]:
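# `tokenizer` was created in an earlier cell; fit it on the target words.
tokenizer.fit_on_texts(y)
print(len(tokenizer.word_index),tokenizer.word_index)
for word in y:
    try:
        e=tokenizer.texts_to_sequences([word])[0][0]
    except IndexError:
        # Tokens the tokenizer filters out (e.g. punctuation) give an empty sequence; use class 0.
        e=0
    embeddings_y.append(e)
embeddings_y=np.array(embeddings_y)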

459 {'that': 1, 'of': 2, 'is': 3, 'the': 4, 'i': 5, 'are': 6, 'it': 7, 'a': 8, 'to': 9, 'has': 10, 'not': 11, 'will': 12, 'and': 13, 'my': 14, 'black': 15, 'cyber': 16, 'for': 17, 'security': 18, 'in': 19, 'me': 20, 'can': 21, 'good': 22, 'as': 23, 'this': 24, 'all': 25, 'when': 26, 'being': 27, 'tell': 28, 'from': 29, 'be': 30, 'an': 31, 'hat': 32, 'am': 33, 'tool': 34, '6': 35, 'taking': 36, 'which': 37, 'discern': 38, 'beat': 39, 'again': 40, 'laws': 41, 'nothing': 42, 'keep': 43, 'advice': 44, 'truism': 45, 'nsa': 46, 'were': 47, 'cybersecurity': 48, 'those': 49, 'works': 50, 'claim': 51, 'talk': 52, 'let': 53, 'about': 54, 'matters': 55, 'ideas': 56, 'safety': 57, 'order': 58, 'have': 59, 'timidity': 60, 'proposals': 61, 'saying': 62, 'insult': 63, 'seen': 64, 'could': 65, 'reached': 66, 'product': 67, 'change': 68, 'material': 69, 'majority': 70, 'stems': 71, 'retired': 72, 'comment': 73, 'government': 74, 'unappealing': 75, 'unimportant': 76, 'effects': 77, 'wish': 78, 'now': 79

In [103]:
len(embeddings_y)

743

In [104]:
embeddings_x[0]

array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
        -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
         0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686]],
      dtype=float32)

In [105]:
embeddings_x[1]

array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
        -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
         0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686],
       [-0.05683396,  0.05229538, -0.05341402,  0.0306057 ,  0.06263665,
         0.03840695,  0.04100427,  0.05531496,  0.02296231, -0.0471125 ,
         0.03942722, -0.05425759, -0.03720862,  0.03856581, -0.05377772]],
      dtype=float32)

In [106]:
embeddings_x[2]

array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
        -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
         0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686],
       [-0.05683396,  0.05229538, -0.05341402,  0.0306057 ,  0.06263665,
         0.03840695,  0.04100427,  0.05531496,  0.02296231, -0.0471125 ,
         0.03942722, -0.05425759, -0.03720862,  0.03856581, -0.05377772],
       [-0.03969079, -0.02383095,  0.02410348,  0.03382367,  0.01345079,
         0.0347387 , -0.01769079,  0.02956779, -0.04704267, -0.01040189,
        -0.05036606,  0.00243362,  0.03236112,  0.03101156, -0.03196787]],
      dtype=float32)

In [107]:
embeddings_x[3],embeddings_y[3]

(array([[ 0.06246194, -0.049209  ,  0.00407102, -0.02026014, -0.04346866,
         -0.00748973, -0.0356756 ,  0.06222412, -0.03854244,  0.02709549,
          0.00292689,  0.03292041,  0.01779463,  0.06369314,  0.00115686],
        [-0.05683396,  0.05229538, -0.05341402,  0.0306057 ,  0.06263665,
          0.03840695,  0.04100427,  0.05531496,  0.02296231, -0.0471125 ,
          0.03942722, -0.05425759, -0.03720862,  0.03856581, -0.05377772],
        [-0.03969079, -0.02383095,  0.02410348,  0.03382367,  0.01345079,
          0.0347387 , -0.01769079,  0.02956779, -0.04704267, -0.01040189,
         -0.05036606,  0.00243362,  0.03236112,  0.03101156, -0.03196787],
        [ 0.01106247,  0.05192446, -0.03524574, -0.02650766, -0.00426547,
         -0.03063049,  0.05779214,  0.05081302,  0.06144317,  0.06326098,
         -0.05483432, -0.0015293 , -0.04956562, -0.03316838, -0.06006755]],
       dtype=float32),
 0)

In [108]:
maxsize=[]
for i in embeddings_x:
    maxsize.append(i.shape[0])
max(maxsize)

5

In [109]:
embeddings_x[0].shape

(1, 15)

In [110]:
embedding_sequences_padded_x = pad_sequences(
    embeddings_x,
    maxlen=5,
    dtype='float32',
    padding='pre'
)
embedding_sequences_padded_x

array([[[ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.06246194, -0.049209  ,  0.00407102, ...,  0.01779463,
          0.06369314,  0.00115686]],

       [[ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.06246194, -0.049209  ,  0.00407102, ...,  0.01779463,
          0.06369314,  0.00115686],
        [-0.05683396,  0.05229538, -0.05341402, ..., -0.03720862,
          0.03

In [111]:
for i in range(len(embedding_sequences_padded_x)):
    print(embeddings_x[i].shape, embedding_sequences_padded_x[i].shape)

(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(5, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(1, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(5, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(1, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(5, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(1, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(5, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(4, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(1, 15) (5, 15)
(1, 15) (5, 15)
(2, 15) (5, 15)
(3, 15) 

In [112]:
emeddingModel.corpus_total_words,embedding_sequences_padded_x.shape

(1229, (743, 5, 15))

In [113]:
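model = Sequential([
    # First recurrent layer returns the full sequence so it can feed a second GRU.
    GRU(64,activation='relu', input_shape=(5, 15),return_sequences=True),
    Dropout(0.05),
    GRU(32,activation='relu'),
    # One softmax unit per class id; 744 is more than the 460 label values (0..459) actually used.
    Dense(744,activation='softmax')
])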

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

  super().__init__(**kwargs)


In [114]:
embedding_sequences_padded_x.shape,embeddings_y.shape

((743, 5, 15), (743,))

In [119]:
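# No validation data is passed to fit(), so 'val_loss' is never available; monitor the training loss instead.
earlystopping=EarlyStopping(monitor='loss',patience=20,restore_best_weights=True)
model.fit(embedding_sequences_padded_x, embeddings_y , batch_size=32, epochs=200, callbacks=[earlystopping])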

Epoch 1/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8151 - loss: 0.4938
Epoch 2/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8184 - loss: 0.4426
Epoch 3/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8287 - loss: 0.4633
Epoch 4/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8207 - loss: 0.4719
Epoch 5/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8053 - loss: 0.5212
Epoch 6/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8386 - loss: 0.4516
Epoch 7/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8046 - loss: 0.4860
Epoch 8/200
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8251 - loss: 0.4701
Epoch 9/200
24/24 ━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x168ccf2c500>

- Those of us who are backing out our remaining dependencies on digital goods and services are being entirely rational and are likely to survive.
- I say that because the root cause of risk is dependence, and most especially dependence on expectations of system state.
- If I don't use my trademark, then my rights go over to those who use what was and could have remained mine.
- For better or poorer, the only two products not covered by product liability today are religion and software, and software should not escape for much longer.

<br><br>

- There are three professions that beat their practitioners into a state of humility: farming, weather forecasting, and cyber security. I practice two of those, and, as such, let me assure you that the recommendations which follow are presented in all humility.  Humility does not mean timidity.  Rather, it means that when a strongly held belief is proven wrong, that the humble person changes their mind. I expect that my proposals will result in considerable push-back, and changing my mind may well follow.  Though I will say it again later, this speech is me talking for myself.

In [120]:
# Prompts to complete; each is reduced to its last five words before prediction.
prompts=["Whether we are talking about laws","Whether we are talking about","Whether we are talking","Whether we are","Whether we","Whether","like the Digital Millenium","like the Digital"]
for line in prompts:
    print('\n',line)
    words=line.split(' ')
    # Look up the Word2Vec vector for (at most) the last five words.
    embeddings=np.array([emeddingModel.wv[word] for word in words[-5:]])

    if embeddings.shape[0] < 5:  # e.g. shape (3, 15): left-pad with zero rows, matching padding='pre'
        diff=5-embeddings.shape[0]
        arr=np.zeros((diff,15))
        embeddings=np.vstack((arr,embeddings))

    embeddings=embeddings.reshape(1, 5, 15)  # Add a batch axis; the training input had shape (743, 5, 15)

    prediction = model.predict(embeddings)
    top_n = 5
    top_indices = prediction[0].argsort()[-top_n:][::-1]  # five most probable class ids, highest first
    top_words = [(tokenizer.index_word.get(i, "<UNK>"), prediction[0][i]) for i in top_indices]

    print("Top predictions:")
    for word, score in top_words:
        print(f"{word}: {score:.4f}")


 Whether we are talking about laws
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step
Top predictions:
the: 1.0000
are: 0.0000
<UNK>: 0.0000
it: 0.0000
i: 0.0000

 Whether we are talking about
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step
Top predictions:
laws: 1.0000
good: 0.0000
<UNK>: 0.0000
has: 0.0000
a: 0.0000

 Whether we are talking
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step
Top predictions:
about: 0.9994
<UNK>: 0.0006
that: 0.0000
is: 0.0000
professions: 0.0000

 Whether we are
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step
Top predictions:
talking: 0.9693
unappealing: 0.0206
unimportant: 0.0100
three: 0.0001
<UNK>: 0.0000

 Whether we
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step
Top predictions:
are: 0.9941
of: 0.0032
this: 0.0011
my: 0.0005
<UNK>: 0.0004

 Whether
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step
Top predictio

In [117]:
model.input_shape

(None, 5, 15)
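
Since the model expects input of shape `(None, 5, 15)`, every prompt has to be turned into a single `(1, 5, 15)` array before `model.predict` is called. The sketch below shows one way to do that while skipping words that have no vector, which a direct lookup would not tolerate. It is only an illustration: `prompt_to_input` is a hypothetical helper, and the `vectors` dict is a stand-in for the trained embedding lookup.

```python
import numpy as np

def prompt_to_input(prompt, vectors, window=5, dim=15):
    """Turn a prompt into a (1, window, dim) array built from the last
    `window` known words, left-padded with zero rows. Unknown words are skipped."""
    rows = [vectors[w] for w in prompt.split() if w in vectors][-window:]
    out = np.zeros((1, window, dim), dtype="float32")
    if rows:
        out[0, window - len(rows):] = np.asarray(rows, dtype="float32")
    return out

# Toy vectors for illustration only; real ones would come from the Word2Vec model.
vectors = {"Whether": np.full(15, 0.1), "we": np.full(15, 0.2)}
batch = prompt_to_input("Whether we unseenword", vectors)
print(batch.shape)
```

Skipping unknown words is one possible policy; mapping them to a dedicated zero or average vector would be another, and the choice affects what the model sees at each time step.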