# Chapter 3: Learning Distributed Word Embeddings and Using Them for NLP
<a id="Top"></a>

<p>In this notebook, you'll learn to load texts into TensorFlow by converting words to numbers.
You'll learn how to train Latent Semantic Analysis (LSA) representations of words and documents using sklearn's TruncatedSVD class. Next, you'll train distributed word representations, also known as word embeddings, by
building your first TensorFlow neural network for NLP. You'll compare these word embeddings to the similar representations
built with LSA (also known as Latent Semantic Indexing, or LSI). You'll learn how to save your embeddings for re-use, and how to load
pre-trained embeddings that others have trained and published online. Finally, you'll learn how to use pre-trained embeddings
for your first NLP task, categorizing documents. Along the way we'll point out many foundational NLP techniques
that will be helpful as your skills grow.</p>

## Table of Contents
1. [General Imports](#General-Imports)
1. [Learn to Load, Explore, and Preprocess a Text Corpus](#Load-Explore)
1. [Convert Words to Numbers](#Convert-Words-to-Numbers)
1. [Train Latent Semantic Analysis Representations from Corpus](#LSA)
1. [Train Skipgram Network for Word Embeddings](#Skipgrams)
1. [Examine What the Skipgram Network Has Learned](#Examine-What-the-Skipgram-Network-Has-Learned)
1. [Save Trained Embeddings for Later Use](#Save-Trained-Embeddings-for-Later-Use)
1. [Re-load Pre-Trained Embeddings](#Re-load-Pre-Trained-Embeddings)
1. [Putting It All Together: Your First NLP Task Using Embeddings and Deep Learning](#Putting-It-All-Together)

<a id="General-Imports"></a>
## General Imports

In [None]:
import os
import nltk
import sklearn
import numpy as np
from nltk.tokenize import word_tokenize
import tensorflow as tf
import gensim
import spacy

[Top](#Top)

<a id="Load-Explore"></a>
## Learn to Load, Explore, and Preprocess a Text Corpus

We'll use some of our own functions to explore the text of Moby Dick, before we start applying deep learning to it.
Tip: Make sure you've downloaded the NLTK text corpora following the directions at <a href="https://www.nltk.org/data.html">https://www.nltk.org/data.html</a>

### Tour of Toolsets to Prepare Data

#### NLTK

In [None]:
#Let's use the text of Melville's novel Moby Dick as our corpus. We'll load it from the NLTK corpus library.
#Here's what the first four sentences look like:
i = 0
for s in nltk.corpus.gutenberg.sents('melville-moby_dick.txt'):
    i += 1
    print(s)
    if i > 3:
        break

In [None]:
#Find Melville's longest sentence.  Warning: it's long indeed!
longestLen = 0
longest = []
for s in nltk.corpus.gutenberg.sents('melville-moby_dick.txt'):
    thisLen = len(s)
    if thisLen > longestLen:
        longest = s
        longestLen = thisLen
print(longest)
print("Longest sentence length = {}".format(longestLen))

In [None]:
#How about the longest word?
wlen = 0
longest = ''
for w in nltk.corpus.gutenberg.words('melville-moby_dick.txt'):
    if len(w) > wlen:
        longest = w
        wlen = len(w)
print(longest)

In [None]:
#Let's count how many unique words are in Moby Dick.
#We lower-case them first, then remove punctuation and stop words. Here we're using the Python str.lower() method,
#a home-rolled punctuation stripper, and NLTK's English stop-word list to do this normalization. In other NLP tasks you'll do
#additional preprocessing, such as stripping non-ASCII characters, stemming, and PoS tagging.
from nltk import FreqDist
from Chapter_03_utils import isPunctuation #Home-brewed function to test if token is punctuation

stopWords = set(nltk.corpus.stopwords.words('english')) #Build the stop-word set once; set lookups are fast
mobyTokens = nltk.corpus.gutenberg.words('melville-moby_dick.txt')
#Check the lower-cased form against the stop list, so capitalized stop words such as "The" are removed too
mobyDickWords = FreqDist(w.lower() for w in mobyTokens if not isPunctuation(w) and w.lower() not in stopWords)
print("There are {} word tokens (including punctuation) in Moby Dick, of which {} are unique non-stop-word types.".format(len(mobyTokens), len(mobyDickWords)))
print("Most common words in Moby Dick excluding stop words:")
print("-" * 30)
print(mobyDickWords.most_common(10))
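The normalization above (lower-casing, then dropping punctuation and stop words before counting) can be sketched without NLTK. This is a minimal illustration, not the book's code: `is_punctuation` is a hypothetical stand-in for the `isPunctuation` helper in Chapter_03_utils (whose implementation isn't shown here), and `STOP_WORDS` is a tiny hand-picked set rather than NLTK's list.

```python
import string
from collections import Counter

# Hypothetical stand-in for Chapter_03_utils.isPunctuation: a token counts as
# punctuation if every character in it is an ASCII punctuation mark.
def is_punctuation(token):
    return len(token) > 0 and all(ch in string.punctuation for ch in token)

# A tiny illustrative stop-word set (NLTK's English list is much longer).
STOP_WORDS = {"the", "a", "of", "and", "to", "i"}

def normalize_and_count(tokens):
    """Lower-case tokens, drop punctuation and stop words, and count what remains."""
    lowered = (t.lower() for t in tokens)
    return Counter(t for t in lowered if not is_punctuation(t) and t not in STOP_WORDS)

toy_tokens = ["Call", "me", "Ishmael", ".", "The", "whale", ",", "the", "WHALE", "!"]
toy_counts = normalize_and_count(toy_tokens)
print(toy_counts.most_common(2))  # [('whale', 2), ('call', 1)]
```

Lower-casing before the stop-word check is what lets "The" and "the" be treated as the same stop word.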

[Top](#Top)

#### Gensim

In [None]:
#Get the same counts with plain Python: a list of token lists plus a defaultdict (gensim consumes this list-of-lists format next)
stopWords = set(nltk.corpus.stopwords.words('english')) #Build the stop-word set once; set lookups are fast
texts = [[w.lower() for w in s if not isPunctuation(w) and w.lower() not in stopWords]
         for s in nltk.corpus.gutenberg.sents('melville-moby_dick.txt')]
from collections import defaultdict #The nice thing about defaultdict objects is you don't have to initialize their values
frequencies = defaultdict(int)
for text in texts:
    for token in text:
        frequencies[token] += 1

In [None]:
#Take advantage of the heapq module to choose the words with the highest counts
import heapq
print(heapq.nlargest(10, frequencies.items(), key=lambda x : x[1]))

In [None]:
#Take advantage of the gensim Dictionary object to make an efficient mapping between terms and integer ids
gensim_dictionary = gensim.corpora.Dictionary(texts)
print("The dictionary has " + str(len(gensim_dictionary)) + " unique tokens")

#Show the first 10 (term, id) pairs
for (i, (k, v)) in enumerate(gensim_dictionary.token2id.items()):
    if i >= 10:
        break
    print(f'{k:15} {v:10}')
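Conceptually, a term-to-id dictionary like gensim's assigns each new token the next free integer, and can then turn a tokenized document into sparse (id, count) pairs. The sketch below is plain Python for illustration only: gensim's own id-assignment order and its extra features (vocabulary filtering, document frequencies) differ, and `build_token2id` and `doc2bow` here are our own hypothetical helpers, not gensim's methods.

```python
from collections import Counter

def build_token2id(texts):
    """Assign each distinct token an integer id in order of first appearance."""
    token2id = {}
    for text in texts:
        for token in text:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id

def doc2bow(text, token2id):
    """Convert a tokenized document to sorted (id, count) pairs, skipping unknown tokens."""
    counts = Counter(token2id[t] for t in text if t in token2id)
    return sorted(counts.items())

toy_texts = [["call", "me", "ishmael"], ["whale", "ahoy", "whale"]]
toy_token2id = build_token2id(toy_texts)
print(toy_token2id)  # {'call': 0, 'me': 1, 'ishmael': 2, 'whale': 3, 'ahoy': 4}
print(doc2bow(["whale", "me", "whale", "squid"], toy_token2id))  # [(1, 1), (3, 2)]
```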

[Top](#Top)

#### SpaCy

In [None]:
#Peek into spaCy
# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load("en_core_web_sm")

In [None]:
raw_text = nltk.corpus.gutenberg.raw('melville-moby_dick.txt')[:100000]
doc = nlp(raw_text)

# Analyze syntax: examine most frequent verbs
verbFrequencies = defaultdict(int)
for token in doc:
    if token.pos_ == 'VERB':
        verbFrequencies[token.lemma_] += 1
print(heapq.nlargest(10, verbFrequencies.items(), key=lambda x : x[1]))

[Top](#Top)

<a id="Convert-Words-to-Numbers"></a>
## Convert Words to Numbers
<i>This is the first step in preparing the text data to be fed into a neural network or other machine learning model.</i>

In [None]:
#Explore bag-of-words vectors for two sentences near the start of Moby Dick (sentences 2 and 3)
sentences = [s for (i, s) in enumerate(nltk.corpus.gutenberg.sents('melville-moby_dick.txt')) if i in [2, 3]]
print(sentences)

In [None]:
#Tabulate the vocabulary of this mini-corpus so we can use it to create the vectorizer
vocabulary_lists = [[w.lower() for w in s if isPunctuation(w) == False] for s in sentences]
vocabulary = set([item for sublist in vocabulary_lists for item in sublist])
print(vocabulary)

In [None]:
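#Vectorize using binary bag-of-words representations for each document (here, each sentence)
from sklearn.feature_extraction.text import CountVectorizer
#The documents are already tokenized, so the tokenizer returns them unchanged; token_pattern=None avoids an "unused pattern" warning
vectorizer = CountVectorizer(binary=True, tokenizer=lambda doc: doc, preprocessor=None, token_pattern=None, vocabulary=vocabulary, lowercase=False)
tdm = vectorizer.transform(vocabulary_lists)
print(tdm.shape)
print(vectorizer.get_feature_names_out()) #On sklearn < 1.0, use get_feature_names()
print(tdm.toarray()[0])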
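To see what `CountVectorizer(binary=True)` computes, here is a minimal NumPy sketch that builds the same kind of binary document-term matrix for pre-tokenized documents. It sorts the vocabulary alphabetically, which as far as we know matches how sklearn orders a vocabulary passed as a set; treat the column order as an assumption and check `get_feature_names_out()` if it matters. `binary_bow` is a hypothetical helper.

```python
import numpy as np

def binary_bow(docs, vocabulary):
    """Return (sorted_vocab, matrix) where matrix[d, j] is 1 if term j occurs in doc d."""
    vocab = sorted(vocabulary)
    col = {term: j for j, term in enumerate(vocab)}
    matrix = np.zeros((len(docs), len(vocab)), dtype=np.int64)
    for d, doc in enumerate(docs):
        for token in doc:
            if token in col:  # out-of-vocabulary tokens are ignored
                matrix[d, col[token]] = 1
    return vocab, matrix

toy_docs = [["loomings", "call", "me", "ishmael"], ["call", "me", "call"]]
toy_vocab, toy_matrix = binary_bow(toy_docs, {"call", "ishmael", "me"})
print(toy_vocab)   # ['call', 'ishmael', 'me']
print(toy_matrix)
```

Because `binary=True`, the repeated "call" in the second document still yields a 1, not a 2.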

[Top](#Top)

<a id="LSA"></a>
## Train Latent Semantic Analysis Word Representations from Corpus

Taking our inspiration from:
<a href="https://roshansanthosh.wordpress.com/2016/02/18/evaluating-term-and-document-similarity-using-latent-semantic-analysis/">Evaluating Term and Document Similarity Using Latent Semantic Analysis</a>

In [None]:
#Import some relevant packages
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

In [None]:
#Build the vocabulary for the term-by-document matrix: the 10,000 most frequent words (stop words and punctuation excluded)
vocab = [t for (t, f) in mobyDickWords.most_common(10000)]
print(vocab[:10])

In [None]:
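#Create a sparse vectorizer using TF-IDF weights and the vocab list we just created
#(the documents will already be tokenized, so the tokenizer is the identity function)
sparseVectorizer = TfidfVectorizer(vocabulary=vocab, tokenizer=lambda doc: doc, token_pattern=None, use_idf=True, lowercase=False)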

In [None]:
#Start transforming Moby Dick from raw form into a corpus of documents we can feed into the TruncatedSVD
# fetch a list of sentences
sentences = [s for s in nltk.corpus.gutenberg.sents('melville-moby_dick.txt')]

In [None]:
# Preprocess: lowercase and remove punctuation
docs = [[w.lower() for w in s if isPunctuation(w) == False] for s in sentences]
print(docs[:2])

In [None]:
#Create the document-by-term matrix from the list of pre-processed sentences (one row per sentence, one column per term)
X = sparseVectorizer.fit_transform(docs)
print(X.shape)

In [None]:
#Import sklearn's make_pipeline helper and Normalizer class, then chain truncated SVD (100 dimensions) with L2 row normalization
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer
svd = TruncatedSVD(n_components = 100, algorithm="arpack")
lsa = make_pipeline(svd, Normalizer(copy=False))

In [None]:
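#Reduce the dimensionality of the sparse matrix using LSA.
#We transpose X so that each row is a term; lsa_X then holds one 100-dimensional, unit-length vector per term
lsa_X = lsa.fit_transform(X.T)
print(X.T.shape)
print(lsa_X.shape)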
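Under the hood, LSA factors the term-by-document matrix with a truncated SVD, X ≈ U_k Σ_k V_kᵀ, and represents each term by its row of U_k Σ_k (this is what `TruncatedSVD.fit_transform` returns); the pipeline then L2-normalizes each row. A minimal NumPy sketch on a toy dense matrix. It computes a full SVD and keeps the top k components, whereas sklearn's arpack solver avoids the full decomposition; `lsa_term_vectors` is a hypothetical helper.

```python
import numpy as np

def lsa_term_vectors(term_doc, k):
    """Return unit-length k-dimensional LSA vectors, one per row (term) of term_doc."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    vectors = U[:, :k] * s[:k]                        # U_k Sigma_k: scale each column by its singular value
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.where(norms == 0, 1, norms)   # L2-normalize rows, guarding against zero rows

# Toy term-by-document counts: 4 terms x 3 documents.
# Terms 0 and 1 co-occur in documents 0 and 2; terms 2 and 3 only in document 1.
toy_term_doc = np.array([[2., 0., 1.],
                         [1., 0., 1.],
                         [0., 3., 0.],
                         [0., 2., 0.]])
toy_vecs = lsa_term_vectors(toy_term_doc, k=2)
print(toy_vecs.shape)                       # (4, 2)
print(np.round(toy_vecs @ toy_vecs.T, 2))   # cosine similarities between terms
```

Terms that occur in the same documents end up pointing in the same direction, which is the effect the `kClosestTerms` queries below exploit.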

In [None]:
def getClosestTerm(term, transformer, model):
    '''Return the single term whose LSA vector is most similar to that of `term`.'''
    index = transformer.vocabulary_[term]
    #The rows of model are L2-normalized, so dot products are cosine similarities.
    #Compare one row against all rows instead of building the full vocab x vocab matrix.
    sims = model @ model[index]
    sims[index] = -np.inf #Exclude the term itself
    return transformer.get_feature_names_out()[np.argmax(sims)]
 

In [None]:
def kClosestTerms(k, term, transformer, model):
    '''Return the k terms whose LSA vectors are most similar to that of `term`, most similar first.'''
    index = transformer.vocabulary_[term]
    sims = model @ model[index] #Cosine similarities, since the rows of model are L2-normalized
    sims[index] = -np.inf #Exclude the term itself
    featureNames = transformer.get_feature_names_out()
    return [featureNames[i] for i in np.argsort(-sims)[:k]]
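The two functions above rely on the rows of lsa_X already being unit length (the Normalizer step), so a dot product equals a cosine similarity. The skipgram embeddings trained later in this notebook are not normalized, so a helper that normalizes on the fly works for either case. A sketch with toy 2-D vectors; `k_nearest` is our own hypothetical helper, and it uses `np.argpartition` to avoid fully sorting a large vocabulary.

```python
import numpy as np

def k_nearest(vectors, terms, query, k):
    """Return the k terms whose rows have the highest cosine similarity to the query's row.

    terms[i] names row i of vectors; rows need not be normalized.
    """
    index = terms.index(query)
    k = min(k, len(terms) - 1)
    norms = np.linalg.norm(vectors, axis=1)
    sims = (vectors @ vectors[index]) / (np.where(norms == 0, 1, norms) * max(norms[index], 1e-12))
    sims[index] = -np.inf                     # never return the query itself
    top = np.argpartition(-sims, k)[:k]       # the k most similar, in arbitrary order
    top = top[np.argsort(-sims[top])]         # order just those k by similarity
    return [terms[i] for i in top]

toy_terms = ["ahab", "captain", "whale", "sea"]
toy_vectors = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]])
print(k_nearest(toy_vectors, toy_terms, "ahab", 2))  # ['captain', 'sea']
```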

In [None]:
#What has the LSA model learned about Ahab?
kClosestTerms(8, 'ahab', sparseVectorizer, lsa_X)

In [None]:
print([d for d in docs if 'gritted' in d])

In [None]:
#What has the LSA model learned about whales?
kClosestTerms(8, 'whale', sparseVectorizer, lsa_X)

In [None]:
print([d for d in docs if 'cachalot' in d])

[Top](#Top)

<a id="Skipgrams"></a>
## Train Skipgram Network for Word Embeddings
<i>Thanks to the <a href="https://adventuresinmachinelearning.com/word2vec-keras-tutorial/">Adventures in Machine Learning blog</a> for inspiring this section.</i>

### Specialized Imports

In [None]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Reshape, dot
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.sequence import skipgrams
from tensorflow.keras.preprocessing import sequence
from Chapter_03_utils import build_dataset
from nltk import FreqDist, word_tokenize
from nltk.corpus import gutenberg #Corpus readers for various literary texts
from Chapter_03_utils import isPunctuation #Home-brewed function to test if token is punctuation
from Chapter_03_utils import SimilarityCallback

### Constants and Magic Numbers
To be used throughout the embeddings network training

In [None]:
vocabSize = 2000 #How many words to include as input: the n most frequent
windowSize = 3 #How many context words on either side of the target word to consider when choosing skipgram pairs
vectorDim = 300 #Number of dimensions in the embedding space
epochs = 1000 #Number of training iterations; each one trains on a single randomly chosen skipgram pair
evalSetSize = 16     # Size of the random set of words to evaluate similarity on
evalSetWindow = 100  # Only pick similarity evaluation samples from the head of the distribution (frequent terms)

In [None]:
#Function to build lexical resources for deep learning network
def collect_data(corpusName, vocabulary_size=10000):
    '''
    learn and return a list of integer codes corresponding to the words of the text,
    the term frequencies, and a regular and reverse dictionary of terms and integer codes.
    Modified from https://adventuresinmachinelearning.com/word2vec-keras-tutorial/
    '''
    #Retrieve the requested text from the NLTK corpora collections
    words = [w.lower() for w in nltk.corpus.gutenberg.words(corpusName) if isPunctuation(w) == False]
    print(words[:7])
    data, count, dictionary, reverse_dictionary = build_dataset(words,
                                                                vocabulary_size)
    del words  # Hint to reduce memory.
    return data, count, dictionary, reverse_dictionary

[Top](#Top)

### Collect data from corpus, encode, build lexical resources

In [None]:
#Convert the raw text data into a list of integer codes, a frequency distribution of the terms, and a
# dictionary and reverse_dictionary of terms:integer_codes
data, count, dictionary, reverse_dictionary = collect_data('melville-moby_dick.txt', vocabulary_size=vocabSize)
print(data[:7])

In [None]:
#Choose n number of indices from the top m most frequent terms to use to test similarity at different points in the training
evalExamples = np.random.choice(evalSetWindow, evalSetSize, replace=False)

[Top](#Top)

### Sample pairs of term-integer codes as skipgrams (target term, context term)

In [None]:
#Now build the list of skipgrams to use for training, both positive and negative examples.
#The sampling table gives each word rank a probability of being sampled, so very frequent words are down-sampled
sampling_table = sequence.make_sampling_table(vocabSize)
skipgramPairs, labels = skipgrams(data, vocabSize, window_size=windowSize, sampling_table=sampling_table)
word_target, word_context = zip(*skipgramPairs)
word_target = np.array(word_target, dtype="int32")
word_context = np.array(word_context, dtype="int32")

print(skipgramPairs[:10], labels[:10])
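The positive examples that `skipgrams` produces are (target, context) pairs within `windowSize` positions of each other; negative examples pair a target with a randomly drawn word id. Here is a minimal pure-Python sketch of that idea. It is not Keras's implementation: it ignores the sampling table, draws one uniform random negative per positive, and skips index 0, which we assume is reserved for unknown words. `simple_skipgrams` is a hypothetical helper.

```python
import random

def simple_skipgrams(sequence, vocab_size, window_size, seed=0):
    """Return (pairs, labels): label 1 for real (target, context) pairs, 0 for random negatives."""
    rng = random.Random(seed)
    positives = []
    for i, target in enumerate(sequence):
        if target == 0:  # assumed reserved for unknown words
            continue
        lo, hi = max(0, i - window_size), min(len(sequence), i + window_size + 1)
        for j in range(lo, hi):
            if j != i and sequence[j] != 0:
                positives.append((target, sequence[j]))
    # One negative per positive: keep the target, draw a random context id in [1, vocab_size - 1].
    # A random draw can occasionally hit a true context word; this sketch does not filter that out.
    negatives = [(t, rng.randint(1, vocab_size - 1)) for t, _ in positives]
    return positives + negatives, [1] * len(positives) + [0] * len(negatives)

toy_pairs, toy_labels = simple_skipgrams([5, 2, 7, 0, 3], vocab_size=10, window_size=1)
print(toy_pairs[:4], toy_labels[:4])  # [(5, 2), (2, 5), (2, 7), (7, 2)] [1, 1, 1, 1]
```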

In [None]:
#Examine the distribution of 1s and 0s in the skipgram pairs
from collections import Counter
print(Counter(labels).keys())
print(Counter(labels).values())

In [None]:
#Substitute the integer codes printed in the cell above (they change on every run, because skipgram sampling is random)
print(reverse_dictionary[25])
print(reverse_dictionary[883])

In [None]:
print(reverse_dictionary[141])
print(reverse_dictionary[38])

In [None]:
#Reassure yourself that the labeling of the skipgrams is accurate
from Chapter_03_utils import is_Sublist
for s in nltk.corpus.gutenberg.sents('melville-moby_dick.txt'):
    if is_Sublist(s, ['they', 'look'])==True:
        print(s)

[Top](#Top)

### Define and build input layers using tf.keras functional API

<p>We're doing several tricky things here. First, we're using the tf.keras functional API to define our model, because it has two separate input layers, so the standard Sequential() API won't work for us. We instantiate each layer and then call it, like a function, on the layer (or list of layers) feeding it.</p>
<p>Second, we define a separate model, the similarityModel, which we don't compile or train on its own. It shares its layers with the training model, so we just hook it up to our SimilarityCallback function and use it to test word similarity at different points during training, to see how much our network has learned.</p>
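To make the two models concrete, here is what they compute for a single (target, context) pair, sketched in NumPy with a random toy embedding matrix. The learning model takes the raw dot product of the two embedding rows and passes it through a one-unit Dense layer with a sigmoid; the similarity model takes the normalized dot product, i.e. the cosine similarity. The Dense weight `w` and bias `b` are placeholder values, not trained ones, and the function names are our own.

```python
import numpy as np

toy_rng = np.random.default_rng(0)
toy_E = toy_rng.normal(size=(10, 4))  # toy shared embedding matrix: one 4-d row per word id

def training_output(E, target_id, context_id, w=1.0, b=0.0):
    """sigmoid(w * dot(e_t, e_c) + b): predicted probability that the pair is a real skipgram.

    w and b stand in for the 1-unit Dense layer's weight and bias (placeholders, not trained).
    """
    score = E[target_id] @ E[context_id]
    return 1.0 / (1.0 + np.exp(-(w * score + b)))

def cosine_similarity(E, target_id, context_id):
    """What dot(..., normalize=True) computes: the cosine of the angle between two embeddings."""
    t, c = E[target_id], E[context_id]
    return (t @ c) / (np.linalg.norm(t) * np.linalg.norm(c))

def binary_crossentropy(label, prob):
    """Loss for one pair with label 1 (real skipgram) or 0 (negative sample)."""
    return -(label * np.log(prob) + (1 - label) * np.log(1 - prob))

toy_p = training_output(toy_E, 3, 7)
print(toy_p, cosine_similarity(toy_E, 3, 7), binary_crossentropy(1, toy_p))
```

Training pushes the sigmoid output toward 1 for real pairs and toward 0 for negative samples, which in turn moves the shared embedding rows; the similarity model simply reads those rows.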

In [None]:
# create 2 input layers with one input node each, for the skipgram target and context word codes
input_target = Input((1,), name="input_target")
input_context = Input((1,), name="input_context")

In [None]:
#Create the embedding layer and two lookup layers for target and context
embedding = Embedding(vocabSize, vectorDim, input_length=1, name='embedding')
target = embedding(input_target)
reshaped_target = Reshape((vectorDim, 1), name="reshaped_target_embedding")(target)
context = embedding(input_context)
reshaped_context = Reshape((vectorDim, 1), name="reshaped_context_embedding")(context)

In [None]:
# Define cosine similarity operation which will be output in a secondary model
similarity = dot([reshaped_target, reshaped_context], axes=1, normalize=True)

In [None]:
# Create dot product layer for main model to get a similarity measure between target embedding and context embedding
dot_product = dot([reshaped_target, reshaped_context], axes=1, normalize=False, name="dot_product")
reshaped_dot_product = Reshape((1,), name="reshaped_dot_product")(dot_product)

In [None]:
# add the sigmoid output layer
output = Dense(1, activation='sigmoid', name="output")(reshaped_dot_product)

In [None]:
# create the primary training model
learningModel = Model(inputs=[input_target, input_context], outputs=output, name="learningModel")
learningModel.compile(loss='binary_crossentropy', optimizer='adam')

In [None]:
# create a secondary validation model to run our similarity checks during training
similarityModel = Model(inputs=[input_target, input_context], outputs=similarity, name="similarityModel")
sim_cb = SimilarityCallback()

In [None]:
learningModel.summary()

In [None]:
#Tip: Make sure you install pydot and graphviz before attempting this step
from tensorflow.keras.utils import plot_model
plot_model(learningModel, to_file='model.png')

[Top](#Top)

### Train Network with Skipgram Samples from Corpus

In [None]:
#Finally, train the learning model, evaluating word similarity at specified epochs
history = [] #Store loss for plotting
arr_1 = np.zeros((1,))
arr_2 = np.zeros((1,))
arr_3 = np.zeros((1,))
for cnt in range(epochs):
       
    idx = np.random.randint(0, len(labels)) #The upper bound is exclusive, so every pair can be chosen
    arr_1[0,] = word_target[idx]
    arr_2[0,] = word_context[idx]
    arr_3[0,] = labels[idx]

    loss = learningModel.train_on_batch([arr_1, arr_2], arr_3)
  
    if cnt % 100 == 0:
        print("Iteration {}, loss={}".format(cnt, loss))
        history.append((cnt, loss))
    #Test what the model has learned at beginning and end of training    
    if cnt in [0, (epochs - 1)]:
        for testWord in ['ahab', 'whale', 'harpoon', 'boy', 'coffin']:
            sim_cb.probe_word(testWord, dictionary, reverse_dictionary, vocabSize, similarityModel)
        sim_cb.run_sim(evalSetSize, evalExamples, reverse_dictionary, vocabSize, similarityModel)
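The loop above trains on one randomly chosen pair per call, which keeps the code simple but makes each step noisy and slow. A common alternative, not used in this notebook, is to draw a mini-batch of indices at once. A sketch assuming NumPy's Generator API; `sample_batch` is our own hypothetical helper, and its outputs are meant to be passed to `train_on_batch` the same way as arr_1, arr_2, and arr_3 above.

```python
import numpy as np

def sample_batch(word_target, word_context, labels, batch_size, rng):
    """Draw batch_size random skipgram examples (with replacement), shaped for train_on_batch."""
    idx = rng.integers(0, len(labels), size=batch_size)  # the upper bound is exclusive
    return [word_target[idx], word_context[idx]], np.asarray(labels)[idx]

toy_rng = np.random.default_rng(0)
toy_word_target = np.array([5, 2, 2, 7], dtype="int32")
toy_word_context = np.array([2, 5, 7, 2], dtype="int32")
toy_batch_x, toy_batch_y = sample_batch(toy_word_target, toy_word_context, [1, 1, 0, 0],
                                        batch_size=3, rng=toy_rng)
print(toy_batch_x, toy_batch_y)
```

Each sampled row keeps its target, context, and label together, because all three arrays are indexed with the same `idx`.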

[Top](#Top)

<a id="Examine-What-the-Skipgram-Network-Has-Learned"></a>
## Examine What the Skipgram Network Has Learned

In [None]:
# Plot the training loss values (one was logged every 100 iterations)

import matplotlib.pyplot as plt
N = 3 #Size of the moving-average window, in logged points (kept small because only epochs/100 points are logged)
y_losses = [l for (e,l) in history]
x_epochs = [e for (e,l) in history]
loss_line = plt.plot(x_epochs, y_losses)
#Also plot the moving average, to show the trend more clearly.
#A 'valid' convolution yields len(y_losses) - N + 1 points; align each with the last iteration in its window
ma_losses = np.convolve(y_losses, np.ones((N,))/N, mode='valid')
ma_epochs = x_epochs[(N - 1):]
mae_line = plt.plot(ma_epochs, ma_losses)
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Iteration')
plt.legend(labels = ['Loss', 'Moving Average Loss'])
#plt.show()
plt.savefig('loss_plot.png')

[Top](#Top)

<a id="Save-Trained-Embeddings-for-Later-Use"></a>
## Save Trained Embeddings for Later Use

In [None]:
#Layer.get_weights() returns a list of numpy ndarrays containing the weights
embedding_weights = embedding.get_weights()[0]
print(type(embedding_weights))
print(embedding_weights.shape)
print("First value in weights matrix = {}.".format(embedding_weights[0][0]))

In [None]:
#Write the weights array to file
filename = 'moby_weights.csv'
np.savetxt(filename, embedding_weights, delimiter=",")

In [None]:
#Let's examine the size of the file to see what it contains
def file_len(fname):
    '''Count the lines in a text file.'''
    with open(fname) as f:
        return sum(1 for _ in f)

size = os.path.getsize(filename)
print("Weights file is {} bytes.".format(size))
rows = file_len(filename)
print("Weights file has {} rows.".format(rows))
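Because `np.savetxt` writes each float as text, the CSV file is much larger than the raw array. If you don't need to read the weights by eye, `np.save` and `np.load` store the exact binary array in a .npy file. A small sketch comparing the two round trips, using a random toy array as a stand-in for embedding_weights:

```python
import os
import tempfile
import numpy as np

toy_weights = np.random.default_rng(42).normal(size=(50, 3))  # toy stand-in for embedding_weights

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "weights.csv")
    npy_path = os.path.join(tmp_dir, "weights.npy")

    np.savetxt(csv_path, toy_weights, delimiter=",")  # text: one row per word, human-readable
    np.save(npy_path, toy_weights)                    # binary .npy: exact values, compact

    toy_from_csv = np.loadtxt(csv_path, delimiter=",")
    toy_from_npy = np.load(npy_path)
    csv_bytes, npy_bytes = os.path.getsize(csv_path), os.path.getsize(npy_path)

print(toy_from_csv.shape, csv_bytes, npy_bytes)
```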

[Top](#Top)

<a id="Re-load-Pre-Trained-Embeddings"></a>
## Re-load Pre-Trained Embeddings

In [None]:
with open(filename, 'rt') as fh:
    new_weights = np.loadtxt(fh, delimiter=",")
print(new_weights.shape)
print(new_weights[0][0])
embedding.set_weights([new_weights]) #set_weights() expects a list of ndarrays
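One limitation of the CSV file is that it stores only the numbers: to know which row belongs to which word, you also need reverse_dictionary. The word2vec text format, which gensim's `KeyedVectors.load_word2vec_format` reads with `binary=False`, keeps each word next to its vector on one line, after a header giving the vocabulary size and dimension. A minimal sketch of a writer and matching reader; `save_word2vec_text` and `load_word2vec_text` are hypothetical helpers, and the sketch assumes no word contains a space.

```python
import io
import numpy as np

def save_word2vec_text(fh, weights, index_to_word):
    """Write a 'count dim' header, then one 'word v1 ... vd' line per row of weights.

    index_to_word maps row index -> word (for example, the notebook's reverse_dictionary).
    """
    count, dim = weights.shape
    fh.write(f"{count} {dim}\n")
    for i in range(count):
        values = " ".join(repr(float(x)) for x in weights[i])  # repr round-trips floats exactly
        fh.write(f"{index_to_word[i]} {values}\n")

def load_word2vec_text(fh):
    """Read the format written above back into a {word: vector} dict."""
    count, dim = map(int, fh.readline().split())
    vectors = {}
    for _ in range(count):
        parts = fh.readline().rstrip("\n").split(" ")
        vectors[parts[0]] = np.array([float(x) for x in parts[1:]])
    return vectors

toy_buf = io.StringIO()
save_word2vec_text(toy_buf, np.array([[0.5, -1.25], [2.0, 0.0]]), {0: "unk", 1: "whale"})
print(toy_buf.getvalue())
toy_buf.seek(0)
toy_loaded = load_word2vec_text(toy_buf)
print(toy_loaded["whale"])
```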

[Top](#Top)

<a id="Putting-It-All-Together"></a>
## Putting It All Together: Your First NLP Task Using Deep Learning and TensorFlow

### Load Corpus of Categorized Documents

In [None]:
#We will use the pre-labeled newsgroups data provided by sklearn
from sklearn.datasets import fetch_20newsgroups
categories = ['talk.religion.misc','comp.graphics']

In [None]:
#Download some labeled newsgroup postings
dataset_train = fetch_20newsgroups(subset='train', categories=categories,
                             shuffle=True, random_state=42)
dataset_test = fetch_20newsgroups(subset='test', categories=categories,
                             shuffle=True, random_state=42)

In [None]:
#The dataset is an object which contains a data member (list of strings)
#and a target member (array of integer category codes)
print(Counter(dataset_train.target).keys())
print(Counter(dataset_train.target).values())
print(dataset_train.target_names)

In [None]:
#Make binary labels: 1 for the reference category (talk.religion.misc), 0 for the other category
ref = dataset_train.target_names.index('talk.religion.misc')

def code_ref(value):
    if value == ref:
        return 1
    else:
        return 0
    
train_y = list(map(code_ref, dataset_train.target))
test_y = list(map(code_ref, dataset_test.target))
train_y = np.asarray(train_y)
test_y = np.asarray(test_y)
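The `code_ref` mapping above can also be written as a single vectorized NumPy comparison. A sketch on toy category codes (the `toy_` arrays are illustrative, not the newsgroups data):

```python
import numpy as np

toy_categories = np.array([0, 1, 1, 0, 1])      # toy integer category codes
toy_ref = 1                                     # code of the reference category
toy_binary = (toy_categories == toy_ref).astype(int)  # 1 for the reference category, 0 otherwise
print(toy_binary)  # [0 1 1 0 1]
```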

In [None]:
#Here's an example of what the data look like...
print(dataset_train.data[0])

[Top](#Top)

### Load Pre-Trained Embeddings Using Gensim and Homegrown Utilities

In [None]:
#This step takes a few minutes, and assumes the GoogleNews vectors file has already been downloaded to ./data
word_vectors = gensim.models.KeyedVectors.load_word2vec_format('./data/GoogleNews-vectors-negative300.bin.gz', binary=True) 

In [None]:
words2ints = {w:(i+1) for (i, w) in enumerate(word_vectors.index_to_key)} #gensim >= 4.0; on gensim 3.x use word_vectors.index2word

In [None]:
ints2words = {(i+1):w for (i, w) in enumerate(word_vectors.index_to_key)}

[Top](#Top)

### Extract the keyed vectors into an array of weights
<i>This model is a simplified version of the one found <a href="https://keras.io/examples/pretrained_word_embeddings/">here in the keras.io docs.</a></i>

In [None]:
#Hyperparameters
EMBEDDING_DIM = 300
MAX_NUM_WORDS = 2000
MAX_SEQUENCE_LENGTH = 300
RAW_FEATURES = 2000 #Total number of raw words to include in tokenization

In [None]:
#Get the vocabulary of words in the posts, from both the train and test sets
tokens = []
for post in dataset_train.data + dataset_test.data:
    for word in word_tokenize(post):
        tok = word.lower()
        if not isPunctuation(tok):
            tokens.append(tok)

postsVocab = FreqDist(tokens)
postsTerms = [w for (w, f) in postsVocab.most_common(RAW_FEATURES)]

In [None]:
## prepare embedding matrix: row i holds the pre-trained vector for the i-th most frequent term
## (this row order must match the integer codes produced by the encoder built below)
num_words = min(MAX_NUM_WORDS, len(postsVocab))
embedding_matrix = np.zeros(((num_words + 1), EMBEDDING_DIM))
for (i, (w, f)) in enumerate(postsVocab.most_common(num_words)):
    if w in words2ints:
        embedding_matrix[i] = word_vectors.get_vector(w)
    # words not found in the pre-trained vocabulary keep an all-zeros row
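A common convention, assumed here rather than taken from Chapter_03_utils, is to reserve integer 0 for padding and number real terms from 1, so that row 0 of the embedding matrix stays all zeros and padded positions contribute nothing. A minimal sketch with a toy dict of vectors in place of the GoogleNews KeyedVectors; `build_embedding_matrix` is a hypothetical helper.

```python
import numpy as np

def build_embedding_matrix(terms_by_frequency, pretrained, dim):
    """Row 0 is reserved for padding; term k (0-based, most frequent first) goes in row k + 1.

    Terms missing from `pretrained` keep all-zero rows. Returns (matrix, term_to_code).
    """
    term_to_code = {t: k + 1 for k, t in enumerate(terms_by_frequency)}
    matrix = np.zeros((len(terms_by_frequency) + 1, dim))
    for term, code in term_to_code.items():
        if term in pretrained:
            matrix[code] = pretrained[term]
    return matrix, term_to_code

toy_pretrained = {"god": np.array([1.0, 0.0]), "image": np.array([0.0, 1.0])}  # toy vectors
toy_emb_matrix, toy_term_codes = build_embedding_matrix(["the", "god", "image"], toy_pretrained, dim=2)
print(toy_term_codes)  # {'the': 1, 'god': 2, 'image': 3}
print(toy_emb_matrix)
```

Returning the code mapping together with the matrix keeps the encoder and the embedding rows consistent by construction.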

In [None]:
from Chapter_03_utils import IntEncoder, terms2ints, ints2terms
postsTermsDict = terms2ints(postsTerms)
postsReverseTermsDict = ints2terms(postsTermsDict)
enc = IntEncoder(postsTermsDict, postsReverseTermsDict)

[Top](#Top)

### Encode Training and Test Data

In [None]:
sequences_train = []
for post in dataset_train.data:
    tokens = [enc.lookupCode(t) for t in [tok.lower() for tok in word_tokenize(post) if isPunctuation(tok) == False]]
    sequences_train.append(tokens)

In [None]:
sequences_test = []
for post in dataset_test.data:
    tokens = [enc.lookupCode(t) for t in [tok.lower() for tok in word_tokenize(post) if isPunctuation(tok) == False]]
    sequences_test.append(tokens)

In [None]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
train_X = pad_sequences(sequences_train, maxlen = MAX_SEQUENCE_LENGTH, padding='post', truncating='post')
test_X = pad_sequences(sequences_test, maxlen = MAX_SEQUENCE_LENGTH, padding='post', truncating='post')
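With `padding='post'` and `truncating='post'`, `pad_sequences` keeps the first MAX_SEQUENCE_LENGTH codes of each post and fills shorter posts with zeros at the end. A pure-Python sketch of that behavior; `pad_post` is our own illustrative helper, and the real function also supports 'pre' options and returns a NumPy array.

```python
def pad_post(sequences, maxlen, value=0):
    """Keep the first maxlen codes of each sequence, then pad at the end with `value`."""
    return [list(seq[:maxlen]) + [value] * (maxlen - len(seq[:maxlen])) for seq in sequences]

print(pad_post([[5, 3], [1, 2, 3, 4, 5]], maxlen=4))  # [[5, 3, 0, 0], [1, 2, 3, 4]]
```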

[Top](#Top)

### Define Network Architecture

In [None]:
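#This time we'll use the keras Sequential API to build a network that categorizes the postings:
# it's a simple stack of layers with a single input, so Sequential is enough
from tensorflow.keras.models import Sequential
catModel = Sequential()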

In [None]:
#Make instance of embeddings layer 
from tensorflow.keras.initializers import Constant
embedding_layer = Embedding(num_words + 1,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)

In [None]:
from tensorflow.keras.layers import Lambda

sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
catModel.add(sequence_input)
catModel.add(embedding_layer)
mean = Lambda(lambda x: tf.keras.backend.mean(x, axis=1)) #Average the word vectors over all MAX_SEQUENCE_LENGTH positions, padding included
catModel.add(mean)
h1 = Dense(128, activation='relu')
catModel.add(h1)
h2 = Dense(16, activation='relu')
catModel.add(h2)
output = Dense(1, activation='sigmoid')
catModel.add(output)
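The Lambda layer averages the embedding vectors over all MAX_SEQUENCE_LENGTH positions, so in a short post much of the average comes from padding positions. A NumPy sketch of the difference between that plain mean and a masked mean over real tokens only, assuming code 0 marks padding. `masked_mean` is a hypothetical helper, not part of the model above; whether masking helps on this task is something to try, not a given.

```python
import numpy as np

def masked_mean(embedded, codes, pad_value=0):
    """Average word vectors over real tokens only.

    embedded: (batch, seq_len, dim) embedding lookups; codes: (batch, seq_len) integer codes.
    """
    mask = (codes != pad_value).astype(embedded.dtype)[..., None]  # (batch, seq_len, 1)
    totals = (embedded * mask).sum(axis=1)                          # (batch, dim)
    counts = np.maximum(mask.sum(axis=1), 1.0)                      # avoid dividing by zero
    return totals / counts

toy_codes = np.array([[2, 3, 0, 0]])
toy_embedded = np.array([[[1.0, 1.0], [3.0, 5.0], [0.0, 0.0], [0.0, 0.0]]])
print(toy_embedded.mean(axis=1))              # plain mean, diluted by padding: values 1.0 and 1.5
print(masked_mean(toy_embedded, toy_codes))   # mean over the 2 real tokens: values 2.0 and 3.0
```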

In [None]:
catModel.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['acc'])

In [None]:
catModel.summary()

In [None]:
history = catModel.fit(train_X, train_y,
                       batch_size=128,
                       epochs=50,
                       validation_data=(test_X, test_y))

In [None]:
prediction = catModel.predict(test_X)

In [None]:
results = zip(prediction, test_y)

In [None]:
i = 0
for (yhat, y) in results:
    i += 1
    print("{} | {}".format(yhat, y))
    if i > 20:
        break
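Each prediction is the sigmoid output, read as the probability that a post belongs to talk.religion.misc. To turn these into class decisions you pick a threshold; as far as we know, the 'acc' metric in this binary setup uses 0.5. A sketch computing accuracy that way; `accuracy_at_threshold` is a hypothetical helper.

```python
import numpy as np

def accuracy_at_threshold(probabilities, labels, threshold=0.5):
    """Fraction of examples where (probability > threshold) agrees with the 0/1 label."""
    predicted = (np.ravel(probabilities) > threshold).astype(int)
    return float(np.mean(predicted == np.ravel(labels)))

toy_probs = np.array([[0.9], [0.2], [0.6], [0.4]])  # shaped like catModel.predict output: (n, 1)
toy_true = np.array([1, 0, 0, 0])
print(accuracy_at_threshold(toy_probs, toy_true))  # 0.75
```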
    

In [None]:
#Plot and examine the history

In [None]:
#Plot and discuss the confusion matrix.
# Peace with honor
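For the confusion matrix, `sklearn.metrics.confusion_matrix(test_y, predicted_classes)` gives the counts directly. The NumPy sketch below shows what it computes for two classes: rows are true classes and columns are predicted classes, the same layout sklearn uses. `binary_confusion_matrix` is our own helper, shown on toy labels.

```python
import numpy as np

def binary_confusion_matrix(labels, predicted):
    """2x2 counts: rows are true classes (0, 1), columns are predicted classes (0, 1)."""
    labels = np.ravel(labels).astype(int)
    predicted = np.ravel(predicted).astype(int)
    matrix = np.zeros((2, 2), dtype=int)
    np.add.at(matrix, (labels, predicted), 1)  # add 1 at each (true, predicted) cell, repeats included
    return matrix

toy_cm = binary_confusion_matrix([1, 0, 0, 0, 1], [1, 0, 1, 0, 0])
print(toy_cm)
```

The off-diagonal cells are the errors: row 0, column 1 counts false positives, and row 1, column 0 counts false negatives.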

[Top](#Top)