**Simple Neural Network using a Custom (hardcoded) dataset**
- Step 1) Load the corpus (user input, hardcoded here) that you want predictions for from the model trained on the IMDB dataset.
- Step 2) Create the one-hot representation for the entire corpus
- Step 3) Pad the sentences (from the corpus) with 0's wherever a sentence has fewer words than max_words_in_sentence (8 here)
- Step 4) Create the Simple Neural Network model (a single Embedding layer)
- Step 5) Do prediction




In [7]:
# Imports
from tensorflow.keras.layers import Embedding
from tensorflow.keras.utils import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.text import one_hot

import numpy as np

In [8]:
# Step 1: Load the corpus (user input, hardcoded here) that you want predictions for from the model trained on the IMDB dataset.

# maximum_vocabulary_size = the maximum number of distinct words (indices) the vocabulary can hold.
# - This is a hyperparameter; change it as required.
# - If your dataset has only 2,000 unique words, setting 10,000 just allocates extra, unused capacity.
# - Common ranges:
#   - Small dataset → 5k–10k
#   - Medium → 20k–50k
#   - Large → 100k+

# max_words_in_sentence = the maximum number of words allowed in a sentence.
# - This is also a hyperparameter.
# - Most of the sentences below have 4 to 5 words.
# - Sentences with fewer than max_words_in_sentence (8) words are padded with 0's.

# max_size_feature_matrix_dimension = the embedding dimension, i.e. how many numbers (features) describe each word.
# - One-hot encoding (OHE) also converts text to numbers, but every word becomes a sparse vector as long as the
#   vocabulary (10,000 values, all 0 except a single 1). This is wasteful, carries no notion of similarity between
#   words, and the high-dimensional sparse input can contribute to overfitting.
# - In this example we use an embedding instead: each word index is mapped to a dense vector of
#   max_size_feature_matrix_dimension (10) real numbers.
# - These vectors are the rows of a weight matrix of shape (maximum_vocabulary_size, max_size_feature_matrix_dimension)
#   held by the Keras Embedding layer, which is built into TensorFlow. The rows start out random and are learned during training.

maximum_vocabulary_size = 10000
max_words_in_sentence = 8
max_size_feature_matrix_dimension = 10
sentences = [
    'the glass of milk',
    'the glass of juice',
    'the cup of tea',
    'I am a good boy',
    'I am a good developer',
    'understand the meaning of words',
    'your videos are good',
]
sentences

['the glass of milk',
 'the glass of juice',
 'the cup of tea',
 'I am a good boy',
 'I am a good developer',
 'understand the meaning of words',
 'your videos are good']
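The hyperparameters above are picked by hand. One quick sanity check is to count the distinct words and the longest sentence in the corpus, and to compare how many numbers a one-hot encoding of one padded sentence needs against the dense embedding. The sketch below is plain Python (no TensorFlow); it assumes simple lowercase-and-split tokenisation, which approximates but is not identical to what Keras does, and the variable names `tokenized`, `unique_words`, `longest_sentence`, `one_hot_values` and `embedding_values` are illustrative only.

```python
# Same hardcoded corpus and hyperparameters as in the cell above.
sentences = [
    'the glass of milk',
    'the glass of juice',
    'the cup of tea',
    'I am a good boy',
    'I am a good developer',
    'understand the meaning of words',
    'your videos are good',
]
maximum_vocabulary_size = 10000
max_words_in_sentence = 8
max_size_feature_matrix_dimension = 10

# Assumption: lowercase + whitespace split (a simplification of Keras' tokenizer).
tokenized = [s.lower().split() for s in sentences]

unique_words = {word for words in tokenized for word in words}
longest_sentence = max(len(words) for words in tokenized)

# Numbers needed to represent ONE padded sentence.
one_hot_values = max_words_in_sentence * maximum_vocabulary_size              # sparse, almost all zeros
embedding_values = max_words_in_sentence * max_size_feature_matrix_dimension  # dense

print(f"unique words: {len(unique_words)}")
print(f"longest sentence: {longest_sentence} words")
print(f"one-hot values per sentence: {one_hot_values}")
print(f"embedding values per sentence: {embedding_values}")
```

With only 19 distinct words and a longest sentence of 5 words, a vocabulary size of 10,000 and a sentence length of 8 are generous; nothing gets truncated, and a one-hot encoding would need 80,000 numbers per sentence where the embedding needs 80.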

In [None]:
# Step 2: Create the one-hot representation for the entire corpus
# - one_hot(text, n) converts each sentence into a list of integers, one per word.
# - Each integer is the position that would hold the single '1' in that word's one-hot vector of length n,
#   e.g. index 400 ('the' in the output below) means position 400 is 1 and every other position is 0.
# - Despite its name, Keras' one_hot computes these indices by hashing each (lowercased) word,
#   so two different words can occasionally share an index.
# - The list comprehension applies one_hot to every sentence in the corpus.
one_hot_representation = [one_hot(sentence, maximum_vocabulary_size) for sentence in sentences]
one_hot_representation

[[400, 3932, 7297, 2793],
 [400, 3932, 7297, 8799],
 [400, 903, 7297, 2752],
 [4420, 8689, 8042, 9851, 3584],
 [4420, 8689, 8042, 9851, 6264],
 [8369, 400, 9976, 7297, 3753],
 [9956, 2373, 7355, 9851]]
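`one_hot` does not build one-hot vectors itself; it hashes each word to a positive integer smaller than the vocabulary size (0 is never produced, so it stays free for padding). With the default Python `hash`, string hashes are randomised per interpreter session, so your indices may differ from the ones above. The sketch below imitates the idea with a hypothetical helper `hashing_one_hot` that uses MD5 so results are reproducible; it only lowercases and splits on spaces (Keras also strips punctuation), it is not the Keras source, and its indices will not match the output above. The second hypothetical helper, `to_one_hot_vector`, shows the actual sparse vector that an index stands for.

```python
import hashlib


def hashing_one_hot(text, n):
    """Hypothetical sketch of the hashing idea behind Keras' one_hot.

    Each lowercased word is hashed with MD5 (stable across runs) and mapped
    into [1, n - 1]; 0 is never produced so it can be used for padding.
    """
    indices = []
    for word in text.lower().split():
        digest = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)
        indices.append(digest % (n - 1) + 1)
    return indices


def to_one_hot_vector(index, n):
    """Return the length-n one-hot vector that a single word index stands for."""
    vector = [0] * n
    vector[index] = 1
    return vector


maximum_vocabulary_size = 10000
first = hashing_one_hot('the glass of milk', maximum_vocabulary_size)
second = hashing_one_hot('the glass of juice', maximum_vocabulary_size)
print(first)
print(second)  # 'the', 'glass', 'of' get the same indices in both sentences

vector = to_one_hot_vector(first[0], maximum_vocabulary_size)
print(sum(vector), vector.index(1) == first[0])  # exactly one 1, at the word's index
```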

In [10]:
# Step 3: Pad the sentences (from the corpus) with 0's wherever a sentence has fewer words than max_words_in_sentence (8), so every row has the same length
# pre = pad with 0's at the beginning of the sentence
# post = pad with 0's at the end of the sentence
padded_sentences=pad_sequences(one_hot_representation,padding='pre',maxlen=max_words_in_sentence)
print(padded_sentences)

[[   0    0    0    0  400 3932 7297 2793]
 [   0    0    0    0  400 3932 7297 8799]
 [   0    0    0    0  400  903 7297 2752]
 [   0    0    0 4420 8689 8042 9851 3584]
 [   0    0    0 4420 8689 8042 9851 6264]
 [   0    0    0 8369  400 9976 7297 3753]
 [   0    0    0    0 9956 2373 7355 9851]]
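`pad_sequences` gives every row the same length so the whole corpus forms one rectangular matrix. The pure-Python sketch below reproduces that behaviour for lists of integers, assuming Keras' documented defaults: the fill value is 0 and `truncating='pre'`, which drops tokens from the start of a sequence longer than `maxlen`. The helper name `pad` is hypothetical.

```python
def pad(sequences, maxlen, padding="pre", truncating="pre", value=0):
    """Hypothetical stand-in for pad_sequences on lists of ints."""
    padded = []
    for seq in sequences:
        # Cut sequences that are too long, from the front ('pre') or the back ('post').
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:] if truncating == "pre" else seq[:maxlen]
        filler = [value] * (maxlen - len(seq))
        # Put the filler before ('pre') or after ('post') the real word indices.
        padded.append(filler + list(seq) if padding == "pre" else list(seq) + filler)
    return padded


rows = [[400, 3932, 7297, 2793], [4420, 8689, 8042, 9851, 3584]]
print(pad(rows, maxlen=8))                  # zeros in front, like the output above
print(pad(rows, maxlen=8, padding="post"))  # zeros at the end
print(pad([list(range(1, 11))], maxlen=8))  # 10 tokens: the first 2 are dropped
```

'pre' padding is the usual choice when a recurrent layer follows, because the real words then sit closest to the final time step; for a lone Embedding layer, as here, the choice only changes where the padding vectors appear.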


In [11]:
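# Step 4) Create the Simple Neural Network model
# - The model is a single Embedding layer: it maps each of the max_words_in_sentence word indices
#   to a vector of max_size_feature_matrix_dimension numbers.
# - Optimizer = adam
# - Loss = mse (mean squared error)
# - The model is compiled but never trained here, so the embedding weights keep their random initial values.
# - In Keras 3, input_length is deprecated, so summary() may report the model as unbuilt until it is first called.
simple_neural_network = Sequential()
simple_neural_network.add(Embedding(maximum_vocabulary_size, max_size_feature_matrix_dimension, input_length=max_words_in_sentence))
simple_neural_network.compile('adam', 'mse')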

simple_neural_network.summary()
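An `Embedding` layer is essentially a trainable lookup table with one row per vocabulary index and `max_size_feature_matrix_dimension` columns, so this model holds 10,000 × 10 = 100,000 parameters and nothing else. Feeding it a padded sentence simply selects one row per index. The pure-Python sketch below mimics that, assuming Keras' default `uniform` initializer (values drawn from [-0.05, 0.05]); `embedding_table` and the fixed seed are illustrative choices, not what Keras does internally. It also shows why the leading padding positions produce identical vectors: they all look up row 0.

```python
import random

maximum_vocabulary_size = 10000
max_size_feature_matrix_dimension = 10

# Assumption: mimic Keras' default 'uniform' initializer, U(-0.05, 0.05); seeded for repeatability.
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-0.05, 0.05) for _ in range(max_size_feature_matrix_dimension)]
    for _ in range(maximum_vocabulary_size)
]

# Trainable parameters: one row of 10 floats per vocabulary index.
num_params = len(embedding_table) * len(embedding_table[0])
print(num_params)  # 100000

# First padded sentence from Step 3.
padded_sentence = [0, 0, 0, 0, 400, 3932, 7297, 2793]

# The forward pass of an Embedding layer is one row lookup per index.
output = [embedding_table[index] for index in padded_sentence]
print(len(output), len(output[0]))  # 8 10

# All padding positions (index 0) select the same row, so their vectors are identical.
print(output[0] == output[1] == output[2] == output[3])  # True
```

Because the table is random and untrained, these vectors carry no meaning yet; training (for example on IMDB) is what would move related words closer together.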

In [12]:
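# Step 5) Do prediction
# - padded_sentences[0] is a 1-D array of 8 word indices, so Keras treats each index as a separate sample
#   and returns an (8, 10) array: one 10-dimensional embedding vector per word.
# - The first four rows are identical because they all look up index 0 (the padding).
# - To predict on a batch containing just this one sentence, pass padded_sentences[0:1]; the result then has shape (1, 8, 10).
simple_neural_network.predict(padded_sentences[0])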


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 128ms/step


array([[-0.04227723,  0.03486586,  0.04653002, -0.03302176,  0.02290274,
        -0.02983887,  0.02397608,  0.03534326, -0.01489697, -0.01348675],
       [-0.04227723,  0.03486586,  0.04653002, -0.03302176,  0.02290274,
        -0.02983887,  0.02397608,  0.03534326, -0.01489697, -0.01348675],
       [-0.04227723,  0.03486586,  0.04653002, -0.03302176,  0.02290274,
        -0.02983887,  0.02397608,  0.03534326, -0.01489697, -0.01348675],
       [-0.04227723,  0.03486586,  0.04653002, -0.03302176,  0.02290274,
        -0.02983887,  0.02397608,  0.03534326, -0.01489697, -0.01348675],
       [ 0.03745766, -0.00801454,  0.0335804 ,  0.0116964 , -0.03897218,
        -0.01015998, -0.00984795,  0.04967869,  0.0184783 ,  0.04723089],
       [-0.03432703,  0.04276982,  0.0353463 ,  0.03027714,  0.01016311,
        -0.04168027,  0.03872951, -0.04527081,  0.04675138, -0.01265524],
       [-0.03685899,  0.03344798, -0.03624899,  0.03031379,  0.04982844,
         0.0132114 , -0.00453082,  0.03076727