## __Neural networks with the IMDB dataset__

In [28]:
import tensorflow as tf
from tensorflow import keras
import numpy as np

_Loading the data_

In [29]:
data = keras.datasets.imdb

_The vocabulary is large, so we keep only the 10,000 most frequent words (rarer words are replaced by an out-of-vocabulary id). The dataset comes already divided into training and testing sets_

In [30]:
(train_data,train_labels),(test_data,test_labels)=data.load_data(num_words=10000)
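To make the effect of `num_words=10000` more concrete, here is a small pure-Python sketch of vocabulary capping. The helper `cap_vocabulary` and the toy ids are made up for illustration, and the out-of-vocabulary id is assumed to be 2; the sketch mimics the idea rather than reproducing the Keras loader.

```python
OOV_ID = 2  # assumed id for out-of-vocabulary words

def cap_vocabulary(sequence, num_words, oov_id=OOV_ID):
    """Replace every id >= num_words with the out-of-vocabulary id."""
    return [i if i < num_words else oov_id for i in sequence]

toy_review = [1, 14, 22, 15000, 43, 99999]  # made-up ids, two of them above the cap
capped = cap_vocabulary(toy_review, num_words=10000)
print(capped)
```

The review keeps its length; only the rare words lose their identity.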

In [31]:
train_data[0][:15]

[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4]

_As seen above, the data does not look like the words of a movie review; each integer refers to a unique word_

__So, how do we read the output?__

In [32]:
word_index=data.get_word_index()  # dictionary mapping each word to its integer id
word_index={k:(v+3) for k,v in word_index.items()}  # shift ids by 3 so that 0-3 are free for the special tokens below
word_index["<PAD>"]=0
word_index["<START>"]=1
word_index["<UNK>"]=2
word_index["<UNUSED>"]=3

_Now we can look up a number by its word, but we want to look up words by number, so we swap the keys and the values_

In [33]:
reverse_word_index={value:key for (key,value) in word_index.items()}

_We also need to decode a whole review in one go and join the words so that it is readable; the following function does this for us_

In [34]:
def decode_review(text):
    return " ".join([reverse_word_index.get(i,"?") for i in text])
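The shift, inversion and decoding steps above can be sketched end to end on a toy vocabulary. The three words and their ranks here are hypothetical; the real mapping comes from `data.get_word_index()`.

```python
# Toy word index (hypothetical ranks)
toy_word_index = {"the": 1, "movie": 2, "great": 3}

# Shift every id by 3 so that 0-3 are free for the special tokens
toy_word_index = {word: idx + 3 for word, idx in toy_word_index.items()}
toy_word_index.update({"<PAD>": 0, "<START>": 1, "<UNK>": 2, "<UNUSED>": 3})

# Invert the mapping: id -> word
toy_reverse = {idx: word for word, idx in toy_word_index.items()}

def decode(ids, reverse_index):
    """Join the words for a list of ids; ids missing from the index become '?'."""
    return " ".join(reverse_index.get(i, "?") for i in ids)

decoded = decode([1, 4, 5, 6, 2, 0, 42], toy_reverse)
print(decoded)
```

Id 42 is not in the toy index, so it is printed as `?`, just as `decode_review` would do.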

_So this is the first test review_

In [35]:
decode_review(test_data[0])

"<START> please give this one a miss br br <UNK> <UNK> and the rest of the cast rendered terrible performances the show is flat flat flat br br i don't know how michael madison could have allowed this one on his plate he almost seemed to know this wasn't going to work out and his performance was quite <UNK> so all you madison fans give this a miss"

_In practice the reviews do not all have the same length, but our model needs inputs of a fixed length_

__So we make every review the same length by removing the extra words and adding <PAD> until each one has the same length.__ We choose an arbitrary length, say 250

In [36]:
train_data=keras.preprocessing.sequence.pad_sequences(train_data,value=word_index["<PAD>"],padding="post",maxlen=250)
test_data=keras.preprocessing.sequence.pad_sequences(test_data,value=word_index["<PAD>"],padding="post",maxlen=250)
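A rough pure-Python sketch of what this padding does to one review is shown below. The helper `pad_post` is hypothetical. It assumes the behaviour of the call above as I understand it: padding is added at the end (`padding="post"`), while reviews that are too long are cut from the front, because `truncating` is left at its default of `"pre"`.

```python
def pad_post(sequence, maxlen, value=0):
    """Pad at the end with `value`; if too long, keep only the last `maxlen` ids."""
    if len(sequence) >= maxlen:
        return sequence[-maxlen:]
    return sequence + [value] * (maxlen - len(sequence))

short_review = [1, 2, 3]
long_review = [1, 2, 3, 4, 5, 6]
print(pad_post(short_review, 5))  # padded at the end
print(pad_post(long_review, 4))   # first ids dropped
```

If losing the start of a long review (including `<START>`) is not wanted, passing `truncating="post"` to `pad_sequences` keeps the beginning instead.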

### __Defining the model__

In [37]:
model=keras.Sequential()
model.add(keras.layers.Embedding(100000,16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16,activation="relu"))
model.add(keras.layers.Dense(1,activation="sigmoid"))

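+ Here the embedding layer learns a vector (in 16-dimensional space) for each word id, so that similar words end up closer together and different words further apart. The layer is declared with 100,000 rows, although the training data only uses ids below 10,000
+ GlobalAveragePooling1D averages the word vectors over the sequence, so each review becomes a single 16-dimensional vector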
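The two layers described above can be imitated with plain lists. This is only a sketch: the table is 5 ids by 2 dimensions with made-up numbers, whereas the real layer is learned during training.

```python
# Toy embedding table: one row (vector) per word id
embedding = [
    [0.0, 0.0],  # id 0
    [1.0, 0.0],  # id 1
    [0.0, 1.0],  # id 2
    [1.0, 1.0],  # id 3
    [2.0, 4.0],  # id 4
]

def embed(ids):
    """Look up one vector per id: result has shape (len(ids), 2)."""
    return [embedding[i] for i in ids]

def global_average_pool(vectors):
    """Average over the sequence axis: result has shape (2,)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

review = [1, 2, 3, 4]
pooled = global_average_pool(embed(review))
print(pooled)
```

Whatever the review length, the pooled vector has the embedding dimension, which is why the `Dense` layers can follow. Note that, without masking, padded positions are averaged in as well.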

In [38]:
model.summary()
model.compile(optimizer="adam",loss="binary_crossentropy",metrics=["accuracy"])

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, None, 16)          1600000   
                                                                 
 global_average_pooling1d_1  (None, 16)                0         
  (GlobalAveragePooling1D)                                       
                                                                 
 dense_2 (Dense)             (None, 16)                272       
                                                                 
 dense_3 (Dense)             (None, 1)                 17        
                                                                 
Total params: 1600289 (6.10 MB)
Trainable params: 1600289 (6.10 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
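The parameter counts in the summary can be checked by hand. The sketch below just redoes the arithmetic from the layer sizes given in the model definition.

```python
vocab_rows, embed_dim = 100000, 16             # arguments passed to Embedding
embedding_params = vocab_rows * embed_dim      # one 16-d vector per row
dense_hidden = embed_dim * 16 + 16             # weights + biases of Dense(16)
dense_out = 16 * 1 + 1                         # weights + bias of Dense(1)
total = embedding_params + dense_hidden + dense_out
print(embedding_params, dense_hidden, dense_out, total)
```

Almost all parameters sit in the embedding table; pooling has none because it only averages.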


_Again we split the training data, [:10000] for validation and [10000:] for training, so that the model is checked on held-out data while it trains_

In [39]:
x_val=train_data[:10000]
x_train=train_data[10000:]

y_val=train_labels[:10000]  # validation labels must come from the training labels, matching x_val
y_train=train_labels[10000:]
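A hold-out split only makes sense when features and labels are sliced from the same arrays with the same boundaries. A minimal sketch with a hypothetical helper `holdout_split` and toy data:

```python
def holdout_split(features, labels, n_val):
    """Use the first n_val examples for validation and the rest for training."""
    assert len(features) == len(labels)
    train = (features[n_val:], labels[n_val:])
    val = (features[:n_val], labels[:n_val])
    return train, val

toy_x = [[1, 5], [1, 7], [1, 9], [1, 4], [1, 8]]  # made-up encoded reviews
toy_y = [0, 1, 1, 0, 1]                           # made-up labels
(x_tr, y_tr), (x_v, y_v) = holdout_split(toy_x, toy_y, n_val=2)
print(len(x_tr), len(x_v))
```

Keeping both slices in one function makes it harder to pair validation features with labels from a different set.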

_Let's train our model_

In [40]:
fitModel = model.fit(x_train,y_train,epochs=40,batch_size=512,validation_data=(x_val,y_val),verbose=1)

Epoch 1/40 ... Epoch 40/40


In [41]:
results=model.evaluate(test_data,test_labels)
results


[0.3285709619522095, 0.871720016002655]
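`model.evaluate` returns the loss followed by the compiled metrics, here binary cross-entropy and accuracy. The sketch below computes both for four made-up scores. It assumes a decision threshold of 0.5, and the clipping constant `eps` is my own choice.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

toy_labels = [0, 1, 1, 0]
toy_scores = [0.1, 0.8, 0.4, 0.3]  # made-up sigmoid outputs
toy_loss = binary_crossentropy(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
print(toy_loss, toy_acc)
```

A confident wrong score raises the loss far more than an unsure one, so the loss can keep improving even when accuracy does not change.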

_Let's look at a review, the model's prediction for it and its actual label_

In [45]:
test_review = test_data[0:5]
predicted=model.predict(test_review)
print("Review :")
print(decode_review(test_review[0]))
print("Prediction : "+str(predicted[0]))
print("Actual : "+str(test_labels[0]))
print("Loss:",results[0],"accuracy:",results[1])

Review :
<START> please give this one a miss br br <UNK> <UNK> and the rest of the cast rendered terrible performances the show is flat flat flat br br i don't know how michael madison could have allowed this one on his plate he almost seemed to know this wasn't going to work out and his performance was quite <UNK> so all you madison fans give this a miss <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> 
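The decoded review ends in a long run of `<PAD>` tokens because it is decoded after padding. One way to keep the printout readable is to drop the trailing pad ids first. The helper `strip_padding` below is hypothetical, not part of Keras.

```python
PAD_ID = 0  # id assigned to <PAD> in this notebook

def strip_padding(ids, pad_id=PAD_ID):
    """Drop trailing pad ids so the decoded text is not followed by <PAD> tokens."""
    end = len(ids)
    while end > 0 and ids[end - 1] == pad_id:
        end -= 1
    return list(ids[:end])

padded = [1, 14, 22, 16, 0, 0, 0]  # made-up padded review
print(strip_padding(padded))
```

With the notebook's own function this would be used as `decode_review(strip_padding(test_review[0]))`.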

#### __Let's save our model__

In [47]:
model.save("model.h5")

_Let's check our model on any other review that we give it_

__Reloading the model__

In [49]:
model=keras.models.load_model("model.h5")

_Let's load our review_

In [57]:
def review_encode(s):
    encoded = [1]
    for word in s:
        if word.lower() in word_index:
            encoded.append(word_index[word.lower()])
        else:
            encoded.append(2)
    return encoded
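The model was trained on ids below 10,000, while `word_index` holds the full vocabulary, so rarer words in a new review map to embedding rows that never saw training data. A possible variant, sketched here with a toy index and hypothetical helpers `tokenize` and `encode_capped`, sends those words to `<UNK>` and uses a regular expression to split the text.

```python
import re

# Toy word index with the +3 offset already applied (hypothetical ids)
toy_index = {"<PAD>": 0, "<START>": 1, "<UNK>": 2,
             "the": 4, "lion": 2500, "glorious": 15000}

def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def encode_capped(text, index, num_words=10000):
    """Encode text, mapping unknown words and ids >= num_words to <UNK>."""
    encoded = [index["<START>"]]
    for word in tokenize(text):
        idx = index.get(word, index["<UNK>"])
        encoded.append(idx if idx < num_words else index["<UNK>"])
    return encoded

sample = encode_capped('"The Lion King" is glorious.', toy_index)
print(sample)
```

Here "king" and "is" are missing from the toy index and "glorious" lies above the cap, so all three become `<UNK>`.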

In [58]:
with open("Volunter review.txt",encoding="utf-8") as f:
    for line in f.readlines():
        nline =line.replace(",","").replace("(","").replace(".","").replace("\"","").replace(":","").replace(")","").strip().split()
        encode = review_encode(nline)
        encode=keras.preprocessing.sequence.pad_sequences([encode],value=word_index["<PAD>"],padding="post",maxlen=250)
        predict=model.predict(encode)
        print(line)
        print(encode)
        print(predict[0])
        

Of all the animation classics from the Walt Disney Company, there is perhaps none that is more celebrated than "The Lion King." Its acclaim is understandable: this is quite simply a glorious work of art."The Lion King" gets off to a fantastic start. The film's opening number, "The Circle of Life," is outstanding. The song lasts for about four minutes, but from the first sound, the audience is floored. Not even National Geographic can capture something this beautiful and dramatic. Not only is this easily the greatest moment in film animation, this is one of the greatest sequences in film history. The story that follows is not as majestic, but the film has to tell a story. Actually, the rest of the film holds up quite well. The story takes place in Africa, where the lions rule. Their king, Mufasa (James Earl Jones) has just been blessed with a son, Simba (Jonathan Taylor Thomas), who goes in front of his uncle Scar (Jeremy Irons) as next in line for the throne. Scar is furious, and sets 