<a href="https://colab.research.google.com/github/AnandK-pm/DSA-C/blob/main/embedding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

A demonstration of the Embedding layer using the Keras Sequential API.

In [None]:
import keras
from keras.preprocessing.text import one_hot
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential   #uses sequential API
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Embedding
import numpy as np
# define documents
docs = ['Well done!',
 'Good work',
 'Great effort',
 'nice work',
 'Excellent!',
 'Weak',
 'Poor effort!',
 'not good',
 'poor work',
 'Could have done better.']
# define class labels
labels = np.array([1,1,1,1,1,0,0,0,0,0])

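The one_hot() function hashes each word to an integer, which gives a compact integer encoding without building a vocabulary first. We set the vocabulary size to 50, well above the number of distinct words in these documents, to reduce the probability of collisions from the hash function. Collisions can still occur, and because the default hash is Python's built-in hash(), the integers may differ from one run to the next.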
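
The hashing idea described above can be sketched in plain Python. This is a minimal illustration, not the Keras implementation: `hash_encode` is a hypothetical helper, and using md5 (instead of the built-in hash) is an assumption made here so that the integers are reproducible across runs.

```python
import hashlib
import string

def hash_encode(text, n):
    """Map each word of `text` to an integer in [1, n - 1] with a stable hash."""
    # Lowercase, replace punctuation with spaces, then split on whitespace.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.lower().translate(table).split()
    encoded = []
    for word in words:
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        # Index 0 is left unused so it can later serve as the padding value.
        encoded.append(int(digest, 16) % (n - 1) + 1)
    return encoded

print(hash_encode("Good work", 50))
print(hash_encode("not good", 50))  # "good" gets the same integer as "Good"
```

The same word always maps to the same integer, but two different words can also land on the same integer; nothing in the scheme prevents that.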

In [None]:
vocab_size = 50
encoded_docs = [one_hot(d, vocab_size) for d in docs]
print(encoded_docs)

[[44, 4], [35, 11], [22, 14], [13, 11], [45], [36], [45, 14], [5, 35], [45, 11], [30, 20, 4, 32]]
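
In the output above, 'Excellent!' is encoded as [45] and the first word of 'Poor effort!' is also 45, so two different words share an index even with vocab_size = 50. A rough estimate of how likely such a collision is can be sketched as follows. It assumes the 14 distinct words in the documents are hashed uniformly and independently into 49 buckets (indices 1 to 49); both the uniformity and the bucket count are assumptions of this sketch, and `prob_no_collision` is a hypothetical helper.

```python
def prob_no_collision(num_words, num_buckets):
    """Probability that num_words uniform random hashes all land in distinct buckets."""
    p = 1.0
    for i in range(num_words):
        # The (i + 1)-th word must avoid the i buckets already taken.
        p *= (num_buckets - i) / num_buckets
    return p

p_distinct = prob_no_collision(14, 49)
print(f"P(at least one collision) = {1 - p_distinct:.2f}")
```

Under these assumptions a collision is more likely than not, so a considerably larger vocab_size would be needed to make collisions rare.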


In [None]:
# pad documents to a max length of 4 words
# the sequences have different lengths, so shorter ones are filled with
# zeros at the end (padding='post') to give every document the same length
max_length = 4
padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
print(padded_docs)

[[44  4  0  0]
 [35 11  0  0]
 [22 14  0  0]
 [13 11  0  0]
 [45  0  0  0]
 [36  0  0  0]
 [45 14  0  0]
 [ 5 35  0  0]
 [45 11  0  0]
 [30 20  4 32]]
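
What post-padding does to each sequence can be sketched without Keras. `pad_post` is a hypothetical helper written for illustration; dropping items from the front of sequences that are too long is intended to mirror the default truncation behaviour of pad_sequences as I understand it, and `maxlen` is assumed to be positive.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad each sequence at the end with `value`, or keep only its last `maxlen` items."""
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # drop items from the front if the sequence is too long
        padded.append(kept + [value] * (maxlen - len(kept)))
    return padded

print(pad_post([[44, 4], [45], [30, 20, 4, 32]], maxlen=4))
# [[44, 4, 0, 0], [45, 0, 0, 0], [30, 20, 4, 32]]
```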


Now we have to define the neural network. It consists of 3 layers:
1. Embedding layer
2. Flatten layer
3. Dense layer

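The Embedding layer is fed the padded docs and maps each of the 4 word indices in a document to an 8-dimensional vector, so every document becomes a 4×8 matrix (the whole batch of 10 documents is a 3D array of shape (10, 4, 8)).
The Flatten layer reshapes each 4×8 matrix into a 1D vector of 32 values so that the Dense layer can weigh all of them together.
The Dense layer in this case contains only one neuron with a sigmoid activation, as we only have to predict 1 or 0 (binary); it produces the final output between 0 and 1.
(An output of 0.7 means the predicted probability of class 1 is 70%.)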
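
The three layers can be traced by hand with NumPy to see where the shapes and parameter counts come from. This is a sketch under stated assumptions: the weights are random stand-ins for the values that training would learn, and the variable names are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, max_length = 50, 8, 4

# Randomly initialised weights stand in for the values learned during training.
embedding_table = rng.normal(size=(vocab_size, embed_dim))    # 50 * 8 = 400 parameters
dense_weights = rng.normal(size=(max_length * embed_dim, 1))  # 32 weights
dense_bias = np.zeros(1)                                      # + 1 bias = 33 parameters

batch = np.array([[44, 4, 0, 0],
                  [30, 20, 4, 32]])              # two padded documents

embedded = embedding_table[batch]                # row lookup per index -> shape (2, 4, 8)
flattened = embedded.reshape(len(batch), -1)     # shape (2, 32)
logits = flattened @ dense_weights + dense_bias  # shape (2, 1)
probabilities = 1.0 / (1.0 + np.exp(-logits))    # sigmoid -> one value per document

print(embedded.shape, flattened.shape, probabilities.shape)
# (2, 4, 8) (2, 32) (2, 1)
```

The embedding step is only a table lookup: index 44 selects row 44 of the 50×8 table, and the padding index 0 selects row 0 like any other index.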


In [None]:

# define the model
model = Sequential()
model.add(Embedding(vocab_size, 8, input_length=max_length))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# summarize the model
print(model.summary())

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 4, 8)              400       
                                                                 
 flatten (Flatten)           (None, 32)                0         
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
Total params: 433 (1.69 KB)
Trainable params: 433 (1.69 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
None

Finally, we train the model on the labels we have. To do that, we call the fit function on the model defined above, passing the padded docs as input and the labels as targets.

In [None]:
# fit the model
# epochs: number of times the entire dataset is passed forward and
#         backward through the neural network during training
# verbose: whether to print output during training (0 = no, 1 or 2 = yes)
model.fit(padded_docs, labels, epochs=50, verbose=0)

# evaluate the model on the same documents it was trained on,
# so this is training accuracy, not an estimate on unseen data
loss, accuracy = model.evaluate(padded_docs, labels, verbose=0)
print('Accuracy: %f' % (accuracy*100))

Accuracy: 89.999998
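
The reported value corresponds to 9 of the 10 documents being classified correctly; it prints as 89.999998 rather than 90.000000 most likely because Keras computes metrics in 32-bit floats by default, and 0.9 is not exactly representable in float32. A small sketch of how such an accuracy can arise, assuming a 0.5 decision threshold; the probabilities below are hypothetical, not the model's actual outputs:

```python
import numpy as np

labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
# Hypothetical model outputs: one positive document is scored below 0.5.
probabilities = np.array([0.8, 0.7, 0.9, 0.6, 0.4, 0.2, 0.3, 0.1, 0.4, 0.2])

predicted = (probabilities > 0.5).astype(int)        # threshold at 0.5
accuracy = np.float32(np.mean(predicted == labels))  # 9 of 10 correct, stored as float32

print('Accuracy: %f' % (float(accuracy) * 100))
# Accuracy: 89.999998
```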


In [None]:
# Function to analyze sentiment of a single word or short phrase
def analyze_sentiment(text):
    # Preprocess the input the same way as the training documents
    encoded_text = [one_hot(text, vocab_size)]
    padded_text = pad_sequences(encoded_text, maxlen=max_length, padding='post')
    # Predict sentiment using the trained model; the result has shape (1, 1)
    prediction = model.predict(padded_text, verbose=0)
    # Return sentiment prediction (0 for negative, 1 for positive)
    return 1 if prediction[0][0] > 0.5 else 0

input_word = input("Enter a word to analyze sentiment: ")
sentiment = analyze_sentiment(input_word)
if sentiment == 1:
    print("Positive sentiment")
else:
    print("Negative sentiment")

Enter a word to analyze sentiment: nice effort
Positive sentiment
