
# Embedding
Embedding is a deep learning technique that represents categorical variables, such as words, as dense vectors of real numbers. Because the vectors are learned during training, the model can capture relationships between categories and use them for prediction or classification tasks.

Benefits of using embedding:

- Reduced dimensionality: Embeddings are typically much lower-dimensional than one-hot encodings of the same categories, which improves computational efficiency and can reduce overfitting.
- Improved interpretability: Embeddings can be visualized (for example, after projecting them to two dimensions) to explore the relationships between categories.
- Increased flexibility: Embeddings can be used with a variety of deep learning models, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs).

How embedding works:

1. One-hot encoding: Conceptually, each category is first represented as a one-hot vector: all zeros except for a 1 at the position corresponding to the category.
2. Embedding layer: The one-hot vectors are passed through an embedding layer, which learns a weight matrix that maps them to a lower-dimensional space. In practice, libraries such as Keras skip the explicit one-hot step: the layer takes integer category indices and looks up the corresponding row of the weight matrix.
3. Output: The output of the embedding layer is a vector of real numbers for each category. These vectors can then be used as input to other layers or models.
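The one-hot and embedding-layer steps above can be sketched in NumPy. This is a minimal illustration, not how Keras implements the layer internally: the matrix `W` is a random stand-in for learned weights, and the sizes (`vocab_size = 5`, `embed_dim = 2`) are assumptions chosen for readability. It shows that multiplying a one-hot vector by the weight matrix selects a single row, which is why an embedding layer can be treated as a table lookup on integer indices.

```python
import numpy as np

vocab_size, embed_dim = 5, 2                  # hypothetical sizes for illustration
rng = np.random.default_rng(0)
W = rng.normal(size=(vocab_size, embed_dim))  # stand-in for learned embedding weights

category = 3                                  # integer id of one category
one_hot = np.zeros(vocab_size)
one_hot[category] = 1.0                       # all zeros except position 3

via_matmul = one_hot @ W                      # one-hot vector times weight matrix
via_lookup = W[category]                      # direct row lookup

# Both routes give the same embedding vector
print(np.allclose(via_matmul, via_lookup))
```

The lookup form avoids building the one-hot vectors at all, which matters when the vocabulary has thousands of entries.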

Applications of embedding:

- Natural language processing: Embeddings are commonly used in natural language processing tasks such as sentiment analysis, text classification, and machine translation.
- Computer vision: Embeddings can also be used in computer vision tasks such as image classification and object detection.
- Recommendation systems: Embeddings can be used to learn user preferences and recommend items that the user is likely to be interested in.

In [1]:
import numpy as np

In [2]:
docs = ['go india',
'india india',
'hip hip hurray',
'jeetega bhai jeetega india jeetega',
 'dhoni dhoni',
  'modi ji ki jai']

In [3]:
from keras.preprocessing.text import Tokenizer
# Words not seen during fitting are mapped to the '<nothing>' out-of-vocabulary (OOV) token
tokenizer = Tokenizer(oov_token='<nothing>')

In [4]:
tokenizer.fit_on_texts(docs)

In [5]:
len(tokenizer.word_index)

12
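To see where the 12 comes from, the index assignment can be approximated in plain Python. This sketch assumes the tokenizer reserves index 1 for the OOV token and numbers the remaining words by descending frequency, with ties broken by first appearance. That assumption is consistent with the sequences printed in this notebook, but the code is not the Keras implementation, and `build_word_index` is a hypothetical helper.

```python
from collections import Counter

docs = ['go india', 'india india', 'hip hip hurray',
        'jeetega bhai jeetega india jeetega', 'dhoni dhoni', 'modi ji ki jai']

def build_word_index(texts, oov_token='<nothing>'):
    """Hypothetical helper: OOV token gets index 1, then words by descending count."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() keeps first-seen order among words with equal counts
    ranked = [word for word, _ in counts.most_common()]
    return {word: i for i, word in enumerate([oov_token] + ranked, start=1)}

word_index = build_word_index(docs)
print(len(word_index))   # 11 distinct words plus the OOV token
print(word_index)
```

Index 0 is never assigned to a word, which leaves it free to be used as the padding value.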

In [6]:
sequences = tokenizer.texts_to_sequences(docs)
sequences

[[6, 2], [2, 2], [4, 4, 7], [3, 8, 3, 2, 3], [5, 5], [9, 10, 11, 12]]

In [7]:
from keras.utils import pad_sequences
# Append zeros so every sequence matches the length of the longest one (5 tokens)
sequences = pad_sequences(sequences, padding='post')
sequences

array([[ 6,  2,  0,  0,  0],
       [ 2,  2,  0,  0,  0],
       [ 4,  4,  7,  0,  0],
       [ 3,  8,  3,  2,  3],
       [ 5,  5,  0,  0,  0],
       [ 9, 10, 11, 12,  0]], dtype=int32)
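What `padding='post'` does can be sketched without Keras. The helper below, `pad_post`, is hypothetical and deliberately simplified: it only right-pads to the longest sequence and ignores the `maxlen` and truncation options that the real `pad_sequences` also supports.

```python
def pad_post(seqs, value=0):
    """Hypothetical helper: right-pad every sequence to the longest length."""
    max_len = max(len(s) for s in seqs)
    return [list(s) + [value] * (max_len - len(s)) for s in seqs]

raw = [[6, 2], [2, 2], [4, 4, 7], [3, 8, 3, 2, 3], [5, 5], [9, 10, 11, 12]]
padded = pad_post(raw)
for row in padded:
    print(row)
```

Padding at the end rather than the start keeps the real tokens in the same leading positions for every document.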

In [8]:
from keras.datasets import imdb
from keras import Sequential
from keras.layers import Dense,SimpleRNN,Embedding,Flatten

In [9]:
model = Sequential()
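# input_dim=17 must be larger than the highest token index (12 here); each token maps to a 2-D vector
model.add(Embedding(17, output_dim=2, input_length=5))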
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 5, 2)              34        
                                                                 
Total params: 34 (136.00 Byte)
Trainable params: 34 (136.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
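The parameter count in the summary follows directly from the layer's arguments: one row of `output_dim` weights per possible index. The short check below reproduces the two numbers, assuming the weights are stored as 4-byte `float32` values (the usual default, not stated in the notebook).

```python
input_dim, output_dim = 17, 2   # arguments passed to Embedding above
bytes_per_float32 = 4           # assumption: default float32 weights

params = input_dim * output_dim
size_in_bytes = params * bytes_per_float32
print(params, size_in_bytes)
```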


In [10]:
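# 'accuracy' is a metric, not a loss (the second positional argument of compile() is the loss)
model.compile(optimizer='adam', metrics=['accuracy'])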

In [11]:
pred = model.predict(sequences)
print(pred)

[[[-0.01903333 -0.02481568]
  [ 0.03341976 -0.01680303]
  [-0.02170887  0.01460459]
  [-0.02170887  0.01460459]
  [-0.02170887  0.01460459]]

 [[ 0.03341976 -0.01680303]
  [ 0.03341976 -0.01680303]
  [-0.02170887  0.01460459]
  [-0.02170887  0.01460459]
  [-0.02170887  0.01460459]]

 [[ 0.03057356  0.00147548]
  [ 0.03057356  0.00147548]
  [-0.03807183 -0.03274187]
  [-0.02170887  0.01460459]
  [-0.02170887  0.01460459]]

 [[ 0.01368866 -0.00370513]
  [-0.0406686  -0.03878554]
  [ 0.01368866 -0.00370513]
  [ 0.03341976 -0.01680303]
  [ 0.01368866 -0.00370513]]

 [[-0.01962185 -0.02365164]
  [-0.01962185 -0.02365164]
  [-0.02170887  0.01460459]
  [-0.02170887  0.01460459]
  [-0.02170887  0.01460459]]

 [[-0.04003892  0.04048966]
  [-0.00535532  0.02747469]
  [ 0.00386064  0.00823222]
  [-0.04411718 -0.0313225 ]
  [-0.02170887  0.01460459]]]
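Because the model contains only an untrained embedding layer, `predict` returns one row of the weight matrix per token id. That is why repeated ids, such as the padding id 0, produce identical rows in the output above. The NumPy sketch below mimics this with a random matrix `W` standing in for the layer's initial weights. The uniform range is an assumption, and the values will not match the printed output.

```python
import numpy as np

sequences = np.array([[6, 2, 0, 0, 0],
                      [2, 2, 0, 0, 0],
                      [4, 4, 7, 0, 0],
                      [3, 8, 3, 2, 3],
                      [5, 5, 0, 0, 0],
                      [9, 10, 11, 12, 0]])

rng = np.random.default_rng(42)
# Stand-in for the (17, 2) embedding matrix; random values, not the Keras weights
W = rng.uniform(-0.05, 0.05, size=(17, 2))

looked_up = W[sequences]   # integer-array indexing: one row of W per token id
print(looked_up.shape)     # (documents, tokens per document, embedding size)

# Every padding position (id 0) maps to the same vector W[0]
print(np.array_equal(looked_up[0, 2], looked_up[4, 4]))
```

In the notebook itself, the actual matrix can be read with `model.layers[0].get_weights()[0]` and indexed the same way to compare against `pred`.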
