### Word Embeddings - Keras

### Problems with One-Hot Encoded Feature Vector Approaches
A potential drawback of sparse feature-vector approaches such as n-grams, bag of words and TF-IDF is that the feature vector for each document can be huge. For instance, if your corpus has half a million unique words and you want to represent a sentence that contains 10 words, the feature vector will have half a million dimensions, of which at most 10 will be non-zero.
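
> To make the size problem concrete, here is a minimal sketch with a toy vocabulary. The `vocab` list and the `bag_of_words` helper are made up for illustration; a real corpus would simply have a much longer `vocab`.

```python
import numpy as np

# Toy vocabulary; a real corpus could have hundreds of thousands of entries.
vocab = ["this", "is", "an", "excellent", "movie", "the", "was", "horrible"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def bag_of_words(sentence):
    """Return a vocabulary-sized vector with 1 at the index of each word present."""
    vector = np.zeros(len(vocab), dtype=np.int8)
    for word in sentence.lower().split():
        if word in word_to_index:
            vector[word_to_index[word]] = 1
    return vector

vec = bag_of_words("The movie was horrible")
print(vec)                        # [0 0 0 0 1 1 1 1]
print(vec.size, int(vec.sum()))   # 8 4
```

> The vector always has one slot per vocabulary word, no matter how short the sentence is.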

### Word Embeddings
In word embeddings, every word is represented as an n-dimensional dense vector. Words that are similar have similar vectors. Word embedding techniques such as GloVe and Word2Vec have proven to be very effective for converting words into corresponding dense vectors. The vector size is small, and nearly every entry in the vector holds a non-zero value, in contrast to sparse vectors that are mostly zeros.
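
> A rough way to see what "similar vectors" means is cosine similarity. The sketch below uses hand-picked 4-dimensional vectors that are purely hypothetical; real embeddings are learned and usually have 50 to 300 dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional embeddings, hand-picked for illustration only.
toy_embeddings = {
    "good":  np.array([0.9, 0.8, 0.1, 0.0]),
    "great": np.array([0.8, 0.9, 0.2, 0.1]),
    "awful": np.array([-0.7, -0.9, 0.1, 0.2]),
}

sim_close = cosine_similarity(toy_embeddings["good"], toy_embeddings["great"])
sim_far = cosine_similarity(toy_embeddings["good"], toy_embeddings["awful"])
print(sim_close > sim_far)  # True
```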

### Implementation of Word Embedding with Keras
> To implement word embeddings, the Keras library contains a layer called ``Embedding()``. The embedding layer is implemented in the form of a class in Keras and is normally used as a first layer in the sequential model for NLP tasks.

[Read More](https://stats.stackexchange.com/questions/270546/how-does-keras-embedding-layer-work)

> `Embedding(200, 32, input_length=50)`

* The first parameter in the embedding layer is the size of the vocabulary or the **total number of unique words in a corpus**.
* The second parameter is the number of the **dimensions for each word vector**. For instance, if you want each word vector to have 32 dimensions, you will specify 32 as the second parameter. 
* And finally, the third parameter is the **length of the input sentence**.
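
> Conceptually, the three parameters above describe a lookup table. The NumPy sketch below is not the Keras implementation; it only mimics the shapes, with a random `embedding_table` standing in for the layer's learned weights.

```python
import numpy as np

vocab_size, embedding_dim, input_length = 200, 32, 50
rng = np.random.default_rng(0)

# Stand-in for the layer's weight matrix: one row per word index.
embedding_table = rng.normal(size=(vocab_size, embedding_dim))

# A batch of 4 sequences, each holding 50 integer word indexes.
batch = rng.integers(0, vocab_size, size=(4, input_length))

# Indexing the table with the batch replaces every index with its row.
output = embedding_table[batch]
print(output.shape)  # (4, 50, 32)
```

> Each integer in the input becomes a 32-dimensional vector, which is why the output gains an extra axis.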

### Custom Word Embeddings
> We are going to create our custom word embedding.

In [24]:
import numpy as np
from tensorflow.keras.preprocessing.text import one_hot, Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import tensorflow as tf
from tensorflow import keras

### Data

In [2]:
corpus = [
    'This is an excellent movie',
    'The move was fantastic I like it',
    'You should watch it is brilliant',
    'Exceptionally good',
    'Wonderfully directed and executed I like it',
    'Its a fantastic series',
    'Never watched such a brillent movie',
    'It is a Wonderful movie',
    "horrible acting",
    'waste of money',
    'pathetic picture',
    'It was very boring',
    'I did not like the movie',
    'The movie was horrible',
    'I will not recommend',
    'The acting is pathetic'
]
sentiments = np.array([1 if i< 8 else 0 for i in range(16)])
sentiments

array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0])

> The first `8` are positive `reviews` about the movie and the last `8` are negative reviews.

> A sentiment of `0` means a negative review of the movie and `1` means a positive review. As we know, the `Embedding()` layer takes the size of the `vocabulary`, that is, the number of `unique` words. We want to find the total number of `unique` words in the corpus.

In [3]:
from nltk.tokenize import word_tokenize

In [4]:
all_words = []
for sent in corpus:
    words = word_tokenize(sent)
    for word in words:
        all_words.append(word)
len(all_words)

71

In [5]:
unique_words = list(set(all_words))
len(unique_words)

45

> The Embedding layer expects the words to be in **numeric form**. Therefore, we need to convert the sentences in our corpus to numbers. One way to convert text to numbers is by using the ``one_hot`` function from the ``keras.preprocessing.text`` module. The function takes a ``sentence`` and the ``total length of the vocabulary`` and returns the sentence in numeric form. Despite its name, it does not produce one-hot vectors: it hashes each word to an integer index, so two different words can end up with the same index.
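
> The idea of turning words into integers by hashing can be sketched in a few lines. This is an illustration only: the `hashed_indexes` helper and the choice of `md5` are assumptions made here for reproducibility, and the hash Keras uses by default may differ.

```python
import hashlib

def hashed_indexes(sentence, n):
    """Map each word to an integer in [1, n - 1] using a stable hash."""
    indexes = []
    for word in sentence.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        # Index 0 is never produced, so it stays free for padding.
        indexes.append(int(digest, 16) % (n - 1) + 1)
    return indexes

encoded = hashed_indexes("It is a Wonderful movie", 50)
print(encoded)
```

> Because the mapping is a hash, distinct words can collide on the same index; a larger `n` lowers the chance of that happening.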

In [6]:
voc_len = len(unique_words) + 5  # add an arbitrary margin of 5 on top of the unique-word count

In [7]:
embedded_sentences = [one_hot(sent, voc_len) for sent in corpus]
embedded_sentences

[[40, 12, 17, 45, 18],
 [17, 47, 48, 40, 13, 34, 28],
 [17, 30, 30, 28, 12, 5],
 [7, 31],
 [12, 17, 10, 3, 13, 34, 28],
 [1, 16, 40, 37],
 [43, 25, 45, 16, 10, 18],
 [28, 12, 16, 33, 18],
 [2, 18],
 [41, 33, 46],
 [40, 49],
 [28, 48, 44, 26],
 [13, 36, 19, 34, 17, 18],
 [17, 18, 48, 2],
 [13, 43, 19, 14],
 [17, 18, 12, 40]]

> The embedding layer expects sentences to be of equal size. However, our encoded sentences are of different sizes. One way to make all the sentences a uniform size is to extend every sentence to the length of the longest one. Let's first find the longest sentence in our corpus, and then we will increase the length of all the sentences to match it.

In [8]:
word_count = lambda sentence: len(word_tokenize(sentence))
longest_sentence = max(corpus, key=word_count)
len_longest_sentence = len(word_tokenize(longest_sentence))
len_longest_sentence

7

> We want all sentences to have the same size, so sentences shorter than `7` are filled with `0` up to length `7` using `pad_sequences`. The first parameter is the list of **encoded sentences of unequal sizes**, the second parameter is the **size of the longest sentence** (the target length, `maxlen`), and the last parameter is **padding**, where you specify `post` to add padding at the end of sentences.
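
> To show what post-padding does, here is a small NumPy sketch. The `pad_post` helper is hypothetical and is not the Keras function; it right-pads with zeros and, for sequences longer than `maxlen`, keeps the last `maxlen` items.

```python
import numpy as np

def pad_post(sequences, maxlen, value=0):
    """Right-pad each sequence with `value` to `maxlen`; keep the last `maxlen` items if longer."""
    padded = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        trimmed = list(seq)[-maxlen:]
        padded[row, :len(trimmed)] = trimmed
    return padded

print(pad_post([[7, 31], [40, 12, 17, 45, 18]], maxlen=7))
```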

In [9]:
padded_sents = pad_sequences(embedded_sentences, len_longest_sentence, padding="post")

In [10]:
padded_sents

array([[40, 12, 17, 45, 18,  0,  0],
       [17, 47, 48, 40, 13, 34, 28],
       [17, 30, 30, 28, 12,  5,  0],
       [ 7, 31,  0,  0,  0,  0,  0],
       [12, 17, 10,  3, 13, 34, 28],
       [ 1, 16, 40, 37,  0,  0,  0],
       [43, 25, 45, 16, 10, 18,  0],
       [28, 12, 16, 33, 18,  0,  0],
       [ 2, 18,  0,  0,  0,  0,  0],
       [41, 33, 46,  0,  0,  0,  0],
       [40, 49,  0,  0,  0,  0,  0],
       [28, 48, 44, 26,  0,  0,  0],
       [13, 36, 19, 34, 17, 18,  0],
       [17, 18, 48,  2,  0,  0,  0],
       [13, 43, 19, 14,  0,  0,  0],
       [17, 18, 12, 40,  0,  0,  0]])

### Creating a Simple Model

In [11]:
model = keras.Sequential([
    tf.keras.layers.Embedding(voc_len, 20, input_length=len_longest_sentence),
    keras.layers.Flatten(),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation='sigmoid')
])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 7, 20)             1000      
_________________________________________________________________
flatten (Flatten)            (None, 140)               0         
_________________________________________________________________
dense (Dense)                (None, 16)                2256      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 17        
Total params: 3,273
Trainable params: 3,273
Non-trainable params: 0
_________________________________________________________________


> We create a Sequential model and add the ``Embedding`` layer as its first layer. The length of the vocabulary is specified by ``voc_len``. The dimension of each word vector is ``20`` and the ``input_length`` is the length of the longest sentence, which is ``7``. Next, the output of the ``Embedding`` layer is flattened so that it can be fed directly to the densely connected layer. Since it is a ``binary classification`` problem, we use the ``sigmoid`` function as the activation function of the final dense layer.
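
> The parameter counts in the summary can be checked by hand. The arithmetic below assumes `voc_len` is `50` (45 unique words plus the margin of 5), as in the cells above.

```python
vocab_size, embedding_dim, input_length, hidden_units = 50, 20, 7, 16

embedding_params = vocab_size * embedding_dim                # one vector per index
flattened_size = input_length * embedding_dim                # 7 * 20 values per sentence
dense_params = flattened_size * hidden_units + hidden_units  # weights + biases
output_params = hidden_units * 1 + 1                         # weights + bias

total = embedding_params + dense_params + output_params
print(embedding_params, dense_params, output_params, total)  # 1000 2256 17 3273
```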

### Compiling the Model

In [12]:
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss=keras.losses.BinaryCrossentropy(),
              metrics=['accuracy']
             )

### Training the model

> First we want to shuffle our dataset and then split it into train and test sets as usual.
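
> One alternative way to keep features and labels aligned while shuffling is to shuffle a single index order and apply it to both arrays. The sketch below uses stand-in arrays rather than the real `padded_sents` and `sentiments`, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(33)  # seed chosen arbitrarily for reproducibility

X = np.arange(16 * 7).reshape(16, 7)          # stand-in for padded_sents
y = np.array([1] * 8 + [0] * 8)               # stand-in for sentiments

order = rng.permutation(len(X))               # one shuffled index order
X_shuffled, y_shuffled = X[order], y[order]   # applied to both arrays

split = int(0.8 * len(X))                     # 80% train, 20% test
X_train, X_test = X_shuffled[:split], X_shuffled[split:]
y_train, y_test = y_shuffled[:split], y_shuffled[split:]
print(X_train.shape, X_test.shape)  # (12, 7) (4, 7)
```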

In [13]:
data = np.column_stack([padded_sents, sentiments])
data

array([[40, 12, 17, 45, 18,  0,  0,  1],
       [17, 47, 48, 40, 13, 34, 28,  1],
       [17, 30, 30, 28, 12,  5,  0,  1],
       [ 7, 31,  0,  0,  0,  0,  0,  1],
       [12, 17, 10,  3, 13, 34, 28,  1],
       [ 1, 16, 40, 37,  0,  0,  0,  1],
       [43, 25, 45, 16, 10, 18,  0,  1],
       [28, 12, 16, 33, 18,  0,  0,  1],
       [ 2, 18,  0,  0,  0,  0,  0,  0],
       [41, 33, 46,  0,  0,  0,  0,  0],
       [40, 49,  0,  0,  0,  0,  0,  0],
       [28, 48, 44, 26,  0,  0,  0,  0],
       [13, 36, 19, 34, 17, 18,  0,  0],
       [17, 18, 48,  2,  0,  0,  0,  0],
       [13, 43, 19, 14,  0,  0,  0,  0],
       [17, 18, 12, 40,  0,  0,  0,  0]])

In [14]:
np.random.shuffle(data)
data

array([[28, 48, 44, 26,  0,  0,  0,  0],
       [17, 18, 12, 40,  0,  0,  0,  0],
       [13, 36, 19, 34, 17, 18,  0,  0],
       [13, 43, 19, 14,  0,  0,  0,  0],
       [43, 25, 45, 16, 10, 18,  0,  1],
       [12, 17, 10,  3, 13, 34, 28,  1],
       [17, 47, 48, 40, 13, 34, 28,  1],
       [ 7, 31,  0,  0,  0,  0,  0,  1],
       [28, 12, 16, 33, 18,  0,  0,  1],
       [40, 49,  0,  0,  0,  0,  0,  0],
       [41, 33, 46,  0,  0,  0,  0,  0],
       [ 1, 16, 40, 37,  0,  0,  0,  1],
       [17, 30, 30, 28, 12,  5,  0,  1],
       [40, 12, 17, 45, 18,  0,  0,  1],
       [17, 18, 48,  2,  0,  0,  0,  0],
       [ 2, 18,  0,  0,  0,  0,  0,  0]])

In [15]:
from sklearn.model_selection import train_test_split
train, test = train_test_split(data, random_state=33, test_size= .2)

In [16]:
X_train = train[:, :7]
X_test = test[:, :7]

y_test = test[:, -1]
y_train = train[:, -1]

In [21]:
EPOCHS = 10
BATCH_SIZE = 10
VALIDATION_SET = (X_test, y_test)
model.fit(X_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=2, validation_data=VALIDATION_SET)

Epoch 1/10
2/2 - 0s - loss: 0.6779 - accuracy: 0.8333 - val_loss: 0.6896 - val_accuracy: 0.7500
Epoch 2/10
2/2 - 0s - loss: 0.6726 - accuracy: 0.9167 - val_loss: 0.6906 - val_accuracy: 0.5000
Epoch 3/10
2/2 - 0s - loss: 0.6682 - accuracy: 0.9167 - val_loss: 0.6911 - val_accuracy: 0.5000
Epoch 4/10
2/2 - 0s - loss: 0.6638 - accuracy: 0.9167 - val_loss: 0.6912 - val_accuracy: 0.5000
Epoch 5/10
2/2 - 0s - loss: 0.6595 - accuracy: 0.9167 - val_loss: 0.6915 - val_accuracy: 0.5000
Epoch 6/10
2/2 - 0s - loss: 0.6551 - accuracy: 0.9167 - val_loss: 0.6912 - val_accuracy: 0.5000
Epoch 7/10
2/2 - 0s - loss: 0.6508 - accuracy: 0.9167 - val_loss: 0.6905 - val_accuracy: 0.5000
Epoch 8/10
2/2 - 0s - loss: 0.6464 - accuracy: 0.9167 - val_loss: 0.6895 - val_accuracy: 0.5000
Epoch 9/10
2/2 - 0s - loss: 0.6419 - accuracy: 0.9167 - val_loss: 0.6885 - val_accuracy: 0.7500
Epoch 10/10
2/2 - 0s - loss: 0.6372 - accuracy: 0.9167 - val_loss: 0.6876 - val_accuracy: 0.7500


<tensorflow.python.keras.callbacks.History at 0x247ea82d850>

### Making Predictions

In [22]:
np.round(model.predict(X_test[:])), y_test

(array([[0.],
        [0.],
        [0.],
        [0.]], dtype=float32),
 array([0, 0, 0, 1]))

In [23]:
model.evaluate(X_test, y_test)



[0.6875579953193665, 0.75]

> Those are the basics of `custom word embeddings`.

### Loading Pretrained Word Embeddings

[Docs](https://stackabuse.com/python-for-nlp-word-embeddings-for-deep-learning-in-keras/)

> Several types of pretrained word embeddings exist; we will use the `GloVe` word embeddings from Stanford NLP, which are among the best known and most commonly used. They can be downloaded **[Here](https://nlp.stanford.edu/projects/glove/)**.

I'm going to download `glove.6B.zip`, which is `822MB`.

> In our `custom` word embedding we used the `one_hot` function to convert text to sequences of integers. Another approach is to use the [Tokenizer](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer) from `keras.preprocessing.text`. All we have to do is pass the corpus to the `fit_on_texts` method; to get the vocabulary size we take the length of `word_index` and add `1` to it, because index `0` is reserved for padding.
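
> The way a frequency-ranked word index is built can be sketched in plain Python. The `build_word_index` helper is hypothetical and much simpler than the Keras `Tokenizer` (for example, it does not strip punctuation), but it follows the same idea: number words from 1, most frequent first.

```python
from collections import Counter

def build_word_index(texts):
    """Number words from 1, most frequent first (ties keep first-seen order)."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

toy_corpus = ["This is an excellent movie", "It is a Wonderful movie", "The movie was horrible"]
word_index = build_word_index(toy_corpus)
vocab_size = len(word_index) + 1  # index 0 is left free for padding
print(word_index["movie"], word_index["is"], vocab_size)  # 1 2 12
```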

In [27]:
corpus

['This is an excellent movie',
 'The move was fantastic I like it',
 'You should watch it is brilliant',
 'Exceptionally good',
 'Wonderfully directed and executed I like it',
 'Its a fantastic series',
 'Never watched such a brillent movie',
 'It is a Wonderful movie',
 'horrible acting',
 'waste of money',
 'pathetic picture',
 'It was very boring',
 'I did not like the movie',
 'The movie was horrible',
 'I will not recommend',
 'The acting is pathetic']

In [32]:
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)

In [36]:
voc_len = len(tokenizer.word_index) + 1
voc_len 

44

### Converting sentences to their numeric counterpart
> To convert sentences to their numeric counterpart, call the `texts_to_sequences` method and pass it the whole corpus.

In [37]:
embedded_sentences = tokenizer.texts_to_sequences(corpus)
embedded_sentences

[[14, 3, 15, 16, 1],
 [4, 17, 6, 9, 5, 7, 2],
 [18, 19, 20, 2, 3, 21],
 [22, 23],
 [24, 25, 26, 27, 5, 7, 2],
 [28, 8, 9, 29],
 [30, 31, 32, 8, 33, 1],
 [2, 3, 8, 34, 1],
 [10, 11],
 [35, 36, 37],
 [12, 38],
 [2, 6, 39, 40],
 [5, 41, 13, 7, 4, 1],
 [4, 1, 6, 10],
 [5, 42, 13, 43],
 [4, 11, 3, 12]]

### Finding the length of the longest sentence

In [38]:
word_count = lambda sentence: len(word_tokenize(sentence))
longest_sentence = max(corpus, key=word_count)
length_long_sentence = len(word_tokenize(longest_sentence))
length_long_sentence

7

### Padding sentences

In [39]:
padded_sents = pad_sequences(embedded_sentences, length_long_sentence, padding="post" )
padded_sents

array([[14,  3, 15, 16,  1,  0,  0],
       [ 4, 17,  6,  9,  5,  7,  2],
       [18, 19, 20,  2,  3, 21,  0],
       [22, 23,  0,  0,  0,  0,  0],
       [24, 25, 26, 27,  5,  7,  2],
       [28,  8,  9, 29,  0,  0,  0],
       [30, 31, 32,  8, 33,  1,  0],
       [ 2,  3,  8, 34,  1,  0,  0],
       [10, 11,  0,  0,  0,  0,  0],
       [35, 36, 37,  0,  0,  0,  0],
       [12, 38,  0,  0,  0,  0,  0],
       [ 2,  6, 39, 40,  0,  0,  0],
       [ 5, 41, 13,  7,  4,  1,  0],
       [ 4,  1,  6, 10,  0,  0,  0],
       [ 5, 42, 13, 43,  0,  0,  0],
       [ 4, 11,  3, 12,  0,  0,  0]])

### Loading the `GloVe` embeddings
> We load the GloVe word embeddings and then create an embedding matrix that contains the words in our corpus and their corresponding vectors from the GloVe embeddings.
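
> Each line of a GloVe text file holds a word followed by its vector values, separated by spaces. The sketch below parses two made-up lines from an in-memory file so it can run without the download; the words and numbers are not real GloVe values.

```python
import io
import numpy as np

# Two made-up lines in the GloVe text layout: a word followed by its values.
fake_glove_file = io.StringIO(
    "movie 0.1 -0.2 0.3\n"
    "good 0.5 0.4 -0.1\n"
)

embeddings = {}
for line in fake_glove_file:
    word, *values = line.split()
    embeddings[word] = np.asarray(values, dtype=np.float32)

print(embeddings["movie"].shape)  # (3,)
```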

In [46]:
embeddings_dictionary = dict() ## This will store word embeddings

with open(r"C:\Users\crisp\Downloads\glove.6B\glove.6B.100d.txt", encoding="utf8") as glove_file:
    for line in glove_file:
        records = line.split()
        word = records[0]
        vectors = np.asarray(records[1:], dtype=np.float32)
        embeddings_dictionary[word] = vectors


In [51]:
embeddings_dictionary["the"]

array([-0.038194, -0.24487 ,  0.72812 , -0.39961 ,  0.083172,  0.043953,
       -0.39141 ,  0.3344  , -0.57545 ,  0.087459,  0.28787 , -0.06731 ,
        0.30906 , -0.26384 , -0.13231 , -0.20757 ,  0.33395 , -0.33848 ,
       -0.31743 , -0.48336 ,  0.1464  , -0.37304 ,  0.34577 ,  0.052041,
        0.44946 , -0.46971 ,  0.02628 , -0.54155 , -0.15518 , -0.14107 ,
       -0.039722,  0.28277 ,  0.14393 ,  0.23464 , -0.31021 ,  0.086173,
        0.20397 ,  0.52624 ,  0.17164 , -0.082378, -0.71787 , -0.41531 ,
        0.20335 , -0.12763 ,  0.41367 ,  0.55187 ,  0.57908 , -0.33477 ,
       -0.36559 , -0.54857 , -0.062892,  0.26584 ,  0.30205 ,  0.99775 ,
       -0.80481 , -3.0243  ,  0.01254 , -0.36942 ,  2.2167  ,  0.72201 ,
       -0.24978 ,  0.92136 ,  0.034514,  0.46745 ,  1.1079  , -0.19358 ,
       -0.074575,  0.23353 , -0.052062, -0.22044 ,  0.057162, -0.15806 ,
       -0.30798 , -0.41625 ,  0.37972 ,  0.15006 , -0.53212 , -0.2055  ,
       -1.2526  ,  0.071624,  0.70565 ,  0.49744 , 

> The dictionary `embeddings_dictionary` now maps every word in the GloVe file to its **`GloVe`** embedding.

> We want the word embeddings for only those words that are present in our corpus. We will create a two dimensional numpy array of `44` (size of vocabulary) rows and `100` columns. The array will initially contain zeros. The array will be named as `embedding_matrix`.

> Next, we will iterate through each word in our corpus by traversing the `tokenizer.word_index` dictionary that contains our words and their corresponding index.

> Each word will be passed as a key to the `embeddings_dictionary` to retrieve the corresponding `100` dimensional vector for the word. The `100` dimensional vector will then be stored at the corresponding index of the word in the `embedding_matrix`. Words that have no GloVe vector keep their all-zero row.
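
> The same steps on a toy scale make it easy to see which words end up without a vector. The dictionaries below are hypothetical stand-ins for `tokenizer.word_index` and the loaded GloVe vectors; whether a misspelling such as `brillent` appears in the real GloVe file is not checked here.

```python
import numpy as np

embedding_dim = 3
# Hypothetical stand-ins for tokenizer.word_index and the loaded GloVe vectors.
toy_word_index = {"movie": 1, "good": 2, "brillent": 3}
toy_glove = {
    "movie": np.array([0.1, -0.2, 0.3], dtype=np.float32),
    "good": np.array([0.5, 0.4, -0.1], dtype=np.float32),
}

toy_matrix = np.zeros((len(toy_word_index) + 1, embedding_dim))
for word, index in toy_word_index.items():
    vector = toy_glove.get(word)
    if vector is not None:          # words without a vector keep a zero row
        toy_matrix[index] = vector

missing = [w for w in toy_word_index if w not in toy_glove]
print(missing)  # ['brillent']
```

> Listing the missing words this way is a quick check on how much of the corpus the pretrained vectors actually cover.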

In [54]:
embedding_matrix = np.zeros((voc_len, 100))

for word, index in tokenizer.word_index.items():
    # Look up the GloVe vector; words missing from GloVe keep their all-zero row.
    embedding_vect = embeddings_dictionary.get(word)

    if embedding_vect is not None:
        embedding_matrix[index] = embedding_vect
print(embedding_matrix)



[[ 0.          0.          0.         ...  0.          0.
   0.        ]
 [ 0.38251001  0.14821     0.60601002 ...  0.058921    0.091112
   0.47283   ]
 [-0.30664     0.16821     0.98510998 ... -0.38775     0.36916
   0.54521   ]
 ...
 [ 0.30449    -0.19628     0.20225    ... -0.18385001 -0.12432
   0.27467999]
 [-0.26703     0.44911     0.55478001 ... -0.87247002  0.83828002
   0.465     ]
 [-0.57547998 -0.043236   -0.1972     ... -0.10507     0.26554999
   0.32192999]]


> Our `embedding_matrix` now contains pretrained word embeddings for the words in our corpus.

### Model Creation

In [56]:
model = keras.Sequential([
    keras.layers.Embedding(voc_len, 100, weights=[embedding_matrix], 
              input_length=length_long_sentence, trainable=False),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid")
])
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 7, 100)            4400      
_________________________________________________________________
flatten_1 (Flatten)          (None, 700)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 701       
Total params: 5,101
Trainable params: 701
Non-trainable params: 4,400
_________________________________________________________________


> The script remains the same, except for the embedding layer. Here in the embedding layer, the first parameter is the size of the vocabulary. The second parameter is the dimension of the output vector. Since we are using pretrained word embeddings that contain 100-dimensional vectors, we set the vector dimension to 100.

> Another very important attribute of the `Embedding()` layer that we did not use in the last section is `weights`. You can pass your pretrained embedding matrix as the initial weights through the `weights` parameter. And since we are not training the embedding layer, the `trainable` attribute has been set to `False`.
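
> The split between trainable and non-trainable parameters in the summary follows directly from freezing the embedding layer. A quick check of the arithmetic, using the sizes from this section:

```python
vocab_size, embedding_dim, input_length = 44, 100, 7

frozen_params = vocab_size * embedding_dim           # embedding rows, not updated
trainable_params = input_length * embedding_dim + 1  # Dense(1): 700 weights + 1 bias

print(frozen_params, trainable_params, frozen_params + trainable_params)  # 4400 701 5101
```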

### Compiling the Model

In [58]:
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss = keras.losses.BinaryCrossentropy(),
    metrics =["accuracy"]
)

### Training the Model

In [59]:
model.fit(padded_sents, sentiments, epochs=100, verbose=1)

Epoch 1/100
...
Epoch 100/100


<tensorflow.python.keras.callbacks.History at 0x247ea782c70>

### Evaluating the Model

In [61]:
model.evaluate(padded_sents, sentiments, verbose=0)

[0.07033342123031616, 1.0]

### Making predictions

In [71]:
model.predict(padded_sents[:2]), sentiments[:2]

(array([[0.9590082],
        [0.9785887]], dtype=float32),
 array([1, 1]))

#### Creating a function that predicts the sentiment of a review of up to 7 words

In [81]:
class LongSent(Exception):
    """Raised when a review has more tokens than the model's input length."""


def predictSentiment(sent):
    # Convert the review to word indexes; words unseen during fitting are dropped.
    embedded_sentence = tokenizer.texts_to_sequences([sent])

    # Raise instead of silently swallowing the error, so long reviews are reported.
    if len(embedded_sentence[0]) > length_long_sentence:
        raise LongSent("The review is too long")
    padded_sent = pad_sequences(embedded_sentence, length_long_sentence, padding="post")
    sent_index_predicted = np.round(model.predict(padded_sent))

    print("The review is: ", ["Negative", "Positive"][int(sent_index_predicted[0][0])])


predictSentiment("I love this movie.")

The review is:  Negative


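> **In NLP** we need more data to train our model. As we can see, this model predicts the review `wrongly`: the word `love` never appears in our corpus, so the tokenizer drops it and the model only sees `i`, `this` and `movie`.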
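
> Before trusting a prediction, it can help to check which words of a review the vocabulary does not know. The `unknown_words` helper below is a hypothetical sketch with a hand-written `known` set; in the notebook the set would come from `tokenizer.word_index`.

```python
import string

def unknown_words(review, known_words):
    """Return the words of `review` that are not in `known_words`."""
    cleaned = review.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in known_words]

# A few words from the training corpus; in the notebook this would be tokenizer.word_index.
known = {"i", "this", "movie", "like", "is", "the"}
print(unknown_words("I love this movie.", known))  # ['love']
```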

### Word Embeddings with Keras Functional API
* It is extremely important to know how Keras Functional API works. Most of the advanced deep learning models involving multiple inputs and outputs use the Functional API.
* The rest of the script remains similar as it was in the last section. The only change will be in the development of a deep learning model. Let's implement the same deep learning model as we implemented in the last section with Keras Functional API.

In [83]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input

In [87]:
deep_inputs = Input(shape=(length_long_sentence,))

embedding = keras.layers.Embedding(voc_len, 100, weights=[embedding_matrix],
                      input_length=length_long_sentence,
                      trainable=False)(deep_inputs)
flatten = keras.layers.Flatten()(embedding)
hidden = keras.layers.Dense(1, activation='sigmoid')(flatten)
model = Model(inputs=deep_inputs, outputs=hidden)

> In the Keras `Functional API`, you have to define the `input layer` separately before the `embedding layer`. In the input layer, you simply pass the length of the input vector. To specify the previous layer as input to the next layer, the previous layer is passed as an argument inside the parentheses at the end of the next layer.

> For instance, in the above script, you can see that `deep_inputs` is passed as parameter at the end of the embedding layer. Similarly, `embedding` is passed as input at the end of the `Flatten()` layer and so on.

> Finally, in the `Model()`, you have to pass the input layer, and the final output layer.
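
> The chain of layers wired together above computes a simple data flow: look up vectors, flatten, then apply one sigmoid unit. The NumPy sketch below traces that flow with random stand-in weights; it is not Keras code, and its outputs are meaningless as predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_dim, input_length = 44, 100, 7

# Random stand-ins for the layer weights.
embedding_table = rng.normal(size=(vocab_size, embedding_dim))
dense_weights = rng.normal(size=(input_length * embedding_dim, 1)) * 0.01
dense_bias = np.zeros(1)

def forward(batch):
    embedded = embedding_table[batch]                # (batch, 7, 100)
    flat = embedded.reshape(len(batch), -1)          # (batch, 700)
    logits = flat @ dense_weights + dense_bias       # (batch, 1)
    return 1.0 / (1.0 + np.exp(-logits))             # sigmoid

batch = np.array([[14, 3, 15, 16, 1, 0, 0], [10, 11, 0, 0, 0, 0, 0]])
probs = forward(batch)
print(probs.shape)  # (2, 1)
```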