**Question**: Build, compile, train, and evaluate a Sequential model with Bidirectional LSTM layers on the sarcasm dataset

**Description:**

* Load the sarcasm JSON file using open() and keep only the headline and is_sarcastic fields

* Split the dataset into training and testing sets with 20,000 and 6,709 rows, respectively

* Preprocess the dataset with a Tokenizer, using a vocabulary size of 1000, post padding, and a maximum sentence length of 120

* Build the model with the Sequential API: the first layer is an Embedding layer with the input parameters NUM_WORDS = 1000, DIMENSION = 16, and LEN_WORDS = 120; the second layer is a Bidirectional LSTM with 32 units; the third layer is a Dense layer with 24 neurons and the ‘relu’ activation function; the fourth layer is a Dense layer with 1 neuron and the sigmoid activation function

* Compile the model with the ‘Adam’ optimizer, binary cross-entropy loss, and accuracy as the metric

* Convert the padded sequences and labels from Python lists to NumPy arrays

* Fit the model on the training data for 5 epochs

* Evaluate the model on the testing data and print the accuracy

**Solution**:

In [None]:
import json
import tensorflow as tf
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [None]:
# json.load parses the whole file as one JSON document (here, a single array of records)
with open("/home/metagogy/Downloads/sarcasm.json", 'r') as d:
    data = json.load(d)

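`json.load` only works if `sarcasm.json` is a single JSON array. Some copies of this dataset store one JSON object per line (JSON Lines) instead, and `json.load` then fails with a `JSONDecodeError`. A hedged sketch of a loader that accepts either layout, assuming each record is a JSON object with `headline` and `is_sarcastic` keys (`load_records` is a hypothetical helper, not part of the original solution):

```python
import json

def load_records(path):
    """Return a list of dicts from a JSON-array file or a JSON Lines file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    if text.lstrip().startswith("["):
        # The whole file is one JSON array, the layout json.load expects
        return json.loads(text)
    # Otherwise treat every non-empty line as its own JSON object
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

In the notebook it would stand in for the `json.load` call, e.g. `data = load_records("/home/metagogy/Downloads/sarcasm.json")`.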
In [None]:
# Keep only the two fields the task needs: the headline text and its 0/1 label
sentences = []
labels = []
for item in data:
    sentences.append(item['headline'])
    labels.append(item['is_sarcastic'])

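Because the model outputs a single probability per headline, the reported accuracy is easier to judge next to the class balance: a model that always predicts the majority class already scores the majority-class fraction. A small stdlib sketch that computes that baseline from a list of 0/1 labels (`majority_baseline` is a hypothetical helper):

```python
from collections import Counter

def majority_baseline(labels):
    """Return (majority_label, fraction_of_rows_with_that_label) for a list of 0/1 labels."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

print(majority_baseline([0, 0, 1, 0]))  # (0, 0.75)
```

Calling `majority_baseline(labels)` on the list built above gives the accuracy a constant predictor would reach on the full dataset.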
In [None]:
# Positional split without shuffling: the first 20,000 rows train, the remaining 6,709 test
training_sentences = sentences[0:20000]
testing_sentences = sentences[20000:]
training_labels = labels[0:20000]
testing_labels = labels[20000:]

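The split above is a plain positional slice, so the test set is simply the last records in the file. A sketch of the same split as a reusable function that also makes the expected sizes explicit, assuming the file holds 26,709 records (20,000 + 6,709); `positional_split` is a hypothetical name:

```python
def positional_split(items, n_train):
    """Split a sequence into its first n_train items and the rest, without shuffling."""
    if not 0 < n_train < len(items):
        raise ValueError("n_train must leave at least one item on each side")
    return items[:n_train], items[n_train:]

train, test = positional_split(list(range(26709)), 20000)
print(len(train), len(test))  # 20000 6709
```

Applying it to `sentences` and `labels` with the same `n_train` keeps each headline aligned with its label. If the records in the file were ordered in some systematic way, a shuffled split would give a more representative test set; the task as stated uses the fixed positional split.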
In [None]:
# num_words=1000 keeps only word indices below 1000; rarer and unseen words map to "<OOV>"
tokenizer = Tokenizer(num_words=1000, oov_token="<OOV>")
# Build the word index from the training headlines only
tokenizer.fit_on_texts(training_sentences)

word_index = tokenizer.word_index

# Headlines -> integer sequences, then truncate/pad at the end to exactly 120 tokens
training_sequences = tokenizer.texts_to_sequences(training_sentences)
training_padded = pad_sequences(training_sequences, maxlen=120, padding='post', truncating='post')

testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded = pad_sequences(testing_sequences, maxlen=120, padding='post', truncating='post')

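To make the preprocessing concrete, here is a simplified pure-Python approximation of what `Tokenizer(num_words=..., oov_token="<OOV>")` and `pad_sequences(..., padding='post', truncating='post')` do: lowercase the text, replace punctuation with spaces, rank words by frequency, reserve index 0 for padding and index 1 for the OOV token, and map any index >= `num_words` to the OOV index. It is an illustrative sketch, not a drop-in replacement for Keras; the helper names are hypothetical.

```python
from collections import Counter

# Punctuation that the Keras Tokenizer filters by default (replaced by spaces here)
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
_TABLE = str.maketrans({c: " " for c in FILTERS})

def words(text):
    return text.lower().translate(_TABLE).split()

def fit_word_index(texts, oov_token="<OOV>"):
    """Rank words by frequency; index 1 is the OOV token, 0 is left for padding."""
    counts = Counter(w for t in texts for w in words(t))
    ranked = [w for w, _ in counts.most_common()]  # ties keep first-seen order
    return {w: i for i, w in enumerate([oov_token] + ranked, start=1)}

def texts_to_sequences(texts, word_index, num_words, oov_index=1):
    """Map words to indices; unknown words and indices >= num_words become oov_index."""
    sequences = []
    for t in texts:
        seq = []
        for w in words(t):
            i = word_index.get(w, oov_index)
            seq.append(i if i < num_words else oov_index)
        sequences.append(seq)
    return sequences

def pad_post(sequences, maxlen, value=0):
    padded = []
    for s in sequences:
        kept = list(s[:maxlen])                               # truncating='post'
        padded.append(kept + [value] * (maxlen - len(kept)))  # padding='post'
    return padded

wi = fit_word_index(["The cat sat", "the dog sat on the mat"])
seqs = texts_to_sequences(["The cat ate the mat!"], wi, num_words=5)
print(seqs)               # [[2, 4, 1, 2, 1]]
print(pad_post(seqs, 8))  # [[2, 4, 1, 2, 1, 0, 0, 0]]
```

In the toy example "ate" is unseen and "mat" has index 7 >= 5, so both become 1. With `num_words=1000` in the notebook, only indices 1 to 999 survive: the OOV token plus the 998 most frequent training words.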
In [None]:
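NUM_WORDS, DIMENSION, LEN_WORDS = 1000, 16, 120  # vocabulary size, embedding size, padded length

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(NUM_WORDS, DIMENSION, input_length=LEN_WORDS),  # (batch, 120) -> (batch, 120, 16)
    # Forward and backward 32-unit LSTMs; their outputs are concatenated -> (batch, 64)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # probability that the headline is sarcastic
])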

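As a sanity check on the architecture, the trainable parameter count can be worked out by hand; the total should match what `model.summary()` reports once the model is built. The sketch assumes the standard Keras LSTM layout (a kernel, a recurrent kernel, and one bias vector covering the four gates) and the default `Bidirectional` merge mode, which concatenates the two directions into 2 × 32 = 64 features.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim  # one trainable vector per vocabulary index

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    return input_dim * units + units  # weights + bias

emb = embedding_params(1000, 16)       # 16,000
bilstm = 2 * lstm_params(16, 32)       # forward + backward: 12,544
dense24 = dense_params(2 * 32, 24)     # 64 concatenated features in: 1,560
out = dense_params(24, 1)              # 25
total = emb + bilstm + dense24 + out
print(total)                           # 30129
```

The shapes flow as (batch, 120) → (batch, 120, 16) → (batch, 64) → (batch, 24) → (batch, 1).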
In [None]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

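For a label y in {0, 1} and a predicted probability p from the sigmoid output, binary cross-entropy is -[y·log(p) + (1 - y)·log(1 - p)], averaged over the batch. A plain-Python sketch of that formula; it clips p away from 0 and 1 to keep the logarithms finite, as Keras does with a small epsilon (the value 1e-7 used here is an assumption for illustration):

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over paired 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(round(binary_crossentropy([1, 0], [0.9, 0.1]), 4))  # 0.1054 (confident and correct)
print(round(binary_crossentropy([1], [0.1]), 4))          # 2.3026 (confident and wrong)
```

The second call shows why this loss suits a sigmoid classifier: confident mistakes are penalized far more heavily than confident correct answers are rewarded.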
In [None]:
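# pad_sequences already returns NumPy arrays; the label lists still need converting for fit/evaluate
training_padded = np.array(training_padded)
training_labels = np.array(training_labels)
testing_padded = np.array(testing_padded)
testing_labels = np.array(testing_labels)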

In [None]:
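# Train for 5 epochs; passing the test set as validation_data reports its loss and accuracy after each epoch
model.fit(training_padded, training_labels, epochs=5, validation_data=(testing_padded, testing_labels))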

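`fit` is called without `batch_size`, so Keras falls back to its default of 32. Each epoch then runs ceil(20,000 / 32) = 625 gradient updates, and the per-epoch validation pass covers the 6,709 test rows in ceil(6,709 / 32) = 210 batches. A quick check of that arithmetic (the batch size of 32 is the Keras default, not something the task specifies):

```python
import math

def steps_per_epoch(n_rows, batch_size=32):
    # The last batch may be smaller than batch_size; it still counts as one step
    return math.ceil(n_rows / batch_size)

train_steps = steps_per_epoch(20000)  # 625
val_steps = steps_per_epoch(6709)     # 210; the final batch holds 6709 - 209 * 32 = 21 rows
print(train_steps, val_steps, 5 * train_steps)  # 625 210 3125
```

These are the step counts the training progress bar would be expected to show, with 3,125 weight updates over the 5 epochs.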
In [None]:
test_loss, test_acc = model.evaluate(testing_padded, testing_labels)

In [None]:
print(test_acc)
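For this single-sigmoid model, the `accuracy` value returned by `evaluate` is binary accuracy: each predicted probability becomes class 1 if it is strictly greater than 0.5 (otherwise class 0), and the metric is the fraction of rows whose predicted class equals the label. A stdlib sketch of that computation, which could be used to cross-check `test_acc` against the output of `model.predict(testing_padded)` (the helper name is hypothetical):

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of rows where (probability > threshold) matches the 0/1 label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

print(binary_accuracy([1, 0, 1, 0], [0.8, 0.3, 0.4, 0.5]))  # 0.75
```

`model.predict` returns an array of shape (n, 1), so it should be flattened first (for example with `.ravel()`); the result should then agree with `test_acc` up to floating-point rounding.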