**Question**: Build, compile, and train a three-layer Bidirectional LSTM model on the Twitter US Airline Sentiment dataset, then use it to predict the sentiment of a sample text


**Description:**

* Load the Twitter US Airline Sentiment dataset, which has 14585 rows and 21 columns

* Select only the text and airline sentiment columns for modeling

* Remove the neutral label from the dataset so that only positive and negative labels remain

* Convert the airline sentiment feature into numeric values

* Perform tokenization on the text feature

* Build the sequential model with an embedding layer (12975 as vocabulary size, 32 as embedding length, 200 as input length), followed by three Bidirectional LSTM layers with 50 units each, a dropout layer with rate 0.5, and finally a dense layer with sigmoid activation

* Compile the model using binary cross-entropy as the loss and adam as the optimizer

* Train the model for 10 epochs with a batch size of 32 and a validation split of 0.2

* Using the given sample input text, predict its sentiment


**SOLUTION**

In [None]:
# Import Libraries.

import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, SpatialDropout1D, Bidirectional
from tensorflow.keras.layers import Embedding

In [None]:
# Load the dataset.

df = pd.read_csv('/home/metagogy/Tweets-train.csv', sep=',')

In [None]:
# Select the 'text' and 'airline_sentiment' columns from the dataframe.

tweet_df = df[['text','airline_sentiment']]

In [None]:
# Remove the neutral labels from the airline_sentiment.

tweet_df = tweet_df[tweet_df['airline_sentiment'] != 'neutral']

In [None]:
# Convert labels into numeric codes.
# factorize() returns a tuple: (integer codes, unique labels in order of first appearance).

sentiment_label = tweet_df.airline_sentiment.factorize()
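
Which label receives code 0 depends on which label appears first in the data, so it is worth checking `sentiment_label[1]` before interpreting predictions. The pure-Python sketch below imitates that first-appearance encoding; `factorize_sketch` is a hypothetical helper written for illustration, not the pandas implementation, and the sample labels are made up.

```python
def factorize_sketch(labels):
    """Hypothetical stand-in for factorize(): codes assigned by first appearance."""
    uniques = []      # labels in the order they are first seen
    index_of = {}     # label -> integer code
    codes = []
    for label in labels:
        if label not in index_of:
            index_of[label] = len(uniques)
            uniques.append(label)
        codes.append(index_of[label])
    return codes, uniques


# Made-up labels, used only to show the shape of the result.
codes, uniques = factorize_sketch(
    ["negative", "positive", "negative", "negative", "positive"]
)
print(codes)    # [0, 1, 0, 0, 1]
print(uniques)  # ['negative', 'positive']
```

Here `codes` plays the role of `sentiment_label[0]` (the training targets) and `uniques` plays the role of `sentiment_label[1]` (the lookup used at prediction time).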

In [None]:
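# Tokenize the tweets.

tweet = tweet_df.text.values
# num_words=5000 limits texts_to_sequences to the most frequent words;
# word_index still holds every word seen during fitting.
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(tweet)
# +1 because index 0 is reserved for padding.
vocab_size = len(tokenizer.word_index) + 1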
encoded_docs = tokenizer.texts_to_sequences(tweet)
padded_sequence = pad_sequences(encoded_docs, maxlen=200)
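
With its default arguments, `pad_sequences` pads with zeros at the front of a short sequence and drops tokens from the front of a long one. The sketch below reproduces that behavior for a single sequence in plain Python; `pad_pre` is a hypothetical helper and the token ids are invented for illustration.

```python
def pad_pre(sequence, maxlen, value=0):
    """Hypothetical helper mimicking the defaults padding='pre', truncating='pre'."""
    trimmed = sequence[-maxlen:]                          # keep the last maxlen tokens
    return [value] * (maxlen - len(trimmed)) + trimmed    # left-pad with zeros


short = pad_pre([7, 3, 12], maxlen=5)
long = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short)  # [0, 0, 7, 3, 12]
print(long)   # [3, 4, 5, 6, 7]
```

With `maxlen=200`, most tweets are far shorter than the limit, so the bulk of each padded row is leading zeros.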

In [None]:
# Build a model.

embedding_vector_length = 32

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_vector_length, input_length=200),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(50, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(50, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(50)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
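
As a sanity check against `model.summary()`, the trainable parameter count of this architecture can be worked out by hand. The sketch below assumes `vocab_size` equals the 12975 given in the description (the real value depends on the fitted tokenizer), that the LSTM layers use biases, and that each Bidirectional wrapper concatenates its two directions, giving 100 output features. `lstm_params` is a hypothetical helper.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)


vocab_size, embed_dim, units = 12975, 32, 50   # values taken from the description

embedding = vocab_size * embed_dim             # one 32-d vector per token id
bilstm_1 = 2 * lstm_params(embed_dim, units)   # forward + backward LSTM
bilstm_2 = 2 * lstm_params(2 * units, units)   # input is the 100-d concatenation
bilstm_3 = 2 * lstm_params(2 * units, units)
dense = 2 * units + 1                          # 100 weights + 1 bias
total = embedding + bilstm_1 + bilstm_2 + bilstm_3 + dense

print(embedding, bilstm_1, bilstm_2, bilstm_3, dense)  # 415200 33200 60400 60400 101
print(total)                                           # 569301
```

The embedding layer accounts for most of the parameters; the Dropout layer contributes none.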

In [None]:
# Compile the model.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
# Train the model.

model.fit(padded_sequence, sentiment_label[0], validation_split=0.2, epochs=10, batch_size=32)
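
With `validation_split=0.2`, Keras holds out the last 20% of the rows (taken before any shuffling) and trains on the rest. The arithmetic below sketches how the split and batch size translate into steps per epoch. The sample count of 10,000 is purely illustrative, since the number of rows left after dropping neutral tweets is not stated here, and `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_samples, validation_split, batch_size):
    """Hypothetical helper: training rows, validation rows, and steps per epoch."""
    split_at = int(n_samples * (1.0 - validation_split))  # rows before this index train
    n_val = n_samples - split_at                          # trailing rows validate
    steps = math.ceil(split_at / batch_size)              # batches per training epoch
    return split_at, n_val, steps


# 10_000 is an illustrative sample count, not the real dataset size.
n_train, n_val, steps = split_sizes(10_000, validation_split=0.2, batch_size=32)
print(n_train, n_val, steps)  # 8000 2000 250
```

Because the held-out rows come from the end of the arrays, a file sorted by airline or date would give a validation set that differs systematically from the training set; shuffling the rows before calling `fit` avoids that.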

In [None]:
# Predict the sentiment of a sample text.

test_word = "AI educator provides best courses"
tw = tokenizer.texts_to_sequences([test_word])

tw = pad_sequences(tw, maxlen=200)

prediction = int(model.predict(tw).round().item())
sentiment_label[1][prediction]
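
The last two lines turn the sigmoid output into a label: the probability is rounded to 0 or 1, and that integer indexes into the unique labels returned by `factorize`. A plain-Python sketch of the same mapping follows. The probabilities and the label order are assumptions for illustration, and `label_from_probability` is a hypothetical helper.

```python
def label_from_probability(prob, uniques):
    """Hypothetical helper: round a sigmoid output and look up its label."""
    return uniques[int(round(prob))]


# Assumed label order; in the notebook it comes from sentiment_label[1].
uniques = ["negative", "positive"]

high = label_from_probability(0.91, uniques)
low = label_from_probability(0.08, uniques)
print(high)  # positive
print(low)   # negative
```

Keeping the raw probability alongside the label is often useful, since values near 0.5 indicate that the model is unsure.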