**Applying Deep Learning Models in NLP:**

Recurrent Neural Networks (RNNs) are widely used in Natural Language Processing (NLP) for tasks like text classification, machine translation, and sentiment analysis.

* Simple RNN – Basic recurrent network for sequential data.
* Bidirectional RNN (BiRNN) – Processes input in both forward and backward directions for better context understanding.
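
To make the difference between the two layers concrete, here is a minimal pure-Python sketch of a simple RNN forward pass and of a bidirectional wrapper around it. It assumes a tanh activation and concatenation of the two final states, which matches the Keras defaults for `SimpleRNN` and `Bidirectional` as far as I know. The function names, weights, and toy sequence are made up for illustration and are not part of the notebook.

```python
import math

def rnn_forward(inputs, w_x, w_h, b):
    """Return the final hidden state of a simple RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    hidden = [0.0] * len(b)  # initial state is all zeros
    for x_t in inputs:
        new_hidden = []
        for j in range(len(b)):
            total = b[j]
            total += sum(w_x[j][i] * x_t[i] for i in range(len(x_t)))
            total += sum(w_h[j][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden
    return hidden

def birnn_forward(inputs, fwd_params, bwd_params):
    """Run one RNN left-to-right and another right-to-left, then concatenate the final states."""
    forward_state = rnn_forward(inputs, *fwd_params)
    backward_state = rnn_forward(inputs[::-1], *bwd_params)
    return forward_state + backward_state  # list concatenation: twice as many features

# Hypothetical toy data: 3 time steps, 2 input features, 2 hidden units.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
params = (
    [[0.5, -0.5], [0.25, 0.75]],  # W_x (hidden x input)
    [[0.1, 0.0], [0.0, 0.1]],     # W_h (hidden x hidden)
    [0.0, 0.0],                   # bias
)

final_state = rnn_forward(sequence, *params)
both_directions = birnn_forward(sequence, params, params)
print(len(final_state), len(both_directions))
```

The bidirectional output has twice as many features as the unidirectional one, which is why a `Dense` layer placed after `Bidirectional(SimpleRNN(32))` sees 64 inputs.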

**Steps for NLP Tasks Using RNN:**
* Preprocess Text Data (Tokenization, Padding, Encoding).
* Build RNN Models (Simple RNN, Bidirectional RNN).
* Train and Evaluate on text datasets (e.g., IMDB Sentiment Analysis; this notebook uses FakeNewsNet news titles).
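
The preprocessing step above can be sketched without any deep learning library. The toy code below is a simplified stand-in for the Keras `Tokenizer` and `pad_sequences`, not their actual implementation: words are ranked by frequency starting at index 1, index 0 is reserved for padding, unseen words are dropped (as happens when no OOV token is set), and sequences are padded at the end and truncated from the front. The helper names and sample titles are hypothetical, and punctuation filtering is omitted.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer, most frequent word first (index 0 is kept for padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts, key=lambda w: -counts[w])  # stable sort: ties keep first-seen order
    return {word: i for i, word in enumerate(ranked, start=1)}

def encode(text, word_index):
    """Turn a text into a list of word indices, silently dropping unknown words."""
    return [word_index[w] for w in text.lower().split() if w in word_index]

def pad(seq, maxlen):
    """Keep at most the last `maxlen` tokens, then right-pad with zeros."""
    seq = seq[-maxlen:]
    return seq + [0] * (maxlen - len(seq))

# Hypothetical training titles.
train_titles = [
    "Fake news spreads fast",
    "Real news travels slowly",
    "News you can trust",
]

word_index = build_word_index(train_titles)
encoded = [encode(t, word_index) for t in train_titles]
padded = [pad(s, maxlen=4) for s in encoded]

# A title containing a word never seen during fitting ("breaking").
unseen = pad(encode("Breaking news spreads", word_index), maxlen=4)
print(padded, unseen)
```

Fitting the word index on the training titles only, as the notebook does, keeps test-set vocabulary from leaking into the model's inputs.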

In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import warnings
warnings.simplefilter('ignore')
import gc

In [3]:
from tensorflow import keras as kr
from tqdm.keras import TqdmCallback

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

In [6]:
file="/content/drive/MyDrive/Labs/DL/FakeNewsNet.csv"
wanted_cols=['title','real']
df=pd.read_csv(file,usecols=wanted_cols)

In [7]:
df.dropna(inplace=True)
df.reset_index(drop=True, inplace=True)

In [8]:
df['wcount']=df['title'].apply(lambda x: len(x.split(' ')))

In [9]:
print(f"Found {df.shape[0]} news...")

Found 23196 news...


In [10]:
x=df['title'].values
y=df['real'].values

In [11]:
# Fixed seed so the split (and the metrics below) can be reproduced between runs
x_train,x_test,y_train,y_test= train_test_split(x,y,test_size=0.20, random_state=42)

In [12]:
print(f"Training Records: {x_train.shape[0]} | Testing Records: {x_test.shape[0]}")

Training Records: 18556 | Testing Records: 4640


In [13]:
tok = kr.preprocessing.text.Tokenizer(
    num_words=None,
    filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
    lower=True,
    split=' ',
    char_level=False,
    oov_token=None
)

In [14]:
tok.fit_on_texts(x_train)

In [15]:
tok_train=tok.texts_to_sequences(x_train)
tok_test=tok.texts_to_sequences(x_test)

In [16]:
max_length=int(df.wcount.quantile(0.75))
padded_train=kr.preprocessing.sequence.pad_sequences(tok_train,maxlen=max_length, padding='post')
padded_test=kr.preprocessing.sequence.pad_sequences(tok_test, maxlen=max_length, padding='post')

In [17]:
vocab_size = len(tok.word_index)+1
epoch = 10
unit = 32

In [18]:
model=kr.models.Sequential(name='FakeNewsCatcher')
model.add(kr.layers.Embedding(vocab_size,unit,input_length=max_length))
model.add(kr.layers.SimpleRNN(unit,return_sequences=False))
model.add(kr.layers.Dense(1,activation='sigmoid'))

In [19]:
model.compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=['accuracy'])

In [20]:
model.summary()

In [21]:
es=kr.callbacks.EarlyStopping(monitor='val_loss', mode='min', verbose=0, patience=5)

In [22]:
gc.collect()
kr.backend.clear_session()

In [23]:
hist=model.fit(x=padded_train,
              y=y_train,
              epochs=epoch,
              shuffle=True,
              validation_data=(padded_test,y_test),
              verbose=0,
              callbacks=[TqdmCallback(verbose=0),es])

In [24]:
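# hist.history['accuracy'] is the training accuracy; hist.epoch is zero-based, so count its entries
acc = '{:.2%}'.format(hist.history['accuracy'][-1])
print(f"Our model has achieved a training accuracy of {acc} in {len(hist.epoch)} epoch(s)")

Our model has achieved a training accuracy of 96.11% in 6 epoch(s)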


In [25]:
pred = (model.predict(padded_test)>0.5).astype('int32')
print(classification_report(y_test,pred))

[1m145/145[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 4ms/step
              precision    recall  f1-score   support

           0       0.57      0.57      0.57      1157
           1       0.86      0.86      0.86      3483

    accuracy                           0.78      4640
   macro avg       0.71      0.71      0.71      4640
weighted avg       0.78      0.78      0.78      4640
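
For readers unfamiliar with the columns in the report above, the sketch below shows how precision, recall, F1, and accuracy are derived from raw predictions for a binary problem. It is a hand-rolled illustration on invented labels, not the scikit-learn implementation and not the notebook's data.

```python
def class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Invented labels for illustration only (1 = real, 0 = fake).
toy_true = [1, 1, 1, 0, 0, 1, 0, 1]
toy_pred = [1, 1, 0, 0, 1, 1, 0, 1]

real_metrics = class_metrics(toy_true, toy_pred, label=1)
fake_metrics = class_metrics(toy_true, toy_pred, label=0)
toy_accuracy = accuracy(toy_true, toy_pred)
print(real_metrics, fake_metrics, toy_accuracy)
```

Per-class scores matter here because the test set is imbalanced (3483 real vs. 1157 fake titles), so overall accuracy alone can hide weak performance on the smaller class.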



In [26]:
model2=kr.models.Sequential(name='FakeNewsCatcher2')
model2.add(kr.layers.Embedding(vocab_size,unit,input_length=max_length))
model2.add(kr.layers.Bidirectional(kr.layers.SimpleRNN(unit)))
model2.add(kr.layers.Dense(unit,activation='relu'))
model2.add(kr.layers.Dense(1,activation='sigmoid'))

In [27]:
model2.compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=['accuracy'])

In [28]:
model2.summary()
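
The summary output is not shown here, but the trainable parameter counts of both models can be worked out by hand. The sketch below assumes the standard layer definitions (a `SimpleRNN` has an input kernel, a recurrent kernel, and a bias; `Bidirectional` holds two independent copies of the wrapped layer). The helper names are hypothetical, and the vocabulary size used for the embedding is a made-up placeholder, since the real value depends on the fitted tokenizer.

```python
def embedding_params(vocab_size, dim):
    """One `dim`-sized vector per vocabulary entry."""
    return vocab_size * dim

def simple_rnn_params(input_dim, units):
    """Input kernel + recurrent kernel + bias."""
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    """Weight matrix + bias."""
    return units * (input_dim + 1)

unit = 32  # same value as in the notebook

# Model 1: Embedding -> SimpleRNN(32) -> Dense(1)
rnn = simple_rnn_params(input_dim=unit, units=unit)
model1_head = dense_params(input_dim=unit, units=1)

# Model 2: Embedding -> Bidirectional(SimpleRNN(32)) -> Dense(32) -> Dense(1)
birnn = 2 * simple_rnn_params(input_dim=unit, units=unit)  # forward + backward copies
model2_hidden = dense_params(input_dim=2 * unit, units=unit)  # concatenated states give 64 inputs
model2_head = dense_params(input_dim=unit, units=1)

example_vocab_size = 20000  # placeholder, NOT the notebook's actual vocab_size
embedding = embedding_params(example_vocab_size, unit)

print(rnn, model1_head, birnn, model2_hidden, model2_head, embedding)
```

With any realistic vocabulary the embedding table dominates both models, so the bidirectional variant adds comparatively few extra weights.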