# PART 1

### 1. Understanding RNNs

### Question: What are Recurrent Neural Networks, and how do they differ from traditional feedforward neural networks?

RNNs are a type of neural network designed for sequence data (such as time series or text). They carry a hidden state from one step to the next, so earlier inputs can influence later decisions.


Difference from Feedforward Networks: \
Traditional feedforward networks process each input independently and keep no memory of earlier inputs. RNNs have a recurrent loop: the hidden state computed at one step is fed back in at the next step, so information from previous inputs is retained (recurrence).

### Task: Explain the working of RNN, and how information is passed through the network over time.

- At each time step, the RNN takes the current input and combines it with the hidden state (its memory of the past).
- This combination is passed through a layer to produce the output and the updated hidden state.
- The updated hidden state is passed on to the next step, so the network remembers previous inputs.

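Information is passed through time as follows:
- The hidden state carries the memory of what has been seen so far.
- As new inputs come in, the hidden state is updated, so the network accumulates context as it processes the sequence. (The weights are shared across time steps and change only during training.)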
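
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration of a vanilla RNN forward pass, not the Keras implementation: the function name `rnn_forward`, the weight names, the tanh activation and the sizes are all assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a vanilla RNN over a sequence and return every hidden state.

    inputs has shape (T, input_dim): one row per time step.
    """
    h = np.zeros(W_h.shape[0])   # initial hidden state: no memory yet
    states = []
    for x_t in inputs:
        # Combine the current input with the previous hidden state
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)         # h is carried into the next step
    return np.stack(states)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4   # illustrative sizes
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

sequence = rng.normal(size=(T, input_dim))
states = rnn_forward(sequence, W_x, W_h, b)
print(states.shape)  # (5, 4): one hidden state per time step
```

The same three weight arrays are reused at every step; only the hidden state `h` changes as the sequence is read.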

### Stacking RNN Layers and Bi-directional Architecture

**Stacking RNN Layers:**
- **Advantages:** 
  - **Complex Patterns:** Multiple RNN layers can capture more complex patterns in the data.
  - **Higher Capacity:** Each added layer increases the model's representational capacity, since upper layers build on the features produced by the layers below.
- **Drawbacks:**
  - **Vanishing/Exploding Gradients:** More layers can lead to vanishing or exploding gradients, making training harder.
  - **Computational Cost:** More layers increase computation and memory usage.
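
As a rough sketch of what stacking means, the NumPy example below feeds the full sequence of hidden states from one layer into the next layer (the role `return_sequences=True` plays in Keras). The helper names `rnn_layer` and `init_layer` and the layer sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

def rnn_layer(inputs, W_x, W_h, b):
    """Vanilla RNN layer: returns the hidden state at every time step."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

def init_layer(rng, input_dim, hidden_dim):
    """Hypothetical helper: small random weights for one layer."""
    return (rng.normal(scale=0.5, size=(hidden_dim, input_dim)),
            rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)),
            np.zeros(hidden_dim))

rng = np.random.default_rng(0)
sequence = rng.normal(size=(6, 3))        # 6 time steps, 3 features

# (input_dim, hidden_dim) of each stacked layer
layer_dims = [(3, 8), (8, 8), (8, 4)]
layers = [init_layer(rng, i, h) for i, h in layer_dims]

x = sequence
for W_x, W_h, b in layers:
    # The whole state sequence of one layer is the input of the next
    x = rnn_layer(x, W_x, W_h, b)

print(x.shape)  # (6, 4)
```

Every extra layer adds its own loop over the sequence and its own weights, which is where the additional computation and memory cost comes from.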

**Bi-directional RNNs:**
- **Definition:** Bi-directional RNNs process data in both forward and backward directions.
- **Enhancement:** This allows the model to learn from both past and future context in sequences, improving performance on tasks where context from both directions is important.
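
A bi-directional layer can be sketched as two independent RNNs, one reading the sequence left to right and one right to left, whose per-step states are then merged. The example below concatenates them, which is one common choice (and the default `merge_mode` of Keras' `Bidirectional` wrapper). The function names and sizes are assumptions made for this sketch.

```python
import numpy as np

def rnn_layer(inputs, W_x, W_h, b):
    """Vanilla RNN layer: returns the hidden state at every time step."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

def bidirectional_rnn(inputs, fwd_params, bwd_params):
    """Concatenate the states of a forward and a backward RNN."""
    forward = rnn_layer(inputs, *fwd_params)
    # Read the sequence in reverse, then flip the states back so that
    # row t of both arrays refers to the same time step
    backward = rnn_layer(inputs[::-1], *bwd_params)[::-1]
    return np.concatenate([forward, backward], axis=1)

def init_params(rng, input_dim, hidden_dim):
    """Hypothetical helper: small random weights for one direction."""
    return (rng.normal(scale=0.5, size=(hidden_dim, input_dim)),
            rng.normal(scale=0.5, size=(hidden_dim, hidden_dim)),
            np.zeros(hidden_dim))

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 3, 4
sequence = rng.normal(size=(T, input_dim))
fwd = init_params(rng, input_dim, hidden_dim)
bwd = init_params(rng, input_dim, hidden_dim)

out = bidirectional_rnn(sequence, fwd, bwd)
print(out.shape)  # (5, 8): forward and backward states side by side
```

At the first time step, the forward half has only seen the first input, while the backward half has already seen the whole sequence. That is how each position gets both past and future context.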

### Hybrid Architecture

**Hybrid Architecture:**
- **Definition:** Combining RNNs with other models (e.g., CNNs, attention mechanisms) to leverage their strengths.
- **Examples:**
  - **CNN + RNN:** CNNs can extract features from sequences (like text or images), and RNNs can model temporal dependencies in those features.
  - **Attention + RNN:** Attention mechanisms help RNNs focus on relevant parts of the input sequence, improving performance on tasks like machine translation.
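
To make the attention idea a little more concrete, here is a minimal dot-product attention sketch over a set of RNN hidden states. The random `states` array stands in for real RNN outputs, and the `query` vector is hypothetical (in a real model it would be learned or come from a decoder state).

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def attention_pool(states, query):
    """Weight each hidden state by its similarity to the query.

    states: (T, d) hidden states, query: (d,) vector.
    Returns the weighted sum of states and the attention weights.
    """
    scores = states @ query      # one relevance score per time step
    weights = softmax(scores)    # positive, sums to 1
    context = weights @ states   # (d,) weighted average of the states
    return context, weights

rng = np.random.default_rng(0)
states = np.tanh(rng.normal(size=(5, 4)))  # stand-in for RNN hidden states
query = rng.normal(size=4)                 # hypothetical query vector

context, weights = attention_pool(states, query)
print(context.shape, weights.shape)  # (4,) (5,)
```

Time steps with larger weights contribute more to `context`, which is what lets the model focus on the relevant parts of the input instead of relying only on the last hidden state.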

### Types of RNNs

1. **Vanilla RNN:**
   - **Structure:** A single recurrent transformation with no gates: the new hidden state is computed directly from the current input and the previous hidden state.
   - **Differences:** Struggles with long-term dependencies due to vanishing gradients.

2. **Long Short-Term Memory (LSTM):**
   - **Structure:** Includes gates (input, output, forget) to control the flow of information.
   - **Differences:** Better at capturing long-term dependencies compared to vanilla RNNs.

3. **Gated Recurrent Unit (GRU):**
   - **Structure:** Similar to LSTMs but with fewer gates (update and reset gates).
   - **Differences:** Often simpler and faster than LSTMs with comparable performance.

4. **Bidirectional RNN:**
   - **Structure:** Processes input in both forward and backward directions.
   - **Differences:** Captures context from both ends of the sequence, enhancing performance on tasks requiring understanding from both directions.
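
To illustrate how gates control the flow of information, the sketch below implements one LSTM time step in NumPy. It follows the standard LSTM equations, but the function name `lstm_step`, the stacked weight layout and the gate order (input, forget, output, candidate) are conventions assumed for this example, not those of any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,)."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])           # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])       # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])   # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])   # candidate cell content
    c = f * c_prev + i * g        # updated cell state (long-term memory)
    h = o * np.tanh(c)            # updated hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)

# Forget gate open and input gate closed: the cell state is carried over
b_keep = np.zeros(4 * H)
b_keep[0:H] = -20.0        # input gate close to 0
b_keep[H:2 * H] = 20.0     # forget gate close to 1
c_prev = np.array([0.5, -1.0, 2.0, 0.0])
_, c_next = lstm_step(np.zeros(D), np.zeros(H), c_prev,
                      np.zeros((4 * H, D)), np.zeros((4 * H, H)), b_keep)
print(np.allclose(c_next, c_prev))  # True
```

The second part shows why LSTMs cope better with long-term dependencies: when the forget gate stays open, the cell state passes through a step almost unchanged instead of being squashed by a nonlinearity every time.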

# PART 2

Tweet Sentiment Analysis Dataset

In [4]:
import re
import string
import warnings

import emoji
import matplotlib.pyplot as plt
import nltk
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.model_selection import train_test_split
from tensorflow.keras.activations import softmax
from tensorflow.keras.layers import (Bidirectional, Dense, Dropout, Embedding,
                                     Flatten, GRU, LSTM, SimpleRNN)
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from textblob import TextBlob

# Download the NLTK resources used below
nltk.download('punkt')
nltk.download('wordnet')

# Ignore warnings
warnings.filterwarnings('ignore')


In [6]:
df = pd.read_csv('sentiment_tweets3.csv')
df.head()

|   | Index | message to examine | label (depression result) |
|---|---|---|---|
| 0 | 106 | just had a real good moment. i missssssssss hi... | 0 |
| 1 | 217 | is reading manga http://plurk.com/p/mzp1e | 0 |
| 2 | 220 | @comeagainjen http://twitpic.com/2y2lx - http:... | 0 |
| 3 | 288 | @lapcat Need to send 'em to my accountant tomo... | 0 |
| 4 | 540 | ADD ME ON MYSPACE!!! myspace.com/LookThunder | 0 |


In [7]:
df.rename(columns={'message to examine': 'Text', 'label (depression result)': 'Label'}, inplace=True)
df.head()

|   | Index | Text | Label |
|---|---|---|---|
| 0 | 106 | just had a real good moment. i missssssssss hi... | 0 |
| 1 | 217 | is reading manga http://plurk.com/p/mzp1e | 0 |
| 2 | 220 | @comeagainjen http://twitpic.com/2y2lx - http:... | 0 |
| 3 | 288 | @lapcat Need to send 'em to my accountant tomo... | 0 |
| 4 | 540 | ADD ME ON MYSPACE!!! myspace.com/LookThunder | 0 |


In [8]:
df['Text'] = df['Text'].str.lower()
def remove_html_tags(text):
    soup = BeautifulSoup(text, 'html.parser')
    return soup.get_text()

# Remove HTML tags from 'Text' column
df['Text'] = df['Text'].apply(remove_html_tags)

def remove_urls(text):
    return re.sub(r'http\S+|www\S+', '', text)

# Apply the function to the 'Text' column
df['Text'] = df['Text'].apply(remove_urls)

# Punctuation characters to remove
punctuation = string.punctuation

In [9]:
def remove_punctuation(text):
    return text.translate(str.maketrans('', '', punctuation))

# Apply remove_punctuation function to 'Text' column
df['Text'] = df['Text'].apply(remove_punctuation)
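
As a quick sanity check of the cleaning steps so far, the sketch below chains lowercasing, URL removal and punctuation removal on a single made-up tweet. The function `clean_text` and the sample string are illustrative assumptions; the final whitespace collapse is an extra step not present in the cells above, and the HTML-stripping step is left out to keep the example dependency-free.

```python
import re
import string

def clean_text(text):
    """Lowercase, strip URLs, strip punctuation, collapse whitespace."""
    text = text.lower()
    # Remove URLs first, while their punctuation is still intact
    text = re.sub(r'http\S+|www\S+', '', text)
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Extra step (assumption): collapse the gaps left by the removals
    return ' '.join(text.split())

sample = "Is reading MANGA!!! http://example.com/abc #fun"
print(clean_text(sample))  # is reading manga fun
```

The order matters: if punctuation were removed before URLs, `http://example.com/abc` would turn into `httpexamplecomabc` and the URL pattern would only catch it because it still starts with `http`.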

The following dictionary of common chat slang and abbreviations was copied from an existing list.

In [11]:
chat_words = {
    "AFAIK": "As Far As I Know",
    "AFK": "Away From Keyboard",
    "ASAP": "As Soon As Possible",
    "ATK": "At The Keyboard",
    "ATM": "At The Moment",
    "A3": "Anytime, Anywhere, Anyplace",
    "BAK": "Back At Keyboard",
    "BBL": "Be Back Later",
    "BBS": "Be Back Soon",
    "BFN": "Bye For Now",
    "B4N": "Bye For Now",
    "BRB": "Be Right Back",
    "BRT": "Be Right There",
    "BTW": "By The Way",
    "B4": "Before",
    "CU": "See You",
    "CUL8R": "See You Later",
    "CYA": "See You",
    "FAQ": "Frequently Asked Questions",
    "FC": "Fingers Crossed",
    "FWIW": "For What It's Worth",
    "FYI": "For Your Information",
    "GAL": "Get A Life",
    "GG": "Good Game",
    "GN": "Good Night",
    "GMTA": "Great Minds Think Alike",
    "GR8": "Great!",
    "G9": "Genius",
    "IC": "I See",
    "ICQ": "I Seek you (also a chat program)",
    "ILU": "I Love You",
    "IMHO": "In My Honest/Humble Opinion",
    "IMO": "In My Opinion",
    "IOW": "In Other Words",
    "IRL": "In Real Life",
    "KISS": "Keep It Simple, Stupid",
    "LDR": "Long Distance Relationship",
    "LMAO": "Laugh My A.. Off",
    "LOL": "Laughing Out Loud",
    "LTNS": "Long Time No See",
    "L8R": "Later",
    "MTE": "My Thoughts Exactly",
    "M8": "Mate",
    "NRN": "No Reply Necessary",
    "OIC": "Oh I See",
    "PITA": "Pain In The A..",
    "PRT": "Party",
    "PRW": "Parents Are Watching",
    "QPSA?": "Que Pasa?",
    "ROFL": "Rolling On The Floor Laughing",
    "ROFLOL": "Rolling On The Floor Laughing Out Loud",
    "ROTFLMAO": "Rolling On The Floor Laughing My A.. Off",
    "SK8": "Skate",
    "STATS": "Your sex and age",
    "ASL": "Age, Sex, Location",
    "THX": "Thank You",
    "TTFN": "Ta-Ta For Now!",
    "TTYL": "Talk To You Later",
    "U": "You",
    "U2": "You Too",
    "U4E": "Yours For Ever",
    "WB": "Welcome Back",
    "WTF": "What The F...",
    "WTG": "Way To Go!",
    "WUF": "Where Are You From?",
    "W8": "Wait...",
    "7K": "Sick:-D Laugher",
    "TFW": "That feeling when",
    "MFW": "My face when",
    "MRW": "My reaction when",
    "IFYP": "I feel your pain",
    "TNTL": "Trying not to laugh",
    "JK": "Just kidding",
    "IDC": "I don't care",
    "ILY": "I love you",
    "IMU": "I miss you",
    "ADIH": "Another day in hell",
    "ZZZ": "Sleeping, bored, tired",
    "WYWH": "Wish you were here",
    "TIME": "Tears in my eyes",
    "BAE": "Before anyone else",
    "FIMH": "Forever in my heart",
    "BSAAW": "Big smile and a wink",
    "BWL": "Bursting with laughter",
    "BFF": "Best friends forever",
    "CSL": "Can't stop laughing"
}

In [12]:
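def replace_chat_words(text):
    # The keys of chat_words are uppercase, so look each word up in uppercase.
    # Note that this also matches ordinary words that collide with an
    # abbreviation (e.g. 'time', 'kiss').
    words = text.split()
    for i, word in enumerate(words):
        if word.upper() in chat_words:
            words[i] = chat_words[word.upper()]
    return ' '.join(words)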

# Apply replace_chat_words function to 'Text' column
df['Text'] = df['Text'].apply(replace_chat_words)

In [13]:
nltk.download('stopwords')

# Get English stopwords from NLTK
stop_words = set(stopwords.words('english'))

# Function to remove stop words from text
def remove_stopwords(text):
    words = text.split()
    filtered_words = [word for word in words if word.lower() not in stop_words]
    return ' '.join(filtered_words)

# Apply remove_stopwords function to 'Text' column
df['Text'] = df['Text'].apply(remove_stopwords)

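def convert_emojis_to_text(text):
    # demojize does not delete emojis; it replaces each one with its text
    # alias (e.g. a thumbs-up becomes ':thumbs_up:')
    return emoji.demojize(text)

# Apply convert_emojis_to_text function to 'Text' column
df['Text'] = df['Text'].apply(convert_emojis_to_text)

wordnet_lemmatizer = WordNetLemmatizer()

# Lemmatize each word as a verb and store the result in a new column
df['Text_lemmatized'] = df['Text'].apply(lambda x: ' '.join([wordnet_lemmatizer.lemmatize(word, pos='v') for word in x.split()]))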



In [14]:
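# Features come from the cleaned 'Text' column; 'Text_lemmatized' is computed above but not used here
X = df['Text']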
y = df['Label']

# Train Test Split 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [15]:
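# 'nothing' serves as the placeholder for out-of-vocabulary words
tokenizer = Tokenizer(oov_token='nothing')
tokenizer.fit_on_texts(X_train)
# Note: fitting on X_test as well exposes test-set vocabulary to the tokenizer;
# fit on X_train only if a strictly held-out evaluation is needed
tokenizer.fit_on_texts(X_test)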

tokenizer.document_count

10314

In [16]:
X_train_sequences = tokenizer.texts_to_sequences(X_train)
X_test_sequences = tokenizer.texts_to_sequences(X_test)

In [17]:
maxlen = max(len(tokens) for tokens in X_train_sequences)
print("Maximum sequence length (maxlen):", maxlen)

Maximum sequence length (maxlen): 75


In [18]:
X_train_padded = pad_sequences(X_train_sequences, maxlen=maxlen, padding='post')
X_test_padded = pad_sequences(X_test_sequences, maxlen=maxlen, padding='post')
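
To show what the tokenizing and padding cells above produce, here is a rough pure-Python sketch of the same idea on three toy sentences. It is not the Keras implementation: `build_vocab`, `to_sequences`, `pad_post` and the `<oov>` token are hypothetical names chosen for illustration. It mirrors the Keras conventions of reserving id 0 for padding, giving the out-of-vocabulary token id 1, and keeping the last `maxlen` tokens when a sequence is too long.

```python
from collections import Counter

def build_vocab(texts, oov_token='<oov>'):
    """Assign ids by descending word frequency; id 1 is the OOV token."""
    counts = Counter(word for text in texts for word in text.split())
    vocab = {oov_token: 1}
    for word, _ in counts.most_common():
        vocab[word] = len(vocab) + 1
    return vocab

def to_sequences(texts, vocab, oov_token='<oov>'):
    """Replace each word by its id, or by the OOV id if it is unknown."""
    oov_id = vocab[oov_token]
    return [[vocab.get(w, oov_id) for w in text.split()] for text in texts]

def pad_post(sequences, maxlen):
    """Keep the last maxlen tokens, then right-pad with zeros."""
    return [seq[-maxlen:] + [0] * (maxlen - len(seq)) for seq in sequences]

train_texts = ["feel sad today", "feel good"]
test_texts = ["feel awful today"]

vocab = build_vocab(train_texts)
train_seqs = to_sequences(train_texts, vocab)
test_seqs = to_sequences(test_texts, vocab)

maxlen = max(len(seq) for seq in train_seqs)
train_padded = pad_post(train_seqs, maxlen)
test_padded = pad_post(test_seqs, maxlen)

print(vocab)         # {'<oov>': 1, 'feel': 2, 'sad': 3, 'today': 4, 'good': 5}
print(train_padded)  # [[2, 3, 4], [2, 5, 0]]
print(test_padded)   # [[2, 1, 4]]
```

Because the vocabulary here is built from the training texts only, the unseen test word "awful" maps to the OOV id 1.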

In [36]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Bidirectional, Dropout, Dense

# X_train_padded, X_test_padded, y_train and y_test come from the cells above.
# Note: the padded integer token ids are fed to the RNN directly, as a single
# numeric feature per time step; no Embedding layer is used in these models.

# Define Simple RNN Model
model_simple_rnn = Sequential([
    SimpleRNN(128, input_shape=(maxlen, 1), return_sequences=True),
    Dropout(0.5),
    SimpleRNN(128),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile Simple RNN Model
model_simple_rnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history_simple_rnn = model_simple_rnn.fit(X_train_padded, y_train, epochs=5, batch_size=32, validation_data=(X_test_padded, y_test))
test_loss_simple_rnn, test_acc_simple_rnn = model_simple_rnn.evaluate(X_test_padded, y_test)
print(f"Test Accuracy (Simple RNN): {test_acc_simple_rnn:.4f}")

# Define Stacked RNN Model
model_stacked_rnn = Sequential([
    SimpleRNN(128, return_sequences=True, input_shape=(maxlen, 1)),
    Dropout(0.5),
    SimpleRNN(128, return_sequences=True),
    Dropout(0.5),
    SimpleRNN(128),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile Stacked RNN Model
model_stacked_rnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history_stacked_rnn = model_stacked_rnn.fit(X_train_padded, y_train, epochs=5, batch_size=32, validation_data=(X_test_padded, y_test))
test_loss_stacked_rnn, test_acc_stacked_rnn = model_stacked_rnn.evaluate(X_test_padded, y_test)
print(f"Test Accuracy (Stacked RNN): {test_acc_stacked_rnn:.4f}")

# Define Bidirectional RNN Model
model_bidirectional_rnn = Sequential([
    Bidirectional(SimpleRNN(128, return_sequences=True), input_shape=(maxlen, 1)),
    Dropout(0.5),
    Bidirectional(SimpleRNN(128, return_sequences=True)),
    Dropout(0.5),
    Bidirectional(SimpleRNN(128)),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile Bidirectional RNN Model
model_bidirectional_rnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history_bidirectional_rnn = model_bidirectional_rnn.fit(X_train_padded, y_train, epochs=5, batch_size=32, validation_data=(X_test_padded, y_test))
test_loss_bidirectional_rnn, test_acc_bidirectional_rnn = model_bidirectional_rnn.evaluate(X_test_padded, y_test)
print(f"Test Accuracy (Bidirectional RNN): {test_acc_bidirectional_rnn:.4f}")

# Comparison of Results
print("\nModel Comparison:")
print(f"Simple RNN - Test Accuracy: {test_acc_simple_rnn:.4f}")
print(f"Stacked RNN - Test Accuracy: {test_acc_stacked_rnn:.4f}")
print(f"Bidirectional RNN - Test Accuracy: {test_acc_bidirectional_rnn:.4f}")


Epoch 1/5
258/258 ━━━━━━━━━━━━━━━━━━━━ 10s 31ms/step - accuracy: 0.7506 - loss: 0.5743 - val_accuracy: 0.7712 - val_loss: 0.5173
Epoch 2/5
258/258 ━━━━━━━━━━━━━━━━━━━━ 9s 34ms/step - accuracy: 0.7597 - loss: 0.5615 - val_accuracy: 0.7824 - val_loss: 0.5310
Epoch 3/5
258/258 ━━━━━━━━━━━━━━━━━━━━ 9s 36ms/step - accuracy: 0.7694 - loss: 0.5537 - val_accuracy: 0.7824 - val_loss: 0.5260
Epoch 4/5
258/258 ━━━━━━━━━━━━━━━━━━━━ 10s 40ms/step - accuracy: 0.7647 - loss: 0.5546 - val_accuracy: 0.7824 - val_loss: 0.5367
Epoch 5/5
258/258 ━━━━━━━━━━━━━━━━━━━━ 9s 36ms/step - accuracy: 0.7734 - loss: 0.5491 - val_accuracy: 0.7824 - val_loss: 0.5334
65/65 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7873 - loss: 0.5284
Test Accuracy (Simple RNN): 0.7824
Epoch 1/5
258/258 ━━━━━━━━━━━━━━