# <a id="Import"></a>
# <p style="background-color: #422057FF; font-family: 'Copperplate'; color:#FDDB27FF; font-size:140%; text-align:center; border-radius:1000px 10px;">1.0 About Author</p> 

<div style="border: 2px solid #006B38FF; padding: 10px; max-width: 1500px;">
    <p>
        I am <b>Atif Ali Khokhar</b>, a passionate data scientist dedicated to mastering machine learning techniques and continually expanding my knowledge base. I believe in the mantra of #KeepLearning and #KeepSupporting, as I am committed to constant growth and uplifting others in the field.
    </p>
    <div style="text-align: center;">
        <img src="https://media.licdn.com/dms/image/D4E03AQHbj9PaMNpUuQ/profile-displayphoto-shrink_400_400/0/1694879278829?e=1721260800&v=beta&t=XWss7C6pbhbBWJoETbMhxsQASHKpP9Vkf7qty24U6Hs" alt="Profile Picture" style="width: 100px; height: 100px; border-radius: 50%; border: 2px solid #D35400;"><br>
    </div>
    <p>
        You can find more about me on my <a href="https://www.linkedin.com/in/atifalikhokhar/" target="_blank">LinkedIn</a>.<br>
        Feel free to connect and reach out for any collaboration or queries!
    </p>
</div>

# <a id="Import"></a>
# <p style="background-color: #422057FF; font-family: 'Copperplate'; color:#FDDB27FF; font-size:140%; text-align:center; border-radius:1000px 10px;">2.0 About Project and Data</p> 

## Project Title: Sentiment Sleuth: Cracking Twitter's Emotional Code with LSTM and NLP

### **Description:**
   This Kaggle notebook, titled "Twitter Sentiment Analysis," classifies the sentiment of tweets. It uses Natural Language Processing (NLP) and Long Short-Term Memory (LSTM) networks to predict whether a given tweet expresses a positive or negative sentiment. Working from a dataset of tweets, the notebook walks through the full workflow: data preprocessing, model training, evaluation, and visualization of results.
    
### **Objective:**
   The primary objective of this notebook is to develop an efficient binary classification model that can categorize tweets into positive or negative sentiments. The specific goals include:

**Data Preprocessing:** Clean and preprocess the raw tweet data, including tokenization, stop word removal, and text normalization.

**Feature Engineering:** Transform the text data into numerical representations suitable for input into the LSTM model.

**Model Building:** Construct and train an LSTM model optimized for text data to perform the binary classification task.

**Evaluation:** Assess the model's performance using appropriate metrics such as accuracy, precision, recall, and F1-score.

**Visualization:** Provide clear and insightful visualizations of the results to interpret the model's effectiveness and areas for improvement.

### **Dataset:**
[sentimental-analysis-for-tweets](/kaggle/input/sentimental-analysis-for-tweets)

# <a id="Import"></a>
# <p style="background-color: #422057FF; font-family: 'Copperplate'; color:#FDDB27FF; font-size:140%; text-align:center; border-radius:1000px 10px;">3.0 Importing Libraries and Checking Data</p> 

# **3.1 Importing Libraries** 

In [None]:
# Import libraries

import re
import string

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
import emoji
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# NLTK resources: 'punkt' for word_tokenize, 'stopwords' for section 4.6
nltk.download('punkt')
nltk.download('stopwords')

from sklearn.model_selection import train_test_split

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
# Import the layers by name from tensorflow.keras only (no wildcard, no standalone keras)
from tensorflow.keras.layers import LSTM, Dense, SimpleRNN, Embedding, Flatten, Dropout

# **3.2 Loading Dataset**

In [None]:
data = pd.read_csv("/kaggle/input/sentimental-analysis-for-tweets/sentiment_tweets3.csv")

In [None]:
data.head()

In [None]:
data.shape

In [None]:
data.columns = ['Index','Text', 'label']
data.head()

# <a id="Import"></a>
# <p style="background-color: #422057FF; font-family: 'Copperplate'; color:#FDDB27FF; font-size:140%; text-align:center; border-radius:1000px 10px;">4.0 Data Preprocessing Steps</p> 

# **4.1 Lowercasing the Text :**

In [None]:
data['Text'] = data['Text'].str.lower()
data.head()

# **4.2 Remove HTML tags :**

In [None]:
def remove_html_tags(text):
    soup = BeautifulSoup(text, 'html.parser')
    return soup.get_text()

data['Text'] = data['Text'].apply(remove_html_tags)

data.head()

# **4.3 Remove URLs :**

In [None]:
def remove_urls(text):
    return re.sub(r'http\S+|www\S+', '', text)

data['Text'] = data['Text'].apply(remove_urls)

data.head()
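
The pattern `http\S+|www\S+` removes any run of non-whitespace characters that starts with `http` or `www`, even when that run begins in the middle of a word. The sketch below runs the same pattern on a few made-up strings (not taken from the dataset) to show the intended case and one side effect. `remove_urls_strict` is a hypothetical stricter variant, added here only for comparison.

```python
import re

def remove_urls(text):
    # Same pattern as in the cell above
    return re.sub(r'http\S+|www\S+', '', text)

def remove_urls_strict(text):
    # Hypothetical variant: require "://" after http(s) and a dot after a leading "www"
    return re.sub(r'https?://\S+|\bwww\.\S+', '', text)

samples = [
    "check this out http://example.com/page now",
    "visit www.example.com today",
    "awwww so cute",
]

for s in samples:
    print(repr(s), "->", repr(remove_urls(s)), "|", repr(remove_urls_strict(s)))
```

With the original pattern, the elongated word "awwww" loses everything from its first "www" onward, while the stricter variant leaves it untouched. Both leave a double space where the URL was, which the later `split()` / `join()` steps collapse.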

# **4.4 Remove punctuation :**

In [None]:
punctuation = string.punctuation

# Function to remove punctuation from text
def remove_punctuation(text):
    return text.translate(str.maketrans('', '', punctuation))

# Apply remove_punctuation function to 'Text' column
data['Text'] = data['Text'].apply(remove_punctuation)

data.head()

# **4.5 Handling ChatWords :**

In [None]:
chat_words = {
    "AFAIK": "As Far As I Know",
    "AFK": "Away From Keyboard",
    "ASAP": "As Soon As Possible",
    "ATK": "At The Keyboard",
    "ATM": "At The Moment",
    "A3": "Anytime, Anywhere, Anyplace",
    "BAK": "Back At Keyboard",
    "BBL": "Be Back Later",
    "BBS": "Be Back Soon",
    "BFN": "Bye For Now",
    "B4N": "Bye For Now",
    "BRB": "Be Right Back",
    "BRT": "Be Right There",
    "BTW": "By The Way",
    "B4": "Before",
    "CU": "See You",
    "CUL8R": "See You Later",
    "CYA": "See You",
    "FAQ": "Frequently Asked Questions",
    "FC": "Fingers Crossed",
    "FWIW": "For What It's Worth",
    "FYI": "For Your Information",
    "GAL": "Get A Life",
    "GG": "Good Game",
    "GN": "Good Night",
    "GMTA": "Great Minds Think Alike",
    "GR8": "Great!",
    "G9": "Genius",
    "IC": "I See",
    "ICQ": "I Seek you (also a chat program)",
    "ILU": "I Love You",
    "IMHO": "In My Honest/Humble Opinion",
    "IMO": "In My Opinion",
    "IOW": "In Other Words",
    "IRL": "In Real Life",
    "KISS": "Keep It Simple, Stupid",
    "LDR": "Long Distance Relationship",
    "LMAO": "Laugh My A.. Off",
    "LOL": "Laughing Out Loud",
    "LTNS": "Long Time No See",
    "L8R": "Later",
    "MTE": "My Thoughts Exactly",
    "M8": "Mate",
    "NRN": "No Reply Necessary",
    "OIC": "Oh I See",
    "PITA": "Pain In The A..",
    "PRT": "Party",
    "PRW": "Parents Are Watching",
    "QPSA": "Que Pasa?",
    "ROFL": "Rolling On The Floor Laughing",
    "ROFLOL": "Rolling On The Floor Laughing Out Loud",
    "ROTFLMAO": "Rolling On The Floor Laughing My A.. Off",
    "SK8": "Skate",
    "STATS": "Your sex and age",
    "ASL": "Age, Sex, Location",
    "THX": "Thank You",
    "TTFN": "Ta-Ta For Now!",
    "TTYL": "Talk To You Later",
    "U": "You",
    "U2": "You Too",
    "U4E": "Yours For Ever",
    "WB": "Welcome Back",
    "WTF": "What The F...",
    "WTG": "Way To Go!",
    "WUF": "Where Are You From?",
    "W8": "Wait...",
    "7K": "Sick:-D Laugher",
    "TFW": "That feeling when",
    "MFW": "My face when",
    "MRW": "My reaction when",
    "IFYP": "I feel your pain",
    "TNTL": "Trying not to laugh",
    "JK": "Just kidding",
    "IDC": "I don't care",
    "ILY": "I love you",
    "IMU": "I miss you",
    "ADIH": "Another day in hell",
    "ZZZ": "Sleeping, bored, tired",
    "WYWH": "Wish you were here",
    "TIME": "Tears in my eyes",
    "BAE": "Before anyone else",
    "FIMH": "Forever in my heart",
    "BSAAW": "Big smile and a wink",
    "BWL": "Bursting with laughter",
    "BFF": "Best friends forever",
    "CSL": "Can't stop laughing"
}

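# Function to replace chat words with their full forms.
# The dictionary keys are uppercase while the text was lowercased in 4.1,
# so each word is upper-cased for the lookup.
# Note: entries such as "TIME" or "KISS" also match ordinary English words.
def replace_chat_words(text):
    words = text.split()
    for i, word in enumerate(words):
        key = word.upper()
        if key in chat_words:
            words[i] = chat_words[key]
    return ' '.join(words)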

# Apply replace_chat_words function to 'Text' column
data['Text'] = data['Text'].apply(replace_chat_words) 

data.head()
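
The dictionary keys are uppercase, but the text was lowercased in 4.1, so the lookup has to normalise case before it can match anything. The sketch below shows this on a hypothetical three-entry subset of `chat_words`. The names `chat_words_demo` and `expand_chat_words` are illustrative and not part of the notebook.

```python
# Hypothetical three-entry subset of chat_words, for illustration only
chat_words_demo = {
    "BRB": "Be Right Back",
    "IMO": "In My Opinion",
    "TIME": "Tears in my eyes",
}

def expand_chat_words(text, mapping):
    # Upper-case each token for the lookup because the keys are uppercase;
    # tokens without an entry are kept unchanged
    return ' '.join(mapping.get(word.upper(), word) for word in text.split())

expanded_slang = expand_chat_words("brb imo this is great", chat_words_demo)
expanded_plain = expand_chat_words("no time left", chat_words_demo)

print(expanded_slang)
print(expanded_plain)
```

The second call shows a trade-off: abbreviations that are also everyday words (here "time") get expanded too, so it may be worth pruning such entries before applying the mapping to real tweets.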

# **4.6 Handling StopWords :**

In [None]:
# Build the stop-word set once instead of on every call
stop_words = set(stopwords.words('english'))

def remove_stopwords(text):
    words = text.split()
    filtered_words = [word for word in words if word.lower() not in stop_words]
    return ' '.join(filtered_words)

# Apply remove_stopwords function to 'Text' column
data['Text'] = data['Text'].apply(remove_stopwords)
data.head()
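
NLTK's English stop-word list includes negations such as "not" and "no", and removing them can change the sentiment a phrase carries. The sketch below illustrates this with a small hand-written stop list. That list is an assumption standing in for the NLTK one so the example runs without downloads, and `negations_to_keep` is a hypothetical choice.

```python
# Small hand-written stand-in for NLTK's English stop-word list (assumption)
demo_stop_words = {"i", "am", "the", "is", "this", "a", "not", "no"}

# Hypothetical set of negations to keep because they carry sentiment
negations_to_keep = {"not", "no"}

def remove_stopwords_demo(text, stop_words):
    return ' '.join(w for w in text.split() if w.lower() not in stop_words)

sentence = "i am not happy"
without_negation = remove_stopwords_demo(sentence, demo_stop_words)
with_negation = remove_stopwords_demo(sentence, demo_stop_words - negations_to_keep)

print(without_negation)
print(with_negation)
```

Whether keeping negations helps on this dataset is not tested here. The sketch only shows how the two choices differ.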

# **4.7 Handling Emojis :**

In [None]:
def remove_emojis(text):
    return emoji.replace_emoji(text, replace='')


data['Text'] = data['Text'].apply(remove_emojis)

data.head()

In [None]:
# Stemming: reduce each word to its Porter stem
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()

def stem_words(text):
    return " ".join([ps.stem(word) for word in text.split()])

data['Text'] = data['Text'].apply(stem_words)

# **4.8 Train Test Split :**

In [None]:
X = data['Text']
y = data['label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=12)
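
A plain random split does not guarantee that the training and test partitions keep the same share of each label. The sketch below shows one way to check this. The labels are made up for illustration, and in the notebook `y_train` and `y_test` would take their place. `class_proportions` is a hypothetical helper.

```python
from collections import Counter

def class_proportions(labels):
    # Fraction of samples per label, with labels in sorted order
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in sorted(counts)}

# Made-up labels for illustration only
demo_train_labels = [0, 0, 0, 1, 0, 0, 1, 0]
demo_test_labels = [0, 1]

train_props = class_proportions(demo_train_labels)
test_props = class_proportions(demo_test_labels)

print(train_props)
print(test_props)
```

If the two partitions differ noticeably, passing `stratify=y` to `train_test_split` keeps the label proportions aligned across them.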

# **4.9 Tokenization and Padding Sequences :**

In [None]:
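# Tokenization and Padding Sequences

# Fit the vocabulary on the training split only, so nothing from the test
# split leaks into it; unseen test words are mapped to the OOV token.
tokenizer = Tokenizer(oov_token = 'nothing')
tokenizer.fit_on_texts(X_train)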

In [None]:
tokenizer.document_count

In [None]:
X_train_sequences = tokenizer.texts_to_sequences(X_train)
X_test_sequences = tokenizer.texts_to_sequences(X_test)

In [None]:
# Max Len in X_train_sequences
maxlen = max(len(tokens) for tokens in X_train_sequences)
print("Maximum sequence length (maxlen):", maxlen)

In [None]:
# Perform padding on X_train and X_test sequences
X_train_padded = pad_sequences(X_train_sequences, maxlen=maxlen, padding='post')
X_test_padded = pad_sequences(X_test_sequences, maxlen=maxlen, padding='post')
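
As a rough illustration of what the padding step does, the sketch below imitates `pad_sequences(..., padding='post')` in plain Python, including its default `truncating='pre'` behavior. `pad_post` is a hypothetical helper written for this explanation, not the Keras implementation, and the input sequences are made up.

```python
def pad_post(sequences, maxlen, value=0):
    """Plain-Python imitation of pad_sequences(padding='post') with the
    default truncating='pre'. Illustrative only."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                        # too long: keep the last maxlen tokens
        padded.append(seq + [value] * (maxlen - len(seq)))  # too short: append zeros
    return padded

demo_sequences = [[5, 2], [7, 1, 4, 9, 3], [8]]
demo_padded = pad_post(demo_sequences, maxlen=3)
print(demo_padded)
```

Because `maxlen` is taken from the training sequences, only test tweets can be longer than it. With the default settings those lose tokens from the start of the tweet, not the end.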

In [None]:
# Print the padded sequences for X_train and X_test
print("X_train_padded:")
print(X_train_padded)
print("\nX_test_padded:")
print(X_test_padded)

In [None]:
# Vocabulary size (the input_dim an Embedding layer would need):
# largest token index in the training data, plus one for the padding index 0
input_Size = np.max(X_train_padded) + 1
input_Size

# <a id="Import"></a>
# <p style="background-color: #422057FF; font-family: 'Copperplate'; color:#FDDB27FF; font-size:140%; text-align:center; border-radius:1000px 10px;">5.0 Modeling</p> 

# **5.1 Model Building:**

In [None]:
model = Sequential()

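# Each padded tweet is fed as maxlen time steps with one feature (the token ID).
# Use maxlen instead of a hard-coded length so the input shape follows the data.
model.add(LSTM(256, input_shape=(maxlen, 1), return_sequences=True))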

model.add(LSTM(128)) 

model.add(Dense(64, activation='relu'))  
model.add(Dropout(0.01))

model.add(Dense(1, activation='sigmoid')) 
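
The model above passes each token ID to the LSTM as a single number, so ID 50 looks numerically "larger" than ID 5 even though the IDs are arbitrary. A common alternative is an embedding lookup, which maps every ID to its own dense vector. This is what the Keras `Embedding` layer imported in 3.1 does, with trainable vectors. The plain-Python sketch below shows only the lookup and the resulting shapes. The table values are random, and `vocab_size` and `embedding_dim` are hypothetical.

```python
import random

random.seed(0)

vocab_size = 6      # hypothetical; the notebook's input_Size would play this role
embedding_dim = 4   # hypothetical choice

# One vector per token ID; row 0 belongs to the padding index
embedding_table = [
    [random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
    for _ in range(vocab_size)
]

def embed(padded_batch):
    # (batch, maxlen) integer IDs -> (batch, maxlen, embedding_dim) floats
    return [[embedding_table[token_id] for token_id in seq] for seq in padded_batch]

demo_batch = [[3, 1, 0], [5, 2, 2]]
embedded = embed(demo_batch)
print(len(embedded), len(embedded[0]), len(embedded[0][0]))
```

In Keras, an `Embedding(input_dim=input_Size, output_dim=...)` layer placed before the first LSTM would play this role. Whether it improves results on this dataset is not tested here.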

# **5.2 Model Compilation:**

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.summary()

# **5.3 Model Training:**

In [None]:
# Train the model; the test split is also used as the validation data here
history = model.fit(X_train_padded, y_train, epochs=10, batch_size=32, validation_data=(X_test_padded, y_test))
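
The objectives in 2.0 list precision, recall, and F1-score, while training reports accuracy only. The sketch below shows how those metrics could be derived from thresholded sigmoid outputs. The labels and probabilities are made up for illustration. In the notebook they would come from `y_test` and `model.predict(X_test_padded)`. `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    # Turn probabilities into hard 0/1 predictions
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up values for illustration only
demo_true = [1, 0, 1, 1, 0, 0]
demo_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]

metrics = binary_metrics(demo_true, demo_prob)
print(metrics)
```

`model.predict` returns an array of shape `(n, 1)`, so it would need flattening (for example with `.ravel()`) before being passed to a helper like this.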

# <a id="Import"></a>
# <p style="background-color: #422057FF; font-family: 'Copperplate'; color:#FDDB27FF; font-size:140%; text-align:center; border-radius:1000px 10px;">6.0 Prediction</p> 

# **6.1 Plotting the graph of Accuracy and Validation Accuracy:**

In [None]:
plt.title('Training Accuracy vs Validation Accuracy')

plt.plot(history.history['accuracy'], color='red', label='Train')
plt.plot(history.history['val_accuracy'], color='blue', label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')

plt.legend()

# **6.2 Plotting the graph of Loss and Validation Loss:**

In [None]:
plt.title('Training Loss vs Validation Loss')

plt.plot(history.history['loss'], color='red', label='Train')
plt.plot(history.history['val_loss'], color='blue', label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Loss')

plt.legend()
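
Loss curves like these are usually read to find the epoch where validation loss stops improving while training loss keeps falling. The sketch below picks that epoch from a dictionary shaped like `history.history`. The numbers are made up for illustration and are not results from this notebook. `best_epoch` is a hypothetical helper.

```python
# Made-up values shaped like history.history, for illustration only
demo_history = {
    "loss":     [0.60, 0.42, 0.30, 0.22, 0.15],
    "val_loss": [0.58, 0.45, 0.40, 0.43, 0.50],
}

def best_epoch(history_dict, key="val_loss"):
    values = history_dict[key]
    # Index of the smallest value, reported 1-based like the Keras training log
    return min(range(len(values)), key=values.__getitem__) + 1

chosen_epoch = best_epoch(demo_history)
print(chosen_epoch)
```

The Keras `EarlyStopping` callback automates this kind of selection during training. Since this notebook passes the test split as `validation_data`, choosing an epoch this way would tune the model on the test set, so a separate validation split would be needed for an unbiased final score.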

# <a id="Import"></a>
# <p style="background-color: #422057FF; font-family: 'Copperplate'; color:#FDDB27FF; font-size:140%; text-align:center; border-radius:1000px 10px;">7.0 The End</p> 