<br>

# <center> Sentiment Analysis of Twitter Tweets using NLP and LSTM

## <center> Prediction on Real-Time Text

<br>

---

<br>


# List of Contents



*   [1. Initialization](#initialization)
*   [2. Loading the required Data](#loading-the-required-data)
*   [3. Text Processing](#text-preprocessing)
*   [4. Prediction](#prediction)


<br>


<br>
<br>

<a name='initialization'></a>
# 1. Initialization

<br>

## 1.1. Colab Configuration

### 1.1.1. Mount Google Drive

In [1]:
'''
    This is required if the code runs in Google Colab.
    - this code will mount Google Drive for Colab.
    - the code needs to run only once.
'''

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


<br>

### 1.1.2. Defining Root Directory

In [2]:
# -----------------------------------------------
#   Check whether the code runs on Colab        |
# -----------------------------------------------
import sys
is_running_on_colab = 'google.colab' in sys.modules


# -----------------------------------------------
#               Root Directory                  |
# -----------------------------------------------
# this directory will be used as Root Directory to read/write any file
if is_running_on_colab:
    # for google-colab
    rootDir = '/content/drive/MyDrive/_ML/Twitter Sentiment Analysis LSTM'
else:
    # for application
    rootDir = './mlData'
    

<br>

## 1.2. Import Libraries

In [3]:
# importing all the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json

<br>

<a name='loading-the-required-data'></a>
# 2. Loading the required Data

<br>

> Loading the Model

In [4]:
# importing the library
from keras.models import load_model

# defining the name of the model
modelName = 'Sentiment_Analyser.h5'

# creating the path
path = f"{rootDir}/03. Generated Data/{modelName}"

# Load the model from the file
model = load_model( filepath = path )
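
The path above is assembled with an f-string. As a minimal sketch, the same join can be done with `pathlib`, which also tolerates a trailing slash on the root directory. The helper name `generated_data_path` is hypothetical, and `PurePosixPath` is used only to keep the separator predictable in this illustration; `pathlib.Path` would be the usual choice in practice.

```python
from pathlib import PurePosixPath

def generated_data_path(root_dir: str, file_name: str) -> str:
    # hypothetical helper: joins root directory, data folder and file name
    return str(PurePosixPath(root_dir) / "03. Generated Data" / file_name)

# a trailing slash on the root directory does not produce a double slash
print(generated_data_path("./mlData/", "data.pkl"))
```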

<br>

> Loading the Tokenizer, Config, and Target_names

In [5]:
import pickle

filename = 'data.pkl'
path = f'{rootDir}/03. Generated Data/{filename}'

# open the file for reading in binary mode
with open(path, 'rb') as f:
    # deserialize and load the file
    data = pickle.load(f)

In [6]:
tokenizer = data['tokenizer']
config = data['config']
target_names = data['target_names']
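
Since the prediction step relies on these three entries and on `config['max_length']`, it may help to check the loaded dictionary before using it. The following is a sketch only: the helper `missing_keys` is hypothetical, and the dummy dictionary stands in for the real contents of `data.pkl`.

```python
def missing_keys(bundle: dict) -> list:
    # hypothetical helper: lists the entries the notebook expects but cannot find
    required = ("tokenizer", "config", "target_names")
    missing = [key for key in required if key not in bundle]
    # 'max_length' is read from the config when padding the sequences
    if "config" in bundle and "max_length" not in bundle["config"]:
        missing.append("config['max_length']")
    return missing

# dummy stand-in for the unpickled data (values are placeholders)
dummy = {"tokenizer": object(), "config": {"max_length": 50}, "target_names": ["a", "b"]}
print(missing_keys(dummy))
```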

<br>
<br>

<a name='text-preprocessing'></a>

# 3. Text Processing

## 3.1. Filtering the text

In [7]:
# importing regex library
import re   

# defining the function to filter tweets
def filter_text(
        text: str
    ):

    '''
        Filtering the tweet string to extract meaningful text.
        

        Parameter
        ---------
        text
            a text (string)

        Return
        ------
        ret
            filtered tweet string
    '''


    # 01. converting the text to lower case
    text = text.lower()

    # 02. replacing every character that is not a letter or digit with a space
    text = re.sub(r"[^a-zA-Z0-9]", " ", text)

    # 03. optional removal of a specific word (e.g. the retweet marker 'rt');
    #     the substitution below is commented out, so this step is currently disabled
    word_to_remove = 'rt' # defining the word to remove
    # text = re.sub(r'\b' + word_to_remove + r'\b', '', text)

    # 04. stripping the white space at the start or end of the text
    text = text.strip()


    # returning the filtered text
    return text
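
To illustrate what the filter does to a raw tweet, here is a self-contained sketch that repeats the active steps (lowercasing, replacing non-alphanumeric characters with spaces, stripping). The name `filter_text_sketch` and the sample tweet are made up for this example.

```python
import re

def filter_text_sketch(text: str) -> str:
    # lowercase, replace every non-alphanumeric character with a space, then trim
    text = text.lower()
    text = re.sub(r"[^a-z0-9]", " ", text)
    return text.strip()

sample = "RT @user: I LOVE this!! #happy :)"   # made-up tweet
filtered = filter_text_sketch(sample)

# runs of spaces are left inside the string; splitting on whitespace collapses them
print(filtered.split())
```

Because mentions, hashtags and punctuation are replaced by spaces rather than deleted, the words `user` and `happy` survive as ordinary tokens, and `rt` stays in the text as long as the word-removal step is disabled.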

<br>

## 3.2. NLP

<br>

### 3.2.1. Initializing NLTK

In [8]:
import nltk

nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

<br>

### 3.2.2. Lemmatizer Design

<br>

> Designing Parts-Of-Speech Tagger

In [9]:
# accessing the wordnet library for parts-of-speech tagging
from nltk.corpus import wordnet

# defining function for POS tagging of a word
def get_wordnet_pos(word):
    '''
        Map the POS tag of a word to the constant that WordNetLemmatizer.lemmatize() accepts

        Parameters
        ----------
        word
            the word whose Parts-Of-Speech will be tagged

        Return
        ------
        ret
            tagged Parts-Of-Speech of the word
    '''
    
    # first letter of the Penn Treebank tag returned by nltk.pos_tag
    tag = nltk.pos_tag([word])[0][1][0].lower()

    # defining the dictionary; adjective tags (JJ, JJR, JJS) start with "J"
    tag_dict = {
        "j": wordnet.ADJ,
        "n": wordnet.NOUN,
        "v": wordnet.VERB,
        "r": wordnet.ADV
    }
    
    # getting the tagged parts-of-speech
    pos = tag_dict.get(tag, wordnet.NOUN)
    
    # returning the tagged pos
    return pos
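
For reference, `nltk.pos_tag` returns Penn Treebank tags, whose first letter indicates the broad word class, and the WordNet POS constants are single letters (`'a'`, `'n'`, `'v'`, `'r'`). The sketch below shows a first-letter mapping over full tags without requiring NLTK; the function name `penn_to_wordnet` is hypothetical.

```python
# WordNet POS constants: ADJ = 'a', NOUN = 'n', VERB = 'v', ADV = 'r'
PENN_PREFIX_TO_WORDNET = {"J": "a", "N": "n", "V": "v", "R": "r"}

def penn_to_wordnet(penn_tag: str) -> str:
    # look at the first letter of the Penn tag; fall back to noun when unknown
    return PENN_PREFIX_TO_WORDNET.get(penn_tag[:1].upper(), "n")

for tag in ("JJ", "NNS", "VBD", "RB", "DT"):
    print(tag, "->", penn_to_wordnet(tag))
```

A first-letter mapping is coarse: for example, the particle tag `RP` also starts with `R` and would be treated as an adverb.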

<br>

> Designing Lemmatizer

In [10]:
from nltk.stem import WordNetLemmatizer

def _lemmatize(sentence):
    '''
        Lemmatize the words of a sentence.

        Parameters
        ----------
        sentence
            the sentence whose words will be lemmatized

        Return
        ------
        ret
            lemmatized sentence
    '''
        
    # creating the lemmatizer
    lemmatizer = WordNetLemmatizer()

    # defining list to store the lemmatized words
    lemmatized_words = []
        
    # looping through the words of the sentence to lemmatize
    for word in sentence.split():
        # parts-of-speech of the word
        pos = get_wordnet_pos(word)
        
        # lemmatized word
        lemma = lemmatizer.lemmatize(word=word, pos=pos)
        
        # appending the lemmatized word to the list
        lemmatized_words.append(lemma.lower())

    # creating the sentence with lemmatized words
    lemmatized_sentence = ' '.join(lemmatized_words)

    # returning the lemmatized sentence
    return lemmatized_sentence

<br>

## 3.3. Tokenizing & Padding

In [11]:
# importing the padding utility (the fitted tokenizer was loaded from data.pkl above)
from keras.utils import pad_sequences
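
The prediction step calls `pad_sequences` with `padding='post'` and a fixed `maxlen`. The pure-Python sketch below approximates that behavior for illustration: zeros are appended at the end, and, since the default `truncating='pre'` is left unchanged, sequences that are too long lose their first elements. The function `pad_post_sketch` is hypothetical and assumes `maxlen` is positive.

```python
def pad_post_sketch(sequences, maxlen, value=0):
    # approximates pad_sequences(..., padding='post', truncating='pre')
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                            # keep the last maxlen items
        padded.append(seq + [value] * (maxlen - len(seq)))   # fill the tail with zeros
    return padded

print(pad_post_sketch([[5, 3], [1, 2, 3, 4, 5, 6]], maxlen=4))
# → [[5, 3, 0, 0], [3, 4, 5, 6]]
```

An empty token list, which can occur when none of the words are in the tokenizer's vocabulary, simply becomes a row of zeros.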

<br>
<br>

<a name='prediction'></a>

# 4. Prediction

In [12]:
def get_prediction(
        text: str
    ):

    '''
        This function takes a Text as String and Predicts the Class Name

        Parameter
        ---------
        text
            a text (string)

        Return
        ------
        ret
            predicted class name (string)
    '''

    # filtering the text
    filtered_text = filter_text(text)

    # lemmatizing the filtered text
    lemma = _lemmatize(filtered_text)

    # transforming the lemma to a sequence of integers
    tokens = tokenizer.texts_to_sequences([lemma])

    # padding the tokens to the defined length
    sequence = pad_sequences(tokens, padding='post', maxlen=config['max_length'])

    # predicting class probabilities for the sequence
    probabilities = model.predict(sequence, verbose=0)

    # extracting the index of the class with maximum probability
    # (argmax returns a one-element array for the single input row)
    predicted_class = int( np.argmax(probabilities, axis=1)[0] )

    # extracting the class name
    predicted_class_name = target_names[predicted_class]

    # returning the predicted class name
    return predicted_class_name
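
The last two steps of the function pick the index with the highest score and look up its name. A minimal sketch of that step without the model follows; the class names and the probability row are hypothetical stand-ins, since the real names come from `target_names` in `data.pkl`.

```python
def class_name_from_probs(probs_row, names):
    # index of the largest probability, then the matching class name
    best = max(range(len(probs_row)), key=probs_row.__getitem__)
    return names[best]

hypothetical_names = ["negative", "neutral", "positive"]   # assumed labels
hypothetical_row = [0.05, 0.15, 0.80]                      # assumed model output

print(class_name_from_probs(hypothetical_row, hypothetical_names))  # → positive
```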


In [13]:
# taking an input as text
# print('Type Something: \n')
text = input('Type Something: ')

# predicting on the text
predicted_class_name = get_prediction(text)
predicted_sentiment = predicted_class_name.capitalize()

# displaying the predicted result
print(f'\n Predicted Sentiment : {predicted_sentiment}')
print(f"\n'{text}' has {predicted_sentiment} sentiment")


Type Something: I had a wonderful day today!

 Predicted Sentiment : Positive

'I had a wonderful day today!' has Positive sentiment
