
**Aim:**
To implement sentiment analysis on movie reviews using an LSTM.


**Description:**




*   Sentiment analysis is widely used, especially in social-media analysis, to understand how people receive something, be it a business, a recent movie, or a product launch, based on the opinions or sentiment they express.
*   A **recurrent neural network (RNN)** is a neural-network architecture for sequential data, typically trained with supervised learning. Its hidden state is fed back into the network at every time step, so the output at time t2 depends on both the input at t2 and the hidden state carried over from t1. This lets the network retain information from earlier parts of a sequence, such as earlier words in a review.


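*   **LSTM** (Long Short-Term Memory) is a variant of the RNN designed to mitigate the vanishing-gradient problem. Besides the hidden state, it keeps a cell state that is updated additively and controlled by input, forget, and output gates, which helps information and gradients survive across long sequences. Below is the architecture of an LSTM cell.

[Figure: LSTM architecture]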
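
To make the recurrence concrete, the NumPy sketch below runs a vanilla RNN step and an LSTM step over a short random sequence. It is illustrative only and is not how Keras implements its `LSTM` layer; the function names, weight names (`W_x`, `W_h`, `W`, `U`, `b`), gate ordering, and sizes are assumptions chosen for readability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """Vanilla RNN: the new hidden state mixes the current input with the previous hidden state."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U and b stack the parameters of the input (i),
    forget (f), candidate (g) and output (o) gates (ordering assumed here)."""
    z = x_t @ W + h_prev @ U + b          # shape: (4 * units,)
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_t = f * c_prev + i * g              # additive cell-state update
    h_t = o * np.tanh(c_t)                # hidden state passed to the next step
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, units, steps = 8, 4, 5        # illustrative sizes
xs = rng.normal(size=(steps, input_dim))  # a random "sequence" of 5 input vectors

# Run the vanilla RNN over the sequence
W_x = rng.normal(scale=0.1, size=(input_dim, units))
W_h = rng.normal(scale=0.1, size=(units, units))
b_rnn = np.zeros(units)
h = np.zeros(units)
for x_t in xs:
    h = rnn_step(x_t, h, W_x, W_h, b_rnn)

# Run the LSTM over the same sequence
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b_lstm = np.zeros(4 * units)
h_lstm, c = np.zeros(units), np.zeros(units)
for x_t in xs:
    h_lstm, c = lstm_step(x_t, h_lstm, c, W, U, b_lstm)

print(h.shape, h_lstm.shape, c.shape)
```

The additive update `c_t = f * c_prev + i * g` is what helps with vanishing gradients: when the forget gate stays close to 1, information can pass through many steps without being repeatedly squashed by `tanh`.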






**Procedure:**

Step 1: Importing required libraries.

Step 2: Loading the dataset.

Step 3: Checking for null values in the dataset.

Step 4: Cleaning the data. This includes removing special characters, digits, unnecessary symbols, and stop words, and reducing each word to its root form (lemmatization) for easier interpretation.


Step 5: Encoding the target variable using ‘LabelEncoder’ from the ‘sklearn’ library.

Step 6: Tokenizing and converting the reviews into numerical vectors.

Step 7: Building the LSTM model using the ‘Keras’ library. This step involves initializing the model, adding the required layers, and compiling the model.

Step 8: Splitting the data into training and testing data.

Step 9: Training the model using training data.

Step 10: Evaluating the model.


**Code:**

**Step 1**: Importing required libraries.

In [None]:
# Importing required libraries
import nltk
import pandas as pd
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from textblob import Word
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split
# Tokenizer belongs to the Keras 2.x API (keras.preprocessing.text was removed in Keras 3)
from keras.models import Sequential
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D

In [None]:
!pip install kaggle

In [None]:
!mkdir -p ~/.kaggle

In [None]:
!cp kaggle.json ~/.kaggle/

In [None]:
!chmod 600 ~/.kaggle/kaggle.json

In [None]:
!kaggle datasets download -d lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

**Step 2:** Loading the dataset.

In [None]:
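# Path to the downloaded archive; pandas infers zip compression from the
# extension and reads the single CSV file inside it
path = '/content/imdb-dataset-of-50k-movie-reviews.zip'

In [None]:
df = pd.read_csv(path)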

In [None]:
df

**Step 3:** Checking for null values in the dataset.



In [None]:
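# Keep only the needed columns; .copy() makes data_v1 independent of df,
# so the in-place edits below do not raise SettingWithCopyWarning
data_v1 = df[['review', 'sentiment']].copy()
# Count the missing values in each column
data_v1.isnull().sum()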

**Step 4:** Cleaning the data. This includes removing special characters, digits, unnecessary symbols, and stop words, and reducing each word to its root form (lemmatization) for easier interpretation.

In [None]:
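import nltk
nltk.download('stopwords')
nltk.download('wordnet')
# Some NLTK versions also need the Open Multilingual WordNet for lemmatization
nltk.download('omw-1.4')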

In [None]:
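def cleaning(df, stop_words):
    # Lower-casing every review
    df['review'] = df['review'].str.lower()
    # Replacing HTML tags such as <br /> (common in IMDB reviews) with spaces
    df['review'] = df['review'].str.replace(r'<.*?>', ' ', regex=True)
    # Removing special characters (anything that is not a word character or whitespace)
    df['review'] = df['review'].str.replace(r'[^\w\s]', '', regex=True)
    # Removing digits/numbers
    df['review'] = df['review'].str.replace(r'\d', '', regex=True)
    # Removing stop words
    df['review'] = df['review'].apply(lambda text: ' '.join(w for w in text.split() if w not in stop_words))
    # Lemmatization: reduce each word to its root (dictionary) form
    df['review'] = df['review'].apply(lambda text: ' '.join(Word(w).lemmatize() for w in text.split()))
    return df

# A set makes the stop-word membership test fast
stop_words = set(stopwords.words('english'))
data_v1 = cleaning(data_v1, stop_words)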
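
A common pitfall in this step is losing the backslashes in the regular expressions: `'[^ws]'` deletes every character except the letters `w` and `s`, and `'d'` deletes every letter `d`, whereas `r'[^\w\s]'` and `r'\d'` remove punctuation and digits as intended (with pandas `str.replace`, `regex=True` is also needed, because since pandas 2.0 the default is literal matching). A minimal check with Python's `re` module; the sample review is made up, not taken from the dataset:

```python
import re

sample = "I'd rate it 10/10 -- loved it!<br /><br />Would watch again."

# Pattern without backslashes: keeps ONLY the lowercase letters w and s
broken = re.sub(r'[^ws]', '', sample)

# Intended cleaning: HTML tags, then punctuation, then digits, then lower-case
cleaned = re.sub(r'<.*?>', ' ', sample)
cleaned = re.sub(r'[^\w\s]', '', cleaned)
cleaned = re.sub(r'\d', '', cleaned)
cleaned = ' '.join(cleaned.lower().split())   # collapse repeated spaces

print(repr(broken))    # 'w'
print(cleaned)         # id rate it loved it would watch again
```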

**Step 5:** Encoding the target variable using ‘LabelEncoder’ from the ‘sklearn’ library.

In [None]:
# Encode the target column; classes are sorted alphabetically,
# so 'negative' -> 0 and 'positive' -> 1
lb = LabelEncoder()
data_v1['sentiment'] = lb.fit_transform(data_v1['sentiment'])
data_v1['sentiment']
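
`LabelEncoder` assigns integer codes in the sorted order of the class names, so `'negative'` becomes 0 and `'positive'` becomes 1. A NumPy sketch that reproduces the same mapping with `np.unique(..., return_inverse=True)` on a few made-up labels:

```python
import numpy as np

sentiment = np.array(['positive', 'negative', 'negative', 'positive', 'negative'])

# np.unique returns the sorted class names and, with return_inverse=True,
# each label's position in that sorted list: the same codes that
# LabelEncoder.fit_transform would assign
classes, encoded = np.unique(sentiment, return_inverse=True)

print(classes)   # ['negative' 'positive']
print(encoded)   # [1 0 0 1 0]
```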

**Step 6:** Tokenizing and converting the reviews into numerical vectors.

In [None]:
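# num_words=500 keeps token indices 1..499 (the 499 most frequent words); 0 is reserved for padding
tokenizer = Tokenizer(num_words=500, split=' ')
tokenizer.fit_on_texts(data_v1['review'].values)
# Convert the cleaned reviews (the same text the tokenizer was fitted on) into integer sequences
X = tokenizer.texts_to_sequences(data_v1['review'].values)
# Zero-pad every sequence to a common length (the longest sequence by default)
X = pad_sequences(X)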
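
A simplified pure-Python sketch of what `Tokenizer` and `pad_sequences` do in this step, assuming the text is already lower-cased and cleaned (the real `Tokenizer` also lower-cases and strips punctuation by default). The helper names `fit_vocab`, `texts_to_sequences`, and `pad` are hypothetical; the behavior is meant to mirror the Keras defaults: indices are assigned by descending word frequency starting at 1, only indices below `num_words` are kept, and sequences are zero-padded and truncated at the front.

```python
from collections import Counter

def fit_vocab(texts):
    """Rank words by frequency (ties keep first-seen order); index 0 stays free for padding."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def texts_to_sequences(texts, word_index, num_words):
    """Map words to indices, dropping unknown words and indices >= num_words."""
    return [[word_index[w] for w in text.split()
             if w in word_index and word_index[w] < num_words]
            for text in texts]

def pad(sequences, maxlen):
    """Pre-pad with zeros; longer sequences keep only their last maxlen tokens."""
    return [[0] * (maxlen - len(s[-maxlen:])) + s[-maxlen:] for s in sequences]

reviews = ["great movie great cast", "bad movie", "great plot bad ending"]
word_index = fit_vocab(reviews)
seqs = texts_to_sequences(reviews, word_index, num_words=4)

print(word_index)   # {'great': 1, 'movie': 2, 'bad': 3, 'cast': 4, 'plot': 5, 'ending': 6}
print(seqs)         # [[1, 2, 1], [3, 2], [1, 3]]
print(pad(seqs, maxlen=3))   # [[1, 2, 1], [0, 3, 2], [0, 1, 3]]
```

With `num_words=500`, as in this notebook, every word outside the 499 most frequent ones is dropped from the sequences before they reach the LSTM.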

**Step 7:** Building the LSTM model using the ‘Keras’ library. This step involves initializing the model, adding the required layers, and compiling the model.

In [None]:
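model = Sequential()
# Map each of the 500 token indices to a 120-dimensional dense vector
model.add(Embedding(500, 120, input_length=X.shape[1]))
# Drop entire embedding channels (40%) for regularization
model.add(SpatialDropout1D(0.4))
# 176 LSTM units with dropout on the inputs and on the recurrent connections
model.add(LSTM(176, dropout=0.2, recurrent_dropout=0.2))
# Two-class softmax output (negative, positive)
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())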

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 12, 120)           60000     
                                                                 
 spatial_dropout1d (SpatialD  (None, 12, 120)          0         
 ropout1D)                                                       
                                                                 
 lstm (LSTM)                 (None, 176)               209088    
                                                                 
 dense (Dense)               (None, 2)                 354       
                                                                 
Total params: 269,442
Trainable params: 269,442
Non-trainable params: 0
_________________________________________________________________
None
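
The parameter counts in the summary can be checked by hand. An LSTM layer has four gates, each with an input kernel (`input_dim × units`), a recurrent kernel (`units × units`) and a bias (`units`); `SpatialDropout1D` has no weights, so it contributes 0. The helper functions below are hypothetical and simply encode these formulas for the layer sizes used above:

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-long vector per vocabulary index
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, outputs):
    # Weight matrix plus one bias per output unit
    return input_dim * outputs + outputs

emb = embedding_params(500, 120)
lstm = lstm_params(120, 176)
dense = dense_params(176, 2)
print(emb, lstm, dense, emb + lstm + dense)   # 60000 209088 354 269442
```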


**Step 8:** Splitting the data into training and testing data.

In [None]:
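# One-hot encode the labels to match the 2-unit softmax output;
# dtype=float avoids boolean columns (the get_dummies default since pandas 2.0)
y = pd.get_dummies(data_v1['sentiment'], dtype=float)
# Splitting the data into training (70%) and testing (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)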
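
Because the network ends in `Dense(2, activation='softmax')` and is trained with `categorical_crossentropy`, the targets must be one-hot vectors, which is what `pd.get_dummies` produces from the 0/1 labels. The NumPy sketch below (made-up labels; `categorical_crossentropy` here is a hypothetical helper, not the Keras function) builds the same one-hot matrix and computes the loss. It also gives a useful reference value: a model that always predicts `[0.5, 0.5]` scores exactly ln 2 ≈ 0.693.

```python
import numpy as np

labels = np.array([1, 0, 0, 1])     # 0 = negative, 1 = positive
y_onehot = np.eye(2)[labels]        # same columns as pd.get_dummies on 0/1 labels

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

uniform = np.full((4, 2), 0.5)      # a model that always outputs [0.5, 0.5]
print(y_onehot)
print(categorical_crossentropy(y_onehot, uniform))   # ln 2, about 0.6931
```

An equivalent design uses a single `Dense(1, activation='sigmoid')` output with `binary_crossentropy` and the 0/1 labels directly, which skips the one-hot step.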

**Step 9:** Training the model using training data.

In [None]:
batch_size=40
model.fit(X_train, y_train, epochs = 10, batch_size=batch_size, verbose = 'auto')

**Step 10:** Evaluating the model.

In [None]:
model.evaluate(X_test, y_test)  # returns [test loss, test accuracy]



[0.6934447884559631, 0.4952666759490967]
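
Step 1 imports `confusion_matrix`, `classification_report` and `accuracy_score`, which give more detail than the single accuracy figure from `model.evaluate`. Applied to this notebook, they would take `y_pred = model.predict(X_test).argmax(axis=1)` and `y_true = y_test.values.argmax(axis=1)`. The NumPy sketch below uses hypothetical softmax outputs to show what these metrics compute; rows of the confusion matrix are true classes and columns are predicted classes, the same convention as scikit-learn's `confusion_matrix`.

```python
import numpy as np

# Hypothetical softmax outputs for 6 test reviews (columns: negative, positive)
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4],
                  [0.3, 0.7], [0.55, 0.45], [0.1, 0.9]])
y_true = np.array([0, 1, 1, 1, 0, 0])

y_pred = probs.argmax(axis=1)       # predicted class per review

# 2x2 confusion matrix: rows = true class, columns = predicted class
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all predictions
precision_pos = cm[1, 1] / cm[:, 1].sum()   # of reviews predicted positive, share truly positive
recall_pos = cm[1, 1] / cm[1, :].sum()      # of truly positive reviews, share found
print(cm)                                   # [[2 1]
                                            #  [1 2]]
print(accuracy, precision_pos, recall_pos)
```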

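**Result:**
Sentiment analysis using an LSTM is implemented. In the recorded run above, however, the test loss (0.693, essentially ln 2) and test accuracy (0.495) are at chance level for this balanced dataset of 25,000 positive and 25,000 negative reviews, so the model in that run did not learn to separate the two classes.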