<a href="https://colab.research.google.com/github/SrishtiGoswami/DL_SC/blob/main/Sentiment_analysis_for_the_UI_UX_of_an_app.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [12]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

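**Load the dataset:** The first step is to load the user reviews dataset into a pandas DataFrame using the read_csv() function. The later cells read two columns from it: content (the review text) and sentiment (the binary label).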

In [13]:
data = pd.read_csv('dataset.csv')

**Split the data:** The next step is to split the dataset into training and testing sets using the DataFrame methods sample() and drop().

sample() randomly selects 80% of the rows for training, with random_state=0 making the selection reproducible. drop() then removes those rows by index, leaving the remaining 20% for testing.

In [14]:
train_data = data.sample(frac=0.8, random_state=0)
test_data = data.drop(train_data.index)
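As a quick sanity check on the split, the sketch below repeats the same two calls on a small made-up DataFrame and confirms that the two sets are disjoint. The ten-row frame is a hypothetical stand-in for dataset.csv, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for dataset.csv: ten rows with the two columns used later.
data = pd.DataFrame({
    "content": [f"review {i}" for i in range(10)],
    "sentiment": [i % 2 for i in range(10)],
})

train_data = data.sample(frac=0.8, random_state=0)
test_data = data.drop(train_data.index)

# The two sets should share no rows and together cover the whole dataset.
overlap = train_data.index.intersection(test_data.index)
print(len(train_data), len(test_data), len(overlap))
```

Note that drop() works by index label, so this approach relies on the index being unique, which is the case for the default index that read_csv() creates.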

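**Preprocess the text data:** In order to use the text data in a machine learning model, we need to convert the text into numerical values. This is done using the Tokenizer class and the pad_sequences function from Keras.

The Tokenizer is fitted on the training reviews only and assigns an integer to each word according to its frequency. With num_words=10000, only the most frequent words keep their own index, and every other word is mapped to the out-of-vocabulary token set by oov_token. The pad_sequences function then forces every sequence to a fixed length of 100: shorter sequences are padded with zeros at the front (the default), and longer ones are cut at the end because truncating='post'.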

In [15]:
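# Fit the vocabulary on the training reviews only, so no test-set words leak in.
tokenizer = Tokenizer(num_words=10000, oov_token='<OOV>')
tokenizer.fit_on_texts(train_data['content'].values)

# Convert reviews to integer sequences, then pad or truncate each to 100 tokens.
train_sequences = tokenizer.texts_to_sequences(train_data['content'].values)
train_padded = pad_sequences(train_sequences, maxlen=100, truncating='post')

# Reuse the same fitted tokenizer for the test reviews.
test_sequences = tokenizer.texts_to_sequences(test_data['content'].values)
test_padded = pad_sequences(test_sequences, maxlen=100, truncating='post')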
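To make the two preprocessing steps concrete, here is a simplified pure-Python sketch of what they do. It is an illustration, not the Keras implementation: the helper names build_word_index, to_sequence and pad are hypothetical, and the sketch only lowercases and splits on whitespace, whereas the real Tokenizer also strips punctuation.

```python
from collections import Counter

def build_word_index(texts, oov_token="<OOV>"):
    """Index words by descending frequency; index 1 is reserved for the OOV token."""
    counts = Counter(word for text in texts for word in text.lower().split())
    index = {oov_token: 1}
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        index[word] = rank
    return index

def to_sequence(text, word_index, num_words=None, oov_token="<OOV>"):
    """Replace each word by its index; unknown or too-rare words become OOV."""
    oov = word_index[oov_token]
    seq = []
    for word in text.lower().split():
        i = word_index.get(word, oov)
        if num_words is not None and i >= num_words:
            i = oov
        seq.append(i)
    return seq

def pad(seq, maxlen):
    """Mimic pad_sequences(..., maxlen=maxlen, truncating='post')."""
    seq = seq[:maxlen]                      # truncating='post': drop the tail
    return [0] * (maxlen - len(seq)) + seq  # default padding='pre': zeros in front

word_index = build_word_index(["good app", "good ui bad app"])
print(pad(to_sequence("good slow app", word_index), maxlen=5))
```

The word "slow" never appears in the fitted texts, so it is mapped to the OOV index 1 rather than being dropped, and the zeros are added in front of the real tokens.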

**Define the model architecture:** The next step is to define the architecture of the deep learning model. This code defines a simple model with an embedding layer, a global average pooling layer, and two dense layers.

The embedding layer learns a dense vector representation for each word in the text, and the global average pooling layer aggregates these vectors into a fixed-length representation of the entire review. The dense layers then perform the actual classification task, with the final layer using a sigmoid activation function to output a probability between 0 and 1.

In [16]:
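model = tf.keras.Sequential([
    # Map each of the 10,000 token ids to a learned 16-dimensional vector.
    tf.keras.layers.Embedding(10000, 16, input_length=100),
    # Average the 100 word vectors into a single 16-dimensional vector.
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation='relu'),
    # Single sigmoid unit: outputs a probability between 0 and 1.
    tf.keras.layers.Dense(1, activation='sigmoid')
])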
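The data flow through these four layers can be sketched in plain NumPy. This is only an illustration of the shapes and operations involved: the weights are random rather than trained, and the variable names (E, W1, b1, W2, b2) are chosen here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, seq_len, hidden = 10000, 16, 100, 16

# Randomly initialised weights stand in for the trained parameters.
E = rng.normal(size=(vocab_size, embed_dim))                     # embedding table
W1, b1 = rng.normal(size=(embed_dim, hidden)), np.zeros(hidden)  # first dense layer
W2, b2 = rng.normal(size=(hidden, 1)), np.zeros(1)               # output layer

def forward(token_ids):
    x = E[token_ids]                # embedding lookup: (batch, seq_len, embed_dim)
    x = x.mean(axis=1)              # average over all positions: (batch, embed_dim)
    x = np.maximum(x @ W1 + b1, 0)  # dense + ReLU: (batch, hidden)
    return 1 / (1 + np.exp(-(x @ W2 + b2)))  # dense + sigmoid: (batch, 1)

batch = rng.integers(0, vocab_size, size=(2, seq_len))  # two fake padded reviews
probs = forward(batch)
n_params = E.size + W1.size + b1.size + W2.size + b2.size
print(probs.shape, n_params)
```

Almost all of the parameters sit in the embedding table (10,000 x 16), which is why the vocabulary cap is the main lever on model size. The average runs over all 100 positions, padding included, because the Embedding layer above is not configured with mask_zero.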


**Compile the model:** Once the model is defined, we need to compile it with a loss function, an optimizer, and any evaluation metrics we want to use.

This code uses binary crossentropy loss, which is appropriate for binary classification problems, and the Adam optimizer, which is a popular optimization algorithm for deep learning models. We also specify that we want to track the accuracy metric during training.

In [17]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
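For intuition about the loss, the following NumPy sketch computes binary crossentropy by hand. It is a simplified version of the formula, not the Keras code; the eps clipping value is an assumption added here to avoid taking log(0).

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

The loss is small when the predicted probability agrees with the label and grows sharply when the model is confidently wrong, which is the behaviour that makes it a good fit for a sigmoid output.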

**Train the model:** The next step is to train the model on the training set using the fit() function.

This code trains the model for 10 epochs and validates it on the testing set after each epoch. The fit() function returns a history object that contains information about the training process.

In [18]:
# Train the model
history = model.fit(train_padded, train_data['sentiment'].values, epochs=10, 
                    validation_data=(test_padded, test_data['sentiment'].values))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
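The per-epoch values are stored in history.history, a dictionary of lists; with the compile settings above its keys should be loss, accuracy, val_loss and val_accuracy. As a small sketch of how it can be used, the hypothetical helper best_epoch below finds the epoch with the lowest validation loss. The numbers in example_history are placeholders for illustration only, not results from this notebook.

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return the 1-based epoch at which `metric` was lowest."""
    values = history_dict[metric]
    return min(range(len(values)), key=values.__getitem__) + 1

# Placeholder values for illustration; they are not the notebook's results.
example_history = {
    "loss": [0.60, 0.45, 0.35, 0.30],
    "val_loss": [0.55, 0.42, 0.40, 0.43],
}
print(best_epoch(example_history))
```

In the notebook itself the call would be best_epoch(history.history). A validation loss that starts rising while the training loss keeps falling is a common sign of overfitting.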


**Evaluate the model:** Once the model is trained, we can evaluate its performance on the testing set using the evaluate() function.

In [21]:
loss, accuracy = model.evaluate(test_padded, test_data['sentiment'].values, verbose=0)
print(f'Test accuracy: {accuracy}')

Test accuracy: 0.9163458347320557


The **new_reviews** variable is a list of two new reviews that we want to predict the sentiment of. In this case, one review is positive and the other is negative.

The **texts_to_sequences** method of the tokenizer object is used to convert the text data in **new_reviews** to sequences of integers, which can be input to the neural network model.

The **pad_sequences** function is used to ensure that all sequences are of the same length, which is necessary for input to the neural network. Here, we pad the sequences with zeros up to a maximum length of 100 and truncate any sequences that are longer than this.

The **predict** method of the model object is used to generate predictions for the sentiment of the new reviews. The **new_padded** sequences are input to the model, which generates a prediction for each review.

Finally, an f-string prints the predicted values to the console, one probability per review.

In [23]:
new_reviews = ['The UI is very intuitive and easy to use.', 'The app is too slow and buggy.']
new_sequences = tokenizer.texts_to_sequences(new_reviews)
new_padded = pad_sequences(new_sequences, maxlen=100, truncating='post')
predictions = model.predict(new_padded)

print(f'Sentiment predictions: {predictions}')

Sentiment predictions: [[0.99010366]
 [0.9840924 ]]


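Each output is the model's estimated probability of the class labelled 1, which we assume here to be the positive class: about 0.990 for the first review and 0.984 for the second. At a threshold of 0.5, both reviews are therefore classified as positive. The model gets the first review right but misclassifies the second, clearly negative one, which shows that a high test accuracy does not guarantee correct predictions on individual new inputs.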
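To turn the raw probabilities into labels, they can be compared against a threshold. The sketch below applies this to the two values printed above, under two assumptions not confirmed by the notebook: that a sentiment label of 1 means positive, and that 0.5 is a suitable cut-off.

```python
import numpy as np

new_reviews = ['The UI is very intuitive and easy to use.',
               'The app is too slow and buggy.']
# Probabilities copied from the notebook output above.
predictions = np.array([[0.99010366], [0.9840924]])

# Assumption: label 1 = positive sentiment, threshold = 0.5.
labels = (predictions.ravel() >= 0.5).astype(int)

for review, prob, label in zip(new_reviews, predictions.ravel(), labels):
    sentiment = 'positive' if label == 1 else 'negative'
    print(f'{prob:.3f} -> {sentiment}: {review}')
```

Both reviews land above the threshold, so both are labelled positive, including the one about the app being slow and buggy.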