1. Loading the IMDb Dataset

In [1]:
import numpy as np
import pandas as pd

# Load the dataset
df = pd.read_csv('/content/IMDB Dataset.csv')
df.sample(5)

,review,sentiment
10037,"A lot of people seemed to have liked the film,...",negative
33670,An incredible little English film for so many ...,positive
28574,"This early film from director Bob Clark (""Pork...",negative
13091,I've been watching a lot of cartoon or animate...,negative
21152,"Of all the reviews I've read, most people have...",positive


2. Data Cleaning and Preprocessing

In [2]:
import re
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Function to clean text
def clean_text(text):
    text = re.sub(r'<.*?>', ' ', text)  # Replace HTML tags (e.g. <br />) with a space so adjacent words stay separate
    text = re.sub(r'[^a-zA-Z\s]', '', text)  # Remove non-alphabetic characters
    return text.lower().strip()

# Clean the reviews
df['review'] = df['review'].apply(clean_text)

# Tokenization and padding
tokenizer = Tokenizer(num_words=10000, oov_token='<OOV>')
tokenizer.fit_on_texts(df['review'])
sequences = tokenizer.texts_to_sequences(df['review'])
padded_sequences = pad_sequences(sequences, maxlen=200)
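
To make the tokenization and padding step concrete, here is a minimal pure-Python sketch of what the two Keras calls above are understood to do with their default settings: words are ranked by frequency, index 1 is reserved for the OOV token, only indices below num_words are kept, and pad_sequences both truncates and pads at the front. The helper names build_word_index, encode, and pad_pre are hypothetical and exist only for illustration; they are not part of Keras.

```python
from collections import Counter

def build_word_index(texts, oov_token="<OOV>"):
    """Rank words by frequency; index 1 is reserved for the OOV token."""
    counts = Counter(word for text in texts for word in text.split())
    # most_common() sorts by descending count; ties keep first-seen order
    ranked = [word for word, _ in counts.most_common()]
    index = {oov_token: 1}
    for i, word in enumerate(ranked, start=2):
        index[word] = i
    return index

def encode(text, word_index, num_words, oov_token="<OOV>"):
    """Map words to indices; unseen or too-rare words get the OOV index."""
    oov = word_index[oov_token]
    ids = []
    for word in text.split():
        i = word_index.get(word)
        ids.append(i if i is not None and i < num_words else oov)
    return ids

def pad_pre(seq, maxlen, value=0):
    """Keep the last maxlen items, then left-pad with value."""
    seq = seq[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

# Toy corpus (illustrative only, not taken from the IMDb data)
texts = ["good movie good cast", "bad movie"]
word_index = build_word_index(texts)
encoded = [encode(t, word_index, num_words=4) for t in texts]
padded = [pad_pre(s, maxlen=3) for s in encoded]
print(word_index)
print(padded)
```

With num_words=4 only the indices 1 to 3 survive, so the rarer words "cast" and "bad" fall back to the OOV index, in the same way that words outside the 10000 most frequent do in the cell above.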

3. Converting Labels to Numeric Format

In [3]:
# Convert sentiment labels to binary
df['sentiment'] = df['sentiment'].apply(lambda x: 1 if x == 'positive' else 0)

# Splitting the data into features (X) and labels (y)
X = padded_sequences
y = df['sentiment'].values

4. Splitting the Data into Training and Testing Sets

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

5. Building the Neural Network with TensorFlow

In [None]:
import tensorflow as tf

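model = tf.keras.Sequential([
    tf.keras.Input(shape=(200,)),                     # Padded sequences of length 200 (replaces the deprecated input_length argument)
    tf.keras.layers.Embedding(10000, 16),             # Vocabulary of 10000 words, 16-dimensional embeddings
    tf.keras.layers.LSTM(64, return_sequences=True),  # Return the full sequence so the next LSTM can consume it
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')    # Probability that the review is positive
])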

model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

history = model.fit(X_train, y_train, epochs=10, validation_split=0.2)

Epoch 1/10
 326/1000 ━━━━━━━━━━━━━━━━━━━━ 1:55 171ms/step - accuracy: 0.5749 - loss: 0.6572
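
The 1000 steps per epoch in the progress bar can be cross-checked with a little arithmetic. The sketch below assumes the CSV holds 50,000 reviews (an assumption; the row count is not shown in this notebook) and that model.fit uses its default batch size of 32, since no batch_size is passed. Percentages are kept as integers to avoid floating-point rounding.

```python
import math

n_rows = 50_000   # ASSUMPTION: number of reviews in the CSV
test_pct = 20     # test_size=0.2 in train_test_split
val_pct = 20      # validation_split=0.2 in model.fit
batch_size = 32   # Keras default when batch_size is not given

n_test = n_rows * test_pct // 100      # held out for the final evaluation
n_train = n_rows - n_test              # rows passed to model.fit
n_val = n_train * val_pct // 100       # held out by validation_split
n_fit = n_train - n_val                # rows actually used for gradient updates

steps_per_epoch = math.ceil(n_fit / batch_size)

# Remaining time at the logged step 326, using the logged 171 ms/step
remaining_seconds = int((steps_per_epoch - 326) * 0.171)
minutes, seconds = divmod(remaining_seconds, 60)

print(n_test, n_val, n_fit)
print(steps_per_epoch)
print(f"{minutes}:{seconds:02d}")
```

Under these assumptions the result is 1000 steps per epoch and roughly 1:55 remaining, which agrees with the logged line.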

6. Visualizing Model Performance

In [None]:
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

7. Evaluating the Model

In [None]:
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f'Test Accuracy: {test_accuracy:.2f}')
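
Accuracy alone does not show which kind of mistake the model makes. As a hedged sketch, the pure-Python helper below counts true and false positives and negatives from predicted probabilities and derives precision and recall. The function name confusion_counts and the demo arrays are hypothetical and are not results from this notebook.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels and predicted probabilities."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        pred = 1 if prob > threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

# Made-up labels and probabilities, for illustration only
y_true_demo = [1, 0, 1, 1, 0, 0]
y_prob_demo = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]

tp, fp, tn, fn = confusion_counts(y_true_demo, y_prob_demo)
accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(tp, fp, tn, fn)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

With the trained model, the same helper could be called as confusion_counts(y_test, model.predict(X_test).ravel()).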

8. Making Predictions

In [None]:
sample_reviews = [
    "I absolutely loved this movie! The plot was thrilling and the characters were so well developed.",
    "The film was a disaster. Poor acting and a predictable storyline."
]

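# Apply the same cleaning used for the training data before tokenizing
sample_sequences = tokenizer.texts_to_sequences([clean_text(review) for review in sample_reviews])
sample_padded = pad_sequences(sample_sequences, maxlen=200)

# predict() returns an array of shape (n_samples, 1); flatten it to one probability per review
predictions = model.predict(sample_padded).flatten()
print(["Positive" if prob > 0.5 else "Negative" for prob in predictions])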
