<a href="https://colab.research.google.com/github/adhilnajeeb7/ICT/blob/main/movie_sentiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [6]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Download NLTK resources
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

# Load IMDb dataset
# Replace 'IMDB-Dataset.csv' with the path to your IMDb dataset (a CSV with 'review' and 'sentiment' columns)
imdb_data = pd.read_csv('IMDB-Dataset.csv')

# Preprocess dataset
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

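def preprocess_text(text):
    """Lowercase, tokenize, filter, and lemmatize a review; return it as one string."""
    # Tokenize
    tokens = word_tokenize(text.lower())
    # Keep only purely alphabetic tokens (drops punctuation, numbers, and fragments such as "n't")
    tokens = [word for word in tokens if word.isalpha()]
    # Remove stopwords
    tokens = [word for word in tokens if word not in stop_words]
    # Lemmatize (WordNetLemmatizer treats every token as a noun by default)
    tokens = [lemmatizer.lemmatize(word) for word in tokens]
    return ' '.join(tokens)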

imdb_data['review'] = imdb_data['review'].apply(preprocess_text)

# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    imdb_data['review'], imdb_data['sentiment'], test_size=0.2, random_state=42
)

# Vectorize text data
vectorizer = TfidfVectorizer(max_features=5000)  # Keep only the 5000 most frequent terms
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)

# Train logistic regression model
model = LogisticRegression()
model.fit(X_train_vectorized, y_train)

# Predict sentiment for user input
def predict_sentiment(review):
    preprocessed_review = preprocess_text(review)
    review_vectorized = vectorizer.transform([preprocessed_review])
    prediction = model.predict(review_vectorized)
    return prediction[0]

# Get user input
user_review = input("Enter your movie review: ")

# Predict sentiment
sentiment = predict_sentiment(user_review)

# Print result
if sentiment == 'positive':
    print("The sentiment of the review is positive.")
else:
    print("The sentiment of the review is negative.")


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


Enter your movie review: I'll get right to the point. This is the 2nd worst Godzilla flic since Godzilla 1985. As usual, Legendary manages to make even the charactors/actors with the greatest potential and in their usual fashion, turns them into cardboad. Not even the semi-clever quips work to give these characters a heart. Kaylee Hottle is very talented and the poor script manages to snuff out her true potential. Same for Dan Stevens who has the potential to do great things but again, the script is so trite he is lost in the murk. The other cast members did a phone-in "performance" that a high schooler could have done.....maybe even better. As far as the CGI: over utilized and a quality that pales in comparison to Minus 1. Kong and Mini-Kong had rediculous facial expressions and the fight scenes between them looked like WWW wrestling match with all the correographed moves. If it wasn't so rediculous and disappointing, it would be hilarious. In Fact; audience was actually laughing at t
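
The cell above hands the cleaned reviews to `TfidfVectorizer`, which turns each review into a weighted bag of words. The sketch below reproduces that weighting in plain Python on a made-up corpus, assuming scikit-learn's default settings (raw term counts, smoothed idf, L2 row normalisation). The helper `tfidf_rows` and the toy reviews are hypothetical, tokens are split on whitespace only, and the `max_features` cap is not modelled, so treat it as an illustration of the formula and not as the library's implementation.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenised = [doc.split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1, so a term found in every
    # document still gets weight 1 instead of 0.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in df.items()}
    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights = {term: tf * idf[term] for term, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows


# Hypothetical reviews, already lowercased and stripped of stopwords.
toy_reviews = ["great movie great cast", "terrible movie", "great story"]
rows = tfidf_rows(toy_reviews)
for review, row in zip(toy_reviews, rows):
    print(review, "->", {term: round(weight, 3) for term, weight in row.items()})
```

In the second toy review, "terrible" receives a larger weight than "movie" because it appears in fewer documents, which is the property the logistic regression model relies on to pick out discriminative words.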

In [10]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, MaxPooling1D, Flatten, Dense
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Download NLTK resources
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

# Load IMDb dataset
# Replace 'IMDB-Dataset.csv' with the path to your IMDb dataset (a CSV with 'review' and 'sentiment' columns)
imdb_data = pd.read_csv('IMDB-Dataset.csv')

# Preprocess dataset
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

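def preprocess_text(text):
    """Lowercase, tokenize, filter, and lemmatize a review; return it as one string."""
    # Tokenize
    tokens = word_tokenize(text.lower())
    # Keep only purely alphabetic tokens (drops punctuation, numbers, and fragments such as "n't")
    tokens = [word for word in tokens if word.isalpha()]
    # Remove stopwords
    tokens = [word for word in tokens if word not in stop_words]
    # Lemmatize (WordNetLemmatizer treats every token as a noun by default)
    tokens = [lemmatizer.lemmatize(word) for word in tokens]
    return ' '.join(tokens)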

imdb_data['review'] = imdb_data['review'].apply(preprocess_text)

# Encode labels
label_encoder = LabelEncoder()
imdb_data['sentiment'] = label_encoder.fit_transform(imdb_data['sentiment'])

# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    imdb_data['review'], imdb_data['sentiment'], test_size=0.2, random_state=42
)

# Tokenize text data
max_words = 10000
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(X_train)

X_train_sequences = tokenizer.texts_to_sequences(X_train)
X_test_sequences = tokenizer.texts_to_sequences(X_test)

max_len = 200  # Pad or truncate every review to exactly 200 tokens
X_train_padded = pad_sequences(X_train_sequences, maxlen=max_len)
X_test_padded = pad_sequences(X_test_sequences, maxlen=max_len)

# Define CNN model
embedding_dim = 100
num_filters = 128
kernel_size = 5

model = Sequential()
model.add(Embedding(max_words, embedding_dim, input_length=max_len))
model.add(Conv1D(num_filters, kernel_size, activation='relu'))
model.add(MaxPooling1D(2))
model.add(Conv1D(num_filters, kernel_size, activation='relu'))
model.add(MaxPooling1D(2))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

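# Train model
# validation_data reuses the test split, so the per-epoch validation metrics
# and the final test accuracy are computed on the same reviews.
model.fit(
    X_train_padded, y_train,
    epochs=5, batch_size=64,
    validation_data=(X_test_padded, y_test),
)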

# Evaluate model
loss, accuracy = model.evaluate(X_test_padded, y_test)
print("Test Accuracy:", accuracy)

# Predict sentiment for user input
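def predict_sentiment(review):
    """Return the model's score in [0, 1]; higher means more likely positive.

    LabelEncoder sorts the labels, so 'negative' becomes 0 and 'positive' becomes 1.
    """
    preprocessed_review = preprocess_text(review)
    review_sequence = tokenizer.texts_to_sequences([preprocessed_review])
    review_padded = pad_sequences(review_sequence, maxlen=max_len)
    prediction = model.predict(review_padded)
    return float(prediction[0][0])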

# Get user input
user_review = input("Enter your movie review: ")

# Predict sentiment
sentiment_score = predict_sentiment(user_review)

# Print result
if sentiment_score >= 0.5:
    print("The sentiment of the review is positive.")
else:
    print("The sentiment of the review is negative.")


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test Accuracy: 0.8744000196456909
Enter your movie review: I was really excited for Godzilla vs. Kong, but it let me down. The story felt thin, and I just couldn't get attached to any of the characters. It seemed like all the effort went into the special effects, (and even the cgi sucked) and not enough into the story that makes you care. Sometimes, the fights were so over the top that I couldn't even keep up, and instead of being fun, it was just confusing. I wanted to love it for the spectacle, but without a story to hook me, I walked away feeling pretty disappointed. I don't recommend wasting your time watching this garbage, just walk away or watch anything else.
The sentiment of the review is negative.
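
The CNN cell calls `pad_sequences(..., maxlen=max_len)` with default arguments, which means Keras pads and truncates at the front of each sequence (`padding='pre'`, `truncating='pre'`, fill value 0). A review longer than 200 tokens therefore keeps its last 200 tokens, not its first. The pure-Python helper below, `pad_pre`, is a hypothetical stand-in written only to illustrate that behaviour; it is not the Keras implementation and returns nested lists, not a NumPy array.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last `maxlen` ids of long ones.

    Assumes maxlen >= 1.
    """
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # like truncating='pre': drop ids from the front
        padded.append([value] * (maxlen - len(kept)) + kept)  # like padding='pre'
    return padded


# Hypothetical word-index sequences, as texts_to_sequences might produce.
toy_sequences = [[4, 9], [7, 1, 3, 8, 2, 6]]
toy_padded = pad_pre(toy_sequences, maxlen=4)
print(toy_padded)  # [[0, 0, 4, 9], [3, 8, 2, 6]]
```

Index 0 is reserved by the Keras `Tokenizer` for padding, so the zeros never collide with a real word. If the opening of a review matters more than its ending, passing `truncating='post'` to `pad_sequences` keeps the first tokens instead.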

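To see what the `Flatten` layer in the CNN cell passes to the final `Dense` unit, it helps to trace the sequence length through the stack. The arithmetic below assumes the Keras defaults used in that cell (`Conv1D` with `padding='valid'` and stride 1, `MaxPooling1D` with stride equal to the pool size). The helper names are made up for this sketch, and the parameter counts are derived from the layer shapes, not read from a `model.summary()` run.

```python
def conv1d_valid_len(length, kernel_size, stride=1):
    """Output length of a 1-D convolution with no padding."""
    return (length - kernel_size) // stride + 1


def maxpool1d_len(length, pool_size):
    """Output length of 1-D max pooling with stride equal to pool_size."""
    return (length - pool_size) // pool_size + 1


# Hyperparameters copied from the CNN cell.
max_words, embedding_dim, max_len = 10000, 100, 200
num_filters, kernel_size, pool_size = 128, 5, 2

steps = max_len
steps = conv1d_valid_len(steps, kernel_size)  # 200 -> 196
steps = maxpool1d_len(steps, pool_size)       # 196 -> 98
steps = conv1d_valid_len(steps, kernel_size)  # 98 -> 94
steps = maxpool1d_len(steps, pool_size)       # 94 -> 47
flat_features = steps * num_filters           # 47 * 128 = 6016

# Trainable parameters per layer (weights plus biases where present).
params = {
    "embedding": max_words * embedding_dim,
    "conv1": kernel_size * embedding_dim * num_filters + num_filters,
    "conv2": kernel_size * num_filters * num_filters + num_filters,
    "dense": flat_features + 1,
}
total_params = sum(params.values())
print(steps, flat_features, total_params)  # 47 6016 1152193
```

Under these assumptions the embedding table holds about 87% of the roughly 1.15 million parameters, so `max_words` and `embedding_dim` are the first settings to revisit if the model needs to be smaller.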