
# Sentiment Analysis Project


## Load the Data

```python
from google.colab import drive
drive.mount('/content/drive')
```

```
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
```


```python
import pandas as pd

# Tab-separated file with a 'label' column (pos/neg) and a 'review' column
file_path = '/content/drive/My Drive/Copy of moviereviews.tsv'
df = pd.read_csv(file_path, sep='\t')

# Preview the first five rows
df.head()
```

| | label | review |
|---|---|---|
| 0 | neg | how do films like mouse hunt get into theatres... |
| 1 | neg | some talented actresses are blessed with a dem... |
| 2 | pos | this has been an extraordinary year for austra... |
| 3 | pos | according to hollywood movies made in last few... |
| 4 | neg | my first press screening of 1998 and already i... |
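
A quick check worth doing after loading is how the labels are distributed; with pandas this is `df['label'].value_counts()`. The standard-library sketch below shows the same idea without pandas: it parses tab-separated text with `csv.DictReader` and counts labels with `collections.Counter`. It assumes a header row with `label` and `review` columns; `SAMPLE_TSV` and `count_labels` are hypothetical names, and the sample rows are the truncated reviews from the `df.head()` preview, not the full file.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the TSV file: the five truncated rows from df.head().
SAMPLE_TSV = (
    "label\treview\n"
    "neg\thow do films like mouse hunt get into theatres...\n"
    "neg\tsome talented actresses are blessed with a dem...\n"
    "pos\tthis has been an extraordinary year for austra...\n"
    "pos\taccording to hollywood movies made in last few...\n"
    "neg\tmy first press screening of 1998 and already i...\n"
)


def count_labels(tsv_text: str) -> Counter:
    """Count how often each label occurs in tab-separated text with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["label"] for row in reader)


label_counts = count_labels(SAMPLE_TSV)
print(label_counts)
```

On the full dataset, a balanced split matters for reading the report: with equal support per class, accuracy and macro averages are not skewed by one dominant label.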


## Remove Blank Records (optional)

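```python
# Drop rows with a missing value (NaN) in any column
df.dropna(inplace=True)

# Drop reviews that contain only whitespace
df = df[df['review'].str.strip() != '']

# Renumber the index so it runs 0..n-1 again
df.reset_index(drop=True, inplace=True)

print(f"Total reviews after cleaning: {len(df)}")
```

```
Total reviews after cleaning: 1938
```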
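
The two filters catch different problems: `dropna` removes rows where a value is missing (`NaN`), while the `str.strip()` comparison removes reviews that exist but contain only whitespace. A plain-Python sketch of the same rule follows; the `drop_blank_records` helper and its list of `(label, review)` pairs are hypothetical, with `None` standing in for a missing value.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[str], Optional[str]]


def drop_blank_records(rows: List[Row]) -> List[Tuple[str, str]]:
    """Keep rows whose label and review are present and whose review is not just whitespace."""
    cleaned = []
    for label, review in rows:
        # Counterpart of dropna(): skip rows with a missing value in either column.
        if label is None or review is None:
            continue
        # Counterpart of df['review'].str.strip() != '': skip whitespace-only reviews.
        if review.strip() == "":
            continue
        cleaned.append((label, review))
    return cleaned


rows = [
    ("neg", "how do films like mouse hunt get into theatres..."),
    ("pos", None),       # missing review: removed by the dropna() step
    ("neg", "   \n  "),  # whitespace-only review: removed by the strip() filter
    ("pos", "this has been an extraordinary year for austra..."),
]
print(len(drop_blank_records(rows)))
```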


## Import `SentimentIntensityAnalyzer` and create an sid object
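The cell calls `nltk.download('vader_lexicon')` before creating the analyzer; if the lexicon is already present, NLTK reports it as up to date and skips the download.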

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')

sid = SentimentIntensityAnalyzer()
```

```
[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
```
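
VADER's `compound` score, used in the next step, is not a simple average. The analyzer sums the valences of the lexicon words it finds (after rule-based adjustments such as negation and intensifiers) and then squashes that sum into [-1, 1] with x / sqrt(x² + α), where NLTK's implementation uses α = 15. The sketch below reimplements only that final normalization step for illustration; `normalize_compound` is a hypothetical name, and the raw sums fed to it are made-up inputs, not values from this dataset.

```python
import math


def normalize_compound(raw_sum: float, alpha: float = 15.0) -> float:
    """Map an unbounded sum of valence scores into [-1, 1], as VADER does for 'compound'."""
    score = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # The formula already stays inside (-1, 1); the clamp mirrors NLTK's defensive check.
    return max(-1.0, min(1.0, score))


# Hypothetical raw valence sums: small sums stay moderate, large sums saturate near +/-1.
for raw in (-20.0, -2.0, 0.0, 2.0, 20.0):
    print(f"{raw:6.1f} -> {normalize_compound(raw):.4f}")
```

Because a full movie review contains many scored words, its raw sum tends to be large in magnitude, so compound scores for long reviews often land close to -1 or +1.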


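## Use `sid` to append a `comp_score` column to the dataset

```python
# 'compound' is VADER's overall sentiment score for the text, in the range [-1, 1]
df['comp_score'] = df['review'].apply(lambda review: sid.polarity_scores(review)['compound'])
```

```python
# Binary prediction: a compound score of 0 or above counts as 'pos', below 0 as 'neg'
df['vader_sentiment'] = df['comp_score'].apply(lambda score: 'pos' if score >= 0 else 'neg')
```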
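
The rule above labels every review with `comp_score >= 0` as `pos`, so a review VADER scores as exactly neutral (compound 0) is counted as positive. The VADER authors' documentation suggests a convention with a neutral band instead: compound ≥ 0.05 is positive, ≤ -0.05 is negative, and anything in between is neutral. The sketch below, built around a hypothetical `label_from_compound` helper, shows both rules side by side. Since this dataset has only `pos` and `neg` labels, adopting the neutral band would also require deciding how to score `neu` predictions; the notebook's binary rule sidesteps that choice.

```python
from typing import Optional


def label_from_compound(score: float, neutral_band: Optional[float] = None) -> str:
    """Turn a VADER compound score into a label.

    With neutral_band=None this reproduces the notebook's rule (score >= 0 -> 'pos').
    With neutral_band=0.05 it follows the thresholds suggested in VADER's documentation.
    """
    if neutral_band is None:
        return "pos" if score >= 0 else "neg"
    if score >= neutral_band:
        return "pos"
    if score <= -neutral_band:
        return "neg"
    return "neu"


for s in (-0.6, -0.02, 0.0, 0.03, 0.9):
    print(s, label_from_compound(s), label_from_compound(s, neutral_band=0.05))
```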


## Compare the original `label` with the VADER prediction derived from `comp_score`

```python
from sklearn.metrics import classification_report, confusion_matrix

print("Classification Report:\n")
print(classification_report(df['label'], df['vader_sentiment']))

print("\nConfusion Matrix:\n")
print(confusion_matrix(df['label'], df['vader_sentiment']))
```

```
Classification Report:

              precision    recall  f1-score   support

         neg       0.72      0.44      0.55       969
         pos       0.60      0.83      0.70       969

    accuracy                           0.64      1938
   macro avg       0.66      0.64      0.62      1938
weighted avg       0.66      0.64      0.62      1938


Confusion Matrix:

[[427 542]
 [164 805]]
```
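
In scikit-learn's `confusion_matrix`, rows are the true labels and columns are the predicted labels, both in sorted order (`neg`, then `pos`) when no `labels` argument is given. Every figure in the classification report can be recomputed from these four counts; the standard-library sketch below does so for the matrix printed above, using a hypothetical `report_from_confusion` helper. It also makes the pattern in the report explicit: VADER predicts `pos` for 1,347 of the 1,938 reviews (542 + 805), which is why `pos` recall is high (0.83) while `neg` recall is low (0.44).

```python
from typing import Dict, List

LABELS = ["neg", "pos"]
# Confusion matrix from the notebook output: rows = true label, columns = predicted label.
CM = [[427, 542],
      [164, 805]]


def report_from_confusion(cm: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Recompute per-class precision, recall, F1 and support from a square confusion matrix."""
    n = len(labels)
    report: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column total: predicted as this label
        actual = sum(cm[i])                           # row total: truly this label (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1, "support": actual}
    return report


report = report_from_confusion(CM, LABELS)
total = sum(sum(row) for row in CM)
accuracy = sum(CM[i][i] for i in range(len(LABELS))) / total
macro_f1 = sum(m["f1"] for m in report.values()) / len(report)

for label, m in report.items():
    print(f"{label}: precision={m['precision']:.2f} recall={m['recall']:.2f} "
          f"f1={m['f1']:.2f} support={m['support']}")
print(f"accuracy={accuracy:.2f}  macro F1={macro_f1:.2f}  over {total} reviews")
```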
