Movie Review Sentiment Analysis

This repository contains code and resources for performing sentiment analysis on movie reviews using natural language processing (NLP) techniques.

Overview

Sentiment analysis, also known as opinion mining, determines whether the sentiment expressed in a piece of text is positive, negative, or neutral. This project targets movie reviews specifically and classifies each one as either positive or negative based on the sentiment conveyed in its text.

Dataset

The dataset used for this project is the IMDB dataset, which contains a large collection of movie reviews labeled as positive or negative.

You can download the dataset from [link to dataset].
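As a rough illustration of loading the data, the sketch below reads a two-row in-memory stand-in for IMDB-Dataset.csv with pandas. The column names `review` and `sentiment`, the sample rows, and the derived `label` column are assumptions made for this example; check the header of the real file before relying on them.

```python
import io

import pandas as pd

# Stand-in for IMDB-Dataset.csv so the sketch runs without the real file.
# Assumption: the columns are named "review" and "sentiment".
sample_csv = io.StringIO(
    "review,sentiment\n"
    '"A touching, well-acted drama.",positive\n'
    '"Dull and far too long.",negative\n'
)

# With the real file this would be: pd.read_csv("IMDB-Dataset.csv")
df = pd.read_csv(sample_csv)

# Map the text labels to integers (hypothetical helper column "label").
df["label"] = df["sentiment"].map({"positive": 1, "negative": 0})

# Quick check of the class balance.
print(df["sentiment"].value_counts())
```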

Project Structure

The project is organized as follows:

Models/: This directory contains saved model files after training and the standard vocabulary file.
IMDB-Dataset.csv: This file contains the dataset used for training and testing the model.
Movie Review Sentiment Analysis.ipynb: This notebook contains code used for data exploration, preprocessing, model training, and evaluation.
Token Count Distribution.png: This image shows the distribution of token counts across the entire dataset.

Dependencies

The project requires the following Python libraries:

numpy
pandas
scikit-learn
nltk
matplotlib
seaborn

Steps Involved In Analysis

Problem: Classify each movie review as expressing either positive or negative sentiment.
Import the required packages and load the data: We import the necessary packages (numpy, pandas, nltk, and scikit-learn models and metrics) and load the IMDB dataset, which contains a large collection of movie reviews labeled with their sentiment (positive or negative).
EDA and preprocessing: We tokenize the reviews into words, remove stopwords and punctuation, and apply stemming to reduce each word to its stem.
Feature Engineering: We convert the preprocessed text into bag-of-words count features using scikit-learn's CountVectorizer.
Model Selection and Training: We train Naive Bayes classifiers (the Gaussian, Multinomial, and Bernoulli variants) on the extracted features to learn patterns in the text and classify movie reviews as positive or negative.
Evaluate the Model Performance: We evaluate the trained models using metrics such as accuracy, precision, and recall on a separate test set of movie reviews.
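
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the eight toy reviews and their labels are invented stand-ins for the IMDB data, a regular expression replaces a full tokenizer, and scikit-learn's built-in English stopword list is swapped in for NLTK's so that nothing has to be downloaded. Stemming uses NLTK's PorterStemmer; the notebook may use a different stemmer or different settings.

```python
import re

import numpy as np
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, CountVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Toy stand-in for the IMDB reviews (assumption: 1 = positive, 0 = negative).
reviews = [
    "A wonderful, moving film with brilliant acting!",
    "Terrible plot and boring characters.",
    "I loved every minute of this charming story.",
    "A dull, lifeless mess that wasted my evening.",
    "Great direction and a superb, memorable cast.",
    "Awful dialogue and painfully slow pacing.",
    "An enjoyable, heartfelt and beautiful movie.",
    "Bad acting, bad script, bad everything.",
]
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, drop punctuation and stopwords, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [stemmer.stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS]
    return " ".join(kept)


cleaned = [preprocess(r) for r in reviews]

# Hold out a test set before fitting the vectorizer, so the vocabulary
# is learned from the training reviews only.
train_txt, test_txt, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.25, stratify=labels, random_state=42
)

# Bag-of-words counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_txt)
X_test = vectorizer.transform(test_txt)

models = {
    "GaussianNB": GaussianNB(),
    "MultinomialNB": MultinomialNB(),
    "BernoulliNB": BernoulliNB(),
}

results = {}
for name, model in models.items():
    # GaussianNB does not accept sparse matrices, so densify for it.
    dense = name == "GaussianNB"
    Xtr = X_train.toarray() if dense else X_train
    Xte = X_test.toarray() if dense else X_test
    model.fit(Xtr, y_train)
    pred = model.predict(Xte)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
    }

for name, metrics in results.items():
    print(name, metrics)
```

With only eight toy reviews the scores are meaningless; the point is the order of operations. Note that densifying the full IMDB count matrix for GaussianNB can use a lot of memory, so capping the vocabulary (for example with CountVectorizer's `max_features` parameter) may be needed there.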

Accurate sentiment prediction for movie reviews has practical applications such as movie recommendation systems, customer feedback analysis for movie producers and distributors, and sentiment analysis of social media discussions about movies.

Plot

Distribution of Token Counts of the Corpus
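
A plot of this kind could be produced roughly as below. This is a sketch under assumptions: tokens are counted by splitting on whitespace, the three sample reviews and the `review` and `token_count` column names are invented for the example, and the notebook may count tokens differently.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the review column of the dataset.
df = pd.DataFrame(
    {
        "review": [
            "Great movie",
            "I did not like this film at all",
            "An average story with decent acting",
        ]
    }
)

# Whitespace tokenization: number of tokens in each review.
df["token_count"] = df["review"].str.split().str.len()

fig, ax = plt.subplots()
ax.hist(df["token_count"], bins=10)
ax.set_xlabel("Tokens per review")
ax.set_ylabel("Number of reviews")
ax.set_title("Distribution of Token Counts of the Corpus")
# fig.savefig("Token Count Distribution.png")  # uncomment to write the image
```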

Confusion Matrix

Gaussian NB Model
Multinomial NB Model
Bernoulli NB Model
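
For readers unfamiliar with how a confusion matrix relates to the reported metrics, here is a small sketch using scikit-learn. The `y_true` and `y_pred` values are hand-written for illustration and are not results from this project.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive review, 0 = negative review.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

# The usual metrics follow directly from the four cells.
accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(cm)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

The same call can be repeated with the predictions of each of the three Naive Bayes models to get one matrix per model.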
