
🤖 Sentiment Analysis using IMDb Reviews. The project contains TfidfVectorizer for representing text in numeric form. 🎥

Serfati/imdb_nlp


Open In Colab

Description

This repository contains a simple yet efficient solution for sentiment polarity analysis of the IMDB dataset, using TfidfVectorizer and several ML models such as SVM, Naive Bayes (NB), and K-means.

Sentiment analysis is a challenging subject in machine learning. People express their emotions in language that is often obscured by sarcasm, ambiguity, and plays on words, all of which can mislead both humans and computers.
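As a rough illustration of the approach described above, the sketch below chains TfidfVectorizer with two of the classifiers mentioned (a linear SVM and Naive Bayes) using scikit-learn. The toy reviews, the `build_model` helper, and the choice of `LinearSVC` and `MultinomialNB` are assumptions made for this example; the notebook may configure its models differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy reviews invented for illustration; they are NOT taken from the IMDB dataset.
train_reviews = [
    "a wonderful film with great acting",
    "great story and wonderful cast",
    "terrible plot and awful acting",
    "awful film with a terrible script",
]
train_sentiment = [1, 1, 0, 0]  # 1 = positive, 0 = negative


def build_model(classifier):
    """Hypothetical helper: TF-IDF features followed by the given classifier."""
    return make_pipeline(TfidfVectorizer(), classifier)


# Fit one pipeline per classifier on the same toy data.
svm_model = build_model(LinearSVC()).fit(train_reviews, train_sentiment)
nb_model = build_model(MultinomialNB()).fit(train_reviews, train_sentiment)

# Score two unseen toy reviews with both models.
new_reviews = ["wonderful great cast", "awful terrible script"]
svm_pred = svm_model.predict(new_reviews)
nb_pred = nb_model.predict(new_reviews)
print("SVM:", list(svm_pred), "NB:", list(nb_pred))
```

Wrapping the vectorizer and classifier in one pipeline means the vocabulary is learned only from the training reviews, so the same transformation is applied when predicting on new text.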

Dataset

The labeled data set consists of 50,000 IMDB movie reviews, specially selected for sentiment analysis. The sentiment of a review is binary: an IMDB rating < 5 results in a sentiment score of 0, and a rating >= 7 results in a sentiment score of 1. No individual movie has more than 30 reviews. The 25,000-review labeled training set does not include any of the same movies as the 25,000-review test set. In addition, there are another 50,000 IMDB reviews provided without any rating labels.
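The labeling rule above can be sketched as a small function. The name `rating_to_sentiment` is hypothetical, and returning `None` for ratings of 5 or 6 is an assumption: the description only defines labels for ratings below 5 and ratings of 7 or higher.

```python
from typing import Optional


def rating_to_sentiment(rating: int) -> Optional[int]:
    """Map an IMDB rating to the binary sentiment label described above."""
    if rating < 5:
        return 0  # negative review
    if rating >= 7:
        return 1  # positive review
    # Assumption: ratings of 5 or 6 fall outside both rules and get no label.
    return None


for rating in (3, 5, 8):
    print(rating, rating_to_sentiment(rating))
```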

Data fields

  • review - Text of the review
  • sentiment - Sentiment of the review; 1 for positive and 0 for negative reviews
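A minimal sketch of reading these two fields into Python lists follows. It assumes the data is stored as delimited text with a header row naming `review` and `sentiment`; the actual file name and format are not stated here, so an in-memory sample with invented rows stands in for the real file.

```python
import csv
import io

# Invented sample rows standing in for the real data file (format is an assumption).
sample = io.StringIO(
    "review,sentiment\n"
    '"A wonderful, moving film.",1\n'
    '"Dull and far too long.",0\n'
)

reviews, labels = [], []
for row in csv.DictReader(sample):
    reviews.append(row["review"])          # text of the review
    labels.append(int(row["sentiment"]))   # 1 = positive, 0 = negative

print(len(reviews), "reviews loaded; labels:", labels)
```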

⚠️ Prerequisites

📦 How To Install

You can modify or contribute to this project by following the steps below:

1. Clone the repository

  • Open a terminal (Ctrl + Alt + T)

  • Clone to a location on your machine.

# Clone the repository 
$> git clone https://github.com/serfati/imdb_nlp.git  

# Navigate to the directory 
$> cd imdb_nlp

2. Install Dependencies

# install with pip/conda 
$> pip install -r requirments.txt

3. Launch the project

# Run the notebook
$> jupyter notebook IMDB_NLP.ipynb
  • Or open with Colab

    Open In Colab


Author: Serfati

⚖️ License

This program is free software: you can redistribute it and/or modify it under the terms of the MIT License.
