Implementing machine learning models from scratch for Sentiment Analysis on movie reviews

TSunny007/Movie-Reviews-Classification

Movie-Reviews-Classification

Binary Sentiment Analysis on movie reviews

We replicate the features used in the original paper. Each movie review is featurized as a bag-of-words, where each feature is the number of times a particular word occurs in the review. Any single review contains only a small fraction of the vocabulary, so although the feature vector has one dimension per word (74481 in this case), most of its entries are zero and each review is a very sparse vector in this space.
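The featurization described above can be sketched in a few lines of Python. This is only an illustration, not the repository's code: the helper names `build_vocabulary` and `featurize`, the whitespace tokenizer, and the two toy reviews are all assumptions. Each review becomes a dictionary from column index to count, so only the non-zero entries are stored.

```python
from collections import Counter


def build_vocabulary(reviews):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for text in reviews:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def featurize(text, vocab):
    """Return a sparse bag-of-words vector as {column index: count}."""
    # Tokens outside the vocabulary are ignored (an assumption of this sketch).
    counts = Counter(t for t in text.lower().split() if t in vocab)
    return {vocab[token]: n for token, n in counts.items()}


# Hypothetical toy reviews; the real dataset has a vocabulary of 74481 words.
reviews = ["a great great movie", "a dull movie"]
vocab = build_vocabulary(reviews)
vectors = [featurize(r, vocab) for r in reviews]
```

With a realistic vocabulary the same idea applies, but a sparse matrix type would normally replace the plain dictionaries.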

The following classifiers are implemented, all from scratch except the neural network:

  • Margin Perceptron: accuracy of 87.3% on testing data, 87% on validation data
  • Averaged Perceptron: accuracy of 87.6% on testing data, 88.4% on validation data
  • SVM: accuracy of 88.89% on testing data, 88.2% on validation data
  • Logistic Regression: accuracy of 86.8% on testing data, 86.9% on validation data
  • Naive Bayes: assumes Gaussian class-conditional feature distributions, accuracy of 81.6% on testing data, 80.8% on validation data
  • Neural Network: accuracy of 86.3% on testing data, 86.2% on validation data
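To give a sense of what a from-scratch classifier on sparse bag-of-words vectors can look like, here is a minimal averaged perceptron sketch. It is not the repository's implementation: the function names, the `{index: count}` input format, the labels in {-1, +1}, and the toy data are assumptions made for illustration.

```python
def train_averaged_perceptron(examples, n_features, epochs=10):
    """Train on (sparse_vector, label) pairs, with label in {-1, +1}.

    Returns the average of the weights and bias over every training step.
    """
    w = [0.0] * n_features
    b = 0.0
    w_sum = [0.0] * n_features
    b_sum = 0.0
    steps = 0
    for _ in range(epochs):
        for x, y in examples:
            score = b + sum(w[i] * v for i, v in x.items())
            if y * score <= 0:  # mistake: move the weights toward the label
                for i, v in x.items():
                    w[i] += y * v
                b += y
            # Accumulate the current weights after every example.
            for i in range(n_features):
                w_sum[i] += w[i]
            b_sum += b
            steps += 1
    return [s / steps for s in w_sum], b_sum / steps


def predict(x, w, b):
    """Classify one sparse vector as +1 or -1."""
    score = b + sum(w[i] * v for i, v in x.items())
    return 1 if score > 0 else -1


# Hypothetical toy data: one positive and one negative review vector.
examples = [({0: 1, 1: 2, 2: 1}, 1), ({0: 1, 2: 1, 3: 1}, -1)]
w_avg, b_avg = train_averaged_perceptron(examples, n_features=4)
predictions = [predict(x, w_avg, b_avg) for x, _ in examples]
```

Summing the dense weight vector after every example is the simplest way to average but costs time proportional to the vocabulary size per step, so a practical version would track the averages lazily.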

All classifiers follow scikit-learn's estimator API, so they can be used with scikit-learn's cross-validation and grid-search utilities. This enables multi-core training and cross-validated hyperparameter selection.
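As a rough sketch of what this compatibility involves, a custom classifier can subclass scikit-learn's `BaseEstimator` and `ClassifierMixin`, keep its hyperparameters as plain `__init__` arguments, and implement `fit` and `predict`. The class `SimplePerceptron`, its parameters, and the toy data below are hypothetical and are not taken from the repository.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import GridSearchCV


class SimplePerceptron(ClassifierMixin, BaseEstimator):
    """Hypothetical binary perceptron exposing the scikit-learn estimator API."""

    def __init__(self, learning_rate=1.0, epochs=10):
        # Store hyperparameters unchanged so get_params/set_params and clone work.
        self.learning_rate = learning_rate
        self.epochs = epochs

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        signs = np.where(y == self.classes_[1], 1.0, -1.0)
        self.coef_ = np.zeros(X.shape[1])
        self.intercept_ = 0.0
        for _ in range(self.epochs):
            for x_i, s_i in zip(X, signs):
                if s_i * (x_i @ self.coef_ + self.intercept_) <= 0:
                    self.coef_ += self.learning_rate * s_i * x_i
                    self.intercept_ += self.learning_rate * s_i
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = X @ self.coef_ + self.intercept_
        return np.where(scores > 0, self.classes_[1], self.classes_[0])


# Toy, linearly separable data standing in for review feature vectors.
X = np.array([[2, 1], [1, 2], [2, 2], [3, 1],
              [-1, -2], [-2, -1], [-2, -2], [-1, -3]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

search = GridSearchCV(
    SimplePerceptron(),
    param_grid={"learning_rate": [0.1, 1.0], "epochs": [5, 10]},
    cv=2,
    n_jobs=1,  # set n_jobs=-1 to spread the fits over all available cores
)
search.fit(X, y)
```

After fitting, `search.best_params_` holds the selected hyperparameters and `search.predict` uses the estimator refit on the full training data.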

The Kaggle competition page gives more details about the dataset. The dataset is also available at http://ai.stanford.edu/~amaas/data/sentiment/
