Sentiment Vectors

A Python reproduction of "Learning Word Vectors for Sentiment Analysis" by Maas et al. (2011).

This project learns word representations from IMDb movie reviews. The objective is to build word vectors that capture both:

- semantic similarity: words used in similar contexts should be close;
- sentiment orientation: words carrying similar polarity should also be close.
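
The two goals above can be illustrated with the two probability models described in Maas et al. (2011): a softmax over the vocabulary given a document vector (semantic part), and a logistic prediction of polarity from a single word vector (sentiment part). The following is a minimal sketch in plain Python, not this repository's code; the function and argument names (`word_probabilities`, `positive_probability`, `theta`, `psi`) are hypothetical and chosen only for illustration.

```python
import math

def word_probabilities(theta, vectors, biases):
    """Semantic part: p(w | theta) is proportional to exp(theta . phi_w + b_w)."""
    scores = [sum(t * x for t, x in zip(theta, phi)) + b
              for phi, b in zip(vectors, biases)]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def positive_probability(phi, psi, bias):
    """Sentiment part: p(s = 1 | w) = sigmoid(psi . phi_w + b_c)."""
    z = sum(p * x for p, x in zip(psi, phi)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy example with a 3-word vocabulary and 2-dimensional vectors (made-up values).
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
toy_biases = [0.0, 0.0, 0.0]
probs = word_probabilities([2.0, 0.0], toy_vectors, toy_biases)
polarity = positive_probability(toy_vectors[0], [1.5, 0.0], 0.0)
```

In this toy setup, the word whose vector points in the same direction as `theta` receives the highest probability, and a word vector aligned with `psi` gets a polarity above 0.5.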

The implementation follows the main ideas of the paper and evaluates the learned representations using a linear SVM classifier.
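
Before a linear classifier can be trained, each review has to be turned into a fixed-length feature vector. One simple way to do this, shown here as an assumption rather than as the method used in `evaluation.py`, is to average the vectors of the words in the review and scale the result to unit length. The paper itself uses a weighted product of the word matrix with a bag-of-words vector; the plain average below is a simplification, and `document_vector` is a hypothetical helper name.

```python
import math

def document_vector(tokens, word_vectors):
    """Average the vectors of known tokens, then scale to unit length.

    Returns None when no token is in the vocabulary.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    mean = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean

# Toy vocabulary with made-up 2-dimensional vectors.
toy_vectors = {"good": [3.0, 4.0], "film": [0.0, 1.0]}
features = document_vector(["good", "unseen"], toy_vectors)
```

The resulting vectors, one per review, could then be passed with their labels to any linear SVM implementation.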

📁 Project Structure

SentimentAnalysis/
├── aclImdb/
│   ├── train/
│   │   ├── pos/
│   │   ├── neg/
│   │   └── unsup/
│   └── test/
│       ├── pos/
│       └── neg/
├── params/
├── src/
│   └── sentiment_vectors/
│       ├── __init__.py
│       ├── data.py
│       ├── vocabulary.py
│       ├── semantic.py
│       ├── sentiment.py
│       ├── evaluation.py
│       └── utils.py
├── Scripts/
│   ├── step1_load_data.py
│   ├── step2_build_vocab.py
│   ├── step3_train_semantic.py
│   ├── step4_train_full.py
│   └── step5_evaluate.py
├── pyproject.toml
└── README.md

Dataset

The project uses the IMDb dataset introduced by Maas et al. (2011).

aclImdb/
├── train/
│   ├── pos/    positive labeled reviews
│   ├── neg/    negative labeled reviews
│   └── unsup/  unlabeled reviews
└── test/
    ├── pos/    positive test reviews
    └── neg/    negative test reviews

The dataset contains:

- 25,000 labeled training reviews
- 50,000 unlabeled training reviews
- 25,000 labeled test reviews
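
Given the directory layout above, the labeled reviews can be read with a few lines of standard-library Python. This is a sketch under the assumption that each review is stored as its own `.txt` file inside the label folder; `load_reviews` is a hypothetical helper and is not necessarily how `data.py` is written.

```python
from pathlib import Path

def load_reviews(root, split, label_dirs=("pos", "neg")):
    """Yield (text, label) pairs from root/<split>/<label>/*.txt."""
    for label in label_dirs:
        folder = Path(root) / split / label
        # Sorting makes the iteration order reproducible across runs.
        for path in sorted(folder.glob("*.txt")):
            yield path.read_text(encoding="utf-8"), label
```

For example, `list(load_reviews("aclImdb", "train"))` would collect the labeled training reviews, and passing `label_dirs=("unsup",)` would read the unlabeled ones.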

Example results

Nearest neighbors obtained from the learned word vectors:

Query word	Semantic + Sentiment	Semantic Only
wonderful	perfect, fantastic, deserved, incredible, awesome	perfect, amazing, excellent, superb, fabulous
terrible	awful, horrible, plain, corny, crap	awful, horrible, atrocious, 1/10, badly
boring	terrible, predictable, awful, annoying, honestly	amateur, pointless, mess, 1/10, terrible
romantic	romance, appealing, charm, charming, performer	romance, charming, appealing, roles, charm
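
A neighbor list like the one above can be produced by ranking every vocabulary word by its similarity to the query word. The sketch below assumes cosine similarity as the measure and a plain dictionary from words to vectors; both are assumptions made for illustration, and the names `cosine` and `nearest_neighbors` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def nearest_neighbors(query, word_vectors, k=5):
    """Return the k words most similar to `query`, excluding the query itself."""
    target = word_vectors[query]
    scored = [(cosine(target, vec), word)
              for word, vec in word_vectors.items() if word != query]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]

# Toy vectors with made-up values, only to show the call.
toy_vectors = {"good": [1.0, 0.1], "great": [0.9, 0.2], "bad": [-1.0, 0.1]}
neighbors = nearest_neighbors("good", toy_vectors, k=1)
```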

Classification results:

Method	Our Result	Paper
Bag of Words (bnc)	87.94%	87.80%
Our Semantic Only	82.84%	87.30%
Our Full	87.52%	87.44%
Our Full + BoW	88.06%	88.89%

Reference

Maas, Andrew L., Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning Word Vectors for Sentiment Analysis. ACL 2011.

Dataset: IMDb Large Movie Review Dataset.
