Use Google's Doc2Vec for movie reviews

Kaggle Challenge: Bag of words meets bags of popcorn

Tutorial Overview

This tutorial will help you get started with Word2Vec for natural language processing. It has two goals:

Basic Natural Language Processing: Part 1 of this tutorial is intended for beginners and covers basic natural language processing techniques, which are needed for later parts of the tutorial.

Deep Learning for Text Understanding: In Parts 2 and 3, we delve into how to train a model using Word2Vec and how to use the resulting word vectors for sentiment analysis.

Since deep learning is a rapidly evolving field, much of the work has not yet been published or exists only as academic papers. Part 3 of the tutorial is more exploratory than prescriptive: we experiment with several ways of using Word2Vec rather than giving you a recipe for using the output.

To achieve these goals, we rely on an IMDB sentiment analysis data set, which has 100,000 multi-paragraph movie reviews, both positive and negative.

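In short, the task is to run sentiment analysis on IMDB movie reviews with Word2Vec, using a model to decide whether each review is positive or negative.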

Although the challenge page asks for Word2Vec, I used its extension, Doc2Vec, as implemented in gensim: models.doc2vec – Doc2vec paragraph embeddings

Data Set

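The training set and the test set contain 25,000 reviews each. There is also an additional training set of 50,000 reviews with no sentiment labels.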
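A rough sketch of reading the labeled file and turning each review into tokens, using only the standard library. It assumes the TSV has tab-separated `id`, `sentiment`, and `review` columns, as in the Kaggle download. The sample rows and the helper names `clean_review` and `load_reviews` are made up for illustration, and the actual pre-processing.py may work differently.

```python
import csv
import html
import io
import re

# Stand-in for labeledTrainData.tsv (assumption: tab-separated columns
# named id, sentiment, review; the rows here are invented).
SAMPLE_TSV = (
    "id\tsentiment\treview\n"
    '"1_8"\t1\t"A great film.<br /><br />I loved it!"\n'
    '"2_3"\t0\t"Boring &amp; far too long."\n'
)


def clean_review(text):
    """Strip HTML tags and entities, lowercase, and split into word tokens."""
    text = re.sub(r"<[^>]+>", " ", text)  # drop tags such as <br />
    text = html.unescape(text)            # turn &amp; back into &
    return re.findall(r"[a-z']+", text.lower())


def load_reviews(handle):
    """Yield (sentiment, tokens) pairs from a labeled TSV file handle."""
    # QUOTE_NONE keeps quote characters as plain text instead of parsing them.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield int(row["sentiment"]), clean_review(row["review"])


rows = list(load_reviews(io.StringIO(SAMPLE_TSV)))
for sentiment, tokens in rows:
    print(sentiment, tokens)
```

To read the real file, replace the `io.StringIO` handle with `open("labeledTrainData.tsv", encoding="utf-8", newline="")`.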

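| Metric | SVC | Decision Tree | Random Forest | Logistic Regression | KNN |
| --- | --- | --- | --- | --- | --- |
| Accuracy | 0.837 | 0.639 | 0.808 | 0.833 | 0.709 |
| MSE | 0.159 | 0.0 | 0.0 | 0.163 | 0.202 |
| MAE | 0.159 | 0.0 | 0.0 | 0.163 | 0.202 |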
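With 0/1 sentiment labels, every wrong prediction contributes exactly 1 to both the absolute and the squared error. MSE and MAE therefore coincide and equal the error rate, which is why those two rows are identical. The sketch below illustrates this with made-up labels and predictions, not data from this repository.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical binary labels and predictions (two of eight are wrong).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(acc, mse, mae)
```

When all three metrics are computed on the same predictions, accuracy and MSE sum to 1. The source does not say which data split each row of the table was measured on.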

