a10423006/Doc2Vec-movie-reviews

Use Google's Doc2Vec for movie reviews

Tutorial Overview

This tutorial will help you get started with Word2Vec for natural language processing. It has two goals:

Basic Natural Language Processing: Part 1 of this tutorial is intended for beginners and covers basic natural language processing techniques, which are needed for later parts of the tutorial.

Deep Learning for Text Understanding: In Parts 2 and 3, we delve into how to train a model using Word2Vec and how to use the resulting word vectors for sentiment analysis.

Since deep learning is a rapidly evolving field, much of the work has not yet been published, or exists only as academic papers. Part 3 of the tutorial is more exploratory than prescriptive: we experiment with several ways of using Word2Vec rather than giving you a recipe for using the output.

To achieve these goals, we rely on an IMDB sentiment analysis data set, which has 100,000 multi-paragraph movie reviews, both positive and negative.

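In short, the task is to run sentiment analysis on IMDB movie reviews with Word2Vec, using a model to decide whether each review is positive or negative.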

Although the competition page calls for Word2Vec, I use its extension, Doc2Vec, instead (see "models.doc2vec – Doc2vec paragraph embeddings" in the gensim documentation).

Data Set


The training and test sets contain 25,000 reviews each. An additional training set of 50,000 reviews comes without sentiment labels.

Training set columns

Sentiment distribution of the training set
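To make the data layout concrete, the sketch below reads a labeled file and counts the sentiment classes. It assumes a tab-separated file with `id`, `sentiment`, and `review` columns; those column names, the `load_reviews` helper, and the two sample rows are assumptions for illustration, and the in-memory sample stands in for the real file.

```python
import csv
import io
from collections import Counter


def load_reviews(handle):
    """Read tab-separated rows into dicts; cast sentiment to int when present."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        if "sentiment" in row:
            row["sentiment"] = int(row["sentiment"])
        rows.append(row)
    return rows


# In-memory stand-in for the labeled training file (made-up rows).
sample = io.StringIO(
    "id\tsentiment\treview\n"
    "r1\t1\tA wonderful film.\n"
    "r2\t0\tTerrible acting and a boring plot.\n"
)

labeled = load_reviews(sample)

# Count reviews per sentiment label (1 = positive, 0 = negative).
distribution = Counter(row["sentiment"] for row in labeled)
print(distribution[1], distribution[0])
```

The same helper also works for an unlabeled file, since the `sentiment` cast is skipped when that column is missing.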


Metric     SVC     Decision Tree   Random Forest   Logistic Regression   KNN
Accuracy   0.837   0.639           0.808           0.833                 0.709
MSE        0.159   0.0             0.0             0.163                 0.202
MAE        0.159   0.0             0.0             0.163                 0.202
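For reference, the three metrics in the table can be computed as in the sketch below. The label and prediction lists are made-up values, and the helper names are hypothetical; nothing here reproduces the repository's evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up binary labels and predictions (two of eight are wrong).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
sq_err = mse(y_true, y_pred)
abs_err = mae(y_true, y_pred)
print(acc, sq_err, abs_err)  # 0.75 0.25 0.25
```

With hard 0/1 predictions every error is exactly 1, so its square equals its absolute value. That is why the MSE and MAE rows in the table are identical.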

ROC curve for the SVC model
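A ROC curve like the one above is built by sweeping a decision threshold over the classifier's scores and recording the false positive rate and true positive rate at each step. The sketch below is a plain-Python illustration on made-up scores; the `roc_points` and `auc` helpers are hypothetical and are not the plotting code used here.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs, sweeping the threshold from high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a tied-score group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels and classifier scores (higher score = more positive).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
area = auc(points)
```

For this toy input, 8 of the 9 positive/negative pairs are ranked correctly, so the area comes out to 8/9.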

About

Kaggle Challenge: Bag of words meets bags of popcorn
