SentiNEL: Sentiment Analysis from Tweets

SentiNEL is a system for sentiment analysis of tweets, built for SemEval2015 Task10-Subtask A: Contextual Polarity Disambiguation. Given a message containing a marked instance of a word or phrase, SentiNEL determines whether that instance is positive, negative, or neutral in that context. SentiNEL is inspired by the IOA system. The main difference is that SentiNEL extracts more features for training (e.g. character 3-, 4-, and 5-grams, hashtags, higher-dimensional Word2Vec vectors, and more lexicons). In addition, SentiNEL trains an L2-regularized logistic regression SVM classifier with a C value of 0.5. The code is based on the Webis system, which only targets SemEval2015 Task10-Subtask B (the message-level task); we modified the code and adapted it to the term level. The system is scored by computing the F1-score for predicting positive/negative phrases. Compared to the IOA system, SentiNEL improves the F1-score from 83.90 to 88.15 on Tweet2013-test and from 84.18 to 84.73 on Sms2013-test.

Keywords: sentiment analysis, machine learning, data mining, NLP

Architectural Overview

SentiNEL consists of four steps:

Pre-train Word2Vec module: trains Word2Vec vectors for all words that appear at least 3 times in the dataset
Extraction of features: extracts features from the training dataset
Train: trains the SVM classifier with the extracted features
Evaluation: evaluates the trained SVM classifier on the testing dataset
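
To make the first two steps more concrete, here is a minimal sketch of a frequency-filtered vocabulary (words seen at least 3 times) and of character 3- to 5-gram extraction. It is not SentiNEL's actual code: the class name FeatureSketch, both method names, and the lower-cased whitespace tokenization are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; names are hypothetical, not SentiNEL's API.
class FeatureSketch {

    // Step 1 (vocabulary filter): keep words seen at least minCount times.
    static Set<String> frequentWords(List<String> messages, int minCount) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String message : messages) {
            // Assumption: lower-cased whitespace tokenization.
            for (String token : message.toLowerCase().trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                Integer seen = counts.get(token);
                counts.put(token, seen == null ? 1 : seen + 1);
            }
        }
        Set<String> vocabulary = new TreeSet<String>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minCount) {
                vocabulary.add(entry.getKey());
            }
        }
        return vocabulary;
    }

    // Step 2 (one feature family): all character n-grams with minN <= n <= maxN.
    static List<String> charNGrams(String text, int minN, int maxN) {
        List<String> grams = new ArrayList<String>();
        for (int n = minN; n <= maxN; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                grams.add(text.substring(i, i + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("good good day", "good bad day", "day");
        System.out.println(frequentWords(messages, 3));
        System.out.println(charNGrams("happy", 3, 5));
    }
}
```

In this sketch the vocabulary would be handed to the Word2Vec training step, and the n-grams would become one group of features among the others listed in the introduction.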

Corpus description

The corpus is taken from the SemEval-2015 Task 10 dataset. The following table shows the class distribution of each dataset we collected.

Corpus	Positive	Negative	Neutral	Total
Tweet2013-train	4484 (62.5%)	2329 (32.5%)	356 (5.0%)	7169
Tweet2013-dev	506 (58.0%)	326 (37.4%)	40 (4.6%)	872
Tweet2013-test	2132 (62.6%)	1156 (34.0%)	116 (3.4%)	3404
Sms2013-test	1071 (45.9%)	1103 (47.3%)	159 (6.8%)	2333
Tweet2014-test	3568 (66.5%)	1606 (29.9%)	190 (3.5%)	5364
Sms2014-test	710 (45.3%)	747 (47.6%)	111 (7.1%)	1568
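
Each percentage in the table is a class count divided by the row total. As a quick check, the sketch below recomputes the Tweet2013-train row. The class name ClassDistribution and the method share are hypothetical and are not part of SentiNEL.

```java
import java.util.Locale;

// Illustrative sketch only; names are hypothetical, not SentiNEL's API.
class ClassDistribution {

    // Share of one class in percent, rounded to one decimal place.
    static String share(int classCount, int total) {
        return String.format(Locale.ROOT, "%.1f%%", 100.0 * classCount / total);
    }

    public static void main(String[] args) {
        // Counts taken from the Tweet2013-train row.
        int positive = 4484;
        int negative = 2329;
        int neutral = 356;
        int total = positive + negative + neutral; // 7169

        System.out.println("positive " + share(positive, total));
        System.out.println("negative " + share(negative, total));
        System.out.println("neutral  " + share(neutral, total));
    }
}
```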

Requirements

Java 7+
Maven 3+

Setting Up

git clone https://github.com/MultimediaSemantics/sentinel
cd sentinel
mvn clean
mvn compile

Train

mvn exec:java -Dexec.args="train train_file [save_features_file]"

-train	sets train mode
-train_file	sets the input file for training
-save_features_file	sets the file in which the extracted features are saved; by default, SentiNEL saves them in arff/Trained-Features.arff

example

mvn exec:java -Dexec.args="train train"

Extracts the features from the training dataset resources/tweets/train.txt and saves them in arff/Trained-Features.arff.

mvn exec:java -Dexec.args="train train model1"

Extracts the features from the training dataset resources/tweets/train.txt and saves them in arff/Trained-Features-model1.arff.
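
The two examples suggest a naming pattern: the dataset argument maps to a file under resources/tweets/, and the optional second argument becomes a suffix of the features file name. The sketch below only restates that pattern as inferred from the examples. The class FeaturePaths and its methods are hypothetical, and SentiNEL's real argument handling may differ.

```java
// Illustrative sketch only; names are hypothetical, not SentiNEL's API.
class FeaturePaths {

    // Dataset name as given on the command line -> input file path.
    static String datasetFile(String name) {
        return "resources/tweets/" + name + ".txt";
    }

    // Optional suffix -> features file path; no suffix means the default file.
    static String featuresFile(String suffix) {
        if (suffix == null || suffix.isEmpty()) {
            return "arff/Trained-Features.arff";
        }
        return "arff/Trained-Features-" + suffix + ".arff";
    }

    public static void main(String[] args) {
        System.out.println(datasetFile("train"));
        System.out.println(featuresFile(null));
        System.out.println(featuresFile("model1"));
    }
}
```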

Evaluation

mvn exec:java -Dexec.args="eval test_file [saved_features_file]"

-eval	sets test mode
-test_file	sets the input file for testing
-saved_features_file	sets the file that contains the extracted features; by default, SentiNEL trains the SVM classifier with arff/Trained-Features.arff

example

mvn exec:java -Dexec.args="eval Tweet2013-test"

Trains the SVM classifier with the extracted features in arff/Trained-Features.arff, then evaluates it on the testing dataset resources/tweets/Tweet2013-test.txt.

mvn exec:java -Dexec.args="eval Sms2013-test Trained-Features-model1"

Trains the SVM classifier with the extracted features in arff/Trained-Features-model1.arff, then evaluates it on the testing dataset resources/tweets/Sms2013-test.txt.
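
The system is scored with the F1-score for positive/negative phrases. The sketch below shows one way such a score can be computed from gold and predicted labels. It assumes the score is the average of the positive-class F1 and the negative-class F1, with neutral predictions counting only as errors for those two classes; this README does not confirm that exact convention. The class F1Sketch and its methods are hypothetical.

```java
// Illustrative sketch only; names are hypothetical, not SentiNEL's API.
class F1Sketch {

    // F1 for a single label: harmonic mean of precision and recall.
    static double f1(String[] gold, String[] predicted, String label) {
        int truePositives = 0;
        int falsePositives = 0;
        int falseNegatives = 0;
        for (int i = 0; i < gold.length; i++) {
            boolean isGold = gold[i].equals(label);
            boolean isPredicted = predicted[i].equals(label);
            if (isGold && isPredicted) {
                truePositives++;
            } else if (isPredicted) {
                falsePositives++;
            } else if (isGold) {
                falseNegatives++;
            }
        }
        if (truePositives == 0) {
            return 0.0; // avoids division by zero when nothing was matched
        }
        double precision = (double) truePositives / (truePositives + falsePositives);
        double recall = (double) truePositives / (truePositives + falseNegatives);
        return 2 * precision * recall / (precision + recall);
    }

    // Assumption: the reported score averages the positive and negative F1.
    static double averagedF1(String[] gold, String[] predicted) {
        return (f1(gold, predicted, "positive") + f1(gold, predicted, "negative")) / 2.0;
    }

    public static void main(String[] args) {
        String[] gold = {"positive", "positive", "negative", "neutral"};
        String[] predicted = {"positive", "negative", "negative", "positive"};
        System.out.println(averagedF1(gold, predicted));
    }
}
```

Multiplying the result by 100 gives a value on the same scale as the scores quoted in the introduction.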

Output

"I drove a Lincoln and it's a truly dream"
Lincoln -> positive

SentiNEL writes its output to the output/ folder. The result.txt file contains the sentiment predictions, and the error_analysis.txt file contains the incorrect predictions.

Team

Yonghui Feng
Ahmed Abdelli
Giuseppe Rizzo
Raphael Troncy
