Skip to content

D2KLab/sentinel

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

98 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SentiNEL: Sentiment Analysis from Tweets

SentiNEL system is developed for sentiment analysis of tweets based on SemEval2015 Task10-Subtask A: Contextual Polarity Disambiguation. The purpose of SentiNEL is that given a message containing a marked instance of a word or a phrase, determines whether that instance is positive, negative or neutral in that context. SentiNEL is inspired by the IOA system. The main differences are that SentiNEL extracts more features (e.g. Char 3, 4, 5 grams, Hashtag, longer Word2Vec dimension, more lexicons etc.) for training. Besides, SentiNEL trains L2-regularized logistic regression SVM classifier with C value 0.5. The code is based on Webis system. However, Webis is a system only for SemEval2015 Task10-Sub Task B (Message-level task). We modify the code and adapt it to term-level. The system is scored by computing F1-score for predicting positive/negative phrases. Comparing to IOA system, SentiNEL improves the F1-score from 83.90 to 88.15 on Tweet2013-test, from 84.18 to 84.73 on Sms2013-test.

Key words: Sentiment analysis, Machine Learning, Data Mining, NLP

Architectural Overview

SentiNEL consists of four steps:

  • Pre-train Word2Vec module: it trains the Word2Vec vectors from all the words which appear at least 3 times in the dataset
  • Extraction of features: it extracts features from the training dataset
  • Train: it trains the SVM classifier with extracted features
  • Evaluation: it evaluates the trained SVM classifier and tests it with testing dataset

image

Corpus description

The corpus is collected from SemEval-2015 Task 10 Dataset. The following table shows the account of dataset we collected.

Corpus Positive Negative Neutral Total Tweets
Tweet2013-train 4484(62.5%) 2329(32.5%) 356(5%) 7169
Tweet2013-dev 506(62.6%) 326(34.0%) 40(3.4%) 872
Tweet2013-test 2132(62.6%) 1156(34.0%) 116(3.4%) 3404
Sms2013-test 1071(45.9%) 1103(47.3%) 159(6,8%) 2333
Tweet2014-test 3568(66.5%) 1606(29.9%) 190(3.5%) 5364
Sms2014-test 710(45.3%) 747(46.7%) 111(7.1%) 1568

Requirements

  • Java 7+
  • Maven 3+

Setting Up

git clone https://github.com/MultimediaSemantics/sentinel	
mvn clean
mvn compile

Train

mvn exec:java -Dexec.args="train train_file [save_features_file]"
-train					set 	train mode
-train_file 			set 	the input file for training
-save_features_file		set 	the file to save trained features, by default SentiNEL saves the extracted features in arff/Trained-Features.arff

example

mvn exec:java -Dexec.args="train train"

Extract the features from training dataset: resources/tweets/train.txt, and save the extracted features in arff/Trained-Features.arff

mvn exec:java -Dexec.args="train train model1"

Extract the features from training dataset: resources/tweets/train.txt, and save the extracted features in arff/Trained-Features-model1.arff

Evaluation

mvn exec:java -Dexec.args="eval test_file [saved_features_file]"
-eval					set 	test mode
-test_file 				set 	the input file for testing
-saved_features_file	set 	the file contains trained features, by default SentiNEL trains SVM classifier with arff/Trained-Features.arff

example

mvn exec:java -Dexec.args="eval Tweet2013-test"

Train SVM classifier with the extracted features: arff/Trained-Features.arff, then evaluate it with testing dataset: resources/tweets/Tweet2013-test.txt"

mvn exec:java -Dexec.args="eval Sms2013-test Trained-Features-model1"

Train SVM classifier with the extracted features: arff/Trained-Features-model1.arff, then evaluate it with testing dataset: resources/tweets/Sms2013-test.txt"

Output

"I drove a Lincoln and it's a truly dream"
Lincoln -> positive

The output of SentiNEL locates in output/ folder. result.txt file contains the sentiment prediction results, and error_analysis.txt file contains the wrong sentiment prediction results.

Team

  • Yonghui Feng
  • Ahmed Abdelli
  • Giuseppe Rizzo
  • Raphael Troncy

About

Named Entity Sentiment Analysis

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •