dabbabi-zayani/Twitter-Sentiment-Classifier

A Twitter sentiment classifier based on the Support Vector Machine (SVM) and k-nearest neighbors (KNN) algorithms

Overall description

As the title suggests, this repository contains source code (src folder), datasets (data folder) and useful resources for Twitter sentiment analysis (resources folder).
The training dataset is split into 3 files, each containing a processed version of the tweets for one of the three classes: positive (data/used/positive1.csv), negative (data/used/negative1.csv) and neutral (data/used/neutral1.csv).

The training dataset was collected from the SemEval challenge (http://alt.qcri.org/semeval2014/task9/index.php?id=data-and-tools), STS-Gold (http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip) and the Sanders dataset (http://www.sananalytics.com/lab/twitter-sentiment). The testing dataset is from STS-Gold (http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip).

The test datasets are STS_Test (data/test_dataset.csv) and a set of 100 3cixty reviews (data/3cixty/3cixty_test_dataset.csv).

In the src folder:

  1. svm.py: SVM classifier
  2. knn.py: KNN classifier
  3. hybrid.py: two-step classification, with KNN for the objectivity/subjectivity test and SVM for the polarity test

An emoticon dictionary, a stop-word list, SentiWordNet 3.0.1, AFINN, and a slang dictionary are in the resources folder.
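
The two-step idea behind hybrid.py can be sketched roughly as below. This is only an illustration under stated assumptions, not the repository's code: the function names `fit_hybrid` and `predict_hybrid`, the two-dimensional toy features, and the hyperparameters are all hypothetical. Only the label values (4.0, 0.0, 2.0) are taken from this README.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Label values as documented in this README.
POSITIVE, NEGATIVE, NEUTRAL = 4.0, 0.0, 2.0


def fit_hybrid(X, y, n_neighbors=3):
    """Fit a KNN subjectivity detector and an SVM polarity classifier.

    Hypothetical helper: the real hybrid.py may be organised differently.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: neutral tweets count as objective (0), all others as subjective (1).
    subjective = y != NEUTRAL
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(X, subjective.astype(int))
    # Step 2: the SVM is trained on subjective tweets only and learns polarity.
    svm = SVC(kernel="linear")
    svm.fit(X[subjective], y[subjective])
    return knn, svm


def predict_hybrid(knn, svm, X):
    """Label objective tweets as neutral, then ask the SVM about the rest."""
    X = np.asarray(X, dtype=float)
    is_subjective = knn.predict(X) == 1
    labels = np.full(len(X), NEUTRAL)
    if is_subjective.any():
        labels[is_subjective] = svm.predict(X[is_subjective])
    return labels


# Toy features (assumed): [positive word score, negative word score].
X_train = [[3, 0], [4, 1], [3, 1],        # positive
           [0, 3], [1, 4], [1, 3],        # negative
           [0, 0], [0.5, 0.5], [0, 0.5]]  # neutral
y_train = [POSITIVE] * 3 + [NEGATIVE] * 3 + [NEUTRAL] * 3

knn, svm = fit_hybrid(X_train, y_train)
predictions = predict_hybrid(knn, svm, [[3.5, 0.5], [0.5, 3.5], [0.2, 0.2]])
print(predictions)
```

Splitting the task this way lets each model solve a binary problem; whether that helps on real tweets depends on the features, which are not described here.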

Requirements

The classifiers work with Python 2.6 and 2.7.
To use these algorithms, install scikit-learn (sklearn) version 0.14 (http://scikit-learn.org/dev/index.html), NumPy (http://www.numpy.org/), and NLTK 3 with all of its data packages, downloaded by calling nltk.download() in Python.

Running the classifiers

Any of the classifiers mentioned above is run by executing the predictor.py script as follows:
Usage: python predictor.py classifier_choice
Available classifiers are: svm, knn or hybrid
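
For illustration, a script taking such an argument could dispatch on it roughly as follows. This is a sketch under assumptions, not the actual predictor.py: the `run_*` functions are hypothetical placeholders, and plain `sys.argv`-style parsing is used because argparse is not in the Python 2.6 standard library.

```python
# Hypothetical placeholders standing in for the real classifiers.
def run_svm():
    return "running svm classifier"


def run_knn():
    return "running knn classifier"


def run_hybrid():
    return "running hybrid classifier"


CLASSIFIERS = {"svm": run_svm, "knn": run_knn, "hybrid": run_hybrid}

USAGE = ("Usage: python predictor.py classifier_choice\n"
         "Available classifiers are: svm, knn or hybrid")


def main(argv):
    """Run the classifier named in argv[1], or exit with the usage text."""
    if len(argv) != 2 or argv[1] not in CLASSIFIERS:
        raise SystemExit(USAGE)
    return CLASSIFIERS[argv[1]]()
```

In a real script, `main(sys.argv)` would be called at the bottom of the file after `import sys`.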

N.B.: The class labels are real values, as follows: positive: 4.0, negative: 0.0 and neutral: 2.0
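
Since the labels are numeric, a small lookup can turn predictions back into readable names. The helper below is a hypothetical convenience, not part of the repository; only the three label values come from the note above.

```python
# Numeric class labels as documented above, mapped to readable names.
LABEL_NAMES = {4.0: "positive", 0.0: "negative", 2.0: "neutral"}


def decode_labels(predictions):
    """Convert numeric class labels (floats or ints) to their names."""
    return [LABEL_NAMES[float(p)] for p in predictions]


print(decode_labels([4.0, 0.0, 2.0]))
```

An unknown value raises a KeyError, which makes a mislabeled prediction visible instead of silently passing it through.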

Thank you.
