PyML-0.7.9 final paper tex Feb 5, 2012
egpaper_final Added web version of paper Feb 5, 2012
neg Classifier works with ~65% accuracy in 3-fold validation Oct 31, 2011
neg_adj refiltered adjectives, was missing a tag before Jan 20, 2012
neg_position adding position tagged data, along with script to generate the data Jan 15, 2012
neg_tagged adding tagged data (using qtag) Jan 15, 2012
paper Merge branch 'master' of github.com:cathywu/Sentiment-Analysis Feb 6, 2012
pos Classifier works with ~65% accuracy in 3-fold validation Oct 31, 2011
pos_adj refiltered adjectives, was missing a tag before Jan 20, 2012
pos_position adding position tagged data, along with script to generate the data Jan 15, 2012
pos_tagged adding tagged data (using qtag) Jan 15, 2012
pysvmlight refactoring, ability to run svm in different configurations; + modifi… Jan 11, 2012
subjectivity Started working on majority voting Jan 17, 2012
.gitignore directory for latex report files Feb 5, 2012
COPYING Added license info Feb 6, 2012
Indexes.py built k-fold validation testing into bayes classification, and the re… Jan 15, 2012
LICENSE Added license info Feb 6, 2012
README added preprocessing for yelp data, filtering for verbs,docs for data … Jan 20, 2012
adjectives_filter.py added preprocessing for yelp data, filtering for verbs,docs for data … Jan 20, 2012
classifier.py Added binary flag to other classifiers Jan 19, 2012
data.db Added sqlite database of results from testing Feb 5, 2012
data.py Merge branch 'master' of github.com:cathywu/Sentiment-Analysis Jan 11, 2012
human.py script for generating human verification Jan 18, 2012
me.py maxent tweaking, adding presence vs frequency flag for bayes classifi… Jan 15, 2012
movie.py Merge branch 'master' of github.com:cathywu/Sentiment-Analysis Feb 5, 2012
ngrams.py Fixed underscores for tags in ngrams Jan 19, 2012
position_tagger.py removed yelp data from repo, modified position tagging script and adj… Jan 17, 2012
preprocess_yelp.py added preprocessing for yelp data, filtering for verbs,docs for data … Jan 20, 2012
svm.py refactoring, ability to run svm in different configurations; + modifi… Jan 11, 2012
svm2.py ported svm stuff from pysvmlight to pyml, running into memory problem… Jan 12, 2012
validate.py Merge branch 'master' of github.com:cathywu/Sentiment-Analysis Jan 11, 2012
verb_filter.py added preprocessing for yelp data, filtering for verbs,docs for data … Jan 20, 2012
yelp.py Added term frequency/inverse document frequency to ngrams Jan 17, 2012

README

==============================================================================
Linking to the Yelp dataset (via a symlink)
==============================================================================
$ ln -s $HOME/Dropbox/sentiment-data/yelp/ yelp

==============================================================================
Toolkits
==============================================================================
Oliver Mason's Qtag part-of-speech tagger [http://phrasys.net/uob/om/software]

==============================================================================
Setting up the Maximum Entropy Modeling Toolkit for Python and C++
==============================================================================
Main page [http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html]
Source [https://github.com/lzhang10/maxent]
Manual (thorough, but it lacks a Python API reference) [http://homepages.inf.ed.ac.uk/lzhang10/software/maxent/manual.pdf]

DEPENDENCIES
zlib [http://www.techsww.com/tutorials/libraries/zlib/installation/installing_zlib_on_ubuntu_linux.php]
libboost [apt-get]
jam [apt-get]

Important points
* L-BFGS is the toolkit's default parameter estimation method.
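For orientation, a conditional maximum entropy model scores a label y for a context x as p(y|x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x), where Z(x) sums the numerator over all labels. L-BFGS is only the optimizer that fits the weights lambda_i to the training data. The pure-Python sketch below illustrates the scoring formula, assuming binary features keyed by (feature, label) pairs. It is not the toolkit's Python API; `maxent_prob` and the example weights are hypothetical.

```python
import math


def maxent_prob(weights, context, labels):
    """Return p(label | context) for a conditional maximum entropy model.

    weights: dict mapping (feature, label) -> lambda (hypothetical layout)
    context: iterable of active binary features for one document
    labels:  all possible labels
    """
    # Unnormalized log-score per label: sum of weights of the active features.
    scores = {y: sum(weights.get((f, y), 0.0) for f in context) for y in labels}
    # Subtract the maximum before exponentiating, for numerical stability.
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())  # the normalizer Z(x)
    return {y: e / z for y, e in exps.items()}


if __name__ == "__main__":
    w = {("great", "pos"): 1.5, ("great", "neg"): -0.5, ("boring", "neg"): 2.0}
    print(maxent_prob(w, ["great", "plot"], ["pos", "neg"]))
```

With no weights at all, every label gets the same probability, which is the "maximum entropy" starting point that training moves away from.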

==============================================================================
Preprocess movie data
==============================================================================

Use Qtag with the "underscore" and "process all files in directory" options
$ java -jar qtag.jar
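Assuming the "underscore" option writes each token as word_TAG, later steps need to split tokens back into words and tags. A minimal sketch, assuming the tag follows the last underscore in each whitespace-separated token; `parse_tagged` is a hypothetical helper, not a function from this repository, and the tags in the example are illustrative.

```python
def parse_tagged(text):
    """Split 'word_TAG' tokens into (word, tag) pairs.

    The tag is taken after the LAST underscore, so words that contain
    underscores stay intact. Tokens with no underscore get the tag None.
    """
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if sep:
            pairs.append((word, tag))
        else:
            pairs.append((token, None))
    return pairs


if __name__ == "__main__":
    print(parse_tagged("the_DT plot_NN thin_JJ"))
```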

Move the POS-tagged data into its own directory for further processing
$ mv pos/tagged/ pos_tagged
$ mv neg/tagged/ neg_tagged

Tag data with position
$ python position_tagger.py -d pos 
$ python position_tagger.py -d neg 
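This README does not say how position_tagger.py assigns positions. One common scheme, assumed here and possibly different from the script, marks each token by whether it falls in the first quarter, the middle half, or the last quarter of the document. `position_tag` and the `|first` / `|middle` / `|last` suffixes are hypothetical.

```python
def position_tag(tokens):
    """Append a hypothetical position marker to each token.

    Tokens in the first quarter get '|first', tokens in the last quarter
    get '|last', and everything in between gets '|middle'.
    """
    n = len(tokens)
    tagged = []
    for i, tok in enumerate(tokens):
        if i < n / 4:
            pos = "first"
        elif i >= 3 * n / 4:
            pos = "last"
        else:
            pos = "middle"
        tagged.append(f"{tok}|{pos}")
    return tagged


if __name__ == "__main__":
    print(position_tag("a b c d e f g h".split()))
```

Marking position lets the same word count as a different feature when it appears near the end of a review, where overall verdicts often sit.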

Filter to keep only adjectives
$ python adjectives_filter.py -d neg
$ python adjectives_filter.py -d pos

Filter to keep only verbs
$ python verb_filter.py -d pos
$ python verb_filter.py -d neg
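Both filters keep a subset of the tagged tokens. A minimal sketch of such a filter, assuming Qtag underscore output (word_TAG) and assuming adjective tags begin with JJ and verb tags with VB; check these prefixes against the tagset Qtag actually emits, since auxiliaries may carry different tags. `filter_by_tag` and the prefix constants are hypothetical, not taken from adjectives_filter.py or verb_filter.py.

```python
ADJECTIVE_PREFIXES = ("JJ",)  # assumption: adjective tags begin with JJ
VERB_PREFIXES = ("VB",)       # assumption: main-verb tags begin with VB


def filter_by_tag(text, prefixes):
    """Keep only 'word_TAG' tokens whose tag starts with one of `prefixes`."""
    kept = []
    for token in text.split():
        # The tag is whatever follows the last underscore.
        _word, sep, tag = token.rpartition("_")
        if sep and tag.startswith(tuple(prefixes)):
            kept.append(token)
    return " ".join(kept)


if __name__ == "__main__":
    doc = "the_DT film_NN dragged_VBD but_CC looked_VBD great_JJ"
    print(filter_by_tag(doc, ADJECTIVE_PREFIXES))
    print(filter_by_tag(doc, VERB_PREFIXES))
```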

==============================================================================
Preprocess Yelp data
==============================================================================

Convert the Yelp data to the same format as the movie data, keeping at most
1000 reviews per star rating
$ python preprocess_yelp.py -d yelp/default/1star_limited
$ python preprocess_yelp.py -d yelp/default/2star_limited
$ python preprocess_yelp.py -d yelp/default/3star_limited
$ python preprocess_yelp.py -d yelp/default/4star_limited
$ python preprocess_yelp.py -d yelp/default/5star_limited
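The input format preprocess_yelp.py reads is not described here. A minimal sketch of the per-rating cap only, assuming reviews arrive as (stars, text) pairs and that the first reviews seen are the ones kept; the script may select differently. `limit_per_rating` is hypothetical.

```python
from collections import defaultdict


def limit_per_rating(reviews, limit=1000):
    """Keep at most `limit` reviews for each star rating.

    reviews: iterable of (stars, text) pairs (hypothetical input format).
    Returns a dict mapping stars -> list of texts, in input order.
    """
    kept = defaultdict(list)
    for stars, text in reviews:
        if len(kept[stars]) < limit:
            kept[stars].append(text)
    return dict(kept)


if __name__ == "__main__":
    sample = [(1, "awful"), (5, "superb"), (1, "bad"), (1, "meh")]
    print(limit_per_rating(sample, limit=2))
```

Capping every rating at the same size keeps the star classes balanced, much like the equal-sized pos and neg movie sets.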

Use Qtag with the "underscore" and "process all files in directory" options
$ java -jar qtag.jar

Move the POS-tagged data into its own directory for further processing
$ mv 1star_limited/tagged/ 1star_limited_tagged
$ mv 2star_limited/tagged/ 2star_limited_tagged
$ mv 3star_limited/tagged/ 3star_limited_tagged
$ mv 4star_limited/tagged/ 4star_limited_tagged
$ mv 5star_limited/tagged/ 5star_limited_tagged

Tag data with position
$ python position_tagger.py -d yelp/default/1star_limited
$ python position_tagger.py -d yelp/default/2star_limited
$ python position_tagger.py -d yelp/default/3star_limited
$ python position_tagger.py -d yelp/default/4star_limited
$ python position_tagger.py -d yelp/default/5star_limited

Filter to keep only adjectives
$ python adjectives_filter.py -d yelp/default/1star_limited
$ python adjectives_filter.py -d yelp/default/2star_limited
$ python adjectives_filter.py -d yelp/default/3star_limited
$ python adjectives_filter.py -d yelp/default/4star_limited
$ python adjectives_filter.py -d yelp/default/5star_limited

Filter to keep only verbs
$ python verb_filter.py -d yelp/default/1star_limited
$ python verb_filter.py -d yelp/default/2star_limited
$ python verb_filter.py -d yelp/default/3star_limited
$ python verb_filter.py -d yelp/default/4star_limited
$ python verb_filter.py -d yelp/default/5star_limited
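The post-Qtag Yelp commands differ only in the script and the star rating. A minimal driver sketch that reproduces them in the same order, assuming it is run from the repository root like the commands above. It only prints the commands unless given `--run`, a flag of this sketch rather than of the repository's scripts; `sys.executable` stands in for `python` so the current interpreter is used.

```python
import subprocess
import sys

# Same scripts, in the same order as the steps above.
STEPS = ["position_tagger.py", "adjectives_filter.py", "verb_filter.py"]


def yelp_commands(ratings=range(1, 6), base="yelp/default"):
    """Build one command per (step, star rating), step-major like the README."""
    return [
        [sys.executable, script, "-d", f"{base}/{n}star_limited"]
        for script in STEPS
        for n in ratings
    ]


def run_all(dry_run=True):
    for cmd in yelp_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first script that exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run="--run" not in sys.argv)
```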