BastinRobin/streamcrab

 
 

Streamcrab

Streamcrab is a real-time Twitter sentiment analyzer.

This is the second version of the tool. It has been completely rewritten; the previous version is still available in the legacy branch.

Demo: http://www.streamcrab.com

Changes from previous version

  • Supports MaxEnt and Bayes classifiers (defaults to MaxEnt)
  • Simplified tweets collection (see Collecting raw Tweets)
  • Simplified trainer (see Train classifier)
  • Built-in HTTP server and frontend based on gevent and Flask
  • Covered by unit tests
  • Utilization of multi-core systems
  • Scalable (in theory :)

Requirements

  • python 2.7
  • python2.7-dev
  • mongodb server

Debian-like systems:

apt-get install python2.7 python2.7-dev mongodb-server

Checkout

Check out the latest streamcrab branch from GitHub:

git clone https://github.com/cyhex/streamcrab.git ./streamcrab
cd streamcrab

Configure

Copy smm/config.default.py to smm/config.py, then edit smm/config.py to suit your needs:

cp smm/config.default.py smm/config.py
nano smm/config.py

Installation & Setup

Download and install the required libraries and data:

python setup.py develop
python toolbox/setup-app.py

Testing

Run the unit tests:

python -m unittest discover tests

Collecting raw Tweets

The training data rests on one assumption: tweets with happy emoticons such as :) have positive sentiment polarity, and tweets with sad emoticons such as :( have negative sentiment polarity.

Whether this assumption is correct is outside the scope of this document.
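
The labeling assumption can be illustrated with a minimal sketch. It is not the project's collector code: the emoticon tuples and the function name label_by_emoticon are hypothetical, and the choice to skip tweets with no emoticon or with both kinds is an assumption made here for illustration.

```python
# Hypothetical emoticon sets; the lists used by the real collector may differ.
HAPPY = (":)", ":-)", ":D")
SAD = (":(", ":-(")


def label_by_emoticon(text):
    """Return 'positive', 'negative', or None when the tweet is ambiguous."""
    has_happy = any(emoticon in text for emoticon in HAPPY)
    has_sad = any(emoticon in text for emoticon in SAD)
    if has_happy == has_sad:
        # Either no emoticon at all or both kinds present: do not label.
        return None
    return "positive" if has_happy else "negative"
```

A tweet such as "great day :)" would be labeled positive under this rule, while a tweet without any emoticon would be left out of the training set.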

Collect 2000 'happy' tweets

python toolbox/collect-tweets.py happy 2000

Collect 2000 'sad' tweets

python toolbox/collect-tweets.py sad 2000

For more options, see:

python toolbox/collect-tweets.py --help

Train classifier

Create and save new classifier trained from collected tweets

python toolbox/train-classifier.py maxEntTestCorpus 2000

For more options, see:

python toolbox/train-classifier.py --help

Start server stack

Open three shells and run one of these commands in each:

python start-collector.py
python start-classifier.py
python start-server.py

Then open a browser at http://127.0.0.1:5000
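
As an alternative to three separate shells during development, the same three scripts could be started from one small launcher. This is only a sketch and not part of the project: the helper names build_commands and launch_all are hypothetical, and it assumes it is run from the repository root with the same Python interpreter the project uses.

```python
import subprocess
import sys

# Script names come from this README; the launcher itself is illustrative.
SCRIPTS = ["start-collector.py", "start-classifier.py", "start-server.py"]


def build_commands(python=sys.executable):
    """Return one argv list per script, all using the same interpreter."""
    return [[python, script] for script in SCRIPTS]


def launch_all():
    """Start every script as a child process and wait until they exit."""
    procs = [subprocess.Popen(cmd) for cmd in build_commands()]
    try:
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        # Ctrl+C in the launcher stops all three children.
        for proc in procs:
            proc.terminate()
```

Calling launch_all() keeps all three processes attached to one terminal, so their output is interleaved. For anything beyond local testing, the supervisord setup described under Production & deployment is the better fit.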

Show stats

Show detailed info on collected Tweets and saved classifiers

python toolbox/show-classifiers.py

Note that Training data size is the size of the trained classifier after it has been serialized (pickled) with protocol=1. Actual memory usage may vary.
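
To make the reported figure concrete, the sketch below measures the pickled size of an object with protocol 1 using the standard pickle module. The helper name pickled_size and the small dictionary standing in for a trained classifier are hypothetical; how show-classifiers.py computes its number internally is not shown in this README.

```python
import pickle


def pickled_size(obj, protocol=1):
    """Size in bytes of obj once serialized with the given pickle protocol."""
    return len(pickle.dumps(obj, protocol))


# Stand-in for a trained classifier: any picklable object works here.
fake_weights = {"bad==1": -0.178, "nation==1": 0.139}
size_bytes = pickled_size(fake_weights)
```

The in-memory footprint of the same object is usually different from size_bytes, which is why the figure should be read as a rough indicator only.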

Interactive shell

You can interact directly with the trained classifier and get verbose output on how the score is calculated. Replace maxEntTestCorpus with the desired classifier name; see Show stats to list the available classifiers.

python toolbox/shell-classifier.py maxEntTestCorpus

You should see:

exit: ctrl+c

Loaded maxEntTestCorpus
Classify:

Type something and hit enter:

Classify: today is a bad day for this nation

Classification: negative with 53.29%

Feature                                          negativ positiv
----------------------------------------------------------------
bad==1 (1)                                         0.074
today==1 (1)                                       0.027
day==1 (1)                                         0.008
bad==1 (1)                                                -0.178
nation==1 (1)                                              0.139
today==1 (1)                                              -0.035
day==1 (1)                                                -0.007
-----------------------------------------------------------------
TOTAL:                                             0.109  -0.081
PROBS:                                             0.533   0.467
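
The table can be read as follows: each TOTAL is the sum of the feature weights in its column, and the PROBS row is consistent with raising 2 to each total and normalizing. This reading is inferred from the printed numbers, not from the project's source; the function name normalize_base2 is hypothetical.

```python
# Feature weights copied from the two columns of the table above.
negative_weights = [0.074, 0.027, 0.008]
positive_weights = [-0.178, 0.139, -0.035, -0.007]

totals = {
    "negative": sum(negative_weights),  # TOTAL column 1
    "positive": sum(positive_weights),  # TOTAL column 2
}


def normalize_base2(scores):
    """Turn base-2 log scores into probabilities that sum to 1."""
    powers = {label: 2.0 ** score for label, score in scores.items()}
    norm = sum(powers.values())
    return {label: power / norm for label, power in powers.items()}


probs = normalize_base2(totals)
# Rounded to three places, probs matches the PROBS row: 0.533 and 0.467.
```

The small gap between the two totals explains why the classification is reported as negative with only about 53% confidence.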

For more options, see:

python toolbox/shell-classifier.py --help

Training and testing results

See: https://github.com/cyhex/streamcrab/blob/master/docs/acurracy_tests.md

Production & deployment

Run everything behind nginx >= 1.3.13 and automate process management with supervisord.

nginx supports WebSockets as of version 1.3.13, so you should probably use the latest stable version.

This is only one of many ways to deploy the app. The ex.conf folder contains sample config files for nginx and supervisord.

Links, Sources etc
