Real-Time, Twitter sentiment analyzer engine

Streamcrab is a real-time Twitter sentiment analyzer.

This is the second version of the tool. It was rewritten completely from the previous version, which is still available in the legacy branch.


Changes from previous version

  • Supports MaxEnt and Bayes classifiers (defaults to MaxEnt)
  • Simplified tweets collection (see Collecting raw Tweets)
  • Simplified trainer (see Train classifier)
  • Built-in HTTP server & frontend based on gevent and Flask
  • Covered by unit tests
  • Utilization of multi-core systems
  • Scalable (in theory :)

Requirements


  • python 2.7
  • python2.7-dev
  • mongodb server

On Debian-like systems:

apt-get install python2.7 python2.7-dev mongodb-server


Check out the latest streamcrab branch from GitHub

git clone ./streamcrab
cd streamcrab


copy smm/ to smm/ and edit smm/ according to your needs

cp smm/ smm/
nano smm/

Installation & Setup

Download and install required libs and data

python develop
python toolbox/


Run unittests

python -m unittest discover tests

Collecting raw Tweets

Training data is built on the assumption that tweets with happy emoticons :) have positive sentiment polarity and tweets with sad emoticons :( have negative sentiment polarity.

Whether this assumption is correct or not is outside the scope of this document.
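
To make the labeling assumption concrete, here is a minimal sketch of how a tweet could be tagged by its emoticons. The helper label_by_emoticon and the emoticon lists are hypothetical and are not taken from the streamcrab code base; tweets with no emoticon, or with both kinds, are treated here as unusable.

```python
# Hypothetical helper illustrating emoticon-based labeling;
# not part of the streamcrab toolbox.
HAPPY = (":)", ":-)", ":D")
SAD = (":(", ":-(")


def label_by_emoticon(text):
    """Return 'positive', 'negative', or None when the tweet is ambiguous."""
    has_happy = any(e in text for e in HAPPY)
    has_sad = any(e in text for e in SAD)
    if has_happy == has_sad:
        # Either no emoticon at all or both kinds: skip the tweet.
        return None
    return "positive" if has_happy else "negative"


tweets = [
    "just got tickets for the show :)",
    "missed my train again :(",
    "monday :( but coffee :)",
    "no emoticon here",
]
labels = [label_by_emoticon(t) for t in tweets]
print(labels)  # ['positive', 'negative', None, None]
```

Dropping ambiguous tweets keeps the two training sets disjoint, at the cost of discarding some data.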

Collect 2000 'happy' tweets

python toolbox/ happy 2000

Collect 2000 'sad' tweets

python toolbox/ sad 2000

for more options see

python toolbox/ --help

Train classifier

Create and save new classifier trained from collected tweets

python toolbox/ maxEntTestCorpus 2000

for more options see

python toolbox/ --help

Start server stack

open 3 shells and type in each:


open browser on

Show stats

Show detailed info on collected Tweets and saved classifiers

python toolbox/

It is worth mentioning that Training data size is the size of the trained classifier after it has been serialized (pickled) with protocol=1; actual memory usage may vary.
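
As a rough illustration of what such a size figure means, the sketch below measures the pickled size of an object with protocol 1. The toy_classifier dictionary is only a stand-in for a real trained classifier, and pickled_size is a hypothetical helper, not a streamcrab function.

```python
import pickle

# Stand-in for a trained classifier (assumption for illustration only);
# a real classifier object would be far larger.
toy_classifier = {
    "labels": ["negative", "positive"],
    "weights": {"bad": -0.178, "nation": 0.139},
}


def pickled_size(obj, protocol=1):
    """Number of bytes obj occupies once serialized with the given pickle protocol."""
    return len(pickle.dumps(obj, protocol))


size = pickled_size(toy_classifier)
print("serialized size: %d bytes" % size)
```

The serialized size only describes the bytes written to storage; the in-memory footprint of the loaded object is generally different.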

Interactive shell

You can interact directly with the trained classifier and get verbose output on how the score is calculated. Replace maxEntTestCorpus with the desired classifier name; see Show stats to display the available classifiers.

python toolbox/ maxEntTestCorpus

You should see:

exit: ctrl+c

Loaded maxEntTestCorpus

Type something and hit enter:

Classify: today is a bad day for this nation

Classification: negative with 53.29%

Feature                                          negativ positiv
bad==1 (1)                                         0.074
today==1 (1)                                       0.027
day==1 (1)                                         0.008
bad==1 (1)                                                -0.178
nation==1 (1)                                              0.139
today==1 (1)                                              -0.035
day==1 (1)                                                -0.007
TOTAL:                                             0.109  -0.081
PROBS:                                             0.533   0.467

for more options see

python toolbox/ --help
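
The TOTAL and PROBS rows of the sample output above can be reproduced by hand. The sketch below sums the per-label feature weights and normalizes 2 raised to each total. The base-2 normalization is an assumption inferred from the numbers shown (it yields 0.533 and 0.467), not a statement about how streamcrab computes the score internally; label_probabilities is a hypothetical helper.

```python
# Per-label feature weights copied from the sample output above.
weights = {
    "negative": {"bad": 0.074, "today": 0.027, "day": 0.008},
    "positive": {"bad": -0.178, "nation": 0.139, "today": -0.035, "day": -0.007},
}


def label_probabilities(weights):
    """Sum the weights per label, then normalize 2**total across labels."""
    totals = {label: sum(w.values()) for label, w in weights.items()}
    # Assumption: totals behave like base-2 log scores, which matches
    # the PROBS row in the sample output.
    scores = {label: 2.0 ** total for label, total in totals.items()}
    norm = sum(scores.values())
    probs = {label: score / norm for label, score in scores.items()}
    return totals, probs


totals, probs = label_probabilities(weights)
for label in sorted(probs):
    print("%-8s total=%+.3f prob=%.3f" % (label, totals[label], probs[label]))
# negative total=+0.109 prob=0.533
# positive total=-0.081 prob=0.467
```

Note that "nation" contributes only to the positive label here, which is why the negative column has three rows and the positive column has four.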

Training and testing results

see :

Production & deployment

Run everything behind nginx >= 1.3.13 and automate process management with supervisord.

nginx has supported WebSockets since version 1.3.13, so you should probably use the latest stable version.

This is only one of many ways to deploy the app. The ex.conf folder contains sample config files for nginx and supervisord.

Links, Sources etc
