Topical search for Twitter. See, for tokenization.


TweetMotif is a faceted/topic/summarizing search system for Twitter, built on top of the API.

Do you just want the tokenizer?

All you need is two files:

If you use it in research, please cite:

  • Brendan O'Connor, Michel Krieger, and David Ahn. TweetMotif: Exploratory Search and Topic Summarization for Twitter. ICWSM-2010.

Latest version (Java)

The latest version, with a number of improvements, is written in Java; it was released in September 2012. See the explanation and links at:

More on TweetMotif

By Brendan O'Connor, Michel Krieger, and David Ahn. Written during April-May 2009 and released in April 2010.

The TweetMotif paper (inside EXAMPLES_AND_WRITING, or a copy at this link) gives an overview of the system.

Running TweetMotif

Requirements:
  • Tokyo Cabinet
  • Tokyo Tyrant
  • mod_wsgi
  • Python (version 2.5 is known to work)

There are precompiled builds of the Tokyo infrastructure in platform/, for Mac OS X 10.5 and roughly Ubuntu 8.04. On the off chance they work on your system, uncomment the code that tells TweetMotif to use them (grep platform *.py to find it). You may also have to muck around with ldconfig (on Linux) and related library settings so that mod_wsgi, which runs inside Apache, can find them.

You also need a running Tokyo Tyrant server for the query cache. That is usually inconvenient when you are just getting started; in that case, disable the cache by commenting out these lines:

# the_cache = ....
# @the_cache.wrap
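The two lines above follow the usual Python decorator pattern: `the_cache` is presumably an object whose `wrap` method memoizes a function's results in the Tyrant-backed store. The sketch below is a hypothetical illustration of that pattern, not TweetMotif's actual cache code: the class `DictQueryCache` and the function `search` are made-up names, an in-memory dict replaces the Tokyo Tyrant server, and it is written for Python 3 rather than 2.5.

```python
import functools
import json


class DictQueryCache:
    """Hypothetical stand-in for a Tokyo Tyrant-backed query cache.

    Results are kept in an in-memory dict instead of a Tyrant server.
    """

    def __init__(self):
        self.store = {}

    def wrap(self, func):
        """Decorator: memoize func's results, keyed on its arguments."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Serialize the call into a string key, the way a string-keyed
            # key-value server would need it.
            key = json.dumps([func.__name__, args, sorted(kwargs.items())])
            if key not in self.store:
                self.store[key] = func(*args, **kwargs)
            return self.store[key]
        return wrapper


the_cache = DictQueryCache()
calls = []  # records which queries actually reached the wrapped function


@the_cache.wrap
def search(query):
    # Placeholder for the expensive fetch-and-cluster step.
    calls.append(query)
    return {"query": query, "topics": []}
```

Calling `search("obama")` twice runs the function body only once; the second call is served from `the_cache.store`. Commenting out the decorator line simply makes every call run uncached.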



There is a backend and a frontend. The backend talks to Twitter and does all the text processing, clustering, etc. The frontend is a Django website with standard and iPhone versions.

The backend makes extensive use of Tokyo Cabinet and Tokyo Tyrant databases, for both the language model and the query cache.
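This README does not describe how the language model is laid out. As one plausible illustration, assuming it is a table of background n-gram counts, the sketch below ranks phrases from a query's results by how much more frequent they are there than in the background. A plain dict stands in for the Tokyo Cabinet table, all names are hypothetical, and the smoothed frequency ratio is a generic technique, not necessarily the exact scoring TweetMotif uses.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def score_phrases(result_tokens, background_counts, background_total,
                  n=1, pseudocount=1.0):
    """Rank n-grams by (frequency in results) / (smoothed background frequency).

    result_tokens: list of tokenized tweets returned for a query.
    background_counts: dict-like map from n-gram to count; a hypothetical
        stand-in for a key-value language-model table.
    background_total: total n-gram count in the background corpus.
    """
    fg = Counter()
    for toks in result_tokens:
        fg.update(ngrams(toks, n))
    fg_total = sum(fg.values())
    scores = {}
    for phrase, count in fg.items():
        p_fg = count / fg_total
        # Crude pseudocount smoothing so n-grams unseen in the background
        # do not cause a division by zero.
        p_bg = ((background_counts.get(phrase, 0) + pseudocount)
                / (background_total + pseudocount))
        scores[phrase] = p_fg / p_bg
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Keeping the background counts in an on-disk key-value store means a lookup per candidate phrase rather than loading the whole model into memory, which is one reason a hash database suits this job.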

Both the backend and the frontend are WSGI apps, and everything is set up to run under mod_wsgi. The two communicate via JSON over HTTP.
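To make the WSGI and JSON-over-HTTP arrangement concrete, here is a minimal sketch using only the standard library (Python 3). The `/search?q=...` endpoint and the response fields are hypothetical, not TweetMotif's actual protocol, and the helper `call_wsgi_json` invokes the app in-process in place of the frontend's real HTTP request.

```python
import json
from urllib.parse import parse_qs


def backend_app(environ, start_response):
    """Hypothetical backend WSGI app: answers GET /search?q=... with JSON."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]
    # A real backend would fetch tweets and cluster them here.
    payload = {"query": query, "topics": []}
    body = json.dumps(payload).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_wsgi_json(app, query_string):
    """Invoke a WSGI app in-process and decode its JSON body.

    A frontend would do the equivalent over a real HTTP connection.
    """
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/search",
               "QUERY_STRING": query_string}
    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body.decode("utf-8"))
```

Because `backend_app` is a plain WSGI callable, the same object can be served by mod_wsgi in production or, for local testing, by `wsgiref.simple_server.make_server("localhost", 8000, backend_app).serve_forever()`.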


The backend is run through, confusingly enough, It also has a primitive frontend for development purposes there.


The frontend is Django. See djfrontend/.


TweetMotif is licensed under the Apache License 2.0:

Copyright Brendan O'Connor, Michel Krieger, and David Ahn, 2009-2010.