A machine learning API to analyze tweets during disasters.

Tweedr: measuring disaster damage with tweets

Tweedr makes information from social media more accessible to providers of disaster relief. There are two aspects to the application:

  1. An API / pipeline for applying machine learning techniques and natural language processing tools to analyze social media produced in response to a disaster.
  2. A user interface for manipulating, filtering, and aggregating this enhanced social media data.

Tweedr is a Data Science for Social Good project, developed in partnership with the Qatar Computing Research Institute.

Problem, solution, data

web app screenshot

Project layout

  • doc/ contains various presentations, along with accompanying slides and poster.
    • doc/report/ contains a more technical and extensive write-up of this project. In progress.
  • ext/ is created by a complete install; external data sources and libraries are downloaded to this folder.
  • static/ contains static (non-Javascript) files used by the web app.
  • templates/ contains templates (both server-side and client-side) used by the web app.
  • tests/ contains unittest-like tests. Use python setup.py test to run these.
  • tools/ holds tools to aid development (currently, only a test-running git-hook).
  • tweedr/ contains the main Python app and functions as a Python package (e.g., import tweedr).

Installation guide

git clone
cd tweedr
python setup.py develop download_ext

If you want to jump straight to development, see the Contributing wiki page.


Tweedr uses a number of external libraries and resources. This is the dependency tree:

  • Tweedr: Primarily Python, on GitHub
    • crfsuite: C/C++, from source
    • scikit-learn: Python, from PyPI
      • numpy: Python, with C/C++ (blas/lapack), Fortran links, from PyPI or package manager
      • scipy: Python, with C/C++, from PyPI or package manager
    • TweetNLP: Java, from jar
    • PyPer: Python, with R, from PyPI

crfsuite and liblbfgs are the only components that can't be installed directly with Python via setuptools. If you have trouble installing any of the other packages above, you might have better luck looking for them in your operating system's package manager or as binaries on the projects' websites.
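
To see which of the Python-level dependencies are already importable before you start, a small check along these lines may help. It is only a sketch: it assumes Python 3, and the import names (sklearn for scikit-learn, pyper for PyPer, crfsuite for the CRFsuite wrapper) are assumptions that may not match every installation.

```python
import importlib.util

# Import names are assumptions based on the packages listed above;
# adjust them if your installation exposes different module names.
PYTHON_DEPENDENCIES = ["crfsuite", "sklearn", "numpy", "scipy", "pyper"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(PYTHON_DEPENDENCIES)
    if missing:
        print("Missing Python modules: " + ", ".join(missing))
    else:
        print("All listed Python modules are importable.")
```

The check only looks modules up on the import path without importing them, so it says nothing about whether the underlying C or Java components work.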

Installation steps

1. Installing libLBFGS

The source code can be downloaded from the maintainer's webpage, though the GitHub fork cloned below attempts to simplify the install process.

git clone
cd liblbfgs
sudo make install

2. Installing CRFsuite

As with libLBFGS, a tarball can be downloaded from the original website, though the accompanying fork on GitHub attempts to document the installation process and make compilation more automatic on both Linux and Mac OS X.

git clone
cd crfsuite
sudo make install

That installs the library, but not the Python wrapper, which takes a few more steps:

cd swig/python
python setup.py build_ext
sudo python setup.py install_lib

To test whether it installed correctly, you can run the following at your terminal, which should print out the current CRFsuite version:

python -c 'import crfsuite; print(crfsuite.version())'
> 0.12.2

The GitHub repository documents a few more options that might come in handy if the process above does not work on your operating system.

3. Configuring environment variables

Tweedr also connects to a number of remote resources when running live; see the Environment wiki page for instructions on setting those up.

4. Installing Tweedr

After installing crfsuite and liblbfgs, everything else should be installable via setuptools / distutils:

git clone
cd tweedr
python setup.py install

And then to download external data requirements:

python setup.py download_ext

The download_ext command will download external data, which currently includes the following packages / sources:

You may get an error, "IOError: cmu.arktweetnlp.RunTagger error", if you try to use some parts of Tweedr before installing this component.

5. Instantiating the database

While we are not currently able to release our data, you can easily recreate the structure of our database by running the following command:

tweedr-database create

This uses SQLAlchemy to do the reverse of reflection: it runs metadata.create_all(), which creates any table defined in the metadata that does not already exist in the database.
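
For readers unfamiliar with metadata.create_all(), the following is a minimal sketch of the general SQLAlchemy pattern, not Tweedr's actual code. The table name, columns, and in-memory SQLite URL are all hypothetical and chosen only so the example runs on its own.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, inspect

# Hypothetical schema for illustration only; Tweedr's real tables are not shown here.
metadata = MetaData()
Table(
    "example_tweets",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("text", String(200)),
)


def create_schema(url="sqlite:///:memory:"):
    """Create every table registered on `metadata` in the database at `url`."""
    engine = create_engine(url)
    # Tables that already exist are skipped rather than recreated.
    metadata.create_all(engine)
    return engine


engine = create_schema()
# List the tables that now exist in the (empty) database.
print(inspect(engine).get_table_names())
```

Because only the structure is created, the resulting tables are empty; loading data is a separate step.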

Running Tweedr

At this point, you should have tools like tweedr-ui and tweedr-pipeline on your PATH, and you can run each of those with the --help flag to view the usage messages.
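
If you are unsure whether the install put these tools on your PATH, a quick check like the one below can confirm it. This is a sketch assuming Python 3; the command names are taken from this README, and the helper name locate_commands is made up for illustration.

```python
import shutil

# Command names are the ones mentioned in this README.
TWEEDR_COMMANDS = ["tweedr-ui", "tweedr-pipeline", "tweedr-database"]


def locate_commands(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_commands(TWEEDR_COMMANDS).items():
        print("{}: {}".format(name, path or "not found on PATH"))
```

Any command reported as not found usually means the install step did not complete, or that the directory holding the installed scripts is not on your PATH.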

See the API section of the wiki for a description of some of the fields that tweedr-pipeline adds.


If your installation is still missing packages, see the manually installing page of the wiki.



Contributing to the project

Want to get in touch? Found a bug? Open up a new issue or email us at


Copyright © 2013 The University of Chicago. MIT Licensed.
