Tweedr: measuring disaster damage with tweets
Tweedr makes information from social media more accessible to providers of disaster relief. There are two aspects to the application:
- An API / pipeline for applying machine learning techniques and natural language processing tools to analyze social media produced in response to a disaster.
- A user interface for manipulating, filtering, and aggregating this enhanced social media data.
Problem, solution, data
- For an extensive discussion of the problem and proposed solution, visit our wiki.
- To get started using the Tweedr API, check out our tutorial website.
doc/ contains various presentations, along with accompanying slides and a poster.
doc/report/ contains a more technical and extensive write-up of this project. In progress.
ext/ is created by a complete install; external data sources and libraries are downloaded to this folder.
templates/ contains templates (both server-side and client-side) used by the web app.
tests/ contains unittest-like tests. Use python setup.py test to run these.
tools/ holds tools to aid development (currently, only a test-running git-hook).
tweedr/ contains the main Python app and functions as a Python package (e.g.,
    git clone https://github.com/dssg/tweedr.git
    cd tweedr
    python setup.py develop download_ext
If you want to jump straight to development, see the Contributing wiki page.
Tweedr uses a number of external libraries and resources. This is the dependency tree:
- Tweedr: Primarily Python, on GitHub
crfsuite and liblbfgs are the only components that can't be installed directly with Python via setuptools. If you have trouble installing any of the other packages above, you might have better luck looking for them in your operating system's package manager or as binaries on the projects' websites.
1. Installing libLBFGS
    git clone https://github.com/chbrown/liblbfgs.git
    cd liblbfgs
    ./configure
    make
    sudo make install
2. Installing CRFsuite
As with libLBFGS, a tarball can be downloaded from the original website, but the accompanying fork on GitHub attempts to document the installation process and make compilation more automatic on both Linux and Mac OS X.
    git clone https://github.com/chbrown/crfsuite.git
    cd crfsuite
    ./configure
    make
    sudo make install
That installs the library, but not the Python wrapper, which takes a few more steps:
    cd swig/python
    python setup.py build_ext
    sudo python setup.py install_lib
To test whether it installed correctly, you can run the following at your terminal, which should print out the current CRFsuite version:
    python -c 'import crfsuite; print(crfsuite.version())'
    0.12.2
The GitHub repository documents a few more options that might come in handy if the process above does not work for your operating system.
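The one-line import check above can also be written as a small script that reports a missing wrapper instead of raising a traceback. This is only a sketch: the helper name module_version is made up for illustration, and the only thing assumed about the wrapper is the crfsuite.version() call already used above.

```python
import importlib


def module_version(name):
    """Return the module's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # The CRFsuite wrapper exposes version() (as in the one-liner above);
    # fall back to __version__ for modules that do not.
    version = getattr(module, "version", None)
    if callable(version):
        return str(version())
    return str(getattr(module, "__version__", "unknown"))


if __name__ == "__main__":
    found = module_version("crfsuite")
    if found is None:
        print("crfsuite is not importable; the Python wrapper may not be installed")
    else:
        print("crfsuite " + found)
```

If the script reports that the module is not importable, the build_ext / install_lib steps above are the first thing to re-check.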
3. Configuring environment variables
Tweedr also connects to a number of remote resources when running live; see the Environment page of the wiki for instructions on setting those up.
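Before running Tweedr live, it can help to confirm that the variables you configured are visible to Python. The sketch below is hypothetical throughout: the variable names are placeholders rather than Tweedr's real settings (those are listed on the Environment wiki page), and missing_variables is a helper invented for this example.

```python
import os


def missing_variables(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    # Placeholder names for illustration only; substitute the variables
    # described on the Environment wiki page.
    required = ["EXAMPLE_DATABASE_URL", "EXAMPLE_API_KEY"]
    missing = missing_variables(required)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set")
```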
4. Installing Tweedr
After installing crfsuite and liblbfgs, everything else should be installable via setuptools / distutils:
    git clone https://github.com/dssg/tweedr.git
    cd tweedr
    python setup.py install
And then to download external data requirements:
python setup.py download_ext
The download_ext command downloads external data, which currently includes the following packages / sources:
You may get an error, "IOError: cmu.arktweetnlp.RunTagger error", if you try to use some parts of Tweedr before installing this component.
5. Instantiating the database
While we are not currently able to release our data, you can easily recreate the structure of our database by running the following command:
This simply uses SQLAlchemy to un-reflect the database, by running
At this point, you should have tools like tweedr-pipeline on your PATH, and you can run each of those with the --help flag to view the usage messages.
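A quick way to confirm that the command-line tools landed on your PATH is to look them up from Python. This is a minimal sketch that assumes a Python 3.3+ interpreter (for shutil.which); find_tools is a helper name made up for this example, and tweedr-pipeline is the only command name taken from the text above.

```python
import shutil


def find_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # tweedr-pipeline is the tool named above; add other commands as needed.
    for name, path in find_tools(["tweedr-pipeline"]).items():
        print("{}: {}".format(name, path or "not found on PATH"))
```

A tool reported as not found usually means the install step did not complete, or that the directory setuptools installs scripts into is not on your PATH.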
See the API section of the wiki for a description of some of the fields that
If your installation is still missing packages, see the manually installing page of the wiki.
Contributing to the project
Copyright © 2013 The University of Chicago. MIT Licensed.