Pipeline is a data management and analysis site built by Aquaya. View the full site at pipelinehq.org. Our partners collect information in a variety of formats and this site helps aggregate, de-duplicate, edit and view those data points. Data can be uploaded via an Excel file or automatically imported from several mobile data collection services such as Dimagi's CommCare. Once uploaded, information can be filtered, statistically analyzed, and graphed. Reports can be created and periodically sent to managers via email or as simple SMS notifications.

Requirements

Tested on Ubuntu 11.10.

You'll need a locally-running MongoDB instance - see their docs for info.

Use virtualenv and pip to install the other requirements:

$ virtualenv /path/to/venv
$ . /path/to/venv/bin/activate
(venv)$ pip install -r requirements.txt

After cloning this repo, pull in the dependencies:

$ git submodule init
$ git submodule update

For report generation we use wkhtmltopdf:

$ sudo apt-get install xvfb
$ sudo apt-get install xfonts-100dpi xfonts-75dpi xfonts-scalable xfonts-cyrillic
$ sudo apt-get install fontconfig
$ wget http://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2
$ tar xvf wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2
$ sudo mv wkhtmltopdf-amd64 /usr/bin/wkhtmltopdf
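Once the binary is in place, the app can shell out to it to render a page to PDF. A minimal sketch (the helper names here are ours, not Pipeline's actual code):

```python
import subprocess

def wkhtmltopdf_cmd(source, output, binary="/usr/bin/wkhtmltopdf"):
    """Build the argument list for a wkhtmltopdf invocation.

    `source` may be a URL or a local HTML file path.
    """
    return [binary, source, output]

def render_pdf(source, output):
    """Render `source` to a PDF at `output`.

    check=True raises CalledProcessError if wkhtmltopdf exits non-zero.
    """
    subprocess.run(wkhtmltopdf_cmd(source, output), check=True)
```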

Async tasks such as report generation and talking to other data services happen via rq and rq-scheduler, so Redis needs to be running:

$ wget http://redis.googlecode.com/files/redis-2.4.14.tar.gz
$ tar xzf redis-2.4.14.tar.gz
$ cd redis-2.4.14
$ make
$ src/redis-server /path/to/redis.conf
(venv)$ rqworker
(venv)$ rqscheduler

You should turn on daemonization in the Redis config file. Note also that only one rqscheduler process can be attached to a Redis instance. Changing the connected db might work but has not been tested.
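With Redis up and an rqworker running, queueing a background job follows rq's standard pattern. A sketch (the task name `generate_report` is illustrative, not Pipeline's actual task):

```python
def generate_report(report_id):
    """Placeholder for a real report-generation task (hypothetical name)."""
    return "report-%s.pdf" % report_id

def enqueue_report(report_id):
    """Push the task onto rq's default queue.

    Requires `pip install rq redis` and a Redis server on
    localhost:6379 (the default); an rqworker process picks the
    job up and runs it asynchronously.
    """
    from redis import Redis
    from rq import Queue

    q = Queue(connection=Redis())
    return q.enqueue(generate_report, report_id)
```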

Tests

We use Lettuce for some BDD tests; improving coverage is a high priority.

(venv)$ lettuce

Unit testing is via nose:

(venv)$ nosetests

Running locally

Set up a real config file outside of source control:

$ cp conf/application_settings_sample.py /path/to/real/settings.py

Edit that new config, then point an environment variable at it:

$ export PIPELINE_SETTINGS=/path/to/real/settings.py
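Assuming the app reads its settings the way Flask's `config.from_envvar` does — follow the environment variable to a Python file and keep its uppercase names — the mechanism can be sketched with the standard library alone (function name is ours):

```python
import os
import runpy

def load_settings(envvar="PIPELINE_SETTINGS"):
    """Load config values from the Python file that `envvar` points at.

    Executes the settings file and returns only its UPPERCASE names,
    the usual convention for settings modules.
    """
    path = os.environ[envvar]
    module_globals = runpy.run_path(path)
    return {k: v for k, v in module_globals.items() if k.isupper()}
```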

Activate your virtualenv and create the default admin with controllers.seed():

$ . /path/to/venv/bin/activate
(venv)$ python
>>> import application
>>> application.controllers.seed()

Start the server:

(venv)$ python run.py
 * Running on http://127.0.0.1:8000/

Usage in production

We have some example config files for supervisord, gunicorn, nginx, and fabric -- check those out.

Bootstrapping a new server:

  • install virtualenv and pip
  • copy over config files for supervisord, gunicorn, nginx and this app
  • make a dir for the config files and the log files
  • point the env var to the app config file and put this in your .zshrc
  • install the requirements using pip and requirements.txt
  • use fabric to install the app
  • reload/reread/restart supervisord until it picks up the config file (annoyingly imprecise, I know..)
  • start the server and check with supervisorctl
  • edit the nginx config file at nginx.conf and the sites-available dir (symlink to sites-enabled); restart nginx
  • seed the db from a shell
  • update your DNS
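
The supervisord step above might use a program entry like this. All paths and names here are placeholders, not the project's shipped config:

```ini
[program:pipeline]
command=/path/to/venv/bin/gunicorn -c /path/to/conf/gunicorn.conf.py run:app
directory=/path/to/pipeline
environment=PIPELINE_SETTINGS="/path/to/real/settings.py"
autostart=true
autorestart=true
stdout_logfile=/path/to/logs/pipeline.log
```

With this in place, `supervisorctl status pipeline` shows whether the server came up.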

Accessing data from other services

Only CommCare is supported at the moment. We hope to add IVRHub and formhub soon. CommCare instructions follow:

  • invite a new web user to your CommCare project with read-only access
  • create a new "Connection" in the system and add the appropriate credentials
  • Pipeline will periodically request new data from CommCare using your domain and export tag info

CommCare can include a lot of helpful metadata about each submission. When creating a 'Connection' there is an option to include or exclude this metadata. If the metadata is excluded, any future manual file uploads will have to match the new schema.
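The schema effect can be sketched by flattening one submission with and without metadata. Field names below are illustrative, not CommCare's actual export format:

```python
def flatten_submission(submission, include_metadata=True):
    """Flatten one CommCare-style submission into a row for the data store.

    Excluding metadata removes columns, so the resulting schema differs --
    which is why later manual file uploads must match whichever shape
    was chosen when the Connection was created.
    """
    row = dict(submission["form"])  # the user-entered answers
    if include_metadata:
        meta = submission.get("metadata", {})
        row["received_on"] = meta.get("received_on")
        row["username"] = meta.get("username")
    return row
```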

Editing entries

Editing an entry automatically triggers the following:

  • a locked comment is created to describe the changes
  • possible duplicates are checked and processed: if the edits make the entry a duplicate or make it unique, it is converted appropriately; if a unique value that had duplicates is edited, one of the old duplicates is shifted to unique
  • if the entry was hidden, a warning is raised telling the user to consider un-hiding it
  • the entry is marked as having been edited
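
The duplicate reconciliation above can be sketched against a simple in-memory model. The dict shape and function name are illustrative only; Pipeline's actual duplicate handling lives in its MongoDB models:

```python
def reconcile_duplicates(entries, edited):
    """Re-check duplicate status after `edited` has a new value.

    Each entry is a dict with 'id', 'value', and a 'unique' flag.
    """
    peers = [e for e in entries if e["id"] != edited["id"]]

    # If the new value collides with any other entry, the edited
    # entry becomes a duplicate; otherwise it is now unique.
    edited["unique"] = not any(e["value"] == edited["value"] for e in peers)

    # Values left with no unique representative (e.g. the edited
    # entry's old value): promote one former duplicate to unique.
    values_with_unique = {e["value"] for e in peers if e["unique"]}
    for e in peers:
        if not e["unique"] and e["value"] not in values_with_unique:
            e["unique"] = True
            values_with_unique.add(e["value"])
    return entries
```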