IATI Data Dump

A daily snapshot of all IATI data (and metadata) on the IATI registry.

Rationale

Many IATI products pull all IATI XML data every day. That means downloading gigabytes of data over thousands of HTTP requests, which is slow.

Downloading a single archive file is significantly faster. That’s available here!

How does this work?

There’s an app called "iati-data-dump" that runs on Heroku. Every night it fetches all IATI data, compresses it, and uploads the archive to Dropbox. A timestamp on Dropbox shows how fresh the data is.
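The compression step can be sketched in Python as follows. This is a minimal illustration, not the app's actual code: the function name `build_archive`, the `data/` directory layout, and the archive name `iati_dump` are all assumptions. The Dropbox upload is left out because it needs the third-party SDK and an access token.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def build_archive(data_dir: Path, archive_base: Path) -> Path:
    """Zip everything under data_dir into <archive_base>.zip and return its path."""
    # shutil.make_archive appends the ".zip" suffix itself.
    # Keep archive_base outside data_dir so the archive does not include itself.
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=str(data_dir)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the downloaded data (hypothetical layout): one folder per publisher.
        data_dir = Path(tmp) / "data"
        (data_dir / "example-publisher").mkdir(parents=True)
        (data_dir / "example-publisher" / "activities.xml").write_text("<iati-activities/>")

        archive = build_archive(data_dir, Path(tmp) / "iati_dump")
        with zipfile.ZipFile(archive) as zf:
            print(zf.namelist())
```

Consumers then make a single request for the resulting file instead of one request per dataset.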

Under the hood, data is downloaded using a modified fork of IATI-Registry-Refresher.

Once the data is downloaded, validation.sh is run to validate all datasets, and this GitHub gist is updated with the output.
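The checks that validation.sh performs are not described here. As an illustration of the simplest kind of per-dataset check, the sketch below tests whether a document parses as XML at all. The function name `is_well_formed` is hypothetical, and real validation may well go further (for example, checking against the IATI schema).

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<iati-activities></iati-activities>"))  # True
    print(is_well_formed("<iati-activities>"))  # False: the root element is never closed
```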

Acknowledgements

The code relies heavily on IATI Registry Refresher, made by @caprenter and @Bjwebb.

Setup on Heroku

You’ll need the Heroku CLI.

  1. Clone this repo:

    git clone --recursive https://github.com/codeforIATI/iati-data-dump.git
    cd iati-data-dump
  2. Create a new app on Dropbox. This is where the downloadable file will live.

  3. On the Dropbox page for your newly created app, click the “generate access token” button.

  4. Create a new Heroku app:

    heroku create iati-data-dump --region eu
  5. Add the Dropbox access token as an environment variable:

    heroku config:set DROPBOX_TOKEN=your-token-goes-here
  6. The app uses Python and libxml2, so add both buildpacks:

    heroku buildpacks:add heroku/python
    heroku buildpacks:add https://github.com/mcolyer/heroku-buildpack-libxml2.git
  7. The app uses the Scheduler add-on to run and Papertrail for logging, so add both:

    heroku addons:create scheduler
    heroku addons:create papertrail
  8. Push your app to Heroku:

    git push heroku master
  9. Open Scheduler in a browser:

    heroku addons:open scheduler
  10. On the Scheduler dashboard, create a new daily task. The task command should be:

    ./run.sh