IATI Data Dump
A daily snapshot of all IATI data (and metadata) on the IATI registry.
Lots of IATI products do a daily pull of all IATI XML data. That involves downloading gigabytes of data via thousands of HTTP requests, so it’s quite slow.
Downloading a single archive file is significantly faster. That’s available here!
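As a rough sketch of what a consumer might do once the snapshot has been downloaded, the Python below lists and extracts the XML files from a local archive. It assumes the archive is a zip file; the archive format and both function names are assumptions made for illustration, not something this project defines.

```python
import zipfile
from pathlib import Path


def list_xml_members(archive_path):
    """Return the sorted names of all .xml files inside a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.lower().endswith(".xml")
        )


def extract_archive(archive_path, target_dir):
    """Extract the whole archive into target_dir and return the XML file paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return sorted(target.rglob("*.xml"))
```

With one local file to read, a daily job can walk the extracted directory instead of making one HTTP request per dataset.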
How does this work?
There’s an app called "iati-data-dump" that runs on Heroku. This app performs a nightly fetch of all IATI data, compresses it, and puts the archive onto Dropbox. There’s a timestamp on Dropbox that lets you know how fresh the data is.
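The compress-and-timestamp step could look roughly like the sketch below. It is not the app's actual code: the zip format, the function name `build_archive`, and the ISO 8601 timestamp are all assumptions, and the upload to Dropbox is left out.

```python
import zipfile
from datetime import datetime, timezone
from pathlib import Path


def build_archive(source_dir, archive_path):
    """Zip every file under source_dir and return a UTC timestamp string.

    archive_path should live outside source_dir, otherwise the archive
    would try to include itself.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir so the archive is portable.
                archive.write(path, arcname=path.relative_to(source).as_posix())
    # The timestamp tells consumers how fresh the snapshot is.
    return datetime.now(timezone.utc).isoformat(timespec="seconds")
```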
Under the hood, data is downloaded using a modified fork of IATI-Registry-Refresher.
Once the data is downloaded, validation.sh is run to validate every dataset. This GitHub gist is updated with the output.
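The contents of validation.sh are not shown here, so the following is only an illustration of the kind of check a validation pass might make, not a port of that script. It tests that each file is well-formed XML and has one of the IATI root elements (`iati-activities` or `iati-organisations`); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Root elements used by IATI activity and organisation files.
IATI_ROOTS = {"iati-activities", "iati-organisations"}


def check_dataset(path):
    """Return None if the file looks valid, otherwise a short problem description."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as error:
        return f"not well-formed XML: {error}"
    if root.tag not in IATI_ROOTS:
        return f"unexpected root element <{root.tag}>"
    return None


def validate_directory(data_dir):
    """Check every .xml file under data_dir; return {path: problem} for failures."""
    problems = {}
    for path in sorted(Path(data_dir).rglob("*.xml")):
        problem = check_dataset(path)
        if problem is not None:
            problems[str(path)] = problem
    return problems
```

A report like the returned dictionary could then be written to a file or published, much as the gist above is updated with the script's output.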
Setup on Heroku
You’ll need the Heroku CLI.
Clone this repo:
git clone --recursive https://github.com/codeforIATI/iati-data-dump.git
cd iati-data-dump
Create a new app on Dropbox. This is where the downloadable file will live.
In the Dropbox page for your newly created app, click the “generate access token” button.
Create a new Heroku app:
heroku create iati-data-dump --region eu
Add the Dropbox access token as an environment variable:
heroku config:set DROPBOX_TOKEN=your-token-goes-here
The app uses Python and libxml2, so add both buildpacks:
heroku buildpacks:add heroku/python
heroku buildpacks:add https://github.com/mcolyer/heroku-buildpack-libxml2.git
The app uses the scheduler add-on to run and papertrail for logging, so add both add-ons:
heroku addons:create scheduler
heroku addons:create papertrail
Push your app to Heroku:
git push heroku master
Open scheduler in a browser:
heroku addons:open scheduler
On the scheduler website, create a new daily task. The task command should be: