This repository has been archived by the owner on Aug 13, 2020. It is now read-only.

codeforIATI/parallel-registry-refresher


Parallel Registry Refresher

This is a task-based IATI data downloader. All downloaded data is pushed to Amazon S3.
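The task-based design can be illustrated with a minimal sketch. This is not the repository's code: `run_refresh`, `fetch` and `store` are hypothetical names, a thread pool stands in for the separate worker processes that pull tasks from the Redis queue, and `store` stands in for the S3 upload. Because each dataset is an independent task, one broken download does not block the others, and adding workers lets more downloads run at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple

# A task is (dataset_name, url). In practice the task list would come from
# the IATI Registry; here it is passed in directly (assumption).
Task = Tuple[str, str]


def run_refresh(
    tasks: Iterable[Task],
    fetch: Callable[[str], bytes],
    store: Callable[[str, bytes], None],
    workers: int = 4,
) -> Dict[str, bool]:
    """Download every dataset concurrently and hand each file to `store`.

    Returns a mapping of dataset name -> True on success, False on failure.
    """

    def handle(task: Task) -> Tuple[str, bool]:
        name, url = task
        try:
            data = fetch(url)
            store(f"{name}.xml", data)  # hypothetical key layout: one object per dataset
            return name, True
        except Exception:
            # One unreachable publisher server must not stop the other tasks.
            return name, False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(handle, tasks))


# Usage with in-memory stand-ins for the network and the S3 bucket.
bucket: Dict[str, bytes] = {}


def fake_fetch(url: str) -> bytes:
    if "broken" in url:
        raise OSError("download failed")
    return b"<iati-activities/>"


results = run_refresh(
    [("org-a", "http://example.org/a.xml"), ("org-b", "http://broken.example/b.xml")],
    fetch=fake_fetch,
    store=bucket.__setitem__,
)
print(results)  # {'org-a': True, 'org-b': False}
```

In a real deployment `store` would wrap an S3 upload, and the tasks would live in Redis rather than in a local thread pool, so that several worker dynos can share them.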

Setup

You can run this anywhere, but the instructions below are for Heroku. They assume you have a Heroku account and the Heroku CLI installed.

  • Create a new app:
    heroku create [app name]
    
  • Create a Redis instance (for managing the task queue):
    heroku addons:create redistogo:nano
    
  • Add the AWS environment variables (for storing the downloaded files on S3):
    heroku config:set AWS_ACCESS_KEY_ID=[AWS access key]
    heroku config:set AWS_SECRET_ACCESS_KEY=[AWS secret key]
    heroku config:set S3_BUCKET_NAME=[name of S3 bucket]
    
  • Push the app to Heroku:
    git push heroku master
    
  • Then scale up the workers. More workers means a faster refresh (but a bigger bill!):
    heroku scale worker=[number of workers]
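Heroku exposes config vars to the app as environment variables. As a hedged illustration (the helper `load_config` is hypothetical, not taken from the repository), a worker might collect the variables set above, plus the `REDISTOGO_URL` that the Redis To Go add-on provides, and fail fast if any is missing rather than partway through a refresh.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names come from the setup steps; the helper itself is a sketch.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "REDISTOGO_URL",
)


def load_config(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required settings, raising if any are missing or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing config vars: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing a mapping explicitly makes the check easy to test; with no argument it reads the real environment. If the upload is done with boto3, it also picks up `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from the environment on its own.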
    

Running

You can start a registry refresh using:

heroku run python run.py enqueue

You could run this as a cron job, or have it start a new refresh each time one completes, so that it crawls continuously.
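Conceptually, enqueuing turns the list of registry datasets into one task per dataset, which the workers then consume. The sketch below uses the standard library's `queue.Queue` in place of the Redis-backed queue, and `enqueue_refresh`, `crawl` and `worker` are hypothetical names. The `rounds` loop shows the "start a new refresh on completion" idea: wait until the queue drains, then enqueue the next refresh.

```python
import queue
import threading
from typing import Callable, List, Tuple

Task = Tuple[str, str]  # (dataset_name, url)


def enqueue_refresh(q: "queue.Queue[Task]", datasets: List[Task]) -> int:
    """Put one task per dataset on the queue and return how many were added."""
    for task in datasets:
        q.put(task)
    return len(datasets)


def crawl(get_datasets: Callable[[], List[Task]], q: "queue.Queue[Task]", rounds: int) -> None:
    """Enqueue a refresh, wait until the workers drain the queue, then repeat."""
    for _ in range(rounds):
        enqueue_refresh(q, get_datasets())
        q.join()  # blocks until every queued task has been marked done


def worker(q: "queue.Queue[Task]", handled: List[str]) -> None:
    """Stand-in for a worker process: take tasks off the queue forever."""
    while True:
        name, _url = q.get()
        handled.append(name)  # a real worker would download and upload here
        q.task_done()


q: "queue.Queue[Task]" = queue.Queue()
handled: List[str] = []
threading.Thread(target=worker, args=(q, handled), daemon=True).start()
crawl(lambda: [("org-a", "http://example.org/a.xml"), ("org-b", "http://example.org/b.xml")], q, rounds=2)
print(handled)  # ['org-a', 'org-b', 'org-a', 'org-b']
```

A genuinely continuous crawler would loop without a round limit; the fixed `rounds` argument just keeps the sketch finite.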

Status

You can check how many tasks remain in the queue using:

heroku run python run.py status

Requeuing failed jobs

You can requeue failed jobs using:

heroku run rq requeue --queue default -u \$REDISTOGO_URL --all
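Requeuing helps because many download failures are transient, such as a publisher's server timing out. The `rq requeue --all` command puts every job recorded as failed back onto the queue. The sketch below mimics that behaviour in plain Python so it runs without Redis; `process_queue` and `requeue_failed` are hypothetical helpers, not part of rq's API.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Task = Tuple[str, str]  # (dataset_name, url)


def process_queue(pending: Deque[Task], fetch: Callable[[str], bytes], failed: List[Task]) -> List[str]:
    """Run every pending task once; failures are recorded, not retried."""
    done: List[str] = []
    while pending:
        name, url = pending.popleft()
        try:
            fetch(url)
            done.append(name)
        except Exception:
            failed.append((name, url))
    return done


def requeue_failed(failed: List[Task], pending: Deque[Task]) -> int:
    """Move every failed task back onto the pending queue (like `--all`)."""
    count = len(failed)
    pending.extend(failed)
    failed.clear()
    return count


# Usage: the second URL times out on its first attempt only.
attempts = {}


def flaky_fetch(url: str) -> bytes:
    attempts[url] = attempts.get(url, 0) + 1
    if url == "http://slow.example/b.xml" and attempts[url] == 1:
        raise TimeoutError("publisher server timed out")
    return b"<iati-activities/>"


pending: Deque[Task] = deque([("org-a", "http://example.org/a.xml"), ("org-b", "http://slow.example/b.xml")])
failed: List[Task] = []
first = process_queue(pending, flaky_fetch, failed)   # ['org-a']
requeued = requeue_failed(failed, pending)            # 1
second = process_queue(pending, flaky_fetch, failed)  # ['org-b']
```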
