Template Usage

Replace scrapername everywhere with your scraper's name, eg. worldbank. Replace ScraperName everywhere with your scraper's name, eg. World Bank. Look for xxx and ... and replace them or add text accordingly.
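The two replacements above can be sketched in Python. This is only an illustration of the substitution, not part of the template: the function name `fill_template` and the sample text are hypothetical, and a real rename would also need to cover file and directory names.

```python
def fill_template(text: str, package_name: str, display_name: str) -> str:
    """Replace the template's two placeholders in a piece of text.

    str.replace is case-sensitive, so "scrapername" and "ScraperName"
    are handled as two independent placeholders.
    """
    text = text.replace("scrapername", package_name)
    text = text.replace("ScraperName", display_name)
    return text


# Hypothetical sample text containing both placeholders.
sample = "Collector for ScraperName's Datasets (module scrapername)"
print(fill_template(sample, "worldbank", "World Bank"))
```

The remaining xxx and ... markers still need to be found and filled in by hand, since their replacement text differs for each scraper.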


Scrapers can be tested locally in a Python virtualenv which can be set up by running ./ and run with ./


Scrapers can be installed on either GitHub Actions or Jenkins. In either case, they can be set up to run on a schedule. For scripts that run for more than 6 hours and/or for which resuming failed runs is important, Jenkins must be used.

GitHub Actions

Uses the .github/workflows/python-package.yml file. Set up the environment variables to be passed to your script in that file mapping them from secrets configured in the GitHub repository's UI. You can also set up the cron schedule. (The files listed under Jenkins below are not used.)


Jenkins

docker-compose.yml, docker-requirements.yml, Dockerfile, and run_env are only needed for Jenkins. (.github/workflows/python-package.yml is not used.)

Best Practices

It is highly recommended to have a README file that details what your script does, where it obtains its data, how to run it, what resources it uses (eg. CPU, memory, disk, network) and what environment variables it needs. An example is shown below the heading "Text for Scraper's below".

It is a good idea to set up Travis to run tests on every check-in (using the Travis website's simple UI) and to set up Coveralls in its website UI to check that tests cover the majority of your code.

It is very helpful to comment your code appropriately, especially anything that you think might confuse someone looking at your code for the first time.


For a full scraper following this template for GitHub Actions see: OAD

For full scrapers following this template for Jenkins see: ACLED, FTS, WHO, World Bank, WorldPop

For a scraper that also creates datasets disaggregated by indicator (not just country) and reads metadata from a Google spreadsheet exported as csv, see: IDMC

Text for Scraper's below

Collector for ScraperName's Datasets


Collector designed to collect ScraperName datasets from the ScraperName website and to automatically register datasets on the Humanitarian Data Exchange project.



For the script to run, you will need to either pass in your HDX API key as a parameter or have a file called .hdx_configuration.yml in your home directory containing your HDX key eg.

hdx_key: "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
hdx_read_only: false
hdx_site: test

You will also need to pass in your user agent as a parameter or pass a parameter user_agent_config_yaml specifying where your user agent file is located. It should be of the form:

user_agent: MY_USER_AGENT

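If you have many user agents, you can create a file of the following form, put its location in user_agent_config_yaml and specify which entry to use in user_agent_lookup. The top-level keys are the lookup names (scrapername and scrapername2 here are placeholders):

scrapername:
    user_agent: MY_USER_AGENT
scrapername2:
    user_agent: MY_USER_AGENT2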
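To make the lookup idea concrete, here is a minimal sketch of how a user agent could be picked from such a file once it has been parsed into a Python mapping. It assumes the multi-agent file is keyed by lookup name at the top level; the function `resolve_user_agent` and the lookup names are hypothetical and are not the HDX library's own API.

```python
from typing import Mapping, Optional


def resolve_user_agent(config: Mapping, lookup: Optional[str] = None) -> str:
    """Pick a user agent from an already-parsed user agent YAML mapping.

    Without a lookup, the mapping is expected to hold user_agent directly.
    With a lookup, the mapping is expected to be keyed by lookup name first.
    """
    section = config[lookup] if lookup is not None else config
    return section["user_agent"]


# Parsed form of a single-agent file.
single = {"user_agent": "MY_USER_AGENT"}

# Parsed form of a multi-agent file (lookup names are placeholders).
many = {
    "scrapername": {"user_agent": "MY_USER_AGENT"},
    "scrapername2": {"user_agent": "MY_USER_AGENT2"},
}

print(resolve_user_agent(single))
print(resolve_user_agent(many, "scrapername2"))
```

A missing lookup name raises KeyError here, which surfaces a misconfigured user_agent_lookup early instead of silently falling back to another agent.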

Note for HDX scrapers: there is a universal .useragents.yml file you should use.

Alternatively, you can set up environment variables eg. for production runs: USER_AGENT, HDX_KEY, HDX_SITE, BASIC_AUTH, EXTRA_PARAMS, TEMP_DIR, LOG_FILE_ONLY
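As a rough sketch of the environment variable route, the snippet below collects whichever of the variables listed above are set. The helper `settings_from_env` is hypothetical and only shows the pattern; how the values are then passed to the scraper is not covered here.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names taken from the list above.
ENV_NAMES = (
    "USER_AGENT",
    "HDX_KEY",
    "HDX_SITE",
    "BASIC_AUTH",
    "EXTRA_PARAMS",
    "TEMP_DIR",
    "LOG_FILE_ONLY",
)


def settings_from_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the subset of the known variables that are actually set."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in ENV_NAMES if name in environ}


# Print only the names that are set, never the values, since HDX_KEY is a secret.
print(sorted(settings_from_env()))
```

Passing a plain dictionary instead of relying on os.environ makes the helper easy to exercise in tests without touching the real environment.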

