GitHub - OCHA-DAP/hdx-scraper-acled

Collector for ACLED's Datasets

ARCHIVED - ACLED now run a script they developed themselves.

This script connects to the ACLED API and extracts data country by country, creating one dataset per country in HDX. The scraper takes around 20 minutes to run. It makes on the order of 200 reads from ACLED and 1000 reads/writes (API calls) to HDX in total. It does not create temporary files because it puts URLs into HDX. It is run when ACLED make changes (not to their data but, for example, to their API); in practice this is once or twice a year.
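The per-country approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in acled.py: the endpoint URL, the `iso3` query parameter, the dataset naming scheme and the function `build_country_datasets` are all hypothetical. It only shows the idea of producing one dataset description per country that holds a URL instead of a downloaded file.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; it is not ACLED's real API URL.
BASE_URL = "https://example.org/acled/read.csv"


def build_country_datasets(countries, base_url=BASE_URL):
    """Return one dataset description per country, each holding a URL rather than a file."""
    datasets = []
    for iso3, name in sorted(countries.items()):
        query = urlencode({"iso3": iso3})  # hypothetical query parameter
        datasets.append(
            {
                # Hypothetical naming scheme, e.g. "acled-data-for-south-sudan"
                "name": "acled-data-for-" + name.lower().replace(" ", "-"),
                "country": iso3,
                # Only the URL is stored, so no temporary file is written
                "resource_url": f"{base_url}?{query}",
            }
        )
    return datasets


if __name__ == "__main__":
    for dataset in build_country_datasets({"KEN": "Kenya", "SSD": "South Sudan"}):
        print(dataset["name"], dataset["resource_url"])
```

Because each dataset stores only a link, the loop needs no disk space, which matches the note above about temporary files.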

Usage

python run.py

For the script to run, you will need a file called .hdx_configuration.yml in your home directory containing your HDX key, e.g.

hdx_key: "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
hdx_read_only: false
hdx_site: prod

You will also need to supply the universal .useragents.yml file in your home directory as specified in the parameter user_agent_config_yaml passed to facade in run.py. The collector reads the key hdx-scraper-acled as specified in the parameter user_agent_lookup.

Alternatively, you can set up environment variables: USER_AGENT, HDX_KEY, HDX_SITE, EXTRA_PARAMS, TEMP_DIR, LOG_FILE_ONLY
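As a rough sketch of how these variables might be collected in Python: the variable names come from the list above, but the helper `settings_from_env`, the choice of which variables are treated as required, and the `prod` default for HDX_SITE are assumptions made for illustration. run.py may handle them differently.

```python
import os

# Variable names are taken from the README; the defaults below are assumptions.
ENV_DEFAULTS = {
    "USER_AGENT": None,
    "HDX_KEY": None,
    "HDX_SITE": "prod",  # assumed default, mirroring the example configuration file
    "EXTRA_PARAMS": None,
    "TEMP_DIR": None,
    "LOG_FILE_ONLY": None,
}


def settings_from_env(environ=None):
    """Collect the settings from a mapping of environment variables."""
    environ = os.environ if environ is None else environ
    settings = {name: environ.get(name, default) for name, default in ENV_DEFAULTS.items()}
    # Treating these two as mandatory is an assumption, not documented behaviour.
    missing = [name for name in ("USER_AGENT", "HDX_KEY") if not settings[name]]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return settings
```

Passing a plain dictionary instead of relying on `os.environ` makes the helper easy to exercise without touching the real environment.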

.github/workflows
config
tests
.coveragerc
.gitignore
Dockerfile
LICENSE
README.md
acled.py
docker-compose.yml
docker-requirements.txt
requirements.txt
run-dev.sh
run-once-dev.sh
run.py
run.sh
run_env
setup.sh
test-requirements.txt
