devinit/iati-covid19-first-prototype
COVID-19 Data

Note that this data is not used for the prod visual

The scraper and data for the prod visual can be found here: https://github.com/OCHA-DAP/hdx-scraper-iati-viz

This scraper extracts data nightly from D-Portal and reprocesses it:

  • selects certain fields and exports them in a clean JSON format
  • converts financial data to USD
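The USD-conversion step can be pictured with a minimal sketch; the rate table and function name here are illustrative assumptions, not the repository's actual code:

```python
# A minimal sketch of the USD-conversion step. The rate table and
# function name are illustrative assumptions, not the real code.

# Hypothetical rates: how many USD one unit of each currency buys.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25, "GBP": 1.5}

def to_usd(value, currency):
    """Convert a transaction value into USD using a flat rate table."""
    rate = RATES_TO_USD.get(currency)
    if rate is None:
        raise KeyError(f"No exchange rate for currency {currency!r}")
    return value * rate

print(to_usd(80.0, "EUR"))  # 100.0
```

In the real pipeline the rate table is built from the CodeforIATI exchange-rate file described under "Data sources" below.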

The scripts in this repository automatically generate fresh data every day (using GitHub Actions); the output can be viewed in (and downloaded from) the gh-pages branch.

For more detail on how the data was processed, see the data notes.

Installing

git clone git@github.com:OCHA-DAP/covid19-data.git
virtualenv ./pyenv
source ./pyenv/bin/activate
pip install -r requirements.txt

Running

Download and reprocess data using the following script. Add --help to see optional arguments.

python run.py

Running with cached rates (avoids downloading a fresh exchange-rate file)

python run.py --cached-rates

Running and deploying to gh-pages

python run.py --deploy
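These flags are plausibly wired up with argparse; here is a minimal sketch, where the flag names come from the commands above and everything else (help text, defaults) is an assumption:

```python
import argparse

def build_parser():
    # Flag names mirror the commands documented above; the help text
    # and defaults are assumptions, not the repository's actual wiring.
    parser = argparse.ArgumentParser(
        description="Download and reprocess IATI COVID-19 data.")
    parser.add_argument("--cached-rates", action="store_true",
                        help="Reuse a previously downloaded exchange-rate file.")
    parser.add_argument("--deploy", action="store_true",
                        help="Push the generated files to the gh-pages branch.")
    return parser

args = build_parser().parse_args(["--cached-rates"])
print(args.cached_rates, args.deploy)  # True False
```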

Overview

The code in this repository runs at 1500 UTC every day, using GitHub Actions. Files are pushed to the gh-pages branch and made available through GitHub Pages. The data is then visualised using software stored in the OCHA-DAP/viz-covid19-visualisation repository, which is also served from GitHub Pages.

Data sources

Data is downloaded from a few places:

  • IATI data: D-Portal
  • FTS data: UNOCHA FTS
  • Codelists: CodeforIATI
  • Exchange Rates: CodeforIATI

These downloads are now reasonably stable, though there are a few things to be aware of:

  • IATI data: D-Portal fairly frequently fails to respond with relevant data. It has become more reliable since we began requesting fewer activities at once and running at 1500 UTC rather than early in the morning (when D-Portal is itself collecting and updating source data). One option would be to switch to the new IATI Datastore (though see the discussion below).
  • FTS data: FTS now seems to be fairly stable, though occasionally the FTS API is unavailable.
  • Codelists: these endpoints are very stable now that the flat files are hosted on GitHub Pages. They are generally much faster to download than the official IATI codelists, and often more up to date.
  • Exchange rates: this file is also now very stable, again because a single compiled flat file is hosted on GitHub Pages. Previously this data was hosted only on morph.io, which had a lot of stability issues; there no longer appear to be any significant problems here.
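The "download or reuse" behaviour behind --cached-rates can be sketched as follows; the cache path and the fetch callable are hypothetical stand-ins for the real download of the CodeforIATI flat file:

```python
import json
import os

def load_rates(fetch, path="rates.json", cached=False):
    """Return exchange rates, preferring a local cache when asked.

    `fetch` stands in for downloading the compiled CodeforIATI flat
    file; `path` is a hypothetical cache location.
    """
    if cached and os.path.exists(path):
        with open(path) as f:
            return json.load(f)   # reuse the cached copy
    rates = fetch()               # download a fresh copy
    with open(path, "w") as f:
        json.dump(rates, f)       # cache it for next time
    return rates
```

With this shape, passing cached=True still falls back to a fresh download when no cache file exists yet.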

Process

The basic process is as follows:

  • run.py:
    • either download or load in a list of exchange rates
    • download data from D-Portal (get_activities_from_urls())
    • filter out activities that have certain problems (activities_filter())
    • filter out activities that don't conform to the IATI COVID-19 Publishing Guidance
    • extract relevant data from each activity (process_activity())
    • write XML data for all activities (write_xml_files())
      • up to 3000 activities per file, labelled activities-N.xml, where N is the page number
    • write XML data for each reporting organisation
    • write out the list of sectors and countries that are used in the data (so that in the user interface we don't display countries or sectors with no activities)
    • download and process FTS data
    • run traceability.py (see below)
    • remove activities.xml (it is used by traceability.py, but it is a very large file and exceeds GitHub usage limits)
  • traceability.py:
    • read in list of exchange rates
    • download TransactionType codelist
    • read in the activities XML (from activities.xml)
    • identify which activities contain explicit COVID-19 transactions
    • extract relevant data from each transaction (make_transaction())
    • export transactions to Excel
    • disaggregate transactions by sector and country (make_sector_country_transactions_data())
    • export disaggregated data to JSON and Excel
    • make grouped traceability data for Sankey diagram
    • export grouped traceability data to JSON and Excel
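The paging step in run.py (up to 3000 activities per activities-N.xml file) can be sketched like this; the function name and the dummy data are assumptions, not the repository's actual code:

```python
def paginate(activities, page_size=3000):
    """Yield (page_number, chunk) pairs, 1-indexed, mirroring the
    activities-N.xml naming scheme described above."""
    for start in range(0, len(activities), page_size):
        yield start // page_size + 1, activities[start:start + page_size]

# 6500 dummy activities would be split across three files:
files = [f"activities-{page}.xml" for page, _ in paginate(range(6500))]
print(files)  # ['activities-1.xml', 'activities-2.xml', 'activities-3.xml']
```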

About

Extracts COVID-19 data from D-Portal and reprocesses it nightly (not used for the prod visual).
