Pull external codes automatically including withdrawn (historic) codes #51
The work done here means that the Country, Currency and FileFormat codelists can now be pulled from source programmatically.
Where IATI derives codelists from external sources we are aiming to pull those codes automatically and directly from their source. However, when those external sources delete or remove values we need to consider how we handle that. (We also need to consider the cases where IATI has agreed to add additional values not in the source e.g. XK=Kosovo on the Country codelist)
We are using the term 'withdrawn' to deal with codes that have been removed from the current source lists. We may also refer to these as historic codes.
We need to make these withdrawn, or historic, values available to data users that wish to report older data. Adding withdrawn codes is important, as it ensures that codes currently in use (and in historical data) are valid against the codelist, even though they may have been subsequently withdrawn.
In the case of the ISO Country and ISO currency sources, they have clearly defined ways of dealing with withdrawn values. Both source lists maintain their own list of withdrawn values, which we plan to now import. As a result, this pull request adds a large number of new (withdrawn) codes to these lists. Consequently, this pull request increases the country codelist from 251 to 308 codes in total; and the Currency codelist from 167 to 300 codes in total.
When we add a withdrawn value to the IATI codelists we flag it by adding a
As a result of adding withdrawn codes, this pull request doesn't remove any codes from the Country and Currency lists (but some existing codes may be listed as withdrawn if that is true in the ISO source lists).
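The merge described above can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: the function name `merge_codelist`, the dictionary layout, and the rule that a code present in the current source list takes precedence over a withdrawn entry are all assumptions made for the example. The withdrawn marker is kept as free text, since its format depends on the source.

```python
from typing import Dict, Iterable, Optional


def merge_codelist(
    current: Iterable[str],
    withdrawn: Dict[str, str],
    iati_extras: Iterable[str],
) -> Dict[str, Optional[str]]:
    """Map each code to its withdrawn marker, or None if the code is current.

    Hypothetical helper: `withdrawn` maps a code to whatever text the
    source provides about its withdrawal.
    """
    merged: Dict[str, Optional[str]] = {}
    # Start with every withdrawn code so that nothing is ever dropped.
    for code, info in withdrawn.items():
        merged[code] = info
    # Assumption: a code in the current source list counts as current,
    # even if it also appears in the withdrawn list.
    for code in current:
        merged[code] = None
    # IATI-specific additions (e.g. XK) are treated as current codes.
    for code in iati_extras:
        merged[code] = None
    return merged


example = merge_codelist(
    current=["GB", "FR"],
    withdrawn={"YU": "illustrative withdrawal note"},
    iati_extras=["XK"],
)
```

Because withdrawn codes are added rather than substituted, the merged list can only grow relative to the current source list.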
Impact for codelist users
Once this change is accepted anyone parsing the XML codelists will, by default, see all the entries in their results - e.g. a drop down selection list will contain all entries. To exclude withdrawn entries you will need to specifically request 'not withdrawn' values.
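As a rough illustration of excluding withdrawn entries when parsing a codelist, the sketch below filters on the presence of a `withdrawn` attribute. The element names (`codelist-item`, `code`), the placement of the attribute on the item element, and the sample values are assumptions made for the example and may not match the published XML exactly.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; structure and attribute value are assumed.
SAMPLE_XML = """<codelist name="Country">
  <codelist-items>
    <codelist-item>
      <code>GB</code>
    </codelist-item>
    <codelist-item withdrawn="illustrative value">
      <code>YU</code>
    </codelist-item>
  </codelist-items>
</codelist>"""


def all_codes(xml_text):
    """Return every code, which is what a naive parser sees by default."""
    root = ET.fromstring(xml_text)
    return [item.findtext("code") for item in root.iter("codelist-item")]


def current_codes(xml_text):
    """Return only the codes whose item carries no withdrawn attribute."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("code")
        for item in root.iter("codelist-item")
        if "withdrawn" not in item.attrib
    ]


print(all_codes(SAMPLE_XML))
print(current_codes(SAMPLE_XML))
```

Testing only for the presence of the attribute, rather than parsing its value, avoids depending on a value format that the source does not guarantee.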
As the codelist API currently stands, consumers of the JSON, CSV, CLv1 XML and CLv2 XML versions of the codelists will not yet see the withdrawn attribute and will therefore be unable to tell which values are current. However, we plan to address this before the change goes live: IATI/IATI-Codelists#79
Altering the codelist XML Schema
The withdrawn attribute should also be added to the XSD in the main codelists repository; see IATI/IATI-Codelists#78. The withdrawn attribute contains whatever information the source has about when the code was withdrawn, so its format is not specified in the schema.
Handling IATI specific codes
Any IATI-specific codes are maintained by adding them to the codelist template file. e.g.
Country codelist note
The withdrawn two-letter codes have been added to this list (these are only guaranteed not to be reused for 50 years), along with the new four-letter codes that these countries are assigned when withdrawn (see https://en.wikipedia.org/wiki/ISO_3166-3 for more information).
Renames on the IANA Media Types list (used for FileFormat)
These are partly because IANA does not treat its codes as case sensitive. In order to maintain faithfulness to the source list, and because no-one is using these codes, I suggest that in this case we make these renames on the FileFormat list without maintaining the old codes as withdrawn.
Relevant GitHub issues
This pull request resolves the following GitHub issues:
Also bumping (because automating the codelist update process is relevant to my interests!)
Finding a way to track withdrawn codes from DAC codelists would also be great. But that shouldn’t be a blocker for this PR! (so I have (re)moved my earlier comments)
@dalepotter @wendyrogers What is the current plan for automating updates to replicated Codelists? With a couple of fairly minor modifications, this could allow a couple of Codelists to be updated, though if the plan is to automate retrieval of a larger number of Codelists then building a separate tool on top of
I would support automation of codelist updates for codelists that we consider to have robust governance and management processes, alongside machine-readable access. This would include the Country and Currency codelists (managed by the ISO). We should better understand the processes that lead to new versions of the FileFormat list (managed by the IANA).
Alongside this, we must have good test coverage for these automated processes. If they are run headless, there should be good logging and notification of actions taken. I've added this scoping to the list of weekly maintenance jobs, so that we can determine the roadmap to implementation.
Just to update on this issue - we are in discussions with the OECD regarding the publication of machine-readable codelists and are meeting to explore this further in early May. From the outcomes of these conversations, we will have a better view on how to take this work forward and we will update here accordingly.
The approach taken in this PR for withdrawn codes really only works for the Currency list, because the source data helpfully includes withdrawn data. It’s a good starting point, though!
A more general approach (i.e. that doesn’t rely on the source data tracking withdrawals) would be something like:
So I think the only third party datasets that don’t have openly licensed & machine readable versions, then, are LocationType, LocationType (category) & PolicySignificance. It’s trivial to scrape the first two – in fact here’s a quick scraper to demonstrate. Output here.
The source URL for PolicySignificance is broken, and I’m afraid I wasn’t able to find a working link.
There are several threads on IATI discuss that talk about the issue of keeping non-embedded codelists up-to-date, and I think it would be great to resolve this. Indeed, much of the work is already done thanks to @Bjwebb, @datasets and #172.