January 2019 DAC codelist updates #283

Open. 7 commits to merge into base: master

andylolz commented Jan 24, 2019

These updates are all from the DAC XML source, available here:

https://webfs.oecd.org/crs-iati-xml/Lookup/

This replaces #249.

samuele-mattiuzzo commented Jan 25, 2019

@andylolz for the BAs' benefit, would you mind providing a diff between your import and the original XML file (if you have something like that available that is)? Do you use the script you suggested we'd use in another PR?

Thank you!

andylolz commented Jan 25, 2019

> Do you use the script you suggested we'd use in another PR?

Nope – I did this manually :) The script in #172 processes the DAC Excel file (well, it processes some CSV on datahub.io, but that comes from the DAC Excel file). @bill-anderson appears to suggest the Excel file should not be used ("more sustainable solution" etc) so this PR uses the XML instead.

> would you mind providing a diff between your import and the original XML file (if you have something like that available that is)?

A diff is maybe tricky because the DAC XML file (available here) is just one big file. But I can explain the steps I went through. I did the following bits of cleanup:

  • Pretty print (with 4 space indentation)
  • Remove DAC namespace elements, because these aren’t valid
  • Remove trailing whitespace in text nodes
  • Replace bad @statuses with either "active" or "withdrawn" (The @status attribute in the DAC XML sometimes contains "active-MCD", "active- Pilot" or "Vonlontary basis" [sic], which are not valid statuses. I’ve flagged this issue with your colleagues and with Valerie by email, but for the purposes of this PR I’ve fixed these manually.)
  • Split into constituent files
  • Add the IATI metadata back (e.g. DAC calls the channel code codelist "Channelcode", whereas IATI calls it "CRSChannelCode")
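
For illustration only, a few of the cleanup steps above (trailing whitespace, bad @status values, pretty printing) could be sketched in Python roughly as follows. This is not the process used for this PR, which was done manually. The function name is hypothetical, and the rule that any status starting with "active" maps to "active" is an assumption; any other unrecognised value is returned for a human to decide rather than guessed.

```python
import xml.etree.ElementTree as ET

VALID_STATUSES = {"active", "withdrawn"}


def clean_codelist(xml_text):
    """Return (cleaned XML string, list of statuses needing manual review)."""
    root = ET.fromstring(xml_text)
    unresolved = []
    for el in root.iter():
        # Remove trailing whitespace in text nodes.
        if el.text is not None:
            el.text = el.text.rstrip()
        status = el.get("status")
        if status is None or status in VALID_STATUSES:
            continue
        # Assumption: variants such as "active-MCD" or "active- Pilot" mean "active".
        if status.strip().lower().startswith("active"):
            el.set("status", "active")
        else:
            # Anything else (e.g. "Vonlontary basis") is left for a human to decide.
            unresolved.append(status)
    # Pretty print with 4 space indentation (ET.indent needs Python 3.9+).
    ET.indent(root, space="    ")
    return ET.tostring(root, encoding="unicode"), unresolved
```

Namespace removal, splitting into constituent files and restoring the IATI metadata are not covered by this sketch.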

I think that’s everything. Here’s what I haven’t done:

  • Reordered codes to match the previous order (which would probably make diffs a bit easier to read)
  • Gone through and checked for removed codes (anything removed is a problem, since it should instead be marked as status="withdrawn")
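
The second unchecked item (finding codes that have been removed outright) could be approached with a small comparison like the one below. It is a sketch under assumptions: it expects both the previous and the new codelist to use `codelist-item` elements with a `code` child, as in the IATI codelist files, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def codes_in(xml_text):
    """Map each code in a codelist to its status attribute (None if absent)."""
    root = ET.fromstring(xml_text)
    codes = {}
    for item in root.iter("codelist-item"):
        code = (item.findtext("code") or "").strip()
        if code:
            codes[code] = item.get("status")
    return codes


def removed_codes(old_xml, new_xml):
    """Codes present in the old codelist but missing from the new one."""
    old, new = codes_in(old_xml), codes_in(new_xml)
    return sorted(set(old) - set(new))
```

Anything this returns would need a manual look, since a removed code should reappear as status="withdrawn" rather than vanish.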

The diff in this PR shows that quite a lot of stuff has changed. I guess that’s mostly because the source has changed from Excel to XML, and there are some mismatches between the two. I think it will be difficult to verify and merge this PR for that reason. If the goal is to eventually use XML from the DAC as the source for these replicated codelists, then I’d be tempted to go back to the DAC technical team with a list of stuff to fix at their end, and use the Excel file as the source in the interim.


I’m very pleased you’re looking at this, because it’s really important that these replicated codelists are kept in sync with source. For instance, a validator might say that a dataset is invalid because a bad sector code is used, when in fact the problem might be that the IATI replicated Sector codelist is out of sync, and doesn’t include a complete list of sector codes. A publisher could also be scored down on the Aid Transparency Index for the same reason. Or an aid management system might rely on these codelists for interpreting published IATI data.

Anyway – I’d be happy to discuss next steps.

samuele-mattiuzzo commented Jan 29, 2019

@andylolz fab, thanks! Petya and the BAs have this on their to-do list; it'll be checked this week!

PetyaKangalova commented Jan 30, 2019

@andylolz thanks so much for your work on this and for clarifying the steps you have undertaken. The crucial bit here is to again get confirmation from the OECD DAC that the Excel and XML files include exactly the same content, which at the moment is not the case! We were promised that the XML would be in sync with the source file. I have copied you in on the email I sent to Valerie from the DAC so that we can get an answer from them and proceed with the changes as soon as possible. Thanks again!

PetyaKangalova commented Jan 31, 2019

We have now received a response from the OECD that the XML file has been updated and that both the Excel and XML files have been pulled from the same source.

> Both xml and xl file have been regenerated (from SQL as unique source, except for Channel codes) and are available on our website http://www.oecd.org/dac/financing-sustainable-development/development-finance-standards/dacandcrscodelists.htm.

From a quick look at the differences I had identified before, the codelists are now identical in the Excel and XML files, so I think we can use the updated XML to update the codelists on the IATI website.

@andylolz Is there a way of easily re-doing what you have done so far with the updated XML file? Then I can review the pull request. If it requires a lot of manual work for you, then I can look into making the comparison and adding the pull requests.

andylolz commented Feb 1, 2019

@PetyaKangalova no problem – I’ll try and get this sorted today.

samuele-mattiuzzo commented Feb 1, 2019

Thank you Andy!

PetyaKangalova commented Feb 1, 2019

Thanks @andylolz! I am off on Monday and in meetings all of Tuesday, but I should be able to review mid-next week! Thanks again!

andylolz commented Feb 1, 2019

Okay – PR updated using the latest (updated) version of DAC XML. I followed the same steps described above.

PetyaKangalova commented Feb 13, 2019

@andylolz thanks again for redoing the commit. Really appreciate it! It took me a while to review all the changes as there are quite a lot of them! See summary below:

  • Aid Type Category: ready to approve changes
  • Aid Type: ready to approve changes
  • CRS Channel Code: Valerie confirmed that the CRS Channel Code is the only one not created from source. In the XML you have used, some existing codes are missing, and I don’t think that is on purpose
  • Collaboration Type: ready to approve changes
  • Finance Type Category: ready to approve changes
  • Finance Type: ready to approve changes
  • Flow Type: ready to approve changes
  • Sector: ready to approve changes except for codes 74010 and 74020
  • Sector Category: need to understand why descriptions have been removed before approving

Next steps:

  1. @andylolz, would you be able to remove the CRS Channel Code section from the pull request for now? I will contact Valerie to get confirmation, but I feel this one will take some time as the XML has not been created from their source database, and I don’t want to hold up the other changes.
  2. I will contact Valerie to get confirmation on whether sector code 74010 has been withdrawn and why descriptions for sector categories have been removed.
  3. Once I get confirmation on 2, I will make the necessary changes and approve the pull request. As I was reviewing the changes I kept track of all of them (whether it was a code addition or a change of name or description). I will then work on adding all changes to the non-embedded codelist changelog (just for the sector codelist there are more than 40 changes so might take us some time).
  4. Once codelist changes and changelog have been approved and deployed, we will add a post on IATI Discuss.
  5. We will also contact publishing tool providers to make them aware of the changes.

andylolz commented Feb 13, 2019

> It took me a while to review all the changes as there are quite a lot of them!

There are indeed! Great work reviewing!

> 1. @andylolz, would you be able to remove the CRS Channel Code section from the pull request for now?

Kk, done.

> • Sector: ready to approve changes except for codes 74010 and 74020

Oh, good spot! The same applies to 41050 (Flood prevention/control), which has also disappeared.

Also, the following withdrawn sector codes have disappeared:

  • 15120: Public sector financial management
  • 15140: Government administration
  • 15161: Elections
  • 15162: Human rights
  • 15163: Free flow of information
  • 15164: Women's equality organisations and institutions
  • 23010: Energy policy and administrative management
  • 23020: Power generation/non-renewable sources
  • 23030: Power generation/renewable sources
  • 23040: Electrical transmission/ distribution
  • 23050: Gas distribution
  • 23061: Oil-fired power plants
  • 23062: Gas-fired power plants
  • 23063: Coal-fired power plants
  • 23064: Nuclear power plants
  • 23065: Hydro-electric power plants
  • 23066: Geothermal energy
  • 23067: Solar energy
  • 23068: Wind power
  • 23069: Ocean power
  • 23070: Biomass
  • 23081: Energy education/training
  • 23082: Energy research
  • 92010: Support to national NGOs
  • 92020: Support to international NGOs
  • 92030: Support to local and regional NGOs

They may have been replaced by other codes, but the idea is they’re supposed to remain in perpetuity as status="withdrawn".
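
As a rough illustration of that idea, a script could carry forward any code missing from the new file and mark it as withdrawn. This is only a sketch: the function name is hypothetical, it assumes the IATI layout of `codelist-item` elements inside a `codelist-items` container, and whether a given missing code really should be withdrawn still needs confirming with the DAC.

```python
import copy
import xml.etree.ElementTree as ET


def carry_forward_withdrawn(old_xml, new_xml):
    """Re-add codes missing from new_xml, marked status="withdrawn".

    Returns (merged XML string, list of restored codes).
    """
    old_root = ET.fromstring(old_xml)
    new_root = ET.fromstring(new_xml)
    # Assumption: items live under <codelist-items>; fall back to the root.
    container = new_root.find(".//codelist-items")
    if container is None:
        container = new_root
    present = {
        (item.findtext("code") or "").strip()
        for item in new_root.iter("codelist-item")
    }
    restored = []
    for item in old_root.iter("codelist-item"):
        code = (item.findtext("code") or "").strip()
        if code and code not in present:
            kept = copy.deepcopy(item)  # keep name/description as they were
            kept.set("status", "withdrawn")
            container.append(kept)
            restored.append(code)
    return ET.tostring(new_root, encoding="unicode"), restored
```

The restored items are appended at the end, so some reordering would still be needed to keep diffs readable.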

andylolz commented Feb 13, 2019

> 3. just for the sector codelist there are more than 40 changes so might take us some time

I’ve mentioned elsewhere that I’m in favour of scrapping this changelog. I’m unconvinced it’s worth your time. It wasn’t updated for the last DAC codelist update (see: IATI/IATI-Guidance#312) so it’s only a partial list of changes anyway.

> 5. We will also contact publishing tool providers to make them aware of the changes.

Okay – this is very generous of you, but again I don’t think this should be standard practice. Tool providers should be keeping an eye on IATI Discuss, or routinely pulling from source. That’s the system as documented. If they start relying on updates from you then that just becomes an extra overhead for you.

PetyaKangalova commented Feb 13, 2019

@andylolz

> Kk, done.

Thank you!

> Oh, good spot! The same applies to 41050 (Flood prevention/control), which has also disappeared.

Thank you for flagging. I missed this one!

> Also, the following withdrawn sector codes have disappeared:
>
>   • 15120: Public sector financial management
>   • 15140: Government administration
>   • 15161: Elections
>   • 15162: Human rights
>   • 15163: Free flow of information
>   • 15164: Women's equality organisations and institutions
>   • 23010: Energy policy and administrative management
>   • 23020: Power generation/non-renewable sources
>   • 23030: Power generation/renewable sources
>   • 23040: Electrical transmission/ distribution
>   • 23050: Gas distribution
>   • 23061: Oil-fired power plants
>   • 23062: Gas-fired power plants
>   • 23063: Coal-fired power plants
>   • 23064: Nuclear power plants
>   • 23065: Hydro-electric power plants
>   • 23066: Geothermal energy
>   • 23067: Solar energy
>   • 23068: Wind power
>   • 23069: Ocean power
>   • 23070: Biomass
>   • 23081: Energy education/training
>   • 23082: Energy research
>   • 92010: Support to national NGOs
>   • 92020: Support to international NGOs
>   • 92030: Support to local and regional NGOs

Yes, I agree! I also noticed that there were a few new 'withdrawn' as of 2015 that were not on the IATI list. As they are already withdrawn I was not so concerned but it means the XML is not consistent.

On your point about the changelog, I agree that it is a lot of effort. However, this time round there are quite a lot of new codes, and it will be important to alert people to which ones those are and to make sure organisations can start using them via the various publishing tools. Hence dropping them a quick email to speed up the process, but it is indeed the responsibility of the tool providers to keep their tools up to date.

Waiting to hear from Valerie and will then action the changes!

andylolz commented Feb 13, 2019

Excellent – all good!

> Yes, I agree! I also noticed that there were a few new 'withdrawn' as of 2015 that were not on the IATI list. As they are already withdrawn I was not so concerned but it means the XML is not consistent.

Yes that’s true, but I’d expect DAC to have a better record of withdrawn codes than IATI (since IATI only started recording these relatively recently). So withdrawn codes in the XML that were not previously known to IATI are probably a good thing :)

@@ -3090,6 +3102,18 @@
</description>
<category>520</category>
</codelist-item>
<codelist-item status="withdrawn" activation-date="2018-01-01">
<code>52020</code>

PetyaKangalova commented Feb 20, 2019

I am not sure why this was added as withdrawn. I cannot find it in our existing codelist, and it seems to have been withdrawn relatively recently. I will check with Valerie whether it was done in error. It will not have implications for users as it is withdrawn anyway, but it is strange.

andylolz commented Feb 20, 2019

@PetyaKangalova: This document (from November 2018) says:

> The code 52020 “Household food security programmes” previously approved at the July 2017 WP-STAT meeting but not yet in effect, is moved under the “other multisector” category (430) and assigned a new code 43072.

You can download a copy of the XLS as it stood in August 2017 from here:
https://github.com/datasets/dac-and-crs-code-lists/blob/0a354cefc5123cafd2b79aecb2cc31bb9753f9a9/source/codelists.xls

It doesn’t include the code, so perhaps “not yet in effect” means it didn’t actually ever make it onto the codelist. Not sure.

andylolz commented Feb 20, 2019

BTW I tried to use the dashboard to check whether 52020 is used in IATI data, but found a bug 😞

PetyaKangalova commented Feb 21, 2019

Thanks @andylolz. I just got a response from Valerie, which is in line with the comment you have provided. I think this code was never added to the DAC Excel spreadsheet, as there is normally a time lag between approval and addition to the list; hence it was never replicated.

> The code 52020 is a special case. Code 52020 has been approved in the context of aligning purpose codes to SDGs in the nutrition area in 2017 (DCD/DAC/STAT/RD(2017)11), to be implemented in 2019 on 2018 flows. But in the context of revision of purpose codes in light of SDGs in food safety and food security area, this purpose code has been recoded in 43072 (DCD/DAC/STAT(2018)40/REV2) also to be implemented in 2019 on 2018 flows.
> We have decided to stop to disseminate it as it is confusing.

As such, I don't think we need to be adding it.

About the Dashboard, I was also unable to identify the publishers using this code.

PetyaKangalova left a comment

Approving the full pull request following a few revisions and updated commits. Many thanks @andylolz

Leaving for @IATI/devs to merge and deploy next week.
