Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop blank columns and sheets when flattening #241

Closed
wants to merge 9 commits into from

Conversation

Bjwebb
Copy link
Member

@Bjwebb Bjwebb commented Oct 10, 2018

Relevant issue is in CoVE repo: OpenDataServices/cove#1019

(ie. don't pull extra columns from the schema)

A --empty-schema-columns flag is provided for commandline users who want the previous behaviour.

@odscjames
Copy link
Contributor

@Bjwebb , if you look at the branch "cove-1019-no-blank-sheets-docs-and-tests" you'll see that I've added a bit to the docs. I also added a new test. Now, with the changes to examples/flatten/simple tests that you did and my new test we have a test for this mode being both off and on. Are you happy to accept that? If so, feel free to merge it to this branch/P.R. with any changes you want. I haven't looked at detail to the code changes you've made, but I trust you on those as long as we have some clear docs and tests for both modes to help any future developers working on this.

@robredpath
Copy link
Member

@odscjames - @Bjwebb is unlikely to see or respond to your comment here. Good work, though!

It sounds like this is ready to go. For the 4 CoVEs:

  • IATI CoVE - talk to @stevieflow and @rory09 . I think they already work around this, and XML -> xlsx is less common for them anyway
  • 360Giving CoVE - talk to @stevieflow, but I think this is both welcome and expected.
  • OCDS CoVE - wait another week for comments from the community, then check in with @duncandewhurst
  • BODS CoVE - check with @ScatteredInk, if that's even a thing yet!

@stevieflow
Copy link
Member

Thanks

I can only see http://1019-no-blank-sheets.dev.cove.opendataservices.coop/validator/ to test for OCDS, so unsure if there should be a version for other CoVEs @odscjames @robredpath ?

360 - I can see from this : https://dataquality.threesixtygiving.org/data/59f09a1f-74cc-4427-a3cb-e9a6dd3426b7 - that a JSON file converted to a spreadsheet converted from JSON includes lots of extra sheets that are blank. Ill confirm with 360 ( @KDuerden @morchickit ) that we should only output populated sheets

IATI XML -> xlsx : I'm not sure we have this behaviour. When I convert an XML file that does not cover all parts of teh standard: http://iati.cove.opendataservices.coop/data/3105173d-22a4-4522-8e66-b4056b3b7fd8 - I receive a spreadsheet that has populated sheets. There are no blank sheets. Is this expected @Bjwebb ?

@stevieflow
Copy link
Member

Re: 360 data JSON > xlsx - @calummacuisdean may have an opinion, as SCVO publish 360 as JSON.

If I convert an SCVO JSON file to spreadsheet: https://dataquality.threesixtygiving.org/data/3b4a76c8-91aa-4786-b6cf-6c5d0e51fb3f --> is it useful to have all the blank fields from the standard, or safe to drop them, as per this PR...?

@odscjames
Copy link
Contributor

I have added my doc and test directly into this pull request.

@kindly
Copy link
Contributor

kindly commented Oct 19, 2018

@stevieflow yes for XML data the empty sheets will not be made as we do not look at the iati schema when flattening the data. So no extra columns/sheets are made.

I am not sure what the correct default behaviour here is. This pull request changes the default behaviour to ignore empty sheets/columns that are in the schema, but maybe the correct default behaviour is to keep flattentool as it is and have the flag be --remove-empty-schema-columns.

For each standard in cove we can then decide if we would like to switch this on or not.

@odscjames
Copy link
Contributor

Discussion with @kindly - I'm going to do a new pull request, keeping behaviour the same as he suggests

@odscjames
Copy link
Contributor

This has now been done by #268, and is now closed.

@odscjames odscjames closed this Oct 24, 2018
@BibianaC BibianaC deleted the cove-1019-no-blank-sheets branch April 7, 2020 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants