This repository has been archived by the owner on Mar 27, 2024. It is now read-only.

[clinical] Capture provenance of the collection-specific columns #24

Closed
fedorov opened this issue Mar 22, 2022 · 1 comment

@fedorov
Member

fedorov commented Mar 22, 2022

Per discussion yesterday, it will be very useful to capture the provenance of the items in the per-collection metadata dictionary:

  • what file they are coming from (+hash of that file)
  • if the above is a zip file - which file from the zip file they come from
  • what sheet, if applicable
  • if there is a hierarchy of column names - include that hierarchy
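To make the list above concrete, here is a minimal sketch of what one provenance record per column could look like. The class name `ColumnProvenance`, its field names, and the example file, sheet, and column values are all hypothetical, chosen only for illustration; SHA-256 is assumed as the hash, since the issue does not say which one to use.

```python
import hashlib
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass(frozen=True)
class ColumnProvenance:
    """Hypothetical record describing where one dictionary item came from."""
    source_file: str                       # file the column was read from
    source_file_sha256: str                # hash of that file (SHA-256 assumed)
    archive_member: Optional[str] = None   # file inside the zip, if source_file is a zip
    sheet: Optional[str] = None            # spreadsheet sheet name, if applicable
    column_hierarchy: Tuple[str, ...] = () # nested column names, outermost first


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


# Example values are made up; a real pipeline would hash the actual file contents.
record = ColumnProvenance(
    source_file="clinical_data.zip",
    source_file_sha256=sha256_of_bytes(b"example file contents"),
    archive_member="batch1/patients.xlsx",
    sheet="Demographics",
    column_hierarchy=("Patient", "Age"),
)
print(asdict(record))
```

The optional fields default to empty so that a plain CSV source (no zip, no sheet, flat column names) can be described with just the file name and its hash.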
@G-White-ISB

Note: when the clinical data for one collection is split into two batches, each column will have two source columns. Sometimes the original column name differs between the sources, e.g. one batch may use 'Age' and another 'age', but they both map to the 'age' column in the final BQ table. But yes, we can handle such situations in the captured provenance.
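
A rough sketch of how that could be recorded, assuming each final column simply keeps a list of its sources. The batch file names and the `normalize` helper (trim and lowercase) are hypothetical stand-ins; the actual mapping from original names to BQ column names may work differently.

```python
from collections import defaultdict

# Hypothetical input: (source file, original column name) pairs from two batches.
sources = [
    ("batch1.xlsx", "Age"),
    ("batch2.xlsx", "age"),
    ("batch1.xlsx", "Sex"),
]


def normalize(name: str) -> str:
    """Assumed mapping from an original column name to the final column name."""
    return name.strip().lower()


# Final column name -> list of every source column that feeds it.
provenance = defaultdict(list)
for file_name, original in sources:
    provenance[normalize(original)].append(
        {"source_file": file_name, "original_column": original}
    )

for final_column, entries in provenance.items():
    print(final_column, entries)
```

Keeping the original spelling in each entry means the 'Age' versus 'age' difference is preserved even though both land in the same final column.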

fedorov closed this as completed Jun 1, 2022