-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creating RFC to query by metadata schema versions so that components are notified only of data that they are compatible with. #90
Conversation
@maniarathi I'm offering to shepherd here, if needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maniarathi and I agreed that pinning the version would be optional and I think this should be explicitly stated here. Apologies if I missed it. I also think this got moved too quickly from the Google doc to the RFC stage. There is ongoing discussion in the doc.
We propose adding two new required fields to each list of schema properties: | ||
|
||
``` | ||
"schema_major_version": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What schema file exactly in would this addition happen in? The same schema that specifies "describedBy"?
Also, consider a nested approach:
"schema_version": {
"major": 1,
"minor": 2
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears in the provenance schema. https://github.com/HumanCellAtlas/metadata-schema/blob/master/json_schema/system/provenance.json
What's the benefit of the nested approach versus non-nested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree that this is a more straight-forward representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan is that this will be in the provenance section of the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will have to run this by the metadata folks... @simonjupp what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to have schema version fields in provenance. We want to ultimately move describedBy
there, as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the question was about the nested versus non-nested approach to specifying the schema version. Definitely will be in the provenance block and I made a note as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nit for me and my approval does not depend on it. I think the nesting makes sense and should be easy to implement. It would prevent the proliferation of related schema_…
properties in the provenance section, for example if we add the revision number in the future. If this nit is addressed, the example queries below would have to be adjusted (s/schema_/schema.
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sounds good! Still working on getting verification. The specific part I am unsure about is whether having nested properties means that I would have to create an entire new schema for the schema_version property. From pretty much every other example I've looked at, it seems as though I would have to create an entirely new schema_version.json schema which I would rather not do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically you can nest the schemas for nested objects but I don't know if that's against the MD teams's rules. Even if it were, adding a module (that's what they call schemas for nested objects) would allow it to be reused in other places. And it would not be hard to do. I'll file the MD PR if that helps.
953ae53
to
dce82f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please describe that these fields are added to system/provenance.json and the rationale being that it can be added automatically by ingest.
That plus hannes' changes and I will give this an enthusiastic thumbs up
It would also be good to provide a common library to implement create the subscriptions, rather than have each component recode the same thing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update example queries to reflect intended use case, but otherwise, sounds great to me!
ae43b90
to
e7124df
Compare
@diekhans This is definitely something I consider a "nice-to-have" rather than a feature to stick in the MVP :) I added a section to the end of the RFC on Future Work and included the automatic generation there. |
e7124df
to
e9abcdb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
@diekhans Modified to add the note about the schema versions being placed into the provenance block. |
Many people are confused about usage here: * #91 (comment) * #90 (comment) * ... have observed others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some outstanding questions about the schema_revision_number
and schema_branch
fields, but I trust everyone else to reach a decision after I leave. So, happy to approve!
We propose adding two new required fields to each list of schema properties: | ||
|
||
``` | ||
"schema_major_version": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nit for me and my approval does not depend on it. I think the nesting makes sense and should be easy to implement. It would prevent the proliferation of related schema_…
properties in the provenance section, for example if we add the revision number in the future. If this nit is addressed, the example queries below would have to be adjusted (s/schema_/schema.
).
* Hint that the title name should be replaced Many people are confused about usage here: * #91 (comment) * #90 (comment) * ... have observed others * Update rfc-template.md * Update rfc-template.md
47793da
to
a3935ad
Compare
70a6471
to
664ac9b
Compare
* Hint that the title name should be replaced Many people are confused about usage here: * HumanCellAtlas#91 (comment) * HumanCellAtlas#90 (comment) * ... have observed others * Update rfc-template.md * Update rfc-template.md
RFC approved during the 2019-08-23 Technical Architecture Team meeting. |
…are notified only of data that they are compatible with.
664ac9b
to
ba59206
Compare
ba59206
to
38d6618
Compare
### User Stories | ||
|
||
*Share the [User Stories](https://www.mountaingoatsoftware.com/agile/user-stories) motivating this RFC.* | ||
* As a data wrangle, I would like to be able to push new data adhering to the latest metadata schema without having to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/wrangle/wranger/
this document simply designs the minimum system required to unblock wranglers from submitting data and protecting | ||
downstream components (and therefore the entire DCP) from breaking due to incompatibility. | ||
|
||
Note: the subscription modifications will only exist in the production environment. Once the metadata schema integration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this mechanism should only run in production.
Integration should eagerly surface incompatibilities, sure. Chaos is more acceptable there.
However, the purpose of staging is to replicate as closely as possible the production environment, and be our "smoke test" before production deployment. This means it should bias towards stability, and use, as closely as possible, all the mechanisms that production uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
* Hint that the title name should be replaced Many people are confused about usage here: * HumanCellAtlas#91 (comment) * HumanCellAtlas#90 (comment) * ... have observed others * Update rfc-template.md * Update rfc-template.md
…are notified only of data that they are compatible with. (HumanCellAtlas#90) * Creating RFC to query by metadata schema versions so that components are notified only of data that they are compatible with. * Creating link now that the RFC is approved.
August 12: Last day for community reviewAugust 23: Last day for oversight reviewRFC approved during the 2019-08-23 Technical Architecture Team meeting.
During community review, reviewers