Creating RFC to query by metadata schema versions so that components are notified only of data that they are compatible with. #90

Merged: 2 commits, Aug 23, 2019

maniarathi (Contributor) commented Jul 25, 2019

August 12: Last day for community review
August 23: Last day for oversight review
RFC approved during the 2019-08-23 Technical Architecture Team meeting.

During community review, reviewers

  • corrected errors in some sample Elasticsearch queries and improved the style of others
  • debated whether to nest the major and minor components of the schema version
  • fixed some links
  • discussed how best to notify teams of changes to the metadata schema
  • discussed the benefits of having schema branch and revision numbers in the metadata; this was deferred to a separate RFC in the interest of moving forward with an MVP

mweiden (Contributor) commented Jul 25, 2019

@maniarathi I'm offering to shepherd here, if needed

hannes-ucsc left a comment

@maniarathi and I agreed that pinning the version would be optional and I think this should be explicitly stated here. Apologies if I missed it. I also think this got moved too quickly from the Google doc to the RFC stage. There is ongoing discussion in the doc.

rfcs/text/0000-query-by-metadata-schema-versions.md (outdated)
We propose adding two new required fields to each list of schema properties:

```
"schema_major_version": {
```

In exactly which schema file would this addition happen? The same schema that specifies "describedBy"?

Also, consider a nested approach:

"schema_version": {
    "major": 1,
    "minor": 2
}

Contributor Author

This appears in the provenance schema. https://github.com/HumanCellAtlas/metadata-schema/blob/master/json_schema/system/provenance.json

What's the benefit of the nested approach versus non-nested?

Contributor

I would agree that this is a more straight-forward representation.

The plan is that this will be in the provenance section of the doc.

Contributor Author

I will have to run this by the metadata folks... @simonjupp what do you think?

Member

I think it's fine to have schema version fields in provenance. We want to ultimately move describedBy there, as well.

Contributor Author

I think the question was about the nested versus non-nested approach to specifying the schema version. Definitely will be in the provenance block and I made a note as well.

This is a nit for me and my approval does not depend on it. I think the nesting makes sense and should be easy to implement. It would prevent the proliferation of related schema_… properties in the provenance section, for example if we add the revision number in the future. If this nit is addressed, the example queries below would have to be adjusted (s/schema_/schema.).
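To make the effect on the example queries concrete, here is a minimal Python sketch of the same version filter written against both layouts. It is an illustration only: the `provenance` path prefix, the helper names, and the assumption that the version object is indexed as a plain object field (so dotted paths work) are not taken from the RFC, whose actual queries may differ.

```python
def flat_version_filter(major, prefix="provenance"):
    """Term filter for the flat layout: <prefix>.schema_major_version.

    The prefix is a hypothetical document path, not one given in the thread.
    """
    return {"term": {f"{prefix}.schema_major_version": major}}


def nested_version_filter(major, prefix="provenance"):
    """Term filter for the nested layout: <prefix>.schema_version.major."""
    return {"term": {f"{prefix}.schema_version.major": major}}
```

Only the field path changes between the two; the rest of the query body stays the same, which is why the adjustment to the example queries would be mechanical.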

Contributor Author

Ok sounds good! Still working on getting verification. The specific part I am unsure about is whether having nested properties means that I would have to create an entire new schema for the schema_version property. From pretty much every other example I've looked at, it seems as though I would have to create an entirely new schema_version.json schema which I would rather not do.

Technically you can nest the schemas for nested objects, but I don't know if that's against the MD team's rules. Even if it were, adding a module (that's what they call schemas for nested objects) would allow it to be reused in other places. And it would not be hard to do. I'll file the MD PR if that helps.
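As a rough illustration of the inline option, the sketch below declares the nested version object directly inside a properties mapping, written as a Python dict. The property layout, the constraints, and the `conforms` helper are all assumptions for illustration; the helper is a hand-rolled check that mirrors this one fragment, not a JSON Schema validator, and nothing here reflects the metadata team's actual conventions.

```python
# Hypothetical fragment of a "properties" mapping in which the version
# object is declared inline instead of in a separate module file.
PROVENANCE_PROPERTIES = {
    "schema_version": {
        "type": "object",
        "required": ["major", "minor"],
        "properties": {
            "major": {"type": "integer", "minimum": 0},
            "minor": {"type": "integer", "minimum": 0},
        },
        "additionalProperties": False,
    }
}


def conforms(version):
    """Check a value against the inline fragment above (illustrative only)."""
    spec = PROVENANCE_PROPERTIES["schema_version"]
    if not isinstance(version, dict):
        return False
    # Every declared property is required and no extras are allowed,
    # so the key set must match exactly.
    if set(version) != set(spec["required"]):
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(
        isinstance(v, int) and not isinstance(v, bool) and v >= 0
        for v in version.values()
    )
```

For example, `conforms({"major": 1, "minor": 2})` passes, while a value with a missing or extra key does not.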

@maniarathi maniarathi force-pushed the maniarathi-schema-version-query-rfc branch from 953ae53 to dce82f5 Compare July 26, 2019 21:56
diekhans (Contributor) left a comment

Please state that these fields are added to system/provenance.json, the rationale being that they can be added automatically by ingest.

That plus hannes' changes and I will give this an enthusiastic thumbs up

diekhans (Contributor)

It would also be good to provide a common library to create the subscriptions, rather than have each component recode the same thing.
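A shared helper of the kind suggested here might look like the following sketch. Everything in it is hypothetical: the function name, the `provenance` path prefix, the `schema_minor_version` field name (inferred by analogy with `schema_major_version`), and the compatibility rule (one exact major version, optionally capped at a maximum minor version) are assumptions, not part of the RFC.

```python
def build_subscription_query(major, max_minor=None, prefix="provenance"):
    """Build an Elasticsearch-style query body matching compatible metadata.

    Assumed rule: a component supports exactly one major version and,
    optionally, minor versions up to and including max_minor.
    """
    must = [{"term": {f"{prefix}.schema_major_version": major}}]
    if max_minor is not None:
        must.append(
            {"range": {f"{prefix}.schema_minor_version": {"lte": max_minor}}}
        )
    return {"query": {"bool": {"must": must}}}
```

Each component would then call the helper with the versions it supports, for example `build_subscription_query(1, max_minor=3)`, rather than hand-writing the query.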

calvinnhieu left a comment

Update example queries to reflect intended use case, but otherwise, sounds great to me!

@maniarathi maniarathi force-pushed the maniarathi-schema-version-query-rfc branch 2 times, most recently from ae43b90 to e7124df Compare July 29, 2019 20:10
maniarathi (Contributor Author)

> It would also be good to provide a common library to create the subscriptions, rather than have each component recode the same thing.

@diekhans This is definitely something I consider a "nice-to-have" rather than a feature to stick in the MVP :) I added a section to the end of the RFC on Future Work and included the automatic generation there.

@maniarathi maniarathi force-pushed the maniarathi-schema-version-query-rfc branch from e7124df to e9abcdb Compare July 30, 2019 18:12
calvinnhieu left a comment

Looks good!

maniarathi (Contributor Author)

@diekhans Modified to add the note about the schema versions being placed into the provenance block.

@maniarathi maniarathi removed the request for review from malloryfreeberg July 30, 2019 18:18
mweiden added a commit that referenced this pull request Jul 30, 2019
Many people are confused about usage here:
* #91 (comment)
* #90 (comment)
* ... have observed others
malloryfreeberg (Member) left a comment

I have some outstanding questions about the schema_revision_number and schema_branch fields, but I trust everyone else to reach a decision after I leave. So, happy to approve!

mweiden added a commit that referenced this pull request Jul 31, 2019
* Hint that the title name should be replaced

Many people are confused about usage here:
* #91 (comment)
* #90 (comment)
* ... have observed others

* Update rfc-template.md

* Update rfc-template.md
@maniarathi maniarathi force-pushed the maniarathi-schema-version-query-rfc branch from 47793da to a3935ad Compare August 1, 2019 14:31
@maniarathi maniarathi force-pushed the maniarathi-schema-version-query-rfc branch from 70a6471 to 664ac9b Compare August 2, 2019 22:02
diekhans pushed a commit to barkasn/dcp-community that referenced this pull request Aug 16, 2019
* Hint that the title name should be replaced

Many people are confused about usage here:
* HumanCellAtlas#91 (comment)
* HumanCellAtlas#90 (comment)
* ... have observed others

* Update rfc-template.md

* Update rfc-template.md
mweiden (Contributor) commented Aug 23, 2019

RFC approved during the 2019-08-23 Technical Architecture Team meeting.

…are notified only of data that they are compatible with.
@maniarathi maniarathi force-pushed the maniarathi-schema-version-query-rfc branch from 664ac9b to ba59206 Compare August 23, 2019 23:17
@maniarathi maniarathi force-pushed the maniarathi-schema-version-query-rfc branch from ba59206 to 38d6618 Compare August 23, 2019 23:19
@maniarathi maniarathi merged commit 52c0b55 into master Aug 23, 2019
@maniarathi maniarathi deleted the maniarathi-schema-version-query-rfc branch August 23, 2019 23:19
### User Stories

*Share the [User Stories](https://www.mountaingoatsoftware.com/agile/user-stories) motivating this RFC.*
* As a data wrangle, I would like to be able to push new data adhering to the latest metadata schema without having to

Member

s/wrangle/wrangler/

this document simply designs the minimum system required to unblock wranglers from submitting data and protecting
downstream components (and therefore the entire DCP) from breaking due to incompatibility.

Note: the subscription modifications will only exist in the production environment. Once the metadata schema integration

Member

I don't believe this mechanism should only run in production.
Integration should eagerly surface incompatibilities, sure. Chaos is more acceptable there.
However, the purpose of staging is to replicate as closely as possible the production environment, and be our "smoke test" before production deployment. This means it should bias towards stability, and use, as closely as possible, all the mechanisms that production uses.

+1

diekhans pushed a commit to diekhans/dcp-community that referenced this pull request Oct 31, 2019
* Hint that the title name should be replaced

Many people are confused about usage here:
* HumanCellAtlas#91 (comment)
* HumanCellAtlas#90 (comment)
* ... have observed others

* Update rfc-template.md

* Update rfc-template.md
diekhans pushed a commit to diekhans/dcp-community that referenced this pull request Oct 31, 2019
…are notified only of data that they are compatible with. (HumanCellAtlas#90)

* Creating RFC to query by metadata schema versions so that components are notified only of data that they are compatible with.

* Creating link now that the RFC is approved.