RFC: Ensure dependent services are set up in production before allowing a PR requiring schema changes to be merged #109

orta · 2018-11-15T17:37:55Z

Proposal:

An example might be easier to think about:

Let's say you have made changes to Exchange (which Metaphysics depends on), adding a new field to a type in its API. You will need to make a PR to Metaphysics to merge those changes into the Metaphysics global schema. When you make this PR, if the changes in Exchange haven't been shipped to production, then your PR to Metaphysics will fail.
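
A minimal sketch of the check described above, assuming each schema is reduced to a map from type name to field names. The names `SchemaFields` and `findMissingFields` and the `Order`/`sellerNote` example are hypothetical, and this is not the validation script from artsy/force#3061; a real check would more likely compare full GraphQL schemas.

```typescript
// Hypothetical, simplified schema shape: type name -> field names.
type SchemaFields = Record<string, string[]>;

// Returns "Type.field" entries that the PR relies on but that the
// upstream service's production schema does not expose yet.
function findMissingFields(
  requiredByPR: SchemaFields,
  upstreamProduction: SchemaFields
): string[] {
  const missing: string[] = [];
  for (const typeName of Object.keys(requiredByPR)) {
    const deployed = upstreamProduction[typeName] || [];
    for (const field of requiredByPR[typeName] || []) {
      if (deployed.indexOf(field) === -1) {
        missing.push(typeName + "." + field);
      }
    }
  }
  return missing;
}

// Illustrative data only: Exchange production lacks the new field,
// so the Metaphysics PR should be blocked.
const exchangeProduction: SchemaFields = { Order: ["id", "state"] };
const metaphysicsPR: SchemaFields = { Order: ["id", "sellerNote"] };

const missingFields = findMissingFields(metaphysicsPR, exchangeProduction);
console.log(
  missingFields.length === 0
    ? "ok to merge"
    : "blocked, not in production yet: " + missingFields.join(", ")
);
```

An empty result means everything the PR needs is already in production; anything else is a reason to fail the PR check.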

Roughly

If you want to make a change further down this list, the things above you need to be deployed in prod when you are making GraphQL schema changes.

This is currently happening on a Force deploy with artsy/force#3061, but we'd like to move that behavior to run on Metaphysics for its dependencies (and eventually in Emission/Reaction too).

Reasoning

This came up in the platform practice: the ability to be sure that all your dependencies are set up is valuable enough to justify introducing some friction to the process.

Doing this:

- Ensures we move dependencies into production more often
- Ensures that PRs which introduce changes to the system won't be deployed prematurely just because another team working in the same repo does a deploy

Additional Context:

You can see our discussion in slack here

mbilokonsky · 2018-11-15T17:47:55Z

I think this is a good idea. We have to be able to reason in isolation, and shouldn't be relying on remembering what got deployed to what server in what order etc. The fact is that if the MP PR under discussion got merged and deployed then there would be a crash, right? Seems like a simple win/win to me - it doesn't add complexity, that complexity is already there, it just provides a formal mechanism for dealing with it safely.

joeyAghion · 2018-11-16T17:48:52Z

If possible, I'd like to expand this RFC to generalize it from schema changes to all changes. (Otherwise I'd have to make a separate RFC 😄.)

The GraphQL validation script enforces this for schema changes. A human process should enforce the same expectation when it comes to other dependencies. I.e., we should model and encourage pull requests to only be opened when their prerequisites are already in production. (This brings the same benefits described above of deploys being frequent, low-risk, and fast.)

Of course, [WIP] PRs are still acceptable for early feedback or whatever, but they should be clearly labeled as such and/or include blocking failures.

mzikherman · 2018-11-16T18:03:53Z

> I.e., we should model and encourage pull requests to only be opened when their prerequisites are already in production.

This (anecdotally, in my own experience) differs from the 'ideal' developer workflow, which is to make some backend/MP changes in concert with a front-end update. Sometimes you don't exactly know if everything is truly ready to go on the backend/MP as you're working on the front-end (Reaction component). Then, when you have something working on the front-end and you're more confident in your backend/MP changes, they're generally ready to PR at the same time.

So requiring the backend/MP (and within that group, backend and then MP) to be deployed to production before any front-end PR that consumes it feels a bit less than ideal, and adds some friction.

If people do want that, then 👍 all the better, I just wanted to point out that the ideal workflow (IMO) conflicts with this slightly, so we should figure out a way to enforce/encourage this. Otherwise it might be hard to keep that discipline up.

joeyAghion · 2018-11-16T18:18:24Z

@mzikherman would designating those downstream PRs as works-in-progress (i.e., unmergeable) work? Or does the ideal workflow require that those PRs be merged?

dleve123 · 2018-11-19T17:19:55Z

+1 to this!

I think solid next steps would be:

- Add comparable checks in Reaction and Emission (to protect end-clients)
- Add a check to MP

I think there's slightly less risk to the MP side ATM as at least Exchange has some tooling to enforce _schema.yml updates, so I would prioritize Reaction and Emission first.

dleve123 · 2018-11-19T17:25:45Z

Re #109 (comment)

I'm all for ratifying this deployment philosophy, but I think there would need to be some tooling / workflow to support this. Otherwise, I think it would be incredibly hard to operationalize behavior change. The fact that the GraphQL ecosystem has built tooling like findBreakingChanges makes me feel more confident that we can actually operationalize this philosophy for GraphQL-based dependencies.

@joeyAghion Do you have any thoughts on how this philosophy could be operationalized? I think planning rigor in JIRA could be an aid here and services like Horizon help a ton, but nothing comes to mind as some sort of definitive assurance check.

orta · 2018-11-19T17:36:42Z

> Do you have any thoughts on how this philosophy could be operationalized?

We should have all the tools necessary to do it on CI (local schemas, tagged releases) that can let Peril or Danger figure it out at PR time.

joeyAghion · 2018-11-20T19:12:32Z

We should definitely enforce what can be done via tools. As far as non-schema dependencies, I first wanted to see if there was agreement about the goal. It sounds like there is, except for @mzikherman's observation about how it hampers the ideal dev workflow. Matt, see my question above about that.

If there's consensus, we should at least update the playbook to explicitly state the expectation that prerequisite releases are complete before PRs can be merged.

I think it is possible to set an example about this and casually enforce it via PR comments like:

> This should be [WIP] until ... is released.

Or, harsher:

> Closed pending ... release.

Or, punnier:

> Don't forget PreRequisites when making a PR

ashfurrow · 2018-11-21T20:09:50Z

I'm 👍 on the schema change requirements; I'm also 👍 on the changes Joey mentioned as long as they're automated by tools. I think we should prioritize getting the schema requirements set up, though.

mzikherman · 2018-11-21T20:28:16Z

@joeyAghion designating as WIP is great! (My #minor point was about wanting to open PRs at the same time; designating some as blocked/WIP is perfect for that. Maybe that could somehow be automated, i.e., if you write 'depends on ...' in the body, a WIP label gets applied, cough, Danger, etc.)
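
One way the automation suggested here could look, as a rough sketch: a pure function that scans a PR body for `depends on ...` lines and decides whether a WIP label is needed. The names `findDeclaredDependencies` and `needsWipLabel`, the `owner/repo#number` reference format, and the assumption that the caller already knows which references have been released are all illustrative; wiring this into Danger or Peril is left out.

```typescript
// Matches e.g. "Depends on some-org/some-repo#123" (case-insensitive).
// The "owner/repo#number" format is an assumption made for this sketch.
const DEPENDS_ON = /depends on\s+([\w.-]+\/[\w.-]+#\d+)/gi;

// Collects every dependency reference declared in the PR body.
function findDeclaredDependencies(prBody: string): string[] {
  const found: string[] = [];
  DEPENDS_ON.lastIndex = 0; // a global regex keeps state between calls
  let match: RegExpExecArray | null = DEPENDS_ON.exec(prBody);
  while (match !== null) {
    const reference = match[1];
    if (reference) {
      found.push(reference);
    }
    match = DEPENDS_ON.exec(prBody);
  }
  return found;
}

// A PR should carry the WIP label while any declared dependency
// is not yet in the list of released references.
function needsWipLabel(prBody: string, released: string[]): boolean {
  return findDeclaredDependencies(prBody).some(
    (reference) => released.indexOf(reference) === -1
  );
}
```

Keeping the parsing separate from the labelling means the same function could back either an automated label or a plain reminder comment.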

👍 on all this.

ashkan18 · 2018-12-19T18:33:45Z

Sorry to be late to this, but we've been practicing this for Exchange -> MP -> Reaction -> Force 🚋 and it has been really useful, so all 👍 for this.

One interesting side effect of this is that we also end up deploying downstream services more often.

izakp · 2019-01-17T18:41:41Z

I feel like I have half the picture and I want to see a full dependency graph.

Can we spec implementation more clearly in terms of the operations we want to perform for a given repo?
I.e., in terms of Metaphysics, on PRs to master we should:

- Check the proposed schema changes against Exchange production introspection API
- Check the proposed schema changes against Gravity production introspection API
- ... and so on, for all dependent services.
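
A sketch of what the per-service check listed above could start from, assuming each dependent service exposes standard GraphQL introspection. The query uses only the spec-defined `__schema` fields; the `IntrospectionLike` shape and the `listTypeFields` helper are hypothetical, and fetching from the real Exchange or Gravity endpoints is deliberately left out.

```typescript
// Standard GraphQL introspection query, trimmed to type and field names.
const FIELD_NAMES_QUERY = `
  query FieldNames {
    __schema {
      types {
        name
        fields { name }
      }
    }
  }
`;

// Minimal shape of a response to the query above (hypothetical type name).
interface IntrospectionLike {
  data: {
    __schema: {
      types: Array<{ name: string; fields: Array<{ name: string }> | null }>;
    };
  };
}

// Flattens an introspection response into "Type.field" strings so it
// can be compared against what a PR's proposed schema expects.
function listTypeFields(result: IntrospectionLike): string[] {
  const out: string[] = [];
  for (const typeDef of result.data.__schema.types) {
    // Only object and interface types report fields; others return null.
    for (const field of typeDef.fields || []) {
      out.push(typeDef.name + "." + field.name);
    }
  }
  return out;
}
```

Running the same query against each dependent service's production endpoint would give one such list per service, which a PR check could then compare against the fields the proposed schema relies on.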

@ashkan18 in this sense I am not following you when you say you are practicing this for "Exchange -> MP -> Reaction -> Force" - can you point me to this implementation?

In any case, let's try and get artsy/force#3061 working first, then come back to a Metaphysics implementation once it is well defined.

mbilokonsky · 2019-01-17T19:41:08Z

We're doing this on local discovery now, too. It's not really a formal system exactly; it's just that if a feature we need says "Add field X to model Y", then we create tickets to make the change in Gravity, then make the change in MP, then downstream systems, and we set up the is blocking/is blocked by relations in JIRA to capture this.

The real question is, to what extent can this be automated? Can we catch errors in e.g. whether arguments are expressed in camelCase or snake_case? How much can we enforce via linting? Etc.

Discussing this in the platform practice today, the notion of grabbing the schema associated with a given commit hash makes some sense, but I'm not sure it captures all cases. As I understand it, we essentially have a KV map where the K is a commit hash and the V is the string schema as of that commit. But given that we're deploying these systems independently, how do we say that a given MP PR requires Gravity to be on commit X and Exchange to be on commit Y? Presumably both Gravity and Exchange could be on future commits, or could have been rolled back, right? So it feels like we still lack the ability to describe the system at large, or to enforce an RFC like this fully programmatically?

izakp · 2019-01-25T13:05:22Z

Now that Force has implemented validating its schema against Metaphysics staging on merges to master, and against production on merges to release, I would suggest that we roll out this pattern to other applications downstream from Metaphysics and validate upstream... i.e. downstream applications like Force, Exchange, Kaws, Gravity adopt the same pattern.
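
The branch-to-environment rule described here can be sketched as a small lookup. The function and type names are hypothetical; the only facts taken from the comment are that merges to `master` validate against Metaphysics staging and merges to `release` validate against production. Treating every other branch as unvalidated is an assumption.

```typescript
type UpstreamEnvironment = "staging" | "production";

// Returns which Metaphysics environment a merge into `targetBranch`
// should validate against, or null when no validation applies.
function environmentToValidateAgainst(
  targetBranch: string
): UpstreamEnvironment | null {
  switch (targetBranch) {
    case "master":
      return "staging";
    case "release":
      return "production";
    default:
      return null; // assumption: other branches are not validated
  }
}

console.log(environmentToValidateAgainst("release"));
```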

Validating downstream, i.e., Metaphysics validating itself against all its downstream services, would IMHO put us closer to a dependency hell like the one @mbilokonsky describes: a k/v map of commit hash to the schema at that commit.

ashkan18 · 2019-01-25T14:10:14Z

@izakp this has mostly been through the PR process and is pretty manual. Meaning, when I make an MP change that depends on an Exchange change, we make sure the MP PR is WIP until Exchange has been deployed to production. Once that gets deployed, we remove WIP from that PR.

orta · 2019-02-18T15:39:18Z

I think discussion on this RFC is pretty much at a stalled point, and some work got put on the platform roadmap. So, I'm going to close this up.

Resolution

We did some of this, notably the most critical part: Force/Metaphysics.

Level of Support

2: Positive feedback.

Additional Context:

Interesting questions about the scope of these changes, and overall feasibility.

Next Steps

N/A

Exceptions

N/A

orta changed the title ~~RFC: Ensure dependent services are set up in production before allowing a PR requiring changes to be merged~~ RFC: Ensure dependent services are set up in production before allowing a PR requiring schema changes to be merged Nov 15, 2018

peril-staging bot added the RFC label Nov 16, 2018

joeyAghion mentioned this issue Nov 28, 2018

Clarify expectation that PRs are only opened when safe to merge and release #119

Merged

orta mentioned this issue Dec 12, 2018

Adds a check for PRs on Metaphysics that production is up-to-date with staging for dependent repos artsy/peril-settings#86

Closed

joeyAghion mentioned this issue Jan 18, 2019

Reinstate validate schemas artsy/force#3379

Merged

orta closed this as completed Feb 18, 2019
