Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIP-180 default breaking changes #1076

Merged
merged 5 commits into from
Jun 21, 2023

Conversation

slevenick
Copy link
Contributor

Add specific examples of breaking semantic changes for default values & serialization of default values

Prompted by issues like:

@slevenick slevenick requested a review from a team as a code owner April 24, 2023 18:28
@toumorokoshi
Copy link
Contributor

Adding @jskeet explicitly, who helped look at the internal version and I think would have good perspective on this.

Copy link
Contributor

@toumorokoshi toumorokoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I have one question I wanted to think through a bit, but the overall PR looks good and once we get that comment resolved LGTM.


Changing the default value is considered breaking and **must not** be done. The
default behavior for a resource is determined by it's default values, and this
**must not** change across minor versions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to sanity check with @loudej on this one.

The motivation and client-friendliness makes total sense to me. At the same time, I'm wondering about how we do enable use cases like modifying the default disk type / instance type for compute APIs. New instance types come out all the time, and using them as the default helps GCP customers get more performance / $.

One way I could imagine that being solved is by having clients always intern the defaults, and having servers never set the default. WDYT?

Maybe if that was the solution then this guidance would still hold (don't change the server-side default).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are direct API clients relying heavily on defaults? Changes [outside of controlled windows] are probably more harm than good for IaC. I would consider having gcloud/Console change defaults clientside at will, and updating IaC during breaking change windows (or never).

If volatility across all clients is desirable, a double simplex (whether client or server side, although server side would be a little nicer imo. with obvious bias as a client owner.) could be viable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way I could imagine that being solved is by having clients always intern the defaults, and having servers never set the default. WDYT?

So essentially mark all fields with defaults as required instead and bake the defaults into clients? It could work, but may have other usability issues like requiring a much larger set of values to be specified if a user is working with an API outside of a client.

Overall I think the suggestion of a double simplex can improve this situation greatly, as it would remove the direct connection between the unset value in an IaC client and the server-set default. Without that connection I expect we wouldn't see the issues that cause data loss when these defaults change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple points here.

  1. Default values can't really change without a new major version unless the default value was always specified to be a dynamically chosen default, in which case the value must be pinned for a specific resource. It shouldn't change once the resource has been created. Double simplex is probably the best option for such cases, but this approach shouldn't be applied to every field.

  2. The values need to be easily discoverable, such as for policy enforcement. In Kubernetes we didn't want admission controllers, asynchronous controllers, and other clients to need to try to figure out what default values were or how they were set or whether they were even defaults. Thus, Kubernetes explicitly sets all values that need to have values. However, that requires clients to gracefully accommodate that behavior. Server-side apply was necessary for Terraform integration. One idea we've discussed is (optionally) returning these values in validate_only responses and with a special Get request parameter.

  3. Always setting the defaults in clients seems like a recipe for inconsistency and maintenance problems to me. Plus it doesn't work for attributes that are newer than the clients. Effectively it creates a "blueprint", regardless of what type of client/surface was used to create the resource and whether it was hardcoded or data-driven.

We need to provide more complete guidance around unset values, default values, dynamically generated / allocated / assigned values, and other values transformed by the service (semantic strings, associative lists, etc.), but I'm comfortable with the statement that changing default values or explicitly setting default values when they weren't previously populated explicitly are non-backward-compatible changes within the same major version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll raise my hand and say I don't know what's meant by "double simplex" here.

Just to establish baselines, can we all agree that @bgrant0607's statement of:

It shouldn't change once the resource has been created.

... is correct? If a resource has been created, and the server has populated the field, it mustn't change to some other value later. I suspect we're all happy with that.

My guess is that the other side of Brian's point 1 (the viability of "if the client doesn't specify a value for this field, the server may choose a default, and that default may vary over time) will be less unanimously agreed. Would it be okay if the server guaranteed to always use the same value if the client specifies the same API version? (Looking to future work.)

I don't quite follow the point around policy enforcement, but I do agree it would be nice to be able to access the defaults... although that could be tricky if "default for field X depends on field Y".

Putting everything client-side does feel problematic - if we can agree on that much, we don't need to investigate the details of that further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to summarize my stance: I don't agree with the principle, I just worry it's not defensible because this happens often enough to be a practice. e.g. default instance type for VM creation for compute.

That said it looks like GCE hard-codes disk types in the client. And instance types don't have defaults at all. I guess I was thinking of AWS.

I'm still fairly skeptical of enforcement but I agree with the rationale here. We can try enforcing this guidance for now and see how it goes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCE doesn't follow the AIPs. They explicitly set defaults in the resources, more like Kubernetes.

Something like the machine type or disk interface is actually in the category of "dynamic defaults". It's dynamically selected, allocated, computed/generated, etc. The "default" may change slowly in practice, but it could be different across regions or set via project-level settings or something as well as changing over time.

"double simplex" was used in Greg's exploratory doc to describe the pattern of 2 fields, one input and one output. The output field would always return the effective value used by the system. The input value would respect the rules about not changing what was provided by the client.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if the server could change a default value without a version change, then it MUST somehow persist the selected value in the resource, and return that value somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jskeet re:

Would it be okay if the server guaranteed to always use the same value if the client specifies the same API version?

Likely yes, but it may depend on the exact details of the API versioning scheme.

The double simplex idea is an interesting way to mitigate these problems going forward, but I don't think it has much bearing on this proposal which is targeted at preventing breaking changes in existing APIs. We do need to come up with a plan for better handling of default values overall for new APIs.

For existing APIs though we have limited options and I believe the best path is to stop changes in server-side default values for existing APIs. Clients can modify client-set default values as appropriate to accommodate changing preferences and more efficient instance types in a non-breaking way. APIs can also change defaults during a major version increment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having reread all of this, I'm pretty sure that:

  • We need a way of defaults changing over time (think encryption algorithms etc)
  • Some clients (e.g. Terraform) need a level of stability

Personally I'd be okay with having this guidance for now, with an understanding that when client-specific versioning happens, we change the guidance to "changing the default requires a new version, and the server will still honor the previous default when the client specifies the old version". Terraform can then consider "we've changed default" to require a new major Terraform provider version.

I think that's the best compromise we'll be able to reach, but I'd like to check that both @slevenick and @toumorokoshi are okay with it.

aip/general/0180.md Show resolved Hide resolved
aip/general/0180.md Outdated Show resolved Hide resolved

Changing the default value is considered breaking and **must not** be done. The
default behavior for a resource is determined by it's default values, and this
**must not** change across minor versions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll raise my hand and say I don't know what's meant by "double simplex" here.

Just to establish baselines, can we all agree that @bgrant0607's statement of:

It shouldn't change once the resource has been created.

... is correct? If a resource has been created, and the server has populated the field, it mustn't change to some other value later. I suspect we're all happy with that.

My guess is that the other side of Brian's point 1 (the viability of "if the client doesn't specify a value for this field, the server may choose a default, and that default may vary over time) will be less unanimously agreed. Would it be okay if the server guaranteed to always use the same value if the client specifies the same API version? (Looking to future work.)

I don't quite follow the point around policy enforcement, but I do agree it would be nice to be able to access the defaults... although that could be tricky if "default for field X depends on field Y".

Putting everything client-side does feel problematic - if we can agree on that much, we don't need to investigate the details of that further.

aip/general/0180.md Outdated Show resolved Hide resolved

```proto
// A representation of an automobile
message Automobile {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: this PR feels like it's mostly thinking about clients creating new resources... but what about default values populated when new fields are introduced?

Suppose Automobile originally didn't have a wheels field at all, and I create an automobile with name X.

If I fetch X later, after the server has been updated to introduce wheels, with a later wheels-aware client, should the field:

  • Not be populated
  • Always populated with the value that it would have if I created a new automobile now without specifying the number of wheels
  • Potentially be given a value different to the one that would be created for a new automobile

The ability to have different "migrated" vs "newly created" defaults could be useful, I suspect - would it cause Terraform issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default values for new fields that are introduced are the cause of the types of bugs in hashicorp/terraform-provider-google#14264 (comment) and hashicorp/terraform-provider-google#14203 (comment)

The cases that cause issues for Terraform are when a field has different behavior depending on when the resource was provisioned.

Often this happens when a Terraform provider developer adds support for a new field and assumes that the field is always populated in the API response due to a server-set default value. This may be true for all instances of the resource created after the field was introduced, but the field may not be returned for older resources. The developer typically doesn't have access to older resources and so can't tell if that's the case until a public release happens and customers run into issues.

To me this is pretty clearly a failure of the API to have a consistent experience. Clients shouldn't need to be aware of different types of behavior for a field depending on the date the resource was provisioned. Having consistent behavior on the field by always returning the server-set value even if the default for that value may have changed is important.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clients shouldn't need to be aware of different types of behavior for a field depending on the date the resource was provisioned.

I'm not so sure about that. I agree with it as a general rule, but I suspect there are plenty of cases where it's actually really useful. The example that comes to mind is around encryption:

  • You don't want an encryption type for a single resource to vary over time
  • It is useful to be able to say "If you don't specify an encryption type when creating the resource, we'll use the strong currently-available one"

(The same might be true of some other quality metrics - audio/video/image encoding, for example. Today's default might be effectively obsolete in 10 years... it would be really unfortunate to have to keep using it as the default when much better options are available.)

I can absolutely see that causes a problem for declarative clients - but I think it's worth acknowledging as a scenario where there are conflicting benefits.

Having consistent behavior on the field by always returning the server-set value even if the default for that value may have changed is important.

Definitely agree on this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we do need to have static defaults at the API level. Different clients have different assumptions and tolerance for changes in defaults, and what works for a client like gcloud may not work for declarative clients.

As a broad assumption we should make it true that the same API request creates the same resource on an API regardless of the time when that request is made. Things like new features will happen, but the functionality of the resource shouldn't be impacted by changes to default values.

I propose that changes to default values should be made at the client level rather than globally at the API level. This can be accomplished by the API freezing behavior on unset values, while clients can update the default value sent as appropriate for their needs.

In practice this would involve clients like Console updating the default values fairly often, gcloud maybe a bit less often, and declarative clients the least often (though it should happen at some point!). We may want additional information on "recommended" defaults to be published by the API somewhere, so that clients can identify changes and make them on their own time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely need static defaults in the API, at least for the case that started this thread, which is new fields with default values, like wheels=4. Whether/when/where default values should be returned by the API is another issue. Kubernetes always returns the default values, but it also deals with merging in server-side apply and strategic merge patch.

Changing a static default is a non-backward-compatible API change. That's the position we took in Kubernetes. If an API plans on changing such values, they become dynamic defaults, essentially, and the service needs to persist the selected value and return it somehow/somewhere.

Users of declarative clients generally expect the effective desired state to be deterministic, so changing a default in such a client would be non-backward-compatible there also. Anyway, defaults can effectively be set in templates/modules by the users.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely need static defaults in the API

+1. New defaults ("Always populated with the value that it would have if I created a new automobile now without specifying the number of wheels") are definitely fine- and if they weren't, that'd be 100% a client problem.

In the cases we've ran into, there's been a static value for new resources after the default was introduced internally (~O(weeks - months prior to release)) and a different value returned for older resources (typically returning null due to missing backfills, which ~O(months) later gets corrected

The ability to have different "migrated" vs "newly created" defaults could be useful, I suspect - would it cause Terraform issues?

Yeah 🙁 . Defaults are static in Terraform, mostly, and the cases where they aren't aren't easy to use.


#### Serializing defaults

APIs **must not** change the way a field with a default value is serialized. For
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable - I suspect it'll all get muddier when we've thought more about field presence, but I'm okay with this description as it is for now.

Changing it to "resource field" might be worth while though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of the difficulty here comes from inconsistent behavior for serializing a default value (default as in runtime-set that is not defined in the proto) so I actually think that field presence would fix it. In a world where field presence is meaningful the problem goes away because it's clear that what unset means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you can't have default values for fields where presence is meaningful. At least that's my understanding

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I can imagine a case where field presence is useful in create/update (so that the client can say whether they're trying to update it) but where you always want the server to specify a value when fetching, so the consumer can know what value is in effect. That makes it sort of "semi-optional" - which can't easily be expressed just in proto terms, of course.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If field presence is meaningful in create/update then declarative clients need to know what value was actually set (or unset) when they retrieve the resource. "semi-optional" is not a feasible solution, and must be prohibited from existing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you go into more detail about this? Suppose we have a situation where the client doesn't specify a value, and the server populates it with a default and returns that value. So long as the result is exactly the same as if the client specified that value to start with, is there still a problem? I may well be missing something. If you could give an example of a problematic flow, I'll see whether we're talking about different things, or whether there's an aspect I was missing.

My understanding of this "semi-optional" field is as such:

If the field is specified by the client it works as normal.
If the field is unspecified by the client, a server-set default is returned

There are a couple potential behaviors for when a client sends an empty value during a PATCH request:

  1. Server ignores the field
  2. Server sets the field to the empty value
  3. Server sets the field to the default value it would have chosen if the resource was created with the field unspecified

1 is just a normal field where presence is not meaningful
2 is inconsistent with the behavior of the field on create, which doesn't make sense. In order to create a resource with an empty value for this field the client would need to create and then update immediately to remove the field value
3 is implementing a sort of "reset" functionality for the field, which doesn't seem to fit standard CRUD. It could make sense to have on a custom method if the functionality is needed for the API, but doesn't fit into standard resource behavior that should be implemented in clients

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 isn't quite the same as a normal field - because if the field has the optional keyword (i.e. presence is available) then the server can tell the difference between the client sending a value of 0 and not sending anything at all. I think it would be reasonable to treat not sending anything as "don't change this", if the field is always effectively present.

(There's a harder decision around how to send an update which explicitly wants to clear a field when that's a meaningful state.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah that's an odd one. It's similar to a normal field with a server-set default, but adds the zero value as a settable value. I'm concerned with how we would communicate this to a client, as the optional keyword will generally signify that a field can be unset, but in this situation sending unset would have an entirely different behavior.

What would the overlap be between the unset value meaning "don't change" and setting an updateMask for that field? What if these conflict, where a user sends unset but sets the updateMask

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the overlap be between the unset value meaning "don't change" and setting an updateMask for that field?

Not as well defined as I'd like, basically. (That tends to be the case for anything to do with FieldMask, unfortunately.)

What if these conflict, where a user sends unset but sets the updateMask

This could be the signal used for "please clear the field" - which would be invalid in the case I'm thinking of, where the field always must have a value, it's just it doesn't have to be specified by the user.

Taking a step back, "The user doesn't have to set this explicitly, but 0 is a valid value" sounds like a reasonable pair of requirements to me. It's unclear how important it is as a pair of requirements, mind you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a step back, "The user doesn't have to set this explicitly, but 0 is a valid value" sounds like a reasonable pair of requirements to me.

Yes I agree this is a reasonable behavior for a field. I think the problem is that we don't have a clear way to handle these types of fields currently. We would need a revamp of FieldMask & existing functionality before this can be widely adopted

@shwoodard
Copy link
Contributor

Just curious why we're using cars and not books here?

@slevenick
Copy link
Contributor Author

Just curious why we're using cars and not books here?

Happy to change it to books if that's preferred. Is there an analog to a default value like "wheels" on books? Genre maybe?

@shwoodard
Copy link
Contributor

Just curious why we're using cars and not books here?

Happy to change it to books if that's preferred. Is there an analog to a default value like "wheels" on books? Genre maybe?

int num_pages

?

@slevenick
Copy link
Contributor Author

Just curious why we're using cars and not books here?

Happy to change it to books if that's preferred. Is there an analog to a default value like "wheels" on books? Genre maybe?

int num_pages

?

That's less intuitive to me as setting a default value for the number of pages in a books doesn't seem useful. Assuming an automobile has 2 or 4 wheels if the user doesn't specify it makes more sense in my mind. Happy to change it though if you would prefer to use books

@jskeet
Copy link
Contributor

jskeet commented May 4, 2023

Just curious why we're using cars and not books here?

Happy to change it to books if that's preferred. Is there an analog to a default value like "wheels" on books? Genre maybe?

int num_pages
?

That's less intuitive to me as setting a default value for the number of pages in a books doesn't seem useful. Assuming an automobile has 2 or 4 wheels if the user doesn't specify it makes more sense in my mind. Happy to change it though if you would prefer to use books

I agree that num_pages does feel like something that would have a default value. A few other options we could consider:

  • Category (fiction or non-fiction)
  • Initial publishing region (default to US then change to GLOBAL?)
  • Illustrated (true or false)
  • Hardback (true or false)

Just a few ideas.

@shwoodard
Copy link
Contributor

Agree that num_pages was a bad suggestion. I like Jon's.

@slevenick
Copy link
Contributor Author

Agree that num_pages was a bad suggestion. I like Jon's.

Modified to use a genre enum with a default value changing from FICTION --> NONFICTION

@slevenick slevenick requested a review from jskeet May 11, 2023 18:46
@slevenick
Copy link
Contributor Author

@jskeet can you take another look at this?

Copy link
Contributor

@jskeet jskeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the text as it is (barring one extraneous message declaration), but there are aspects worth further discussion. I don't mind whether that's on the PR or in docs.


Changing the default value is considered breaking and **must not** be done. The
default behavior for a resource is determined by it's default values, and this
**must not** change across minor versions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having reread all of this, I'm pretty sure that:

  • We need a way of defaults changing over time (think encryption algorithms etc)
  • Some clients (e.g. Terraform) need a level of stability

Personally I'd be okay with having this guidance for now, with an understanding that when client-specific versioning happens, we change the guidance to "changing the default requires a new version, and the server will still honor the previous default when the client specifies the old version". Terraform can then consider "we've changed default" to require a new major Terraform provider version.

I think that's the best compromise we'll be able to reach, but I'd like to check that both @slevenick and @toumorokoshi are okay with it.

aip/general/0180.md Outdated Show resolved Hide resolved

#### Serializing defaults

APIs **must not** change the way a field with a default value is serialized. For
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If field presence is meaningful in create/update then declarative clients need to know what value was actually set (or unset) when they retrieve the resource. "semi-optional" is not a feasible solution, and must be prohibited from existing.

Can you go into more detail about this? Suppose we have a situation where the client doesn't specify a value, and the server populates it with a default and returns that value. So long as the result is exactly the same as if the client specified that value to start with, is there still a problem? I may well be missing something. If you could give an example of a problematic flow, I'll see whether we're talking about different things, or whether there's an aspect I was missing.


#### Serializing defaults

APIs **must not** change the way a field with a default value is serialized. For
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: I still agree with the paragraph - the server shouldn't suddenly start or stop populating it. It was the side-issue of "semi-optional".

@slevenick
Copy link
Contributor Author

@toumorokoshi there seems to be consensus that this change is viable, can I get another reviewer on this?

Copy link
Contributor

@bgrant0607 bgrant0607 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tricky topic, but LGTM for a start. Thanks for the persistence.

@toumorokoshi
Copy link
Contributor

@rileykarson is this ok to merge? if so I'll give it one pass myself and merge it in (since we have two API design reviewers).

Copy link

@rileykarson rileykarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!


APIs **must not** change the way a field with a default value is serialized. For
example if a field does not appear in the response if the value is equal to the
default, the serialization **must not** change to include the field with the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think the examples would be easier to read as bullets and not as several paragraphs. But I won't block this PR on that.

Copy link
Contributor

@toumorokoshi toumorokoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a fourth approval just for extra-emphasis :) Thanks!

@toumorokoshi toumorokoshi merged commit 6b4e460 into aip-dev:master Jun 21, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants