
feat(functions): support environment if present #4632

Merged
2 commits merged into crossplane:master on Oct 6, 2023

Conversation

@phisco (Contributor) commented Sep 15, 2023

Description of your changes

Fixes #4600

Even with the whole conversation going on about environment and EnvironmentConfigs, I think this simple change will allow us to accommodate whatever we decide to do with that.

Adding Environment to both the observed and desired state allows us first to pass whatever environment a Composition built before calling the function pipeline, and also to use it as a variable space where functions can write data to be used by subsequent functions.

I have:

  • Read and followed Crossplane's contribution process.
  • Added or updated unit and E2E tests for my change.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR, if necessary.
  • Opened a PR updating the docs, if necessary.

@phisco phisco requested a review from negz September 15, 2023 12:17
@phisco phisco requested review from turkenh and a team as code owners September 15, 2023 12:17
@phisco phisco marked this pull request as draft September 15, 2023 12:20
@phisco (Contributor Author) commented Sep 15, 2023

Not sure whether we also want to apply the patches defined at spec.environment.patches here. I guess that would be a P&T leakage into function-based compositions; the same thing can be done by a function using the provided environment and composite resource.

@phisco (Contributor Author) commented Sep 15, 2023

The other option we could have considered was injecting a fake EnvironmentConfig into the observed state, but I think such an implementation would have the following issues:

  • we would bind functions to EnvironmentConfigs, while I think there is value in the separation between EnvironmentConfig and the in-memory environment.
  • we would miss out on the (in my opinion useful) "variable-like" capabilities I mentioned above, as we don't allow functions to write to the observed state and we discard any changes across different function calls.

@negz (Member) left a comment

Thanks @phisco! I agree this is probably good enough for now.

@@ -90,6 +90,9 @@ message State {

// The state of any composed resources.
map<string, Resource> resources = 2;

// The environment passed to the Function pipeline.
Member

Should we add some kind of THIS IS AN ALPHA FIELD warning like we do on our CRDs? Something that communicates that this field is more at-risk of changing than the rest of the API.

Member

As to how it might change - if we did make environment an array or map of resources I could imagine reflecting that here. So something more like:

map<string, google.protobuf.Struct> environment = 3;

Contributor Author

Yes, you are right. I will update it 👍

Contributor Author

Added the alpha warning as suggested 👍

About the type of the environment field, wouldn't a google.protobuf.Struct already accommodate such a change? Currently the environment is fully unstructured:

// Environment defines unstructured data.
type Environment struct {
	unstructured.Unstructured
}

But if we decided to have two separate sections, "data" and "resources", as I'm suggesting, we would just have to populate the Environment in a different way; we shouldn't need to change the code here.
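
To illustrate (a minimal sketch, assuming structpb from google.golang.org/protobuf; the "data"/"resources" split here is hypothetical, not an existing API): a single google.protobuf.Struct can already carry both sections without any schema change.

package main

import (
	"fmt"

	"google.golang.org/protobuf/types/known/structpb"
)

func main() {
	// Hypothetical environment carrying both a "data" section and a
	// "resources" section in a single unstructured Struct.
	env, err := structpb.NewStruct(map[string]any{
		"data": map[string]any{"region": "eu-west-1"},
		"resources": []any{
			map[string]any{"kind": "EnvironmentConfig", "name": "example"},
		},
	})
	if err != nil {
		panic(err)
	}

	// Reading a value back out of the "data" section.
	fmt.Println(env.GetFields()["data"].GetStructValue().GetFields()["region"].GetStringValue())
}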

Member

I agree we wouldn't have to, in that a google.protobuf.Struct can pretty much model anything. It does however seem to me that if we know the environment will always have a certain shape (i.e. data and an array of resources), it would be a better experience for folks building functions if we reflected that shape in our 'schema' as much as possible.

@negz (Member) commented Sep 18, 2023

Adding Environment to both the observed and desired state allows us first to pass whatever environment a Composition built before calling the function pipeline, and also to use it as a variable space where functions can write data to be used by subsequent functions.

@phisco, elsewhere observed and desired work roughly like:

  • Observed is read-only and identical for every Function in the pipeline. Crossplane populates it as fully as possible.
  • Desired is read-write. It starts out completely empty, and the pipeline produces it as essentially an 'overlay' on observed state to be applied.

Would the "desired environment" work the same way? i.e. Functions just set the keys they want to set? What would we do with the desired environment at the end of the pipeline run? Discard it?

@phisco (Contributor Author) commented Sep 18, 2023

Would the "desired environment" work the same way? i.e. Functions just set the keys they want to set? What would we do with the desired environment at the end of the pipeline run? Discard it?

Yes, for consistency I would expect it to work exactly like the observed/desired states. But then a downstream function would have to "merge" the observed and desired environments to access any information. The only other option would be to have it at the same level as the states in both the request and response, with its own semantics, more similar to a context than to observed/desired state.
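
For example, that merge could look roughly like this (a minimal sketch, assuming both environments are exposed as google.protobuf.Struct values; mergeEnvironments is an illustrative helper, not actual API):

package env

import "google.golang.org/protobuf/types/known/structpb"

// mergeEnvironments overlays the desired environment on top of the observed
// one; desired keys win on conflict (a shallow merge, for brevity).
func mergeEnvironments(observed, desired *structpb.Struct) *structpb.Struct {
	merged := &structpb.Struct{Fields: map[string]*structpb.Value{}}
	for k, v := range observed.GetFields() {
		merged.Fields[k] = v
	}
	for k, v := range desired.GetFields() {
		merged.Fields[k] = v
	}
	return merged
}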

Currently the state is discarded at the end of the reconciliation loop for P&T, so yes, I'd say for the moment we should stick to that.

@negz (Member) commented Sep 18, 2023

The only other option would be to have it at the same level as the states in both the request and response, with its own semantic, more similar to a context, rather than the observed/desired state.

I'm maybe leaning toward this option, but only very slightly. Mostly because it's a bit odd that it's not really desired environment state, but kinda something else. 🤔

@phisco (Contributor Author) commented Sep 18, 2023

Yeah, I agree it feels odd. I will update the PR tomorrow with a dedicated structure for both request and response, using a map as you suggested. I'd arbitrarily already put it under a "data" top level key, just to accommodate future changes such as the ones we are discussing ("resources"), wdyt?

@phisco (Contributor Author) commented Sep 18, 2023

@negz, pushed some changes, let me know wdyt 🙏

Comment on lines 220 to 242
if req.Environment != nil {
d, err := AsStruct(req.Environment)
if err != nil {
return CompositionResult{}, errors.Wrap(err, errEnvAsStruct)
}
env = &v1beta1.Environment{
Data: d,
}
}
Contributor Author

@negz how would you feel if we did not pass the computed environment, and therefore effectively ignored the whole spec.environment stanza for Function-based compositions, but still added the in-memory environment here and worked to implement the current EnvironmentConfig retrieval and merging logic as a function, similarly to what the patch-and-transform function is doing?

Member

I feel pretty genuinely ambivalent about that, haha.

I do really like the idea of moving the "bootstrapping" of the environment into a Function. This gives us more room to experiment there, and allows folks to opt out of the complexity if they don't need it.

What I'm not so sure about is the general pattern of Functions reading things from the API server and passing them to the Function pipeline. (I presume a Function would select and read EnvConfigs.) Is that something we should allow? Or do we want Crossplane to be the sole broker of everything read from and written to the API server? I'm not concretely sure what risk Functions reading from/writing to the API server poses; it's more of an intuition at this point.

If we did allow this we'd probably need to update the spec: https://github.com/crossplane/crossplane/blob/5797021/contributing/specifications/functions.md#runtime-environment

Contributor Author

Yes, adding side effects could make everything more complex, e.g. how would we handle that from xrender? However, I feel it's going to be hard to prevent: people can always add a service account and use it to do whatever they want, so I guess we should just embrace it and make it something we take into account. For example, we could add an annotation to let xrender know a function should be skipped, or add conditionals as we were discussing in some other thread, so that one can render compositions skipping the retrieval steps and passing data into the environment instead of letting it be retrieved from the outside world.

@turkenh (Member) commented Oct 5, 2023

Yes, adding side effects could make everything more complex, e.g. how would we handle that from xrender? However, I feel it's going to be hard to prevent: people can always add a service account and use it to do whatever they want, so I guess we should just embrace it and make it something we take into account.

I am pretty skeptical of the idea of natively supporting direct access to the Kube API from a function. This is mostly because it would open a door that invalidates most of the opinionated framework we're trying to provide with RunFunctionRequest/RunFunctionResponse. If a function could access the Kube API natively, why should it bother getting resources from the observed state or specifying the desired state, instead of just getting/patching the resources directly? I imagine there would be a tendency to open that door at the first limitation in the framework somebody hits, and never get back on track even if we resolve that limitation afterwards. There is also a whole new complexity that would be introduced by RBAC: how, and by whom, would the RBAC for such a service account be defined? If it is by users, then this still requires manual work, and passing a service account wouldn't solve the problem. I don't want to think about Crossplane also taking care of RBAC for such a case 😅

However, considering the Kube API is just yet another API and we are planning to enable passing credentials to functions, power users could still communicate with the Kube API if/once they really have to.

This feels quite similar to what one can do with provider-kubernetes and provider-helm. They can be used to create resources (other than the provider's own CRDs and a limited set of other types) on the same Kubernetes cluster, if a provider config is set up accordingly. However, we didn't enable this as native functionality for all Crossplane Providers, and it has worked fine so far.

@turkenh (Member) commented Oct 5, 2023

Let me invalidate a bit what I wrote above 😄

There is also a whole new complexity that would be introduced by RBAC: how, and by whom, would the RBAC for such a service account be defined? If it is by users, then this still requires manual work, and passing a service account wouldn't solve the problem. I don't want to think about Crossplane also taking care of RBAC for such a case 😅

I just remembered there is functionality in the Provider meta definition that no provider currently uses (to the best of my knowledge): PermissionRequests. I am not sure if it got broken at some point, but if we reflected that functionality into the Function meta package, an environment-merger function would be able to declare a list of RBAC policy rules in its package definition to access EnvironmentConfig resources. And this doesn't feel that bad, as it would be defined at the package level and have a very limited scope.
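
For illustration (a hypothetical sketch; PermissionRequests are not reflected into Function packages today, so the names here are just assumptions), such a request could be as narrow as a single read-only RBAC policy rule for EnvironmentConfigs:

package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
)

func main() {
	// A narrowly scoped, read-only permission an environment-merger function
	// package might request: access to EnvironmentConfig resources only.
	rule := rbacv1.PolicyRule{
		APIGroups: []string{"apiextensions.crossplane.io"},
		Resources: []string{"environmentconfigs"},
		Verbs:     []string{"get", "list", "watch"},
	}
	fmt.Printf("%+v\n", rule)
}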

Member

Yes, having it already implemented makes it feel less bad, doesn't it? 😂

Not really 🙂
For me, the key here is RBAC and how/where it is defined.
Initially, I was thinking of it as something that would enable/promote doing anything on the Kube API, with functions evolving into k8s operators.
However, a permission model where only an explicit/limited set of resources defined by the function author is accessible makes it more controlled, as you are stating, and not harmful to our opinionated framework.

@negz (Member) commented Oct 5, 2023

I'm feeling aligned with what @turkenh wrote in #4632 (comment).

If we embrace letting Functions read from and write to the API server, we break the nascent contract we have where Crossplane and controllers are responsible for all API server interactions. Per #4723 I think this contract is an invaluable property of Functions.

As a concrete example, a Function that read EnvConfigs from the API server would not work with xrender. Or rather, you'd need to somehow plumb it up to an API server to allow it to get access to the types it needs.

I don't think we have to submit to inevitability and try to support these Functions just because it's hard to stop folks from writing a Function that talks to the API server if they want to. For example, probably just saying "your Function isn't compliant with the spec and thus may not work in various contexts" might be enough.

Contributor

If I understand this correctly, it means that while Functions can be used to modify the current environment/context, and could populate the environment from non-K8s API sources, they should not populate it from K8s API resources; that functionality needs to stay where it is (for now), in the Composition, and the discussion in #4583 should stay out of the realm of Functions.

Contributor Author

@negz

As a concrete example, a Function that read EnvConfigs from the API server would not work with xrender. Or rather, you'd need to somehow plumb it up to an API server to allow it to get access to the types it needs.

The same issue would apply to object retrieval defined at the Composition level, even more so with what we are discussing in #4583.

Member

@phisco I don't think it would be quite the same. xrender would need to be aware that a Composition could reference EnvConfigs and have a flag like --env-configs or something to pass in the EnvConfigs, similar to the --observed-resources flag.

That all said, maybe there's a middle ground here where a Function could ask Crossplane to provide the pipeline with "extra" observed resources. I wrote up a sketch of how that might work in #4739.

apis/apiextensions/fn/proto/v1beta1/run_function.proto (outdated; resolved)

// The environment to be passed down the Function pipeline.
message Environment {
// The data section of the environment.
google.protobuf.Struct data = 1;
Member

Will Functions use keys in this data struct to pass information to each other?

My understanding from #4731 (comment) is "no". If that's correct, I think it would be useful to bring that use case into scope for this PR. Would we add a new field to the Environment message for that?

@negz (Member) commented Oct 5, 2023

I could imagine foregoing the Environment message and just having something like:

message RunFunctionRequest {

  // Fields omitted for brevity

  google.protobuf.Struct environment = 5;
}

You could then put the merged EnvironmentConfigs under a well-known field within the environment struct like apiextensions.crossplane.io/config and let Functions use other fields to pass information without worrying about collisions with what was loaded from the EnvironmentConfigs.
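
A rough sketch of what that could look like from a function's point of view (assuming the environment is a plain google.protobuf.Struct; apiextensions.crossplane.io/config is the well-known key suggested above, and example-fn.crossplane.io/output is an arbitrary key a hypothetical function claims):

package main

import (
	"fmt"

	"google.golang.org/protobuf/types/known/structpb"
)

func main() {
	// Environment as bootstrapped by Crossplane: merged EnvironmentConfigs live
	// under a well-known key; other keys are free for functions to use.
	env, err := structpb.NewStruct(map[string]any{
		"apiextensions.crossplane.io/config": map[string]any{"region": "eu-west-1"},
	})
	if err != nil {
		panic(err)
	}

	// A function reads the merged config...
	cfg := env.GetFields()["apiextensions.crossplane.io/config"].GetStructValue()
	fmt.Println(cfg.GetFields()["region"].GetStringValue())

	// ...and claims its own key to pass data to later functions in the pipeline.
	env.Fields["example-fn.crossplane.io/output"] = structpb.NewStringValue("computed-by-previous-function")
}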

Contributor Author

Will Functions use keys in this data struct to pass information to each other?

What do you mean? "data" is what we get today by merging EnvironmentConfigs and patching to and from the environment; if we were to add other fields to the environment, they would be on the same level as "data".

environment.data is exactly what you were putting at environment[apiextensions.crossplane.io/environment]. I would let people use FromEnvironment patches adding the "data" prefix to fields, so that if we ever added something in the future it could still be reachable by FromEnvironmentPatches in the patch and transform function.

Member

I would let people use FromEnvironment patches adding the "data" prefix to fields

We're on the same page here pattern-wise, i.e. nesting what we would call "the environment" today under a key in... the environment. I think we need more descriptive, specific names than "environment" and "data" though. If my EnvironmentConfigs and defaults become the "data" key in the environment, what else would there be? Would the other "keys" be something other than data?

Perhaps if we really want to stick with "environment" as the top-level name environment.data could become environment.config? or environment[apiextensions.crossplane.io/config]?

If we ever added something in the future it could still be reachable by FromEnvironmentPatches in the patch and transform function.

I feel that for v1.14 we should have a story for Functions to write to the "environment". If we merge this PR as-is, it seems like their only option will be to write to environment.data, which means they could potentially collide with the object that Crossplane loads into that struct from EnvConfigs.

I lean toward the top level of the "environment" being a struct or map because then a Function can "claim" an arbitrary field/key within that top-level struct/map without needing an update to the protobuf schema to add more fields.

Contributor Author

I still don't feel great about having a well-known key we use "officially" to inject the current environment into an unstructured object.

I'd be more comfortable saying that the environment (or context) is just an unstructured object and letting functions define their own conventions, e.g. an EnvironmentConfig-aggregating function defines the key it will put the aggregated result under, which will be the same key expected by the patch-and-transform function, ignoring the whole spec.environment section in Pipeline mode. This is what I was hinting at here: #4632 (comment).

My point is: if something is official, it should be explicitly represented in the schema; otherwise, let's fully go the other way and let functions define their own conventions, just providing them a way to pass data through the pipeline outside the desired state.

Member

I think we need to find a more descriptive name than "data", though.

If we do go with a map / arbitrary keys, I think it makes sense to reserve and use certain key namespaces for Crossplane (core, not Functions) to write to. Similar to how kubernetes.io/ and k8s.io/ annotation key prefixes are reserved for use by Kubernetes components, and GITHUB_ env vars are reserved for use by GitHub in GitHub Workflows.

Member

I still don't feel great about having a well-known key we use "officially" to inject the current environment into an unstructured object.

Sorry, GitHub didn't show me this comment until I refreshed the page.

I think I get your reservations - explicit schemas are usually best. That said, your analogy of the environment being like environment variables, and looking at how other systems use environment variables is making me feel better about the pattern.

If we say that Functions can effectively communicate by reading from and writing to shared "environment variables", i.e. top-level keys in the environment, is it so bad if Crossplane "bootstraps" some of that environment, like GitHub workflows does with their env vars?

Contributor Author

Yeah, I don't feel super strongly against that and I can see your point, so I would be fine with annotation-style reserved keyspaces. I'll revert the whole environment to being just a struct tomorrow. Now the only open point is whether we want to wire up the current EnvironmentConfig retrieval from configurations, or just accept that people are going to access the API server (or any other external system) from functions and deal with it, which is the topic of the other thread above, so let's continue there if you agree.

Contributor Author

done: 2b28d9a

Contributor Author

Called it context, as you were doing in #4731, since it didn't make much sense to put the current environment in the environment but under a key .../environment...

@phisco (Contributor Author) commented Oct 6, 2023

I think this can be merged as is, because if we ever decided to embrace functions accessing the API server, as we are discussing in the thread above, we would just have to remove the environment-fetching wiring here and move the logic to a dedicated function, without any major disruption for users.

@negz (Member) commented Oct 6, 2023

@phisco Thanks, I agree we can merge this.

I do like your idea of letting function-patch-and-transform patch to and from (what we'd now call) the whole context, rather than just the environment key. I could see that being solvable in function-patch-and-transform by introducing a new PatchFromContext set of patches and leaving the PatchFromEnvironment ones for backward compatibility.

@negz negz merged commit 4120759 into crossplane:master Oct 6, 2023
14 of 16 checks passed
negz added a commit to negz/function-sdk-go that referenced this pull request Oct 13, 2023
This is the Go SDK end of crossplane/crossplane#4632

Signed-off-by: Nic Cope <nicc@rk0n.org>

Successfully merging this pull request may close these issues.

Support EnvironmentConfig when using Composition Functions