Provide status field with a reference to the underlying GCP resource #295

Open
yhrn opened this issue Oct 29, 2020 · 7 comments
Labels
enhancement New feature or request

Comments

@yhrn

yhrn commented Oct 29, 2020

What I would like is a status field with the same name for all resources that is populated with the value one would use in the external field of a resource reference.

The use case is building operators for third-party (3p) system resources, where the underlying system needs to interact with the data plane of the GCP resources that those 3p resources reference.

Today this information is available for some KCC resource types but not in a consistent way (I realize this is because it reflects the underlying APIs):

  • PubSubSubscription - status.path
  • PubSubTopic - n/a
  • IAMServiceAccount - status.email
  • StorageBucket - status.url
  • KMSCryptoKey - status.selfLink
  • ComputeNetwork - status.selfLink (here the link is the full URL starting with https://www.googleapis.com/compute/v1/, which is typically not the case for other resource types with selfLink)
  • BigQueryDataset - status.selfLink
  • SpannerDatabase - n/a
  • BigtableTable - n/a
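
To illustrate what a consumer has to do today, here is a minimal Python sketch of a per-kind lookup built only from the list above. The names `STATUS_FIELD_BY_KIND` and `external_identifier` are made up for illustration and are not part of KCC; the input is assumed to be a KCC object already loaded as a plain dict.

```python
from typing import Optional

# Status field that carries an external identifier today, per the list above.
# None means the kind exposes no such field.
STATUS_FIELD_BY_KIND = {
    "PubSubSubscription": "path",
    "PubSubTopic": None,
    "IAMServiceAccount": "email",
    "StorageBucket": "url",
    "KMSCryptoKey": "selfLink",
    "ComputeNetwork": "selfLink",
    "BigQueryDataset": "selfLink",
    "SpannerDatabase": None,
    "BigtableTable": None,
}


def external_identifier(obj: dict) -> Optional[str]:
    """Return the kind-specific identifier from a KCC object dict, or None."""
    field = STATUS_FIELD_BY_KIND.get(obj.get("kind"))
    if field is None:
        # Unknown kind, or a kind without a usable status field.
        return None
    return obj.get("status", {}).get(field)
```

A single, consistently named status field would make this table unnecessary.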

@spew - I hope this case covers what we discussed earlier.

@yhrn yhrn added the enhancement New feature or request label Oct 29, 2020
@spew
Contributor

spew commented Oct 29, 2020

Yes it does, and I think it would include the service-scoped ResourceName, i.e.

//compute.googleapis.com/projects/...
//pubsub.googleapis.com/projects/...

@yhrn
Author

yhrn commented Oct 29, 2020

Wouldn't it be a nice symmetric property if this field matched what is expected for external fields for resource references? Maybe the service part could be a separate field? Either way, this would not be a big deal to me as long as it is consistent.

@spew
Contributor

spew commented Oct 29, 2020

Probably what is in external should include that as well, i.e.
//pubsub.googleapis.com/projects/my-project/topics/my-topic

As that is all the information needed to properly identify a resource.
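
As a rough sketch of how a consumer might split such a service-scoped name into its service and path, assuming the `//<service>/<path>` shape shown in the example above (`ResourceName` and `parse_resource_name` are hypothetical helper names, not a KCC or GCP API):

```python
from typing import NamedTuple


class ResourceName(NamedTuple):
    service: str  # e.g. "pubsub.googleapis.com"
    path: str     # e.g. "projects/my-project/topics/my-topic"


def parse_resource_name(name: str) -> ResourceName:
    """Split '//<service>/<path>' into its two parts."""
    if not name.startswith("//"):
        raise ValueError(f"not a service-scoped resource name: {name!r}")
    # Everything up to the first '/' after the leading '//' is the service.
    service, sep, path = name[2:].partition("/")
    if not service or not sep or not path:
        raise ValueError(f"missing service or path in: {name!r}")
    return ResourceName(service, path)
```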

@fsommar

fsommar commented Nov 2, 2020

Probably what is in external should include that as well, i.e.
//pubsub.googleapis.com/projects/my-project/topics/my-topic

It would be good if external at least accepted that format, but the service prefix could be omitted in the general case, since I believe the resource service is (or can be) implied from the kind when available.
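
A minimal sketch of that idea, assuming a kind-to-service table exists somewhere. The two entries below are taken from the service names mentioned earlier in this thread; the table itself and the name `to_full_name` are hypothetical, not something KCC provides.

```python
# Illustrative mapping only; a real table would have to cover every kind.
SERVICE_BY_KIND = {
    "PubSubTopic": "pubsub.googleapis.com",
    "ComputeNetwork": "compute.googleapis.com",
}


def to_full_name(kind: str, external: str) -> str:
    """Accept either form of `external` and return the service-scoped form."""
    if external.startswith("//"):
        # Already carries the service prefix; pass it through unchanged.
        return external
    try:
        service = SERVICE_BY_KIND[kind]
    except KeyError:
        raise ValueError(f"no service known for kind {kind!r}") from None
    return f"//{service}/{external}"
```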

@toumorokoshi
Contributor

@yhrn Would you mind giving some context on how you would use this value, once given? We discussed a bit internally and there are a few ways that such a standardization could become valuable:

  1. as a way to allow other operators to pipe the resource reference into the creation of other resources
  2. if our external references standardize with GCP and use the resource name format, they can be used to identify those resources in other systems such as Cloud Asset Inventory.

The pain that you can't get this value in a programmatic way is clear. I think the question is more around how this value will be used. If the values will be consumed by Config Connector exclusively, they just have to have meaning to Config Connector. If the values are intended to be consumed by other GCP services, then we need to standardize across GCP (and the resource name format referenced above is our best bet to do so now).

We'd prefer option 2, but consuming resource names is not possible across all GCP services, so even if Config Connector standardizes, that may not mean it would work for your specific use case.

@yhrn
Author

yhrn commented Nov 3, 2020

It is more 2 than 1 but unfortunately it is not that clear cut. The general use case I want to solve for is to be able to pass the reference to the underlying GCP resource to a non-KCC aware system that will interact with the resource.

One example is that we have the concept of Data Endpoints, which is an abstraction that adds some internal functionality related to e.g. discovery, partitioning, retention, schema and access control. A data endpoint can be backed by different data sources, e.g. a bucket, a BQ dataset or a topic. So a data endpoint CRD needs to reference the backing GCP resource and the operator for it needs to be able to pass an "external" reference to it to the "data endpoint system" that will need to access the resource.

Another example is our event delivery system, where the CRD for an event type will specify an IAM Service Account that is authorized to produce events. The email address of that SA is then propagated to the service that implements the entry point to the data plane of the event delivery system and used there for authorization.

The problem is of course the lack of standardized naming that you mention: for an IAM service account we typically want the email address, for a GCS bucket the URL, and for a P/S topic the path (which I think aligns with https://google.aip.dev/122). This is why I formulated the original request as "the value one would use in the external field of a resource reference", because there you seem to have taken this approach, i.e. use the most common identifier. I realize that the resource name format approach would be nice and consistent but also less convenient for many use cases.

On the other hand it will never be perfect like you say so in some cases we might have to do some translations anyway. The most important aspect is that it is possible to do this translation of the identifier we get from KCC to the format required for the specific use case in an offline manner further downstream where the requirement is well known. For example, as long as the identifier contains the project ID and the SA name we can construct the email address but we don't need to do that in the operator where we are just dealing with generic KCC object references.
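
For instance, the SA case could look roughly like this downstream. This is only a sketch: it assumes a user-managed service account, where the email normally takes the form `<name>@<project>.iam.gserviceaccount.com`, and the function name is made up. Default and Google-managed service accounts use other domains and would need separate handling.

```python
def service_account_email(project_id: str, account_name: str) -> str:
    """Build the email of a user-managed service account offline."""
    if not project_id or not account_name:
        raise ValueError("both project_id and account_name are required")
    # Assumption: user-managed SA; other SA types use different domains.
    return f"{account_name}@{project_id}.iam.gserviceaccount.com"
```

The point is that no API call is needed, only the project ID and the SA name.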

Maybe two different status fields could be an option: one that always contains the standardized name, and one that contains the "most common identifier", which would be the email address for an SA, the URL for a bucket, or, if there is no such identifier, just the standardized name.
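
Roughly, the fallback rule would be something like the sketch below. The field names `resourceName` and `commonIdentifier` are placeholders I made up for illustration, not a proposal for the actual names.

```python
from typing import Optional


def identifier_status(standard_name: str,
                      common_identifier: Optional[str] = None) -> dict:
    """Build the two status fields, falling back to the standardized name."""
    return {
        # Always the standardized resource name.
        "resourceName": standard_name,
        # Email, URL, etc. when one exists; otherwise the standardized name.
        "commonIdentifier": common_identifier or standard_name,
    }
```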

@toumorokoshi
Contributor

Thanks for the clarification! That helps.

I think it's easier for us to guarantee exposing an identifier in the status field that can be used by config connector. Although I don't think we can guarantee the string itself will be usable in other services, I think we can guarantee that it can uniquely identify a resource of that kind (by definition of an external reference that's a requirement).

Let me bring this up with the team a bit more.


4 participants