Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve multi operator handling and multi tenancy model #3358

Merged
merged 17 commits into from
Jul 11, 2022

Conversation

christophd
Copy link
Contributor

Fixes #2177
Fixes #3164

This PR improves/enables the handling of multiple Camel K operators (global and local) on a single cluster in order to provide a more powerful and easy multi tenancy model for Camel K.

The changes leverage the existing concept of operator id annotations as described in https://camel.apache.org/camel-k/1.9.x/installation/advanced/multi.html. The approach introduces the camel.apache.org/operator.id annotation on resources to select the operator that should managed the resource.

The PR fixes some issues related to this approach and makes sure that one single cluster is able to handle multiple Camel K operators at the same time where each resource is only reconciled by a single operator instance.

The PR introduces following changes:

Use operator id by default

  • Set operator id with every Camel K installation (id provided as KAMEL_OPERATOR_ID EnvVar setting on the operator deployment) The default operator id is "camel-k"
  • Align integration platform name with given operator id
  • Align namespace lease name with the given operator id (so only operator instances with the same id elect a leader)
  • Make sure to use unique operator ids on a cluster
  • add checks on kamel CLI install command to use proper unique operator id
  • Introduce operator-id setting on kamel CLI when installing Camel K
  • Introduce operator-id setting on kamel CLI when running integrations
  • Enforce to use an operator-id when installing Camel K and running integrations (default="camel-k")
  • Add operator id as environment variable to integration pods (CAMEL_OPERATOR_ID EnvVar)
  • Use operator id in OLM deployment (default="camel-k")
  • Add more comfortable way to set operator id annotation on resources

Make sure to properly reinitialize/reconcile integration resources

  • Reinitialize integration when operator id annotation has changed (allows to move a resource from one operator to another by changing the annotation)
  • Remove IntegrationKit reference for integration that has a new operator id annotation (makes sure to reinitialize the integration with a new integration kit for the new operator)
  • Add operator id annotation to relevant fields that determine the integration digest hash (to determine operator id changes on the integration resource)
  • Add common utility method to get the operator id annotation from a resource
  • Backward compatibility
  • Allow default operator (id="camel-k") to handle all resources that are missing proper operator id annotations
  • Allow operators that use a proper id to reconcile resources on any namespace (regardless of operator mode global vs. local and regardless of existing namespace locking)
  • Update the integration deployment ready logic. Integration pod specs are compared based on integration digest and container image only (makes sure to determine running pods for integration properly)

Use operator id in KameletBindings

  • Introduce operator-id option to kamel CLI bind command
  • Make sure to use proper operator id when creating the binding
  • Propagate operator id to dependent resources (e.g. integration)
  • Rebuild Kamelet binding when operator id annotation changes

Create proper integration platform as part of the operator

  • Make sure that each operator instance adds its integration platform (name=operator id) if not already exists (also include OLM installations)
  • Automatically adds an integration platform during the operator start-up
  • Add integration platform viewer role so authenticated users can see the platform instances (used in kamel CLI to verify operator id usage)

Improve operator version lookup

  • Fix kamel CLI version command
  • Find local integration platform or try to search for default operator in any namespace instead of just looking for the "camel-k" integration platform in local namespace
  • Eliminate nasty "No Integration platform available in current namespace" warnings when using the kamel CLI

Fix e2e tests

  • Handle forbidden error when user lacks privileges to list all integration platforms on cluster during install command
  • Use unique operator id when installing operators in e2e test namespaces
  • Make sure to run integrations with respective operator id for selected reconciling
  • Automatically uninstall operators after the test
  • Introduce new GitHub action to install Camel K in global mode

With all these changes the user should be able to introduce multiple operators (global and local mode) in a single cluster. This is the base for proper multi tenancy capabilities in Camel K. Before the multi tenancy model worked with namespaces where the operator was able to exclusively handle resources in a single namespace. The new approach (that has already been there for quite some time but needed some attention and fixes) with operator ids selecting the resources to manage is much more powerful and overcomes the limitations in terms of rolling upgrades and selective upgrades through canary release strategies that the old approach was facing.

Existing Camel K users with the usual single Camel K operator installation will not even notice the changes as the default operator id "camel-k" has already been used as integration platform name and namespace lease locking. The default operator with id "camel-k" is allowed to also handle resources with missing operator id annotations. This makes sure that existing Camel K workload is reconciled as before.

New Camel K users and especially users that want to leverage multi tenancy capabilities on a cluster will have to properly handle the operator ids used on the cluster.

Yet there are some follow-up issues to this PR that need to be taken care of in order to make multi tenancy clusters as comfortable as possible. Here is a list of future tasks to consider:

  • Allow multiple operators in a single namespace (align operator deployment name and other resources to given operator-id)
  • Allow to rename/change the operator id via kamel CLI command and cascade changes to dependent resources (integration platform, integrations, kamelet-bindings, builds, integration-kits, etc.)
  • Deprecate/remove global vs. local operator distinction (namespace local operators are obsolete)
  • Remove secondary integration platform logic (default integration platforms will be linked to the operator by its id, custom integration platforms will have to make use to use a unique name. Resources should select the platform to use via annotations or use the default platform attached to the current operator)
  • Kamel CLI: Add operation to list integrations that are missing a proper operator id annotation (=legacy integrations)
  • Kamel CLI: Add operation to list all available integration platform instances (= operators) on the cluster
  • Remove duplicate state for integration platform
  • IntegrationPlatform as the driver for every operator installation. User adds new platform CR and a new operator is started accordingly)

Release Note

feat(operator): Improve multi operator handling and multi tenancy model

Copy link
Contributor

@squakez squakez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have a deeper look at the code change later. In the while, a very first comment. Why don't we add a deprecation notice when we miss the --operator-id. It would be better to let the user know that this is going to be needed in future version. We can keep the default as "camel-k" if the user provides none.

@squakez squakez added the kind/feature New feature or request label Jun 14, 2022
@christophd
Copy link
Contributor Author

@lburgazzoli @astefanutti FYI

Copy link
Contributor

@squakez squakez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. A few duplication that we can remove for a better maintenance. I also miss some specific E2E test in order to make sure we won't break this feature eventually. For instance, I think we can introduce a test where we have 2 operators and make sure that we handle an Integration from the operator targeted in its annotation.

Before going ahead I also advice a deeper review from @astefanutti to make sure it's in line with the latest architecture designs.

pkg/apis/camel/v1/integration_types_support.go Outdated Show resolved Hide resolved
pkg/cmd/bind.go Outdated Show resolved Hide resolved
pkg/cmd/run.go Outdated Show resolved Hide resolved
@christophd
Copy link
Contributor Author

@squakez Thank you for having a look. I have reduced the code duplications. Not sure about the shortcut for the --operator-id option though.

@astefanutti @lburgazzoli can you please also have a look. I want to get this merged soonish 😄

@christophd christophd force-pushed the issue/2177/multi-operator-handling branch from 03710e2 to a095d98 Compare July 1, 2022 06:32
Copy link
Member

@astefanutti astefanutti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear what mechanism prevents an operator to use the same id as another operator in another namespace?


log.Info("Starting the manager")
exitOnError(mgr.Start(signals.SetupSignalHandler()), "manager exited non-zero")
}

// findOrCreateIntegrationPlatform create default integration platform in operator namespace if not already exists.
func findOrCreateIntegrationPlatform(ctx context.Context, c client.Client, operatorNamespace string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the logic that creates the default IntegrationPlatform in the platform trait become redundant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In rare cases the IntegrationPlatform that has been automatically installed with the operator may have been removed for some reason. In this case the trait will install a fresh IntegrationPlatform at integration runtime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be interesting to understand what these rare cases are, so that it justifies the logic duplication.

}
pl = &defaultPlatform
e.Resources.Add(pl)

// Make sure that IntegrationPlatform installed in operator namespace can be seen by others
if err := install.IntegrationPlatformViewerRole(e.Ctx, t.Client, namespace); err != nil && !k8serrors.IsAlreadyExists(err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be done in pkg/install/optional.go where other viewer roles are installed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The viewer role is bound to the namespace where the integration platform is living. so I think it makes sense to do this right after the integration platform has been created. following from that I see this in both the integration platform trait and the operator optional tools install mechanism.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered an aggregating to viewer role instead with rbac.authorization.k8s.io/aggregate-to-view?

@christophd christophd force-pushed the issue/2177/multi-operator-handling branch from ccac92d to 1c922f0 Compare July 4, 2022 18:43
@christophd
Copy link
Contributor Author

regarding the operator ids: the kamel CLI install command will check for existing ids and throw errors in case of duplications. But this only works when the user that installs the operator has the permissions to see the other namespaces and its integration platforms. In OLM installation the bundle needs to make sure to use unique operator ids.

I do not see any other option to ensure unique ids across the whole cluster other than super admins receiving the errors when using duplicate ids.

@christophd christophd force-pushed the issue/2177/multi-operator-handling branch 3 times, most recently from a4f77be to 55a32cb Compare July 5, 2022 09:36
- Fix typos
- Use IsSecondary() utility method
- By default use an operator-id (default='camel-k')
- Introduce operator-id setting on kamel CLI when installing Camel K
- Introduce operator-id setting on kamel CLI when running integrations
- Make sure to use an operator-id when installing Camel K and running integrations
- Set camel.apache.org/operator.id annotation on resources managed by operator with given id
- Add operator id as environment variable to integration pods
- Use default operator id in OLM deployment
- Add more comfortable way to set operator id annotation on resources
- Use operator id when creating the operator lease so only operator instances with the same id elect a leader
- Reinitialize integration when operator id annotation has changed
- Remove IntegrationKit reference for integration that has a new operator id annotation
- Add operator id annotation to relevant fields that determine the integration digest hash
- Add common utility method to get the operator id annotation from a resource
- Allow default operator (id="camel-k") to handle legacy resources that are missing proper operator id annotations
- Allow operators that use a proper id to reconcile resources on any namespace (regardless of operator mode global vs. local and regardless of existing namespace lease)
- Make sure to not install same operator id twice
- Users must use --force option to overwrite existing operator with given id
- Searches for all integration platforms in any namespace and verify that platform name does not exist
- Make sure to use a known operator id when running an integration
- Check performs a cluster wide search for integration platforms and verifies that the platform name exists
- Users may use --force option to skip the check
- Introduce operator-id option to kamel CLI bind command
- Make sure to use proper operator id when creating the binding
- Rebuild Kamelet binding when operator id annotation changes
- Integration platforms need to be visible to authenticated users in order to verify operator ids
- Add integration platform viewer role in all namespaces that hold at least one integration platform so users can get and list these platform instances
- Automatically add the role during kamel CLI installation and in platform trait
…erator

- Make sure that each operator instance adds its integration platform if not already exists
- Automatically adds an integration platform in operator start-up
- Also adds integration platform viewer role so authenticated users can see the platform instance
- Required for kamel CLI to verify existing operator ids on the cluster when doing an operator install or running integrations
- Find local integration platform or try to search for default operator in any namespace instead of just looking for the "camel-k" integration platform in local namespace
- Should eliminate nasty "No Integration platform available in current namespace" warnings when using the kamel CLI
- Handle forbidden error when user lacks privileges to list all integration platforms on cluster during install command
- Use unique operator id when installing operators in test namespaces
- Make sure to run integrations with respective operator id for selected reconciling
- Automatically uninstall operators after the test
- Introduce new GitHub action to install Camel K in global mode
- Run YAKS e2e tests with single global Camel K operator
- Use local operator with id camel-k-preflight in preflight test checks and remove operator once checks are done
- Update deployment ready logic with pod spec comparison based on integration digest and container image only
@christophd christophd force-pushed the issue/2177/multi-operator-handling branch from 74f7b09 to 9003dc4 Compare July 8, 2022 09:03
Copy link
Member

@astefanutti astefanutti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, there are two remaining points, but I think these can be addressed in subsequent PRs:

  • Remove duplication of the default platform creation that's currently done in two different places
  • Use an aggregating ClusterRole to grant access to the IntegrationPlatform resources to the viewer role, instead of doing it manually

@christophd
Copy link
Contributor Author

Thanks @astefanutti for the findings! I agree with the two remaining points you have mentioned.

I would appreciate to get this PR merged now and tackle the remaining points in subsequent PRs because this PR is quite big already and I need to resolve many merge conflicts with main branch evolving.

I have created some follow-up issues to track the remaining points:

@astefanutti
Copy link
Member

Let's merge this and follow up on the open issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

simplify platforms handling/configuration Global vs local operator
4 participants