Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert the clustermesh subsystem into a hive.Cell #25561

Merged
merged 5 commits into from May 26, 2023

Conversation

giorio94
Copy link
Member

This PR refactors the clustermesh subsystem into a cell, which is registered in the controlplane module. Please refer to the individual commit messages for more details about the preliminary steps.

Convert the clustermesh subsystem into a hive.Cell

@giorio94 giorio94 added area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. area/modularization labels May 19, 2023
@giorio94 giorio94 requested review from a team as code owners May 19, 2023 16:57
Copy link
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me assuming your last force push only changed the cmdref.

@giorio94
Copy link
Member Author

Looks good to me assuming your last force push only changed the cmdref.

Yes, the force push was only to fix the autogenerated cmdref: https://github.com/cilium/cilium/compare/e799226a916e3ca5767c0fc7f4c9696c7c38398d..8bad0ae5e9ad41b637472d375702e981b5147bfa

@giorio94
Copy link
Member Author

TravisCI failed with the same error reported in #25272. Rerunning

@giorio94
Copy link
Member Author

Last force-push only fixed an out-of-date comment.

cell.Provide(NewClusterMesh),

// Convert concrete objects into more restricted interfaces used by clustermesh.
cell.ProvidePrivate(func(sc *k8s.ServiceCache) ServiceMerger { return sc }),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProvidePrivate takes a variable number of constructors which might make this easier to read (matter of taste, I'm fine seeing either style used)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I didn't notice that. Given that you don't have a specific preference, I'm going to keep it as it is now, because it seems slightly clearer to me.

@giorio94
Copy link
Member Author

ConformanceK8sKind failed but the logs don't mention any explicit failure. I'm going to rerun it and to see if the failure is consistent.

@giorio94
Copy link
Member Author

/test

@giorio94
Copy link
Member Author

giorio94 commented May 22, 2023

/test-1.26-net-next

Hit known flake #25524

@giorio94
Copy link
Member Author

Force-pushed to rebase onto main and fix an import-related conflict.

@giorio94
Copy link
Member Author

/test

@giorio94
Copy link
Member Author

/test-1.26-net-next

@giorio94
Copy link
Member Author

Cilium Conformance E2E failed while provisioning the LVH VM: #25483. Manually restarting the affected matrix task

@giorio94
Copy link
Member Author

giorio94 commented May 23, 2023

/test-runtime

Hit new flake: #25628

@giorio94
Copy link
Member Author

giorio94 commented May 24, 2023

/test-runtime

Hit #25628 and #25637


// RemoteIdentityWatcher provides identities that have been allocated on a
// remote cluster.
RemoteIdentityWatcher RemoteIdentityWatcher

IPCache ipcache.IPCacher

// ClusterSizeDependantInterval allows to calculate intervals based on cluster size.
ClusterSizeDependantInterval kvstore.ClusterSizeDependantIntervalFunc `optional:"true"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "Dependent"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used Dependant for consistency with the rest of the code base. Given that there are quite a lot of occurrences, I would prefer not to rename them all in this PR, as it would increase the number of modified files quite significantly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, it's like HTTP referer all over again. Sure, I didn't realize it was so wide-spread.

Copy link
Contributor

@squeed squeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor nits, but looks good for my files

@@ -37,15 +38,12 @@ const (
// Configuration is the configuration that must be provided to
// NewClusterMesh()
type Configuration struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extremely minor nit: "Configuration" is a confusing name. It's really a set of dependencies, not a configuration.

But, like, it's also very much not a big deal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right that it is not a pure configuration anymore. I'm not sure about a better name though, Dependencies also looks a bit weird when accessing for instance the cluster name as rc.mesh.deps.ClusterName (I personally feel that rc.mesh.conf.ClusterName is a bit more natural here).

Currently, the ServiceCache is created and stored as concrete object,
rather than pointer type. Yet, this is problematic for the upcoming
modification to provide it through the hive subsystem, because it contains
a mutex and cannot be passed around. Hence, let's change it to be of
pointer type and adapt the usages accordingly.

We don't risk incurring in new nil pointer exceptions, because that struct
already needed to be explicitly initialized as that it contains maps.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Let's provide the ServiceCache instance through hive. This is one of the
requirements to convert the clustermesh subsystem to a cell.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The "PolicyTrifecta" is currently responsible to initialize the
CachingIdentityAllocator, along others. Let's also return the
RemoteIdentityWatcher sub-interface, which is one of the requirements
to convert the clustermesh subsystem to a cell.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 requested a review from a team as a code owner May 25, 2023 08:11
@giorio94 giorio94 requested a review from lmb May 25, 2023 08:11
@giorio94
Copy link
Member Author

Rebased onto main to fix a minor conflict

@giorio94
Copy link
Member Author

/test

daemon/cmd/daemon_main.go Outdated Show resolved Hide resolved
return cm
}

func (cm *ClusterMesh) Start(hive.HookContext) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, interesting! I kind of prefer the "cm implements a startable interface" rather than "cm appends hooks". I didn't realise how easy it is to convert from one to the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully got your comment. Are you suggesting to adopt a different approach?

pkg/clustermesh/clustermesh_test.go Outdated Show resolved Hide resolved
@@ -247,12 +251,14 @@ func (cm *ClusterMesh) Close() {
cluster.onRemove()
delete(cm.clusters, name)
}
cm.controllers.RemoveAllAndWait()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has probably moved somewhere, but I can't find where!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have completely dropped this variable, since it was never actually used.

}, 10*time.Second), IsNil)
}

func (s *ClusterMeshServicesTestSuite) TearDownTest(c *C) {
if s.mesh != nil {
s.mesh.Close()
if s.hive != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to do c.Cleanup in SetUpTest() now that #25484 is in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've dropped that cleanup part, since it is now handled by hivetest.Lifecycle.

pkg/clustermesh/types/types.go Outdated Show resolved Hide resolved
@@ -23,14 +23,16 @@ type backendOption struct {

type backendOptions map[string]*backendOption

type ClusterSizeDependantIntervalFunc func(baseInterval time.Duration) time.Duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you turn this into a separate type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong reason, it just seemed more idiomatic to me.

Introduce a new type that wraps the ClusterID and the ClusterName, so
that is can be passed around through the hive machinery.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit refactors the clustermesh subsystem into a cell, which is
registered in the controlplane module.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94
Copy link
Member Author

/test

Copy link
Contributor

@thorn3r thorn3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@giorio94
Copy link
Member Author

Conformance E2E failed due to #25705. Manually rerunning.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 26, 2023
@tklauser tklauser merged commit 69b5845 into cilium:main May 26, 2023
58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. area/modularization ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants