Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clustermesh-apiserver: expose information about completion of initial synchronization through etcd #25388

Merged
merged 4 commits into from May 25, 2023

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented May 11, 2023

This PR extends the SyncStore abstraction to add the support for synced canaries, which are used to convey the information that the initial synchronization from an external source (e.g., k8s or another kvstore) of the keys hosted at a different prefix has completed. This information is required when using an ephemeral etcd instance (e.g., the clustermesh-apiserver sidecar container), because from a client perspective the completion of the initial range operation is not sufficient in this case to guarantee that an up-to-date snapshot of the state has been retrieved. Specifically, this information will be leveraged in a subsequent PR to address #24740.

At the moment, "synced" canaries are enabled only for service entries. The introduction for the other resources depends on #25049, and I temporarily removed that part to allow reviewing both in parallel.

Please, refer to the individual commits for more details.

I've tagged this PR as release-note/minor since it introduces a new metric.

Related: #24740

clustermesh-apiserver: expose information about completion of initial synchronization through etcd

@giorio94 giorio94 added area/misc Impacts miscellaneous areas of the code not otherwise owned by another area. area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. sig/kvstore Impacts the KVStore package interactions. labels May 11, 2023
@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-sync-canaries branch 4 times, most recently from 9f07b96 to 80a3ba7 Compare May 16, 2023 16:22
@giorio94 giorio94 added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. and removed area/misc Impacts miscellaneous areas of the code not otherwise owned by another area. release-note/misc This PR makes changes that have no direct user impact. labels May 16, 2023
@giorio94
Copy link
Member Author

/test

@giorio94 giorio94 marked this pull request as ready for review May 16, 2023 16:29
@giorio94 giorio94 requested review from a team as code owners May 16, 2023 16:29
Copy link
Contributor

@tommyp1ckles tommyp1ckles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes make sense from a metrics perspective.

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully understand the role of the metric. The description says "Whether the initial synchronization from/to the kvstore has completed", but that to me sounds like it represents a boolean state. The metric is a gauge, so what is it actually representing?

@giorio94
Copy link
Member Author

giorio94 commented May 17, 2023

I'm not sure I fully understand the role of the metric. The description says "Whether the initial synchronization from/to the kvstore has completed", but that to me sounds like it represents a boolean state. The metric is a gauge, so what is it actually representing?

The metric is currently of gauge type mainly considering the parallel usage when reading data from the kvstore (to be introduced in a subsequent commit not part of this PR). In that case, and specifically in the clustermesh context, we might watch the same (cluster, kvstore prefix) pair different times following a connection restart. The current implementation resets the metric to false any time we start listing the given prefix, and turns it to true once listing has completed. Another alternative might be to configure it to true only the first time, but that seemed less informative to me.

I'm more doubtful about the description, because I agree that initial might sound misleading. It is there because I would say that in general synchronization from/to the kvstore is an ongoing process which cannot complete. Only the initial synchronization (I couldn't come up with a better term, but I was referring to the list part of a list-and-watch process) can.

@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-sync-canaries branch from 80a3ba7 to f8e2543 Compare May 17, 2023 07:58
@giorio94
Copy link
Member Author

/test

@giorio94 giorio94 requested a review from christarazi May 23, 2023 17:01
Copy link
Member

@YutaroHayakawa YutaroHayakawa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@giorio94
Copy link
Member Author

giorio94 commented May 24, 2023

/test-runtime

The link leads to a 404, I cannot tell why this failed.

@giorio94
Copy link
Member Author

giorio94 commented May 24, 2023

/test-1.26-net-next

The link leads to a 404, I cannot tell why this failed.

@giorio94
Copy link
Member Author

A few Cilium Conformance E2E jobs failed during provisioning: #25483. Manually rerunning.

@giorio94
Copy link
Member Author

giorio94 commented May 24, 2023

/test-runtime

Hit #25637

@giorio94
Copy link
Member Author

giorio94 commented May 24, 2023

/test-1.26-net-next

Failed with known flakes: #25605, #25524

pkg/kvstore/kvstore.go Outdated Show resolved Hide resolved
Introduce the new `cilium/synced` kvstore prefix, which will be used to
convey the information that the initial synchronization from an external
source (e.g., k8s or another kvstore) of the keys hosted at a different
prefix has completed. For instance, `cilium/synced/foo/cilium/state/node/v1`
means that synchronization from cluster `foo` of the keys hosted under
`cilium/state/node/v1` has finished.

This information is required when using an ephemeral etcd instance (e.g.,
the clustermesh-apiserver sidecar container), because from a client
perspective the completion of the initial range operation is not
sufficient in this case to guarantee that an up-to-date snapshot of the
state has been retrieved. We use a separate prefix (rather than using a
special key under the original path) to prevent compatibility issues, as
it would not be interpreted correctly by older cilium agent versions.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Extend the CiliumClusterConfig struct to allow expressing cluster-specific
capabilities. Currently, the only supported capability concerns the
support of per-prefix "synced" canaries. When enabled, the clustermesh-apiserver
writes keys in the form `cilium/synced/$name/$prefix` into etcd to
state that the initial synchronization of the keys under the `$prefix`
prefix for cluster `$name` has completed successfully.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-sync-canaries branch from 60be128 to bdbfca5 Compare May 24, 2023 16:11
@giorio94
Copy link
Member Author

/test

Copy link
Contributor

@chancez chancez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metric looks good overall, had a question about whether or not we need the action label.

pkg/kvstore/store/syncstore.go Show resolved Hide resolved
@christarazi
Copy link
Member

This metric currently conveys two pieces of information, depending on the action type:

* `write`: the initial synchronization from an external source (which might be k8s or a kvstore) to the kvstore has completed (this is the part which is implemented in this PR). In other words, the initial list has completed, and all the returned elements have been successfully written into the kvstore. This is why I used _sync_ rather than _list_ for the name.

* `read`: this is currently implemented in [Fix missed deletion events when reconnecting to/disconnecting from remote clusters (nodes and services) #25499](https://github.com/cilium/cilium/pull/25499), and it is set by the agent when the initial list operation for a given prefix has completed, and all the entries received up to that moment have been processed.

Additionally, will this metric have any utility after the "initial list" completes? What is the role of the metric after the user has already "observed" the metric reach this state?

Depending on the action type:

* `write`: it can be used to infer the time required to complete the synchronization when the clustermesh-apiserver is restarted, which can be potentially large with the default rate-limiting settings if thousands of entries need to be sync'ed. This operation is always performed since the etcd database used by the clustermesh-apiserver is ephemeral, and it needs to be reinitialized from scratch after any restart.

* `read`: it conveys that the agent has completed the initial synchronization from the kvstore for a given cluster, and it is now observing an up-to-date state. The metric would again transition to false if the connection needs to be re-established (mainly because the remote clustermesh-apiserver has been restarted) or the agent restarted. In other words, when the metric is false, the agent is operating with stale data about the given remote cluster.

Thanks for the explanation! Could you include the description of these in the commit msg that introduces the metrics?

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending my comment above.

The SyncStore interface is extended with an additional `Synced()` method,
which triggers the asynchronous creation of the appropriate "synced" key
in the kvstore once all the previously started upsertion operations
completed successfully. This information can be used by readers to
confirm that the initial synchronization from an external source to the
kvstore has completed. Hence, they can read an up-to-date state.

Additional callbacks can be registered to be executed when the initial
synchronization completed, which might be used for instance to gate
other operations or to implement readiness handlers. Finally, the new
`kvstore_initial_sync_completed` metric is introduced to convey the
synchronization status. It is labeled by `scope` (matching the same
format adopted for the other kvstore metrics), `source_cluster`
(identifying the source cluster of the information) and `action`
(either write or read, the SyncStore always does write operations).

More specifically, the new metric conveys two different pieces of
information, depending on the `action` type:
* `write`: the initial synchronization from an external source (which
  might be k8s or a kvstore) to the kvstore has completed. In other
  words, the initial list has completed, and all the returned elements
  have been successfully written into the kvstore. It can be used to
  infer the time required to complete the synchronization when the
  clustermesh-apiserver is restarted.
* `read` (to be supported in an upcoming commit): set by the agent
  when the initial list operation for a given prefix has completed,
  and all the entries received up to that moment have been processed.
  It conveys that the agent has completed the initial synchronization
  from the kvstore for a given cluster, and it is now observing an
  up-to-date state. The metric would again transition to false if
  the connection needs to be re-established (mainly because the remote
  clustermesh-apiserver has been restarted) or the agent restarted.
  In other words, when the metric is false, the agent is operating
  with stale data about the given remote cluster.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit includes the glue logic to write the "synced" canary when
the initial synchronization from k8s completes. The corresponding
capability is not yet turned on in the CiliumClusterConfig, because
"synced" canaries need still to be enabled for the other resource types.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-sync-canaries branch from bdbfca5 to 37b4940 Compare May 24, 2023 17:04
@giorio94
Copy link
Member Author

Thanks for the explanation! Could you include the description of these in the commit msg that introduces the metrics?

Sure, I've extended the commit message with the information mentioned above.

@giorio94
Copy link
Member Author

/test

@giorio94
Copy link
Member Author

giorio94 commented May 25, 2023

/ci-l4lb

Failure tracked in #25663

@giorio94
Copy link
Member Author

k8s-1.26-kernel-net-next failed again due to known flake #25605. Given the currently high failure rate (>50%) it wouldn't be much useful to rerun it again. Marking this PR as ready to merge, since all the other tests are green, and all reviews are in.

@giorio94 giorio94 added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 25, 2023
@squeed squeed merged commit 751ab54 into cilium:main May 25, 2023
57 of 58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/kvstore Impacts the KVStore package interactions.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants