Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resource: Work around a rare race in initial sync #23292

Merged
merged 1 commit into from Feb 1, 2023

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Jan 24, 2023

TestResource_WithFakeClient is sometimes failing due to seeing two Upserts instead of Upsert+Sync. This is due to the fact that we add the initial set of keys from the store to the queue when subscribing in Events() and the fact that adding to the store is happening before our event handler runs. The sequence of events looks like this:

[informer 1/2] add object A to store
[resource] subscribe and add object with key A to queue
[resource] dequeue A and emit Event{Upsert, A}
[informer 2/2] call AddFunc handler for A and queue A
[resource] dequeue A and emit Event{Upsert, A}

Work around the issue by allowing the extra Upsert event in the test case.

Proper fix for this would be to either detect a duplicate using ResourceVersion, or implementing a custom (cache.Config) Process function that synchronizes the updating of the store with subscribing (e.g. pkg/k8s/informer/informer.go).

Fixes: #23079
Fixes: 4101e2c ("k8s: Add resource package")

TestResource_WithFakeClient is sometimes failing due to seeing
two Upserts instead of Upsert+Sync. This is due to the fact that
we add the initial set of keys from the store to the queue when
subscribing in Events() and the fact that adding to the store is
happening before our event handler runs. The sequence of events
looks like this:

  [informer 1/2] add object A to store
  [resource] subscribe and add object with key A to queue
  [resource] dequeue A and emit Event{Upsert, A}
  [informer 2/2] call AddFunc handler for A and queue A
  [resource] dequeue A and emit Event{Upsert, A}

Work around the issue by allowing the extra Upsert event in
the test case.

Proper fix for this would be to either detect a duplicate using
ResourceVersion, or implementing a custom Process function that
synchronizes the updating of the store with subscribing
(e.g. pkg/k8s/informer/informer.go).

Fixes: cilium#23079
Fixes: 4101e2c ("k8s: Add resource package")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki requested a review from a team as a code owner January 24, 2023 13:55
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 24, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jan 24, 2023
@joamaki joamaki added release-note/misc This PR makes changes that have no direct user impact. and removed kind/community-contribution This was a contribution made by a community member. labels Jan 24, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jan 24, 2023
@joestringer joestringer added needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch release-note/ci This PR makes changes to the CI. and removed release-note/misc This PR makes changes that have no direct user impact. labels Jan 24, 2023
@joestringer
Copy link
Member

I marked this for backport to 1.13 since it looks like it will mitigate a unit test failure. If this relies on changes only in the master branch, feel free to remove the label.

@joamaki
Copy link
Contributor Author

joamaki commented Jan 26, 2023

/test

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathConfig Encapsulation Check connectivity with transparent encryption, VXLAN, and endpoint routes

Failure Output

FAIL: Kubernetes DNS did not become ready in time

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.25-kernel-4.19 so I can create one.

Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed:

Click to show.

Test Name

K8sAgentHubbleTest Hubble Observe Test L3/L4 Flow

Failure Output

FAIL: hubble-relay was not able to get into ready state

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.9 so I can create one.

Copy link
Member

@dylandreimerink dylandreimerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pchaigno
Copy link
Member

pchaigno commented Feb 1, 2023

@joamaki Why were the end-to-end tests executed here? AFAICS, it's only affecting the non-privileged unit tests which run in Travis CI. Is that was a mistake, I can probably merge.

@joamaki
Copy link
Contributor Author

joamaki commented Feb 1, 2023

@joamaki Why were the end-to-end tests executed here? AFAICS, it's only affecting the non-privileged unit tests which run in Travis CI. Is that was a mistake, I can probably merge.

Yep only Travis CI needed to check that the test passes. Go ahead and merge.

@pchaigno pchaigno merged commit 5fa2ac4 into cilium:master Feb 1, 2023
@pchaigno pchaigno mentioned this pull request Feb 12, 2023
29 tasks
@pchaigno pchaigno added the backport/author The backport will be carried out by the author of the PR. label Feb 14, 2023
@julianwiedmann
Copy link
Member

@joamaki is the backport for 1.13 still on your radar, or is it fine to drop that?

@julianwiedmann
Copy link
Member

Taking that as a no :).

@julianwiedmann julianwiedmann added affects/v1.13 This issue affects v1.13 branch and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 2, 2024
@joamaki
Copy link
Contributor Author

joamaki commented Apr 2, 2024

@julianwiedmann oops, sorry about the delay (had half-answered this and then it got buried). Yes no need to backport to 1.13.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.13 This issue affects v1.13 branch backport/author The backport will be carried out by the author of the PR. release-note/ci This PR makes changes to the CI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI: Travis: Unit test panic in TestResource_WithFakeClient
6 participants