CI: kvstoremesh unit test ClusterMeshServicesTestSuite/TestRemoteServiceObserver panic #32179
Labels
area/CI
Continuous Integration testing issue or flake
area/clustermesh
Relates to multi-cluster routing functionality in Cilium.
ci/flake
This is a known failure that occurs in the tree. Please investigate me!
Comments
squeed added the area/CI, area/clustermesh, and ci/flake labels on Apr 25, 2024
giorio94 added a commit to giorio94/cilium that referenced this issue on May 13, 2024:
The clustermesh logic is currently affected by a possible, although rare, race condition that occurs if the cluster configuration is being retrieved while the connection to the remote cluster is stopped. Stopping the connection stops two controllers: the one handling the connection to the remote cluster and the one responsible for retrieving the cluster configuration. As a result, the getRemoteCluster function can terminate before the second controller does, in turn leading to a panic due to a send on a closed channel. Fix this by explicitly removing only the first controller and letting the other terminate normally once the parent context is canceled, which ensures the controller has always terminated before the cfgch channel is closed.

Fixes: cilium#32179

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
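The shutdown ordering the commit describes can be sketched in a few lines of Go. This is not Cilium's actual code; the names (safeShutdown, cfgch, the "cluster-config" payload) are illustrative. The key point is that the channel is closed only after the sending goroutine is known to have terminated:

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// safeShutdown starts a hypothetical config-retrieval goroutine, then stops
// it using the ordering the fix describes: cancel the parent context, wait
// for the goroutine to terminate, and only then close cfgch. It returns true
// once the shutdown has completed.
func safeShutdown() bool {
	ctx, cancel := context.WithCancel(context.Background())
	cfgch := make(chan string)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Model of the second controller: try to deliver the config,
		// but exit without sending once the parent context is canceled.
		select {
		case cfgch <- "cluster-config":
		case <-ctx.Done():
		}
	}()

	cancel()     // terminate the parent context
	wg.Wait()    // the controller has now terminated...
	close(cfgch) // ...so closing cfgch cannot race with a send
	return true
}

func main() {
	fmt.Println("clean shutdown:", safeShutdown())
}
```

Closing the channel before wg.Wait() would reintroduce the bug: a goroutine still blocked on the send could then execute a send on a closed channel, which panics.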
github-merge-queue bot pushed a commit that referenced this issue on May 16, 2024, carrying the same commit message (Fixes: #32179).

YutaroHayakawa pushed a commit that referenced this issue on May 23, 2024:

[ upstream commit 104a302 ] The clustermesh logic is currently affected by a possible, although rare, race condition that occurs if the cluster configuration is being retrieved while the connection to the remote cluster is stopped. Stopping the connection stops two controllers: the one handling the connection to the remote cluster and the one responsible for retrieving the cluster configuration. As a result, the getRemoteCluster function can terminate before the second controller does, in turn leading to a panic due to a send on a closed channel. Fix this by explicitly removing only the first controller and letting the other terminate normally once the parent context is canceled, which ensures the controller has always terminated before the cfgch channel is closed. Fixes: #32179 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>

YutaroHayakawa pushed further commits that referenced this issue on May 24 and May 25, 2024, each carrying the same backport message.

aanm pushed a commit that referenced this issue on May 27, 2024, also carrying the same backport message.
CI failure
In an unrelated PR, I saw a panic in a kvstoremesh privileged unit test: