Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CancellationToken support and a lot of async improvements #725

Merged
merged 19 commits into from Mar 7, 2024

Conversation

nachtjasmin
Copy link
Contributor

@nachtjasmin nachtjasmin commented Feb 27, 2024

This PR adds the ability to cancel (almost) all asynchronous operations with the built-in CancellationToken. Due to the nature of interfaces, almost all of those changes are breaking the existing API.

It also adds an WatchAsync method to the IKubernetesClient, which is making use of the newer asynchronous enumerables.

I tried to keep it as backwards-compatible as possible, however, this was not possible due to different problems. Most notably, the interfaces no longer have a default implementation to actually break existing code instead of failing silently. If I would've just changed the existing interface (with Task.CompletedTask as the return , current implementations of ReconcileAsync would have not been called anymore. Therefore I removed the default implementations to ensure that all remaining controllers get updated.

Note about the one failing test

During local development with a kind cluster, I could not get the KubeOps.Operator.Test.Events.EventPublisherIntegrationTest.Should_Increase_Count_On_Existing_Event to run successfully. It always broke with some certification error, other tests work fine tho. I could not determine whether it is an issue with the kind cluster or some other error.

@nachtjasmin nachtjasmin force-pushed the feature/better-asynchronous-api-usage branch from c5ca283 to c9082a8 Compare February 27, 2024 08:26
@nachtjasmin nachtjasmin marked this pull request as ready for review March 1, 2024 15:46
@nachtjasmin nachtjasmin force-pushed the feature/better-asynchronous-api-usage branch from c9082a8 to b12f14b Compare March 1, 2024 15:46
@nachtjasmin nachtjasmin changed the title feature: add CancellationToken support CancellationToken support and a lot of async improvements Mar 1, 2024
@nachtjasmin nachtjasmin force-pushed the feature/better-asynchronous-api-usage branch from b12f14b to becd43c Compare March 1, 2024 16:26
@buehler
Copy link
Owner

buehler commented Mar 4, 2024

Hey @nachtjasmin
Thank you for your contribution!

I don't mind breaking changes at all. While I'm reviewing the stuff, the linter is ranting about some things it seems. dotnet format should be able to help as far as I know.

Another general comment: Why do we need factories? Personally I'm not really a fan of factories. Is there a specific use-case / reason for them?

@buehler
Copy link
Owner

buehler commented Mar 4, 2024

The linter seems to be angry with you...

@nachtjasmin nachtjasmin force-pushed the feature/better-asynchronous-api-usage branch 2 times, most recently from c72dd5e to 32a34c5 Compare March 5, 2024 08:20
@nachtjasmin
Copy link
Contributor Author

The joy of line endings. Running dotnet format was not enough, cause Git would've changed the line endings back to LF on git add. Added a .gitattributes file in order to enforce the correct line endings for the .cs files, now it should work as intended.

@buehler
Copy link
Owner

buehler commented Mar 5, 2024

all fingers crossed.

{
public Task ReconcileAsync(V1TestEntity entity, CancellationToken cancellationToken)
{
logger.LogInformation("Reconciling entity {Entity}.", entity);

Check failure

Code scanning / CodeQL

Log entries created from user input

This log entry depends on a [user-provided value](1).

public Task DeletedAsync(V1TestEntity entity, CancellationToken cancellationToken)
{
logger.LogInformation("Deleted entity {Entity}.", entity);

Check failure

Code scanning / CodeQL

Log entries created from user input

This log entry depends on a [user-provided value](1).
{
public async Task ReconcileAsync(V1TestEntity entity, CancellationToken cancellationToken)
{
logger.LogInformation("Reconciling entity {Entity}.", entity);

Check failure

Code scanning / CodeQL

Log entries created from user input

This log entry depends on a [user-provided value](1).

public Task DeletedAsync(V1TestEntity entity, CancellationToken cancellationToken)
{
logger.LogInformation("Deleting entity {Entity}.", entity);

Check failure

Code scanning / CodeQL

Log entries created from user input

This log entry depends on a [user-provided value](1).
{
public Task ReconcileAsync(V1TestEntity entity, CancellationToken cancellationToken)
{
logger.LogInformation("Reconciling entity {Entity}.", entity);

Check failure

Code scanning / CodeQL

Log entries created from user input

This log entry depends on a [user-provided value](1).

public Task DeletedAsync(V1TestEntity entity, CancellationToken cancellationToken)
{
logger.LogInformation("Deleted entity {Entity}.", entity);

Check failure

Code scanning / CodeQL

Log entries created from user input

This log entry depends on a [user-provided value](1).
@buehler
Copy link
Owner

buehler commented Mar 5, 2024

Hey @nachtjasmin

Do you have any idea, why the tests won't start?

It seems, that the host does not start in test mode.

The debug output of the github actions:

KubeOps.Operator.Test -> /home/runner/work/dotnet-operator-sdk/dotnet-operator-sdk/test/KubeOps.Operator.Test/bin/Debug/net8.0/KubeOps.Operator.Test.dll
Test run for /home/runner/work/dotnet-operator-sdk/dotnet-operator-sdk/test/KubeOps.Operator.Test/bin/Debug/net8.0/KubeOps.Operator.Test.dll (.NETCoreApp,Version=v8.0)
Microsoft (R) Test Execution Command Line Tool Version 17.9.0 (x64)
Copyright (c) Microsoft Corporation.  All rights reserved.
Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
<7>Microsoft.Extensions.Hosting.Internal.Host[1] Hosting starting
<6>KubeOps.Operator.Watcher.ResourceWatcher[0] Starting resource watcher for V1OperatorIntegrationTestEntity.
<6>KubeOps.Operator.Watcher.ResourceWatcher[0] Started resource watcher for V1OperatorIntegrationTestEntity.
<6>Microsoft.Hosting.Lifetime[0] Application started. Press Ctrl+C to shut down.
<6>Microsoft.Hosting.Lifetime[0] Hosting environment: Production
<6>Microsoft.Hosting.Lifetime[0] Content root path: /home/runner/work/dotnet-operator-sdk/dotnet-operator-sdk/test/KubeOps.Operator.Test/bin/Debug/net8.0
<7>Microsoft.Extensions.Hosting.Internal.Host[2] Hosting started
<6>KubeOps.Operator.Watcher.ResourceWatcher[0] Received watch event "Added" for "OperatorIntegrationTest/test-entity", last observed resource version: 601.
<7>Microsoft.Extensions.Hosting.Internal.Host[3] Hosting stopping
<6>Microsoft.Hosting.Lifetime[0] Application is shutting down...
<6>KubeOps.Operator.Watcher.ResourceWatcher[0] Stopping resource watcher for V1OperatorIntegrationTestEntity.
<6>KubeOps.Operator.Watcher.ResourceWatcher[0] Stopped resource watcher for V1OperatorIntegrationTestEntity.
<7>Microsoft.Extensions.Hosting.Internal.Host[4] Hosting stopped
<7>Microsoft.Extensions.Hosting.Internal.Host[1] Hosting starting
<6>KubeOps.Operator.Watcher.ResourceWatcher[0] Starting resource watcher for V1OperatorIntegrationTestEntity.
<6>KubeOps.Operator.Watcher.ResourceWatcher[0] Started resource watcher for V1OperatorIntegrationTestEntity.
<6>Microsoft.Hosting.Lifetime[0] Application started. Press Ctrl+C to shut down.
<6>Microsoft.Hosting.Lifetime[0] Hosting environment: Production
<6>Microsoft.Hosting.Lifetime[0] Content root path: /home/runner/work/dotnet-operator-sdk/dotnet-operator-sdk/test/KubeOps.Operator.Test/bin/Debug/net8.0
<7>Microsoft.Extensions.Hosting.Internal.Host[2] Hosting started
<7>Microsoft.Extensions.Hosting.Internal.Host[3] Hosting stopping
<6>Microsoft.Hosting.Lifetime[0] Application is shutting down...
<6>KubeOps.Operator.Watcher.ResourceWatcher[0] Stopping resource watcher for V1OperatorIntegrationTestEntity.
<6>KubeOps.Operator.Watcher.ResourceWatcher[0] Stopped resource watcher for V1OperatorIntegrationTestEntity.
<7>Microsoft.Extensions.Hosting.Internal.Host[4] Hosting stopped
<7>Microsoft.Extensions.Hosting.Internal.Host[1] Hosting starting
<7>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] Subscribe for leadership updates.
<6>Microsoft.Hosting.Lifetime[0] Application started. Press Ctrl+C to shut down.
<6>Microsoft.Hosting.Lifetime[0] Hosting environment: Production
<6>Microsoft.Hosting.Lifetime[0] Content root path: /home/runner/work/dotnet-operator-sdk/dotnet-operator-sdk/test/KubeOps.Operator.Test/bin/Debug/net8.0
<7>Microsoft.Extensions.Hosting.Internal.Host[2] Hosting started
<6>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] This instance started leading, starting watcher.
<6>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] Starting resource watcher for V1OperatorIntegrationTestEntity.
<6>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] Started resource watcher for V1OperatorIntegrationTestEntity.
<6>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] Received watch event "Added" for "OperatorIntegrationTest/test-entity", last observed resource version: 600.
<6>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] Received watch event "Added" for "OperatorIntegrationTest/test-entity", last observed resource version: 617.
<6>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] Received watch event "Added" for "OperatorIntegrationTest/test-entity2", last observed resource version: 610.
<6>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] Received watch event "Added" for "OperatorIntegrationTest/test-entity", last observed resource version: 601.
<7>Microsoft.Extensions.Hosting.Internal.Host[3] Hosting stopping
<6>Microsoft.Hosting.Lifetime[0] Application is shutting down...
<7>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] Unsubscribe from leadership updates.
<7>Microsoft.Extensions.Hosting.Internal.Host[4] Hosting stopped
<7>Microsoft.Extensions.Hosting.Internal.Host[1] Hosting starting
<6>KubeOps.Operator.Watcher.ResourceWatcher[0] Starting resource watcher for V1OperatorIntegrationTestEntity.
<6>KubeOps.Operator.Watcher.ResourceWatcher[0] Started resource watcher for V1OperatorIntegrationTestEntity.
<6>Microsoft.Hosting.Lifetime[0] Application started. Press Ctrl+C to shut down.
<6>Microsoft.Hosting.Lifetime[0] Hosting environment: Production
<6>Microsoft.Hosting.Lifetime[0] Content root path: /home/runner/work/dotnet-operator-sdk/dotnet-operator-sdk/test/KubeOps.Operator.Test/bin/Debug/net8.0
<7>Microsoft.Extensions.Hosting.Internal.Host[2] Hosting started
<6>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] Received watch event "Deleted" for "OperatorIntegrationTest/test-entity", last observed resource version: 628.
<6>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] Received watch event "Deleted" for "OperatorIntegrationTest/test-entity", last observed resource version: 631.
<6>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] Received watch event "Deleted" for "OperatorIntegrationTest/test-entity2", last observed resource version: 640.
<6>KubeOps.Operator.Watcher.LeaderAwareResourceWatcher[0] Received watch event "Deleted" for "OperatorIntegrationTest/test-entity", last observed resource version: 649.

@nachtjasmin
Copy link
Contributor Author

Do you have any idea, why the tests won't start?

Not at all, but I'll try to fix it. ^^

@buehler
Copy link
Owner

buehler commented Mar 5, 2024

If I can be of any help, just say the word. I try to fetch your fork and run the tests locally.

buehler
buehler previously approved these changes Mar 7, 2024
@buehler buehler enabled auto-merge (squash) March 7, 2024 09:13
auto-merge was automatically disabled March 7, 2024 09:31

Head branch was pushed to by a user without write access

@nachtjasmin
Copy link
Contributor Author

At least with the local KinD cluster and with act, the tests are working again. The KubeOps.Operator.Test.Events.EventPublisherIntegrationTest.Should_Increase_Count_On_Existing_Event seems to be flaky for me, I'll try to fix it as well.

After that, I plan to rewrite the commit history to ensure a clean history. Gonna let you know when I'm done with that, just so you don't have to accept my WIP commits all the time. 😄

@buehler
Copy link
Owner

buehler commented Mar 7, 2024

Nice work, thank you :-)

Don't worry about the commits, they're gonna be squashed anyway.

@buehler
Copy link
Owner

buehler commented Mar 7, 2024

And thank you for your effort!

The previous implementation effectively enforced that a Kubernetes
cluster is present. While this is acceptable for the production
environment, it made integration testing much harder. By adding several
factories for publishers, attachers and the requeue delegates, we can
now mock those tasks as well.

Furthermore, the leader election is no longer done during the creation
phase, instead it's now implemented as one additional background
service.
The synchronous event-based implementations were replaced by two
separate, now asynchronous IHostedService services. One is responsible
for watching the resources while the other is watching the queue for
manual requeues.

The queue for requeueing entities is now asynchronous as well and
instead of creating multiple timers, it leverages Task.Delay and the
CancellationTokenSource.

Also, finalizer registrations are taking advantage of the ability to store
keyed services in the DI container.
We don't wanna let run the tests forever, a generous timeout of 30
seconds should be suitable for most cases.
@nachtjasmin nachtjasmin force-pushed the feature/better-asynchronous-api-usage branch from 90074f9 to 81a74fd Compare March 7, 2024 11:16
@nachtjasmin
Copy link
Contributor Author

nachtjasmin commented Mar 7, 2024

Well, if the commits are squashed anyway, I think I'm done now. :D
Although I'm not happy with the big "change all line endings" commit. That should've been avoided in the first place. But with the new .gitattributes file it hopefully shouldn't happen anymore for upcoming PRs.

(note to self: make smaller PRs next time 🙈)

@buehler buehler merged commit 2d17bff into buehler:main Mar 7, 2024
1 of 3 checks passed
@buehler
Copy link
Owner

buehler commented Mar 7, 2024

The linter is still yapping, but I'll fix the issues right away.

@nachtjasmin
Copy link
Contributor Author

nachtjasmin commented Mar 7, 2024

The linter is still yapping, but I'll fix the issues right away.

aaah, sorry ><
totally forgot to push that one little commit which would've avoided that.

@nachtjasmin nachtjasmin deleted the feature/better-asynchronous-api-usage branch March 7, 2024 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants