Make admin operations on Statestore non blocking #9348
...unctions/worker/src/main/java/org/apache/pulsar/functions/worker/rest/api/ComponentImpl.java
5ad3883 to 8012492
8012492 to 0a41af1
/pulsarbot run-failure-checks
If we do not wait for the operation to complete, deregisterFunction may return while the delete is still in progress.
This may be a problem, and it may also make tests less predictable (and therefore flakier).
Am I correct? (I hope I am missing part of the story and the change is fine after all.)
@eolivelli I agree, as I mentioned in the Modifications section of the PR. As for tests, I am not sure we have test coverage for this class, and I said so in the PR message.
Other tests probably use this function, and if we do not have tests then this is a good time to add them :)
Yeah, other tests certainly use the method.
I am sorry, but I am not sure this is the right way. Why is waiting for this operation such a problem? Does it take too much time?
@eolivelli I know, and I mentioned in the commit message that we may end up leaving garbage behind, which can be cleaned up manually later because the failure is written to the error log. I may be wrong, but my reasoning is this: deleting a table in the state store can sometimes take a long time (we are trying to address that), and regardless, deletion is a heavy operation that can fail in multiple places. In those scenarios, should we let ingestion fail or block because a table could not be deleted? Or should we leave the garbage behind and make progress, while a cron job or an operator scans the logs and deletes the garbage? I am leaning towards leaving the garbage and making progress.
@eolivelli thanks for chiming in!
This is not entirely true. If an error occurs, it is logged. The reason for this change:
@jerrypeng thanks for your explanation.
+1
0a41af1 to e83182b
Co-authored-by: Prashant <prashantk@splunk.com>
Motivation
Admin operations should be non-blocking.
Typical admin operations, particularly delete table/namespace operations, should not block the caller.
If a delete admin operation fails, the worst case is that it leaves garbage behind in the state store, which an operator can always clean up manually later.
Modifications
Delete the table in a non-blocking way and do not wait for the operation to complete.
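A minimal sketch of the pattern described above, not the code from this PR. It assumes a hypothetical `TableAdmin` interface with a `deleteTableAsync` method that returns a `CompletableFuture`, and it uses an in-memory list as a stand-in for the error log. The caller starts the delete, attaches a callback that records any failure, and returns without waiting.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class NonBlockingDeleteSketch {

    /** Hypothetical stand-in for a state store admin client; not a real Pulsar or BookKeeper API. */
    interface TableAdmin {
        CompletableFuture<Void> deleteTableAsync(String namespace, String table);
    }

    /** Stand-in for an error log, kept in memory so the example is self-contained. */
    static final List<String> ERROR_LOG = new CopyOnWriteArrayList<>();

    /**
     * Starts the delete and returns immediately. A failure is recorded in
     * ERROR_LOG instead of being propagated, so the returned future always
     * completes normally. Production callers may simply ignore it.
     */
    static CompletableFuture<Void> deleteTableNonBlocking(TableAdmin admin, String namespace, String table) {
        return admin.deleteTableAsync(namespace, table).<Void>handle((ignored, error) -> {
            if (error != null) {
                ERROR_LOG.add("Failed to delete table " + namespace + "/" + table + ": " + error);
            }
            return null;
        });
    }

    public static void main(String[] args) {
        // A delete that has not finished yet, simulating a slow state store.
        CompletableFuture<Void> slowDelete = new CompletableFuture<>();
        TableAdmin admin = (namespace, table) -> slowDelete;

        CompletableFuture<Void> tracked = deleteTableNonBlocking(admin, "tenant/ns", "fn-state");
        System.out.println("returned before delete finished: " + !tracked.isDone());

        // The delete fails later; the failure is only logged.
        slowDelete.completeExceptionally(new IllegalStateException("storage unavailable"));
        tracked.join();
        System.out.println("logged errors: " + ERROR_LOG.size());
    }
}
```

Returning the future rather than discarding it is a choice made for this sketch only: it lets a test wait for the delete deterministically, which speaks to the flakiness concern raised in the review, while callers that want fire-and-forget behavior can drop it.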
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Does this pull request potentially affect one of the following parts:
If yes was chosen, please highlight the changes