Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assert collect task is not completed before fall-back to resume #15495

Merged
merged 2 commits into from
Feb 19, 2024

Conversation

jeeminso
Copy link
Contributor

@jeeminso jeeminso commented Feb 1, 2024

Summary of the changes / Why this improves CrateDB

try {
ShardCollectorProvider shardCollectorProvider = getCollectorProviderSafe(shardId);
CompletableFuture<BatchIterator<Row>> iterator = shardCollectorProvider
.awaitShardSearchActive()
.thenApply(batchIteratorFactory -> batchIteratorFactory.getIterator(
collectPhase,
requiresScroll,
collectTask
))
.exceptionallyCompose(err -> shardFailureFallbackOrRaise(err, shardId, collectPhase, collectTask, requiresScroll));
iterators.add(iterator);
} catch (Throwable e) {
iterators.add(shardFailureFallbackOrRaise(e, shardId, collectPhase, collectTask, requiresScroll));
}

batchIteratorFactory.getIterator() and shardFailureFallbackOrRaise() both eventually call collectTask.addSearcher() with the same searcherId (without checking if the searcherId is already added) causing the ShardCollectContext already added exception.

Checklist

  • Added an entry in the latest docs/appendices/release-notes/<x.y.0>.rst for user facing changes
  • Updated documentation & sql_features table for user facing changes
  • Touched code is covered by tests
  • CLA is signed
  • This does not contain breaking changes, or if it does:
    • It is released within a major release
    • It is recorded in the latest docs/appendices/release-notes/<x.y.0>.rst
    • It was marked as deprecated in an earlier release if possible
    • You've thought about the consequences and other components are adapted
      (E.g. AdminUI)

@jeeminso jeeminso marked this pull request as draft February 1, 2024 15:28
@jeeminso jeeminso force-pushed the jeeminso/collect-task-add-searcher-exception branch 2 times, most recently from abeeff4 to d8d2eb2 Compare February 4, 2024 22:37
@jeeminso jeeminso changed the title Prevent duplicate reader/searcher ids Allow CollectTask to resume if not yet started Feb 6, 2024
@jeeminso jeeminso force-pushed the jeeminso/collect-task-add-searcher-exception branch from d8d2eb2 to e9ba409 Compare February 14, 2024 21:17
Comment on lines 403 to 406
// the collect task is not in a resume-able state
if (collectTask.completionFuture().isDone()) {
throw Exceptions.toRuntimeException(err);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fall back function must check if the collect task is resume-able before resuming.

Copy link
Contributor Author

@jeeminso jeeminso Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like no concrete scenario that will cause this, but could add an assert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @matriv could you have a quick look if it is beneficial to add an assert here? It was my former attempt to #15520 before Moll reported steps to reproduce.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertions are only checked when running tests, and the issue was spotted with running cratedb as "blackbox", so I don't think it will benefit, but on the other hand, the assertion seems logical, so it doesn't harm to add it. Although, the collect task is used only in the else part, shouldn't we have the assertion there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you moved it.

@jeeminso jeeminso force-pushed the jeeminso/collect-task-add-searcher-exception branch from e9ba409 to 3c5ca8c Compare February 14, 2024 22:33
@jeeminso jeeminso changed the title Allow CollectTask to resume if not yet started Assert CollectTask is resume-able before calling call-back method Feb 14, 2024
@jeeminso jeeminso force-pushed the jeeminso/collect-task-add-searcher-exception branch from 3c5ca8c to c64b297 Compare February 14, 2024 23:19
@jeeminso jeeminso changed the title Assert CollectTask is resume-able before calling call-back method Assert collect task is not completed before fall-back to resume Feb 14, 2024
@jeeminso jeeminso force-pushed the jeeminso/collect-task-add-searcher-exception branch from c64b297 to c9a824e Compare February 15, 2024 15:07
@jeeminso jeeminso marked this pull request as ready for review February 15, 2024 16:20
@jeeminso jeeminso requested a review from matriv February 15, 2024 16:20
@jeeminso jeeminso added the ready-to-merge Let Mergify merge the PR once approved and checks pass label Feb 19, 2024
@mergify mergify bot merged commit af9e225 into master Feb 19, 2024
15 checks passed
@mergify mergify bot deleted the jeeminso/collect-task-add-searcher-exception branch February 19, 2024 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge Let Mergify merge the PR once approved and checks pass
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants