
Add interface support for incremental/deferred query execution in main thread #2797

Merged
Mytherin merged 22 commits into duckdb:master from Mytherin:taskexecutor
Dec 15, 2021

@Mytherin
Collaborator

This PR adds support for incremental query execution from the main thread to the C++ API. For example:

```cpp
#include "duckdb.hpp"

#include <stdexcept>

using namespace duckdb;

int main() {
    DuckDB db;
    Connection con(db);

    // create the pending query object for a query
    unique_ptr<PendingQueryResult> pending_query =
        con.PendingQuery("SELECT SUM(i) FROM range(100000000) tbl(i)");
    // the query can already fail while it is being prepared
    if (!pending_query->success) {
        throw std::runtime_error(pending_query->error);
    }

    // execute the query piece-by-piece until we are done
    PendingExecutionResult execution_result;
    do {
        execution_result = pending_query->ExecuteTask();
    } while (execution_result == PendingExecutionResult::RESULT_NOT_READY);

    // check if execution succeeded
    if (execution_result == PendingExecutionResult::EXECUTION_ERROR) {
        throw std::runtime_error(pending_query->error);
    }

    // we are finished!
    auto result = pending_query->Execute();
    // do something with the result...
    result->Print();
    return 0;
}
```

Such an interface allows a query to be processed incrementally while external state is periodically checked or updated. For example, as part of this PR we rework the progress bar so that it no longer requires an extra background thread. Instead, we use the pending query API to process part of the query, update and print the progress bar, and repeat until we are done. Other potential uses for this interface include supporting query cancellation without extra background threads, and updating external interfaces as the query progresses.
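The progress-bar and cancellation patterns described above can be sketched as a single driver loop on the calling thread. To keep the example self-contained and runnable without DuckDB, it uses a hypothetical `FakePendingQuery` class whose `ExecuteTask()` mimics the return codes of `PendingQueryResult::ExecuteTask()`; `GetProgress()`, `RunWithProgress`, `LoopOutcome` and the `cancel_requested` flag are likewise illustrative assumptions, not DuckDB API.

```cpp
#include <atomic>
#include <cassert>
#include <iostream>

// Hypothetical stand-in mirroring the return codes used in this PR (not DuckDB's header).
enum class PendingExecutionResult { RESULT_READY, RESULT_NOT_READY, EXECUTION_ERROR };

// Hypothetical pending query: each ExecuteTask() call performs one bounded slice of work.
class FakePendingQuery {
public:
    explicit FakePendingQuery(int total_slices) : total_slices_(total_slices) {
        assert(total_slices > 0);
    }

    PendingExecutionResult ExecuteTask() {
        if (done_slices_ < total_slices_) {
            ++done_slices_;
        }
        return done_slices_ == total_slices_ ? PendingExecutionResult::RESULT_READY
                                             : PendingExecutionResult::RESULT_NOT_READY;
    }

    // Percentage of the simulated work completed so far (hypothetical helper).
    double GetProgress() const {
        return 100.0 * done_slices_ / total_slices_;
    }

private:
    int total_slices_;
    int done_slices_ = 0;
};

enum class LoopOutcome { FINISHED, CANCELLED, FAILED };

// Drives the query on the calling thread: between slices it refreshes the progress
// display and checks the cancellation flag, so no background thread is needed.
LoopOutcome RunWithProgress(FakePendingQuery &query, const std::atomic<bool> &cancel_requested,
                            std::ostream &out) {
    while (true) {
        if (cancel_requested.load()) {
            out << "\ncancelled\n";
            return LoopOutcome::CANCELLED;
        }
        PendingExecutionResult result = query.ExecuteTask();
        out << "\rprogress: " << static_cast<int>(query.GetProgress()) << "%" << std::flush;
        if (result == PendingExecutionResult::EXECUTION_ERROR) {
            out << '\n';
            return LoopOutcome::FAILED;
        }
        if (result == PendingExecutionResult::RESULT_READY) {
            out << '\n';
            return LoopOutcome::FINISHED;
        }
    }
}
```

With a real connection, `FakePendingQuery` would be replaced by the `PendingQueryResult` returned from `con.PendingQuery(...)` (how progress is obtained from DuckDB is not shown here). The design point is that both the display refresh and the cancellation check happen between slices of work, on the caller's own thread.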

This interface also creates a path for interrupting a task before it completes and handing control back to the user, which is an initial step towards supporting async I/O. For example, it would now be fairly straightforward to extend the GetSource method with a return code such as WAITING_FOR_INPUT, which would cause a task to be interrupted early. The task could then be rescheduled once its async I/O requests have completed.
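A minimal model of the proposed async-I/O extension might look like the following. Only `WAITING_FOR_INPUT` comes from the paragraph above; `SourceResult`, `TaskResult::TASK_BLOCKED`, `AsyncSource`, `CompleteIO` and `ScanTask` are hypothetical names used to illustrate how a task could return early instead of blocking a thread, and be run again once its I/O completes.

```cpp
#include <cassert>
#include <deque>

// Hypothetical source return codes; WAITING_FOR_INPUT is the extension proposed above.
enum class SourceResult { HAVE_DATA, FINISHED, WAITING_FOR_INPUT };
// Hypothetical task return codes.
enum class TaskResult { TASK_FINISHED, TASK_BLOCKED };

// Simulated async source: chunks become available when their I/O "completes".
struct AsyncSource {
    std::deque<int> ready;  // chunks whose I/O has completed
    int pending_io = 0;     // chunks still in flight

    SourceResult GetSource(int &chunk) {
        if (!ready.empty()) {
            chunk = ready.front();
            ready.pop_front();
            return SourceResult::HAVE_DATA;
        }
        return pending_io > 0 ? SourceResult::WAITING_FOR_INPUT : SourceResult::FINISHED;
    }

    // Called when an outstanding I/O request delivers its data.
    void CompleteIO(int chunk) {
        assert(pending_io > 0);
        --pending_io;
        ready.push_back(chunk);
    }
};

// Hypothetical task that sums chunks; its state survives between Execute() calls.
struct ScanTask {
    explicit ScanTask(AsyncSource &s) : source(s) {}

    // Consume whatever is available, then stop early rather than block the thread.
    TaskResult Execute() {
        int chunk = 0;
        while (true) {
            switch (source.GetSource(chunk)) {
            case SourceResult::HAVE_DATA:
                sum += chunk;
                break;  // leaves the switch; keep pulling chunks
            case SourceResult::FINISHED:
                return TaskResult::TASK_FINISHED;
            case SourceResult::WAITING_FOR_INPUT:
                return TaskResult::TASK_BLOCKED;
            }
        }
    }

    AsyncSource &source;
    long long sum = 0;
};
```

Here the caller plays the role of the scheduler: after `CompleteIO` delivers data it calls `Execute()` again, and the task resumes with its accumulated `sum` intact. A real scheduler would instead park the blocked task and re-queue it when notified.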

Internally, ExecuteTask runs part of a pipeline task. Specifically, we currently process 50 chunks from the pipeline source before handing control back to the client. This is accomplished by passing a TaskExecutionMode to the ::ExecuteTask method, which lets us specify that we only want to complete a task partially (TaskExecutionMode::PROCESS_PARTIAL). When this mode is set, the ExecuteTask method can return TaskExecutionResult::TASK_NOT_FINISHED, in which case it must be called again later.
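The partial-execution contract can be modelled as below. `TaskExecutionMode::PROCESS_PARTIAL`, `TaskExecutionResult::TASK_NOT_FINISHED`, `TASK_FINISHED` and the 50-chunk budget are taken from the description above; `PROCESS_ALL`, `PipelineTaskModel` and `CallsUntilFinished` are assumptions for illustration, and the model only counts chunks rather than running a real pipeline.

```cpp
#include <cassert>
#include <cstdint>

enum class TaskExecutionMode { PROCESS_ALL, PROCESS_PARTIAL };  // PROCESS_ALL is assumed
enum class TaskExecutionResult { TASK_FINISHED, TASK_NOT_FINISHED };

// Budget per partial call, matching the "50 chunks" described above.
constexpr uint64_t PARTIAL_CHUNK_LIMIT = 50;

// Hypothetical pipeline task over a fixed number of source chunks.
class PipelineTaskModel {
public:
    explicit PipelineTaskModel(uint64_t total_chunks) : total_chunks_(total_chunks) {}

    TaskExecutionResult ExecuteTask(TaskExecutionMode mode) {
        uint64_t processed_this_call = 0;
        while (next_chunk_ < total_chunks_) {
            if (mode == TaskExecutionMode::PROCESS_PARTIAL &&
                processed_this_call == PARTIAL_CHUNK_LIMIT) {
                // Budget exhausted: remember our position and yield to the caller.
                return TaskExecutionResult::TASK_NOT_FINISHED;
            }
            ++next_chunk_;  // stand-in for pushing one chunk through the pipeline
            ++processed_this_call;
        }
        return TaskExecutionResult::TASK_FINISHED;
    }

    uint64_t ChunksProcessed() const { return next_chunk_; }

private:
    uint64_t total_chunks_;
    uint64_t next_chunk_ = 0;
};

// Calls the task in partial mode until it reports completion; returns the number of calls.
uint64_t CallsUntilFinished(PipelineTaskModel &task) {
    uint64_t calls = 1;
    while (task.ExecuteTask(TaskExecutionMode::PROCESS_PARTIAL) ==
           TaskExecutionResult::TASK_NOT_FINISHED) {
        ++calls;
    }
    return calls;
}
```

For example, a 120-chunk source needs three partial calls (50 + 50 + 20 chunks), while a single `PROCESS_ALL` call drains it in one go. Because the task keeps its position between calls, yielding costs only the return to the caller.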

CC @ankoh

@Mytherin Mytherin merged commit 52c0188 into duckdb:master Dec 15, 2021
@hamilton
Contributor

hamilton commented Feb 18, 2022

So glad to see this has landed! @Mytherin are there plans in the roadmap to implement query cancellation from client libraries?
