Skip to content

Conversation

Y--
Copy link
Contributor

@Y-- Y-- commented May 1, 2025

In the current code, we call Fetch on a query result before checking if there is a result available, which may cause deadlocks since we don't return to the event loop in the BLOCKED and NO_TASKS_AVAILABLE cases.

This PR solves it by returning a special buffer containing one "magic" byte out of the C++ layer, then a null buffer indicating the caller they need to retry fetching.

We tested this with MotherDuck operators which do return "BLOCKED", and this fixes the streaming issue we had.

Ideally, this would test this with a custom "test" extension that reproduces the behavior.

Another way to test it (albeit less ideal), would be to run DuckDB Wasm tests with a special version of DuckDB compiled with FORCE_ASYNC_SINK_SOURCE flag.

@Y-- Y-- force-pushed the yl/ensure-fetch-ready branch 2 times, most recently from ef2f016 to 8cff0fa Compare May 5, 2025 08:49
Copy link
Collaborator

@carlopi carlopi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is properly hacky, love it.

Left some comments, for you or for me in a bit.

I have a setup to run most duckdb unittester in duckdb-wasm (currently only node) vs various duckdb-wasm interfaces, I need to upstream that, and that can be used to stress test this more.

}

const res = copyBuffer(this.mod, d, n);
dropResponseBuffers(this.mod);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dropResponseBuffers(this.mod);
dropResponseBuffers(this.mod);

I am not sure if this needs to be performed, or moved afterwards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it may need to be revisited though might be worth a dedicated PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking moving this below the return null might be it, but I need to check.
In any case, this impact only retry cases, that are currently not exposed in the wild, so might be ok-ish.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want it in both "SUCCESS" cases since a buffer is instantiated, otherwise we would leak the (1-byte) retry buffer each time no?

Comment on lines 235 to 240
if (res.length === 1 && res[0] == 84) {
return null;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative would be returning an arrow error with custom error code, unsure if that is properly supported, let me check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's the other option we considered, and I agree it should work just as well. We felt this way was the most straightforward and explicit compared to abuse a specific Arrow error code.

@carlopi carlopi self-requested a review May 5, 2025 09:18
@carlopi
Copy link
Collaborator

carlopi commented May 5, 2025

Actually not properly approved, left some comments that I need to double check.

@carlopi
Copy link
Collaborator

carlopi commented May 6, 2025

@Y--: I think the best path forward is wrapping Uint8Array in an object that can then hold also additional informations about what kind of buffer is returned and allows more flexibility, this is still needed going forward.

At that point instead of the magic byte a duckdb-wasm level error can be cleanly returned with a payload.

@carlopi
Copy link
Collaborator

carlopi commented May 6, 2025

@Y--: I think the best path forward is wrapping Uint8Array in an object that can then hold also additional informations about what kind of buffer is returned and allows more flexibility, this is still needed going forward.

At that point instead of the magic byte a duckdb-wasm level error can be cleanly returned with a payload.

I will set this up and ping back.

@Y--
Copy link
Contributor Author

Y-- commented May 6, 2025

@Y--: I think the best path forward is wrapping Uint8Array in an object that can then hold also additional informations about what kind of buffer is returned and allows more flexibility, this is still needed going forward.

At that point instead of the magic byte a duckdb-wasm level error can be cleanly returned with a payload.

I can try to work on that ~ my main concern is to do this preserving the 0-copy transfer as much as possible, but hopefully the Arrow API can help here.

@Y--
Copy link
Contributor Author

Y-- commented May 6, 2025

@Y--: I think the best path forward is wrapping Uint8Array in an object that can then hold also additional informations about what kind of buffer is returned and allows more flexibility, this is still needed going forward.
At that point instead of the magic byte a duckdb-wasm level error can be cleanly returned with a payload.

I can try to work on that ~ my main concern is to do this preserving the 0-copy transfer as much as possible, but hopefully the Arrow API can help here.

Sadly it looks like ConcatenateBuffers does copy its input.

But we can use SerializeRecordBatch with a buffer that we manage ourselves instead. That looks reasonably easy, I'll give it a shot later today.

@Y-- Y-- force-pushed the yl/ensure-fetch-ready branch from dbb95ca to 8bfe08b Compare May 6, 2025 08:24
@Y--
Copy link
Contributor Author

Y-- commented May 6, 2025

@Y--: I think the best path forward is wrapping Uint8Array in an object that can then hold also additional informations about what kind of buffer is returned and allows more flexibility, this is still needed going forward.
At that point instead of the magic byte a duckdb-wasm level error can be cleanly returned with a payload.

I can try to work on that ~ my main concern is to do this preserving the 0-copy transfer as much as possible, but hopefully the Arrow API can help here.

Sadly it looks like ConcatenateBuffers does copy its input.

But we can use SerializeRecordBatch with a buffer that we manage ourselves instead. That looks reasonably easy, I'll give it a shot later today.

Implemented here - 8bfe08b (#1996)

@carlopi
Copy link
Collaborator

carlopi commented May 6, 2025

I ended up sketching an alternative here: main...carlopi:duckdb-wasm:refs/heads/wrap_fetch, what do you think? I might clean up and PR that, then you can add your on top

@carlopi
Copy link
Collaborator

carlopi commented May 7, 2025

Hi, I ended up with #2000, after which it should be somewhat trivial to add this on top, it should be a matter of adding two enums, one at lib/include/duckdb/web/webdb.h on the C++ side and one at packages/duckdb-wasm/src/status.ts in the typescript layer.

Plus adding the relevant handling in fetchQueryResults such as return null;

Would you mind taking a look, feedback is welcome

@Y--
Copy link
Contributor Author

Y-- commented May 7, 2025

Hi, I ended up with #2000, after which it should be somewhat trivial to add this on top, it should be a matter of adding two enums, one at lib/include/duckdb/web/webdb.h on the C++ side and one at packages/duckdb-wasm/src/status.ts in the typescript layer.

Plus adding the relevant handling in fetchQueryResults such as return null;

Would you mind taking a look, feedback is welcome

Had a quick look and it look very reasonable. I would need to look in more detail to make sure we don't copy the buffer un-necessarily, but I assume you already made sure of it?

@carlopi
Copy link
Collaborator

carlopi commented May 7, 2025

I would need to look in more detail to make sure we don't copy the buffer un-necessarily, but I assume you already made sure of it?

I will check this myself and verify this is the case. I would be inclined to say so, but the question is reasonable and better to get another look.

@Y-- Y-- force-pushed the yl/ensure-fetch-ready branch from e82805f to a3462d0 Compare May 9, 2025 14:04
@Y--
Copy link
Contributor Author

Y-- commented May 9, 2025

Thanks for landing #2000 @carlopi ! I've rebased this PR on it, let me know if it needs more help :-)

Copy link
Collaborator

@carlopi carlopi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@carlopi carlopi merged commit d9e0a4f into duckdb:main May 10, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants