Improve passing buffer to wasm #17

Closed
domoritz opened this issue Feb 10, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@domoritz
Owner

domoritz commented Feb 10, 2021

Loading Arrow data from IPC will be a very common use case, so it should be fast. We want to avoid as many memory copies as possible. We should look into where copies happen when a buffer is passed from JS to wasm and which of them we can avoid.

Try rustwasm/wasm-bindgen#1643 and rustwasm/wasm-bindgen#1993 and rustwasm/wasm-bindgen#2456

domoritz added the enhancement label Feb 10, 2021
@domoritz
Owner Author

I tried the approach in rustwasm/wasm-bindgen#1993 (comment) and it was actually slower than taking an &[u8].

Maybe I made a mistake.

    pub fn from(buffer: &Buffer) -> Result<Table, JsValue> {
        // Wrap the Buffer's bytes in a Uint8Array view (no copy yet), then
        // copy them into wasm memory; `to_vec()` is where the copy happens.
        let contents: Vec<u8> = Uint8Array::new_with_byte_offset_and_length(
            &buffer.buffer(),
            buffer.byte_offset(),
            buffer.length(),
        )
        .to_vec();
        let cursor = std::io::Cursor::new(contents);
        let reader = match arrow::ipc::reader::FileReader::try_new(cursor) {
            Ok(reader) => reader,
            Err(error) => return Err(format!("{}", error).into()),
        };

        // Read all record batches eagerly and keep them alongside the schema.
        let schema = reader.schema();
        match reader.collect() {
            Ok(record_batches) => Ok(Table {
                schema,
                record_batches,
            }),
            Err(error) => Err(format!("{}", error).into()),
        }
    }
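
For reference, here is a small Node.js sketch of the view-versus-copy distinction that the snippet above relies on. It is not code from this project: the 16-byte backing buffer and the variable names are made up for illustration. It only shows that constructing a `Uint8Array` from a `(buffer, byteOffset, length)` triple shares memory with the original `Buffer`, while `slice()` on a plain `Uint8Array` allocates and copies, which is roughly what `.to_vec()` does on the Rust side.

```javascript
// A Node Buffer is a window onto an ArrayBuffer that may also hold other data,
// which is why byteOffset and length have to be passed along.
const backing = new ArrayBuffer(16);
const buffer = Buffer.from(backing, 4, 8); // bytes 4..12 of `backing`, no copy
buffer.fill(7);

// Same (buffer, byteOffset, length) triple as in the Rust snippet: a view, not a copy.
const view = new Uint8Array(buffer.buffer, buffer.byteOffset, buffer.length);

// slice() on a plain Uint8Array allocates fresh memory and copies the bytes.
const copy = view.slice();

buffer[0] = 42; // mutate the original Buffer
const viewSeesWrite = view[0] === 42; // the view shares memory with the Buffer
const copySeesWrite = copy[0] === 42; // the copy kept its own bytes
console.log({ viewSeesWrite, copySeesWrite });
```

So in the Rust version the view itself is free, and the cost should come from the one copy into the `Vec<u8>`.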

@domoritz
Owner Author

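No more redundant memory copies 🎉. The file contents are now written directly into memory owned by wasm, so only a single copy remains.

Once https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamBYOBReader is available, we won't even need that copy (as long as we know the size of the IPC data beforehand).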

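    const fs = require("fs");
    const path = require("path");

    const filePath = path.join(__dirname, "./flights-1m.arrow");
    const file = fs.readFileSync(filePath);

    // Allocate the bytes inside wasm memory, then copy the file into that view.
    const wasmArray = new arrow_wasm.WasmUint8Array(file.length);
    file.copy(wasmArray.view);

    // The table is built from the wasm-side array without another copy.
    const table = arrow_wasm.Table.fromWasmUint8Array(wasmArray);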
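
To illustrate the idea behind this pattern, here is a self-contained sketch that runs in Node.js without the wasm module. Everything in it is an assumption for illustration: `LinearMemoryArray` is a hypothetical stand-in, not the actual `WasmUint8Array` implementation, a bare `WebAssembly.Memory` stands in for the module's linear memory, and the bump allocator does no bounds checking. It shows that `Buffer.copy` can write straight into a `Uint8Array` view over wasm memory, so the bytes land where the wasm code can read them in one step.

```javascript
// Stand-in for the wasm module's linear memory (1 page = 64 KiB).
const memory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical helper for illustration only, not the arrow_wasm implementation.
class LinearMemoryArray {
  static next = 0; // trivial bump allocator, no bounds checks

  constructor(length) {
    this.ptr = LinearMemoryArray.next;
    this.length = length;
    LinearMemoryArray.next += length;
  }

  // Recreate the view on each access: growing wasm memory detaches old ArrayBuffers.
  get view() {
    return new Uint8Array(memory.buffer, this.ptr, this.length);
  }
}

// Stand-in for fs.readFileSync(filePath).
const file = Buffer.from("ARROW1 example bytes");

const wasmArray = new LinearMemoryArray(file.length);
const copied = file.copy(wasmArray.view); // the only copy: JS heap -> wasm memory

// Read the bytes back out of linear memory to confirm where they ended up.
const inWasm = Buffer.from(memory.buffer, wasmArray.ptr, wasmArray.length).toString();
console.log(copied, inWasm);
```

Exposing `view` as a getter rather than caching the typed array is a deliberate choice in this sketch, since a cached view would become unusable if the memory grows.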

@domoritz
Owner Author

Now it seems to take a long time to create the buffers in Arrow. I think the issue is https://issues.apache.org/jira/browse/ARROW-11696.
