Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JS] Table getByteLength and indexOf don't work #20140

Closed
asfimport opened this issue Mar 4, 2022 · 2 comments
Closed

[JS] Table getByteLength and indexOf don't work #20140

asfimport opened this issue Mar 4, 2022 · 2 comments

Comments

@asfimport
Copy link

The functions table.getByteLength() and table.indexOf() don't return the correct values.

They are bound dynamically to the Table class, in a way I don't fully understand, with the following code:

arrow/js/src/table.ts

Lines 378 to 390 in 1b796ec

protected static [Symbol.toStringTag] = ((proto: Table) => {
(proto as any).schema = null;
(proto as any).batches = [];
(proto as any)._offsets = new Uint32Array([0]);
(proto as any)._nullCount = -1;
(proto as any)[Symbol.isConcatSpreadable] = true;
(proto as any)['isValid'] = wrapChunkedCall1(isChunkedValid);
(proto as any)['get'] = wrapChunkedCall1(getVisitor.getVisitFn(Type.Struct));
(proto as any)['set'] = wrapChunkedCall2(setVisitor.getVisitFn(Type.Struct));
(proto as any)['indexOf'] = wrapChunkedIndexOf(indexOfVisitor.getVisitFn(Type.Struct));
(proto as any)['getByteLength'] = wrapChunkedCall1(byteLengthVisitor.getVisitFn(Type.Struct));
return 'Table';
})(Table.prototype);

The other functions like that, get(), set(), and isValid() all seem to work.  However, getByteLength() and indexOf() return the placeholder/sentinel values of 0 and -1 respectively that are defined in the no-op code here: https://github.com/apache/arrow/blob/1b796ec3f9caeb5e86e3348ba940bef8d95915c5/js/src/table.ts#L207-L221, which I assume is to generate the right type definitions, and thus documentation.

It's fairly simple for a user to implement the right logic themselves (at least for getByteLength) and it's a quick patch to define the functions normally instead of on the prototype, e.g.:

 

    /**
     * Get the size in bytes of an element by index.
     * @param index The index at which to get the byteLength.
     */
    // @ts-ignore
    public getByteLength(index: number): number { return this.data[index].byteLength; }
    /**
     * Get the size in bytes of a table.
     */
    //@ts-ignore
    public getByteLength(): number { 
        return this.data.map((batch) => batch.byteLength).reduce((sum, newLength) => sum + newLength);
    } 

I'd be happy to send this as a PR if that's an OK alternative to the way it's currently implemented.

Here's a Github repo of a minimal reproduction of the issue in NodeJS:
https://github.com/alexkreidler/apache-arrow-js-small-bug
 
And an observable notebook for in the browser (although I couldn't get ESM working): https://observablehq.com/@08027ecfa2b2f7bb/arrow-7-canary
 
Thanks to all for your work on Arrow!

 

Reporter: Timothy Higinbottom
Assignee: Paul Taylor / @trxcllnt

PRs and other links:

Note: This issue was originally created as ARROW-15852. Please see the migration documentation for further details.

@asfimport
Copy link
Author

Paul Taylor / @trxcllnt:
[~timhigins] Thanks for the report. In running your code, I discovered an oversight we made in the v7.0 refactor.

That said, I think your indexOf() call is incorrect – indexOf() is the inverse of get() such that this should assert true: table.indexOf(table.get(0)) === 0

In your case (looking up the index of a row), you want to pass the entire row contents to the table.indexOf() call like this:

const { tableFromArrays } = require('apache-arrow');

const t = tableFromArrays({
  a: [0, 1, 2],
  b: ["foo", "bar", "baz"]
});

console.log(t.getByteLength(0));
console.log(t.getByteLength(1));

console.log(t.indexOf({a: 0, b: "foo"}));
console.log(t.indexOf({a: 1, b: "bar"}));
console.log(t.indexOf({a: 2, b: "baz"}));

@asfimport
Copy link
Author

Dominik Moritz / @domoritz:
Issue resolved by pull request 12771
#12771

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants