Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-2144: [JS] Don't repeat dictionary lookups in DataFrame ops #1599

Closed
wants to merge 9 commits into from

Conversation

TheNeuralBit
Copy link
Member

@TheNeuralBit TheNeuralBit commented Feb 12, 2018

The optimized Equals predicate now caches its reverse dictionary lookup for subsequent bind calls. Also adds DictionaryVector.reverseLookup(value) and Vector.indexOf(value).

js/src/vector.ts Outdated
@@ -414,6 +414,16 @@ export class DictionaryVector<T extends DataType = DataType> extends Vector<Dict
}
public getKey(index: number) { return this.indicies.get(index); }
public getValue(key: number) { return this.dictionary.get(key); }
public reverseLookup(value: T): number|undefined {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trxcllnt do you think it makes sense to implement reverseLookup here, or should it be in the view?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheNeuralBit I think this is in the right place, since it's specific to DictionaryVectors. The View interface is intended to provide a minimal common interface for views over the the Vector buffer layouts (aka the Vector Data types), somewhat analogously to the DataView/TypedArray relationship with ArrayBuffer.

js/src/vector.ts Outdated
public reverseLookup(value: T): number|undefined {
let key = -1;
for (let len = this.dictionary.length; ++key < len;) {
if (this.dictionary.get(key) === value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheNeuralBit is it worthwhile to cache the dictionary lookup in a local variable to avoid the repeated this.dictionary lookups?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, I'll make that change

Copy link
Contributor

@trxcllnt trxcllnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment about naming and another about comparing structured types

js/src/vector.ts Outdated
@@ -28,6 +28,7 @@ export interface View<T extends DataType> {
get(index: number): T['TValue'] | null;
set(index: number, value: T['TValue']): void;
toArray(): IterableArrayLike<T['TValue'] | null>;
find(search: T['TValue']): number | null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheNeuralBit how do you feel about mirroring the Array/TypedArray prototypes' naming convention and calling this indexOf? The built-ins also have find and findIndex methods that take predicate functions, and may help JS folks already familiar with these APIs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I even used indexOf to generate expected values in the unit tests... Not sure why I stubbornly refused to mirror that naming convention, but that's definitely the way to go

public find(search: T['TValue']) {
let index = 0;
for (let value of this) {
if (value === search) { return index; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheNeuralBit I saw the special-cased every() call in FixedSizeListView's find method, so I suspect you're already aware of the question I have here: since get(i) on a NestedView returns a structured clone of the nested element, how do we make this work? I know in C++ they have visitors that compare vectors of the same type together, and that sounds like what we really need here.

@wesm
Copy link
Member

wesm commented Feb 19, 2018

@TheNeuralBit I think Paul is on vacation this week -- is this good to go? I can cut the JS release candidate later today once we get all the pending patches in

@TheNeuralBit
Copy link
Member Author

@wesm yeah I think this is good to go. I was waiting on a final 👍 from Paul, but we've already talked through the changes and I've updated as necessary. I guess I can try to follow the directions here to merge this myself now?

I'd also be interested in helping out with the release process if possible

@wesm
Copy link
Member

wesm commented Feb 19, 2018

Sure, want to try to merge this to see if your gitbox stuff is set up right?

@TheNeuralBit
Copy link
Member Author

Sure! I just tried it out, but I'm getting an auth error from the GitHub API: "Must specify two-factor authentication OTP code."
Did I miss some configuration step?

@wesm wesm deleted the cache-dictionary branch February 19, 2018 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants