-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-2144: [JS] Don't repeat dictionary lookups in DataFrame ops #1599
Conversation
js/src/vector.ts
Outdated
@@ -414,6 +414,16 @@ export class DictionaryVector<T extends DataType = DataType> extends Vector<Dict | |||
} | |||
public getKey(index: number) { return this.indicies.get(index); } | |||
public getValue(key: number) { return this.dictionary.get(key); } | |||
public reverseLookup(value: T): number|undefined { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trxcllnt do you think it makes sense to implement reverseLookup
here, or should it be in the view?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheNeuralBit I think this is in the right place, since it's specific to DictionaryVectors. The View interface is intended to provide a minimal common interface for views over the the Vector buffer layouts (aka the Vector Data types), somewhat analogously to the DataView/TypedArray relationship with ArrayBuffer.
js/src/vector.ts
Outdated
public reverseLookup(value: T): number|undefined { | ||
let key = -1; | ||
for (let len = this.dictionary.length; ++key < len;) { | ||
if (this.dictionary.get(key) === value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheNeuralBit is it worthwhile to cache the dictionary lookup in a local variable to avoid the repeated this.dictionary
lookups?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, I'll make that change
81b4a5f
to
e6f34a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment about naming and another about comparing structured types
js/src/vector.ts
Outdated
@@ -28,6 +28,7 @@ export interface View<T extends DataType> { | |||
get(index: number): T['TValue'] | null; | |||
set(index: number, value: T['TValue']): void; | |||
toArray(): IterableArrayLike<T['TValue'] | null>; | |||
find(search: T['TValue']): number | null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheNeuralBit how do you feel about mirroring the Array/TypedArray prototypes' naming convention and calling this indexOf
? The built-ins also have find
and findIndex
methods that take predicate functions, and may help JS folks already familiar with these APIs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I even used indexOf
to generate expected values in the unit tests... Not sure why I stubbornly refused to mirror that naming convention, but that's definitely the way to go
js/src/vector/nested.ts
Outdated
public find(search: T['TValue']) { | ||
let index = 0; | ||
for (let value of this) { | ||
if (value === search) { return index; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheNeuralBit I saw the special-cased every()
call in FixedSizeListView
's find method, so I suspect you're already aware of the question I have here: since get(i)
on a NestedView returns a structured clone of the nested element, how do we make this work? I know in C++ they have visitors that compare vectors of the same type together, and that sounds like what we really need here.
@TheNeuralBit I think Paul is on vacation this week -- is this good to go? I can cut the JS release candidate later today once we get all the pending patches in |
@wesm yeah I think this is good to go. I was waiting on a final 👍 from Paul, but we've already talked through the changes and I've updated as necessary. I guess I can try to follow the directions here to merge this myself now? I'd also be interested in helping out with the release process if possible |
Sure, want to try to merge this to see if your gitbox stuff is set up right? |
Sure! I just tried it out, but I'm getting an auth error from the GitHub API: "Must specify two-factor authentication OTP code." |
The optimized
Equals
predicate now caches its reverse dictionary lookup for subsequentbind
calls. Also addsDictionaryVector.reverseLookup(value)
andVector.indexOf(value)
.