[Java] Add ListVector#setInitialCapacity overload that takes an exact inner capacity #37703

Closed
lidavidm opened this issue Sep 13, 2023 · 9 comments · Fixed by #37838

Comments

@lidavidm
Member

Describe the enhancement requested

ListVector (or really, its abstract base class) has an overload of setInitialCapacity that also preallocates the inner vector based on the "density" of the list items. However, sometimes you know the number of inner elements up front, so it would be clearer (and less error-prone) to just tell the vector this fact.

You could do this by manipulating the inner vector directly, but given that we have the other overload, this one also makes sense.
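
To make the gap concrete, here is a minimal plain-Java sketch of the arithmetic that a density hint implies. It does not call Arrow: the class `DensityEstimate`, the helper `innerCapacityFromDensity`, and the rounding rule (records times density, truncated, with a floor of 1) are assumptions for illustration only.

```java
// Plain-Java model of the sizing arithmetic; not Arrow's implementation.
final class DensityEstimate {

    // Assumed rule: inner capacity = numRecords * density, truncated to int, at least 1.
    static int innerCapacityFromDensity(int numRecords, double density) {
        return Math.max((int) (numRecords * density), 1);
    }

    public static void main(String[] args) {
        int numRecords = 4;          // number of lists
        int knownTotalElements = 13; // e.g. lists of sizes 1, 2, 4 and 6

        // A rough density guess leaves the inner vector too small.
        int guessed = innerCapacityFromDensity(numRecords, 2.0);
        System.out.println("density 2.0  -> " + guessed + " inner slots");

        // Getting an exact size means converting the known total back into a density.
        double derived = (double) knownTotalElements / numRecords;
        int exact = innerCapacityFromDensity(numRecords, derived);
        System.out.println("density " + derived + " -> " + exact + " inner slots");
    }
}
```

When the total is already known, the detour through a floating-point density is the step this issue proposes to remove.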

Component(s)

Java

@jduo
Member

jduo commented Sep 18, 2023

take

@jduo
Member

jduo commented Sep 18, 2023

Not sure if I follow this issue. There is already setInitialCapacity(int numRecords) on both BaseRepeatedValueVector and ListVector so I'm not clear on what's missing.

@lidavidm
Member Author

There's a setInitialCapacity overload (linked above) that also sets the child vector's capacity at the same time, but it only works off an estimate of the child vector's capacity; it would make sense to also have a version that is exact.

@jduo
Member

jduo commented Sep 18, 2023

This seems like adding a setInitialCapacity(int, int) overload for specifying the exact number of elements in each inner list.

@lidavidm
Member Author

Yes, except I think the latter parameter would be the total number of elements across all lists

@jduo
Member

jduo commented Sep 18, 2023

Hmm, that seems error-prone though: there would be a setInitialCapacity(int, double) where the last parameter is the density (elements per list) and another setInitialCapacity(int, int) where the last parameter is the total number of elements across all lists.
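
The concern can be reproduced in plain Java. In this sketch the two hypothetical `reserve` methods stand in for the two overloads under discussion (they are not Arrow methods), and each returns the inner capacity it would request. Changing only the literal's type silently switches the meaning of the second argument.

```java
// Illustrates Java overload resolution only; reserve() is a hypothetical stand-in.
final class OverloadPitfall {

    // Second argument read as a density (elements per list).
    static int reserve(int numRecords, double density) {
        return Math.max((int) (numRecords * density), 1);
    }

    // Second argument read as the total number of elements across all lists.
    static int reserve(int numRecords, int totalElements) {
        return totalElements;
    }

    public static void main(String[] args) {
        System.out.println(reserve(100, 5));   // int literal picks (int, int): prints 5
        System.out.println(reserve(100, 5.0)); // double literal picks (int, double): prints 500
    }
}
```

Both calls compile without warnings, which is why a distinct method name is the safer design.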

@lidavidm
Member Author

Fair point; we could give the latter a different name? Or we could cancel this and have @davisusanibar document how to explicitly set the list and child capacities separately.

@jduo
Member

jduo commented Sep 19, 2023

I'm favoring creating a separate method setInitialTotalCapacity() or something along those lines.

@lidavidm
Member Author

Sounds good to me.

jduo added a commit to jduo/arrow that referenced this issue Sep 23, 2023
… in ListVector

Add setInitialTotalCapacity() to BaseRepeatedValueVector and ListVector to specify
the exact total number of records in the backing vector.

This is an alternative to using the density argument in setInitialCapacity()
that allows the caller to precisely specify the capacity.
jduo added a commit to jduo/arrow that referenced this issue Sep 25, 2023
… in ListVector
jduo added a commit to jduo/arrow that referenced this issue Sep 25, 2023
… in ListVector

Add setInitialTotalCapacity() to BaseRepeatedValueVector, ListVector and
LargeListVector to specify the exact total number of records in the
backing vector.

This is an alternative to using the density argument in setInitialCapacity()
that allows the caller to precisely specify the capacity.
lidavidm pushed a commit that referenced this issue Sep 26, 2023
…ctor (#37838)

### Rationale for this change
There is currently a setInitialCapacity() function that can be used to set a number of records and density factor when setting the capacity on a ListVector. A developer may want to specify the exact total number of records instead and can use the new methods introduced here.

### What changes are included in this PR?

Add setInitialTotalCapacity() to BaseRepeatedValueVector, ListVector, DensityAwareVector, and LargeListVector to specify the exact total number of records in the backing vector.

This is an alternative to using the density argument in setInitialCapacity() that allows the caller to precisely specify the capacity.

### Are these changes tested?
Yes.

### Are there any user-facing changes?
No.

* Closes: #37703

Authored-by: James Duong <duong.james@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
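
As a rough illustration of how the two entry points described above differ, here is a plain-Java model. It is a sketch under stated assumptions, not the merged code: the class `ListCapacityPlan`, its factory methods `withDensity` and `withTotal`, and the 4-byte offset width are hypothetical. The second argument of `withTotal` follows the thread's reading of "total", meaning elements across all lists.

```java
// Hypothetical model of list-vector sizing; it does not use or mirror Arrow's classes.
final class ListCapacityPlan {
    static final int OFFSET_WIDTH = 4; // assumed: 32-bit offsets

    final long offsetBufferBytes; // n lists need n + 1 offsets
    final int innerCapacity;      // slots reserved in the inner (child) vector

    private ListCapacityPlan(int numRecords, int innerCapacity) {
        if (numRecords < 0 || innerCapacity < 0) {
            throw new IllegalArgumentException("capacities must be non-negative");
        }
        this.offsetBufferBytes = (numRecords + 1L) * OFFSET_WIDTH;
        this.innerCapacity = innerCapacity;
    }

    // Estimate-based sizing: inner capacity derived from a density hint (assumed rounding).
    static ListCapacityPlan withDensity(int numRecords, double density) {
        return new ListCapacityPlan(numRecords, Math.max((int) (numRecords * density), 1));
    }

    // Exact sizing: the caller states the total element count directly.
    static ListCapacityPlan withTotal(int numRecords, int totalElements) {
        return new ListCapacityPlan(numRecords, totalElements);
    }

    public static void main(String[] args) {
        ListCapacityPlan estimated = withDensity(4, 2.0);
        ListCapacityPlan exact = withTotal(4, 13);
        System.out.println("estimated inner=" + estimated.innerCapacity
                + " offsets=" + estimated.offsetBufferBytes + "B");
        System.out.println("exact     inner=" + exact.innerCapacity
                + " offsets=" + exact.offsetBufferBytes + "B");
    }
}
```

In this model the offset buffer is sized identically either way; only the inner reservation changes.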
lidavidm added this to the 14.0.0 milestone Sep 26, 2023
etseidl pushed a commit to etseidl/arrow that referenced this issue Sep 28, 2023
…ListVector (apache#37838)
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 23, 2023
…ListVector (apache#37838)
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…ListVector (apache#37838)
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…ListVector (apache#37838)