Skip to content

Conversation

@comphead
Copy link
Contributor

Which issue does this PR close?

Closes #2449.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@comphead comphead closed this May 13, 2022
@comphead comphead reopened this May 13, 2022
@comphead comphead changed the title unbox scalars Fix Redundant ScalarValue Boxed Collection May 13, 2022
@comphead comphead marked this pull request as ready for review May 13, 2022 15:08
@alamb alamb mentioned this pull request May 13, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @comphead

They key test is that size_of_scalar is not changed

https://github.com/apache/arrow-datafusion/blob/41b4e491663029f653e491b110d0b5e74d08a0b6/datafusion/core/src/scalar.rs#L373-L383

However, when I investigated more it turns out that this test was broken (see #2531)

When I reenabled the test it shows that this PR makes the size of ScalarValue increase from 48 bytes to 64 bytes

cc @tustvold

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR removes a non-redundant Box<DataType>. Is it possible this is the cause of the size change?

/// list of nested ScalarValue (boxed to reduce size_of(ScalarValue))
#[allow(clippy::box_collection)]
List(Option<Box<Vec<ScalarValue>>>, Box<DataType>),
List(Option<Vec<ScalarValue>>, DataType),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about unboxing DataType here

@tustvold
Copy link
Contributor

So I got this to not change the size by making

List(Option<Vec<ScalarValue>>, Box<DataType>),
Struct(Option<Vec<ScalarValue>>, Box<[Field]>),

@alamb What do you think about Box<[Field]>?

comphead and others added 2 commits May 15, 2022 01:21
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@comphead
Copy link
Contributor Author

comphead commented May 15, 2022

Thanks @alamb @tustvold for the feedback, please let me know the final decision on what exactly has to be boxed.

@alamb
Copy link
Contributor

alamb commented May 16, 2022

@alamb What do you think about Box<[Field]>?

It seems fine to me

@comphead
Copy link
Contributor Author

so we going with changes below?

List(Option<Vec<ScalarValue>>, Box<DataType>),
Struct(Option<Vec<ScalarValue>>, Box<Vec<Field>>),

@tustvold
Copy link
Contributor

If you prefer Box<Vec<Field>> over Box<[Field]>, I don't feel strongly. The latter avoids a pointer indirection, and associated memory allocation, but the former is perhaps more consistent with rest of the enumeration (which uses Vec despite not actually needing to).

@comphead
Copy link
Contributor Author

@tustvold I added boxing back.
For now its

Struct(Option<Vec<ScalarValue>>, Box<Vec<Field>>),

as its more consistent, you are right. If you feel we should go to Box<[Field]> let me know

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great @comphead -- thank you very much for sticking with it.

I also merged this PR locally with apache/master to get #2531 and verified that the tests still pass 🎉

for c in 0..columns.len() {
let column = columns.get_mut(c).unwrap();
column.push(values[c].clone());
for (i, v) in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 this is a nicer implementation

Copy link
Contributor Author

@comphead comphead May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clippy complained onfor c in 0..columns.len() :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the better that removing a layer of Boxing lets clippy fix things more easily 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A perhaps more idiomatic way to write this would be something like

for (column, value) in columns.iter_mut().zip(values)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thats more concise. Fixed in 2 places.

@alamb
Copy link
Contributor

alamb commented May 16, 2022

I'll wait until tomorrow to see if @tustvold has any more comments, and if not I'll merge it in

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a minor nit and a questionable test change, this looks good to me 👍

Thank you, and apologies for the back and forth

for c in 0..columns.len() {
let column = columns.get_mut(c).unwrap();
column.push(values[c].clone());
for (i, v) in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A perhaps more idiomatic way to write this would be something like

for (column, value) in columns.iter_mut().zip(values)

Some(Box::new(vec![Int64(Some(1)), Int64(Some(5))])),
Box::new(DataType::Int64),
Some(vec![Int64(Some(1)), Int64(Some(5))]),
Box::new(DataType::Int32),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change in type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. Thanks for finding that

@tustvold tustvold merged commit c924536 into apache:master May 17, 2022
@tustvold tustvold added the api change Changes the API exposed to users of the crate label May 17, 2022
@alamb
Copy link
Contributor

alamb commented May 17, 2022

image
👍

@comphead comphead deleted the unbox_scalar_merged branch September 8, 2022 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Redundant ScalarValue Boxed Collection

3 participants