Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

RowCoverter::convert that targets an existing Rows #4479

Closed
alamb opened this issue Jul 6, 2023 · 4 comments 路 Fixed by #4541
Closed

RowCoverter::convert that targets an existing Rows #4479

alamb opened this issue Jul 6, 2023 · 4 comments 路 Fixed by #4541
Assignees
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@alamb
Copy link
Contributor

alamb commented Jul 6, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I am trying to make group by really fast in DataFusion apache/datafusion#6800

The grouping code uses the Arrow Row format 馃憤 and calls RowConverter::convert_columns

My traces imply that some non trivial amount of time is spent zeroing out newly allocated memory for Rows:

Screenshot 2023-07-06 at 5 26 38 PM

Describe the solution you'd like

I would like a method like RowConverter::convert_columns_in_place that writes to an pre-existing rows (clearing it out first)

Perhaps something like

impl RowConverter {
...
    /// Convert [`ArrayRef`] columns into pre-existing [`Rows`], first calling `Rows::reset()` 
    ///
    /// See [`Row`] for information on when [`Row`] can be compared
    ///
    /// # Panics
    ///
    /// Panics if the schema of `columns` does not match that provided to [`RowConverter::new`]
    pub fn convert_columns_in_place(&mut self, columns: &[ArrayRef], dst &mut Rows) -> Result<ArrowError> {

https://docs.rs/arrow-row/43.0.0/src/arrow_row/lib.rs.html#712

Describe alternatives you've considered

Additional context

@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Jul 6, 2023
@tustvold
Copy link
Contributor

tustvold commented Jul 7, 2023

I'm not very familiar with the Instruments profile, but it seems off to me that alloc_zeroed is under update_group_state and not convert_columns? If the allocation was occurring as part of row conversion I would have perhaps expected it to appear under convert_columns? Is it possible there are other allocations taking place here?

@alamb
Copy link
Contributor Author

alamb commented Jul 7, 2023

I couldn't find anything else that does allocations per group, but I agree the profile is somewhat ambiguous

@alamb
Copy link
Contributor Author

alamb commented Jul 7, 2023

(BTW this is like 2% of the total time, so more like a nice to have feature, rather than anything critical or really important)

@alamb alamb self-assigned this Jul 16, 2023
tustvold added a commit to tustvold/arrow-rs that referenced this issue Jul 18, 2023
tustvold added a commit that referenced this issue Jul 18, 2023
* Add RowConverter::append (#4479)

* Add overwrite test
@tustvold
Copy link
Contributor

label_issue.py automatically added labels {'arrow'} from #4533

@tustvold tustvold added the arrow Changes to the arrow crate label Jul 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
2 participants