Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement function slice for RecordBatch #490

Merged
merged 5 commits into from
Jun 25, 2021

Conversation

b41sh
Copy link
Contributor

@b41sh b41sh commented Jun 22, 2021

Which issue does this PR close?

Closes #460

Rationale for this change

slice can be used to handle part of RecordBatch

What changes are included in this PR?

Implement function slice for RecordBatch

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 22, 2021
pub fn slice(&self, offset: usize, length: usize) -> Result<RecordBatch> {
let schema = self.schema();
let num_columns = self.num_columns();
let new_columns = (0..num_columns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.columns().iter() ^_^

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My local version is

    /// Slice all the arrays in the record batch.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        let columns = self
            .columns()
            .iter()
            .map(|array| array.slice(offset, len))
            .collect();
        Self {
            schema: self.schema.clone(),
            columns,
        }
    }

Does it make sense to go through RecordBatch::try_new()? It incurs some overhead checking that the schema and arrays match, when they already should match

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgecarleitao thanks, I have modified, PTAL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nevi-me Thanks for your suggestion, I made a modification. PTAL

@@ -244,6 +244,21 @@ impl RecordBatch {
&self.columns[..]
}

/// Return a new RecordBatch where each column is sliced
/// according to `offset` and `length`
pub fn slice(&self, offset: usize, length: usize) -> Result<RecordBatch> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove the Result wrapping if nothing can fail.

@@ -426,6 +441,29 @@ mod tests {
assert_eq!(5, record_batch.column(1).data().len());
}

#[test]
fn create_record_batch_slice() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test for edge cases, e.g. empty length, over boundary offset, empty columns, empty batch, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimexist I have a question, if an empty RecordBatch calls slice(), the offset and length are zero, should we panic or return an empty RecordBatch slice?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should return an empty slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should return an empty slice.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified, PTAL
@Dandandan @jimexist

@alamb
Copy link
Contributor

alamb commented Jun 23, 2021

Thank you @b41sh !

@codecov-commenter
Copy link

codecov-commenter commented Jun 24, 2021

Codecov Report

Merging #490 (9aacab1) into master (4c7d418) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #490      +/-   ##
==========================================
- Coverage   82.65%   82.64%   -0.02%     
==========================================
  Files         165      165              
  Lines       45524    45703     +179     
==========================================
+ Hits        37628    37769     +141     
- Misses       7896     7934      +38     
Impacted Files Coverage Δ
arrow/src/record_batch.rs 89.25% <100.00%> (+3.53%) ⬆️
arrow/src/array/transform/boolean.rs 76.92% <0.00%> (-7.70%) ⬇️
arrow/src/ffi.rs 78.71% <0.00%> (-4.15%) ⬇️
arrow/src/compute/kernels/temporal.rs 88.97% <0.00%> (-0.64%) ⬇️
parquet/src/encodings/encoding.rs 94.85% <0.00%> (-0.20%) ⬇️
arrow/src/array/array_list.rs 92.82% <0.00%> (ø)
arrow/src/compute/kernels/concat.rs 100.00% <0.00%> (ø)
arrow/src/array/equal/mod.rs 93.93% <0.00%> (+0.34%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4c7d418...9aacab1. Read the comment docs.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @b41sh

@alamb
Copy link
Contributor

alamb commented Jun 24, 2021

Hi @b41sh. Thank you for the contribution.

It appears there are some issues that the CI tests found (clippy): https://github.com/apache/arrow-rs/pull/490/checks?check_run_id=2905850637

Can you please resolve them?

@b41sh
Copy link
Contributor Author

b41sh commented Jun 24, 2021

Hi @b41sh. Thank you for the contribution.

It appears there are some issues that the CI tests found (clippy): https://github.com/apache/arrow-rs/pull/490/checks?check_run_id=2905850637

Can you please resolve them?

@alamb thanks for your review, I have fixed the Clippy, PTAL

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks great. Thank you so much @b41sh !

@nevi-me nevi-me merged commit de62168 into apache:master Jun 25, 2021
@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Jul 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement RecordBatch::slice() to slice RecordBatches
7 participants