Skip to content

[SPARK-41404][SQL] Refactor ColumnVectorUtils#toBatch to make ColumnarBatchSuite#testRandomRows test more primitive dataType#38933

Closed
LuciferYang wants to merge 11 commits intoapache:masterfrom
LuciferYang:toBatch-bugfix
Closed

[SPARK-41404][SQL] Refactor ColumnVectorUtils#toBatch to make ColumnarBatchSuite#testRandomRows test more primitive dataType#38933
LuciferYang wants to merge 11 commits intoapache:masterfrom
LuciferYang:toBatch-bugfix

Conversation

@LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Dec 6, 2022

What changes were proposed in this pull request?

This pr refactor ColumnVectorUtils#toBatch to make ColumnarBatchSuite#testRandomRows to test more primitive dataType.

Why are the changes needed?

Support ColumnarBatchSuite#testRandomRows to test more primitive dataType

Does this PR introduce any user-facing change?

No, just for test

How was this patch tested?

Pass GitHub Actions

@github-actions github-actions bot added the SQL label Dec 6, 2022
@LuciferYang LuciferYang changed the title [DONT MERGE][TESTS] Support ColumnarBatchSuite#testRandomRows to test more primitive dataType [DONT MERGE][SQL][TESTS] Support ColumnarBatchSuite#testRandomRows to test more primitive dataType Dec 6, 2022
@LuciferYang LuciferYang changed the title [DONT MERGE][SQL][TESTS] Support ColumnarBatchSuite#testRandomRows to test more primitive dataType [DON'T MERGE][SQL][TESTS] Support ColumnarBatchSuite#testRandomRows to test more primitive dataType Dec 6, 2022
dst.getChild(2).appendLong(c.microseconds);
} else if (t instanceof DateType) {
dst.appendInt(DateTimeUtils.fromJavaDate((Date)o));
if (o instanceof Date) {
Copy link
Contributor Author

@LuciferYang LuciferYang Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan As I mentioned at #38873 (comment)

ColumnVectorUtils#toBatch and related appendValue methods only used by test cases:

  1. this pr add more primitive dataType support for toBatch method and add more test in ColumnarBatchSuite#testRandomRows, do you think this is valuable? I haven't created Jira.

  2. ColumnVectorUtils#toBatch only used by test, should we move it from ColumnVectorUtils to a test only helper class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should support all data types, but probably not all java types since it's test-only. We can require tests to only use Date or LocalDate.

For 2, I think it's fine to keep it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, change to only test Date and Timestamp due they are default

@LuciferYang LuciferYang changed the title [DON'T MERGE][SQL][TESTS] Support ColumnarBatchSuite#testRandomRows to test more primitive dataType [SPARK-41404][SQL][TESTS] Support ColumnarBatchSuite#testRandomRows to test more primitive dataType Dec 6, 2022
@LuciferYang LuciferYang changed the title [SPARK-41404][SQL][TESTS] Support ColumnarBatchSuite#testRandomRows to test more primitive dataType [SPARK-41404][SQL][TESTS] Refactor ColumnVectorUtils#toBatch to make ColumnarBatchSuite#testRandomRows test more dataType Dec 6, 2022
@LuciferYang LuciferYang changed the title [SPARK-41404][SQL][TESTS] Refactor ColumnVectorUtils#toBatch to make ColumnarBatchSuite#testRandomRows test more dataType [SPARK-41404][SQL][TESTS] Refactor ColumnVectorUtils#toBatch to make ColumnarBatchSuite#testRandomRows test more dataType Dec 6, 2022
@LuciferYang LuciferYang changed the title [SPARK-41404][SQL][TESTS] Refactor ColumnVectorUtils#toBatch to make ColumnarBatchSuite#testRandomRows test more dataType [SPARK-41404][SQL][TESTS] Refactor ColumnVectorUtils#toBatch to make ColumnarBatchSuite#testRandomRows test more primitive dataType Dec 7, 2022
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-41404][SQL][TESTS] Refactor ColumnVectorUtils#toBatch to make ColumnarBatchSuite#testRandomRows test more primitive dataType [SPARK-41404][SQL] Refactor ColumnVectorUtils#toBatch to make ColumnarBatchSuite#testRandomRows test more primitive dataType Dec 9, 2022
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.
Since this patch still touches sql/core/src/main, I removed [TESTS] from the PR title.

@LuciferYang
Copy link
Contributor Author

+1, LGTM. Since this patch still touches sql/core/src/main, I removed [TESTS] from the PR title.

Thanks @dongjoon-hyun

@dongjoon-hyun
Copy link
Member

Merged to master.

beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
…mnarBatchSuite#testRandomRows` test more primitive dataType

### What changes were proposed in this pull request?
This pr refactor `ColumnVectorUtils#toBatch` to make `ColumnarBatchSuite#testRandomRows` to test more primitive dataType.

### Why are the changes needed?
Support `ColumnarBatchSuite#testRandomRows` to test more primitive dataType

### Does this PR introduce _any_ user-facing change?
No, just for test

### How was this patch tested?
Pass GitHub Actions

Closes apache#38933 from LuciferYang/toBatch-bugfix.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments