Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32813][SQL] Get default config of ParquetSource vectorized reader if no active SparkSession #29667

Closed
wants to merge 7 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Sep 8, 2020

What changes were proposed in this pull request?

If no active SparkSession is available, let FileSourceScanExec.needsUnsafeRowConversion look at default SQL config of ParquetSource vectorized reader instead of failing the query execution.

Why are the changes needed?

Fix a bug that if no active SparkSession is available, file-based data source scan for Parquet Source will throw exception.

Does this PR introduce any user-facing change?

Yes, this change fixes the bug.

How was this patch tested?

Unit test.

@HyukjinKwon
Copy link
Member

The change itself looks good.


withTempDir { tempDir =>
try {
val tablePath = tempDir.toString + "/table"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why don't we use tempDir directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will complain ... dir was already existing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, can we try withTempPath?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is ParquetSource-only bug and fix, could you narrow-down the PR title and description? Or, is this applicable for the other code path, @viirya ?

@SparkQA

This comment has been minimized.

@SparkQA

This comment has been minimized.

@SparkQA

This comment has been minimized.

@SparkQA

This comment has been minimized.

@SparkQA

This comment has been minimized.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Sep 8, 2020

Test build #128386 has finished for PR 29667 at commit b89eccc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya viirya changed the title [SPARK-32813][SQL] Get default config of vectorized reader if no active SparkSession [SPARK-32813][SQL] Get default config of ParquetSource vectorized reader if no active SparkSession Sep 8, 2020
@viirya
Copy link
Member Author

viirya commented Sep 8, 2020

@dongjoon-hyun I updated the PR title and description. Thank you.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you for updating, @viirya .

@SparkQA
Copy link

SparkQA commented Sep 9, 2020

Test build #128426 has finished for PR 29667 at commit 568176d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon pushed a commit that referenced this pull request Sep 9, 2020
…der if no active SparkSession

### What changes were proposed in this pull request?

If no active SparkSession is available, let `FileSourceScanExec.needsUnsafeRowConversion` look at default SQL config of ParquetSource vectorized reader instead of failing the query execution.

### Why are the changes needed?

Fix a bug that if no active SparkSession is available, file-based data source scan for Parquet Source will throw exception.

### Does this PR introduce _any_ user-facing change?

Yes, this change fixes the bug.

### How was this patch tested?

Unit test.

Closes #29667 from viirya/SPARK-32813.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit de0dc52)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@HyukjinKwon
Copy link
Member

Merged to master and branch-3.0.

@viirya
Copy link
Member Author

viirya commented Sep 9, 2020

Thanks all!

holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…der if no active SparkSession

### What changes were proposed in this pull request?

If no active SparkSession is available, let `FileSourceScanExec.needsUnsafeRowConversion` look at default SQL config of ParquetSource vectorized reader instead of failing the query execution.

### Why are the changes needed?

Fix a bug that if no active SparkSession is available, file-based data source scan for Parquet Source will throw exception.

### Does this PR introduce _any_ user-facing change?

Yes, this change fixes the bug.

### How was this patch tested?

Unit test.

Closes apache#29667 from viirya/SPARK-32813.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit de0dc52)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
asiunov pushed a commit to ascend-io/spark that referenced this pull request May 26, 2021
…der if no active SparkSession

### What changes were proposed in this pull request?

If no active SparkSession is available, let `FileSourceScanExec.needsUnsafeRowConversion` look at default SQL config of ParquetSource vectorized reader instead of failing the query execution.

### Why are the changes needed?

Fix a bug that if no active SparkSession is available, file-based data source scan for Parquet Source will throw exception.

### Does this PR introduce _any_ user-facing change?

Yes, this change fixes the bug.

### How was this patch tested?

Unit test.

Closes apache#29667 from viirya/SPARK-32813.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@viirya viirya deleted the SPARK-32813 branch December 27, 2023 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants