[SPARK-6566][SQL]: Related changes for newer parquet version #5889

Closed
wants to merge 3 commits into from

saucam

@saucam saucam commented May 4, 2015

This brings in a major improvement: footers are no longer read on the driver. It also cleans up the code in parquetTableOperations, where we had to override getSplits to eliminate multiple listStatus calls.

cc @liancheng

Are there any other changes we need for this?
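
For readers unfamiliar with the workaround mentioned above, here is a toy sketch of the general idea: when two steps both depend on an expensive listing, caching it once avoids the repeated calls. This is not the code removed by this PR. `Lister`, `CachingLister`, `footers`, and the string-based "statuses" are all hypothetical names invented for illustration, and the real Hadoop and Parquet signatures differ.

```scala
// Toy model only: `Lister` stands in for an input format whose listing is
// expensive. None of these classes come from Spark, Hadoop, or Parquet.
class Lister(paths: Seq[String]) {
  var listCalls = 0 // counts how often the expensive listing runs

  def listStatus(): Seq[String] = {
    listCalls += 1
    paths.sorted
  }

  // Both steps need the listing, so the naive version lists twice.
  def footers(): Seq[String] = listStatus().map(p => s"footer:$p")
  def getSplits(): Seq[String] = listStatus().map(p => s"split:$p")
}

// Workaround in the spirit of the old override: list once, reuse the result.
class CachingLister(paths: Seq[String]) extends Lister(paths) {
  private lazy val cachedStatus: Seq[String] = super.listStatus()
  override def listStatus(): Seq[String] = cachedStatus
}
```

With the caching subclass, calling `footers()` and then `getSplits()` triggers a single listing instead of two. If the newer Parquet release no longer needs the listing for footers on the driver, this kind of override becomes unnecessary, which is the cleanup the description refers to.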

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 4, 2015

Test build #31760 has started for PR 5889 at commit 3e3cbf9.

@SparkQA

SparkQA commented May 4, 2015

Test build #31760 has finished for PR 5889 at commit 3e3cbf9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31760/

@liancheng
Contributor

Thanks for doing this! This has been on my todo list for a long time :) We're currently terribly busy working on features that are expected to ship in the 1.4 release. I'll come back to this once I finish the work at hand.

@liancheng
Contributor

(I'm so glad that we can finally remove the messy, hacky getSplits stuff!)

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 7, 2015

Test build #32089 has started for PR 5889 at commit 7e8db22.

@SparkQA

SparkQA commented May 7, 2015

Test build #32089 has finished for PR 5889 at commit 7e8db22.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SetInFilter[T <: Comparable[T]](

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32089/

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 27, 2015

Test build #33570 has started for PR 5889 at commit 695f6d9.

@SparkQA

SparkQA commented May 27, 2015

Test build #33570 has finished for PR 5889 at commit 695f6d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SetInFilter[T <: Comparable[T]](

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33570/

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. Test FAILed.

@saucam
Author

saucam commented Jun 5, 2015

cc @liancheng

I have rebased.
Can we retest this? How can I determine what is failing?

@liancheng
Contributor

ok to test

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jun 5, 2015

Test build #34285 has started for PR 5889 at commit c9aa042.

@liancheng
Contributor

@saucam It seems to be a Jenkins issue rather than a problem with your PR.

@SparkQA

SparkQA commented Jun 5, 2015

Test build #34285 has finished for PR 5889 at commit c9aa042.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SetInFilter[T <: Comparable[T]](

@AmplabJenkins

Merged build finished. Test FAILed.

if (value == null) {
  return false
}
return hSet.contains(value)
Contributor

Generally we try to avoid using return in Scala code. (You may find a detailed guide here.) For this method, we'd prefer:

value != null && hSet.contains(value)

BTW, why the name hSet? Maybe valueSet instead?

Contributor

Oh, I just saw the hset field in InSet. I still prefer valueSet here, though...
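
To make the review suggestion concrete, here is a minimal sketch of both styles side by side. `SetInFilterSketch` is a hypothetical stand-in for the PR's `SetInFilter`. It deliberately does not extend Parquet's `UserDefinedPredicate`, so it compiles without Parquet on the classpath, and the method name `keepWithReturn` is invented for the comparison.

```scala
// Hypothetical stand-in for the SetInFilter class in this PR. It does not
// extend Parquet's UserDefinedPredicate, so it compiles without Parquet.
final case class SetInFilterSketch[T <: Comparable[T]](valueSet: Set[T]) {

  // Style flagged in review: explicit `return` statements.
  def keepWithReturn(value: T): Boolean = {
    if (value == null) {
      return false
    }
    return valueSet.contains(value)
  }

  // Preferred style: the last expression is the result, and `&&`
  // short-circuits so `contains` is never called with null.
  def keep(value: T): Boolean =
    value != null && valueSet.contains(value)
}
```

Both methods return the same result for every input. The second form is shorter and matches the convention of treating the final expression as the method's value.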

@JoshRosen
Contributor

May want to update the title of this PR to be clearer, since it looks like the dep. bump was done in a separate PR.

@saucam changed the title from "[SPARK-6566][SQL]: Change parquet version to latest release" to "[SPARK-6566][SQL]: Related changes for newer parquet version" on Jun 5, 2015
@saucam
Author

saucam commented Jun 5, 2015

Incorporated review comments.

Retest please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jun 5, 2015

Test build #34291 has started for PR 5889 at commit d1bf41e.

@SparkQA

SparkQA commented Jun 5, 2015

Test build #34291 has finished for PR 5889 at commit d1bf41e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@saucam
Author

saucam commented Jun 11, 2015

@liancheng Does this look OK to you now?

@liancheng
Contributor

@saucam Sorry for the late reply. This LGTM now. The inefficient code path in Parquet still exists (sequentially retrieving FileStatus objects), but it now affects only client-side metadata retrieval, which is deprecated. So I'm going to merge this to master. Thanks for working on this!

@asfgit asfgit closed this in e428b3a Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
This brings in a major improvement: footers are no longer read on the driver. It also cleans up the code in parquetTableOperations, where we had to override getSplits to eliminate multiple listStatus calls.

cc liancheng

Are there any other changes we need for this?

Author: Yash Datta <Yash.Datta@guavus.com>

Closes apache#5889 from saucam/parquet_1.6 and squashes the following commits:

d1bf41e [Yash Datta] SPARK-7340: Fix scalastyle and incorporate review comments
c9aa042 [Yash Datta] SPARK-7340: Use the new user defined filter predicate for pushing down inset into parquet
56bc750 [Yash Datta] SPARK-7340: Change parquet version to latest release