[SPARK-21750][SQL] Use Arrow 0.6.0 #18974

Closed · kiszk wants to merge 3 commits

Conversation

@kiszk (Member) commented Aug 17, 2017

What changes were proposed in this pull request?

Update the Arrow version to 0.6.0.
Here is the release note.

How was this patch tested?

Existing tests
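
For context, the Java-side Arrow dependency bumped here is exercised from PySpark through the Arrow-based toPandas() conversion, so "existing tests" presumably means that path. Below is a minimal smoke-test sketch, assuming the spark.sql.execution.arrow.enabled flag available on the Spark 2.3 development line; it is an illustration, not part of this PR.

```python
from pyspark.sql import SparkSession

# Assumption: spark.sql.execution.arrow.enabled gates the Arrow-based
# conversion path on the Spark 2.3 development line.
spark = (SparkSession.builder
         .appName("arrow-0.6.0-smoke-test")
         .config("spark.sql.execution.arrow.enabled", "true")
         .getOrCreate())

df = spark.range(0, 1000).selectExpr("id", "id * 2 AS doubled")

# toPandas() serializes the result through Arrow when the flag is set,
# so it exercises the upgraded Arrow dependency end to end.
pdf = df.toPandas()
print(pdf.head())

spark.stop()
```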

@SparkQA commented Aug 17, 2017

Test build #80795 has finished for PR 18974 at commit 3f5cf45.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 17, 2017

Test build #80797 has finished for PR 18974 at commit c336d0f.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 17, 2017

Test build #80799 has finished for PR 18974 at commit 440331a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member, Author) commented Aug 18, 2017

ping @srowen @ueshin @BryanCutler

@kiszk (Member, Author) commented Aug 22, 2017

ping @srowen @ueshin @BryanCutler

@BryanCutler (Member) commented

Thanks for this @kiszk. I was thinking we would need an upgrade for DecimalType support. I'm going to help out with that on the Arrow side, but it might still take another 1 or 2 releases before it's ready. I'm not sure what the general Spark stance is on updating dependencies like Arrow, but I can say that I tested 0.6 myself and did not see anything that might cause issues. Maybe someone else can share the policies on upgrading?

@dongjoon-hyun (Member) left a comment

+1, LGTM.
Since Apache Spark 2.3.0 is the debut of Apache Arrow in Spark, I think we had better use the latest one.

@dongjoon-hyun (Member) commented

Per #18459, I'm pinging @holdenk and @cloud-fan, too.

@kiszk (Member, Author) commented Aug 22, 2017

I agree with using the latest version in each release.

It would be good to update a library periodically, as long as it does not introduce drastic changes (e.g. API changes), so that stability is ensured over a longer period.
What do you think?

@cloud-fan (Contributor) commented

I'm OK with it if Arrow 0.6.0 doesn't introduce new dependencies.

@ueshin (Member) commented Aug 22, 2017

Do we need to upgrade pyarrow in the Jenkins environment?
LGTM except for that.

@holdenk (Contributor) commented Aug 22, 2017

As with @Rushing, my primary concern with the upgrade here is the Jenkins interaction once we eventually need to upgrade the Python side as well. It seems like we might not need to yet for 0.6.0, but I'm hesitant about having mixed versions, since I believe that is explicitly not an Arrow guarantee in general.
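
One way to guard against the mixed-version situation described above is to check the installed pyarrow before using the Arrow path. The sketch below is only an illustration; the 0.6.0 floor and the helper name are assumptions tied to this PR, not an existing Spark API.

```python
from distutils.version import LooseVersion

# Assumed minimum, matching the Java-side Arrow version proposed in this PR;
# not an official Spark requirement at the time of this discussion.
_MINIMUM_PYARROW_VERSION = "0.6.0"


def require_compatible_pyarrow(minimum=_MINIMUM_PYARROW_VERSION):
    """Fail fast if pyarrow is missing or older than the JVM-side Arrow."""
    try:
        import pyarrow
    except ImportError:
        raise ImportError("pyarrow is required for the Arrow-based conversion path")
    if LooseVersion(pyarrow.__version__) < LooseVersion(minimum):
        raise ImportError(
            "pyarrow %s is installed, but at least %s is needed to match "
            "the JVM-side Arrow dependency" % (pyarrow.__version__, minimum))


require_compatible_pyarrow()
```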

@kiszk (Member, Author) commented Aug 22, 2017

Thank you for the comments. I understand the issues regarding the Jenkins interaction on the Python side. I will close this PR for now, since the Python side is not ready for an Arrow upgrade.

@dongjoon-hyun (Member) commented

Hi, all.
Two more Arrow releases seem to be out. How about the Python side? Can we catch up with one of them?

  • 0.7.1 (1 October 2017)
  • 0.7.0 (17 September 2017)
