[SPARK-7314][SPARK-3524][PySpark] upgrade Pyrolite to 4.4 #5850
Test build #31615 has finished for PR 5850 at commit
|
test this please |
Test build #31633 has finished for PR 5850 at commit
|
@irmen Is Pyrolite 4.4 fully compatible with Python 2.6.6? We saw a test failure for Python 2.6.6:
|
Could it be that you're seeing the effect of a bugfix (made in Pyrolite 4.2) for the previously incorrect encoding of floats? More info: irmen/Pyrolite#11 |
If you see the same error in a version before 4.2, that particular fix is not to blame. However, if the problems started with 4.2, it could very well be this bugfix. I suspect that your Spark code has some form of workaround or wrapper for the previously incorrect decoding of floats. Now that Pyrolite has fixed this, I guess the workaround or wrapper may mess things up if it is still active. If it is not that, I don't know what your failing unit test is. Maybe you should look into it: what does it test, and what does it expect? Can you get hold of the offending pickle data? That would be a good place to start investigating. Pyrolite has a set of test cases of its own too; do they all pass on your system? |
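One way to act on the suggestion to inspect the offending pickle data is to disassemble it with Python's standard `pickletools` module. The sketch below is only an illustration under assumptions: it uses CPython's own pickler on a small `array('d')` instead of the actual Spark/Pyrolite payload, and the helper name `describe_pickle` is hypothetical.

```python
import io
import pickle
import pickletools
from array import array


def describe_pickle(data):
    """Return a human-readable opcode listing for raw pickle bytes."""
    out = io.StringIO()
    pickletools.dis(data, out=out)
    return out.getvalue()


# Hypothetical stand-in for the failing data: an array of doubles.
values = array("d", [1.0, 2.5, -3.25])
payload = pickle.dumps(values, protocol=2)

# Opcode listing to compare against what the JVM side produces or expects.
listing = describe_pickle(payload)

# Sanity check on the Python side: the payload decodes back to the same array.
restored = pickle.loads(payload)

if __name__ == "__main__":
    print(listing)
    print(restored == values)
```

Comparing such a listing for a payload that fails against one that passes may show whether the float/double encoding differs between the two.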
Test build #31702 has finished for PR 5850 at commit
|
@mengxr You might want to look into https://issues.apache.org/jira/browse/SPARK-3524, a followup JIRA that @davies created to remove a workaround introduced in #2365 for array unpickling. I suspect that rolling back that change might fix the test failure that you're seeing here. As long as you're upgrading Pyrolite, it would also be good to confirm whether the upgrade fixes these related issues (and link / close those JIRAs if this is the case): |
Yeah, I think it'd be good to add pyspark test cases for roundtripping datetimes with timezones at least as regression tests. https://issues.apache.org/jira/browse/SPARK-6917 might also be fixed by this. |
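A regression test for round-tripping timezone-aware datetimes could take roughly the following shape. This is a minimal sketch that assumes Python 3 and uses only the standard `pickle` module, so it does not exercise the Pyrolite/JVM path where the reported issues occur; a real PySpark test would send the values through Spark instead. The names `roundtrip` and `check_roundtrip` and the sample values are assumptions for illustration.

```python
import pickle
from datetime import datetime, timedelta, timezone


def roundtrip(value, protocol=2):
    """Serialize and deserialize a value with the given pickle protocol."""
    return pickle.loads(pickle.dumps(value, protocol=protocol))


# Sample timezone-aware values: UTC and a fixed +05:30 offset.
samples = [
    datetime(2015, 5, 1, 12, 30, 45, tzinfo=timezone.utc),
    datetime(2015, 5, 1, 12, 30, 45,
             tzinfo=timezone(timedelta(hours=5, minutes=30))),
]


def check_roundtrip(dt):
    """Check that both the instant and the UTC offset survive the round trip."""
    restored = roundtrip(dt)
    return restored == dt and restored.utcoffset() == dt.utcoffset()


results = [check_roundtrip(dt) for dt in samples]

if __name__ == "__main__":
    print(results)
```

Checking the UTC offset separately matters because two aware datetimes compare equal when they denote the same instant, even if their offsets differ.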
Regarding https://issues.apache.org/jira/browse/SPARK-6289 (PySpark doesn't maintain SQL date Types): I don't understand what Pyrolite may be doing wrong here, because java.sql.Timestamp extends Date and also contains time information, not only a date. |
Thanks for the references! I reverted the machine code patch in #2365 and the tests are passing on my local machine. I'm looking at the datetime issue now. |
Test build #31721 has finished for PR 5850 at commit
|
It seems that we need more time to solve the timezone issue with date/datetime. I left the details on SPARK-6411. @irmen Did you try the Bintray -> JCenter -> Sonatype OSS -> Maven Central route? If it takes time to publish on Maven Central, we can ask @pwendell to put the artifact on Maven Central under |
I haven't yet spent more time on getting the libraries (Serpent, Pyrolite) onto Maven Central. |
@irmen Yes, the correct way is to publish Serpent first and then Pyrolite. Given the Spark 1.4 release window, we may publish Pyrolite 4.4 on Maven Central under |
Will you be able to switch later, to a |
@irmen It is simple to switch to use your official artifacts in a later release. We can exclude @JoshRosen I'm using the 4.4 release on Maven Central now (thanks to @brkyvz). Could you make a final pass? It seems that the timezone issue is not yet fully resolved in the latest Pyrolite/Spark. We need another PR for it, which may miss 1.4. |
Test build #31799 has finished for PR 5850 at commit
|
Test build #31796 has finished for PR 5850 at commit
|
Test build #31798 has finished for PR 5850 at commit
|
LGTM. |
This PR upgrades Pyrolite to 4.4, which contains the bug fix for SPARK-3524 and some other performance improvements (e.g., SPARK-6288). The artifact is still under `org.spark-project` on Maven Central since there is no official release published there.

Author: Xiangrui Meng <meng@databricks.com>

Closes #5850 from mengxr/SPARK-7314 and squashes the following commits:

2ed4a95 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7314
da3c2dd [Xiangrui Meng] remove my repo
fe7e29b [Xiangrui Meng] switch to maven central
6ddac0e [Xiangrui Meng] reverse the machine code for float/double
d2d5b5b [Xiangrui Meng] change back to 4.4
7824a9c [Xiangrui Meng] use Pyrolite 3.1
cc3903a [Xiangrui Meng] upgrade Pyrolite to 4.4-0 for testing

(cherry picked from commit e9b16e6)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
This PR upgrades Pyrolite to 4.4, which contains the bug fix for SPARK-3524 and some other performance improvements (e.g., SPARK-6288). The artifact is still under `org.spark-project` on Maven Central since there is no official release published there.