[SPARK-10642][PySpark] Fix crash when calling rdd.lookup() on tuple keys #8796
viirya wants to merge 2 commits into apache:master from viirya/fix-pyrdd-lookup
Just asking the dumb question here, but is this intended to return an int? sys.maxsize does not appear to be the max positive 32-bit int.
Should it use sys.maxint instead?
CC @davies: the real question is whether this hash is intended to be 32-bit or 64-bit, and my Python knowledge is too limited to reason about it. Given the size of sys.maxsize, it appears to compute a 64-bit hash, but that may be platform dependent. Anyway, I suspect you're right that it's 32-bit, but that has to be verified first.
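To make the platform question concrete, here is a small check (not part of the PR) of how wide the `sys.maxsize` mask is on the running interpreter. `INT32_MAX` and `mask_width` are illustrative names introduced here, not anything from PySpark.

```python
import sys

# Largest signed 32-bit value (Java's Integer.MAX_VALUE).
INT32_MAX = 2**31 - 1


def mask_width():
    """Number of bits kept by `h &= sys.maxsize` on this interpreter.

    sys.maxsize is the largest Py_ssize_t: 2**31 - 1 on 32-bit builds and
    2**63 - 1 on 64-bit builds, so the width of the mask is platform dependent.
    """
    return sys.maxsize.bit_length()


print(mask_width(), sys.maxsize > INT32_MAX)
```

On a typical 64-bit build this prints `63 True`, i.e. masking with `sys.maxsize` alone does not bound the hash to 32 bits there.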
`h` becomes a long when `h *= 1000003` overflows, and it stays a long even after `h &= sys.maxsize` (or `sys.maxint`).
The fix looks good to me.
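The promotion described in this comment can be sketched in a few lines. This is a hypothetical reconstruction for illustration, not the code from the PR: only the `1000003` multiplier and the `sys.maxsize` mask come from this thread, while the `0x345678` seed (borrowed from CPython 2.7's tuple hash), hashing elements with the builtin `hash()`, and the closing `int()` conversion are assumptions. On Python 3 the int/long split no longer exists, so the sketch runs but cannot reproduce the Python 2 type change.

```python
import sys


def tuple_hash_sketch(items):
    """Hypothetical sketch of a CPython-2.7-style tuple hash, as discussed above.

    Assumptions: the 0x345678 seed is borrowed from CPython 2.7's tuple hash,
    and elements are hashed with the builtin hash() instead of recursing.
    """
    h = 0x345678
    for item in items:
        h ^= hash(item)
        # On Python 2, this product can exceed the native int range, which
        # silently promotes h to a long.
        h *= 1000003
        # Masking bounds the value to [0, sys.maxsize], but on Python 2 a
        # long & int is still a long, so the type does not change back.
        h &= sys.maxsize
    # int() demotes a Python 2 long that fits in a native int back to int;
    # on Python 3, int and long are unified and this is a no-op.
    return int(h)


print(0 <= tuple_hash_sketch(("a", 1)) <= sys.maxsize)  # True
```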
Test build #42589 has finished for PR 8796 at commit
retest this please.
Test build #42596 has finished for PR 8796 at commit
retest this please.
Test build #42608 has finished for PR 8796 at commit
…keys
JIRA: https://issues.apache.org/jira/browse/SPARK-10642
When calling `rdd.lookup()` on a RDD with tuple keys, `portable_hash` will return a long. That causes `DAGScheduler.submitJob` to throw `java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer`.
Author: Liang-Chi Hsieh <viirya@appier.com>
Closes #8796 from viirya/fix-pyrdd-lookup.
(cherry picked from commit 136c77d)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
LGTM, merging into master and 1.5, 1.4, 1.3, 1.2 branches
…keys
JIRA: https://issues.apache.org/jira/browse/SPARK-10642
When calling `rdd.lookup()` on a RDD with tuple keys, `portable_hash` will return a long. That causes `DAGScheduler.submitJob` to throw `java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer`.
Author: Liang-Chi Hsieh <viirya@appier.com>
Closes apache#8796 from viirya/fix-pyrdd-lookup.
(cherry picked from commit 136c77d)
Signed-off-by: Davies Liu <davies.liu@gmail.com>
(cherry picked from commit 9f8fb33)
JIRA: https://issues.apache.org/jira/browse/SPARK-10642
When calling `rdd.lookup()` on an RDD with tuple keys, `portable_hash` returns a long. That causes `DAGScheduler.submitJob` to throw `java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer`.
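The crash path above hinges on how `rdd.lookup()` uses the key's hash when the RDD has a partitioner: it only scans the partition the key maps to, and that partition index is what reaches `DAGScheduler.submitJob`. Below is a minimal sketch of that mapping, assuming the standard hash-modulo rule; `partition_for` and `JAVA_INT_MAX` are illustrative names, not PySpark APIs.

```python
import sys

# Java's Integer.MAX_VALUE; the exception above shows the JVM side
# expects partition ids as java.lang.Integer.
JAVA_INT_MAX = 2**31 - 1


def partition_for(key_hash, num_partitions):
    """Hypothetical helper: pick the partition a key's hash maps to.

    Follows the usual hash-partitioning rule (hash modulo partition count).
    The modulo already bounds the value to [0, num_partitions); int() pins
    the type to a plain int, which is what mattered on Python 2, where a
    long hash yields a long index.
    """
    if not 0 < num_partitions <= JAVA_INT_MAX:
        raise ValueError("num_partitions must be in [1, 2**31 - 1]")
    return int(key_hash % num_partitions)


# Even a hash masked with sys.maxsize maps to a small index.
print(partition_for(sys.maxsize, 8))  # prints 7
```

The point of the sketch is that the value was never the problem (the modulo keeps it small); on Python 2 the type of the hash carried through to the index.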