[SPARK-35211][PYTHON] Proper NumericType conversion for applySchemaToPythonRDD #32327
Conversation
@sadhen, SPARK-35211 is not merged yet, and it makes less sense to call it a followup. Let's make a separate JIRA for this PR.
You should also probably update the table at https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L4922-L4940.
The problem is that:
- the type coercion isn't really consistent with what we have in SQL
- its type coercion rule differs from the pandas one: https://github.com/apache/spark/blob/master/python/pyspark/sql/pandas/functions.py#L305-L323
We should probably define one standard to follow.
ok to test
I guess this change is okay though ..
Test build #137904 has finished for PR 32327 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #137912 has finished for PR 32327 at commit
I don't know of a standard. That's why I only do the conversion for the numeric types: int/long/byte/short/float/double. Could you describe the problem in more detail?
I am okay with this change, but you'll have to update https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L4922-L4940 as well, using the code here: 9b9d81b
cc @ueshin @BryanCutler @viirya too FYI
Hmm, first I am confused: why are there many PRs for the same JIRA? Are they all for the same issue?
Most of these statements seem to follow a pattern of only converting between similar types. What exactly was causing the problem in the example from the description? Was it trying to convert an integer to a double?
```scala
case ShortType => (obj: Any) => nullSafeConvert(obj) {
  case c: Byte => c.toShort
  case c: Short => c
  case c: Int => c.toShort
  case c: Long => c.toShort
  case c: Float => c.toShort
  case c: Double => c.toShort
}
```
It seems odd that we would want to silently convert a double to a short. Are we sure that is the correct behavior?
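To make the concern concrete: Scala's `Double.toShort` compiles down to the JVM's `d2i`-then-`i2s` narrowing, which truncates toward zero and then wraps to 16 bits, all without any error. Below is a minimal Python model of that behavior (the function name is ours for illustration, not anything from the Spark codebase):

```python
import math

def jvm_double_to_short(x: float) -> int:
    """Model of what Scala's Double.toShort does on the JVM:
    d2i (truncate toward zero, clamp to Int range, NaN -> 0),
    then i2s (keep the low 16 bits as a signed short)."""
    INT_MIN, INT_MAX = -(2**31), 2**31 - 1
    if math.isnan(x):
        i = 0                                    # d2i maps NaN to 0
    else:
        i = max(INT_MIN, min(INT_MAX, int(x)))   # truncate, then clamp
    # i2s: reinterpret the low 16 bits as a signed short
    return ((i & 0xFFFF) ^ 0x8000) - 0x8000

print(jvm_double_to_short(3.9))      # 3: fractional part silently dropped
print(jvm_double_to_short(70000.5))  # 4464: wraps past the Short range
```

So a `case c: Double => c.toShort` branch can silently lose both precision and magnitude, which is what makes the conversion questionable.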
Yes, they are for the same issue but cover different parts of the code path. I've now created another JIRA ticket (https://issues.apache.org/jira/browse/SPARK-36283) to track this.
What changes were proposed in this pull request?
Before:
After:
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?