[2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and -0.0 in UnsafeWriter #23265

Closed
wants to merge 1 commit into from

Commits on Dec 9, 2018

  1. [SPARK-26021][SQL][FOLLOWUP] only deal with NaN and -0.0 in UnsafeWriter

    A followup of apache#23043
    
    There are 4 places where we need to deal with NaN and -0.0:
    1. Comparison expressions. `-0.0` and `0.0` should be treated as the same. Different NaNs should be treated as the same.
    2. Join keys. `-0.0` and `0.0` should be treated as the same. Different NaNs should be treated as the same.
    3. Grouping keys. `-0.0` and `0.0` should be assigned to the same group. Different NaNs should be assigned to the same group.
    4. Window partition keys. `-0.0` and `0.0` should be treated as the same. Different NaNs should be treated as the same.
    
    Case 1 is OK. Our comparison already handles NaN and -0.0, and for struct/array/map, we recursively compare the fields/elements.
    
    Cases 2, 3 and 4 are problematic because they compare the `UnsafeRow` binary data directly. Different NaNs have different binary representations, and so do -0.0 and 0.0.
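To make the binary-comparison problem concrete, here is a minimal standalone Java sketch. It is not Spark code: the class `FloatBits` and its helpers are hypothetical names chosen for illustration. It shows that values which compare as equal (or are all NaN) can still carry different raw IEEE 754 bit patterns, which is what a byte-wise comparison sees. Whether a NaN payload survives `Double.longBitsToDouble` is platform-dependent for signaling NaNs, so the sketch uses a quiet NaN with a non-zero payload.

```java
public class FloatBits {
    // Raw IEEE 754 bit pattern of a double; unlike doubleToLongBits,
    // this does not collapse all NaNs into the canonical NaN.
    static long bits(double d) {
        return Double.doubleToRawLongBits(d);
    }

    // True when the two doubles would look identical to a byte-wise comparison.
    static boolean sameBytes(double a, double b) {
        return bits(a) == bits(b);
    }

    // A quiet NaN whose payload differs from the canonical Double.NaN
    // (assumption: the platform preserves quiet-NaN payloads).
    static double otherNaN() {
        return Double.longBitsToDouble(0x7ff8000000000001L);
    }

    public static void main(String[] args) {
        // Numerically equal, but the sign bit differs.
        System.out.println("0.0 == -0.0: " + (0.0d == -0.0d));
        System.out.println("same bytes:  " + sameBytes(0.0d, -0.0d));

        // Both are NaN, but the payload bits differ.
        System.out.println("both NaN:    " + (Double.isNaN(Double.NaN) && Double.isNaN(otherNaN())));
        System.out.println("same bytes:  " + sameBytes(Double.NaN, otherNaN()));
    }
}
```

A join, grouping, or window operator that compares the serialized bytes would therefore place such values in different buckets unless they are normalized first.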
    
    To fix it, a simple solution is: normalize float/double when building unsafe data (`UnsafeRow`, `UnsafeArrayData`, `UnsafeMapData`). Then we don't need to worry about it anymore.
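The normalization idea can be sketched as follows. This is an illustration under stated assumptions, not the actual `UnsafeWriter` implementation: `NormalizingWriter`, `normalize`, and `encode` are hypothetical names, and a `java.nio.ByteBuffer` stands in for Spark's unsafe buffer. The point is only that mapping every NaN to the canonical NaN and `-0.0` to `0.0` before writing makes the resulting bytes safe to compare directly.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class NormalizingWriter {
    // Collapse all NaNs to the canonical NaN and -0.0 to 0.0.
    static double normalize(double value) {
        if (Double.isNaN(value)) {
            return Double.NaN;
        }
        // value == -0.0d is true for both 0.0 and -0.0, so both become 0.0.
        if (value == -0.0d) {
            return 0.0d;
        }
        return value;
    }

    // Same normalization for single-precision values.
    static float normalize(float value) {
        if (Float.isNaN(value)) {
            return Float.NaN;
        }
        if (value == -0.0f) {
            return 0.0f;
        }
        return value;
    }

    // Serialize a double after normalization (hypothetical stand-in for a writer method).
    static byte[] encode(double value) {
        return ByteBuffer.allocate(Double.BYTES).putDouble(normalize(value)).array();
    }

    public static void main(String[] args) {
        double otherNaN = Double.longBitsToDouble(0x7ff8000000000001L);
        System.out.println(Arrays.equals(encode(0.0d), encode(-0.0d)));
        System.out.println(Arrays.equals(encode(Double.NaN), encode(otherNaN)));
    }
}
```

Doing this once at write time means downstream code that compares bytes needs no special cases for floating-point values.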
    
    Following this direction, this PR moves the handling of NaN and -0.0 from `Platform` to `UnsafeWriter`, so that places like `UnsafeRow.setFloat` will not handle them, which reduces the perf overhead. It's also easier to add comments explaining why we do it in `UnsafeWriter`.
    
    Tested with existing tests.
    
    Closes apache#23239 from cloud-fan/minor.
    
    Authored-by: Wenchen Fan <wenchen@databricks.com>
    Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
    cloud-fan committed Dec 9, 2018
    6a837c0