[CARBONDATA-3835] Fix global sort issues #3779
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3093/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1371/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3096/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1374/
sortColumnDataTypes = sortColumnDataTypes.map { datatype =>
  val updatedType = datatype match {
    case StringType => ByteType
    case TimestampType | DateType => LongType
    case _ => datatype
  }
  updatedType
}
How about changing it as follows?
sortColumnDataTypes = sortColumnDataTypes.map {
case StringType => ByteType
case TimestampType | DateType => LongType
case datatype => datatype
}
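To check that the suggested pattern-matching form behaves the same as the original, here is a minimal sketch. The `DataType` objects below are stand-ins defined only for this example (they are not Spark's classes), and `remapVerbose` / `remapConcise` are hypothetical names for the two variants.

```scala
object TypeMappingSketch {
  // Stand-ins for Spark SQL's data types, so the sketch needs no Spark dependency.
  sealed trait DataType
  case object StringType extends DataType
  case object TimestampType extends DataType
  case object DateType extends DataType
  case object LongType extends DataType
  case object ByteType extends DataType
  case object IntegerType extends DataType

  // Original form: explicit lambda parameter, match, and a temporary value.
  def remapVerbose(types: Seq[DataType]): Seq[DataType] = types.map { datatype =>
    val updatedType = datatype match {
      case StringType => ByteType
      case TimestampType | DateType => LongType
      case _ => datatype
    }
    updatedType
  }

  // Suggested form: a pattern-matching anonymous function passed straight to map.
  def remapConcise(types: Seq[DataType]): Seq[DataType] = types.map {
    case StringType => ByteType
    case TimestampType | DateType => LongType
    case datatype => datatype
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(StringType, TimestampType, DateType, IntegerType)
    println(remapVerbose(input) == remapConcise(input))
  }
}
```

Both variants map every element the same way; the second only removes the intermediate names.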
done
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3098/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1376/
f460cfb to 40ec3d7
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1378/
retest this please
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3101/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1379/
Fix global sort column as partition column load failure issue
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1380/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3103/
LGTM
LGTM
Why is this PR needed?
For global sort without partition, string values arrive as byte[]. If the string comparator (StringSerializableComparator) is used, toString is called on the byte[], which returns the array's address rather than its contents, so the comparison goes wrong.
For global sort with partition, when the sort column is a partition column, the data was sorted on the first column instead of the partition column.
What changes were proposed in this PR?
Change the data type to byte before choosing a comparator.
Get the sort column based on its index instead of always taking the first column.
Does this PR introduce any user interface change?
No
Is any new testcase added?
Yes
This closes #3779
Why is this PR needed?
For global sort without partition, string values arrive as byte[]. If the string comparator (StringSerializableComparator) is used, toString is called on the byte[], which returns the array's address rather than its contents, so the comparison goes wrong.
For global sort with partition, when the sort column is a partition column, the data was sorted on the first column instead of the partition column.
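The byte[] problem can be reproduced outside CarbonData. The sketch below is illustrative only: `compareBytes` is a hypothetical helper written for this example, not the comparator CarbonData uses. It shows that `toString` on a byte array says nothing about its contents, while a byte-wise comparison orders the values as expected.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object ByteArrayCompareSketch {
  // Hypothetical helper: unsigned lexicographic comparison of two byte arrays.
  def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      // Mask to 0..255 so bytes above 0x7f are not treated as negative.
      val diff = (a(i) & 0xff) - (b(i) & 0xff)
      if (diff != 0) return diff
      i += 1
    }
    // All shared positions are equal: the shorter array sorts first.
    a.length - b.length
  }

  def main(args: Array[String]): Unit = {
    val apple = "apple".getBytes(UTF_8)
    val banana = "banana".getBytes(UTF_8)
    // Array.toString is Object.toString: a type tag plus an identity hash.
    println(apple.toString)
    // Comparing the bytes themselves reflects the string contents.
    println(compareBytes(apple, banana) < 0)
  }
}
```

Ordering rows by the `toString` result would therefore depend on object identity, not on the stored string.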
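What changes were proposed in this PR?
Change the data type to byte before choosing a comparator.
Get the sort column based on its index instead of always taking the first column.
Does this PR introduce any user interface change?
No
Is any new testcase added?
Yes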
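The partition-column case described above, where the sort ran on the first column instead of the partition column, can be illustrated with a small sketch. The row layout, the data, and the `sortByColumn` helper are all assumptions made for this example and are not taken from the CarbonData code.

```scala
object SortColumnIndexSketch {
  // Assumed layout: column 0 is a name, column 1 is the partition column
  // that is also declared as the sort column.
  val rows: Seq[Seq[String]] = Seq(
    Seq("zoe", "austria"),
    Seq("adam", "chile"),
    Seq("mia", "brazil"))

  // Hypothetical helper: sort rows by the value found at the given column index.
  def sortByColumn(data: Seq[Seq[String]], index: Int): Seq[Seq[String]] =
    data.sortBy(row => row(index))

  def main(args: Array[String]): Unit = {
    // Always taking the first column orders by name, leaving the partition column unsorted.
    println(sortByColumn(rows, 0).map(_(1)))
    // Looking the sort column up by its index orders by the partition column.
    println(sortByColumn(rows, 1).map(_(1)))
  }
}
```

Passing the column index explicitly keeps the sort correct wherever the sort column sits in the row.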