Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CARBONDATA-3835] Fix global sort issues #3779

Closed
wants to merge 1 commit into from

Conversation

ajantha-bhat
Copy link
Member

@ajantha-bhat ajantha-bhat commented May 30, 2020

Why is this PR needed?

  1. For global sort without partition, string comes as byte[], if we use string comparator (StringSerializableComparator) it will convert byte[] to toString which gives address and comparison goes wrong.

  2. For global sort with partition, when sort column is partition column.
    it was sorting on first column instead of partition columns.

What changes were proposed in this PR?

  1. change data type to byte before choosing a comparator.
  2. get the sorted column based on index, don't just take from first

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • Yes

@CarbonDataQA1
Copy link

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3093/

@CarbonDataQA1
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1371/

@ajantha-bhat ajantha-bhat changed the base branch from master to branch-2.0 May 30, 2020 06:09
@ajantha-bhat ajantha-bhat changed the base branch from branch-2.0 to master May 30, 2020 06:10
@ajantha-bhat ajantha-bhat changed the title [TEMP] comparator changes [WIP] Fix global sort string comparator issue May 30, 2020
@ajantha-bhat ajantha-bhat changed the title [WIP] Fix global sort string comparator issue [CARBONDATA-3835] Fix global sort string comparator issue May 30, 2020
@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3096/

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1374/

Comment on lines 242 to 246
sortColumnDataTypes = sortColumnDataTypes.map { datatype =>
val updatedType = datatype match {
case StringType => ByteType
case TimestampType | DateType => LongType
case _ => datatype
}
updatedType
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about to change it as following?

     sortColumnDataTypes = sortColumnDataTypes.map { 
        case StringType => ByteType
        case TimestampType | DateType => LongType
        case datatype => datatype
    }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3098/

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1376/

@ajantha-bhat ajantha-bhat force-pushed the master branch 2 times, most recently from f460cfb to 40ec3d7 Compare May 30, 2020 16:45
@ajantha-bhat ajantha-bhat changed the title [CARBONDATA-3835] Fix global sort string comparator issue [CARBONDATA-3835] Fix global issues May 30, 2020
@ajantha-bhat ajantha-bhat changed the title [CARBONDATA-3835] Fix global issues [CARBONDATA-3835] Fix global sort issues May 30, 2020
@CarbonDataQA1
Copy link

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1378/

@ajantha-bhat
Copy link
Member Author

retest this please

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3101/

@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1379/

Fix global sort column as partition column load failure issue
@CarbonDataQA1
Copy link

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1380/

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3103/

@jackylk
Copy link
Contributor

jackylk commented May 31, 2020

LGTM

1 similar comment
@QiangCai
Copy link
Contributor

LGTM

@asfgit asfgit closed this in 76e5489 May 31, 2020
asfgit pushed a commit that referenced this pull request May 31, 2020
Why is this PR needed?
For global sort without partition, string comes as byte[], if we use string comparator (StringSerializableComparator) it will convert byte[] to toString which gives address and comparison goes wrong.
For global sort with partition, when sort column is partition column, it was sorting on first column instead of partition columns.

What changes were proposed in this PR?
change data type to byte before choosing a comparator.
get the sorted column based on index, don't just take from first

Does this PR introduce any user interface change?
No

Is any new testcase added?
Yes

This closes #3779
marchpure pushed a commit to marchpure/carbondata that referenced this pull request Jun 17, 2020
Why is this PR needed?
For global sort without partition, string comes as byte[], if we use string comparator (StringSerializableComparator) it will convert byte[] to toString which gives address and comparison goes wrong.
For global sort with partition, when sort column is partition column, it was sorting on first column instead of partition columns.

What changes were proposed in this PR?
change data type to byte before choosing a comparator.
get the sorted column based on index, don't just take from first

Does this PR introduce any user interface change?
No

Is any new testcase added?
Yes

This closes apache#3779
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants