[SPARK-7078] [SPARK-7079] Binary processing sort for Spark SQL #6444
Closed
67 commits
d5d3106 JoshRosen: WIP towards external sorter for Spark SQL.
2bd8c9a JoshRosen: Import my original tests and get them to pass.
58f36d0 JoshRosen: Merge in a sketch of a unit test for the new sorter (now failing).
dda6752 JoshRosen: Commit some missing code from an old git stash.
c8792de JoshRosen: Remove some debug logging
9cc98f5 JoshRosen: Move more code to Java; fix bugs in UnsafeRowConverter length type.
73cc761 JoshRosen: Fix whitespace
dfdb93f JoshRosen: SparkFunSuite change
b420a71 JoshRosen: Move most of the existing SMJ code into Java.
1b841ca JoshRosen: WIP towards copying
269cf86 JoshRosen: Back out SMJ operator change; isolate changes to selection of sort op.
d468a88 JoshRosen: Update for InternalRow refactoring
7eafecf JoshRosen: Port test to SparkPlanTest
21d7d93 JoshRosen: Back out of BlockObjectWriter change
62f0bb8 JoshRosen: Update to reflect SparkPlanTest changes
26c8931 JoshRosen: Back out some Hive changes that aren't needed anymore
206bfa2 JoshRosen: Add some missing newlines at the ends of files
ebf9eea JoshRosen: Harmonization with shuffle's unsafe sorter
1db845a JoshRosen: Many more changes to harmonize with shuffle sorter
82bb0ec JoshRosen: Fix IntelliJ complaint due to negated if condition
9869ec2 JoshRosen: Clean up Exchange code a bit
6d6a1e6 JoshRosen: Centralize logic for picking sort operator implementations
90c2b6a JoshRosen: Update test name
41b8881 JoshRosen: Get UnsafeInMemorySorterSuite to pass (WIP)
7f875f9 JoshRosen: Commit failing test demonstrating bug in handling objects in spills
6b156fb JoshRosen: Some WIP work on prefix comparison.
d246e29 JoshRosen: Fix consideration of column types when choosing sort implementation.
6890863 JoshRosen: Fix memory leak on empty inputs.
4c37ba6 JoshRosen: Add tests for sorting on all primitive types.
95058d9 JoshRosen: Add missing SortPrefixUtils file
b310c88 JoshRosen: Integrate prefix comparators for Int and Long (others coming soon)
66a813e JoshRosen: Prefix comparators for float and double
0dfe919 JoshRosen: Implement prefix sort for strings (albeit inefficiently).
939f824 JoshRosen: Remove code gen experiment.
5822e6f JoshRosen: Fix test compilation issue
9969c14 JoshRosen: Merge remote-tracking branch 'origin/master' into sql-external-sort
7c3c864 JoshRosen: Undo part of a SparkPlanTest change in #7162 that broke my test.
0a79d39 JoshRosen: Revert "Undo part of a SparkPlanTest change in #7162 that broke my te…
f27be09 JoshRosen: Fix tests by binding attributes.
88b72db JoshRosen: Test ascending and descending sort orders.
82e21c1 JoshRosen: Force spilling in UnsafeExternalSortSuite.
8d7fbe7 JoshRosen: Fixes to multiple spilling-related bugs.
87b6ed9 JoshRosen: Fix critical issues in test which led to false negatives.
5d6109d JoshRosen: Fix inconsistent handling / encoding of record lengths.
b81a920 JoshRosen: Temporarily enable only the passing sort tests
1c7bad8 JoshRosen: Make sorting of answers explicit in SparkPlanTest.checkAnswer().
b86e684 JoshRosen: Set global = true in UnsafeExternalSortSuite.
08701e7 JoshRosen: Fix prefix comparison of null primitives.
1d7ffaa JoshRosen: Somewhat hacky fix for descending sorts
613e16f JoshRosen: Test with larger data.
88aff18 JoshRosen: NULL_PREFIX has to be negative infinity for floating point types
9d00afc JoshRosen: Clean up prefix comparators for integral types
f99a612 JoshRosen: Fix bugs in string prefix comparison.
293f109 JoshRosen: Add missing license header.
844f4ca JoshRosen: Merge remote-tracking branch 'origin/master' into sql-external-sort
d31f180 JoshRosen: Re-enable NullType sorting test now that SPARK-8868 is fixed
c56ec18 JoshRosen: Clean up final row copying code.
845bea3 JoshRosen: Remove unnecessary zeroing of row conversion buffer
d13ac55 JoshRosen: Hacky approach to copying of UnsafeRows for sort followed by limit.
3947fc1 JoshRosen: Merge remote-tracking branch 'origin/master' into sql-external-sort
cd05866 JoshRosen: Fix scalastyle
d1e28bc JoshRosen: Merge remote-tracking branch 'origin/master' into sql-external-sort
2f48777 JoshRosen: Add test and fix bug for sorting empty arrays
5135200 JoshRosen: Fix spill reading for large rows; add test
35dad9f JoshRosen: Make sortAnswers = false the default in SparkPlanTest
2bbac9c JoshRosen: Merge remote-tracking branch 'origin/master' into sql-external-sort
6beb467 JoshRosen: Remove a bunch of overloaded methods to avoid default args. issue
29 changes: 29 additions & 0 deletions
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparator.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to You under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.spark.util.collection.unsafe.sort; | ||
|
||
import org.apache.spark.annotation.Private; | ||
|
||
/** | ||
* Compares 8-byte key prefixes in prefix sort. Subclasses may implement type-specific | ||
* comparisons, such as lexicographic comparison for strings. | ||
*/ | ||
@Private | ||
public abstract class PrefixComparator { | ||
public abstract int compare(long prefix1, long prefix2); | ||
} |
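To illustrate how an 8-byte prefix comparison fits into prefix sort, here is a minimal, self-contained sketch. It is not code from this PR: `PrefixSortSketch`, `Entry`, `prefixOf`, and `compareUtf8` are hypothetical names, plain `String` records stand in for Spark's binary rows, and `Long.compareUnsigned` assumes Java 8 or later. The sketch compares the cheap prefixes first and falls back to a full comparison only when the prefixes tie.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration of prefix sort; none of these names come from the PR.
final class PrefixSortSketch {

  /** A record paired with its precomputed 8-byte key prefix. */
  static final class Entry {
    final String record;
    final long prefix;

    Entry(String record) {
      this.record = record;
      this.prefix = prefixOf(record);
    }
  }

  /** Packs the first 8 UTF-8 bytes big-endian, zero-padding short strings on the right. */
  static long prefixOf(String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    long prefix = 0L;
    for (int i = 0; i < 8; i++) {
      prefix <<= 8;
      if (i < bytes.length) {
        prefix |= (bytes[i] & 0xffL);
      }
    }
    return prefix;
  }

  /** Full comparison: unsigned lexicographic order over the UTF-8 bytes. */
  static int compareUtf8(String a, String b) {
    byte[] x = a.getBytes(StandardCharsets.UTF_8);
    byte[] y = b.getBytes(StandardCharsets.UTF_8);
    int common = Math.min(x.length, y.length);
    for (int i = 0; i < common; i++) {
      int c = Integer.compare(x[i] & 0xff, y[i] & 0xff);
      if (c != 0) return c;
    }
    return Integer.compare(x.length, y.length);
  }

  /** Prefix first; the full comparison runs only when the prefixes tie. */
  static final Comparator<Entry> ORDER = (a, b) -> {
    int c = Long.compareUnsigned(a.prefix, b.prefix);
    return (c != 0) ? c : compareUtf8(a.record, b.record);
  };

  static String[] sort(String[] input) {
    Entry[] entries = new Entry[input.length];
    for (int i = 0; i < input.length; i++) {
      entries[i] = new Entry(input[i]);
    }
    Arrays.sort(entries, ORDER);
    String[] out = new String[entries.length];
    for (int i = 0; i < entries.length; i++) {
      out[i] = entries[i].record;
    }
    return out;
  }
}
```

In this sketch, `"applesauce-a"` and `"applesauce-b"` share the prefix `applesau`, so only that pair needs the full comparison; every other pair is decided by a single `long` comparison.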
109 changes: 109 additions & 0 deletions
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to You under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.spark.util.collection.unsafe.sort; | ||
|
||
import com.google.common.base.Charsets; | ||
import com.google.common.primitives.Longs; | ||
import com.google.common.primitives.UnsignedBytes; | ||
|
||
import org.apache.spark.annotation.Private; | ||
import org.apache.spark.unsafe.types.UTF8String; | ||
|
||
@Private | ||
public class PrefixComparators { | ||
private PrefixComparators() {} | ||
|
||
public static final StringPrefixComparator STRING = new StringPrefixComparator(); | ||
public static final IntegralPrefixComparator INTEGRAL = new IntegralPrefixComparator(); | ||
public static final FloatPrefixComparator FLOAT = new FloatPrefixComparator(); | ||
public static final DoublePrefixComparator DOUBLE = new DoublePrefixComparator(); | ||
|
||
public static final class StringPrefixComparator extends PrefixComparator { | ||
@Override | ||
public int compare(long aPrefix, long bPrefix) { | ||
// TODO: can be done more efficiently | ||
byte[] a = Longs.toByteArray(aPrefix); | ||
byte[] b = Longs.toByteArray(bPrefix); | ||
for (int i = 0; i < 8; i++) { | ||
int c = UnsignedBytes.compare(a[i], b[i]); | ||
if (c != 0) return c; | ||
} | ||
return 0; | ||
} | ||
|
||
public long computePrefix(byte[] bytes) { | ||
if (bytes == null) { | ||
return 0L; | ||
} else { | ||
byte[] padded = new byte[8]; | ||
System.arraycopy(bytes, 0, padded, 0, Math.min(bytes.length, 8)); | ||
return Longs.fromByteArray(padded); | ||
} | ||
} | ||
|
||
public long computePrefix(String value) { | ||
return value == null ? 0L : computePrefix(value.getBytes(Charsets.UTF_8)); | ||
} | ||
|
||
public long computePrefix(UTF8String value) { | ||
return value == null ? 0L : computePrefix(value.getBytes()); | ||
} | ||
} | ||
|
||
/** | ||
* Prefix comparator for all integral types (boolean, byte, short, int, long). | ||
*/ | ||
public static final class IntegralPrefixComparator extends PrefixComparator { | ||
@Override | ||
public int compare(long a, long b) { | ||
return (a < b) ? -1 : (a > b) ? 1 : 0; | ||
} | ||
|
||
public final long NULL_PREFIX = Long.MIN_VALUE; | ||
} | ||
|
||
public static final class FloatPrefixComparator extends PrefixComparator { | ||
@Override | ||
public int compare(long aPrefix, long bPrefix) { | ||
float a = Float.intBitsToFloat((int) aPrefix); | ||
float b = Float.intBitsToFloat((int) bPrefix); | ||
return (a < b) ? -1 : (a > b) ? 1 : 0; | ||
} | ||
|
||
public long computePrefix(float value) { | ||
return Float.floatToIntBits(value) & 0xffffffffL; | ||
} | ||
|
||
public final long NULL_PREFIX = computePrefix(Float.NEGATIVE_INFINITY); | ||
} | ||
|
||
public static final class DoublePrefixComparator extends PrefixComparator { | ||
@Override | ||
public int compare(long aPrefix, long bPrefix) { | ||
double a = Double.longBitsToDouble(aPrefix); | ||
double b = Double.longBitsToDouble(bPrefix); | ||
return (a < b) ? -1 : (a > b) ? 1 : 0; | ||
} | ||
|
||
public long computePrefix(double value) { | ||
return Double.doubleToLongBits(value); | ||
} | ||
|
||
public final long NULL_PREFIX = computePrefix(Double.NEGATIVE_INFINITY); | ||
} | ||
} |
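The `TODO` in `StringPrefixComparator.compare` notes that the byte-by-byte loop could be faster. Because Guava's `Longs.toByteArray` returns a big-endian representation, comparing the eight bytes as unsigned values from index 0 gives the same ordering as comparing the two longs as unsigned 64-bit integers. The sketch below shows both forms side by side using only the JDK. The class and method names are hypothetical, and this is one possible optimization rather than the change that was actually made; Guava's `UnsignedLongs.compare` or Java 8's `Long.compareUnsigned` could replace the hand-written `compareUnsigned`.

```java
// Hypothetical illustration; not part of the PR.
final class UnsignedPrefixCompareSketch {

  /** Byte-by-byte comparison, most significant byte first, mirroring the loop in the diff. */
  static int compareBytewise(long a, long b) {
    for (int shift = 56; shift >= 0; shift -= 8) {
      int x = (int) (a >>> shift) & 0xff;
      int y = (int) (b >>> shift) & 0xff;
      if (x != y) {
        return (x < y) ? -1 : 1;
      }
    }
    return 0;
  }

  /**
   * Same ordering in a single comparison: flipping the sign bit maps unsigned
   * order onto signed order, so an ordinary signed comparison then suffices.
   */
  static int compareUnsigned(long a, long b) {
    long x = a ^ Long.MIN_VALUE;
    long y = b ^ Long.MIN_VALUE;
    return (x < y) ? -1 : (x > y) ? 1 : 0;
  }
}
```

Only the sign of the result is guaranteed to match the original loop, since `UnsignedBytes.compare` may return values other than -1, 0, and 1.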
37 changes: 37 additions & 0 deletions
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/RecordComparator.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to You under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.spark.util.collection.unsafe.sort; | ||
|
||
/** | ||
* Compares records for ordering. In cases where the entire sorting key can fit in the 8-byte | ||
* prefix, this may simply return 0. | ||
*/ | ||
public abstract class RecordComparator { | ||
|
||
/** | ||
* Compare two records for order. | ||
* | ||
* @return a negative integer, zero, or a positive integer as the first record is less than, | ||
* equal to, or greater than the second. | ||
*/ | ||
public abstract int compare( | ||
Object leftBaseObject, | ||
long leftBaseOffset, | ||
Object rightBaseObject, | ||
long rightBaseOffset); | ||
} |
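As a rough illustration of the `compare(baseObject, baseOffset, ...)` contract, the sketch below shows two possible implementations. Everything here is an assumption made for the example: `RecordComparatorSketch` is a stand-in for the abstract class above, the base object is simplified to a `byte[]` with the offset used as an array index, and each record is assumed to be laid out as a 4-byte big-endian length followed by its payload. The PR's real record layout and memory access path are not shown in this diff.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Stand-in for the PR's RecordComparator so that this sketch compiles on its own.
abstract class RecordComparatorSketch {
  abstract int compare(Object leftBase, long leftOffset, Object rightBase, long rightOffset);
}

/** For keys that fit entirely in the 8-byte prefix: nothing is left to compare. */
final class PrefixOnlyRecordComparator extends RecordComparatorSketch {
  @Override
  int compare(Object leftBase, long leftOffset, Object rightBase, long rightOffset) {
    return 0;
  }
}

/** Assumed layout: [4-byte big-endian length][payload bytes], stored in a byte[]. */
final class ByteArrayRecordComparator extends RecordComparatorSketch {
  @Override
  int compare(Object leftBase, long leftOffset, Object rightBase, long rightOffset) {
    ByteBuffer left = ByteBuffer.wrap((byte[]) leftBase);
    ByteBuffer right = ByteBuffer.wrap((byte[]) rightBase);
    int leftLen = left.getInt((int) leftOffset);
    int rightLen = right.getInt((int) rightOffset);
    int common = Math.min(leftLen, rightLen);
    for (int i = 0; i < common; i++) {
      int l = left.get((int) leftOffset + 4 + i) & 0xff;
      int r = right.get((int) rightOffset + 4 + i) & 0xff;
      if (l != r) {
        return Integer.compare(l, r);
      }
    }
    // All shared bytes are equal, so the shorter record sorts first.
    return Integer.compare(leftLen, rightLen);
  }

  /** Helper for the example: writes one length-prefixed record per string. */
  static byte[] encode(String... records) {
    byte[][] payloads = new byte[records.length][];
    int size = 0;
    for (int i = 0; i < records.length; i++) {
      payloads[i] = records[i].getBytes(StandardCharsets.UTF_8);
      size += 4 + payloads[i].length;
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    for (byte[] payload : payloads) {
      buf.putInt(payload.length).put(payload);
    }
    return buf.array();
  }
}
```

With `encode("abc", "abd", "ab")`, the three records start at offsets 0, 7, and 14 of the returned array, and those offsets are what a caller would pass to `compare`.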
31 changes: 31 additions & 0 deletions
...src/main/java/org/apache/spark/util/collection/unsafe/sort/RecordPointerAndKeyPrefix.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to You under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.spark.util.collection.unsafe.sort; | ||
|
||
final class RecordPointerAndKeyPrefix { | ||
/** | ||
* A pointer to a record; see {@link org.apache.spark.unsafe.memory.TaskMemoryManager} for a | ||
* description of how these addresses are encoded. | ||
*/ | ||
public long recordPointer; | ||
|
||
/** | ||
* A key prefix, for use in comparisons. | ||
*/ | ||
public long keyPrefix; | ||
} |
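The diff does not show how `RecordPointerAndKeyPrefix` is used. One plausible pattern, sketched below purely as an assumption, is to keep pointers and prefixes in adjacent slots of a single `long[]` and copy a pair into a reusable mutable holder when the sort needs to inspect it, which avoids allocating an object per record. `PointerPrefixArraySketch` and its `Holder` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration; the names and layout are assumptions, not PR code.
final class PointerPrefixArraySketch {

  /** Mirrors the two public fields of RecordPointerAndKeyPrefix. */
  static final class Holder {
    long recordPointer;
    long keyPrefix;
  }

  // Slot 2*i holds the record pointer, slot 2*i + 1 holds the key prefix.
  private long[] slots = new long[16];
  private int count = 0;

  void insert(long recordPointer, long keyPrefix) {
    if (2 * (count + 1) > slots.length) {
      slots = Arrays.copyOf(slots, slots.length * 2);
    }
    slots[2 * count] = recordPointer;
    slots[2 * count + 1] = keyPrefix;
    count++;
  }

  /** Copies entry {@code index} into {@code reuse} and returns it; no allocation. */
  Holder load(int index, Holder reuse) {
    reuse.recordPointer = slots[2 * index];
    reuse.keyPrefix = slots[2 * index + 1];
    return reuse;
  }

  int size() {
    return count;
  }
}
```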
Should we have a comparator for Byte/Short/Boolean/BinaryType?
Yes, it will be trivial to add. I'll do it now.
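For context on why this would be a small addition, here is a hedged sketch of how prefixes for those types might be computed. It is not the code that was subsequently added to the PR, and `SmallTypePrefixSketch` is a hypothetical name. The idea is that `byte` and `short` values widen to `long` with sign extension, so a signed comparison such as the one in `IntegralPrefixComparator` would order them correctly; booleans map to 0 and 1; and binary values can reuse the zero-padded, big-endian 8-byte prefix that strings use, compared as unsigned.

```java
// Hypothetical illustration; not the implementation added to Spark.
final class SmallTypePrefixSketch {

  /** Sign extension preserves order under a signed long comparison. */
  static long prefixOf(byte value) {
    return value;
  }

  static long prefixOf(short value) {
    return value;
  }

  /** false sorts before true. */
  static long prefixOf(boolean value) {
    return value ? 1L : 0L;
  }

  /** First 8 bytes, big-endian, zero-padded; intended for unsigned comparison. */
  static long prefixOf(byte[] value) {
    long prefix = 0L;
    for (int i = 0; i < 8; i++) {
      prefix <<= 8;
      if (i < value.length) {
        prefix |= (value[i] & 0xffL);
      }
    }
    return prefix;
  }
}
```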