Skip to content

Commit

Permalink
[MINOR][DOC] Update the condition description of serialized shuffle
Browse files Browse the repository at this point in the history
## What changes were proposed in this pull request?
`1. The shuffle dependency specifies no aggregation or output ordering.`
If the shuffle dependency specifies aggregation, but it only aggregates at the reduce-side, serialized shuffle can still be used.
`3. The shuffle produces fewer than 16777216 output partitions.`
If the number of output partitions is 16777216 , we can use serialized shuffle.

We can see this mothod: `canUseSerializedShuffle`
## How was this patch tested?
N/A

Closes #23228 from 10110346/SerializedShuffle_doc.

Authored-by: liuxian <liu.xian3@zte.com.cn>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
  • Loading branch information
10110346 authored and cloud-fan committed Dec 10, 2018
1 parent 42e8c38 commit 9794923
Showing 1 changed file with 2 additions and 2 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -33,10 +33,10 @@ import org.apache.spark.shuffle._
* Sort-based shuffle has two different write paths for producing its map output files:
*
* - Serialized sorting: used when all three of the following conditions hold:
* 1. The shuffle dependency specifies no aggregation or output ordering.
* 1. The shuffle dependency specifies no map-side combine.
* 2. The shuffle serializer supports relocation of serialized values (this is currently
* supported by KryoSerializer and Spark SQL's custom serializers).
* 3. The shuffle produces fewer than 16777216 output partitions.
* 3. The shuffle produces fewer than or equal to 16777216 output partitions.
* - Deserialized sorting: used to handle all other cases.
*
* -----------------------
Expand Down

0 comments on commit 9794923

Please sign in to comment.