PhysicalPlan reviewed
JerryLead committed Jul 31, 2015
1 parent 0303677 commit b17e226
Showing 7 changed files with 252 additions and 2 deletions.
Binary file modified .DS_Store
Binary file not shown.
250 changes: 250 additions & 0 deletions EnglishVersion/3-JobPhysicalPlan.md

Large diffs are not rendered by default.

Binary file modified Graphs/PhysicalPlan.graffle
Binary file not shown.
Binary file modified PNGfigures/ComplexTask.png
Binary file modified markdown/.DS_Store
Binary file not shown.
2 changes: 1 addition & 1 deletion markdown/3-JobPhysicalPlan.md
@@ -14,7 +14,7 @@

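A careful look at the logical plan shows that within each RDD every partition is independent: inside an RDD, the data dependencies of one partition do not interfere with those of another. A bold idea, therefore, is to treat the whole diagram as a single stage and assign one task to each partition of the final RDD (finalRDD). The figure below illustrates this: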

-![ComplexTask](PNGfigures/ComplexTask.png)
+![ComplexTask](../PNGfigures/ComplexTask.png)

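All the thick arrows together form the first task. When this task finishes, it also stores the second and third partitions of CoGroupedRDD, which it has already computed. The second task (thin solid lines) then needs only two computation steps, and the third task (thin dashed lines) likewise needs only two, after which the final result is obtained.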

2 changes: 1 addition & 1 deletion readme.md
@@ -29,7 +29,7 @@ We start from the creation of a Spark job, and then discuss its execution. Final

1. [Overview](https://github.com/JerryLead/SparkInternals/blob/master/EnglishVersion/1-Overview.md) Overview of Apache Spark
2. [Job logical plan](https://github.com/JerryLead/SparkInternals/blob/master/EnglishVersion/2-JobLogicalPlan.md) Logical plan of a job (data dependency graph)
-3. [Job physical plan](https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/3-JobPhysicalPlan.md) Physical plan
+3. [Job physical plan](https://github.com/JerryLead/SparkInternals/blob/master/EnglishVersion/3-JobPhysicalPlan.md) Physical plan
4. [Shuffle details](https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/4-shuffleDetails.md) Shuffle process
5. [Architecture](https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/5-Architecture.md) Coordination of system modules in job execution
6. [Cache and Checkpoint](https://github.com/JerryLead/SparkInternals/blob/master/markdown/english/6-CacheAndCheckpoint.md) Cache and Checkpoint
