[NEMO-139, 6] Logic in the scheduler for appending jobs, Support RDD caching #111

sanha · 2018-08-20T05:10:01Z

JIRA: NEMO-139: Logic in the scheduler for appending jobs
JIRA: NEMO-6: Support RDD caching

Major changes:

add logic in the scheduler for appending plans (NEMO-139)
- implement PlanAppender, which appends a submitted PhysicalPlan to an original PhysicalPlan
- refactor PlanStateManager, BatchScheduler, BlockManagerMaster, and TaskDispatcher to reflect that all plans from a single job are appended to a single PhysicalPlan through PlanAppender
support RDD caching (NEMO-6)
- add CacheIdProperty property and GhostProperty
  - When a Spark user program calls cache() or persist() on an RDD, the RDD creates a ghost vertex and connects the vertex holding the RDD to the ghost vertex. This edge to the ghost vertex is annotated with a cache ID (cacheIdProperty). When a plan with this edge is executed in our runtime, the data to cache is stored in the edge in the required StorageLevel format. (No extra feature is required in our runtime to produce or sustain this data.)
  - When the BatchScheduler encounters a task that is annotated with the GhostProperty, the vertex is not scheduled and the task is simply regarded as completed.
- implement Optimizer, which conducts optimization using OptimizationPasses and is split out of our UserApplicationRunner to separate the roles.
  - When a submitted IR DAG contains an edge with a cacheIdProperty and an already executed IR DAG contains an edge with the identical cacheIdProperty, the Optimizer crops the IR DAG before the cache edge and adds a CachedSourceVertex before the edge.
- make PlanAppender properly handle the caching
  - Make PlanAppender append the PhysicalPlan constructed from the cropped IR DAG with the caching edge to the original PhysicalPlan, and add a new edge between the vertex that has the actual edge to a ghost vertex and the new CachedSourceVertex. At runtime, when the CachedSourceVertex requires the data, the cached data that was produced and stored in the edge to the ghost vertex is read through our DuplicateEdgeGroupProperty logic.
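
The cropping rule described in the list above can be sketched roughly as follows. This is a minimal, hypothetical model and not Nemo's actual Optimizer: the class `CacheCropSketch`, its `executedCacheIds` set, and the representation of an IR DAG as a linear list of vertex names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

/** Hypothetical, simplified model of the crop-on-cache-hit rule; not Nemo's Optimizer. */
final class CacheCropSketch {
  // Cache IDs found on edges of DAGs that were already submitted (assumed bookkeeping).
  private final Set<UUID> executedCacheIds = new HashSet<>();

  /**
   * @param chain       vertex names of a linear DAG, in execution order.
   * @param cachedAfter index of the vertex whose outgoing edge carries the cache ID.
   * @param cacheId     the ID annotated on that edge.
   * @return the chain to execute, cropped if the cache ID was seen before.
   */
  List<String> optimize(final List<String> chain, final int cachedAfter, final UUID cacheId) {
    // Set.add returns false when the ID is already present, i.e. on a cache hit.
    if (!executedCacheIds.add(cacheId)) {
      final List<String> cropped = new ArrayList<>();
      // Everything up to and including the cached vertex is replaced by a source vertex.
      cropped.add("CachedSourceVertex");
      cropped.addAll(chain.subList(cachedAfter + 1, chain.size()));
      return cropped;
    }
    return new ArrayList<>(chain);
  }

  public static void main(final String[] args) {
    final CacheCropSketch sketch = new CacheCropSketch();
    final UUID id = UUID.randomUUID();
    // First job: nothing is cached yet, so the whole chain runs.
    System.out.println(sketch.optimize(Arrays.asList("read", "map", "reduce", "count"), 2, id));
    // Second job: the same cache ID was seen, so the chain is cropped.
    System.out.println(sketch.optimize(Arrays.asList("read", "map", "reduce", "collect"), 2, id));
  }
}
```

In this model the second job no longer re-runs `read`, `map`, and `reduce`; a real implementation would operate on a DAG rather than a list and would have to handle several cached edges.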

Minor changes to note:

N/A.

Tests for the changes:

add an integration test that tests SparkCachingWordCount application
- SparkCachingWordCount caches shuffled data and uses the cached data to calculate which keys have identical counts.

Other comments:

I'm sorry for the delay. This issue is part of our first release and the target date was August 16th, but it was delayed by resolving conflicts.

Closes #111

wonook

I did my first pass. I'll focus on the caching logic on my next pass.

wonook · 2018-08-20T07:20:04Z

runtime/master/src/main/java/edu/snu/nemo/runtime/master/scheduler/BatchScheduler.java

+      final PhysicalPlan appendedPlan =
+          PlanAppender.appendPlan(planStateManager.getPhysicalPlan(), submittedPhysicalPlan);
+      updatePlan(appendedPlan, maxScheduleAttempt);
+      planStateManager.storeJSON("appended");


We can have multiple appended plans. Can we have a unique id for each of them?
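
One way to get a distinct ID per appended plan is a monotonically increasing counter. The sketch below is only an illustration under assumptions: `AppendedPlanNamer` and `nextSuffix` are hypothetical names that do not exist in the PR, and the only thing taken from the diff is that `storeJSON` is called with a string such as "appended".

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that hands out a distinct suffix for each appended plan. */
final class AppendedPlanNamer {
  // Counts how many plans have been appended so far; safe for concurrent callers.
  private final AtomicInteger appendCount = new AtomicInteger(0);

  /** @return "appended-0", "appended-1", ... in call order. */
  String nextSuffix() {
    return "appended-" + appendCount.getAndIncrement();
  }

  public static void main(final String[] args) {
    final AppendedPlanNamer namer = new AppendedPlanNamer();
    System.out.println(namer.nextSuffix());
    System.out.println(namer.nextSuffix());
  }
}
```

Under this assumption the scheduler would pass `namer.nextSuffix()` where the diff currently passes the fixed string "appended".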

wonook · 2018-08-20T07:29:06Z

runtime/executor/src/main/java/edu/snu/nemo/runtime/executor/task/TaskExecutor.java

-            vertexHarness, isToSideInput)); // Parent-task read
-      });
+            vertexHarness, isToSideInput)) // Parent-task read
+      );


Let's push this line up.

wonook · 2018-08-20T08:01:37Z

common/src/main/java/edu/snu/nemo/common/ir/vertex/transform/DummyTransform.java

+ * A {@link Transform} does not emit any output.
+ * @param <T> input/output type.
+ */
+public final class DummyTransform<T> implements Transform<T, T> {


We already have an EmptyTransform in EmptyComponents class

It seems that the EmptyComponents are under the test package. Should it be moved to non-test package?

That would be better than having duplicate code

wonook · 2018-08-20T08:02:10Z

common/src/main/java/edu/snu/nemo/common/ir/vertex/executionproperty/GhostProperty.java

+ * This kind of vertices is needed when some data have to be written before it's usage is not determined yet
+ * (e.g., for caching).
+ */
+public final class GhostProperty extends VertexExecutionProperty<Boolean> {


How about keeping all these components in the EmptyComponents class?

In my opinion, having multiple empty classes in a single class file is not a good design.
Also, it would be good to keep the ExecutionProperty classes in a single package in particular.

jeongyooneo

I've done my pass and left a minor comment. Thank you!

jeongyooneo · 2018-08-20T09:44:16Z

common/src/main/java/edu/snu/nemo/common/ir/vertex/executionproperty/GhostProperty.java

+ * This kind of vertices is needed when some data have to be written before it's usage is not determined yet
+ * (e.g., for caching).
+ */
+public final class GhostProperty extends VertexExecutionProperty<Boolean> {


How about MarkerProperty which has enum values such as CachedDataMarker?

As the class comment says, a vertex with this property is a kind of dummy vertex (in that it doesn't process data) used as a 'marker'; in this case it marks the existence of cached data. I think this type of vertex can also be used in contexts other than caching.

Why should the value be an enum? Isn't just having MARKER_PROPERTY, as now, enough?

I see. MARKER_PROPERTY would be enough for now. Thank you!

johnyangk

Left comments to ask about the relationships between the new properties, and existing properties/vertex.

johnyangk · 2018-08-21T00:06:15Z

common/src/main/java/edu/snu/nemo/common/ir/edge/executionproperty/CacheIDProperty.java

+/**
+ * Cache ID ExecutionProperty.
+ */
+public final class CacheIDProperty extends EdgeExecutionProperty<UUID> {


This is more of a question than a suggestion:
Would it be a good idea to somehow merge this into the existing PersistenceProperty?

In my opinion, it would be good to keep them separate, because the PersistenceProperty can be changed to alter the persistence itself, independently of caching. For example, our large shuffle optimization changes the persistence to conduct the shuffle in memory. If we merged the cache ID into the persistence property, the Pass developer would have to decide whether to maintain or discard the cache ID whenever modifying the PersistenceProperty.

Thanks! That makes sense. Can you add a class-level comment about this?
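
The separation argued for in this thread can be illustrated with a small sketch. Everything here is hypothetical: `EdgePropertiesSketch`, the `Persistence` enum and its values, and `changePersistencePass` are stand-ins for illustration and do not reflect Nemo's actual property classes.

```java
import java.util.UUID;

/** Hypothetical edge annotations with persistence and cache ID kept as separate properties. */
final class EdgePropertiesSketch {
  enum Persistence { KEEP, DISCARD } // assumed values, for illustration only

  private Persistence persistence = Persistence.KEEP;
  private UUID cacheId; // null when the edge is not cached

  Persistence getPersistence() { return persistence; }
  void setPersistence(final Persistence p) { this.persistence = p; }
  UUID getCacheId() { return cacheId; }
  void setCacheId(final UUID id) { this.cacheId = id; }

  /** A pass that only cares about persistence never has to look at the cache ID. */
  static void changePersistencePass(final EdgePropertiesSketch edge) {
    edge.setPersistence(Persistence.DISCARD);
  }

  public static void main(final String[] args) {
    final EdgePropertiesSketch edge = new EdgePropertiesSketch();
    final UUID id = UUID.randomUUID();
    edge.setCacheId(id);
    changePersistencePass(edge);
    // The cache ID survives the pass untouched.
    System.out.println(edge.getPersistence() + " " + id.equals(edge.getCacheId()));
  }
}
```

If the two were merged into one property, `changePersistencePass` would have to copy the cache ID over explicitly or risk dropping it.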

johnyangk · 2018-08-21T00:09:54Z

common/src/main/java/edu/snu/nemo/common/ir/vertex/executionproperty/MarkerProperty.java

+ * This kind of vertices is needed when some data have to be written before it's usage is not determined yet
+ * (e.g., for caching).
+ */
+public final class MarkerProperty extends VertexExecutionProperty<Boolean> {


DoNotScheduleProperty?

How is this related to the 'barrier' aspects in the existing MetricCollectionBarrierVertex? Would it be a good idea to reuse this property for that as well?

If a vertex is annotated with MarkerProperty (or DoNotScheduleProperty), the vertex should never be scheduled to any executor. It acts as a simple marker for constructing an edge (and the data in the edge).
In contrast, the barrier vertex is scheduled to an executor, collects metrics on the data, sends the metric data to the master, enters the ON_HOLD state, and is completed after a dynamic optimization with the metric.
It seems that there is no reason to merge the two properties.
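
The "never scheduled, treated as completed" behavior can be sketched as below. This is a simplified, hypothetical model and not the BatchScheduler code: `IgnoreSchedulingSketch`, `Task`, `TaskState`, and `dispatch` are names assumed for illustration, and the barrier vertex's ON_HOLD handling is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of skip-on-schedule; not Nemo's BatchScheduler. */
final class IgnoreSchedulingSketch {
  enum TaskState { READY, EXECUTING, COMPLETE }

  static final class Task {
    final String id;
    final boolean ignoreScheduling; // stands in for the vertex-level property
    TaskState state = TaskState.READY;

    Task(final String id, final boolean ignoreScheduling) {
      this.id = id;
      this.ignoreScheduling = ignoreScheduling;
    }
  }

  /** @return the tasks that are actually sent to executors. */
  static List<Task> dispatch(final List<Task> readyTasks) {
    final List<Task> toExecutors = new ArrayList<>();
    for (final Task task : readyTasks) {
      if (task.ignoreScheduling) {
        // Never sent to an executor; regarded as completed right away.
        task.state = TaskState.COMPLETE;
      } else {
        task.state = TaskState.EXECUTING;
        toExecutors.add(task);
      }
    }
    return toExecutors;
  }

  public static void main(final String[] args) {
    final Task reduce = new Task("reduce", false);
    final Task ghost = new Task("ghost", true);
    final List<Task> sent = dispatch(Arrays.asList(reduce, ghost));
    System.out.println(sent.size() + " task(s) dispatched; ghost is " + ghost.state);
  }
}
```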

I see. So vertices with this property are simply 'ignored' when scheduling, whereas the barrier vertex is scheduled but 'blocks' the scheduling of downstream vertices.

How about renaming it to something like IgnoreSchedulingProperty? (consistent with the existing ClonedSchedulingProperty) The term Marker seems a bit generic.

@johnyangk I understand your point, but in addition to being skipped in scheduling, this property acts as a marker used in PlanAppender and NemoOptimizer when scanning a Plan to find cached data. Maybe acting as a marker (indicator?) is the primary feature, and being skipped in scheduling is just a side effect (skipped since it's a dummy vertex).

I agree that the name is generic. Every annotation acts as a kind of marker. If @jeongyooneo agrees, I will rename it to IgnoreSchedulingProperty.

I think the explanation is now in the class-level comment, so going for IgnoreSchedulingProperty looks fine! 😄

Thanks @jeongyooneo @sanha for your explanation. I understood that this property is actually being interpreted in two different ways:

(1) Ignore(skip) scheduling
(2) Some kind of an indicator when scanning for cached data (as in https://github.com/apache/incubator-nemo/pull/111/files/cfcbf0f6c16347d377e2e671a88739d65bbc36b1#diff-b29364b5afa203a9c5f3f3af1717b866R68)

IMHO (1) and (2) are orthogonal, and I feel we should have a separate property for each.
I feel that (1) is straightforward, whereas (2) assumes certain things (e.g., Cached edge toward a ghost is a representative edge).

If we go with a single property that does both (1) and (2), then I'd suggest using a name that clearly describes either (1) or (2) and maybe leave a TODO or comment to indicate that it is being used for the other purpose as well. (with certain assumptions)

@johnyangk I chose to change its name and update the comments for now. Thanks!

johnyangk · 2018-08-21T01:35:20Z

runtime/master/src/main/java/edu/snu/nemo/runtime/master/PlanAppender.java

+    final Map<UUID, StageEdge> cachedEdges = new HashMap<>();
+    originalPlan.getStageDAG().getVertices().forEach(
+        stage -> originalPlan.getStageDAG().getIncomingEdgesOf(stage).stream()
+            // Cached edge toward a ghost is a representative edge.


This seems like an important assumption.
Can you add a pointer to comments in {@link MarkerProperty}, or a TODO to emphasize/explain this assumption?
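
To make the assumption concrete, the scan can be sketched as follows. This is a hypothetical simplification: `CachedEdgeScanSketch`, `Edge`, and `collectRepresentativeEdges` are illustrative names, and a flat edge list with a `dstIsGhost` flag stands in for the stage DAG and the vertex property.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch of picking one representative cached edge per cache ID. */
final class CachedEdgeScanSketch {
  static final class Edge {
    final String src;
    final String dst;
    final boolean dstIsGhost; // true when the destination is a ghost (never-scheduled) vertex
    final UUID cacheId;       // null when the edge carries no cache ID

    Edge(final String src, final String dst, final boolean dstIsGhost, final UUID cacheId) {
      this.src = src;
      this.dst = dst;
      this.dstIsGhost = dstIsGhost;
      this.cacheId = cacheId;
    }
  }

  /** Keeps only cached edges that point at a ghost vertex, one per cache ID. */
  static Map<UUID, Edge> collectRepresentativeEdges(final List<Edge> edges) {
    final Map<UUID, Edge> cachedEdges = new HashMap<>();
    for (final Edge edge : edges) {
      // Assumption under discussion: the edge toward a ghost is the representative one.
      if (edge.cacheId != null && edge.dstIsGhost) {
        cachedEdges.putIfAbsent(edge.cacheId, edge);
      }
    }
    return cachedEdges;
  }

  public static void main(final String[] args) {
    final UUID id = UUID.randomUUID();
    final Map<UUID, Edge> found = collectRepresentativeEdges(Arrays.asList(
        new Edge("reduce", "count", false, id),
        new Edge("reduce", "ghost", true, id)));
    System.out.println(found.get(id).dst);
  }
}
```

Other edges that share the same cache ID are ignored here, which is why the assumption deserves an explicit comment or TODO.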

sanha · 2018-08-21T02:14:02Z

@johnyangk @jeongyooneo Thanks for the review! Please take a look.

johnyangk · 2018-08-21T02:18:40Z

LGTM 😄

sanha added 15 commits August 9, 2018 18:49

add ghost property

c642c29

add ghost vertex while caching

d35fa75

adding cache ghost vertex and append (in progress)

3cd1ae2

Merge branch 'master' into 6-RDDCache

6aecc44

crop and append

dd81872

fix unit test failure

1fff3e2

add IT case expected result

390833b

implement java RDD caching

8dd222f

merge master, add issue numbers, encount ir opt conflict

60aa382

temporary contain null in PhysPlan

42215fd

resolve conflicts

6b3b653

resolve conflict

c0b8ee0

fix minor bugs

85ffa2c

Merge branch 'master' into 6-RDDCache

88a5291

resolve unit test failure

cfa14b5

sanha added the enhancement New feature or request label Aug 20, 2018

sanha self-assigned this Aug 20, 2018

sanha requested a review from jeongyooneo August 20, 2018 05:10

merge master

541e200

wonook reviewed Aug 20, 2018

jeongyooneo reviewed Aug 20, 2018

address comments

cfcbf0f

sanha force-pushed the 6-RDDCache branch from fbab04a to cfcbf0f Compare August 20, 2018 11:44

johnyangk reviewed Aug 21, 2018

address comments

daf0362

johnyangk approved these changes Aug 21, 2018

Merge branch 'master' into 6-RDDCache

cfb6aa7

wonook merged commit e6c3616 into apache:master Aug 21, 2018

wonook deleted the 6-RDDCache branch August 21, 2018 03:23
