Clarify Scala programming guide on caching with regards to saved map output. Wording taken partially from Matei Zaharia's email to the Spark user list. http://apache-spark-user-list.1001560.n3.nabble.com/performance-improvement-on-second-operation-without-caching-td5227.html
esjewett committed May 6, 2014
1 parent 3c64750 commit 171e670
Showing 1 changed file with 6 additions and 3 deletions.
9 changes: 6 additions & 3 deletions docs/scala-programming-guide.md
@@ -278,10 +278,13 @@ iterative algorithms with Spark and for interactive use from the interpreter.

 You can mark an RDD to be persisted using the `persist()` or `cache()` methods on it. The first time
 it is computed in an action, it will be kept in memory on the nodes. The cache is fault-tolerant --
 if any partition of an RDD is lost, it will automatically be recomputed using the transformations
-that originally created it.
+that originally created it. Note: in a multi-stage job, Spark saves the map output files from map
+stages to the filesystem, so it only needs to rerun the last reduce stage. This means that multi-stage
+jobs that are rerun will often not recompute the full dependency graph. The lack of recomputation,
+in this case, does not indicate that RDDs are cached.

-In addition, each RDD can be stored using a different *storage level*, allowing you, for example, to
-persist the dataset on disk, or persist it in memory but as serialized Java objects (to save space),
+In addition, each cached RDD can be stored using a different *storage level*, allowing you, for example,
+to persist the dataset on disk, or persist it in memory but as serialized Java objects (to save space),
 or replicate it across nodes, or store the data in off-heap memory in [Tachyon](http://tachyon-project.org/).
 These levels are chosen by passing a
 [`org.apache.spark.storage.StorageLevel`](api/scala/index.html#org.apache.spark.storage.StorageLevel)
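
As a sketch of the behavior the guide describes, caching with an explicit storage level might look like the following. This example is illustrative only and not part of the commit; it assumes an existing `SparkContext` named `sc`, and the data set is a made-up placeholder.

```scala
import org.apache.spark.storage.StorageLevel

// Assumes a SparkContext `sc` is already available (hypothetical setup).
val nums = sc.parallelize(1 to 100000)
val squares = nums.map(n => n.toLong * n)

// Persist in memory as serialized Java objects to save space.
// If a partition is lost, it is recomputed from the lineage.
squares.persist(StorageLevel.MEMORY_ONLY_SER)

squares.count()        // first action computes and caches the RDD
squares.reduce(_ + _)  // subsequent actions reuse the cached partitions
```

Note that `cache()` is shorthand for `persist(StorageLevel.MEMORY_ONLY)`; other levels trade memory for CPU (serialization) or disk.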
