[SPARK-28541][WEBUI] Document Storage page #25445

Closed
wants to merge 6 commits into from

Changes from 2 commits
Binary file added docs/img/webui-storage-detail.png
Binary file added docs/img/webui-storage-tab.png
47 changes: 47 additions & 0 deletions docs/web-ui.md
@@ -45,6 +45,53 @@ The Storage tab displays the persisted RDDs and DataFrames, if any, in the application. The summary
page shows the storage levels, sizes and partitions of all RDDs, and the details page shows the
sizes and the executors used for all partitions in an RDD or DataFrame.

{% highlight scala %}
scala> import org.apache.spark.storage.StorageLevel._
import org.apache.spark.storage.StorageLevel._

scala> val rdd = sc.range(0, 100, 1, 5).setName("rdd")
rdd: org.apache.spark.rdd.RDD[Long] = rdd MapPartitionsRDD[1] at range at <console>:27

scala> rdd.persist(MEMORY_ONLY_SER)
res0: rdd.type = rdd MapPartitionsRDD[1] at range at <console>:27

scala> rdd.count
res1: Long = 100

scala> val df = Seq((1, "andy"), (2, "bob"), (2, "andy")).toDF("count", "name")
df: org.apache.spark.sql.DataFrame = [count: int, name: string]

scala> df.persist(DISK_ONLY)
res2: df.type = [count: int, name: string]

scala> df.count
res3: Long = 3
{% endhighlight %}

<p style="text-align: center;">
<img src="img/webui-storage-tab.png"
title="Storage tab"
alt="Storage tab"
width="100%" />
<!-- Images are downsized intentionally to improve quality on retina displays -->
</p>

After running above example, we can found two RDDs listed in the Storage tab. Basic information like
Member
above example -> the above example
found -> find

storage level, number of partitions and memory overhead are provided. Note that the newly persisted RDDs
or DataFrames are not shown in the tab before they are materialized, to monitor a specific RDD or DataFrame,
Member
"materialized, to" -> "materialized. To

make sure an action operation has been triggered.
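
For instance, the following minimal sketch (illustrative only; it assumes a `spark-shell` session where `sc` is the SparkContext, and the name `lazyRdd` is hypothetical) shows at which point an entry becomes visible:

{% highlight scala %}
import org.apache.spark.storage.StorageLevel._

// persist() only marks the RDD for caching; no job runs yet, so nothing
// appears in the Storage tab at this point.
val lazyRdd = sc.range(0, 10).setName("lazyRdd").persist(MEMORY_ONLY)

// An action materializes the cached partitions; after it completes,
// "lazyRdd" is listed in the Storage tab.
lazyRdd.count()
{% endhighlight %}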

<p style="text-align: center;">
<img src="img/webui-storage-detail.png"
title="Storage detail"
alt="Storage detail"
width="100%" />
<!-- Images are downsized intentionally to improve quality on retina displays -->
</p>

Clicking the RDD name 'rdd' displays the details of data persistence, such as the data distribution on the cluster.
Member
nit: could you check the display of 'rdd'?
(screenshot attached)

Contributor Author
Sorry, I do not get the point. Is there a rendering issue?
In the example, I set the name of the first RDD to rdd.
Thanks for reviewing!

Member
How about "You can click the RDD name 'rdd' for obtaining the details .... " ?



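The persistence details shown on the Storage pages can also be inspected programmatically, which can help when scripting checks outside the web UI. Below is a minimal sketch (assuming the `spark-shell` session from the example above, where `rdd` is the RDD persisted earlier):

{% highlight scala %}
// Storage level that was set via persist()
rdd.getStorageLevel

// One RDDInfo entry per RDD currently persisted in this SparkContext
sc.getRDDStorageInfo.foreach { info =>
  println(s"${info.name}: ${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
    s"${info.memSize} bytes in memory, ${info.diskSize} bytes on disk")
}
{% endhighlight %}
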
## Environment Tab
The Environment tab displays the values for the different environment and configuration variables,
including JVM, Spark, and system properties.
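
The Spark properties shown on this tab come from the application's configuration and can also be read programmatically; a minimal sketch (assuming a `spark-shell` session where `sc` is the SparkContext):

{% highlight scala %}
// Print the Spark properties that the Environment tab lists, sorted by key.
// (JVM and system properties are reported separately and are not in SparkConf.)
sc.getConf.getAll
  .sortBy { case (key, _) => key }
  .foreach { case (key, value) => println(s"$key=$value") }
{% endhighlight %}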