[SPARK-28541][WEBUI] Document Storage page #25445
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -45,6 +45,53 @@ The Storage tab displays the persisted RDDs and DataFrames, if any, in the appli | |
page shows the storage levels, sizes and partitions of all RDDs, and the details page shows the | ||
sizes and using executors for all partitions in an RDD or DataFrame. | ||
|
||
{% highlight scala %} | ||
scala> import org.apache.spark.storage.StorageLevel._ | ||
import org.apache.spark.storage.StorageLevel._ | ||
|
||
scala> val rdd = sc.range(0, 100, 1, 5).setName("rdd") | ||
rdd: org.apache.spark.rdd.RDD[Long] = rdd MapPartitionsRDD[1] at range at <console>:27 | ||
|
||
scala> rdd.persist(MEMORY_ONLY_SER) | ||
res0: rdd.type = rdd MapPartitionsRDD[1] at range at <console>:27 | ||
|
||
scala> rdd.count | ||
res1: Long = 100 | ||
|
||
scala> val df = Seq((1, "andy"), (2, "bob"), (2, "andy")).toDF("count", "name") | ||
df: org.apache.spark.sql.DataFrame = [count: int, name: string] | ||
|
||
scala> df.persist(DISK_ONLY) | ||
res2: df.type = [count: int, name: string] | ||
|
||
scala> df.count | ||
res3: Long = 3 | ||
{% endhighlight %} | ||
|
||
<p style="text-align: center;"> | ||
<img src="img/webui-storage-tab.png" | ||
title="Storage tab" | ||
alt="Storage tab" | ||
width="100%" /> | ||
<!-- Images are downsized intentionally to improve quality on retina displays --> | ||
</p> | ||
|
||
After running above example, we can found two RDDs listed in the Storage tab. Basic information like | ||
storage level, number of partitions and memory overhead are provided. Note that the newly persisted RDDs | ||
or DataFrames are not shown in the tab before they are materialized, to monitor a specific RDD or DataFrame, | ||
"materialized, to" -> "materialized. To" |
||
make sure an action operation has been triggered. | ||
|
||
<p style="text-align: center;"> | ||
<img src="img/webui-storage-detail.png" | ||
title="Storage detail" | ||
alt="Storage detail" | ||
width="100%" /> | ||
<!-- Images are downsized intentionally to improve quality on retina displays --> | ||
</p> | ||
|
||
Cliking the RDD name 'rdd' displays the details of data persistance, such as the data distribution on the cluster. | ||
Sorry, I do not get the point. Is there a rendering issue?
How about "You can click the RDD name 'rdd' for obtaining the details .... " ? |
||
|
||
|
||
## Environment Tab | ||
The Environment tab displays the values for the different environment and configuration variables, | ||
including JVM, Spark, and system properties. | ||
|
above example -> the above example
found -> find