[SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers by xinrong-meng · Pull Request #45269 · apache/spark

xinrong-meng · 2024-02-27T00:56:12Z

What changes were proposed in this pull request?

Documentation for SparkSession-based Profilers.

Why are the changes needed?

For easier user onboarding and better usability.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual test. Screenshots of the built HTML pages are shown below.

Was this patch authored or co-authored using generative AI tooling?

No.

HyukjinKwon

Shall we add the API into python/docs/source/reference/pyspark.sql/spark_session.rst as well?

xinrong-meng · 2024-02-28T19:19:31Z

I was looking for the API doc.. thank you @HyukjinKwon !

HyukjinKwon · 2024-02-29T02:01:32Z

python/docs/source/reference/pyspark.sql/spark_session.rst

    SparkSession.createDataFrame
    SparkSession.getActiveSession
    SparkSession.newSession
+    SparkSession.profile


I think we should also have a dedicated section for profile.show, profile.dump.

Sounds good. Updated here https://github.com/apache/spark/pull/45269/files#diff-1d5123b540315e1c678a3c7f5af287076c8296f71230592990c344933d02f664R90.

I hit

    [autosummary] failed to import pyspark.sql.SparkSession.profile.dump.
    Possible hints:
    * AttributeError: 'property' object has no attribute 'dump'
    * ImportError:
    * ModuleNotFoundError: No module named 'pyspark.sql.SparkSession'

The profile property returns a Profile class instance, so Sphinx might have difficulty accessing it. Do you happen to know the best way to resolve that?

Need :template: autosummary/accessor_method.rst?

See #44012 (comment)

Hmm, I was thinking the same, but it kept failing with the error message above.

I think SparkSession.builder works because it is a classproperty whereas profile is a property of SparkSession.

I have a workaround 76e7387 by using autoclass, but it doesn't look consistent with the rest of the page, as shown below.

I'm wondering if we should have a follow-up designated for that part.
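The property-versus-classproperty distinction raised in this thread can be reproduced in plain Python, without Spark or Sphinx. The sketch below is only an illustration: Session, Profile, Builder, and this classproperty descriptor are hypothetical stand-ins, not PySpark's actual implementations. It shows that looking up a regular property on the class yields the property object itself, which has no dump attribute, while a class-level descriptor resolves to the underlying object.

```python
class Profile:
    """Hypothetical stand-in for the object returned by SparkSession.profile."""

    def dump(self, path):
        return f"dumped to {path}"


class Builder:
    """Hypothetical stand-in for the object returned by SparkSession.builder."""

    def getOrCreate(self):
        return Session()


class classproperty:
    """Minimal descriptor that resolves on the class as well as on instances.

    Illustrative only; PySpark's own classproperty may be implemented differently.
    """

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner):
        return self.fget(owner)


class Session:
    @property
    def profile(self):
        # Only evaluated when accessed on an instance.
        return Profile()

    @classproperty
    def builder(cls):
        # Evaluated even when accessed on the class.
        return Builder()


# Class-level attribute access, similar to what an import-based doc tool performs.
class_level_profile = Session.profile  # the property object, not a Profile
class_level_builder = Session.builder  # a Builder instance

print(type(class_level_profile).__name__)           # property
print(hasattr(class_level_profile, "dump"))         # False
print(hasattr(class_level_builder, "getOrCreate"))  # True
print(hasattr(Session().profile, "dump"))           # True
```

The second line of output matches the AttributeError hint in the autosummary message: resolving Session.profile.dump from the class stops at the property object, whereas the same lookup on an instance succeeds.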

HyukjinKwon · 2024-02-29T02:04:53Z

python/docs/source/development/debugging.rst

-Python/Pandas UDFs, which can be enabled by setting ``spark.python.profile`` configuration to ``true``.
+Python/Pandas UDFs.
+
+SparkContext-based


I think you can just remove this, and just add one additional section called runtime profiler

cc @ueshin do you have other thoughts?

How about putting the new doc first?

Identifying Hot Loops (Python Profilers)

Driver Side
...

Executor Side

Python/Pandas UDF
Show the new profiler usage

Legacy (for RDD or non-Spark Connect)
Put the current doc here

I believe there are many existing users of SparkContext-based profilers. Shall we keep them in the debugging guide until SparkSession-based profilers gain more adoption and positive feedback? I'll adjust the order to show SparkSession-based profilers first, as @ueshin suggested. What do you think, @HyukjinKwon?

We will remove the "legacy" profilers for readability and clarity, and start preparing a migration guide.

xinrong-meng · 2024-03-05T20:00:25Z

Marked as WIP to wait for #45378 to be merged first, and then adjust.

HyukjinKwon · 2024-03-08T00:42:31Z

Merged to master.

SparkSession-based

6b2cd95

github-actions bot added DOCS PYTHON labels Feb 27, 2024

HyukjinKwon reviewed Feb 27, 2024

xinrong-meng added 2 commits February 28, 2024 11:18

doc for spark.profile

9d25145

api doc

8360d31

github-actions bot added SQL CONNECT labels Feb 28, 2024

xinrong-meng changed the title ~~[WIP] Documentation for SparkSession-based Profilers~~ [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers Feb 28, 2024

xinrong-meng marked this pull request as ready for review February 28, 2024 19:37

HyukjinKwon reviewed Feb 29, 2024

xinrong-meng added 5 commits March 1, 2024 13:27

profile section

d985599

reorg debugging

d73fa7a

refine

c254ba7

remove legacy

5eb1b15

avoid non-Spark-Connect

27b6416

xinrong-meng marked this pull request as draft March 5, 2024 20:00

xinrong-meng changed the title ~~[SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers~~ [WIP][SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers Mar 5, 2024

xinrong-meng added 3 commits March 5, 2024 13:42

autoclass on Profile

76e7387

rmv Runtime Profiling for now

3c9bc19

add "clear"

8fefc3c

xinrong-meng changed the title ~~[WIP][SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers~~ [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers Mar 7, 2024

restruct

4edd241

xinrong-meng marked this pull request as ready for review March 7, 2024 21:40

HyukjinKwon approved these changes Mar 8, 2024

ueshin approved these changes Mar 8, 2024

HyukjinKwon closed this in 7a5bb5d Mar 8, 2024
