[SPARK-48798][PYTHON] Introduce `spark.profile.render` for SparkSession-based profiling #47202

ueshin · 2024-07-03T21:24:00Z

What changes were proposed in this pull request?

Introduces spark.profile.render for SparkSession-based profiling.

It uses flameprof (https://github.com/baverman/flameprof/) as the default renderer, so install it first:

$ pip install flameprof

Then run PySpark in a Jupyter notebook:

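from pyspark.sql.functions import pandas_udf

# `spark` is the notebook's active SparkSession.
# Enable the "perf" profiler for Python UDFs.
spark.conf.set("spark.sql.pyspark.udf.profiler", "perf")

df = spark.range(10)

@pandas_udf("long")
def add1(x):
    return x + 1

added = df.select(add1("id"))
added.show()

# Render the collected profile of the UDF with ID 2.
spark.profile.render(id=2)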

On the CLI, it returns the SVG source as a string:

'<?xml version="1.0" standalone="no"?>\n<!DOCTYPE svg  ...

Currently, the only built-in renderer is renderer="flameprof", and it supports only type="perf".

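You can also pass an arbitrary callable as the renderer. It is called with the profile result of the requested type (stats for "perf", codemap for "memory"):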

def render_perf(stats):
    ...
spark.profile.render(id=2, type="perf", renderer=render_perf)
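As a sketch of a custom "perf" renderer, the function below returns the most expensive entries as plain text instead of a flame graph. It assumes the `stats` argument is a standard-library `pstats.Stats` object, which the parameter name suggests but which you should verify; `render_perf_text` and its `limit` parameter are hypothetical names.

```python
import io
import pstats


def render_perf_text(stats: pstats.Stats, limit: int = 10) -> str:
    """Hypothetical renderer: the `limit` entries with the highest cumulative time.

    Assumes `stats` is a pstats.Stats object.
    """
    buf = io.StringIO()
    original_stream = stats.stream
    # print_stats() writes to stats.stream, so redirect it to the buffer temporarily.
    stats.stream = buf
    try:
        # Note: sort_stats() reorders the Stats object in place.
        stats.sort_stats("cumulative").print_stats(limit)
    finally:
        stats.stream = original_stream
    return buf.getvalue()
```

It would be passed like any other callable, e.g. `spark.profile.render(id=2, type="perf", renderer=render_perf_text)`, which is handy on a terminal where an SVG string is hard to read.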

def render_memory(codemap):
    ...
spark.profile.render(id=2, type="memory", renderer=render_memory)
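A "memory" renderer can be sketched the same way. The example below assumes that `codemap` maps each file name to a list of (line number, measurement) pairs, where the measurement is either None or an (increment, memory usage, occurrence count) tuple. That layout is an assumption modeled on memory_profiler-style output, not something stated in this PR, so check it against the actual result before relying on it. `render_memory_top` and `CodeMap` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout of the "memory" result:
# filename -> list of (line number, (increment, memory usage, occurrences) or None).
CodeMap = Dict[str, List[Tuple[int, Optional[Tuple[float, float, int]]]]]


def render_memory_top(codemap: CodeMap, top: int = 5) -> str:
    """Hypothetical renderer: the `top` lines with the largest memory increment."""
    rows = []
    for filename, lines in codemap.items():
        for lineno, mem in lines:
            if mem is None:  # no measurement recorded for this line
                continue
            increment, usage, occurrences = mem
            rows.append((increment, usage, occurrences, filename, lineno))
    # Largest increment first.
    rows.sort(key=lambda row: row[0], reverse=True)
    out = [f"{'Increment':>10} {'Usage':>10} {'Hits':>5}  Location"]
    for increment, usage, occurrences, filename, lineno in rows[:top]:
        out.append(
            f"{increment:>10.2f} {usage:>10.2f} {occurrences:>5}  {filename}:{lineno}"
        )
    return "\n".join(out)
```

Returning a plain string keeps the renderer usable both in a notebook and on the CLI.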

Why are the changes needed?

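Better debuggability: rendering the collected profiles, for example as a flame graph, makes it easier to see where Python UDFs spend their time.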

Does this PR introduce any user-facing change?

Yes, spark.profile.render will be available.

How was this patch tested?

Added/updated the related tests, and tested manually.

Was this patch authored or co-authored using generative AI tooling?

No.

python/pyspark/sql/profiler.py

xinrong-meng · 2024-07-03T22:33:50Z

LGTM, thank you for working on that!

ueshin · 2024-07-08T22:46:05Z

The test failures seem unrelated to this PR.

ueshin · 2024-07-08T22:46:21Z

Thanks! merging to master.

Closes apache#47202 from ueshin/issues/SPARK-48798/render.

Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Takuya Ueshin <ueshin@databricks.com>

Introduce spark.profile.render for SparkSession-based profiling

1c88051

github-actions bot added SQL BUILD DOCS CORE PYTHON CONNECT labels Jul 3, 2024

ueshin requested review from HyukjinKwon and xinrong-meng July 3, 2024 21:32

HyukjinKwon approved these changes Jul 3, 2024


xinrong-meng approved these changes Jul 3, 2024


Fix.

5452406

xinrong-meng reviewed Jul 3, 2024


python/pyspark/sql/profiler.py (conversation resolved)

ueshin added 2 commits July 3, 2024 15:34

Fix.

f6d6c92

Merge branch 'master' into issues/SPARK-48798/render

2242342

ueshin closed this in b062d44 Jul 8, 2024
