New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-41717][CONNECT] Deduplicate print and _repr_html_ at LogicalPlan #39223
Conversation
195b718
to
5cf8b0c
Compare
5cf8b0c
to
ae2e544
Compare
cc @amaliujia @zhengruifeng @grundprinzip FYI |
ae2e544
to
bc75693
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for doing this!
Related tests passed. Merged to master. |
@@ -95,14 +96,68 @@ def to_proto(self, session: "SparkConnectClient", debug: bool = False) -> proto. | |||
|
|||
return plan | |||
|
|||
# TODO(SPARK-41717): Implement the command logic for print and _repr_html_ | |||
def _parameters_to_print(self, parameters: Mapping[str, Any]) -> Mapping[str, Any]: | |||
params = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic of this method is complicated enough to deserve a docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me add some docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here #39237
@@ -207,7 +240,7 @@ def print(self, indent: int = 0) -> str: | |||
def _repr_html_(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this method not removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't removed because LocalRelation
contains data
that's PyArrow table, and this isn't printed out pretty together with our format out of the box. So it was manually overridden here.
@@ -265,13 +277,13 @@ class Project(LogicalPlan): | |||
|
|||
def __init__(self, child: Optional["LogicalPlan"], *columns: "ColumnOrName") -> None: | |||
super().__init__(child) | |||
self._raw_columns = list(columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this one was only the exception that does not follow the variable naming style in this file. All are either name
or _name
, and the automatic printing logic relies on this naming.
Thanks for doing it. Please prepare a follow-up PR that adds documentation to the parameter generating function because that is not "trivial" and thus deserves more explanation. |
…s_to_print, print and _repr_html_ at LogicalPlan ### What changes were proposed in this pull request? This PR is a followup of #39223 that addresses #39223 (comment), that adds some docstring for the fixed methods. ### Why are the changes needed? For better readability. ### Does this PR introduce _any_ user-facing change? No, these aren't API. It adds some doc for developers. ### How was this patch tested? Linter should verify them. Closes #39237 from HyukjinKwon/SPARK-41717-followup. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
… Pandas Functions API with Spark Connect ### What changes were proposed in this pull request? This PR proposes to rename `_func` to `_functions` in the protobuf instances for Pandas Functions API with Spark Connect so the string presentation includes them (see also #39223). ### Why are the changes needed? In order to have the pretty string format for protobuf messages in Python side. ### Does this PR introduce _any_ user-facing change? Yes, ```bash ./bin/pyspark --remote local ``` ```python df = spark.range(1) print(df.mapInPandas(lambda x: x, df.schema)._plan.print()) ``` **Before:** ``` <MapPartitions is_barrier='False'> <Range start='0', end='1', step='1', num_partitions='None'> ``` **After:** ``` <MapPartitions function='<lambda>(id)', is_barrier='False'> <Range start='0', end='1', step='1', num_partitions='None'> ``` ### How was this patch tested? Manually tested as above. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43991 from HyukjinKwon/fix-print-proto. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
This PR deduplicates
print
and_repr_html_
logic by having a comment logic to handle them by looking up the signature and type hints (similar with what we're doing at Spark SQL's query string, e.g.,TreeNode.stringArgs
).Why are the changes needed?
To make it easier to maintain.
Does this PR introduce any user-facing change?
Virtually no.
How was this patch tested?
I tested almost all cases, and complex cases too manually. In addition, I added a unittest too.