Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41717][CONNECT] Deduplicate print and _repr_html_ at LogicalPlan #39223

Closed
wants to merge 2 commits into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Dec 26, 2022

What changes were proposed in this pull request?

This PR deduplicates print and _repr_html_ logic by having a comment logic to handle them by looking up the signature and type hints (similar with what we're doing at Spark SQL's query string, e.g., TreeNode.stringArgs).

Why are the changes needed?

To make it easier to maintain.

Does this PR introduce any user-facing change?

Virtually no.

How was this patch tested?

I tested almost all cases, and complex cases too manually. In addition, I added a unittest too.

@HyukjinKwon HyukjinKwon changed the title [WIP][SPARK-41717][CONNECT] Implement the command logic for print and _repr_html_ [SPARK-41717][CONNECT] Implement the command logic for print and _repr_html_ Dec 27, 2022
@HyukjinKwon HyukjinKwon marked this pull request as ready for review December 27, 2022 00:32
@HyukjinKwon
Copy link
Member Author

cc @amaliujia @zhengruifeng @grundprinzip FYI

@HyukjinKwon HyukjinKwon changed the title [SPARK-41717][CONNECT] Implement the command logic for print and _repr_html_ [SPARK-41717][CONNECT] Deduplicate print and _repr_html_ at LogicalPlan Dec 27, 2022
Copy link
Contributor

@zhengruifeng zhengruifeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for doing this!

@HyukjinKwon
Copy link
Member Author

Related tests passed.

Merged to master.

@@ -95,14 +96,68 @@ def to_proto(self, session: "SparkConnectClient", debug: bool = False) -> proto.

return plan

# TODO(SPARK-41717): Implement the command logic for print and _repr_html_
def _parameters_to_print(self, parameters: Mapping[str, Any]) -> Mapping[str, Any]:
params = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic of this method is complicated enough to deserve a docstring.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me add some docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here #39237

@@ -207,7 +240,7 @@ def print(self, indent: int = 0) -> str:
def _repr_html_(self) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this method not removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't removed because LocalRelation contains data that's PyArrow table, and this isn't printed out pretty together with our format out of the box. So it was manually overridden here.

@@ -265,13 +277,13 @@ class Project(LogicalPlan):

def __init__(self, child: Optional["LogicalPlan"], *columns: "ColumnOrName") -> None:
super().__init__(child)
self._raw_columns = list(columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this one was only the exception that does not follow the variable naming style in this file. All are either name or _name, and the automatic printing logic relies on this naming.

@grundprinzip
Copy link
Contributor

Thanks for doing it. Please prepare a follow-up PR that adds documentation to the parameter generating function because that is not "trivial" and thus deserves more explanation.

HyukjinKwon added a commit that referenced this pull request Dec 27, 2022
…s_to_print, print and _repr_html_ at LogicalPlan

### What changes were proposed in this pull request?
This PR is a followup of #39223 that addresses #39223 (comment), that adds some docstring for the fixed methods.

### Why are the changes needed?
For better readability.

### Does this PR introduce _any_ user-facing change?
No, these aren't API. It adds some doc for developers.

### How was this patch tested?
Linter should verify them.

Closes #39237 from HyukjinKwon/SPARK-41717-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon added a commit that referenced this pull request Nov 24, 2023
… Pandas Functions API with Spark Connect

### What changes were proposed in this pull request?

This PR proposes to rename `_func` to `_functions` in the protobuf instances for Pandas Functions API with Spark Connect so the string presentation includes them (see also #39223).

### Why are the changes needed?

In order to have the pretty string format for protobuf messages in Python side.

### Does this PR introduce _any_ user-facing change?

Yes,

```bash
./bin/pyspark --remote local
```

```python
df = spark.range(1)
print(df.mapInPandas(lambda x: x, df.schema)._plan.print())
```

**Before:**
```
<MapPartitions is_barrier='False'>
  <Range start='0', end='1', step='1', num_partitions='None'>
```

**After:**

```
<MapPartitions function='<lambda>(id)', is_barrier='False'>
  <Range start='0', end='1', step='1', num_partitions='None'>
```

### How was this patch tested?

Manually tested as above.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43991 from HyukjinKwon/fix-print-proto.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants