Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41101][PYTHON][PROTOBUF] Message classname support for PYSPARK-PROTOBUF #38603

Conversation

SandishKumarHN
Copy link
Contributor

@SandishKumarHN SandishKumarHN commented Nov 10, 2022

What changes were proposed in this pull request?

  • Adding message classname support for pyspark-protobuf
  • functions from_protobuf and to_protobuf

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

  • Python Doctests.
  • To avoid adding an extra jar, I used the existing spark-protobuf jar because it has a shadded google-protobuf and used a Timestamp message as an example protobuf messageClassName in Doctests

@rangadi @HyukjinKwon

@SandishKumarHN SandishKumarHN force-pushed the classname-support-pyspark-protobuf branch 4 times, most recently from 30a3a66 to 640fb5b Compare November 11, 2022 00:40
@SandishKumarHN SandishKumarHN force-pushed the classname-support-pyspark-protobuf branch from 640fb5b to dc5f7b0 Compare November 11, 2022 02:04
@SandishKumarHN SandishKumarHN force-pushed the classname-support-pyspark-protobuf branch from ec8a9ca to e886ada Compare November 11, 2022 05:24
the protobuf message name to look for in descriptor file.
messageName: str, optional
the protobuf message name to look for in descriptor file. Or
The Protobuf class name. E.g. ``org.spark.examples.protobuf.ExampleEvent``,
Copy link
Contributor

@rangadi rangadi Nov 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add a comment about shading requirement. In scala, I am changing this name to shadedMessageClassName. See scaladoc update in open PR here: https://github.com/apache/spark/pull/38384/files#diff-93adb6d1143432aff776baed60d056133658635a2d8f9d29d925018fada64ff7R76-R83

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -32,7 +32,7 @@
def from_protobuf(
data: "ColumnOrName",
messageName: str,
descFilePath: str,
descFilePath: Optional[str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update documentation to reflect change in contract.
@HyukjinKwon is it common to use Optional[] for optional arguments in PySpark?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be fine.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@HyukjinKwon
Copy link
Member

Merged to master.

SandishKumarHN added a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…-PROTOBUF

### What changes were proposed in this pull request?

- Adding message classname support for pyspark-protobuf
- functions from_protobuf and to_protobuf

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- Python Doctests.
- To avoid adding an extra jar, I used the existing spark-protobuf jar because it has a shadded google-protobuf and used a Timestamp message as an example protobuf messageClassName in Doctests

 rangadi HyukjinKwon

Closes apache#38603 from SandishKumarHN/classname-support-pyspark-protobuf.

Authored-by: SandishKumarHN <sanysandish@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants