-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Spark] Set up Python Protobuf codegen for Delta Connect #3125
[Spark] Set up Python Protobuf codegen for Delta Connect #3125
Conversation
Failed tests are in |
connect/common/src/main/buf.gen.yaml
Outdated
@@ -0,0 +1,21 @@ | |||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest the directory name of spark-connect
to capture that connect is only for spark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything under python/ is fine, because everything under python is already only for spark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdas Hmm, from an outside look, even though the content of the python
folder is only Spark, there is nothing in the naming that suggests that it is for Spark no, or do we rule out somewhere that everything under the python
folder is already only for Spark like you said? Shall we also do python/delta/spark-connect
instead of python/delta/connect
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the entire content of python/delta is for spark. I agree that ideally this entire python/delta should be under spark/, but thats an existing problem that we will solve holistically.
its not clear to me yet, why all the python files need to be under python/delta/connect/ and not directly under python/delta/
can you tell me more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont have a strong opinion. for now we can keep it here... and we can move it when we have more clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdas Because all the files under python/delta/connect
are for Delta Connect, perhaps we can change the name to delta-connect
also.
I think naming name1/name2
might be better than name1-name2
, so delta/connect
instead of delta-connect
, spark/connect
instead of spark-connect
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple of comments.
@@ -17,6 +17,6 @@ | |||
[pycodestyle] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this file used for lint or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think so. 71606f4
dev/connect-gen-protos.sh
Outdated
# the one generated by proto. | ||
for f in `find gen/proto/python/delta/connect -name "*.py*"`; do | ||
# First fix the imports. | ||
if [[ $f == *_pb2.py || $f == *_pb2_grpc.py ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only part we modify the generated code. How do we know the replacements are correct? Perhaps add that to "How was this patch tested section"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a Python script to check if the generated codes are out of sync with current proto message.
In the future, when we set up the Delta Connect server and client, we will have tests for that, and if the generated codes are wrong then it will break those Delta Connect tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left one comment.
@@ -41,6 +41,10 @@ jobs: | |||
sudo apt-get update | |||
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python-openssl git | |||
sudo apt install libedit-dev | |||
curl -LO https://github.com/bufbuild/buf/releases/download/v1.28.1/buf-Linux-x86_64.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add this in the requirements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not because the requirements.txt
is only for Python dependencies, used for the Python venv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Which Delta project/connector is this regarding?
Description
Added the very first protobuf messages for
DeltaTable
,Clone
andScan
.This is the first PR for Delta Connect, which adds support for Delta's
DeltaTable
interface to Spark Connect. This is needed to support Delta table operations outside of SQL directly on Spark Connect clients.This PR sets up the Python code generation for the Protobufs of Delta Connect. For this I created a new Buf workspace and I added a few initial Protobuf messages to confirm that everything works. This is the ground work of the project, before we move on to setting up the server and client library.
What we are doing here is similar to the Spark Connect's protobuf development guide.
How was this patch tested?
Added the
check-delta-connect-codegen-python.py
to the automated testing, making sure the Python Protobuf Generated codes stay in sync with the proto messages.Does this PR introduce any user-facing changes?
No.