[SPARK-42721][CONNECT] RPC logging interceptor #40342
Need run |
@LuciferYang thanks. Fixed.
@@ -155,6 +155,12 @@
      <version>${protobuf.version}</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>com.google.protobuf</groupId>
      <artifactId>protobuf-java-util</artifactId>
Without this PR, the server module depended on protobuf-java-util:3.19.2. Should protobuf-java-util and protobuf-java always use the same version?
Oh, that should be fine. I didn't realize it already included protobuf-java-util. Removed this change.
Could you point me to where this dependency comes from?
You can check it with:
build/mvn dependency:tree -pl connector/connect/server
protobuf-java-util is a transitive dependency of grpc-services, but it has runtime scope:
[INFO] +- io.grpc:grpc-services:jar:1.47.0:compile
[INFO] |  \- com.google.protobuf:protobuf-java-util:jar:3.19.2:runtime
So I think we should explicitly add this dependency for compilation.
Thanks, done. Updated the PR.
      headers: Metadata,
      next: ServerCallHandler[ReqT, RespT]): ServerCall.Listener[ReqT] = {

    val id = Random.nextInt(Int.MaxValue) // Assign a random id for this RPC.
Should we switch to UUID? I think it has a lower collision probability than Random.nextInt.
I don't think so. This is just for debugging. UUID looks very long and makes the logs harder to read. I intentionally even avoided negative numbers :).
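To put rough numbers on this trade-off, here is a minimal sketch rather than code from the PR. The object `RpcIdSketch` and its helpers are hypothetical names, and the collision estimate uses the standard birthday-bound approximation, which assumes ids are drawn uniformly and independently.

```scala
import java.util.UUID
import scala.util.Random

// Hypothetical helper object for illustration only; not part of the PR.
object RpcIdSketch {
  // Non-negative id drawn from [0, Int.MaxValue), as in the line under review.
  def shortId(): Int = Random.nextInt(Int.MaxValue)

  // The alternative raised in review: a 36-character random UUID string.
  def uuidId(): String = UUID.randomUUID().toString

  // Birthday-bound approximation of the chance that at least two of `n`
  // uniformly drawn ids collide in a space of `space` possible values:
  //   P ~ 1 - exp(-n(n-1) / (2 * space))
  def collisionProbability(n: Long, space: Double): Double =
    1.0 - math.exp(-(n.toDouble * (n - 1)) / (2.0 * space))
}
```

Under these assumptions, `RpcIdSketch.collisionProbability(1000L, Int.MaxValue.toDouble)` comes out at roughly 0.02%, so for 1000 RPCs sharing a log window a collision among the short ids is unlikely, while each id stays at most 10 digits instead of a UUID's 36 characters.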
This reverts commit 698741f.
Fine to me. cc @zhenlineo @amaliujia @hvanhovell FYI
LGTM.
LGTM
### What changes were proposed in this pull request?

This adds a gRPC interceptor to the spark-connect server. It logs all incoming RPC requests and responses.

- How to enable: Set the interceptor config, e.g.

  ./sbin/start-connect-server.sh --conf spark.connect.grpc.interceptor.classes=org.apache.spark.sql.connect.service.LoggingInterceptor --jars connector/connect/server/target/spark-connect_*-SNAPSHOT.jar

- Sample output:

  23/03/08 10:54:37 INFO LoggingInterceptor: Received RPC Request spark.connect.SparkConnectService/ExecutePlan (id 1868663481): { "client_id": "6844bc44-4411-4481-8109-a10e3a836f97", "user_context": { "user_id": "raghu" }, "plan": { "root": { "common": { "plan_id": "37" }, "show_string": { "input": { "common": { "plan_id": "36" }, "read": { "data_source": { "format": "csv", "schema": "", "paths": ["file:///tmp/x-in"] } } }, "num_rows": 20, "truncate": 20 } } }, "client_type": "_SPARK_CONNECT_PYTHON" }

### Why are the changes needed?

This is useful in development. It might also be useful for debugging some problems in production.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

- Manually in development
- Unit test

Closes #40342 from rangadi/logging-interceptor.

Authored-by: Raghu Angadi <raghu.angadi@databricks.com>
Signed-off-by: Herman van Hovell <herman@databricks.com>
(cherry picked from commit 19cb8d7)
Signed-off-by: Herman van Hovell <herman@databricks.com>
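For readers unfamiliar with gRPC server interceptors, the following is a minimal sketch of how an interceptor of this kind could be structured. It is not the merged implementation. The class name `SketchLoggingInterceptor`, the `describe` helper, the use of `println` instead of Spark's logging, and the wording of the response log line are all assumptions made for illustration. It needs grpc-api, protobuf-java and protobuf-java-util on the compile classpath.

```scala
import com.google.protobuf.Message
import com.google.protobuf.util.JsonFormat
import io.grpc.{ForwardingServerCall, ForwardingServerCallListener, Metadata}
import io.grpc.{ServerCall, ServerCallHandler, ServerInterceptor}
import scala.util.Random

// Hypothetical sketch; the interceptor in the PR may differ in every detail.
class SketchLoggingInterceptor extends ServerInterceptor {
  private val printer = JsonFormat.printer()

  // Render protobuf messages as JSON; fall back to toString for anything else.
  private def describe(msg: Any): String = msg match {
    case m: Message => printer.print(m)
    case other => String.valueOf(other)
  }

  override def interceptCall[ReqT, RespT](
      call: ServerCall[ReqT, RespT],
      headers: Metadata,
      next: ServerCallHandler[ReqT, RespT]): ServerCall.Listener[ReqT] = {
    val id = Random.nextInt(Int.MaxValue) // Short random id to correlate log lines.
    val method = call.getMethodDescriptor.getFullMethodName

    // Wrap the call so that every outgoing response message is logged.
    val loggingCall =
      new ForwardingServerCall.SimpleForwardingServerCall[ReqT, RespT](call) {
        override def sendMessage(message: RespT): Unit = {
          println(s"Responding to RPC $method (id $id): ${describe(message)}")
          super.sendMessage(message)
        }
      }

    // Wrap the listener so that every incoming request message is logged.
    val listener = next.startCall(loggingCall, headers)
    new ForwardingServerCallListener.SimpleForwardingServerCallListener[ReqT](listener) {
      override def onMessage(message: ReqT): Unit = {
        println(s"Received RPC Request $method (id $id): ${describe(message)}")
        super.onMessage(message)
      }
    }
  }
}
```

Wrapping both the `ServerCall` and its listener lets one id tie a request to its responses. The `JsonFormat` call is also why protobuf-java-util has to be available at compile scope rather than only at runtime.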
Hi, @rangadi and @hvanhovell . There is a |
@dongjoon-hyun thank you! I should have checked. Missed it.