-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34806][SQL] Add Observation helper for Dataset.observe #33422
Conversation
ok to test |
} | ||
|
||
/** | ||
* (Scala-specific) Create a named or anonymous instance of Observation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add since
* @since 3.3.0
cc @hvanhovell too |
Kubernetes integration test starting |
Kubernetes integration test status success |
/** | ||
* Observation constructor for creating an anonymous observation. | ||
*/ | ||
def apply(): Observation = new Observation(UUID.randomUUID().toString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we also add a default parameter in the constructor? then java users can also get benefits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, Observation
companion won't work anyway, and the default parameter doesn't work in Java side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, we can define another constructor at the class though for Java side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I have added the unnamed constructor to the Observation class.
I have added a test to JavaDataFrameSuite
to test the interaction from Java with observations. It shows that the Dataset.observe
API is not really Java-friendly (see JavaDataFrameSuite.testObservation), but this shouldn't prevent us from making Observation
Java-friendly.
The Dataset.observe
methods could be made Java-friendly (by adding @varargs
) in a separate PR. @cloud-fan @HyukjinKwon @hvanhovell What is you opinion on that effort?
Test build #141285 has finished for PR 33422 at commit
|
I have no more comments on that otherwise. |
Kubernetes integration test unable to build dist. exiting with code: 1 |
Test build #141322 has finished for PR 33422 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
* @group typedrel | ||
* @since 3.3.0 | ||
*/ | ||
def observe(observation: Observation, expr: Column, exprs: Column*): Dataset[T] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's simply adding an annotation @varargs
, shall we just do it in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about observe(String, Column, Column*)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine. Let's don't add it for now
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #141330 has finished for PR 33422 at commit
|
Test build #141389 has finished for PR 33422 at commit
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
4d78f0c
to
3b2d9ae
Compare
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #141399 has finished for PR 33422 at commit
|
I hope this now really means we are good to go. |
Yeah, looks like Javadoc build passed too at https://github.com/G-Research/spark/runs/3122095226?check_suite_focus=true |
thanks, merging to master! |
Test build #141478 has finished for PR 33422 at commit
|
@cloud-fan @HyukjinKwon thanks for your time and valuable input! |
What changes were proposed in this pull request?
This pull request introduces a helper class that simplifies usage of
Dataset.observe()
for batch datasets:Why are the changes needed?
Currently, users are required to implement the
QueryExecutionListener
interface to retrieve the metrics, as well as apply some knowledge on threading and locking to pull the metrics over to the main thread. With the helper class, metrics can be retrieved from batch dataset processing with three lines of code (the action on the observed dataset does not count as a line of code here).Does this PR introduce any user-facing change?
Yes, one new class and one `Dataset`` method.
How was this patch tested?
Adds a unit test to
DataFrameSuite
, similar to"get observable metrics by callback"
inDataFrameCallbackSuite
.