Update spark job name to reflect spark application name and execution node #1191
Conversation
Codecov Report

@@            Coverage Diff            @@
##               main    #1191   +/-   ##
=========================================
  Coverage     74.44%   74.44%
  Complexity      803      803
=========================================
  Files           180      180
  Lines          4790     4790
  Branches        368      368
=========================================
  Hits           3566     3566
  Misses          852      852
  Partials        372      372

Continue to review full report at Codecov.
… node

Signed-off-by: Michael Collado <mike@datakin.com>

Force-pushed from 602de58 to 807212c
@@ -2,7 +2,7 @@
   "eventType": "COMPLETE",
   "eventTime": "2021-01-01T00:00:00Z",
   "run": {
-    "runId": "ea445b5c-22eb-457a-8007-01c7c52b6e54",
+    "runId": "fake_run_id",
     "facets": {
       "parent": {
Minor: Did you also want to update the parent runID used in the test?
This looks great, @collado-mike! The naming convention now more closely aligns with our Airflow integration. Excited to see how a spark job launched via Airflow can be linked to its parent runID (= the operator that submitted the spark job) and displayed in our lineage graph.
This change updates the job naming behavior, creating a new job name for each query execution in the spark application. For each query execution, a new job is created with a unique runId and a parent facet pointing to the run identified by the parameters passed into the agent. As an example, one job name I generated in testing was orders_dump_to_gcs.execute_insert_into_hadoop_fs_relation_command, where orders_dump_to_gcs is the spark application name and execute_insert_into_hadoop_fs_relation_command is the node name returned by the DataWritingCommandExec physical plan node.
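To make the convention concrete, here is a minimal, hypothetical Java sketch of how such a job name could be assembled from the Spark application name and the physical-plan node name. The class and method names (JobNameExample, buildJobName, toSnakeCase) and the exact normalization rules are assumptions for illustration only, not the agent's actual implementation.

```java
// Hypothetical sketch of the naming convention described above:
// jobName = <spark application name> + "." + <snake_cased physical-plan node name>.
// JobNameExample, buildJobName, and toSnakeCase are illustrative names,
// not the agent's real API.
public final class JobNameExample {

  // e.g. buildJobName("orders_dump_to_gcs", "Execute InsertIntoHadoopFsRelationCommand")
  //   -> "orders_dump_to_gcs.execute_insert_into_hadoop_fs_relation_command"
  static String buildJobName(String sparkAppName, String planNodeName) {
    return sparkAppName + "." + toSnakeCase(planNodeName);
  }

  // Normalize a Spark plan node name (e.g. a DataWritingCommandExec wrapping an
  // InsertIntoHadoopFsRelationCommand typically reports a nodeName like
  // "Execute InsertIntoHadoopFsRelationCommand") into snake_case.
  static String toSnakeCase(String nodeName) {
    return nodeName
        .replaceAll("([a-z0-9])([A-Z])", "$1_$2") // split CamelCase words
        .replaceAll("\\s+", "_")                  // spaces become underscores
        .toLowerCase();
  }

  public static void main(String[] args) {
    System.out.println(
        buildJobName("orders_dump_to_gcs", "Execute InsertIntoHadoopFsRelationCommand"));
    // prints: orders_dump_to_gcs.execute_insert_into_hadoop_fs_relation_command
  }
}
```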