
Update spark job name to reflect spark application name and execution node #1191

Merged 2 commits from spark_job_name into main on Apr 6, 2021

Conversation

collado-mike (Collaborator)

This change updates the job naming behavior, creating a new job name for each query execution in the Spark application. For each query execution, a new job is created with a unique runId and a parent facet pointing to the run identified by the parameters passed into the agent.

As an example, one job name I generated in testing was orders_dump_to_gcs.execute_insert_into_hadoop_fs_relation_command, where orders_dump_to_gcs is the Spark application name and execute_insert_into_hadoop_fs_relation_command is the node name returned by the DataWritingCommandExec physical plan node.
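
For illustration, here is a minimal sketch of that naming scheme, not the agent's actual code: the class and helper names are hypothetical, and the snake-casing rule is inferred from the example above.

```java
import java.util.UUID;

// Hypothetical sketch of the naming convention described in this PR.
// Assumes the node name comes from the physical plan node (e.g.
// "Execute InsertIntoHadoopFsRelationCommand") and is lower-snake-cased
// before being appended to the Spark application name.
public class JobNameSketch {

  // "Execute InsertIntoHadoopFsRelationCommand"
  //   -> "execute_insert_into_hadoop_fs_relation_command"
  static String toSnakeCase(String nodeName) {
    return nodeName
        .replaceAll("([a-z0-9])([A-Z])", "$1_$2")
        .replaceAll("\\s+", "_")
        .toLowerCase();
  }

  static String jobName(String appName, String nodeName) {
    return appName + "." + toSnakeCase(nodeName);
  }

  public static void main(String[] args) {
    // Prints "orders_dump_to_gcs.execute_insert_into_hadoop_fs_relation_command"
    System.out.println(
        jobName("orders_dump_to_gcs", "Execute InsertIntoHadoopFsRelationCommand"));

    // Each query execution would also get a fresh, unique runId:
    System.out.println(UUID.randomUUID());
  }
}
```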

codecov bot commented Apr 6, 2021

Codecov Report

Merging #1191 (807212c) into main (4438c5d) will not change coverage.
The diff coverage is n/a.


@@            Coverage Diff            @@
##               main    #1191   +/-   ##
=========================================
  Coverage     74.44%   74.44%           
  Complexity      803      803           
=========================================
  Files           180      180           
  Lines          4790     4790           
  Branches        368      368           
=========================================
  Hits           3566     3566           
  Misses          852      852           
  Partials        372      372           


Signed-off-by: Michael Collado <mike@datakin.com>
@@ -2,7 +2,7 @@
   "eventType": "COMPLETE",
   "eventTime": "2021-01-01T00:00:00Z",
   "run": {
-    "runId": "ea445b5c-22eb-457a-8007-01c7c52b6e54",
+    "runId": "fake_run_id",
     "facets": {
       "parent": {
A reviewer (Member) commented on this diff:

Minor: Did you also want to update the parent runID used in the test?

@wslulciuc (Member) left a comment:

This looks great, @collado-mike! The naming convention now more closely aligns with our Airflow integration. Excited to see how a Spark job launched via Airflow can be linked to its parent runID (= the operator that submitted the Spark job) and displayed in our lineage graph.

@wslulciuc merged commit 9e167e5 into main on Apr 6, 2021
@wslulciuc deleted the spark_job_name branch on Apr 6, 2021 23:12
@wslulciuc added this to Review in Marquez 0.14.0 via automation on Apr 8, 2021
@wslulciuc moved this from Review to Done in Marquez 0.14.0 on Apr 8, 2021