
Add TaskFailureListener at JVM side #41

Merged: 14 commits merged into NVIDIA:branch-22.06 on Jul 25, 2022

Conversation

@wjxiz1992 (Collaborator) commented Jun 30, 2022

Signed-off-by: Allen Xu allxu@nvidia.com

Closes #39, #33, #37

To enable the listener, append this config to the submit command:

--conf spark.extraListeners=com.nvidia.spark.rapids.listener.TaskFailureListener

Don't forget to append NDSBenchmarkListener-1.0-SNAPSHOT.jar to the --jars config.
When this config is not set, the listener will not be registered with the SparkContext.
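Putting the two options together, a full submit command might look like the following sketch (the jar path and the `nds_power.py` script name are illustrative placeholders, not taken from this PR):

```shell
# Register the JVM-side TaskFailureListener and ship the jar that contains it.
spark-submit \
  --jars /path/to/NDSBenchmarkListener-1.0-SNAPSHOT.jar \
  --conf spark.extraListeners=com.nvidia.spark.rapids.listener.TaskFailureListener \
  nds_power.py
```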

Update: performance tests with and without the new listener, using the same Spark submit configs on Dataproc:

| index | power test time with listener (ms) | power test time without listener (ms) |
|-------|-----------------------------------:|--------------------------------------:|
| 1     | 2734682                            | 2640671                               |
| 2     | 2617317                            | 2757564                               |
| 3     | 2715048                            | 2779721                               |
| 4     | 2772292                            | 2664533                               |
| avg   | 2709834.75                         | 2710622.25                            |

@wjxiz1992 wjxiz1992 self-assigned this Jun 30, 2022
@wjxiz1992 (Collaborator, Author) commented:
Currently tested on a Dataproc environment with a 3 TB data run; no RPC errors when shutting down the SparkContext.

22/06/30 10:13:12 INFO org.sparkproject.jetty.server.AbstractConnector: Stopped Spark@6d7430e3{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
====== Power Test Time: 2859843 milliseconds ======
====== Total Time: 2929142 milliseconds ======

@wjxiz1992 wjxiz1992 marked this pull request as ready for review July 4, 2022 10:04
@wjxiz1992 (Collaborator, Author) commented:
@GaryShen2008 After discussing with @pxLi, I changed the version from 22.06-SNAPSHOT to 1.0-SNAPSHOT so that we can build it manually and deploy it to URM, since this Scala code is not expected to change much in the future. What do you think?

@wjxiz1992 wjxiz1992 changed the title Add PythonTaskFailureListener at JVM side Add TaskFailureListener at JVM side Jul 5, 2022
    spark_env = dict(self.spark_session.sparkContext.getConf().getAll())
    if 'spark.extraListeners' in spark_env and \
            'com.nvidia.spark.rapids.listener.TaskFailureListener' in spark_env['spark.extraListeners']:
        listener = python_listener.PythonListener()
        listener.register()
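To make the registration guard above runnable in isolation, here is a minimal sketch that uses a plain dict in place of the Spark conf and a stub class standing in for `python_listener.PythonListener` (the stub and the `maybe_register_listener` helper name are illustrative, not from the PR):

```python
LISTENER_CLASS = 'com.nvidia.spark.rapids.listener.TaskFailureListener'


class PythonListener:
    """Stub standing in for python_listener.PythonListener."""
    def __init__(self):
        self.failures = []

    def register(self):
        return self


def maybe_register_listener(spark_env):
    """Register the failure listener only when it was passed via
    spark.extraListeners; otherwise warn and return None."""
    if LISTENER_CLASS in spark_env.get('spark.extraListeners', ''):
        listener = PythonListener()
        listener.register()
        return listener
    print('TaskFailureListener not found in spark.extraListeners; '
          'task failures will not be tracked.')
    return None
```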
Collaborator:
Nit: can we print something to tell the user that the listener is not enabled when there's no spark.extraListeners?

Collaborator (Author):
Added

    start_time = int(time.time() * 1000)
    fn(*args)
    end_time = int(time.time() * 1000)
    if listener and len(listener.failures) != 0:
        self.summary['queryStatus'].append("CompletedWithTaskFailures")
    else:
        # NOTE: when the listener is not used, the queryStatus field will
        # always be "Completed" in the JSON summary
        self.summary['queryStatus'].append("Completed")
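To make the reviewers' concern concrete, the status logic above can be sketched as a self-contained function: when `listener` is None (i.e. spark.extraListeners was not set), task failures are invisible and "Completed" is reported regardless. The `run_and_record` name and the `FakeListener` stand-in are illustrative, not from the PR.

```python
import time


def run_and_record(fn, args, listener, summary):
    """Run a query function, record its wall time, and append a status.

    listener may be None when the TaskFailureListener was not registered;
    in that case "Completed" does not guarantee there were no failed tasks.
    """
    start_time = int(time.time() * 1000)
    fn(*args)
    end_time = int(time.time() * 1000)
    if listener and len(listener.failures) != 0:
        summary['queryStatus'].append("CompletedWithTaskFailures")
    else:
        summary['queryStatus'].append("Completed")
    return end_time - start_time
```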
Collaborator:
If there's no listener, I'm not sure we can say it completed; the logic has changed here.
Previously, queryStatus was always reported with the listener active, so the "Completed" status always meant the query succeeded without any failed task.
So, at the least, print a message telling the user that "Completed" doesn't mean there were no failed tasks when the listener is not registered.

@wjxiz1992 (Collaborator, Author) commented Jul 14, 2022:
I could even add a new unknown value for queryStatus when the listener is not in use. Is it worth adding, or should we just leave a message saying this field may not be accurate?

Collaborator:
I think a message should be enough; no need for a new status.

Collaborator (Author):
Added.

@wjxiz1992 wjxiz1992 merged commit 29c4627 into NVIDIA:branch-22.06 Jul 25, 2022
Successfully merging this pull request may close these issues.

Implement TaskFailure listener at JVM side