Add GTest Event Listener with CUDA validation after TEST #2516

klecki · 2020-12-01T20:06:16Z

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

Why we need this PR?

It fixes a bug/It adds new feature: Automatic error checking for CUDA after each Test Case.

What happened in this PR?

What solution was applied:
Used a GTest event listener to allow per-test-case error checking without the need to modify the test code.
Affected modules and functionalities:
GTest
Key points relevant for the review:
N/A
Validation and testing:
CI, adjustment in the testing framework
Documentation (including examples):
N/A

JIRA TASK: [Use DALI-1724 or NA]

klecki · 2020-12-01T20:08:12Z

!build

dali-automaton · 2020-12-01T20:10:33Z

CI MESSAGE: [1850302]: BUILD STARTED

szalpal · 2020-12-01T20:12:35Z

dali/test/dali_cuda_finalize_test.h

+ */
+class CudaFinalizeEventListener : public ::testing::EmptyTestEventListener {
+  void OnTestEnd(const ::testing::TestInfo& test_info) override {
+    auto sync_result = cudaDeviceSynchronize();


sync_result is always cudaSuccess IIRC

Also, I'm wondering: could this actually catch an error from another thread, or does this synchronization guard against that scenario?

The docs are pretty clear:

Blocks until the device has completed all preceding requested tasks. cudaDeviceSynchronize() returns an error if one of the preceding tasks has failed.

It can theoretically catch an error from another thread (apparently threading is on by default), but this sync should limit how far an error can leak, to something close by. I also observed one scenario where cudaDeviceSynchronize returned success even though there was a launch error, and another where both calls failed.

I think the error is reported only once. If the launch error has already surfaced, it won't be reported again. @mzient

I forced a kernel launch failure: cudaDeviceSynchronize didn't report it, but cudaGetLastError did (and that call clears the error for further calls).
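The observation above suggests querying both calls after each test rather than relying on the synchronization result alone. A minimal sketch of that idea follows; `FirstCudaFailure` is a hypothetical helper name that does not come from the PR, and the two callables are stand-ins for `cudaDeviceSynchronize` and `cudaGetLastError` so the logic can be read (and exercised) without a GPU.

```cpp
// Hypothetical helper, not taken from the PR: runs both checks and returns the
// first failing status, or `success` if both passed.
// `sync` stands in for cudaDeviceSynchronize, `last_error` for cudaGetLastError.
template <typename Status, typename SyncFn, typename LastErrorFn>
Status FirstCudaFailure(Status success, SyncFn sync, LastErrorFn last_error) {
  Status sync_result = sync();
  // Always query the last error as well: per the observation in this thread,
  // a failed kernel launch may show up only here, and reading it clears it so
  // it does not leak into the next test.
  Status launch_result = last_error();
  return sync_result != success ? sync_result : launch_result;
}
```

Inside a listener this could be invoked as `FirstCudaFailure(cudaSuccess, [] { return cudaDeviceSynchronize(); }, [] { return cudaGetLastError(); })`, with the result compared against `cudaSuccess` via `EXPECT_EQ`.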

szalpal · 2020-12-01T20:12:41Z

dali/test/dali_cuda_finalize_test.h

+class CudaFinalizeEventListener : public ::testing::EmptyTestEventListener {
+  void OnTestEnd(const ::testing::TestInfo& test_info) override {
+    auto sync_result = cudaDeviceSynchronize();
+    EXPECT_EQ(sync_result, cudaSuccess) << "CUDA error: \"" << cudaGetErrorName(sync_result)


I'd also add OnTestStart and check cudaGetLastError there, in case it is not cudaSuccess for some reason.

That might yield an early exit in some cases, but the error will still be propagated to the point where it is checked: either the test checks it explicitly (some do), or it is checked afterwards.
I will leave the error checking in one place only.

What I had in mind is that this could make some issues easier to debug, such as race conditions, where it is hard to isolate a single unit test that breaks. I'm fine either way.

dali-automaton · 2020-12-01T21:33:37Z

CI MESSAGE: [1850302]: BUILD FAILED

klecki · 2020-12-02T11:13:53Z

!build

dali-automaton · 2020-12-02T11:15:36Z

CI MESSAGE: [1852620]: BUILD STARTED

JanuszL · 2020-12-02T11:36:40Z

dali/test/dali_cuda_finalize_test.h

+ */
+class CudaFinalizeEventListener : public ::testing::EmptyTestEventListener {
+  void OnTestEnd(const ::testing::TestInfo& test_info) override {
+    if (std::strstr(test_info.test_suite_name(), "CpuOnlyTest") == nullptr) {


Maybe we can do something like:

(cuDeviceGet(&dummy, 0) != CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND)

to make sure that the driver is accessible. I'm just afraid that asking every test that runs without a GPU to stick to the CpuOnlyTest name pattern may be too much.

I followed the pattern from our tests that assumes that substring.
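For reference, the name-based filter under discussion boils down to a substring test on the suite name. The sketch below restates the condition from the diff as a standalone function; `IsCpuOnlySuite` is a hypothetical name used only for illustration.

```cpp
#include <cstring>

// Mirrors the condition in the diff above: a suite counts as CPU-only when its
// name contains the substring "CpuOnlyTest" (case-sensitive).
// `IsCpuOnlySuite` is a hypothetical helper name, not taken from the PR.
inline bool IsCpuOnlySuite(const char* suite_name) {
  return std::strstr(suite_name, "CpuOnlyTest") != nullptr;
}
```

Because the match is a plain substring search, any suite that omits the marker is treated as a GPU test, which is the naming burden raised in the comment above.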

dali-automaton · 2020-12-02T12:39:17Z

CI MESSAGE: [1852620]: BUILD FAILED

klecki · 2020-12-02T15:13:12Z

!build

dali-automaton · 2020-12-02T15:16:08Z

CI MESSAGE: [1854128]: BUILD STARTED

dali-automaton · 2020-12-02T16:40:08Z

CI MESSAGE: [1854128]: BUILD FAILED

dali-automaton · 2020-12-02T17:15:04Z

CI MESSAGE: [1854128]: BUILD PASSED

JanuszL · 2020-12-03T10:43:44Z

dali/test/dali_cuda_finalize_test.h

+ */
+class CudaFinalizeEventListener : public ::testing::EmptyTestEventListener {
+  void OnTestEnd(const ::testing::TestInfo& test_info) override {
+    if (cuInitChecked()) {


You could add a comment explaining why this is checked here.

klecki · 2020-12-03T10:47:54Z

!build

dali-automaton · 2020-12-03T10:50:44Z

CI MESSAGE: [1857925]: BUILD STARTED

dali-automaton · 2020-12-03T12:22:18Z

CI MESSAGE: [1857925]: BUILD PASSED

klecki force-pushed the coverity-medium branch from 672d4a5 to ac77b50 Compare December 1, 2020 20:07

Add GTest Event Listener with CUDA validation after TEST

7e8ce3d

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

klecki force-pushed the coverity-medium branch from ac77b50 to 7e8ce3d Compare December 1, 2020 20:08

szalpal reviewed Dec 1, 2020

szalpal self-assigned this Dec 1, 2020

szalpal approved these changes Dec 2, 2020

Ignore CPU only tests

aed6c80

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

JanuszL reviewed Dec 2, 2020

Check if cuInit was sucessful

61dd648

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

klecki force-pushed the coverity-medium branch from 3846b12 to 61dd648 Compare December 2, 2020 15:13

awolant assigned JanuszL Dec 3, 2020

JanuszL reviewed Dec 3, 2020

JanuszL approved these changes Dec 3, 2020

Add comment

dccbb9f

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>

klecki merged commit 1f9329a into NVIDIA:master Dec 3, 2020
