cv::cuda::reduce bug #6148

Aravind-Suresh · 2016-02-20T20:43:24Z

Aravind-Suresh · 2016-02-20T20:46:11Z

Please tell whether this solves the problem or not.

Aravind-Suresh · 2016-02-20T21:21:11Z

I am getting the following error from the buildbot:

[ RUN      ] OCL_ImgProc/AccumulateSquare.Mat/45
C:\builds_ocv\precommit_opencl\opencv\modules\imgproc\test\ocl\test_accumulate.cpp(145): error: Expected: (TestUtils::checkNorm2(dst, udst, _mask)) <= (1e-2), actual: 208.761 vs 0.01
Size: [3 x 3]

[  FAILED  ] OCL_ImgProc/AccumulateSquare.Mat/45, where GetParam() = ((CV_32F, CV_64F), Channels(3), true) (46 ms)

Can anyone tell me what it is?

Aravind-Suresh · 2016-02-21T04:31:32Z

I fixed that error. Please tell me whether this solves the problem or not.

Aravind-Suresh · 2016-02-22T01:46:26Z

Please tell me whether it fixes the problem or not.

vinograd47 · 2016-02-24T07:30:47Z

modules/cudev/include/opencv2/cudev/grid/detail/reduce_to_column.hpp

@@ -101,7 +101,7 @@ namespace grid_reduce_to_vec_detail

        __shared__ work_elem_type smem[cn][BLOCK_SIZE];

-        const int y = blockIdx.x;
+        const int y = blockIdx.x*cols;


In the general case the dst matrix will not be contiguous, so this index is not valid.

The original solution (dst.create(1, rows);) was used to create a contiguous matrix, so that it can be accessed with a 1D index.
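
The point about 1D indexing can be illustrated on the host with a toy model of a pitched allocation. This is only a sketch: PitchedBuffer, fillColumnPitched and fillFlat are hypothetical names, not part of OpenCV or cudev, and the step is counted in elements rather than bytes for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pitched device allocation, modelled on the host.
// `step` is the distance, in elements, between the starts of consecutive rows.
struct PitchedBuffer {
    int rows;
    int cols;
    std::size_t step;          // step >= cols; the difference is padding
    std::vector<int> storage;  // rows * step elements, padding included

    PitchedBuffer(int r, int c, std::size_t s)
        : rows(r), cols(c), step(s),
          storage(static_cast<std::size_t>(r) * s, -1) {}

    // Pitch-aware 2D access.
    int& at(int y, int x) {
        return storage[static_cast<std::size_t>(y) * step + static_cast<std::size_t>(x)];
    }
};

// Writes one value per row of a column through the pitch-aware accessor.
inline void fillColumnPitched(PitchedBuffer& dst) {
    for (int y = 0; y < dst.rows; ++y) dst.at(y, 0) = y;
}

// Writes n values through a flat 1D index. This only addresses the intended
// elements when they are packed back to back, e.g. within a single row.
inline void fillFlat(PitchedBuffer& dst, int n) {
    for (int i = 0; i < n; ++i) dst.storage[static_cast<std::size_t>(i)] = i;
}
```

With a 4 x 1 column and a step of 8, fillFlat puts every value after the first into the padding of row 0, while the same call on a 1 x 4 row hits the intended elements. That is the reason a 1 x rows destination can be indexed with a single index.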

Aravind-Suresh · 2016-02-24T08:42:10Z

So, a simpler fix will be to transpose dst and return. Should I do that, or should I modify the code, so that it works for a columnar matrix?

vinograd47 · 2016-02-24T09:02:30Z

I think it will be better to use dst.create(1, rows); and then call the GpuMat::reshape method.

Aravind-Suresh · 2016-02-24T09:19:58Z

Okay. I will do that.

vinograd47 · 2016-02-24T09:29:22Z

modules/cudev/include/opencv2/cudev/grid/reduce_to_vec.hpp

@@ -189,6 +189,7 @@ __host__ void gridReduceToColumn_(const SrcPtr& src, GpuMat_<ResType>& dst, cons
                                                                shrinkPtr(mask),
                                                                rows, cols,
                                                                StreamAccessor::getStream(stream));
+    dst.reshape(dst.channels(), rows);


The reshape method returns a new matrix object and leaves dst unchanged. You need to use something like dst = dst.reshape(...);
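
A minimal sketch of that behaviour, using a hypothetical ToyMat type rather than the real GpuMat: reshape is a const member that returns a new header sharing the same storage, so discarding its result changes nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified matrix header: shape fields plus shared storage.
struct ToyMat {
    int rows = 0;
    int cols = 0;
    std::shared_ptr<std::vector<float>> data;

    static ToyMat create(int r, int c) {
        ToyMat m;
        m.rows = r;
        m.cols = c;
        m.data = std::make_shared<std::vector<float>>(
            static_cast<std::size_t>(r) * static_cast<std::size_t>(c), 0.0f);
        return m;
    }

    // Returns a NEW header with a different shape over the same storage.
    // The object it is called on is not modified (note the const).
    ToyMat reshape(int newRows) const {
        assert(newRows > 0 && (rows * cols) % newRows == 0);
        ToyMat m = *this;
        m.rows = newRows;
        m.cols = rows * cols / newRows;
        return m;
    }
};
```

Calling dst.reshape(5) on a 1 x 5 ToyMat and ignoring the result leaves dst at 1 x 5; only dst = dst.reshape(5) turns it into a 5 x 1 header.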

Oops. Fixed that.

vinograd47 · 2016-02-24T10:14:51Z

modules/cudev/include/opencv2/cudev/grid/reduce_to_vec.hpp

@@ -189,6 +189,7 @@ __host__ void gridReduceToColumn_(const SrcPtr& src, GpuMat_<ResType>& dst, cons
                                                                shrinkPtr(mask),
                                                                rows, cols,
                                                                StreamAccessor::getStream(stream));
+    dst = dst.reshape(dst.channels(), rows);


Please explicitly cast the reshape result to GpuMat_:

dst = GpuMat_<ResType>(dst.reshape(dst.channels(), rows));

Otherwise compilation fails.
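
The compilation failure is what one would expect if the typed wrapper can only be built from the untyped matrix through an explicit constructor. That is an assumption here, not something stated in this thread. The sketch below reproduces the situation with hypothetical UntypedMat and TypedMat types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical untyped matrix header.
struct UntypedMat {
    int rows = 0;
    int cols = 0;

    // Returns a new untyped header with the requested number of rows.
    UntypedMat reshape(int newRows) const {
        UntypedMat m;
        m.rows = newRows;
        m.cols = (newRows > 0) ? rows * cols / newRows : 0;
        return m;
    }
};

// Hypothetical typed wrapper; construction from the untyped base is explicit.
template <typename T>
struct TypedMat : UntypedMat {
    TypedMat() = default;
    TypedMat(int r, int c) { rows = r; cols = c; }
    explicit TypedMat(const UntypedMat& m) : UntypedMat(m) {}
};

// No implicit conversion exists, so `typed = typed.reshape(n);` is ill-formed...
static_assert(!std::is_convertible<UntypedMat, TypedMat<float>>::value,
              "implicit conversion must be rejected");
// ...while spelling out the construction compiles.
static_assert(std::is_constructible<TypedMat<float>, UntypedMat>::value,
              "explicit construction must be allowed");

// Turns a typed matrix into a single column, with the cast written out.
template <typename T>
void toColumn(TypedMat<T>& dst) {
    dst = TypedMat<T>(dst.reshape(dst.rows * dst.cols));
}
```

The two static_assert lines capture the difference between the rejected assignment and the accepted explicit cast.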

Ohh okay. Thanks for mentioning. Changed that and pushed.

vinograd47 · 2016-02-24T11:10:46Z

Please fix failing tests:

[opencv_test_cudev] [----------] 4 tests from ReduceToColumn
[opencv_test_cudev] [ RUN      ] ReduceToColumn.Sum
[opencv_test_cudev] /home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/cudev/test/test_reduction.cu:235: Failure
[opencv_test_cudev] Matrices "dst_gold" and "dst" have different sizes : "dst_gold" [205x1] vs "dst" [1x205]
[opencv_test_cudev] [  FAILED  ] ReduceToColumn.Sum (1 ms)
[opencv_test_cudev] [ RUN      ] ReduceToColumn.Avg
[opencv_test_cudev] /home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/cudev/test/test_reduction.cu:254: Failure
[opencv_test_cudev] Matrices "dst_gold" and "dst" have different sizes : "dst_gold" [173x1] vs "dst" [1x173]
[opencv_test_cudev] [  FAILED  ] ReduceToColumn.Avg (0 ms)
[opencv_test_cudev] [ RUN      ] ReduceToColumn.Min
[opencv_test_cudev] /home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/cudev/test/test_reduction.cu:273: Failure
[opencv_test_cudev] Matrices "dst_gold" and "dst" have different sizes : "dst_gold" [217x1] vs "dst" [1x217]
[opencv_test_cudev] [  FAILED  ] ReduceToColumn.Min (1 ms)
[opencv_test_cudev] [ RUN      ] ReduceToColumn.Max
[opencv_test_cudev] /home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/cudev/test/test_reduction.cu:292: Failure
[opencv_test_cudev] Matrices "dst_gold" and "dst" have different sizes : "dst_gold" [268x1] vs "dst" [1x268]
[opencv_test_cudev] [  FAILED  ] ReduceToColumn.Max (2 ms)
[opencv_test_cudev] [----------] 4 tests from ReduceToColumn (4 ms total)

[opencv_test_cudaarithm] [ RUN      ] CUDA_Arithm/Reduce.Cols/1
[opencv_test_cudaarithm] unknown file: Failure
[opencv_test_cudaarithm] C++ exception with description "/home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/core/src/cuda_gpu_mat.cpp:179: error: (-13) The matrix is not continuous, thus its number of rows can not be changed in function reshape
[opencv_test_cudaarithm] " thrown in the test body.
[opencv_test_cudaarithm] [  FAILED  ] CUDA_Arithm/Reduce.Cols/1, where GetParam() = (GM107 CS1, 128x128, CV_8U, Channels(1), cv::REDUCE_SUM, sub matrix) (166 ms)

Aravind-Suresh · 2016-02-24T11:38:47Z

I think it is because GpuMat is not continuous. So I used createContinuous(rows, 1, dst.type(), dst)
I think that will fix the failing tests.
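
One way to read the reshape error in the log above is with a simplified layout model: a view taken from a wider parent keeps the parent's row pitch, so its elements are not treated as packed and its row count cannot be changed without a copy. ViewLayout, roi, isContinuous and canChangeRows are hypothetical helpers, and the continuity test is deliberately cruder than the flag the real library tracks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a 2D view: shape plus row pitch, in elements.
struct ViewLayout {
    int rows;
    int cols;
    std::size_t step;  // elements between consecutive row starts
};

// Simplified rule: a view counts as continuous only when its pitch
// equals its width, i.e. there is no padding after each row.
inline bool isContinuous(const ViewLayout& v) {
    return v.step == static_cast<std::size_t>(v.cols);
}

// Takes a sub-rectangle of a parent view; the pitch is inherited unchanged.
inline ViewLayout roi(const ViewLayout& parent, int rows, int cols) {
    assert(rows <= parent.rows && cols <= parent.cols);
    return ViewLayout{rows, cols, parent.step};
}

// Changing the row count without copying requires a continuous view
// whose element count is divisible by the new row count.
inline bool canChangeRows(const ViewLayout& v, int newRows) {
    if (newRows == v.rows) return true;
    return isContinuous(v) && newRows > 0 && (v.rows * v.cols) % newRows == 0;
}
```

In this model a freshly allocated 1 x 128 matrix can become 128 x 1, whereas a 1 x 128 view cut out of a wider parent cannot, which matches the "sub matrix" case failing in the log.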

vinograd47 · 2016-02-24T12:02:42Z

modules/cudev/include/opencv2/cudev/grid/reduce_to_vec.hpp

@@ -183,12 +183,14 @@ __host__ void gridReduceToColumn_(const SrcPtr& src, GpuMat_<ResType>& dst, cons
    CV_Assert( getRows(mask) == rows && getCols(mask) == cols );

    dst.create(1, rows);
+    cuda::createContinuous(rows, 1, dst.type(), dst);


In that case you need to remove dst.create and dst.reshape calls. They are redundant.

Yeah yeah. I got that. Made a final push. Hope this works :).

But, it is mentioned in the link below that the last param will change only if it has the same type and size. What does that mean? Should I create it and then createContinuous?

http://docs.opencv.org/2.4/modules/gpu/doc/data_structures.html#gpu-createcontinuous

vinograd47 · 2016-02-24T12:57:20Z

The same tests still fail. You need to update the tests themselves, since they currently target the previous version of the code (with a 1 x rows matrix).

Please check ./opencv_test_cudev --gtest_filter=*ReduceToColumn* and ./opencv_test_cudaarithm --gtest_filter=*Reduce.Cols*.

Aravind-Suresh · 2016-02-24T15:50:27Z

All checks have passed. I didn't modify tests. How come none of the tests fail, as they are targeted for previous code version? Tell me what to do next.

alalek · 2016-02-24T16:00:50Z

"All checks have passed"

BTW, the public buildbot doesn't run CUDA tests. You need to build and run them manually on your machine.

Aravind-Suresh · 2016-02-24T16:14:57Z

Ohh okay. I will do that. Can you tell me where I can find the test scripts?

alalek · 2016-02-24T16:22:40Z

There is a command-line example for launching the tests a few comments above.

Aravind-Suresh · 2016-02-25T10:03:32Z

Please tell me whether this fixes the problem or not.

vinograd47 · 2016-02-25T10:17:16Z

There are still failures in ./opencv_test_cudaarithm --gtest_filter=*Reduce.Cols*. I think you need to modify the test so that it does not create an ROI for the output image.

https://github.com/Itseez/opencv/blob/master/modules/cudaarithm/test/test_reductions.cpp#L880

Aravind-Suresh · 2016-02-25T10:37:28Z

What is the use of createMat with useRoi? It just creates the mat and chooses ROI of the given size after creation and returns it.

Aravind-Suresh · 2016-02-25T10:54:06Z

Is this snippet fine? Or should I use false instead of useRoi.

https://gist.github.com/Aravind-Suresh/18e78576f29bb92e741b

vinograd47 · 2016-02-25T11:53:27Z

I think it will be better to remove the createMat call for dst altogether:

cv::cuda::GpuMat dst;
cv::cuda::reduce(loadMat(src, useRoi), dst, 1, reduceOp, dst_depth);

Aravind-Suresh · 2016-02-25T12:02:39Z

Okay. Should I do the same for CUDA_TEST_P(Reduce, Rows) ? Because I didn't modify any of reduce_to_row methods.

vinograd47 · 2016-02-25T12:06:40Z

No, only for CUDA_TEST_P(Reduce, Cols).

Aravind-Suresh · 2016-02-25T15:57:12Z

Please let me know if this fixes the problem.

vinograd47 · 2016-02-26T08:44:22Z

Now it fails with:

[opencv_test_cudaarithm] [ RUN      ] CUDA_Arithm/Reduce.Cols/0
[opencv_test_cudaarithm] /home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/cudaarithm/test/test_reductions.cpp:886: Failure
[opencv_test_cudaarithm] Matrices "dst_gold" and "dst" have different sizes : "dst_gold" [1x128] vs "dst" [128x1]
[opencv_test_cudaarithm] [  FAILED  ] CUDA_Arithm/Reduce.Cols/0, where GetParam() = (GM107 CS1, 128x128, CV_8U, Channels(1), cv::REDUCE_SUM, whole matrix) (1 ms)

Aravind-Suresh · 2016-02-26T09:28:32Z

I thought it is a problem with createContinuous. So I used reshape instead. But I have no idea why the output sizes don't match. Tell me if I am thinking in the right direction.

vinograd47 · 2016-02-26T09:52:57Z

No, createContinuous is OK. I think the problem is in cv::cuda::reduce implementation. Please check https://github.com/Itseez/opencv/blob/master/modules/cudaarithm/src/cuda/reduce.cu file.

vinograd47 · 2016-02-26T09:53:42Z

Probably in this line: https://github.com/Itseez/opencv/blob/master/modules/cudaarithm/src/cuda/reduce.cu#L140

Aravind-Suresh · 2016-02-26T09:55:01Z

Ohh yeah. Found that. The functions are improperly wrapped. I will change that and push.

Aravind-Suresh · 2016-02-26T13:45:10Z

I am getting a build failure on Win 10 x64 VS2015. Please tell me what the problem with my code is.

alalek · 2016-02-26T13:53:25Z

This failure is not related to this patch.

Aravind-Suresh · 2016-02-26T13:54:27Z

Ohh okay. Then what should I do? Should I leave it as it is and it will get fixed later?

alalek · 2016-02-26T14:15:16Z

This build was restarted and completed successfully.

Aravind-Suresh · 2016-02-26T14:16:20Z

Thanks :)

vinograd47 · 2016-02-26T14:21:58Z

Now it fails in sanity checks (./opencv_perf_cudaarithm --gtest_filter=*Reduce* --perf_min_samples=1 --perf_force_samples=1 --perf_verify_sanity):

[opencv_perf_cudaarithm] [ RUN      ] Sz_Depth_Cn_Code_Dim_Reduce.Reduce/1
[opencv_perf_cudaarithm] /home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/ts/src/ts_perf.cpp:368: Failure
[opencv_perf_cudaarithm] Value of: actual.size.p[1]
[opencv_perf_cudaarithm]   Actual: 1
[opencv_perf_cudaarithm] Expected: expect_cols
[opencv_perf_cudaarithm] Which is: 720
[opencv_perf_cudaarithm] Argument "gpu_dst" has unexpected number of columns
[opencv_perf_cudaarithm] 
[opencv_perf_cudaarithm] params    = (1280x720, CV_8U, Gray, REDUCE_SUM, Cols)
[opencv_perf_cudaarithm] termination reason:  reached maximum number of iterations
[opencv_perf_cudaarithm] bytesIn   =     921600
[opencv_perf_cudaarithm] bytesOut  =          0
[opencv_perf_cudaarithm] samples   =          1
[opencv_perf_cudaarithm] outliers  =          0
[opencv_perf_cudaarithm] frequency = 1000000000
[opencv_perf_cudaarithm] min       =    7437438 = 7.44ms
[opencv_perf_cudaarithm] median    =    7437438 = 7.44ms
[opencv_perf_cudaarithm] gmean     =    7437438 = 7.44ms
[opencv_perf_cudaarithm] gstddev   = 0.00000000 = 0.00ms for 97% dispersion interval
[opencv_perf_cudaarithm] mean      =    7437438 = 7.44ms
[opencv_perf_cudaarithm] stddev    =          0 = 0.00ms
[opencv_perf_cudaarithm] [  FAILED  ] Sz_Depth_Cn_Code_Dim_Reduce.Reduce/1, where GetParam() = (1280x720, CV_8U, Gray, REDUCE_SUM, Cols) (26 ms)

I think it will be better to reshape the output image back to a 1-row matrix before the sanity check:
https://github.com/Itseez/opencv/blob/master/modules/cudaarithm/perf/perf_reductions.cpp#L371

Aravind-Suresh · 2016-02-27T05:05:20Z

Please let me know if this works fine. Or should I change anything else?

vinograd47 · 2016-02-29T07:12:00Z

Now it works OK. Thank you for your contribution!

vinograd47 · 2016-02-29T07:12:04Z

👍

Aravind-Suresh · 2016-02-29T18:09:41Z

Okay. Thanks for your guidance :)

Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch 2 times, most recently from 756bb84 to cea37a4 Compare February 20, 2016 21:53

vinograd47 reviewed Feb 24, 2016

Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch 2 times, most recently from 088678d to 77707f8 Compare February 24, 2016 09:17

vinograd47 reviewed Feb 24, 2016

Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch from 77707f8 to 41bfe8d Compare February 24, 2016 09:52

vinograd47 reviewed Feb 24, 2016

Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch from 41bfe8d to e3d3f90 Compare February 24, 2016 10:25

Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch from e3d3f90 to bbdd4b6 Compare February 24, 2016 11:35

vinograd47 reviewed Feb 24, 2016

Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch from bbdd4b6 to cfdb3b5 Compare February 24, 2016 12:24

Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch from cfdb3b5 to 1051349 Compare February 24, 2016 17:29

Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch 2 times, most recently from b3db97d to c332a97 Compare February 25, 2016 13:44

Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch from c332a97 to 60891d9 Compare February 26, 2016 09:27

Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch 2 times, most recently from 33fe56f to f4d0a63 Compare February 26, 2016 10:14

Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch from f4d0a63 to c539f46 Compare February 27, 2016 02:28

Fixed cv::cuda::reduce bug.

f4f1561

Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch from c539f46 to f4f1561 Compare February 27, 2016 03:00

opencv-pushbot merged commit f4f1561 into opencv:master Feb 29, 2016

opencv-pushbot pushed a commit that referenced this pull request Feb 29, 2016

Merge pull request #6148 from Aravind-Suresh:cv-cuda-reduce-bug-fix

318671d
