Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cv::cuda::reduce bug #6148

Merged
merged 1 commit into from
Feb 29, 2016

Conversation

Aravind-Suresh
Copy link
Contributor

Fixes #6126.

@Aravind-Suresh
Copy link
Contributor Author

Please tell whether this solves the problem or not.

@Aravind-Suresh
Copy link
Contributor Author

I am getting the following error ( buildbot ):

[ RUN      ] OCL_ImgProc/AccumulateSquare.Mat/45
C:\builds_ocv\precommit_opencl\opencv\modules\imgproc\test\ocl\test_accumulate.cpp(145): error: Expected: (TestUtils::checkNorm2(dst, udst, _mask)) <= (1e-2), actual: 208.761 vs 0.01
Size: [3 x 3]

[  FAILED  ] OCL_ImgProc/AccumulateSquare.Mat/45, where GetParam() = ((CV_32F, CV_64F), Channels(3), true) (46 ms)

Can anyone tell me what it is?

@Aravind-Suresh Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch 2 times, most recently from 756bb84 to cea37a4 Compare February 20, 2016 21:53
@Aravind-Suresh
Copy link
Contributor Author

I fixed that error. Please tell me whether this solves the problem or not.

@Aravind-Suresh
Copy link
Contributor Author

Please tell me whether it fixes the problem or not.

@@ -101,7 +101,7 @@ namespace grid_reduce_to_vec_detail

__shared__ work_elem_type smem[cn][BLOCK_SIZE];

const int y = blockIdx.x;
const int y = blockIdx.x*cols;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In generic case dst matrix will not be contiguous, so this index is not valid.

Original solution (dst.create(1, rows);) was used to create contiguous matrix, so that it can be accessed by 1D index.

@Aravind-Suresh
Copy link
Contributor Author

So, a simpler fix will be to transpose dst and return. Should I do that, or should I modify the code, so that it works for a columnar matrix?

@vinograd47
Copy link
Contributor

I guess, it will be better to use dst.create(1, rows); and then call GpuMat::reshape method.

@Aravind-Suresh Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch 2 times, most recently from 088678d to 77707f8 Compare February 24, 2016 09:17
@Aravind-Suresh
Copy link
Contributor Author

Okay. I will do that.

@@ -189,6 +189,7 @@ __host__ void gridReduceToColumn_(const SrcPtr& src, GpuMat_<ResType>& dst, cons
shrinkPtr(mask),
rows, cols,
StreamAccessor::getStream(stream));
dst.reshape(dst.channels(), rows);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reshape method creates new Mat object. You need to use something like dst = dst.reshape(...);.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Fixed that.

@@ -189,6 +189,7 @@ __host__ void gridReduceToColumn_(const SrcPtr& src, GpuMat_<ResType>& dst, cons
shrinkPtr(mask),
rows, cols,
StreamAccessor::getStream(stream));
dst = dst.reshape(dst.channels(), rows);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explicitly cast the reshape result to GpuMat_:

dst = GpuMat_<ResType>(dst.reshape(dst.channels(), rows));

Otherwise compilation fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh okay. Thanks for mentioning. Changed that and pushed.

@vinograd47
Copy link
Contributor

Please fix failing tests:

[opencv_test_cudev] [----------] 4 tests from ReduceToColumn
[opencv_test_cudev] [ RUN      ] ReduceToColumn.Sum
[opencv_test_cudev] /home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/cudev/test/test_reduction.cu:235: Failure
[opencv_test_cudev] Matrices "dst_gold" and "dst" have different sizes : "dst_gold" [205x1] vs "dst" [1x205]
[opencv_test_cudev] [  FAILED  ] ReduceToColumn.Sum (1 ms)
[opencv_test_cudev] [ RUN      ] ReduceToColumn.Avg
[opencv_test_cudev] /home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/cudev/test/test_reduction.cu:254: Failure
[opencv_test_cudev] Matrices "dst_gold" and "dst" have different sizes : "dst_gold" [173x1] vs "dst" [1x173]
[opencv_test_cudev] [  FAILED  ] ReduceToColumn.Avg (0 ms)
[opencv_test_cudev] [ RUN      ] ReduceToColumn.Min
[opencv_test_cudev] /home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/cudev/test/test_reduction.cu:273: Failure
[opencv_test_cudev] Matrices "dst_gold" and "dst" have different sizes : "dst_gold" [217x1] vs "dst" [1x217]
[opencv_test_cudev] [  FAILED  ] ReduceToColumn.Min (1 ms)
[opencv_test_cudev] [ RUN      ] ReduceToColumn.Max
[opencv_test_cudev] /home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/cudev/test/test_reduction.cu:292: Failure
[opencv_test_cudev] Matrices "dst_gold" and "dst" have different sizes : "dst_gold" [268x1] vs "dst" [1x268]
[opencv_test_cudev] [  FAILED  ] ReduceToColumn.Max (2 ms)
[opencv_test_cudev] [----------] 4 tests from ReduceToColumn (4 ms total)
[opencv_test_cudaarithm] [ RUN      ] CUDA_Arithm/Reduce.Cols/1
[opencv_test_cudaarithm] unknown file: Failure
[opencv_test_cudaarithm] C++ exception with description "/home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/core/src/cuda_gpu_mat.cpp:179: error: (-13) The matrix is not continuous, thus its number of rows can not be changed in function reshape
[opencv_test_cudaarithm] " thrown in the test body.
[opencv_test_cudaarithm] [  FAILED  ] CUDA_Arithm/Reduce.Cols/1, where GetParam() = (GM107 CS1, 128x128, CV_8U, Channels(1), cv::REDUCE_SUM, sub matrix) (166 ms)

@Aravind-Suresh
Copy link
Contributor Author

I think it is because GpuMat is not continuous. So I used createContinuous(rows, 1, dst.type(), dst)
I think that will fix the failing tests.

@@ -183,12 +183,14 @@ __host__ void gridReduceToColumn_(const SrcPtr& src, GpuMat_<ResType>& dst, cons
CV_Assert( getRows(mask) == rows && getCols(mask) == cols );

dst.create(1, rows);
cuda::createContinuous(rows, 1, dst.type(), dst);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case you need to remove dst.create and dst.reshape calls. They are redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah yeah. I got that. Made a final push. Hope this works :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, it is mentioned in the link below that the last param will change only if it has the same type and size. What does that mean? Should I create it and then createContinuous?

http://docs.opencv.org/2.4/modules/gpu/doc/data_structures.html#gpu-createcontinuous

@vinograd47
Copy link
Contributor

The same tests still fails. You need to update the tests itself, since currently they are targeted for previous code version (with 1 x rows matrix).

Please check ./opencv_test_cudev --gtest_filter=*ReduceToColumn* and ./opencv_test_cudaarithm --gtest_filter=*Reduce.Cols*.

@Aravind-Suresh
Copy link
Contributor Author

All checks have passed. I didn't modify tests. How come none of the tests fail, as they are targeted for previous code version? Tell me what to do next.

@alalek
Copy link
Member

alalek commented Feb 24, 2016

All checks have passed

BTW, public buildbot doesn't run CUDA tests. You need to build and run them manually on your machine.

@Aravind-Suresh
Copy link
Contributor Author

Ohh okay. I will do that. Can you tell me where I can find the test scripts?

@alalek
Copy link
Member

alalek commented Feb 24, 2016

There is command-line sample for test launch a few comments above

@Aravind-Suresh
Copy link
Contributor Author

Please tell me whether this fixes the problem or not.

@vinograd47
Copy link
Contributor

There is still failures in ./opencv_test_cudaarithm --gtest_filter=*Reduce.Cols*. I think you need to modify the test and do not create ROI for output image.

https://github.com/Itseez/opencv/blob/master/modules/cudaarithm/test/test_reductions.cpp#L880

@Aravind-Suresh
Copy link
Contributor Author

What is the use of createMat with useRoi? It just creates the mat and chooses ROI of the given size after creation and returns it.

@Aravind-Suresh
Copy link
Contributor Author

Is this snippet fine? Or should I use false instead of useRoi.

https://gist.github.com/Aravind-Suresh/18e78576f29bb92e741b

@vinograd47
Copy link
Contributor

I think it will be better to remove createMat call for dst at all:

cv::cuda::GpuMat dst;
cv::cuda::reduce(loadMat(src, useRoi), dst, 1, reduceOp, dst_depth);

@Aravind-Suresh
Copy link
Contributor Author

Okay. Should I do the same for CUDA_TEST_P(Reduce, Rows) ? Because I didn't modify any of reduce_to_row methods.

@vinograd47
Copy link
Contributor

No, only for CUDA_TEST_P(Reduce, Cols).

@Aravind-Suresh Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch 2 times, most recently from b3db97d to c332a97 Compare February 25, 2016 13:44
@Aravind-Suresh
Copy link
Contributor Author

Please let me know if this fixes the problem.

@vinograd47
Copy link
Contributor

Now it fails with

[opencv_test_cudaarithm] [ RUN      ] CUDA_Arithm/Reduce.Cols/0
[opencv_test_cudaarithm] /home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/cudaarithm/test/test_reductions.cpp:886: Failure
[opencv_test_cudaarithm] Matrices "dst_gold" and "dst" have different sizes : "dst_gold" [1x128] vs "dst" [128x1]
[opencv_test_cudaarithm] [  FAILED  ] CUDA_Arithm/Reduce.Cols/0, where GetParam() = (GM107 CS1, 128x128, CV_8U, Channels(1), cv::REDUCE_SUM, whole matrix) (1 ms)

@Aravind-Suresh
Copy link
Contributor Author

I thought it is a problem with createContinuous. So I used reshape instead. But I have no idea why the output sizes don't match. Tell me if I am thinking in the right direction.

@vinograd47
Copy link
Contributor

No, createContinuous is OK. I think the problem is in cv::cuda::reduce implementation. Please check https://github.com/Itseez/opencv/blob/master/modules/cudaarithm/src/cuda/reduce.cu file.

@vinograd47
Copy link
Contributor

@Aravind-Suresh
Copy link
Contributor Author

Ohh yeah. Found that. The functions are improperly wrapped. I will change that and push.

@Aravind-Suresh Aravind-Suresh force-pushed the cv-cuda-reduce-bug-fix branch 2 times, most recently from 33fe56f to f4d0a63 Compare February 26, 2016 10:14
@Aravind-Suresh
Copy link
Contributor Author

I am getting a build fail in Win 10 x64 VS2015. Please tell me what is the problem with my code.

@alalek
Copy link
Member

alalek commented Feb 26, 2016

This failure is not related to this patch

@Aravind-Suresh
Copy link
Contributor Author

Ohh okay. Then what should I do? Should I leave it as it is and it will get fixed later?

@alalek
Copy link
Member

alalek commented Feb 26, 2016

This build was restarted and completed successfully.

@Aravind-Suresh
Copy link
Contributor Author

Thanks :)

@vinograd47
Copy link
Contributor

Now it fails in sanity checks (./opencv_perf_cudaarithm --gtest_filter=*Reduce* --perf_min_samples=1 --perf_force_samples=1 --perf_verify_sanity):

[opencv_perf_cudaarithm] [ RUN      ] Sz_Depth_Cn_Code_Dim_Reduce.Reduce/1
[opencv_perf_cudaarithm] /home/jenkins/workspace/OpenCV/3.0/Ubuntu-12.04-x64-Release-Shared-Package-WithTests/opencv/modules/ts/src/ts_perf.cpp:368: Failure
[opencv_perf_cudaarithm] Value of: actual.size.p[1]
[opencv_perf_cudaarithm]   Actual: 1
[opencv_perf_cudaarithm] Expected: expect_cols
[opencv_perf_cudaarithm] Which is: 720
[opencv_perf_cudaarithm] Argument "gpu_dst" has unexpected number of columns
[opencv_perf_cudaarithm] 
[opencv_perf_cudaarithm] params    = (1280x720, CV_8U, Gray, REDUCE_SUM, Cols)
[opencv_perf_cudaarithm] termination reason:  reached maximum number of iterations
[opencv_perf_cudaarithm] bytesIn   =     921600
[opencv_perf_cudaarithm] bytesOut  =          0
[opencv_perf_cudaarithm] samples   =          1
[opencv_perf_cudaarithm] outliers  =          0
[opencv_perf_cudaarithm] frequency = 1000000000
[opencv_perf_cudaarithm] min       =    7437438 = 7.44ms
[opencv_perf_cudaarithm] median    =    7437438 = 7.44ms
[opencv_perf_cudaarithm] gmean     =    7437438 = 7.44ms
[opencv_perf_cudaarithm] gstddev   = 0.00000000 = 0.00ms for 97% dispersion interval
[opencv_perf_cudaarithm] mean      =    7437438 = 7.44ms
[opencv_perf_cudaarithm] stddev    =          0 = 0.00ms
[opencv_perf_cudaarithm] [  FAILED  ] Sz_Depth_Cn_Code_Dim_Reduce.Reduce/1, where GetParam() = (1280x720, CV_8U, Gray, REDUCE_SUM, Cols) (26 ms)

I think it will be better to reshape the output image back to 1 row matrix before sanity check:
https://github.com/Itseez/opencv/blob/master/modules/cudaarithm/perf/perf_reductions.cpp#L371

@Aravind-Suresh
Copy link
Contributor Author

Please let me know if this works fine. Or should I change anything else?

@vinograd47
Copy link
Contributor

Now it works OK. Thank you for your contribution!

@vinograd47
Copy link
Contributor

👍

@Aravind-Suresh
Copy link
Contributor Author

Okay. Thanks for your guidance :)

@opencv-pushbot opencv-pushbot merged commit f4f1561 into opencv:master Feb 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants