Maximum disparity has to be 128 #4

Closed
arrfou90 opened this issue Feb 2, 2017 · 12 comments

Comments

@arrfou90

arrfou90 commented Feb 2, 2017

Hello Daniel,
I want to increase the maximum disparity to more than 256, but I assume there is a limitation (I guess it is the GPU memory size). Is it possible to increase the disparity range, even at a low image resolution?
Thanks in advance
Arrfou

@dhernandez0
Owner

This is limited to 128 because of some specific optimizations. Basically, each thread computes 4 disparity values, and 128/4 = 32. Therefore, we only need 32 threads (1 warp) doing the computation, so we can use SIMD instructions, warp shuffle, etc.

It is possible to change this. I think (though I'm not sure) it should not be difficult to extend it to 256 by using uint64_t instead of uint32_t and changing some code. However, I don't have time to do this now. If you can do it, I can help if you have any questions.
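The 128/4 = 32 arithmetic can be sketched in plain host-side C++. This is only an illustration, not code from the repository: the names `pack4`, `unpack`, and `first_disparity_of_lane` are hypothetical, and it assumes 8-bit costs packed with the lowest disparity in the lowest byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants mirroring the numbers in this thread.
constexpr int kMaxDisparity   = 128;
constexpr int kCostsPerThread = 4;
constexpr int kWarpSize       = 32;
static_assert(kMaxDisparity / kCostsPerThread == kWarpSize,
              "one warp covers every disparity of a pixel");

// Pack four 8-bit costs into one 32-bit word (c0 ends up in the lowest byte).
inline uint32_t pack4(uint8_t c0, uint8_t c1, uint8_t c2, uint8_t c3) {
    return uint32_t(c0) | (uint32_t(c1) << 8) |
           (uint32_t(c2) << 16) | (uint32_t(c3) << 24);
}

// Extract the k-th 8-bit cost (k in 0..3) from a packed word.
inline uint8_t unpack(uint32_t packed, int k) {
    assert(k >= 0 && k < kCostsPerThread);
    return uint8_t((packed >> (8 * k)) & 0xFFu);
}

// First disparity handled by a given lane of the warp (assumed mapping).
inline int first_disparity_of_lane(int lane) {
    assert(lane >= 0 && lane < kWarpSize);
    return lane * kCostsPerThread;
}
```

Under these assumptions lane 31 handles disparities 124 to 127, which is why a single warp is enough for 128 disparities.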

ZhangSongyi added a commit to ZhangSongyi/sgm that referenced this issue Aug 16, 2017
@smmmmi

smmmmi commented Apr 26, 2018

Any updates @arrfou90 ?

@yaobaishen

A maximum disparity of 256 would be very helpful for detecting nearby objects with a long-baseline stereo camera.
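For a rectified stereo pair, depth and disparity are related by Z = f * B / d, so the maximum disparity sets the nearest depth that can be measured. A rough sketch of that arithmetic follows; the function name `min_depth_m` and the camera numbers in the note below are made up for illustration.

```cpp
#include <cassert>

// Nearest measurable depth in metres for a given maximum disparity,
// using Z = f * B / d for a rectified stereo pair.
//   focal_px       focal length in pixels
//   baseline_m     distance between the two cameras in metres
//   max_disparity  largest disparity the matcher searches, in pixels
inline double min_depth_m(double focal_px, double baseline_m, int max_disparity) {
    assert(focal_px > 0.0 && baseline_m > 0.0 && max_disparity > 0);
    return focal_px * baseline_m / max_disparity;
}
```

With a hypothetical 1000-pixel focal length and a 0.5 m baseline, 128 disparities give a nearest depth of about 3.9 m, while 256 disparities bring it down to about 1.95 m.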

@yaobaishen

I just tried it, and the latest code can run with MAX_DISPARITY set to 256, but the disparity of very near objects doesn't look good, and the ground is also quite noisy. Is the SGM algorithm sensitive to the lighting conditions when the images are captured?

@yaobaishen

@dhernandez0 I looked into the code and found that if I change cost_t from uint32_t to uint64_t, I have to change several inline functions in util.h, as they are written with 32-bit asm. Are there any other things I should pay attention to? Thanks.

@dhernandez0
Owner

dhernandez0 commented Nov 13, 2018

Hi @yaobaishen
See my original response to this request: the assumption that the maximum disparity is 128 is used throughout the code. However, cost_t represents the cost computed by the census transform, so it does not change if you change the maximum disparity.

You should change the way memory is read from d_transform1 and d_transform0. Right now, we use 32 threads for each pixel, and each thread computes 4 disparities. You should change the code to make each thread compute 8 disparities. For example, right now we read a uint32_t from the cost cube, which is the vector of 4 costs that corresponds to the thread:

*old_values = ld_gbl_ca(reinterpret_cast<const uint32_t*>(&d_cost[index]));
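As a host-side illustration of what "8 disparities per thread" means for indexing, the sketch below emulates the per-lane reads. It is not the repository's code: it assumes the cost cube stores the 8-bit costs of one pixel contiguously by disparity, and `load4`, `LaneCosts`, and `load_lane_costs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kWarpSize       = 32;
constexpr int kCostsPerThread = 8;                            // 4 in the 128 version
constexpr int kMaxDisparity   = kWarpSize * kCostsPerThread;  // 256

// Host stand-in for one 32-bit load of four consecutive 8-bit costs.
inline uint32_t load4(const std::vector<uint8_t>& cost, std::size_t index) {
    assert(index + 4 <= cost.size());
    return uint32_t(cost[index]) | (uint32_t(cost[index + 1]) << 8) |
           (uint32_t(cost[index + 2]) << 16) | (uint32_t(cost[index + 3]) << 24);
}

// The 8 costs owned by one lane, held as two packed 32-bit words.
struct LaneCosts {
    uint32_t lo;  // disparities lane*8 + 0..3
    uint32_t hi;  // disparities lane*8 + 4..7
};

// Assumed layout: pixel stride is kMaxDisparity bytes, lane stride is 8 bytes.
inline LaneCosts load_lane_costs(const std::vector<uint8_t>& cost,
                                 std::size_t pixel, int lane) {
    assert(lane >= 0 && lane < kWarpSize);
    const std::size_t index =
        pixel * kMaxDisparity + std::size_t(lane) * kCostsPerThread;
    return LaneCosts{load4(cost, index), load4(cost, index + 4)};
}
```

The point of the sketch is that both strides change together: the per-pixel stride becomes 256 and the per-lane stride becomes 8. If either stays at its 128-disparity value, lanes read overlapping or misaligned costs.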

@yaobaishen

@dhernandez0 Thanks for the advice!

@yaobaishen

@dhernandez0 While implementing this I ran into another problem. As far as I know, CUDA does not support a 64-bit asm operation for the code you pasted above, so I just read twice, for example:
*old_values = ld_gbl_ca(reinterpret_cast<const uint32_t*>(&d_cost[index]));
*old_values2 = ld_gbl_ca(reinterpret_cast<const uint32_t*>(&d_cost[index + 4]));
So I don't use uint64_t at all. Instead, I duplicated each parameter and modified CostAggregationGenericIteration() to compute 8 disparities. After this, the disparity map is messed up. I am still debugging, so I want to confirm whether I misunderstood something. Thanks a lot.
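One way to narrow down a garbled disparity map is to compare the GPU result against a slow scalar reference. The sketch below is such a reference under stated assumptions: each lane holds 8 costs in two packed 32-bit words, and the final disparity is the index of the minimum cost. The names `LaneCosts`, `cost_at`, and `best_disparity` are hypothetical and do not come from the repository.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWarpSize       = 32;
constexpr int kCostsPerThread = 8;

// Two packed words per lane: lo holds costs 0..3, hi holds costs 4..7.
struct LaneCosts {
    uint32_t lo;
    uint32_t hi;
};

// Read the k-th 8-bit cost (k in 0..7) owned by a lane.
inline uint8_t cost_at(const LaneCosts& c, int k) {
    assert(k >= 0 && k < kCostsPerThread);
    const uint32_t word = (k < 4) ? c.lo : c.hi;
    return uint8_t((word >> (8 * (k % 4))) & 0xFFu);
}

// Scalar winner-takes-all: the disparity with the smallest cost.
// Ties resolve to the lowest disparity.
inline int best_disparity(const std::array<LaneCosts, kWarpSize>& lanes) {
    int best_d = 0;
    int best_cost = 256;  // larger than any 8-bit cost
    for (int lane = 0; lane < kWarpSize; ++lane) {
        for (int k = 0; k < kCostsPerThread; ++k) {
            const int c = cost_at(lanes[lane], k);
            if (c < best_cost) {
                best_cost = c;
                best_d = lane * kCostsPerThread + k;  // lane stride is 8, not 4
            }
        }
    }
    return best_d;
}
```

If the reference and the kernel disagree only for disparities of 128 and above, a leftover lane stride of 4 or a pixel stride of 128 somewhere in the kernel is a plausible suspect.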

@peterjiangjun

@dhernandez0 While implementing this I ran into another problem. As far as I know, CUDA does not support a 64-bit asm operation for the code you pasted above, so I just read twice, for example:
*old_values = ld_gbl_ca(reinterpret_cast<const uint32_t*>(&d_cost[index]));
*old_values2 = ld_gbl_ca(reinterpret_cast<const uint32_t*>(&d_cost[index + 4]));
So I don't use uint64_t at all. Instead, I duplicated each parameter and modified CostAggregationGenericIteration() to compute 8 disparities. After this, the disparity map is messed up. I am still debugging, so I want to confirm whether I misunderstood something. Thanks a lot.

Hello, have you finished the feature that allows a maximum disparity above 128? If so, would you be willing to share it? I'm a beginner and don't know how to make the change, so I can only ask for your help.

@ynma-hanvo

ynma-hanvo commented Oct 8, 2022

Hi @yaobaishen See my original response to this request: the assumption that the maximum disparity is 128 is used throughout the code. However, cost_t represents the cost computed by the census transform, so it does not change if you change the maximum disparity.

You should change the way memory is read from d_transform1 and d_transform0. Right now, we use 32 threads for each pixel, and each thread computes 4 disparities. You should change the code to make each thread compute 8 disparities. For example, right now we read a uint32_t from the cost cube, which is the vector of 4 costs that corresponds to the thread:

*old_values = ld_gbl_ca(reinterpret_cast<const uint32_t*>(&d_cost[index]));

Hi, thanks for the clarification.
If I want to change MAX_DISPARITY to 64, do you have any advice?

@peterjiangjun

peterjiangjun commented Oct 8, 2022 via email
