Add maximum limit on grid size for reduce, elementwise, gather and scatter kernels #40813

Merged
Merged 3 commits on Mar 25, 2022


FlyingQianMM
Contributor

PR types

Bug fixes

PR changes

OPs

Describe

The grid size (number of blocks) of the reduce, elementwise and masked_select kernels was not limited, which can trigger an error like:

:parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument. 

This PR therefore caps the grid size of the reduce, elementwise, gather and scatter kernels at the device's maximum grid dimension.
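Clamping the grid is only safe if a kernel does not assume one thread per element: with fewer blocks, each thread must keep striding until the whole input is covered. The host-side C++ sketch below illustrates that combination under stated assumptions. `DivUp` mirrors the helper used in the diff, while `LaunchConfig1D`, `MakeLaunchConfig1D`, `VisitCounts` and the explicit `max_grid_x` parameter are hypothetical stand-ins (for `GpuLaunchConfig`, `GetGpuLaunchConfig1D` and `GetCUDAMaxGridDimSize()[0]`). `VisitCounts` emulates on the CPU what a vectorized grid-stride kernel would do; it assumes the real kernels use such a loop, which this PR does not show.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ceiling division (same intent as the DivUp helper in the diff).
inline int64_t DivUp(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Hypothetical 1D launch configuration; not Paddle's GpuLaunchConfig.
struct LaunchConfig1D {
  int64_t blocks;
  int64_t threads;
};

// Sketch of a clamped 1D config. max_grid_x stands in for
// GetCUDAMaxGridDimSize()[0] and is passed explicitly (assumption).
LaunchConfig1D MakeLaunchConfig1D(int64_t numel, int64_t vec_size,
                                  int64_t threads, int64_t max_grid_x) {
  assert(numel > 0 && vec_size > 0 && max_grid_x > 0);
  threads = std::max<int64_t>(64, threads);  // at least 64 threads per block
  int64_t blocks = DivUp(DivUp(numel, vec_size), threads);
  if (blocks > max_grid_x) {
    blocks = max_grid_x;  // clamp the grid to the device limit
  }
  return {blocks, threads};
}

// CPU emulation of a vectorized grid-stride loop: thread `tid` handles
// chunks tid, tid + stride, ..., where each chunk is vec_size elements.
// Returns how many times each element would be processed.
std::vector<int> VisitCounts(int64_t numel, int64_t vec_size,
                             const LaunchConfig1D& cfg) {
  std::vector<int> visits(static_cast<std::size_t>(numel), 0);
  const int64_t num_chunks = DivUp(numel, vec_size);
  const int64_t stride = cfg.blocks * cfg.threads;  // total launched threads
  for (int64_t tid = 0; tid < stride; ++tid) {
    for (int64_t chunk = tid; chunk < num_chunks; chunk += stride) {
      const int64_t begin = chunk * vec_size;
      const int64_t end = std::min(begin + vec_size, numel);
      for (int64_t i = begin; i < end; ++i) {
        ++visits[static_cast<std::size_t>(i)];
      }
    }
  }
  return visits;
}
```

With numel = 10000, vec_size = 4 and 128 threads per block, the unclamped grid would be 20 blocks. A deliberately tiny max_grid_x of 8 clamps it to 8 blocks, and VisitCounts still reports every element processed exactly once, because the 1024 launched threads pick up the chunks beyond the first 1024.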

@@ -128,6 +128,8 @@ inline GpuLaunchConfig GetGpuLaunchConfig1D(
 // Number of threads per block shall be larger than 64.
 threads = std::max(64, threads);
 int blocks = DivUp(DivUp(numel, vec_size), threads);
+int limit_blocks = context.GetCUDAMaxGridDimSize()[0];
+if (blocks > limit_blocks) blocks = limit_blocks;
Contributor

Per the C++ style requirements, it would be better to add braces to the if statement.

Contributor Author

Added the braces, thanks!

@@ -1044,7 +1056,7 @@ void ReduceKernel(const KPDevice& dev_ctx,
 
 auto x_dim = phi::vectorize<int>(x.dims());
 auto config = ReduceConfig<Ty>(origin_reduce_dims, x_dim);
-config.Run();
+config.Run(x.place());
Contributor

Could this LimitGridDim be moved outside, so that dev_ctx can be used directly?

Contributor Author

config.Run() already limits the number of blocks, so I put the thread-count limit inside config.Run() as well; that way it can be reused.
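The trade-off in this thread (clamping inside `config.Run()` versus computing the limit from `dev_ctx` in `ReduceKernel`) can be sketched roughly as below. Everything here is hypothetical: `Dim3`, `QueryMaxGridDim`, `ReduceConfigSketch` and its 32 x 8 block layout are illustrative, `LimitGridDim` is only a guess at what the function mentioned in review does, and an `int device_id` replaces the `phi::Place` passed in the real diff. The fixed limits {2147483647, 65535, 65535} are typical of `cudaDeviceProp::maxGridSize` on compute capability 3.0 and newer, but are assumed rather than queried. The sketch only shows the author's design choice: because the clamp lives in `Run()`, any caller that builds a config gets it without extra code. When `grid.y` is clamped, the kernel itself must loop over the remaining reduce rows, just as in the 1D case.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3.
struct Dim3 {
  int64_t x = 1, y = 1, z = 1;
};

// Hypothetical device query. A real version could read
// cudaDeviceProp::maxGridSize; these fixed values are an assumption.
std::array<int64_t, 3> QueryMaxGridDim(int /*device_id*/) {
  return {2147483647, 65535, 65535};
}

// Clamp each grid dimension to the corresponding device limit.
Dim3 LimitGridDim(Dim3 grid, const std::array<int64_t, 3>& max_grid) {
  grid.x = std::min(grid.x, max_grid[0]);
  grid.y = std::min(grid.y, max_grid[1]);
  grid.z = std::min(grid.z, max_grid[2]);
  return grid;
}

// Hypothetical reduce config: Run() both sizes the grid and applies the
// limit, so every caller of Run() gets the clamp for free.
struct ReduceConfigSketch {
  int64_t reduce_num = 0;  // elements reduced into each output
  int64_t left_num = 0;    // number of outputs
  Dim3 block;
  Dim3 grid;

  void Run(int device_id) {
    assert(reduce_num > 0 && left_num > 0);
    block.x = 32;  // threads along the kept (left) dimension
    block.y = 8;   // threads along the reduced dimension
    grid.x = (left_num + block.x - 1) / block.x;
    grid.y = (reduce_num + block.y - 1) / block.y;
    grid = LimitGridDim(grid, QueryMaxGridDim(device_id));
  }
};
```

For example, reducing 1,000,000 elements into 64 outputs would need 125,000 blocks along y under this layout, so Run() clamps grid.y to 65,535 (the assumed limit) while grid.x stays at 2.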

Contributor

ZzSean left a comment

LGTM

FlyingQianMM merged commit 608a5f5 into PaddlePaddle:develop on Mar 25, 2022
Xreki (Contributor) commented on Mar 25, 2022

Could you share the configuration of the operator that failed, along with the thread counts before and after this change?
