
Using number of compute units to set gridSize#754

Merged
zjing14 merged 2 commits into develop from check_cu_count on Jun 15, 2023

@qianfengz
Contributor

Fix for issue 266

return (count);
};

hip_check_error(hipExtStreamGetCUMask(stream_config.stream_id_, MAX_MASK_DWORDS, &cuMask[0]));
Collaborator


&cuMask[0] => cuMask
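The suggestion relies on array-to-pointer decay. Assuming `cuMask` is declared as a built-in array of `uint32_t` (its declaration is not shown in the excerpt), `cuMask` and `&cuMask[0]` both evaluate to the same `uint32_t*`, so the shorter spelling is equivalent. A minimal standalone check follows; `fill_mask` is a hypothetical stand-in shaped like the last two parameters of `hipExtStreamGetCUMask`, not the HIP function, and `kMaskDwords` is a hypothetical value standing in for `MAX_MASK_DWORDS`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an API taking (size, uint32_t* out), like the
// tail of hipExtStreamGetCUMask(stream, size, out). NOT the HIP function:
// it simply marks every CU bit as enabled.
inline void fill_mask(uint32_t size, uint32_t* out)
{
    for(uint32_t i = 0; i < size; ++i)
        out[i] = 0xFFFFFFFFu;
}

// Hypothetical size; stands in for MAX_MASK_DWORDS from the PR.
constexpr uint32_t kMaskDwords = 4;

// Returns true when both spellings yield the same pointer, which always holds
// for a built-in array: `cuMask` decays to `&cuMask[0]`.
inline bool decay_matches()
{
    uint32_t cuMask[kMaskDwords] = {};
    uint32_t* explicit_addr = &cuMask[0];
    uint32_t* decayed       = cuMask;

    fill_mask(kMaskDwords, cuMask); // same call as fill_mask(kMaskDwords, &cuMask[0])
    assert(cuMask[kMaskDwords - 1] == 0xFFFFFFFFu);

    return explicit_addr == decayed;
}
```

If `cuMask` were a `std::vector<uint32_t>` instead, it would not decay, and `cuMask.data()` would be the idiomatic replacement for `&cuMask[0]`.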

Collaborator


Why not just get the multiProcessorCount property from the device? https://docs.amd.com/bundle/HIP-API-Guide-v5.4.1/page/a01151.html#add8d9d2ad52aece9fd1dbe25c18d9d57

Contributor Author

@qianfengz Jun 14, 2023


Because the CUs usable by a stream can be masked with hipExtStreamCreateWithCUMask(), and multiple streams can be created on the same device, we should be concerned with the stream rather than the device.
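To illustrate the point: a device-level query such as `multiProcessorCount` reports every CU on the GPU, whereas a stream created with `hipExtStreamCreateWithCUMask()` may be restricted to a subset. The sketch below counts the set bits of a stream's CU mask (32-bit dwords, one bit per CU, the first dword covering CUs 0 to 31) and feeds that count into a grid-size calculation. The HIP call that fetches the mask is omitted so the code builds without a GPU, and `grid_size_for` with its `blocks_per_cu` factor is a hypothetical heuristic, not the formula used in this PR.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Count the CUs enabled in a stream's CU mask. Each 32-bit dword holds the
// enable bits for 32 consecutive CUs.
inline int count_enabled_cus(const std::vector<uint32_t>& mask)
{
    int count = 0;
    for(uint32_t dword : mask)
        count += static_cast<int>(std::bitset<32>(dword).count());
    return count;
}

// Hypothetical heuristic: launch `blocks_per_cu` workgroups on every CU the
// stream is allowed to use, so the grid matches the stream, not the device.
inline int grid_size_for(int usable_cus, int blocks_per_cu)
{
    assert(usable_cus >= 0 && blocks_per_cu > 0);
    return usable_cus * blocks_per_cu;
}
```

For example, on a hypothetical 68-CU device, a stream masked down to 16 CUs would get a grid of 32 workgroups with `blocks_per_cu = 2`, instead of the 136 that a device-wide count would suggest.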

Collaborator

@rocking5566 left a comment


Good solution

@rocking5566 self-requested a review on June 14, 2023 07:50
@zjing14 merged commit c5f6ec8 into develop on Jun 15, 2023
@qianfengz deleted the check_cu_count branch on June 16, 2023 01:16
hyoon1 pushed a commit to hyoon1/composable_kernel that referenced this pull request Mar 19, 2026
…m#754)

* Add split-k, M<->H to varseq path

* skip M<->H when dropout>0, fix LSE


4 participants