Manually join rocprof runs #125

Merged
coleramos425 merged 7 commits into dev from separate-pmc-perf on May 16, 2023
Conversation

coleramos425 (Collaborator) commented:

In an attempt to address kernel merging issues with the default rocprofiler implementation (i.e. $ROCPROF_DIR/libexec/rocprofiler/tblextr.py), we've created join_prof().

  • pmc_perf_split() separates pmc_perf.txt into 17 input files (one for each line)

  • 17 separate output files (pmc_perf_*.csv) are generated by rocprof and fed into join_prof()

  • join_prof() merges all 17 files into pmc_perf.csv by combining counters across kernels and averaging timestamps (see the sketch below)
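
For illustration, here is a minimal pandas sketch of that join step. This is not the actual join_prof() in src/utils/perfagg.py; the key columns (Index, KernelName, gpu-id) and timestamp columns (BeginNs, EndNs) are assumed rocprof output names.

```python
import glob

import pandas as pd


def join_prof_sketch(out_csv="pmc_perf.csv"):
    """Sketch: merge the per-run pmc_perf_*.csv files into a single pmc_perf.csv."""
    runs = [pd.read_csv(f) for f in sorted(glob.glob("pmc_perf_*.csv"))]
    key_cols = ["Index", "KernelName", "gpu-id"]  # assumed dispatch-identifying columns
    time_cols = ["BeginNs", "EndNs"]              # assumed timestamp columns

    joined = runs[0]
    for i, df in enumerate(runs[1:], start=1):
        # Pull in counter columns we have not seen yet, plus this run's timestamps.
        new_cols = [c for c in df.columns if c not in joined.columns or c in time_cols]
        joined = joined.merge(df[key_cols + new_cols], on=key_cols,
                              suffixes=("", f"_run{i}"))

    # Average each timestamp column across runs instead of keeping one copy per run.
    for t in time_cols:
        copies = [c for c in joined.columns if c == t or c.startswith(t + "_run")]
        joined[t] = joined[copies].mean(axis=1)
        joined = joined.drop(columns=[c for c in copies if c != t])

    joined.to_csv(out_csv, index=False)
```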

@coleramos425 (Collaborator, Author) commented:

Everything is looking good in my tests. The only issue I've noticed is that gpu-id, which is used in analyze mode (i.e. for gpu-id based filtering), is not accounted for at the moment:
https://github.com/AMDResearch/omniperf/blob/8c173446d2909db48c6cfe024ab54205fbfef1e8/src/utils/perfagg.py#L107-L132

If gpu-id is always consistent across all runs, then it's simple to merge. However, I'm thinking there might be some edge cases where this isn't true... (e.g. MPI runs?), which may introduce some complications. Any thoughts, @arghdos?
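
A cheap pre-merge check could flag the inconsistent case early. This is only a hypothetical sketch, assuming each per-run CSV carries a gpu-id column in dispatch order; it is not part of this PR.

```python
import glob

import pandas as pd


def gpu_ids_consistent(pattern="pmc_perf_*.csv"):
    """Sketch: True if every run recorded the same gpu-id sequence per dispatch."""
    reference = None
    for f in sorted(glob.glob(pattern)):
        ids = pd.read_csv(f)["gpu-id"].tolist()
        if reference is None:
            reference = ids
        elif ids != reference:
            return False  # device mapping (or dispatch order) differs between runs
    return True
```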

@skyreflectedinmirrors (Contributor) commented May 8, 2023:

> However, I'm thinking there might be some edge cases where this isn't true... (e.g. MPI runs?), which may introduce some complications. Any thoughts, @arghdos?

You're probably right that there could be some cases where different MPI runs target different GPUs per process. We could probably code up some strategies the user can select from for joining such an app, but in the short term I'm less concerned about that.

Edit: this is assuming that if I do, e.g., hipSetDevice(0) on two subsequent MPI runs, I get the same "gpu-id" in rocprof, which I think I should.

src/utils/perfagg.py (outdated review thread, resolved)
@coleramos425 (Collaborator, Author) commented:

I've redefined the "boring duplicates" so that gpu-id (required for filtering), along with wgr, vgpr, sgpr, lds, scr, and grd (required for wavefront launch), are no longer ignored:
https://github.com/AMDResearch/omniperf/blob/444102d92726baf286d1cb321fc2eb3f1bc30ee3/src/utils/perfagg.py#L119-L144
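
Conceptually, the change amounts to an allow-list of duplicated columns that must survive the join. A hedged sketch of that idea follows; the column names come from the comment above, while the helper name and the drop-everything-else rule are assumptions, not the exact logic in perfagg.py.

```python
# Columns that repeat across the 17 per-run CSVs but must be kept in pmc_perf.csv.
REQUIRED_DUPLICATES = {
    "gpu-id",                                    # needed for gpu-id based filtering in analyze mode
    "wgr", "vgpr", "sgpr", "lds", "scr", "grd",  # needed for wavefront launch data
}


def is_boring_duplicate(column: str) -> bool:
    """Sketch: True if a column duplicated across runs can safely be dropped."""
    return column not in REQUIRED_DUPLICATES
```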

@skyreflectedinmirrors (Contributor) left a review:


This looks good to me; one minor comment about "perhaps we wanna save the individual files" for logging purposes.

Thanks @coleramos425 !

src/utils/perfagg.py (two outdated review threads, resolved)
@coleramos425 (Collaborator, Author) commented:

Thanks for the review. Looks good to me. Merging this into dev for others to try out.

coleramos425 merged commit c5a19ca into dev on May 16, 2023
9 checks passed
coleramos425 deleted the separate-pmc-perf branch on May 16, 2023 at 21:51
feizheng10 pushed a commit to feizheng10/omniperf that referenced this pull request on Dec 6, 2023: "Manually join rocprof runs" (Signed-off-by: fei.zheng <fei.zheng@amd.com>)