Manually join rocprof runs #125
Signed-off-by: coleramos425 <colramos@amd.com>
Everything is looking good in my tests. The only issue I've noticed is that if |
You're probably right that there could be some cases where different MPI runs target different GPUs per process. We could probably code up some strategies the user can select from for joining such an app, but in the short term I'm less concerned about that. Edit: this is assuming that e.g., if I do |
I've redefined "boring duplicates" such that |
This looks good to me, minor comment about "perhaps we wanna save the individual files" for logging purposes.
Thanks @coleramos425 !
Thanks for the review. Looks good to me. Merging this into dev for others to try out. |
Manually join rocprof runs Signed-off-by: fei.zheng <fei.zheng@amd.com>
In an attempt to address kernel merging issues with the default rocprofiler implementation (i.e. $ROCPROF_DIR/libexec/rocprofiler/tblextr.py), we've created join_prof().
- pmc_perf_split() separates pmc_perf.txt into 17 input files (one for each line)
- 17 separate output files (pmc_perf_*.csv) are generated by rocprof and fed into join_prof()
- join_prof() merges all 17 files into pmc_perf.csv by combining counters across kernels and averaging timestamps
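
For anyone trying this out, the split-and-join flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the join_prof() or pmc_perf_split() code from this PR. The helper names (split_lines, read_csv_text, join_runs), the column names (Index, KernelName, BeginNs, EndNs), and the assumption that every run lists the same kernel dispatches in the same order are all hypothetical.

```python
import csv
import io

# Assumed column names; the real rocprof CSV layout may differ.
TIMESTAMP_COLS = ("BeginNs", "EndNs")


def split_lines(text):
    """Return one single-line input per non-empty line of the input text."""
    return [line.strip() + "\n" for line in text.splitlines() if line.strip()]


def read_csv_text(text):
    """Parse CSV text into a list of row dicts (one dict per kernel dispatch)."""
    return list(csv.DictReader(io.StringIO(text)))


def join_runs(runs):
    """Merge per-run tables row by row.

    Assumes every run lists the same kernel dispatches in the same order.
    Counter columns are combined across runs; timestamps are averaged.
    """
    if not runs:
        return []
    n_rows = len(runs[0])
    if any(len(run) != n_rows for run in runs):
        raise ValueError("runs have different numbers of kernel dispatches")

    joined = []
    for rows in zip(*runs):
        out = {}
        for row in rows:
            for col, val in row.items():
                if col in TIMESTAMP_COLS:
                    continue
                # Shared columns (e.g. kernel name) keep the first run's value.
                out.setdefault(col, val)
        for col in TIMESTAMP_COLS:
            vals = [int(r[col]) for r in rows if col in r]
            if vals:
                # Integer average of the timestamp across all runs.
                out[col] = sum(vals) // len(vals)
        joined.append(out)
    return joined
```

In use, each pmc_perf_*.csv would be read into a list of rows and the lists passed together to join_runs(). Matching rows by position is the simplest strategy; it would not cover the case raised earlier in this thread where different MPI runs target different GPUs per process.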