fix for stoichastic kernel dispatch selection #183

skyreflectedinmirrors · 2023-09-11T14:17:07Z

Fixes an issue when trying to select a specific --dispatch <N> on runs w/ 'stoichastic' dispatches (e.g., you don't have the same number of launches of a specific kernel between runs), e.g. from our stochastic test code (I think this was originally Stan's? Thanks Stan!)

#include <hip/hip_runtime.h>
#include <stdio.h>

__global__ void test_kernel(void)
{
  printf("Hello there\n");
}

int main()
{
  FILE *fp;
  fp = fopen("memory.txt", "a+");
  int count = 0;
  for (char c = getc(fp); c != EOF; c = getc(fp))
    if (c == '\n') // Increment count if this character is newline
      count = count + 1;

  fprintf(fp, "new line \n");
  fclose(fp);
  printf("Running kernel %d times\n", count);

  for (int i=0; i < count; i++)
    {
      test_kernel<<<1,1,0>>>();
    }
  hipDeviceSynchronize();
  return 0;
}

The current behavior is:

$ hipcc -O3 test.cpp
$ omniperf profile -n test --no-roof -- ./a.out
<...>
$ omniperf analyze --dispatch 1 -p workloads/test/mi200/
<...>
Traceback (most recent call last):
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf", line 917, in <module>
    main()
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf", line 897, in main
    analyze(args)
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/omniperf_analyze.py", line 282, in analyze
    run_cli(args, runs)
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/omniperf_analyze.py", line 201, in run_cli
    parser.load_table_data(
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/utils/parser.py", line 725, in load_table_data
    eval_metric(
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/utils/parser.py", line 508, in eval_metric
    ammolite__build_in[key] = eval(compile(s, "<string>", "eval"))
  File "<string>", line 2, in <module>
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/utils/parser.py", line 154, in to_int
    return int(a)
ValueError: cannot convert float NaN to integer

The problem is that when there is an uneven # of dispatches in the data-frame between runs, one (or more) indices gets marked as 'NaN' due to the merge between runs:

(Pdb) p ret_df[schema.pmc_perf_file_prefix]["Index"].astype(str)
0      0.0
1      1.0
2      2.0
3      3.0
4      4.0
      ... 
74    74.0
75    75.0
76    76.0
77    77.0
**78     nan**

Then, when we do the "convert to string and see if that index is in the user's string" test, dispatch 10 will fail, because "10.0" != "10". Thus by the time we get to eval_metrics, we have an empty dataframe (aside: @coleramos425 -- we should handle that more gracefully, e.g., check if the frame is empty, and spit out a helpful warning message)

The fix here is to simply convert the user's dispatch index to an integer, and use that to index into the data-frame directly.

Signed-off-by: Nicholas Curtis <nicurtis@amd.com>

skyreflectedinmirrors · 2023-09-11T14:24:51Z

Hmm, this doesn't fully solve the issue for the most pathological case (e.g., the example I linked), which will go back to the old breakage for dispatch >= 5, which starts to look extremely degenerate:

But, it works for the "sometimes I launch this kernel 3 times, sometimes 4" type of codes. Probably what I was referring to was that https://github.com/AMDResearch/omniperf/blob/cc5bba19f4ed60310371ca2cea0980f461614f9b/src/omniperf_analyze/utils/parser.py#L757 is insufficient. We'll have to think about better ways to detect this

coleramos425 · 2023-09-11T17:16:50Z

Thanks for the addition, @arghdos . We'll get this merged.

this doesn't fully solve the issue for the most pathological case (e.g., the example I linked), which will go back to the old breakage for dispatch >= 5

I wasn't able to reproduce this breakage after checking out the PR. My understanding is that timestamp columns (i.e. BeginNs and EndNs) shouldn't have any NaN's. Since your test code launches +1 kernels each run, the timestamp replacement should insert timestamps for each dispatch. Am I missing something?

*Note: For a truly random case like below, this could trigger a reproducer since the timestamp replacement run could generate fewer dispatches than the target replacement file, pmc_perf.csv

int  my_rand = 1 + (rand() % 100)
for (int i = 0; i < my_rand; i++){
   test_kernel<<<1,1,0>>>();
}
hipDeviceSynchronize();

skyreflectedinmirrors · 2023-09-11T18:08:41Z

[AMD Official Use Only - General] I might have collected the data w/ my fork (which is slightly stale in the joining stuff). If you don't see any NaN's in the timestamps, then I think we should be OK

…

________________________________ From: Cole Ramos ***@***.***> Sent: Monday, September 11, 2023 1:17 PM To: AMDResearch/omniperf ***@***.***> Cc: Curtis, Nicholas ***@***.***>; Mention ***@***.***> Subject: Re: [AMDResearch/omniperf] fix for stoichastic kernel dispatch selection (PR #183) Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. Thanks for the addition, @arghdos<https://github.com/arghdos> . We'll get this merged. this doesn't fully solve the issue for the most pathological case (e.g., the example I linked), which will go back to the old breakage for dispatch >= 5 I wasn't able to reproduce this breakage after checking out the PR. My understanding is that timestamp columns (i.e. BeginNs and EndNs) shouldn't have any NaN's. Since your test code launches +1 kernels each run, the timestamp replacement should insert timestamps for each dispatch. Am I missing something? *Note: For a truly random case like below, this could trigger a reproducer since the timestamp replacement run could generate fewer dispatches than the target replacement file, pmc_perf.csv int my_rand = 1 + (rand() % 100) for (int i = 0; i < my_rand; i++){ test_kernel<<<1,1,0>>>(); } hipDeviceSynchronize(); Anything I'm missing here? — Reply to this email directly, view it on GitHub<#183 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABRKDCJI4BJBCXVVJAMV2RTXZ5BQ3ANCNFSM6AAAAAA4TNDVLI>. You are receiving this because you were mentioned.Message ID: ***@***.***>

fix for stoichastic kernel dispatch selection

a0955c2

Signed-off-by: Nicholas Curtis <nicurtis@amd.com>

skyreflectedinmirrors force-pushed the fix_for_dispatch_selection_of_stochastic branch from dd5c5ad to a0955c2 Compare September 11, 2023 14:17

coleramos425 merged commit 6cc220d into ROCm:dev Sep 11, 2023
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix for stoichastic kernel dispatch selection #183

fix for stoichastic kernel dispatch selection #183

skyreflectedinmirrors commented Sep 11, 2023 •

edited

skyreflectedinmirrors commented Sep 11, 2023

coleramos425 commented Sep 11, 2023 •

edited

skyreflectedinmirrors commented Sep 11, 2023 via email

fix for stoichastic kernel dispatch selection #183

fix for stoichastic kernel dispatch selection #183

Conversation

skyreflectedinmirrors commented Sep 11, 2023 • edited

skyreflectedinmirrors commented Sep 11, 2023

coleramos425 commented Sep 11, 2023 • edited

skyreflectedinmirrors commented Sep 11, 2023 via email

skyreflectedinmirrors commented Sep 11, 2023 •

edited

coleramos425 commented Sep 11, 2023 •

edited