Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix for stoichastic kernel dispatch selection #183

Conversation

skyreflectedinmirrors
Copy link
Contributor

@skyreflectedinmirrors skyreflectedinmirrors commented Sep 11, 2023

Fixes an issue when trying to select a specific --dispatch <N> on runs w/ 'stoichastic' dispatches (e.g., you don't have the same number of launches of a specific kernel between runs), e.g. from our stochastic test code (I think this was originally Stan's? Thanks Stan!)

#include <hip/hip_runtime.h>
#include <stdio.h>

__global__ void test_kernel(void)
{
  printf("Hello there\n");
}

int main()
{
  FILE *fp;
  fp = fopen("memory.txt", "a+");
  int count = 0;
  for (char c = getc(fp); c != EOF; c = getc(fp))
    if (c == '\n') // Increment count if this character is newline
      count = count + 1;

  fprintf(fp, "new line \n");
  fclose(fp);
  printf("Running kernel %d times\n", count);

  for (int i=0; i < count; i++)
    {
      test_kernel<<<1,1,0>>>();
    }
  hipDeviceSynchronize();
  return 0;
}

The current behavior is:

$ hipcc -O3 test.cpp
$ omniperf profile -n test --no-roof -- ./a.out
<...>
$ omniperf analyze --dispatch 1 -p workloads/test/mi200/
<...>
Traceback (most recent call last):
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf", line 917, in <module>
    main()
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf", line 897, in main
    analyze(args)
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/omniperf_analyze.py", line 282, in analyze
    run_cli(args, runs)
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/omniperf_analyze.py", line 201, in run_cli
    parser.load_table_data(
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/utils/parser.py", line 725, in load_table_data
    eval_metric(
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/utils/parser.py", line 508, in eval_metric
    ammolite__build_in[key] = eval(compile(s, "<string>", "eval"))
  File "<string>", line 2, in <module>
  File "/home/nicurtis/miniconda/envs/omniperf/bin/omniperf_analyze/utils/parser.py", line 154, in to_int
    return int(a)
ValueError: cannot convert float NaN to integer

The problem is that when there is an uneven # of dispatches in the data-frame between runs, one (or more) indices gets marked as 'NaN' due to the merge between runs:

(Pdb) p ret_df[schema.pmc_perf_file_prefix]["Index"].astype(str)
0      0.0
1      1.0
2      2.0
3      3.0
4      4.0
      ... 
74    74.0
75    75.0
76    76.0
77    77.0
**78     nan**

Then, when we do the "convert to string and see if that index is in the user's string" test, dispatch 10 will fail, because "10.0" != "10". Thus by the time we get to eval_metrics, we have an empty dataframe (aside: @coleramos425 -- we should handle that more gracefully, e.g., check if the frame is empty, and spit out a helpful warning message)

The fix here is to simply convert the user's dispatch index to an integer, and use that to index into the data-frame directly.

Signed-off-by: Nicholas Curtis <nicurtis@amd.com>
@skyreflectedinmirrors skyreflectedinmirrors force-pushed the fix_for_dispatch_selection_of_stochastic branch from dd5c5ad to a0955c2 Compare September 11, 2023 14:17
@skyreflectedinmirrors
Copy link
Contributor Author

Hmm, this doesn't fully solve the issue for the most pathological case (e.g., the example I linked), which will go back to the old breakage for dispatch >= 5, which starts to look extremely degenerate:

image

But, it works for the "sometimes I launch this kernel 3 times, sometimes 4" type of codes. Probably what I was referring to was that https://github.com/AMDResearch/omniperf/blob/cc5bba19f4ed60310371ca2cea0980f461614f9b/src/omniperf_analyze/utils/parser.py#L757 is insufficient. We'll have to think about better ways to detect this

@coleramos425
Copy link
Collaborator

coleramos425 commented Sep 11, 2023

Thanks for the addition, @arghdos . We'll get this merged.

this doesn't fully solve the issue for the most pathological case (e.g., the example I linked), which will go back to the old breakage for dispatch >= 5

I wasn't able to reproduce this breakage after checking out the PR. My understanding is that timestamp columns (i.e. BeginNs and EndNs) shouldn't have any NaN's. Since your test code launches +1 kernels each run, the timestamp replacement should insert timestamps for each dispatch. Am I missing something?

*Note: For a truly random case like below, this could trigger a reproducer since the timestamp replacement run could generate fewer dispatches than the target replacement file, pmc_perf.csv

int  my_rand = 1 + (rand() % 100)
for (int i = 0; i < my_rand; i++){
   test_kernel<<<1,1,0>>>();
}
hipDeviceSynchronize();

@skyreflectedinmirrors
Copy link
Contributor Author

skyreflectedinmirrors commented Sep 11, 2023 via email

@coleramos425 coleramos425 merged commit 6cc220d into ROCm:dev Sep 11, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants