Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenMP Synch_p2p depends on x86 #611

Closed
jeffhammond opened this issue Jul 11, 2022 · 7 comments · Fixed by #627
Closed

OpenMP Synch_p2p depends on x86 #611

jeffhammond opened this issue Jul 11, 2022 · 7 comments · Fixed by #627

Comments

@jeffhammond
Copy link
Member

jeffhammond commented Jul 11, 2022

Nondeterministic failures in OpenMP Synch_p2p on Graviton 3 suggest the code depends on x86 memory model behavior and needs fixing.

[jhammond@c7g-dy-c7g16xlarge-1 Synch_p2p]$ ./p2p 8 10 8000 8000
Parallel Research Kernels version 2.17
OpenMP pipeline execution on 2D grid
Number of threads         = 8
Grid sizes                = 8000, 8000
Number of iterations      = 10
Neighbor thread handshake = on
Solution validates
Rate (MFlops/s): 13230.540174 Avg time (s): 0.009672
[jhammond@c7g-dy-c7g16xlarge-1 Synch_p2p]$ ./p2p 8 10 8000 8000
Parallel Research Kernels version 2.17
OpenMP pipeline execution on 2D grid
Number of threads         = 8
Grid sizes                = 8000, 8000
Number of iterations      = 10
Neighbor thread handshake = on
ERROR: checksum 143982.000000 does not match verification value 175978.000000

This pattern is almost certainly wrong in general (i.e. not x86).

    if (TID==0) { /* first thread waits for corner value to be copied            */
      while (flag(0,0) == true) {
        #pragma omp flush
      }
#if SYNCHRONOUS
      flag(0,0)= true;
      #pragma omp flush
#endif
    }
@jeffhammond
Copy link
Member Author

@tgmattso if you are looking for an excuse to write examples of OpenMP atomics, this is a good one.

@jeffhammond jeffhammond changed the title OpenMP Synch_p2p probably depends on x86 OpenMP Synch_p2p depends on x86 Jul 11, 2022
@tgmattso
Copy link

tgmattso commented Jul 12, 2022

Yes, this code is wrong. It works on x86 since we apply relaxed atomics in the x86 memory model. The current code is based on the very old, original OpenMP. We realized we needed a more flexible mechanism built around atomics which we added in OpenMP 4.0. I haven't tested this yet in syncP2P, but the right patter is the following.

    if (TID==0) { /* first thread waits for corner value to be copied            */
      while (1) {
        #pragma omp atomic read seq_cst
            flg_tmp = flag(0,0);
        if (flg_tmp == true)  break;
      }
#if SYNCHRONOUS
      #pragma omp atomic write seq_cst
         flag(0,0)= true;
#endif
    }

@tgmattso tgmattso reopened this Jul 12, 2022
@tgmattso
Copy link

sorry, I accidently marked this as closed. It's not closed until I test and verify the code.

@rfvander
Copy link
Contributor

Geez, who wrote that shitty code?

@tgmattso
Copy link

I don't know. :)

But I take considerable blame for this since for many years, that is how I told people to handle point to point synchronization in OpenMP. The real problem is the OpenMP specification prior to OpenMP 4.0. Those of us creating it didn't know better and created an API with insufficient atomics to write such code. Shame on them.

@rfvander
Copy link
Contributor

What's funny is that as I was implementing the synch_p2p code I looked for examples of the type of synchronization needed and landed on the LU NAS Parallel Benchmark. I soon discovered it was wrong and had been wrong for years. I was very proud of my discovery and made sure not to repeat the mistake with synch_p2p. Or so I thought ...

@jeffhammond
Copy link
Member Author

i finally got around to fixing this. hopefully at least one of you has a chance to review it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants