WiP: New OpenACC features for GPUFORT runtime #29

Open: wants to merge 76 commits into main

domcharrier (Collaborator) commented Oct 28, 2021

FEATURES:

  • OpenACC (gpufort runtime)

    • offloaded loops:
      • add default clause handling
      • default strategy is present_or_copy if neither default(none) nor default(present) is specified.
    • Initial support for acc declare (module/subroutine/function/program variables, fixed-size, allocatable, pointer)
  • Add interoperable GPUFORT array datatype (up to 7 dimensions; autogenerated):

    • Manage host and device pointer pair (can be null)
      • Either wrap existing pointers or
      • Allocate (pinned) host memory if requested
      • Allocate device memory if requested
    • Provide H2D, D2H copy operations
    • Encode bounds and sizes of a Fortran array
    • Can be configured to perform H2D, D2H copy operations at init/destruction
    • In C++, equipped with operator()(int i1, int i2, ...) to support Fortran-style array indexing
      in C++ code, so no index macros are needed.
  • Will be used by GPUFORT to construct interoperable derived types from non-interoperable device types.
    This will allow AoS syntax such as
    domains(5).cells(i).coord_x in GPUFORT C++ code, which is analogous to the Fortran expression
    domains(5)%cells(i)%coord_x.
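To illustrate what Fortran-style `operator()` indexing means, here is a minimal 2-D sketch. The class name `fortran_array2d` and its constructor are hypothetical and not the autogenerated GPUFORT array type, which covers up to 7 dimensions and also manages the host/device pointer pair. The sketch only shows the index arithmetic: arbitrary lower bounds and column-major layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D example; GPUFORT's generated type covers up to 7
// dimensions and also manages a device pointer, which is omitted here.
template <typename T>
class fortran_array2d {
 public:
  // lb1/lb2: Fortran lower bounds; n1/n2: extents of each dimension.
  fortran_array2d(int lb1, int n1, int lb2, int n2)
      : lb1_(lb1), n1_(n1), lb2_(lb2), n2_(n2),
        data_(static_cast<std::size_t>(n1) * static_cast<std::size_t>(n2)) {}

  // Column-major (Fortran) offset: the first index varies fastest.
  std::size_t offset(int i1, int i2) const {
    assert(i1 >= lb1_ && i1 < lb1_ + n1_);  // bounds check in debug builds
    assert(i2 >= lb2_ && i2 < lb2_ + n2_);
    return static_cast<std::size_t>(i1 - lb1_) +
           static_cast<std::size_t>(i2 - lb2_) * static_cast<std::size_t>(n1_);
  }

  // Fortran-style element access, e.g. a(i, j) instead of an index macro.
  T& operator()(int i1, int i2) { return data_[offset(i1, i2)]; }

 private:
  int lb1_, n1_, lb2_, n2_;
  std::vector<T> data_;
};
```

For example, the Fortran declaration `real :: a(1:3, 0:1)` would correspond to `fortran_array2d<float> a(1, 3, 0, 2)`, and `a(3, 1)` addresses the same element in both languages.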

BUGFIXES:

  • Fix crash when encountering top-level subroutines/functions.

* Reason: The translator needs to know all variables in the loop kernel
body to generate `present_or_copy`/`present_or_copyin` runtime calls
for the variables not appearing in clauses.
* If no default clause is specified, present_or_copy is performed for all
 unmapped variables.
* If `default(none)` is specified and not all variables are mapped, a warning
 is posted (an option to convert it to an error will be added).
* If `default(present)` is specified,

TODO: Take the parent data directive into account to avoid
some unnecessary runtime calls (if the current behaviour turns out to be a performance issue).
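The implicit-mapping rules listed above can be sketched as a small decision function. All names here (`default_clause`, `map_action`, `implicit_action`) are hypothetical. The source does not finish the `default(present)` case, so mapping it to a plain `present` check is an assumption, and what happens after the `default(none)` warning is left to the caller.

```cpp
#include <cstdio>
#include <set>
#include <string>

// Hypothetical names; a sketch of the rules above, not GPUFORT code.
enum class default_clause { unspecified, none, present };
enum class map_action { explicit_clause, present_or_copy, present, warn_unmapped };

// Decide what to do with a variable that appears in the loop kernel body.
map_action implicit_action(const std::string& var,
                           const std::set<std::string>& mapped_in_clauses,
                           default_clause dflt) {
  if (mapped_in_clauses.count(var) > 0) {
    return map_action::explicit_clause;  // handled by the clause itself
  }
  switch (dflt) {
    case default_clause::none:
      // default(none): post a warning for the unmapped variable.
      std::fprintf(stderr, "warning: '%s' is not mapped (default(none))\n",
                   var.c_str());
      return map_action::warn_unmapped;
    case default_clause::present:
      return map_action::present;  // assumption: only a presence check
    case default_clause::unspecified:
      break;
  }
  // No default clause: present_or_copy for every unmapped variable.
  return map_action::present_or_copy;
}
```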
* The vector-add-declare/vector-add.f90 example with declare in a program seems
  to work correctly
* Some more care is required to enable declare in subroutines
FEATURE: gpufort acc runtime behavior can now be influenced/tuned
 via the following environment variables:
 GPUFORT_LOG_LEVEL                    (default=0)
   Log level. The maximum log level used in the code is 3.
 GPUFORT_MAX_QUEUES                   (default=64)
   Maximum number of async queues.
 GPUFORT_INITIAL_RECORD_LIST_CAPACITY (default=4096)
   Mapping records are managed via a vector. Specify
   the initial vector capacity via this variable.
   If the current capacity is reached, the vector
   capacity is doubled.
 GPUFORT_BLOCK_SIZE                   (default=32)
   All device arrays are allocated as multiples of
   this block size.
 GPUFORT_REUSE_THRESHOLD              (default=0.9)
   Reuse an existing device buffer if the requested
   buffer size is greater than GPUFORT_REUSE_THRESHOLD x the size of
   the already existing buffer.
 GPUFORT_NUM_REFS_TO_DEALLOCATE       (default=-5)
   Number of references for which a released array
   will be deallocated.
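A minimal sketch of how such settings could be read with `std::getenv`, falling back to the defaults listed above. The struct `runtime_options` and the helpers `env_int`, `env_double`, and `read_runtime_options` are hypothetical; only the variable names and default values come from the list.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical container; the member defaults mirror the documented defaults.
struct runtime_options {
  int log_level = 0;
  int max_queues = 64;
  int initial_record_list_capacity = 4096;
  int block_size = 32;
  double reuse_threshold = 0.9;
  int num_refs_to_deallocate = -5;
};

// Read an integer environment variable; use `fallback` if the variable is
// unset or no integer can be parsed from its beginning.
int env_int(const char* name, int fallback) {
  const char* value = std::getenv(name);
  if (value == nullptr) return fallback;
  try {
    return std::stoi(value);
  } catch (...) {  // std::invalid_argument or std::out_of_range
    return fallback;
  }
}

// Same as env_int, for floating-point settings.
double env_double(const char* name, double fallback) {
  const char* value = std::getenv(name);
  if (value == nullptr) return fallback;
  try {
    return std::stod(value);
  } catch (...) {
    return fallback;
  }
}

runtime_options read_runtime_options() {
  runtime_options opts;
  opts.log_level = env_int("GPUFORT_LOG_LEVEL", opts.log_level);
  opts.max_queues = env_int("GPUFORT_MAX_QUEUES", opts.max_queues);
  opts.initial_record_list_capacity = env_int(
      "GPUFORT_INITIAL_RECORD_LIST_CAPACITY", opts.initial_record_list_capacity);
  opts.block_size = env_int("GPUFORT_BLOCK_SIZE", opts.block_size);
  opts.reuse_threshold =
      env_double("GPUFORT_REUSE_THRESHOLD", opts.reuse_threshold);
  opts.num_refs_to_deallocate =
      env_int("GPUFORT_NUM_REFS_TO_DEALLOCATE", opts.num_refs_to_deallocate);
  // Debug-build sanity check: a non-positive block size makes no sense.
  assert(opts.block_size > 0);
  return opts;
}
```

For example, running a program with `GPUFORT_BLOCK_SIZE=64` set in the environment would make `read_runtime_options().block_size` return 64.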

OPTIMIZATION: gpufort acc runtime will now try to
reuse device arrays that have been previously
released but not deallocated yet. Behavior can be tuned
via the environment variables GPUFORT_REUSE_THRESHOLD,
GPUFORT_NUM_REFS_TO_DEALLOCATE and, to some extent,
GPUFORT_BLOCK_SIZE.
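The reuse criterion can be sketched as follows, assuming that a released buffer is a candidate only if it is at least as large as the (block-rounded) request, and that the request must also exceed `GPUFORT_REUSE_THRESHOLD` times the buffer size so that little memory is wasted. The functions `round_up_to_block`, `can_reuse`, and `find_reusable` are hypothetical, and the reference counting controlled by `GPUFORT_NUM_REFS_TO_DEALLOCATE` is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round a requested size up to the next multiple of the block size
// (cf. GPUFORT_BLOCK_SIZE).
std::size_t round_up_to_block(std::size_t num_bytes, std::size_t block_size) {
  assert(block_size > 0);
  return ((num_bytes + block_size - 1) / block_size) * block_size;
}

// Assumed criterion: the released buffer must be large enough, and the
// request must be greater than reuse_threshold x the buffer size.
bool can_reuse(std::size_t requested, std::size_t existing,
               double reuse_threshold) {
  return requested <= existing &&
         static_cast<double>(requested) >
             reuse_threshold * static_cast<double>(existing);
}

// Index of the first released buffer that can serve the request, or -1.
long find_reusable(const std::vector<std::size_t>& released_sizes,
                   std::size_t requested, double reuse_threshold) {
  for (std::size_t i = 0; i < released_sizes.size(); ++i) {
    if (can_reuse(requested, released_sizes[i], reuse_threshold)) {
      return static_cast<long>(i);
    }
  }
  return -1;
}
```

With the defaults (block size 32, threshold 0.9), a 90-byte request is rounded up to 96 bytes and would reuse a released 96-byte buffer, but not a 128-byte one, since 96 is not greater than 0.9 x 128 = 115.2.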

BUGFIX/OPTIMIZATION: Look up records starting from the back of the record list.
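A backward lookup over a record vector might look like the sketch below. The `record` struct and `find_record` are hypothetical. Scanning with reverse iterators visits the most recently added records first, and if several records match the same host address, the newest one is returned. Whether this matches GPUFORT's exact bug fix is an assumption.

```cpp
#include <vector>

// Hypothetical mapping record: host address -> device buffer.
struct record {
  const void* hostptr;
  void* deviceptr;
};

// Scan from the most recently added record backwards; if several records
// match, the newest one is returned.
const record* find_record(const std::vector<record>& records,
                          const void* hostptr) {
  for (auto it = records.rbegin(); it != records.rend(); ++it) {
    if (it->hostptr == hostptr) return &*it;
  }
  return nullptr;  // no mapping for this host address
}
```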
* Add `scope` argument to the signature of `_intrnl_inout_arrays_in_subtree`
* Fixes all tests in <gpufort_dir>/python/test/grammar_translator/openacc
* Now additionally detects expressions such as
  ```
  <line 1>&
  !$acc <rest of line 2>
  ```

  and removes the `&\s*\n\s*!$acc` sequence.
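The removal of `&\s*\n\s*!$acc` can be sketched with `std::regex`. The pattern is the one given above (with `$` escaped). The function name `join_acc_continuations` is hypothetical, and case-insensitive matching is an assumption motivated by Fortran being case-insensitive.

```cpp
#include <regex>
#include <string>

// Join a directive line ending in '&' with a following '!$acc' continuation
// line by removing the '&', the line break, and the repeated sentinel.
std::string join_acc_continuations(const std::string& text) {
  static const std::regex pattern(R"(&\s*\n\s*!\$acc)",
                                  std::regex::ECMAScript | std::regex::icase);
  return std::regex_replace(text, pattern, "");
}
```

For example, `"!$acc parallel loop &\n!$acc present(a)"` becomes `"!$acc parallel loop  present(a)"` (with two spaces at the seam, which are harmless in a directive).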
FEATURE/linemapper:
  Allow prepending and appending lines directly to
  statement data structures, not only to whole lines.
  The new data structure triggered changes in all dependent packages (scanner, indexer).
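The idea of attaching extra lines to individual statements, rather than to a whole source line, can be sketched as below. The `statement` struct and `render` function are hypothetical and greatly simplified compared with a real linemapper: each statement keeps its own lists of prepended and appended lines, and rendering emits them around the statement body.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified statement record of a linemapper-like structure.
struct statement {
  std::string body;                    // statement text
  std::vector<std::string> prepended;  // lines to emit before the statement
  std::vector<std::string> appended;   // lines to emit after the statement

  // Each call to prepend() puts its line in front of earlier prepended lines.
  void prepend(const std::string& line) {
    prepended.insert(prepended.begin(), line);
  }
  void append(const std::string& line) { appended.push_back(line); }
};

// Emit all statements of one (possibly multi-statement) line in order.
std::vector<std::string> render(const std::vector<statement>& stmts) {
  std::vector<std::string> out;
  for (const statement& s : stmts) {
    out.insert(out.end(), s.prepended.begin(), s.prepended.end());
    out.push_back(s.body);
    out.insert(out.end(), s.appended.begin(), s.appended.end());
  }
  return out;
}
```

With a line such as `x = 1; y = 2`, code can now be inserted directly before or after `y = 2` without touching `x = 1`.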

FEATURE/gpufort:
  Add option to dump the linemapper data structure
domcharrier added the enhancement (New feature or request) label on Oct 28, 2021
* Fix mismatching argument lists between wrapper and implementation
functions: put the long dummy argument list of gpufort_acc_present_...
in a macro and reuse the macro in the wrapper
(gpufort_acc_runtime) and the implementation
(gpufort_acc_runtime_base).
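The single-source argument list trick can be illustrated with the C preprocessor. The macros `PRESENT_DUMMY_ARGS` and `PRESENT_ACTUAL_ARGS` and the three arguments are hypothetical stand-ins for the much longer list of gpufort_acc_present_...; the point is only that the wrapper and the implementation take their parameter lists from the same macro, so the two cannot drift apart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, shortened argument list used by wrapper and implementation.
#define PRESENT_DUMMY_ARGS \
  const void* hostptr, std::size_t num_bytes, int async_queue
#define PRESENT_ACTUAL_ARGS hostptr, num_bytes, async_queue

namespace runtime_base {
// Stand-in for the implementation; it only echoes num_bytes back.
inline std::size_t present_impl(PRESENT_DUMMY_ARGS) {
  assert(hostptr != nullptr);
  (void)async_queue;  // unused in this sketch
  return num_bytes;
}
}  // namespace runtime_base

// Stand-in for the wrapper: same dummy arguments, forwarded unchanged.
inline std::size_t present_wrapper(PRESENT_DUMMY_ARGS) {
  return runtime_base::present_impl(PRESENT_ACTUAL_ARGS);
}
```

Adding an argument then only requires editing the two macros, and the compiler rejects any call site that was not updated.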

minor/unrelated:
* Rename internal function in gpufort.py (parse_cl_args)
domcharrier changed the title from "New OpenACC features for GPUFORT runtime" to "WiP: New OpenACC features for GPUFORT runtime" on Nov 25, 2021
domcharrier self-assigned this on Nov 25, 2021
domcharrier and others added 6 commits on November 25, 2021 05:52
TODO: Improve the translator test suite
to improve declaration parsing.
* Fix issue with parsing declarations that have '=>' in the
  RHS of a declared variable.
* The RHS of a declared variable can now be a logical expression too.
* Add more rigorous tests for declarations.
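One way to separate a pointer initializer (`=>`) from an ordinary one (`=`) while still allowing logical expressions in the RHS is to inspect only the first `=` of an entity: if it is immediately followed by `>`, the entity has a pointer initializer. Operators such as `>=` or `==` later in the RHS are then left alone. The sketch assumes the input is a single entity taken from after `::` (so `len=` or `kind=` selectors are not present). The names `entity_init`, `trim`, and `split_entity` are hypothetical.

```cpp
#include <string>

// Result of splitting one declared entity.
struct entity_init {
  std::string name;
  std::string op;   // "", "=" or "=>"
  std::string rhs;
};

// Strip leading and trailing blanks and tabs.
static std::string trim(const std::string& s) {
  const auto first = s.find_first_not_of(" \t");
  if (first == std::string::npos) return "";
  const auto last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

// Split e.g. "p => null()" or "flag = n >= 3 .and. ok" at the first '='.
entity_init split_entity(const std::string& entity) {
  const auto pos = entity.find('=');
  if (pos == std::string::npos) return {trim(entity), "", ""};
  if (pos + 1 < entity.size() && entity[pos + 1] == '>') {
    return {trim(entity.substr(0, pos)), "=>", trim(entity.substr(pos + 2))};
  }
  return {trim(entity.substr(0, pos)), "=", trim(entity.substr(pos + 1))};
}
```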
GPUFORT tries to preserve comments.
Unfortunately, this becomes
difficult when a comment begins
after a line continuation character.

GPUFORT will move these comments
to the position before the statement that
contains them.
* Add test to folder python/test/utils
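The comment-moving behavior described above can be sketched as follows: for each line of a multi-line statement, find the first `!` outside a character literal; if the code before it ends with the continuation character `&`, move the comment in front of the statement. The helpers `comment_start`, `rtrim`, and `hoist_continuation_comments` are hypothetical, and GPUFORT's own handling may differ in detail.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Position of the first '!' outside a character literal, or npos.
// Doubled quotes ('it''s') toggle the literal state twice, which is correct.
static std::size_t comment_start(const std::string& line) {
  char quote = '\0';
  for (std::size_t i = 0; i < line.size(); ++i) {
    const char c = line[i];
    if (quote != '\0') {
      if (c == quote) quote = '\0';  // end of the literal
    } else if (c == '\'' || c == '"') {
      quote = c;  // start of a literal
    } else if (c == '!') {
      return i;
    }
  }
  return std::string::npos;
}

// Remove trailing blanks and tabs.
static std::string rtrim(const std::string& s) {
  const auto last = s.find_last_not_of(" \t");
  return last == std::string::npos ? "" : s.substr(0, last + 1);
}

// Move comments that follow a continuation '&' in front of the statement;
// all other lines are kept unchanged.
std::vector<std::string> hoist_continuation_comments(
    const std::vector<std::string>& lines) {
  std::vector<std::string> comments, code;
  for (const std::string& line : lines) {
    const std::size_t pos = comment_start(line);
    if (pos != std::string::npos) {
      const std::string before = rtrim(line.substr(0, pos));
      if (!before.empty() && before.back() == '&') {
        comments.push_back(line.substr(pos));
        code.push_back(before);
        continue;
      }
    }
    code.push_back(line);
  }
  comments.insert(comments.end(), code.begin(), code.end());
  return comments;
}
```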

WiP: BUGFIX: Support comments in multi-line statements