WiP: Support derived types and make kernels look more like original Fortran code #40

Open
wants to merge 402 commits into
base: develop-acc

domcharrier
Collaborator

@domcharrier domcharrier commented Dec 14, 2021

TBC

Major changes:

  • New GPUFORTRT runtime, replaces gpufort_acc_runtime: Complete C++ rewrite of the gpufort_acc_runtime
    • Adds C header file gpufortrt_api.h and a Fortran module gpufortrt_api.f90,
      so that it can be accessed from C/C++ and Fortran
    • Adds logging with 5+ log levels (controllable via environment variable)

Other changes:

  • bin/gpufort and bin/gpufortfc: Add options to convert a file only partially, e.g. to convert only OpenACC compute constructs without translating any other OpenACC directives.

Changes (unsorted):

  • Rewrite and break up monolithic templates into individual macros. Use code generation to create python render methods from the macros. This makes it possible to:
    • more flexibly use templates.
    • more easily test template-based code generation.
    • use templates in other GPUFORT python modules.
    • use GPUFORT functionality in other python apps
  • Anticipate new kernel code generation backends:
    • Split fort2hip into generic fort2x part and fort2x.hip part
    • Put abstract codegen base classes and generators into fort2x folder
      and HIP specific parts into subfolder fort2x/hip.
  • Add module fort2x.hip that provides routines for creating HIP code generators based on string input
    (Inputs: Fortran declaration list snippet & annotated Fortran loop snippet)
  • Replace hip_auto launcher by hip_ps (ps: problem size), where the first argument is a problem size dim3 that is derived from the range of the translated loop nest.
  • Improve performance of linemapper's preprocessor by using python string & regex features instead of pyparsing when evaluating macros or expressions.
  • CUDA Fortran
    • Support of fixed size device and pinned arrays in programs and procedures.

* Token-based allocate parser decomposes deallocate/allocate
  statements into a list of variable tuples, each consisting of
  the variable's name plus a list of its range
  triples (lbound, ubound, stride). It also parses the
  `stat` expression if present.
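
  The decomposition described above could look roughly like the following sketch. It is not the parser from this PR: `split_top_level` and `parse_allocate` are hypothetical names, the sketch works on characters instead of a token stream, and it does not handle derived type members that themselves carry an index list.

```python
import re

def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts

def parse_allocate(statement):
    """Hypothetical helper: return (variables, stat), where variables is a
    list of (name, [(lbound, ubound, stride), ...]) tuples."""
    match = re.fullmatch(r"\s*(?:de)?allocate\s*\((.*)\)\s*", statement, re.IGNORECASE)
    if match is None:
        raise ValueError("not an allocate/deallocate statement")
    variables, stat = [], None
    for item in split_top_level(match.group(1)):
        key, eq, value = item.partition("=")
        if eq and key.strip().lower() == "stat":
            stat = value.strip()
            continue
        name, paren, rest = item.partition("(")
        triples = []
        if paren:
            # Drop the closing parenthesis, then split the index list.
            for rng in split_top_level(rest.rstrip()[:-1]):
                fields = split_top_level(rng, ":")
                if len(fields) == 1:
                    # A single expression is the upper bound.
                    triples.append((None, fields[0], None))
                else:
                    fields += [None] * (3 - len(fields))
                    triples.append(tuple(fields[:3]))
        variables.append((name.strip(), triples))
    return variables, stat

variables, stat = parse_allocate("allocate(a(n), b(0:m, 2:k), stat=ierr)")
```

  Missing bounds and strides are reported as `None` here so that a caller can substitute defaults later; that choice is an assumption of this sketch.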

* Token-based use statement parser now also parses
  module attribute lists, e.g. `use, intrinsic ::`
  plus renamings, e.g. `use, intrinsic :: iso_c_binding, renamed_local_type => c_ptr`.
* Move more analysis functionality from translator
  tree node into dedicated module
* Identify literal strings as a single token.
  Not used yet in the indexer.
* Previous version identified <name> and <name>(\w+)
  as the same tag.
* Groups multiple use statements into a single use statement when
  detecting a group of statements with an 'only' variable list.

  Example:

  use mymod, only: a
  use mymod, only: b_local => b
  use mymod, only: c_local => c

  becomes:

  use mymod, only: a, b_local => b, c_local => c
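
  A minimal sketch of this grouping, assuming the statements arrive as plain strings: `group_use_only` is a hypothetical name and the regular expression is a simplification, not the token-based parser used in this PR.

```python
import re

# Matches 'use <module>, only: <items>' (simplified; no module attributes).
USE_ONLY = re.compile(r"\s*use\s+(\w+)\s*,\s*only\s*:\s*(.+?)\s*$", re.IGNORECASE)

def group_use_only(statements):
    """Merge the 'only' lists of use statements that refer to the same module."""
    grouped = {}  # module name -> list of 'only' items, in order of appearance
    for stmt in statements:
        match = USE_ONLY.match(stmt)
        if match is None:
            raise ValueError(f"not a 'use ..., only:' statement: {stmt!r}")
        module, items = match.group(1).lower(), match.group(2)
        grouped.setdefault(module, []).extend(
            item.strip() for item in items.split(","))
    return [f"use {module}, only: {', '.join(items)}"
            for module, items in grouped.items()]

merged = group_use_only([
    "use mymod, only: a",
    "use mymod, only: b_local => b",
    "use mymod, only: c_local => c",
])
```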

* Groups multiple use statements into two use statements when
  detecting a group of statements with a variable renaming list:

  Example:

  use mymod, a_local => a
  use mymod, b_local => b # 'a_local' is available again as 'a'
  use mymod, c_local => c # 'a_local', 'b_local' are available
        # again as 'a', 'b'; 'c' is only available as 'c_local'

  becomes:

  use mymod, only: a_local => a, b_local => b
  use mymod, c_local => c
* Generate a C++ file for toplevel device procedures
  as well.
* util/parsing.py:is_assignment: Do not join the operands
  after splitting the tokens on the highest level with respect
  to the `=` operator before passing the first operand
  to `parse_lvalue`.

 * This commit fixes issues where do-loop related token streams like
  ["do","100","i"] were incorrectly
  identified as identifiers like "do100i".
... of numbers with user-defined precision

* Example:

"a>1._dp.and."

should be tokenized as

["a",">","1._dp",".and."]

Before this commit, it was incorrectly tokenized as:

["a",">","1","._dp.","and","."]

because any match of \.\w+\. was identified as a Fortran operator.

* This commit hardcodes all named Fortran operators in
  the regular expression used by the tokenizer.
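
The idea can be sketched as follows. This is not the tokenizer from this PR: `TOKEN`, `tokenize`, and the exact operator list are assumptions made for illustration. The negative lookahead additionally keeps a literal such as `1` in `1.eq.2` from swallowing the dot of the operator that follows it.

```python
import re

# Named Fortran operators and logical literals, spelled out explicitly
# instead of matching any '\.\w+\.' pattern.
NAMED = ["neqv", "eqv", "and", "or", "not",
         "eq", "ne", "lt", "le", "gt", "ge", "true", "false"]
NAMED_ALT = "|".join(NAMED)

TOKEN = re.compile(
    r"\.(?:" + NAMED_ALT + r")\."                  # .and., .eq., .true., ...
    # Numbers with optional exponent and kind suffix, e.g. 1._dp, 2.5e-3.
    # The lookahead rejects '1.' when the dot starts a named operator.
    r"|(?:\d+\.(?!(?:" + NAMED_ALT + r")\.)\d*|\.\d+|\d+)"
    r"(?:[ed][+-]?\d+)?(?:_\w+)?"
    r"|[a-z_]\w*"                                  # identifiers and keywords
    r"|==|/=|<=|>=|\*\*|//|=>|::"                  # multi-character operators
    r"|\S",                                        # any other single character
    re.IGNORECASE,
)

def tokenize(text):
    """Return the list of tokens found in a Fortran expression string."""
    return TOKEN.findall(text)

tokens = tokenize("a>1._dp.and.")
```

For the example above, `tokens` is `["a", ">", "1._dp", ".and."]`, because `._dp.` is no longer accepted as an operator.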

Further:

* Makes certain private routines in util.parsing.parsing.py
  public to make it possible to test them. Further renames
  them to have a `_fortran_` substring.
* translator: Support case statements with multiple values

* python/converter: Remove "-std=f2008" from `--print-gfortran-config`
  flags as forcing the standard is an issue in codes where
  non-standard intrinsics such as `dfloat` are used

Minor:

* namespacegen: Ensure config flags are actually used;
                pass subprocess command as string, not as list of string
* Consider gfortran intrinsics in indexer.scope
* indexer.scope: Add routine to check if a name
  matches that of an intrinsic.
* indexer.scope: Add routine to solely lookup implicitly
  declared variables according to given
  scope's implicit spec.
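
  A lookup of this kind might be sketched as below. The function names and the dictionary representation of the implicit spec are assumptions for illustration and not the indexer.scope API. The default mapping reflects Fortran's standard implicit typing rule (names starting with i through n are integer, all others real).

```python
import string

def default_implicit_spec():
    """Fortran's default implicit typing: i-n are integer, all others real."""
    spec = {letter: "real" for letter in string.ascii_lowercase}
    for letter in "ijklmn":
        spec[letter] = "integer"
    return spec

def lookup_implicit_type(name, implicit_spec):
    """Return the implicit type of name, or None if the first letter has no
    rule (e.g. an empty spec modelling 'implicit none')."""
    return implicit_spec.get(name[0].lower())

spec = default_implicit_spec()
# Model 'implicit double precision (a-h,o-z)' on top of the defaults.
for letter in "abcdefgh" + "opqrstuvwxyz":
    spec[letter] = "double precision"

idx_type = lookup_implicit_type("idx", spec)   # first letter 'i'
x_type = lookup_implicit_type("X", spec)       # lookup is case-insensitive
```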
... when checking if an expression is a tensor access
or function call before generating C code.

Now checking in the following order:

  1. Is explicitly declared array / implicitly declared array (via
          DIMENSION)?
  2. Is explicitly declared procedure?
  3. Is intrinsic?

If the expression cannot be associated with any
of the above, it must be a variable from an ignored module
or an external procedure, which will be available only
at the linking stage. This check is still missing.
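
The lookup order above can be sketched as a small function. The name `classify_call_like_expression`, the set-based scope representation, and the `"unresolved"` fallback label are assumptions for illustration, not the translator's actual interface.

```python
def classify_call_like_expression(name, arrays, procedures, intrinsics):
    """Classify 'name(...)' by checking arrays first, then explicitly
    declared procedures, then intrinsics."""
    key = name.lower()  # Fortran names are case-insensitive
    if key in arrays:
        return "array access"
    if key in procedures:
        return "function call"
    if key in intrinsics:
        return "intrinsic call"
    # Variable from an ignored module or external procedure; cannot be
    # resolved before the linking stage.
    return "unresolved"

# Hypothetical scope contents for demonstration only.
arrays = {"a", "min"}          # a user array may shadow an intrinsic name
procedures = {"compute_flux"}
intrinsics = {"min", "max", "abs"}

kinds = [classify_call_like_expression(n, arrays, procedures, intrinsics)
         for n in ("A", "min", "compute_flux", "abs", "ext_func")]
```

Checking arrays before intrinsics means that a declared array named like an intrinsic is treated as an array access, which matches the order listed above.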
* Result type of an (accelerator) procedure
  might also be implicitly defined.
* This commit tries to resolve the type
  of an accelerator procedure also
  by checking the implicit spec of the current scope.
* Emit prototype for accelerator procedures
  so that they can be defined in any particular
  order afterwards and still depend on each other.
* Add test
* Integration into indexer still missing ...
* Only introduce a C++ parameter namespace for non-device procedures
  and program if that program unit contains
  compute constructs.
* Significant translation time reduction possible.
* Classify function call/array access expressions
  as array access, function call, or intrinsic call.
* Improve detection of intrinsics
* Add missing value attributes to interfaces declared in
  gpufortrt_wait* Fortran runtime function.
Development

Successfully merging this pull request may close these issues.

Add "convert only compute constructs" option (and the alternative)
1 participant