Releases: GlobalArrays/ga

v5.8.2

03 Nov 21:30

[5.8.2]

  • Known Bugs
    • The MPI RMA port remains unreliable: many tests in the test suite fail with
      many MPI implementations. Open MPI 4.1.4 currently works well and passes
      all tests.
  • Added
    • Setting ARMCI_VERBOSE=1 at runtime also dumps configuration details for
      the ComEx runtime
  • Changed
    • Updated compiler settings in CMake build if Fujitsu compilers are detected
  • Fixed
    • Fixed gcc toolchain checks in CMake for clang build
    • Fixed tiled arrays so that they work with restricted arrays and fixed some
      additional bugs in block cyclic distributions
    • Removed several memory leaks
    • Modified the check on the number of processors that is performed during GA
      creation. Previously this check could fail because it could run before a
      process group had been assigned to the global array.
    • Fixed some issues with the hidden string length argument in the Fortran
      interface

v5.8.1

14 Dec 15:48
  • Known Bugs
  • Added
    • Added support in MA for CUDA managed memory. Provided by Jeff Hammond.
    • Added a GA_Deallocate function that deallocates memory but leaves GA in
      place. GA_Allocate can be called later on the handle. This can be used for
      memory management.
  • Changed
  • Fixed
    • Slurm conflict for free_buf symbol in DRA library. Fixed by Michael Klemm.
    • Deallocate GA_MPI_World_comm_dup in GA_Terminate.
    • Dependency of CMake build on C++ is configurable.
    • Improved CMake integration of linear algebra libraries

v5.8

02 Nov 20:08
  • Known Bugs
    • The MPI RMA port remains unreliable for many MPI implementations. Open MPI
      still reports many failures in the test suite. Intel MPI is better but still
      reports several failures. It is recommended to use the latest MPI
      implementations available.
  • Added
    • Version function that can be used to report the current version, subversion
      and patch numbers of the current release
    • Overlay option for creating new GAs on top of existing GAs
    • The number of progress ranks per node in the progress ranks runtime is now
      configurable
    • Functions for duplicating process groups and returning a process group that
      only contains the calling process
    • 64-bit versions of the block-cyclic data distribution functions in the
      C interface
    • Non-blocking test function
    • Read-only property based on caching
    • GA name can be recovered from handle
    • Added profiling capabilities to the GA branch that automatically generate
      a log file in the running directory. The GAW_FILE_PREFIX environment
      variable adds a prefix to the log file names, and the GAW_FMT environment
      variable selects either CSV or human-readable output. The default format
      is human readable.
      • For autotools, add --enable-profile=1 to the configure line
      • For CMake, add -DENABLE_PROFILING=ON
  • Changed
    • Non-blocking handle management was completely revamped. This simplifies
      implementation and removes some bugs. The number of outstanding non-blocking
      calls was increased to 256
    • Modified the internal function that computes the rank of processors on the
      world communicator so that it no longer uses MPI_Comm_translate_ranks.
      MPI_Comm_translate_ranks is implemented with a loop that scales as the
      square of the number of processors and is very slow at large processor
      counts
    • Modified internal iterators so that block-cyclic data distributions work on
      processor groups
    • Improved CMake build
    • Modified ga_print_distribution so that it works on block-cyclic data
      distributions
  • Fixed
    • Fixed a non-blocking error that was showing up in nbtest.x
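
The rank-computation change listed under Changed for this release can be illustrated with a toy model. The sketch below is not GA's implementation and does not call MPI. It assumes, purely for illustration, that a group is a list whose i-th entry is the world rank of member i, and the function names are hypothetical. It contrasts a nested-loop translation, whose total cost grows with the square of the group size when every rank is translated, with a lookup table built in a single pass.

```python
def translate_ranks_quadratic(group_a, ranks, group_b):
    """Translate ranks in group_a to ranks in group_b by linear search.

    Each group is a list whose i-th entry is the world rank of member i.
    Members of group_a that are absent from group_b map to None.
    """
    out = []
    for r in ranks:
        world = group_a[r]
        found = None
        for j, w in enumerate(group_b):  # one full scan per translated rank
            if w == world:
                found = j
                break
        out.append(found)
    return out


def translate_ranks_linear(group_a, ranks, group_b):
    """Same result, using one dictionary built up front."""
    index_in_b = {w: j for j, w in enumerate(group_b)}
    return [index_in_b.get(group_a[r]) for r in ranks]


# Hypothetical groups, chosen only to exercise both functions.
group_a = [4, 2, 7, 0]
group_b = [0, 2, 4, 6]
print(translate_ranks_quadratic(group_a, [0, 1, 2, 3], group_b))
print(translate_ranks_linear(group_a, [0, 1, 2, 3], group_b))
```

Both functions return the same list. Only the second avoids rescanning the target group for every rank.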

v5.7.2

28 Feb 20:18
  • Fixed
    • Strided accumulates were accidentally set to use MPI datatypes in v5.7.1.
      This has been turned off.

v5.7.1

28 Feb 19:32

v5.7

30 Mar 22:41
  • Known Bugs
    • Some combinations of MPI implementations with the MPI RMA and PR
      ports fail. It is recommended to use the latest MPI implementations
      available.
  • Added
    • Tiled data layout
    • Read-only property type using replication across SMP nodes
  • Changed
    • GA is now thread safe

    • MPI3 implementation based on MPI RMA now uses data types in MPI
      calls by default. This is higher performing but not as reliable as
      using multiple contiguous data transfers. The build can be
      configured to use contiguous transfers if data types are not working
      for your MPI implementation.

    • ComEx MPI-PR now uses MPI data types in strided put and get calls
      by default. To enable the old packed behavior, set the following
      environment variables to 0.

      • COMEX_ENABLE_PUT_DATATYPE
      • COMEX_ENABLE_GET_DATATYPE

      Additionally, the original packing implementation is faster for smaller
      messages. Two new environment variables control at which point the MPI
      data types are used.

      • COMEX_PUT_DATATYPE_THRESHOLD. Default 8192.
      • COMEX_GET_DATATYPE_THRESHOLD. Default 8192.
  • Fixed
    • Message sizes exceeding 2GB now work correctly
    • Mirrored arrays now distribute data across SMP nodes for
      ComEx-based runtimes
    • Matrix multiply works for non-standard data layouts (may not be
      performant)
  • Closed Issues
    • [#48] Message sizes exceeding 2GB may not work correctly
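
The interaction between the ComEx MPI-PR datatype switches and thresholds described under Changed can be sketched as a small decision function. This is an illustration, not ComEx code. The function name is hypothetical, and three details are assumptions because the notes do not state them: that the threshold is measured in bytes, that a message exactly at the threshold uses MPI data types, and that an unset enable variable behaves like 1.

```python
import os

DEFAULT_PUT_THRESHOLD = 8192  # default given in the release notes


def put_strategy(nbytes, env=None):
    """Return "datatype" or "packed" for a strided put of nbytes."""
    env = os.environ if env is None else env
    # Assumption: an unset variable means the documented default (enabled).
    enabled = int(env.get("COMEX_ENABLE_PUT_DATATYPE", "1")) != 0
    threshold = int(
        env.get("COMEX_PUT_DATATYPE_THRESHOLD", str(DEFAULT_PUT_THRESHOLD))
    )
    # Assumption: messages at or above the threshold use MPI data types.
    if enabled and nbytes >= threshold:
        return "datatype"
    return "packed"


print(put_strategy(1024, {}))
print(put_strategy(65536, {}))
print(put_strategy(65536, {"COMEX_ENABLE_PUT_DATATYPE": "0"}))
```

The same shape would apply to gets with COMEX_ENABLE_GET_DATATYPE and COMEX_GET_DATATYPE_THRESHOLD.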

v5.6.5

30 Mar 17:12
  • Known Bugs
    • [#48] Message sizes exceeding 2GB may not work correctly
  • Added
    • Environment variables to control internal ComEx MPI-PR settings
      • COMEX_MAX_NB_OUTSTANDING. Default 8.
        The maximum number of concurrent non-blocking operations.
      • COMEX_STATIC_BUFFER_SIZE. Default 2097152 bytes.
        Some ComEx operations require a temporary buffer. Any message larger than this size will dynamically allocate and free a new buffer to hold the larger message.
      • COMEX_EAGER_THRESHOLD. Default -1.
        Small messages can be sent as part of other internal ComEx operations. Recommended to set this to less than or equal to the corresponding MPI eager/rendezvous threshold cutoff.
      • COMEX_ENABLE_PUT_SELF. Default 1 (on). Contiguous put will use memcpy when target is same as originator.
      • COMEX_ENABLE_GET_SELF. Default 1 (on). Contiguous get will use memcpy when target is same as originator.
      • COMEX_ENABLE_ACC_SELF. Default 1 (on). Contiguous acc will use memcpy when target is same as originator.
      • COMEX_ENABLE_PUT_SMP. Default 1 (on). Contiguous put will use memcpy when target is on the same host via shared memory.
      • COMEX_ENABLE_GET_SMP. Default 1 (on). Contiguous get will use memcpy when target is on the same host via shared memory.
      • COMEX_ENABLE_ACC_SMP. Default 1 (on). Contiguous acc will use memcpy when target is on the same host via shared memory.
      • COMEX_ENABLE_PUT_PACKED. Default 1 (on). Strided put will pack the data into a contiguous buffer.
      • COMEX_ENABLE_GET_PACKED. Default 1 (on). Strided get will pack the data into a contiguous buffer.
      • COMEX_ENABLE_ACC_PACKED. Default 1 (on). Strided acc will pack the data into a contiguous buffer.
      • COMEX_ENABLE_PUT_IOV. Default 1 (on). Vector put will pack the data into a contiguous buffer.
      • COMEX_ENABLE_GET_IOV. Default 1 (on). Vector get will pack the data into a contiguous buffer.
      • COMEX_ENABLE_ACC_IOV. Default 1 (on). Vector acc will pack the data into a contiguous buffer.
      • COMEX_MAX_MESSAGE_SIZE. Default INT_MAX. All use of MPI will keep buffers less than this size. Sometimes useful in conjunction with eager thresholds to force all use of MPI below the eager threshold.
    • armci-config and comex-config added
      • --blas_size
      • --use_blas
      • --network_ldflags
      • --network_libs
    • ga-config added
      • --blas_size
      • --scalapack_size
      • --use_blas
      • --use_lapack
      • --use_scalapack
      • --use_peigs
      • --use_elpa
      • --use_elpa_2015
      • --use_elpa_2016
      • --network_ldflags
      • --network_libs
  • Changed
    • Removed case statement from install-autotools.sh
  • Fixed
    • install-autotools.sh works on FreeBSD
    • Patch locally built m4 for OSX High Sierra
  • Closed Issues
    • Scalapack with 8-byte integers? [#93]
    • Please clarify what is "peigs" library [#96]
    • additional arguments for bin/ga-config describing the presence of Peigs and/or Scalapack interfaces [#99]
    • additional arguments for bin/ga-config describing the integer size of the Blas library used [#100]
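
The numeric ComEx MPI-PR settings added in this release can be mirrored in a small table to show how the documented defaults and the static-buffer rule fit together. This is a sketch for illustration, not ComEx code. The function names are hypothetical, and INT_MAX is assumed to be that of a 32-bit C int.

```python
import os

INT_MAX = 2**31 - 1  # assumption: 32-bit C int

# Defaults copied from the release notes.
DEFAULTS = {
    "COMEX_MAX_NB_OUTSTANDING": 8,
    "COMEX_STATIC_BUFFER_SIZE": 2097152,
    "COMEX_EAGER_THRESHOLD": -1,
    "COMEX_MAX_MESSAGE_SIZE": INT_MAX,
}


def load_settings(env=None):
    """Read each setting from env, falling back to the documented default."""
    env = os.environ if env is None else env
    return {name: int(env.get(name, default)) for name, default in DEFAULTS.items()}


def needs_dynamic_buffer(nbytes, settings):
    """Messages larger than the static buffer get a temporary buffer."""
    return nbytes > settings["COMEX_STATIC_BUFFER_SIZE"]


settings = load_settings({"COMEX_MAX_NB_OUTSTANDING": "16"})
print(settings["COMEX_MAX_NB_OUTSTANDING"])
print(needs_dynamic_buffer(4 * 1024 * 1024, settings))
```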

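The ga-config options listed for v5.6.5 can be queried from a build script. The helper below is a hypothetical sketch: it assumes ga-config is on the PATH and returns whatever the script prints, since these notes do not describe the output format. The runner is injectable so the helper can be exercised without a GA installation.

```python
import subprocess

# Flags listed in the v5.6.5 notes for ga-config.
GA_CONFIG_FLAGS = (
    "--blas_size", "--scalapack_size", "--use_blas", "--use_lapack",
    "--use_scalapack", "--use_peigs", "--use_elpa", "--use_elpa_2015",
    "--use_elpa_2016", "--network_ldflags", "--network_libs",
)


def query_ga_config(flag, executable="ga-config", run=subprocess.run):
    """Run `ga-config <flag>` and return its standard output, stripped."""
    if flag not in GA_CONFIG_FLAGS:
        raise ValueError(f"flag not listed in the v5.6.5 notes: {flag}")
    result = run([executable, flag], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

With GA installed, `query_ga_config("--blas_size")` would return the text that ga-config prints for that option.
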
v5.6.4

21 Mar 17:49
  • Known Bugs
    • [#48] Message sizes exceeding 2GB may not work correctly
  • Added
    • armci-config and comex-config scripts to install.
  • Changed
    • install-autotools.sh installs all autotools regardless of existing versions
    • configure tests needing mixed C/Fortran code now use C linker
  • Fixed
    • Test suite was broken when GA was cross-compiled
    • eliop FreeBSD patch from Debichem
    • Locally installed automake is patched to work with newer perl versions
    • MPI-PR increased limit on number of possible comex_malloc invocations
  • Closed Pull Requests
    • [#92] eliop FreeBSD patch from Debian maintainers of the NWChem Package
  • Closed Issues
    • [#82] Fortran failure on theta
    • [#88] Automake regex expression broken for Perl versions >=5.26.0
    • [#89] autogen fails on Mac 10.12
    • [#90] configure script fails when using clang-4/5 + gfortran 6.3 compilers on Linux
    • [#95] comex/src-mpi-pr/comex.c:996: _generate_shm_name: Assertion 'snprintf_retval < (int)31' failed

v5.6.3

09 Dec 01:04
  • Known Bugs
    • [#48] Message sizes exceeding 2GB may not work correctly
  • Fixed
    • Critical bug: incorrect use of MPI_Comm_split() could prevent startup
      in the following ComEx ports:
      • MPI-PR
      • MPI-PT
      • MPI-MT

v5.6.2

29 Sep 22:08
  • Known Bugs
    • [#48] Message sizes exceeding 2GB may not work correctly
  • Fixed
    • Bug in MPI-PT comex_malloc().
    • Revert ARMCI contiguous check due to regression.
    • ELPA updates.
    • ScaLAPACK updates, including case for large matrices.
    • ComEx OFI updates from Intel.
    • Improved configure tests for LAPACK.
    • Improved Travis tests.