Releases: bmerry/clogs

1.5.2

13 May 16:59
  • Remove some uses of TR1/Boost in favour of C++11
  • Make C++11 mandatory
  • Move to GitHub

release-1.5.1

09 May 18:50
  • Workaround for NVIDIA driver bug that prevents scan from autotuning on Pascal hardware
  • Fix crash for systems with multiple NVIDIA GPUs

release-1.5.0

09 May 18:47
  • Make tuning work on-the-fly instead of requiring up-front tuning
  • Make the ABI robust against changes to OpenCL C++ bindings
  • Add method overloads that take OpenCL C API handles
  • Add free callback to setEventCallback functions
  • Add support for arbitrary function objects to setEventCallback functions
  • Allow algorithm objects to be default constructed, swapped, and moved
  • Fix CLOGS_VERSION_MINOR

release-1.4.0

09 May 18:46
  • Reduction has been added
  • Introduced the ScanProblem and RadixsortProblem classes
  • The cache is now stored in a SQLite database instead of lots of files
  • The cache is now located in an XDG-compliant location on UNIX (~/.cache/clogs by default).
  • The tuning caching mechanism has been significantly rewritten for use with SQLite
  • All kernels generated during tuning are now cached (this can use a lot of space)
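
The XDG-compliant cache location mentioned above can be resolved roughly as follows. This is a minimal sketch of the XDG Base Directory lookup, assuming the default of `~/.cache/clogs` stated in the notes; the helper names `cacheDir` and `cacheDirFromEnvironment` are hypothetical and are not part of clogs.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of the clogs API.
// Returns the cache directory given the values of $XDG_CACHE_HOME and $HOME
// (either may be null), or an empty string if neither is usable.
std::string cacheDir(const char *xdgCacheHome, const char *home)
{
    // The XDG spec requires an absolute path; unset, empty, or relative
    // values are ignored and the default under $HOME is used instead.
    if (xdgCacheHome && xdgCacheHome[0] == '/')
        return std::string(xdgCacheHome) + "/clogs";
    if (home && home[0] != '\0')
        return std::string(home) + "/.cache/clogs";
    return std::string();
}

// Convenience wrapper that reads the real environment.
std::string cacheDirFromEnvironment()
{
    return cacheDir(std::getenv("XDG_CACHE_HOME"), std::getenv("HOME"));
}
```

Passing the environment values as parameters keeps the lookup easy to test without modifying the process environment.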

release-1.3.0

09 May 18:45
  • Program binaries are now extracted during tuning and saved in the cache (see #8). This can also make tuning faster on systems that don't cache kernels.
  • Out-of-place scan is now supported (partially implements #12).
  • Workaround to avoid depending on OpenCL 1.2 ICD

release-1.2.3

09 May 18:45
  • Fix a bug causing incorrect results when SCAN_BLOCKS is small (#26)
  • Avoid building the unit test kernels except when testing (#24)
  • Speed up tuning on CPU devices (#25)

release-1.2.2

09 May 18:44
  • Fix a race condition in radix sort (mostly affects CPU devices)
  • Work around an AMD driver bug that caused segfaults in tuning
  • Avoid passing defines with both -D and #define

release-1.2.1

09 May 18:44
  • Performance improvements, particularly for AMD GPUs
  • Added --keep-going option to clogs-tune, as a temporary workaround for #23

release-1.2.0

09 May 18:43
  • Kernel parameters are now autotuned (refer to user manual)
  • Added benchmark support for scan
  • Fixed sorting in benchmark tool to support 3-element value types
  • Improved robustness to non-default locale
  • Added --split-debug and --variant=symbols configuration options

release-1.1.0

09 May 18:42
  • Add setEventCallback methods to Scan and Radixsort
  • Worked around a bug in the Intel OpenCL compiler