0.10.1

@athas athas released this 26 Mar 10:37

Added

  • Using definitions from the intrinsic module outside the prelude
    now results in a warning.

  • reduce_by_index with vectorised operators (e.g. map2 (+)) is
    orders of magnitude faster than before.

  • Executables generated with the pyopencl backend now support the
    options --default-tile-size, --default-group-size,
    --default-num-groups, --default-threshold, and --size.

  • Executables generated with c and opencl now print a help text
    if run with invalid options. The py and pyopencl backends
    already did this.

  • Generated executables now support a --tuning flag for passing
    many tuned sizes in a file.

  • Executables generated with the cuda backend now take an
    --nvrtc-option option.

  • Executables generated with the opencl backend now take a
    --build-option option.
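
To make the reduce_by_index entry above concrete, here is a small sequential model, written in Python rather than Futhark, of what a generalised histogram with a vectorised operator computes. It is only a sketch of the semantics and says nothing about how the compiler implements or optimises the operation. The argument order mirrors the Futhark built-in; the helper vec_add and the sample data are hypothetical, and skipping out-of-range indices is an assumption of this model.

```python
def reduce_by_index(dest, op, ne, indices, values):
    """Sequential model: combine each value into the bucket its index names.

    `ne` is the neutral element of `op`. This sequential model does not
    need it, but a parallel implementation would use it to initialise
    partial results.
    """
    out = list(dest)  # copy, so the caller's list is left untouched
    for i, v in zip(indices, values):
        if 0 <= i < len(out):  # assumption: out-of-range indices are skipped
            out[i] = op(out[i], v)
    return out


def vec_add(xs, ys):
    """Element-wise addition, standing in for Futhark's `map2 (+)`."""
    return [x + y for x, y in zip(xs, ys)]


ne = [0, 0, 0]
buckets = [ne, ne]                      # two buckets, each a 3-element vector
indices = [0, 1, 0, 5]                  # index 5 is out of range
values = [[1, 2, 3], [10, 20, 30], [4, 5, 6], [7, 8, 9]]

result = reduce_by_index(buckets, vec_add, ne, indices, values)
```

Each bucket here is itself an array, so the operator has to combine whole vectors; that is the "vectorised operator" case the entry refers to.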

Removed

  • The old futhark-* executables have been removed.

Changed

  • If an array is passed for a function parameter of a polymorphic
    type, all arrays passed for parameters of that type must have the
    same shape. For example, given a function

    let pair 't (x: t) (y: t) = (x, y)
    

    The application pair [1] [2,3] will now fail at run-time.

  • futhark test now numbers un-named data sets from 1 rather than
    0. This only affects the text output and the generated JSON
    files, and fits the tuple element ordering in Futhark.

  • String literals are now of type []u8 and contain UTF-8 encoded
    bytes.
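
As a rough illustration of the string literal change, the following Python snippet (a model, not Futhark code) shows the bytes such a literal would contain under UTF-8 encoding. The sample text is hypothetical. Since the array elements are bytes rather than characters, the length of a literal containing non-ASCII text is presumably its byte count, which can exceed its character count.

```python
literal = "héllo"

# The UTF-8 encoding, viewed as a list of unsigned 8-bit values,
# is what an array of type []u8 would hold for this literal.
as_u8 = list(literal.encode("utf-8"))

char_count = len(literal)   # number of Unicode code points
byte_count = len(as_u8)     # number of u8 elements
```

Here "é" occupies two bytes, so the byte array has one more element than the text has characters.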

Fixed

  • A significant problematic interaction between empty arrays and
    inner size declarations has been fixed (#714). This follows a
    range of lesser empty-array fixes from 0.9.1.

  • futhark datacmp now prints to stdout, not stderr.

  • Fixed a major potential out-of-bounds access when sequentialising
    reduce_by_index (in most cases the bug was hidden by subsequent
    C compiler optimisations).

  • The result of an anonymous function is now also forbidden from
    aliasing a global variable, just as with named functions.

  • Parallel scans now work correctly when using a CPU OpenCL
    implementation.

  • reduce_by_index was broken on newer NVIDIA GPUs when using fancy
    operators. This has been fixed.