Neanderthal - notable changes between versions

0.49.0

  • Built with Java 8 bytecode compatibility.

0.48.0

  • Completely moved CPU engine to JavaCPP (and removed Neanderthal Native)
  • Support for sparse matrices
  • Support for integers in Fluokitten functions
  • More operations supported by integer engines
  • Major internal engine re-coding
  • Upgrade to CUDA 12.3
  • Upgrade to MKL 2024.0

0.47.0

Skipped

0.46.0

  • Upgrade CUDA to 11.8.
  • Upgrade neanderthal-native MKL native dependency to oneAPI MKL 2022.2 on Linux and Windows, and 2021.1 on Mac (the latest that my mac 10.12 supports).

0.45.0

  • Upgrade CUDA to 11.7. It should work with 11.6, too!

0.44.0

  • Upgrade CUDA to 11.6.

0.43.0

  • Upgrade CUDA to 11.4.

0.42.0

  • Fixed a few issues with devices that lack hardware support for some OpenCL features.

0.38.0

  • Upgrade CUDA to 11.0.

0.37.0

  • Work around a bug in Apple's OpenCL driver, which doesn't support native_ functions.

0.36.0

  • Viewable/view moved to commons.

0.35.0

  • Does not require system-wide MKL. Uses binaries provided by the bytedeco jar if present.

0.34.0

  • cublas engines support int, long, short, byte, and uint8

0.33.0

  • New vect-math functions: exp2, exp10, log2, and log1p
  • Fix short and byte engine crash.

0.32.0

  • Enable submatrix of (tri)diagonal matrix when appropriate.

0.31.0

  • Fix regression in core.rk! caused by 0.30.0 update.

0.30.0

  • New function mmt! for symmetric matrix multiply with its transpose (faster than mm!).
  • Support for symmetric rk! of a vector with its transpose.

0.29.0

  • Support mapping a vector to a file channel.
  • Support int, long, short, and byte vectors.

0.28.0

  • Fix a subtle CUDA vect-math return object bug.

0.27.0

  • Support CUDA 10.2.0
  • A bunch of bugfixes provided by Kamil Toman (katox).

0.26.0

  • view and view-* behavior made more consistent.

0.25.0

  • Random number generation of vector entries with uniformly and normally distributed numbers.
  • Renamed aux to auxil to work around a Windows bug of not allowing files named aux.

0.23.0

  • Added copy-sign, ramp, step, and sigmoid functions to math and vect-math (MKL, CUDA, OpenCL)

0.22.0

  • Introduced the Viewable/view protocol and view function (interop).
  • CLBlast bumped to 1.5.0.

0.21.0

  • Clojure upgraded to 1.10.
  • Misc bugfixes.

0.20.0

  • SDD available as an SVD implementation.
  • Computation of eigenvalues and eigenvectors available for symmetric matrices.
  • Fluokitten performance regression (introduced in 0.18.0) fixed.
  • Fluokitten support in non-double objects.
  • Fluokitten accepts non-primitive functions for Neanderthal objects.
  • Custom non-blas sum function sped up on CPU.
  • JCublas upgraded to 0.9.2.

0.19.0

  • Support explicit stream in memcpy.
  • CUDA engine uses explicit context.
  • sum support in CUDA matrices.
  • TRSV in OpenCL matrices.
  • CLBlast dependency updated to 1.3.0. Context creation for OpenCL is much faster now.
  • Vertigo dependency removed.
  • view-ge supports arbitrary dimensions now.
  • ge supports nested sequence as source for its rows.

0.18.1

  • Added FlowProvider/flow to internal core.

0.18.0

  • Updated to Java 9 modules. Requires the --add-opens JVM argument when run on JDK 9+.
  • Clojure dep updated to 1.9.0.
  • Upgrades JCuda dependency to 0.9.0, supports CUDA 9.
  • Core constructors accept any factory provider as factory.
  • GPU objects are safe to print after the factory has been released.
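
As a rough illustration of the JDK 9+ requirement above, a Leiningen project file might pass the flag as sketched here. The project name, the dependency versions, and in particular the module/package being opened are assumptions made for this example; check the getting-started guide for the exact value your version needs.

```clojure
;; project.clj fragment (hypothetical project name and versions).
(defproject hello-neanderthal "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.9.0"]
                 [uncomplicate/neanderthal "0.18.0"]]
  ;; Assumed value: opens a JDK-internal package to unnamed modules.
  ;; Only needed when running on JDK 9 or later.
  :jvm-opts ^:replace ["--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED"])
```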

0.17.2

  • Fix the uplo_modf bug (#33).

0.17.1

  • Upgraded JOCLBlast dependency to 1.2.0.

0.17.0

New features

  • Vectorized mathematical functions (approximately 50 functions in the vect-math namespace).
  • New functions in the math namespace to support scalar equivalents of vect-math functions.
  • Schur decomposition!

Enhancements

  • JOCLBlast engine upgraded to 1.1.0.
  • CUDA implementation of SY matrices.
  • OpenCL implementation of SY matrices.

0.16.1

Bugfixes

  • Fixed call with wrong number of arguments for the transpose of OrthogonalFactorization.

0.16.0

New features

  • Diagonal matrices (GD)
  • Tridiagonal matrices (GT)
  • Diagonally dominant tridiagonal matrices (DT)
  • Symmetric tridiagonal matrices (ST)

Enhancements

  • Updated JOCLBlast dependency to 1.0.1.
  • Orthogonal factorizations greatly simplified
  • Symmetric and triangular mm support more layout and a/b position variations.
  • Upgraded Intel MKL to 2018 (it should work with earlier versions, but YMMV).

Breaking changes

  • New simplified orthogonal factorization related functions replace the old api in linalg.

Bugfixes

  • Fixed TR and SY mm and mv when k=0.
  • Fixed transpose implementations in various non-GE matrices.

0.15.0

New features

  • Symmetric matrices (SY)
  • Banded matrices (GB, TB, SB)
  • Packed matrices (TP, SP)

Enhancements

  • Better printing
  • Fluokitten protocols supported by all matrix types.
  • Overhaul of internals that opens easier path for new matrix types and specialized engines.
  • Automated triangular factorizations and solvers.

Breaking changes

  • :order replaced by :layout in matrix options.
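
A minimal migration sketch for the option rename above, assuming the option map is passed as the last argument to a native constructor such as dge. The commented-out call shows the older spelling for comparison only.

```clojure
(require '[uncomplicate.neanderthal.core :refer [entry]]
         '[uncomplicate.neanderthal.native :refer [dge]])

;; Before 0.15.0 the same option was spelled :order (shown for comparison only):
;; (dge 2 3 [1 2 3 4 5 6] {:order :row})

;; From 0.15.0 on: with :layout :row, the source is read row by row.
(def a (dge 2 3 [1 2 3 4 5 6] {:layout :row}))

;; Row 0, column 1 holds the second element of the source.
(entry a 0 1)
```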

Fixes

  • Fix #29 - OpenCL engine does not try to load CUDA-related stuff any more.

0.14.0

Breaking changes:

  • tr* work with LUFactorization instead of GEMatrix.

New features:

  • Matrix inverse.
  • Condition number.
  • Pure tr* methods.
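
The factorization-based workflow can be sketched roughly as follows, assuming the pure trf and tri functions from the linalg namespace as found in later releases. The matrix values are arbitrary, and the exact signatures in 0.14.0 may differ.

```clojure
(require '[uncomplicate.commons.core :refer [with-release]]
         '[uncomplicate.neanderthal.core :refer [mrows ncols]]
         '[uncomplicate.neanderthal.native :refer [dge]]
         '[uncomplicate.neanderthal.linalg :refer [trf tri]])

;; An arbitrary invertible 2x2 matrix (column-major source).
(def a (dge 2 2 [4 3 6 3]))

;; trf returns an LU factorization object rather than a plain GE matrix;
;; tri then computes the inverse from that factorization in a new matrix.
(def a-inv
  (with-release [lu (trf a)]
    (tri lu)))
```

The factorization holds native resources, which is why the sketch releases it once the inverse has been computed.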

Enhancements

  • Improved TRMatrix printing.

0.13.0

  • Support for chained matrix multiplication in mm.

0.12.0

  • Support for inverting matrices through trf/tri.

0.11.0

New features:

  • CUDA/cuBLAS based engine (requires CUDA toolkit).
  • Additional methods from Blas supported by matrices.

0.10.0

New features:

  • Added aux namespace for auxiliary functions.
  • Sorting of vectors, GE, and TR host matrices.
  • Bulk alter! method added.
  • view-vctr and view-ge support stride multiplier.

Enhancements

  • set-all accepts NaN
  • CL factories implement FactoryProvider.

0.9.0

New features:

  • Linear algebra functions (LAPACK).
  • Support for TR matrices.
  • Pretty-printing
  • GE and TR support some more BLAS-1 functions.

Enhancements

  • Cheat Sheet in the docs.
  • Updated JOCLBlast dependency to 0.10.0.
  • Updated Fluokitten dependency to 0.6.0.
  • Internal api and implementations made more straightforward.

Breaking changes:

  • Naming scheme changed from single to float for single-precision structures.
  • sv and sge from the native namespace renamed to fv and fge.
  • core constructor functions changed from create to vctr, ge, tr, etc.
  • Core constructors no longer accept raw buffers.
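
A short sketch of the renamed constructors listed above, assuming the native-float factory exported by the native namespace. The values are arbitrary.

```clojure
(require '[uncomplicate.neanderthal.core :refer [vctr ge dim mrows]]
         '[uncomplicate.neanderthal.native :refer [fv fge native-float]])

;; Single-precision shortcuts, formerly sv and sge.
(def x (fv 1 2 3))
(def a (fge 2 2 [1 2 3 4]))

;; Factory-based core constructors, which replace the older create functions.
(def y (vctr native-float [1 2 3]))
(def b (ge native-float 2 2 [1 2 3 4]))
```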

0.8.0

Changes

  • Removed the amd-gcn engine. Use clblast engine instead.

Enhancements

  • The native part is now compiled for Linux, Mac OS X, and Windows.
  • one-argument pow function added.
  • native-factory method added to FactoryProvider protocol.
  • factories and data accessors implement the compatible method.
  • Updated JOCLBlast dependency to 0.9.0.

Bugfixes

  • Attempting (create -m -n) now throws IllegalArgumentException
  • Fixed a sum bug in native implementation when stride is not 1.

0.7.0

New features:

  • scal implementation for matrices.

Bugfixes:

  • Fixed a Buffer.limit bug in subvector and submatrix.
  • Fixes #15

Enhancements

  • native function in core
  • one-argument fold now uses sum instead of looping.
  • Updated JOCLBlast dependency to 0.8.0 (also fixes #15)

0.6.2

  • Updated ClojureCL dependency to 0.6.4

0.6.1

  • Updated ClojureCL dependency to 0.6.3

0.6.0

New features

  • Completely new OpenCL engine for GPU matrix computing; supports AMD, Nvidia, and Intel, on Linux, Windows, and OS X.
  • Support Fluokitten's Monoid and Magma in vectors and matrices
  • transfer method in core that always transfers data to host memory

Changes

  • opencl methods renamed
  • default OpenCL engine changed to clblast
  • old amd-gcn engine deprecated

0.5.0

New features

  • Streamlined dependencies: project files no longer need 2 dependencies. The dependency on uncomplicate/neanderthal is enough.
  • Comes with Mac OS X build out of the box. No need even for external ATLAS.
  • release and with-release moved from ClojureCL to uncomplicate/commons
  • Support for Fluokitten's fmap!, fmap, fold, foldmap, op...

0.4.0

New features

  • Streamlined factory-based constructors in core.
  • OpenCL vectors and matrices now support equality comparisons, offsets, strides, subvectors, and submatrices. Matrices now can be swapped and copied.

Changes

  • OpenCL read! and write! replaced with a generic transfer! multimethod that supports a much wider range of memory types to move data to and from.
  • A large number of internal implementation changes that should not affect end users (other than by removing bugs).
  • Several important bugfixes (see git commit history).

0.3.0

New features

  • Support for pluggable BLAS engines
  • GPU computing engine based on OpenCL (kernels optimized for AMD for now)

Changes

  • Reorganized namespaces: almost the complete public API is now in the core namespace.
  • Changed the order of parameters in axpy!, mv! and mm! (and their variants)

0.2.0

New features

  • implemented BLAS support for floats
  • implemented fmap!, freduce, and fold functions for all existing types of matrices and vectors

Changes

No API changes were required for these features.