TornadoVM Changelog

This file summarizes the new features and major changes for each TornadoVM version.

TornadoVM 1.0.1

30/01/2024

Improvements

  • #305: Under-demand data transfer for custom data ranges.
  • #313: Initial support for Half-Precision (FP16) data types.
  • #311: Enable Multi-Task Multiple Device (MTMD) model from the TornadoExecutionPlan API.
  • #315: Math Ceil function added

Compatibility/Integration

  • #294: Separation of the OpenCL Headers from the code base.
  • #297: Separation of the LevelZero JNI API in a separate repository.
  • #301: Temurin configuration supported.
  • #304: Refactor of the common phases for the JIT compiler.
  • #316: Beehive SPIR-V Toolkit version updated.

Bug Fixes

  • #298: OpenCL Codegen fixed open-close brackets.
  • #300: Python Dependencies fixed for AWS
  • #308: Runtime check for Grid-Scheduler names
  • #309: Fix check-style to support STR templates
  • #314: Emit Vector16 Capability for 16-width vectors

TornadoVM 1.0

05/12/2023

Improvements

  • Brand-new API for allocating off-heap objects and array collections using the Panama Memory Segment API.
  • Handling of the TornadoVM's internal bytecode improved to avoid write-only copies from host to device.
  • cospi and sinpi math operations supported for OpenCL, PTX and SPIR-V.
  • Vector 16 data types supported for float, double and int.
  • Support for Mesa's rusticl.
  • Device default ordering improved based on maximum thread size.
  • Move all the installation and configuration scripts from Bash to Python.
  • The installation process has been improved for Linux and macOS with M1/M2 chips.
  • Documentation improved.
  • Add profiling information for the testing scripts.
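
For reference, cospi(x) and sinpi(x) follow the OpenCL definitions cos(πx) and sin(πx). The following is a minimal plain-Java sketch of those semantics only. It does not call the TornadoVM API, and the class name PiTrig is hypothetical:

```java
// Hypothetical helper that illustrates the semantics of cospi and sinpi.
// It does not use the TornadoVM API and is not TornadoVM's implementation.
final class PiTrig {
    private PiTrig() { }

    // cospi(x) is defined as cos(pi * x).
    static float cospi(float x) {
        return (float) Math.cos(Math.PI * x);
    }

    // sinpi(x) is defined as sin(pi * x).
    static float sinpi(float x) {
        return (float) Math.sin(Math.PI * x);
    }

    public static void main(String[] args) {
        System.out.println("cospi(1.0) = " + cospi(1.0f));
        System.out.println("sinpi(0.5) = " + sinpi(0.5f));
    }
}
```

The sketch computes in double precision and narrows to float, so results for large arguments may differ slightly from what a device's native cospi/sinpi returns.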

Compatibility/Integration

  • Integration with the Graal 23.1.0 JIT Compiler.
  • Integration with OpenJDK 21.
  • Integration with Truffle Languages (Python, Ruby and Javascript) using Graal 23.1.0.
  • TornadoVM API Refactored.
  • Backport bug-fixes for branch using OpenJDK 17: master-jdk17

Bug fixes:

  • Multiple SPIR-V Devices fixed.
  • Runtime Exception when no SPIR-V devices are present.
  • Issue with the kernel context API when invoking multiple kernels fixed.
  • MTMD mode is fixed when running multiple backends on the same device.
  • long type as a constant parameter for a kernel fixed.
  • FPGA Compilation and Execution fixed for AWS and Xilinx devices.
  • Batch processing fixed for different data types of the same size.

TornadoVM 0.15.2

26/07/2023

Improvements

  • Initial support for Multi-Tasks on Multiple Devices (MTMD): this mode enables the execution of multiple independent tasks on more than one hardware accelerator. Documentation: https://tornadovm.readthedocs.io/en/latest/multi-device.html
  • Support for trigonometric radian, cospi and sinpi functions for the OpenCL/PTX and SPIR-V backends.
  • Unused Java modules cleaned up and TornadoVM core classes refactored.

Compatibility/Integration

  • Initial integration with ComputeAorta (part of the Codeplay's oneAPI Construction Kit for RISC-V) to run on RISC-V with Vector Instructions (OpenCL backend) in emulation mode.
  • Beehive SPIR-V Toolkit dependency updated.
  • Tests for prebuilt SPIR-V kernels fixed to dispatch SPIR-V binaries through the Level Zero and OpenCL runtimes.
  • Deprecated javac.py script removed.

Bug fixes:

  • TornadoVM OpenCL Runtime throws an exception when the detected hardware does not support FP64.
  • Installer fixed for older Apple machines with the x86 architecture using AMD GPUs.
  • Installer for ARM based systems fixed.
  • Installer fixed for Microsoft WSL and NVIDIA GPUs.
  • OpenCL code generator fixed to avoid using the reserved OpenCL keywords from Java function parameters.
  • Dump profiler option fixed.

TornadoVM 0.15.1

15/05/2023

Improvements

  • Introduction of a device selection heuristic based on the computing capabilities of devices. TornadoVM selects, as the default device, the fastest device based on its computing capability.
  • Optimisation that removes redundant data copies for Read-Only and Write-Only buffers between the host (CPU) and the device (GPU), based on the Tornado Data Flow Graph.
  • New installation script for TornadoVM.
  • Option to dump the TornadoVM bytecodes for the unit tests.
  • Full debug option improved. Use --fullDebug.
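
The device selection heuristic mentioned above can be pictured as ranking the available devices by a capability score and picking the highest. The changelog does not state how TornadoVM computes that score, so the sketch below is an assumption: the Device class, its fields, and the score (compute units multiplied by clock frequency) are hypothetical and only illustrate the idea.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a capability-based default-device choice.
// None of these names come from the TornadoVM API.
final class DeviceRanking {
    private DeviceRanking() { }

    // Hypothetical device descriptor; the fields are assumptions for illustration.
    static final class Device {
        final String name;
        final int computeUnits;
        final int maxClockMHz;

        Device(String name, int computeUnits, int maxClockMHz) {
            this.name = name;
            this.computeUnits = computeUnits;
            this.maxClockMHz = maxClockMHz;
        }

        // Assumed score: more compute units and a higher clock rank higher.
        long capability() {
            return (long) computeUnits * maxClockMHz;
        }
    }

    // Returns the device with the highest assumed capability score.
    static Device selectDefault(List<Device> devices) {
        return devices.stream()
                .max(Comparator.comparingLong(Device::capability))
                .orElseThrow(() -> new IllegalArgumentException("no devices available"));
    }
}
```

For example, given a CPU with 16 units at 3000 MHz and a GPU with 80 units at 1500 MHz, selectDefault returns the GPU under this assumed score.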

Compatibility/Integration

  • Integration and compatibility with the Graal 22.3.2 JIT Compiler.
  • Improved compatibility with Apple M1 and Apple M2 through the OpenCL Backend.
  • GraalVM/Truffle programs integration improved. Use --truffle in the tornado script to run guest programs with Truffle.
    • Example: tornado --truffle python myProgram.py
    • Full documentation: https://tornadovm.readthedocs.io/en/latest/truffle-languages.html

Bug fixes:

TornadoVM 0.15

27/01/2023

Improvements

  • New TornadoVM API:

  • Launch a new website https://tornadovm.readthedocs.io/en/latest/ for the documentation
  • Improved documentation
  • Initial support for Intel ARC discrete GPUs.
  • Improved TornadoVM installer for Linux
  • Improved TornadoVM launch script with optional parameters
  • Support of large buffer allocations with Intel Level Zero. Use: tornado.spirv.levelzero.extended.memory=True

Bug fixes:

  • Vector and Matrix types
  • TornadoVM Floating Replacement compiler phase fixed
  • Fix CMAKE for Intel ARC GPUs
  • Device query tool fixed for the PTX backend
  • Documentation for Windows 11 fixed

TornadoVM 0.14.1

29/09/2022

Improvements

  • The tornado command has been rewritten from a Bash script to a Python script.
    • Use tornado --help to check the new options and examples.
  • Support of native tests for the SPIR-V backend.
  • Improvement of the OpenCL and PTX tests of the internal APIs.

Compatibility/Integration

  • Integration and compatibility with the Graal 22.2.0 JIT Compiler.
  • Compatibility with JDK 18 and JDK 19.
  • Compatibility with Apple M1 Pro using the OpenCL backend.

Bug Fixes

  • CUDA PTX generated header fixed to target NVIDIA 30xx GPUs and CUDA 11.7.
  • The signature of generated PTX kernels fixed for NVIDIA driver >= 510 and 30XX GPUs when using the TornadoVM Kernel API.
  • Tests of virtual OpenCL devices fixed.
  • Thread deployment information for the OpenCL backend is fixed.
  • TornadoVMRuntimeCI moved to TornadoVMRuntimeInterface.

TornadoVM 0.14

15/06/2022

New Features

  • New device memory management for addressing the memory allocation limitations of OpenCL and enabling pinned memory of device buffers.
    • The execution of task-schedules will still automatically allocate/deallocate memory every time a task-schedule is executed, unless lock/unlock functions are invoked explicitly at the task-schedule level.
    • One heap per device has been replaced with a device buffer per input variable.
    • A new API call has been added for releasing memory: unlockObjectFromMemory
    • A new API call has been added for locking objects to the device: lockObjectInMemory. This requires the user to release memory by invoking unlockObjectFromMemory at the task-schedule level.
  • Enhanced Live Task migration by supporting multi-backend execution (PTX <-> OpenCL <-> SPIR-V).

Compatibility/Integration

  • Integration with the Graal 22.1.0 JIT Compiler
  • JDK 8 deprecated
  • Azul Zulu JDK supported
  • OpenCL 2.1 as a default target for the OpenCL Backend
  • Single Docker Image for Intel XPU platforms, including the SPIR-V backend (using the Intel Integrated Graphics), and OpenCL (using the Intel Integrated Graphics, Intel CPU and Intel FPGA in emulation mode). Image: https://github.com/beehive-lab/docker-tornado#intel-integrated-graphics

Improvements/Bug Fixes

  • SIGNUM Math Function included for all three backends.
  • SPIR-V optimizer enabled by default (3x reduction in binary size).
  • Extended Memory Mode enabled for the SPIR-V Backend via Level Zero.
  • Phi instructions fixed for the SPIR-V Backend.
  • SPIR-V Vector Select instructions fixed.
  • Duplicated IDs for Non-Inlined SPIR-V Functions fixed.
  • Refactoring of the TornadoVM Math Library.
  • FPGA Configuration files fixed.
  • Bitwise operations for OpenCL fixed.
  • Code Generation Times and Backend information are included in the profiling info.

TornadoVM 0.13

21/03/2022

TornadoVM 0.12

17/11/2021

  • New backend: initial support for SPIR-V and Intel Level Zero
    • Level-Zero dispatcher for SPIR-V integrated
    • SPIR-V Code generator framework for Java
  • Benchmarking framework improved to accommodate all three backends
  • Driver metrics, such as kernel time and data transfers included in the benchmarking framework
  • TornadoVM profiler improved:
    • Command line options added: --enableProfiler <silent|console> and --dumpProfiler <jsonFile>
    • Logging improved for debugging purposes: JIT compiler, JNI calls and code generation
  • New math intrinsic operations supported
  • Several bug fixes:
    • Duplicated barriers removed. TornadoVM BARRIER bytecode fixed when running multi-context
    • Copy in when having multiple reductions fixed
    • TornadoVM profiler fixed for multiple context switching (device switching)
  • Pretty printer for device information

TornadoVM 0.11

29/09/2021

  • TornadoVM JIT Compiler upgrade to work with Graal 21.2.0 and JDK 8 with JVMCI 21.2.0
  • Refactoring of the Kernel Parallel API for Heterogeneous Programming:
  • Compiler update to register the global number of threads: https://github.com/beehive-lab/TornadoVM/pull/133/files
  • Simplification of the TornadoVM events handler: https://github.com/beehive-lab/TornadoVM/pull/135/files
  • Renaming the Profiler API method from event.getExecutionTime to event.getElapsedTime: #134
  • Deprecating OCLWriteNode and PTXWriteNode and fixing stores for bytes: #131
  • Refactoring of the FPGA IR extensions, from the high-tier to the low-tier of the JIT compiler
    • Utilizing the FPGA Thread-Attributes compiler phase for the FPGA execution
    • Using the GridScheduler object (if present) or use a default value (e.g., 64, 1, 1) for defining the FPGA OpenCL local workgroup
  • Several bugs fixed:
    • Codegen for sequential kernels fixed
    • Function parameters with non-inlined method calls fixed

TornadoVM 0.10

29/06/2021

  • TornadoVM JIT Compiler sync with Graal 21.1.0
  • Experimental support for OpenJDK 16
  • Tracing the TornadoVM thread distribution and device information with a new option --threadInfo instead of --debug
  • Refactoring of the new API:
    • TornadoVMExecutionContext renamed to KernelContext
    • GridTask renamed to GridScheduler
  • AWS F1 AMI version upgraded to 1.10.0 and automated the generation of AFI image
  • Xilinx OpenCL backend expanded with:
      1. Initial integration of Xilinx OpenCL attributes for loop pipelining in the TornadoVM compiler
      2. Support for multiple compute units
  • Logging FPGA compilation option added to dump FPGA HLS compilation to a file
  • TornadoVM profiler enhanced for including data transfers for the stack-frame and kernel dispatch time
  • Initial support for 2D Arrays added
  • Several bug fixes and stability support for the OpenCL and PTX backends

TornadoVM 0.9

15/04/2021


TornadoVM 0.8

19/11/2020

  • Added PTX backend for NVIDIA GPUs
    • Build TornadoVM using make BACKEND=ptx,opencl to obtain the two supported backends.
  • TornadoVM JIT Compiler aligned with Graal 20.2.0
  • Support for other JDKs:
    • Red Hat Mandrel 11.0.9
    • Amazon Corretto 11.0.9
    • GraalVM LabsJDK 11.0.8
    • OpenJDK 11.0.8
    • OpenJDK 12.0.2
    • OpenJDK 13.0.2
    • OpenJDK 14.0.2
  • Support for hybrid (CPU-GPU) parallel reductions
  • New API for generic kernel dispatch. It introduces the concept of WorkerGrid and GridTask
    • A WorkerGrid is an object that stores how threads are organized on an OpenCL device: WorkerGrid1D worker1D = new WorkerGrid1D(4096);
    • A GridTask is a map that relates a task-name with a worker-grid: GridTask gridTask = new GridTask(); gridTask.set("s0.t0", worker1D);
    • A TornadoVM Task-Schedule can be executed using a GridTask: ts.execute(gridTask);
    • More info: link
  • TornadoVM profiler improved
    • Profiler metrics added
    • Code features per task-graph
  • Lazy device initialisation moved to early initialisation of PTX and OpenCL devices
  • Initial support for Atomics (OpenCL backend)
  • Task Schedules with 11-14 parameters supported
  • Documentation improved
  • Bug fixes for code generation, numeric promotion, basic block traversal, Xilinx FPGA compilation.
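
A hybrid (CPU-GPU) reduction, as listed above, is commonly organised in two stages: the device reduces each work-group's slice to a partial result, and the host combines the partial results. The following plain-Java sketch emulates both stages on the CPU. It is an illustration of the general technique under that assumption, not TornadoVM's implementation, and the class and method names are hypothetical.

```java
// Hypothetical two-stage reduction sketch; both stages run on the CPU here.
// In a real hybrid reduction, stage 1 would run on the accelerator.
final class HybridReductionSketch {
    private HybridReductionSketch() { }

    // Stage 1: each "work-group" sums its own contiguous slice of the input.
    static float[] partialSums(float[] input, int groupSize) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        int groups = (input.length + groupSize - 1) / groupSize; // ceiling division
        float[] partial = new float[groups];
        for (int g = 0; g < groups; g++) {
            int start = g * groupSize;
            int end = Math.min(input.length, start + groupSize);
            float acc = 0.0f;
            for (int i = start; i < end; i++) {
                acc += input[i];
            }
            partial[g] = acc;
        }
        return partial;
    }

    // Stage 2: the host combines the per-group partial sums into the final value.
    static float finalSum(float[] partial) {
        float acc = 0.0f;
        for (float p : partial) {
            acc += p;
        }
        return acc;
    }
}
```

With the input 1..10 and a group size of 4, stage 1 yields three partial sums (10, 26, 19) and stage 2 combines them into 55.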

TornadoVM 0.7

22/06/2020

  • Support for ARM Mali GPUs.
  • Support parallel reductions on FPGAs
  • Agnostic FPGA vendor compilation via configuration files (Intel & Xilinx)
  • Support for AWS on Xilinx FPGAs
  • Recompilation for different input data sizes supported
  • New TornadoVM API calls:
    1. Update references for re-compilation: taskSchedule.updateReferences(oldRef, newRef);
    2. Use the default OpenCL scheduler: taskSchedule.useDefaultThreadScheduler(true);
  • Use of JMH for benchmarking
  • Support for Fused Multiply-Add (FMA) instructions
  • Easy selection of different devices for unit-tests: tornado-test.py -V --debug -J"-Dtornado.unittests.device=0:1"
  • Bailout mechanism improved from parallel to sequential
  • Improve thread scheduling
  • Support for private memory allocation
  • Assertion mode included
  • Documentation improved
  • Several bug fixes
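
A fused multiply-add computes a * b + c with a single rounding step. In standard Java this operation is exposed as Math.fma (Java 9 and later). The sketch below uses it in a dot product. Whether TornadoVM fuses multiply-add patterns automatically or maps Math.fma calls is not stated in this changelog, so this only illustrates the operation itself; the class name FmaDot is hypothetical.

```java
// Hypothetical example of a dot product built on fused multiply-add.
// Uses only the standard library (Math.fma, available since Java 9).
final class FmaDot {
    private FmaDot() { }

    // Accumulates a[i] * b[i] into acc with one rounding per step.
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        float acc = 0.0f;
        for (int i = 0; i < a.length; i++) {
            acc = Math.fma(a[i], b[i], acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        float[] a = {1.0f, 2.0f, 3.0f};
        float[] b = {4.0f, 5.0f, 6.0f};
        System.out.println(dot(a, b));
    }
}
```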

TornadoVM 0.6

21/02/2020

  • TornadoVM compatible with GraalVM 19.3.0 using JDK 8 and JDK 11
  • TornadoVM compiler update for using Graal 19.3.0 compiler API
  • Support for dynamic languages on top of Truffle
  • Support for multiple tasks per task-schedule on FPGAs (Intel and Xilinx)
  • Support for OSX Mojave and Catalina
  • Task-schedule name handling for FPGAs improved
  • Exception handling improved
  • Reductions for long type supported
  • Bug fixes for ternary conditions, reductions and code generator
  • Documentation improved

TornadoVM 0.5

16/12/2019

  • Initial support for Xilinx FPGAs
  • TornadoVM API classes are now Serializable
  • Initial support for local memory for reductions
  • JVMCI built with local annotation patch removed. Now TornadoVM requires unmodified JDK8 with JVMCI support
  • Support of multiple reductions within the same task-schedules
  • Emulation mode on Intel FPGAs is fixed
  • Fix reductions on Intel Integrated Graphics
  • TornadoVM driver OpenCL initialization and OpenCL code cache improved
  • Refactoring of the FPGA execution modes (full JIT and emulation modes improved).

TornadoVM 0.4

14/10/2019

  • Profiler supported
    • Use -Dtornado.profiler=True to enable profiler
    • Use -Dtornado.profiler=True -Dtornado.profiler.save=True to dump the profiler logs
  • Feature extraction added
    • Use -Dtornado.feature.extraction=True to enable code extraction features
  • Mac OSx support
  • Automatic reductions composition (map-reduce) within the same task-schedule
  • Bug related to a memory leak when running on GPUs solved
  • Bug fixes and stability improvements

TornadoVM 0.3

22/07/2019

  • New Matrix 2D and Matrix 3D classes with type specializations.
  • New API-call TaskSchedule#batch for batch processing. It allows programmers to run with more data than the maximum capacity of the accelerator by creating batches of executions.
  • FPGA full automatic compilation pipeline.
  • FPGA options simplified:
    • -Dtornado.precompiled.binary=<binary> for loading the bitstream.
    • -Dtornado.opencl.userelative=True for using relative addresses.
    • -Dtornado.opencl.codecache.loadbin=True removed.
  • Reductions support enhanced and fully automated on GPUs and CPUs.
  • Initial support for reductions on FPGAs.
  • Initial API for profiling tasks integrated.
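
Batch processing, as introduced by TaskSchedule#batch above, amounts to splitting an input that exceeds the accelerator's capacity into consecutive chunks that are processed one after another. The following plain-Java sketch shows only the splitting arithmetic. It does not call the TornadoVM API, and the BatchPlanner class and its plan method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how an input can be divided into batches.
// It models the chunking only; no data is transferred or executed.
final class BatchPlanner {
    private BatchPlanner() { }

    // Returns {offset, length} pairs that cover totalElements
    // in chunks of at most batchElements each.
    static List<long[]> plan(long totalElements, long batchElements) {
        if (totalElements < 0 || batchElements <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and batch size positive");
        }
        List<long[]> batches = new ArrayList<>();
        for (long offset = 0; offset < totalElements; offset += batchElements) {
            long length = Math.min(batchElements, totalElements - offset);
            batches.add(new long[] {offset, length});
        }
        return batches;
    }
}
```

For 10 elements and a batch size of 4, plan returns three batches: offsets 0 and 4 with length 4, and offset 8 with length 2. The last batch is shorter whenever the total is not a multiple of the batch size.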

TornadoVM 0.2

25/02/2019

  • Rename to TornadoVM
  • Device selection for better performance (CPU, multi-core, GPU, FPGA) via an API for Dynamic Reconfiguration
    • Added methods executeWithProfiler and executeWithProfilerSequential with an input policy.
    • Policies: Policy.PERFORMANCE, Policy.END_2_END, and Policy.LATENCY implemented.
  • Basic heuristic for predicting the highest performing target device with Dynamic Reconfiguration
  • Initial FPGA integration for Altera FPGAs:
    • Full JIT compilation mode
    • Ahead of time compilation mode
    • Emulation/debug mode
  • FPGA JIT compiler specializations
  • Added support for Java reductions:
    • Compiler specializations for CPU and GPU reductions
  • Performance and stability fixes

Tornado 0.1.0

07/09/2018

  • Initial Implementation of the Tornado compiler
  • Initial GPU/CPU code generation for OpenCL
  • Initial support in the runtime to execute OpenCL programs generated by the Tornado JIT compiler
  • Initial Tornado-API release (@Parallel Java annotation and TaskSchedule API)
  • Multi-GPU enabled through multiple task-schedules