Support vectors of float-16 values #372

Merged
merged 43 commits into from
Apr 15, 2024

mairooni
Collaborator

@mairooni mairooni commented Apr 8, 2024

Description

This PR provides support for vectors containing half-float values.

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

Run tornado-test -V uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats

…sult of an operation or a copy of the fields of an existing vector
… for each loadindexedvector if is it for a half float vector instead of assuming that all of them are if one is
@mairooni mairooni self-assigned this Apr 8, 2024
@jjfumero
Member

jjfumero commented Apr 8, 2024

Some testing:

A) OpenCL on the Intel HD Graphics:

tornado-test --threadInfo -V --jvm="-Dtornado.unittests.device=0:1" uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats 
WARNING: Using incubator modules: jdk.incubator.vector

Task info: s0.t0
	Backend           : OPENCL
	Device            : Intel(R) UHD Graphics 770 CL_DEVICE_TYPE_GPU (available)
	Dims              : 1
	Global work offset: [0]
	Global work size  : [16]
	Local  work size  : [16, 1, 1]
	Number of workgroups  : [1]


Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
	Running test: vectorPhiTest              ................  [PASS] 
	Running test: testSimpleDotProductHalf2  ................  [PASS] 
	Running test: testSimpleDotProductHalf3  ................  [PASS] 
	Running test: testSimpleDotProductHalf4  ................  [PASS] 
	Running test: testSimpleDotProductHalf8  ................  [PASS] 
	Running test: testSimpleDotProductHalf16 ................  [PASS] 
	Running test: testSimpleVectorAddition   ................  [PASS] 
	Running test: testVectorHalf2            ................  [PASS] 
	Running test: testVectorHalf3            ................  [PASS] 
	Running test: testVectorFloat3toString   ................  [PASS] 
	Running test: testVectorHalf4            ................  [PASS] 
	Running test: testVectorHalf16           ................  [PASS] 
	Running test: testVectorHalf8            ................  [PASS] 
	Running test: testVectorHalf8_Storage    ................  [PASS] 
	Running test: testDotProduct             ................  [PASS] 
	Running test: privateVectorHalf2         ................  [PASS] 
	Running test: privateVectorHalf4         ................  [PASS] 
	Running test: privateVectorHalf8         ................  [PASS] 
	Running test: testVectorHalf4_Unary      ................  [PASS] 
	Running test: testInternalSetMethod01    ................  [PASS] 
	Running test: testInternalSetMethod02    ................  [PASS] 
	Running test: testInternalSetMethod03    ................  [PASS] 
	Running test: testInternalSetMethod04    ................  [PASS] 
	Running test: testAllocationIssue        ................  [PASS] 

B) SPIR-V Backend:

Task info: s0.t0
	Backend           : SPIRV
	Device            : SPIRV LevelZero - Intel(R) UHD Graphics 770 GPU
	Dims              : 1
	Global work offset: [0]
	Global work size  : [16]
	Local  work size  : [16, 1, 1]
	Number of workgroups  : [1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
	Running test: vectorPhiTest              ................  [FAILED] 
		\_[REASON] expected:<8.0> but was:<1.0>
	Running test: testSimpleDotProductHalf2  ................  [PASS] 
	Running test: testSimpleDotProductHalf3  ................  [PASS] 
	Running test: testSimpleDotProductHalf4  ................  [PASS] 
	Running test: testSimpleDotProductHalf8  ................  [PASS] 
	Running test: testSimpleDotProductHalf16 ................  [PASS] 
	Running test: testSimpleVectorAddition   ................  [FAILED] 
		\_[REASON] expected:<4.0> but was:<1.0>
	Running test: testVectorHalf2            ................  [FAILED] 
		\_[REASON] expected:<16.0> but was:<1.0>
	Running test: testVectorHalf3            ................  [FAILED] 
		\_[REASON] expected:<8.0> but was:<1.0>
	Running test: testVectorFloat3toString   ................  [PASS] 
	Running test: testVectorHalf4            ................  [FAILED] 
		\_[REASON] expected:<8.0> but was:<1.0>
	Running test: testVectorHalf16           ................  [FAILED] 
		\_[REASON] expected:<16.0> but was:<1.0>
	Running test: testVectorHalf8            ................  [FAILED] 
		\_[REASON] expected:<8.0> but was:<1.0>
	Running test: testVectorHalf8_Storage    ................  [PASS] 
	Running test: testDotProduct             ................  [PASS] 
	Running test: privateVectorHalf2         ................  [FAILED] 
		\_[REASON] expected:<120.0> but was:<1.0>
	Running test: privateVectorHalf4         ................  [FAILED] 
		\_[REASON] expected:<120.0> but was:<1.0>
	Running test: privateVectorHalf8         ................  [FAILED] 
		\_[REASON] expected:<120.0> but was:<1.0>
	Running test: testVectorHalf4_Unary      ................  [PASS] 
	Running test: testInternalSetMethod01    ................  [PASS] 
	Running test: testInternalSetMethod02    ................  [PASS] 
	Running test: testInternalSetMethod03    ................  [PASS] 
	Running test: testInternalSetMethod04    ................  [PASS] 
	Running test: testAllocationIssue        ................  [PASS] 
Test ran: 24, Failed: 10, Unsupported: 0

C) For the PTX backend:

tornado-test --threadInfo -V --jvm="-Dtornado.unittests.device=0:1" uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats 
WARNING: Using incubator modules: jdk.incubator.vector

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
	Running test: vectorPhiTest              ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testSimpleDotProductHalf2  ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testSimpleDotProductHalf3  ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testSimpleDotProductHalf4  ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testSimpleDotProductHalf8  ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testSimpleDotProductHalf16 ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testSimpleVectorAddition   ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf2            ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf3            ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorFloat3toString   ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf4            ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf16           ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf8            ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf8_Storage    ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testDotProduct             ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: privateVectorHalf2         ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: privateVectorHalf4         ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: privateVectorHalf8         ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testVectorHalf4_Unary      ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testInternalSetMethod01    ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testInternalSetMethod02    ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testInternalSetMethod03    ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testInternalSetMethod04    ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
	Running test: testAllocationIssue        ................  [FAILED] 
		\_[REASON] Index 1 out of bounds for length 1
Test ran: 24, Failed: 24, Unsupported: 0

Commit point: #24c971a95

@jjfumero
Member

jjfumero commented Apr 8, 2024

Let's work on it together. We can start with the SPIR-V Backend.

@mairooni
Collaborator Author

mairooni commented Apr 8, 2024

I cannot reproduce these errors for some reason. These are the test results for the SPIR-V backend on my machine:

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [2]
        Local  work size  : [2, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0-MAP
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [8]
        Local  work size  : [8, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t1-REDUCE
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 0
        Global work offset: [0]
        Global work size  : [1]
        Local  work size  : [1, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Task info: s0.t0
        Backend           : SPIRV
        Device            : SPIRV LevelZero - Intel(R) Graphics [0x46a6] GPU
        Dims              : 1
        Global work offset: [0]
        Global work size  : [16]
        Local  work size  : [16, 1, 1]
        Number of workgroups  : [1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
        Running test: vectorPhiTest              ................  [PASS] 
        Running test: testSimpleDotProductHalf2  ................  [PASS] 
        Running test: testSimpleDotProductHalf3  ................  [PASS] 
        Running test: testSimpleDotProductHalf4  ................  [PASS] 
        Running test: testSimpleDotProductHalf8  ................  [PASS] 
        Running test: testSimpleDotProductHalf16 ................  [PASS] 
        Running test: testSimpleVectorAddition   ................  [PASS] 
        Running test: testVectorHalf2            ................  [PASS] 
        Running test: testVectorHalf3            ................  [PASS] 
        Running test: testVectorFloat3toString   ................  [PASS] 
        Running test: testVectorHalf4            ................  [PASS] 
        Running test: testVectorHalf16           ................  [PASS] 
        Running test: testVectorHalf8            ................  [PASS] 
        Running test: testVectorHalf8_Storage    ................  [PASS] 
        Running test: testDotProduct             ................  [PASS] 
        Running test: privateVectorHalf2         ................  [PASS] 
        Running test: privateVectorHalf4         ................  [PASS] 
        Running test: privateVectorHalf8         ................  [PASS] 
        Running test: testVectorHalf4_Unary      ................  [PASS] 
        Running test: testInternalSetMethod01    ................  [PASS] 
        Running test: testInternalSetMethod02    ................  [PASS] 
        Running test: testInternalSetMethod03    ................  [PASS] 
        Running test: testInternalSetMethod04    ................  [PASS] 
        Running test: testAllocationIssue        ................  [PASS] 
Test ran: 24, Failed: 0, Unsupported: 0

@mairooni
Collaborator Author

mairooni commented Apr 8, 2024

For PTX

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [2]
        Blocks dimensions : [2, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0-MAP
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [8]
        Blocks dimensions : [8, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t1-REDUCE
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 0
        Thread dimensions : [1]
        Blocks dimensions : [1, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Task info: s0.t0
        Backend           : PTX
        Device            : NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU
        Dims              : 1
        Thread dimensions : [16]
        Blocks dimensions : [16, 1, 1]
        Grids dimensions  : [1, 1, 1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
        Running test: vectorPhiTest              ................  [PASS] 
        Running test: testSimpleDotProductHalf2  ................  [PASS] 
        Running test: testSimpleDotProductHalf3  ................  [PASS] 
        Running test: testSimpleDotProductHalf4  ................  [PASS] 
        Running test: testSimpleDotProductHalf8  ................  [PASS] 
        Running test: testSimpleDotProductHalf16 ................  [PASS] 
        Running test: testSimpleVectorAddition   ................  [PASS] 
        Running test: testVectorHalf2            ................  [PASS] 
        Running test: testVectorHalf3            ................  [PASS] 
        Running test: testVectorFloat3toString   ................  [PASS] 
        Running test: testVectorHalf4            ................  [PASS] 
        Running test: testVectorHalf16           ................  [PASS] 
        Running test: testVectorHalf8            ................  [PASS] 
        Running test: testVectorHalf8_Storage    ................  [PASS] 
        Running test: testDotProduct             ................  [PASS] 
        Running test: privateVectorHalf2         ................  [PASS] 
        Running test: privateVectorHalf4         ................  [PASS] 
        Running test: privateVectorHalf8         ................  [PASS] 
        Running test: testVectorHalf4_Unary      ................  [PASS] 
        Running test: testInternalSetMethod01    ................  [PASS] 
        Running test: testInternalSetMethod02    ................  [PASS] 
        Running test: testInternalSetMethod03    ................  [PASS] 
        Running test: testInternalSetMethod04    ................  [PASS] 
        Running test: testAllocationIssue        ................  [PASS] 
Test ran: 24, Failed: 0, Unsupported: 0

@jjfumero
Member

jjfumero commented Apr 8, 2024

OK, let me check with an older CPU. I noticed that some of the tests fail when using > Intel 12th gen HD Graphics.

@jjfumero
Member

jjfumero commented Apr 8, 2024

My mistake. The PTX tests are passing. The command I used was wrong. Let me work on the SPIR-V and see what I can spot.

@jjfumero
Member

jjfumero commented Apr 8, 2024

It still fails with an older CPU. I am using Intel compute runtime 23.35.27191.9. I will try to update to a newer version and check again.

@jjfumero
Member

jjfumero commented Apr 8, 2024

This did the trick for SPIR-V Half2 vectors:

diff --git a/tornado-drivers/spirv/src/main/java/uk/ac/manchester/tornado/drivers/spirv/graal/nodes/vector/VectorAddNode.java b/tornado-drivers/spirv/src/main/java/uk/ac/manchester/tornado/drivers/spirv/graal/nodes/vector/VectorAddNode.java
index 761e060ce..01e2c8ae5 100644
--- a/tornado-drivers/spirv/src/main/java/uk/ac/manchester/tornado/drivers/spirv/graal/nodes/vector/VectorAddNode.java
+++ b/tornado-drivers/spirv/src/main/java/uk/ac/manchester/tornado/drivers/spirv/graal/nodes/vector/VectorAddNode.java
@@ -13,7 +13,7 @@
  *
  * This code is distributed in the hope that it will be useful, but WITHOUT
  * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
  * version 2 for more details (a copy is included in the LICENSE file that
  * accompanied this code).
  *
@@ -95,6 +95,8 @@ public class VectorAddNode extends BinaryNode implements LIRLowerable, VectorOp
 
         if (kind.getElementKind().isFloatingPoint()) {
             binaryOp = SPIRVAssembler.SPIRVBinaryOp.ADD_FLOAT;
+        } else if (kind.isHalf()) {
+            binaryOp = SPIRVAssembler.SPIRVBinaryOp.ADD_FLOAT;
         }
 

Let's replicate this change for all vector types. It might be a driver fix after all, with newer versions of the Intel compute runtime.
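To illustrate the shape of the change in the diff above, here is a minimal, self-contained sketch. Everything in it is hypothetical: the enums, the predicates, and in particular the integer default are assumptions made for illustration and are not taken from the TornadoVM sources. It only shows how a half-float kind could miss a floating-point check and why an explicit `isHalf` branch would then select the float add.

```java
public class AddOpSelection {
    // Hypothetical element kinds and operations; not the real SPIRVKind or SPIRVBinaryOp.
    enum ElementKind { INT, FLOAT, DOUBLE, HALF }
    enum BinaryOp { ADD_INTEGER, ADD_FLOAT }

    // Assumption: the floating-point predicate only covers 32-bit and 64-bit floats.
    static boolean isFloatingPoint(ElementKind kind) {
        return kind == ElementKind.FLOAT || kind == ElementKind.DOUBLE;
    }

    static boolean isHalf(ElementKind kind) {
        return kind == ElementKind.HALF;
    }

    // Before the change: a half kind falls through to the (assumed) integer default.
    static BinaryOp selectBefore(ElementKind kind) {
        BinaryOp op = BinaryOp.ADD_INTEGER;
        if (isFloatingPoint(kind)) {
            op = BinaryOp.ADD_FLOAT;
        }
        return op;
    }

    // After the change: half kinds are routed to the float add explicitly.
    static BinaryOp selectAfter(ElementKind kind) {
        BinaryOp op = BinaryOp.ADD_INTEGER;
        if (isFloatingPoint(kind)) {
            op = BinaryOp.ADD_FLOAT;
        } else if (isHalf(kind)) {
            op = BinaryOp.ADD_FLOAT;
        }
        return op;
    }
}
```

Under these assumptions, `selectBefore(ElementKind.HALF)` picks the integer add while `selectAfter(ElementKind.HALF)` picks the float add; the other kinds are unaffected.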

@jjfumero
Member

jjfumero commented Apr 8, 2024

Cool, now it passes all new tests regarding FP16 with SPIR-V:

Task info: s0.t0
	Backend           : SPIRV
	Device            : SPIRV LevelZero - Intel(R) UHD Graphics 770 GPU
	Dims              : 1
	Global work offset: [0]
	Global work size  : [16]
	Local  work size  : [16, 1, 1]
	Number of workgroups  : [1]

Test: class uk.ac.manchester.tornado.unittests.vectortypes.TestHalfFloats
	Running test: vectorPhiTest              ................  [PASS] 
	Running test: testSimpleDotProductHalf2  ................  [PASS] 
	Running test: testSimpleDotProductHalf3  ................  [PASS] 
	Running test: testSimpleDotProductHalf4  ................  [PASS] 
	Running test: testSimpleDotProductHalf8  ................  [PASS] 
	Running test: testSimpleDotProductHalf16 ................  [PASS] 
	Running test: testSimpleVectorAddition   ................  [PASS] 
	Running test: testVectorHalf2            ................  [PASS] 
	Running test: testVectorHalf3            ................  [PASS] 
	Running test: testVectorFloat3toString   ................  [PASS] 
	Running test: testVectorHalf4            ................  [PASS] 
	Running test: testVectorHalf16           ................  [PASS] 
	Running test: testVectorHalf8            ................  [PASS] 
	Running test: testVectorHalf8_Storage    ................  [PASS] 
	Running test: testDotProduct             ................  [PASS] 
	Running test: privateVectorHalf2         ................  [PASS] 
	Running test: privateVectorHalf4         ................  [PASS] 
	Running test: privateVectorHalf8         ................  [PASS] 
	Running test: testVectorHalf4_Unary      ................  [PASS] 
	Running test: testInternalSetMethod01    ................  [PASS] 
	Running test: testInternalSetMethod02    ................  [PASS] 
	Running test: testInternalSetMethod03    ................  [PASS] 
	Running test: testInternalSetMethod04    ................  [PASS] 
	Running test: testAllocationIssue        ................  [PASS] 
Test ran: 24, Failed: 0, Unsupported: 0

public static boolean isEqual(HalfFloatArray a, HalfFloatArray b) {
    boolean result = true;
    for (int i = 0; i < a.getSize() && result; i++) {
        result = compareBits(a.get(i).getHalfFloatValue(), b.get(i).getHalfFloatValue());
Member

Shouldn't it be something like:

result = result & compareBits(a.get(i).getHalfFloatValue(), b.get(i).getHalfFloatValue());

Collaborator Author

This is a copy from the other isEqual methods we have in this class, just for HalfFloatArray data. If I change this one, should I change all the others as well?
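For reference, a small sketch comparing the two loop shapes discussed in this thread. It uses plain `short[]` arrays and a hypothetical `compareBits` as stand-ins for `HalfFloatArray` and the real helper, and it assumes both arrays have the same length; the quoted excerpt is truncated, so the rest of the real method is not known here. Because the loop condition already includes `&& result`, the guarded version stops at the first mismatch and, under these assumptions, returns the same value as the accumulating version.

```java
public class HalfArrayCompare {
    // Hypothetical stand-in for compareBits: two half floats match if their raw 16-bit patterns match.
    static boolean compareBits(short a, short b) {
        return a == b;
    }

    // Loop shape from the excerpt: the "&& result" guard ends the loop at the first mismatch.
    // Assumes a and b have the same length.
    static boolean isEqualGuarded(short[] a, short[] b) {
        boolean result = true;
        for (int i = 0; i < a.length && result; i++) {
            result = compareBits(a[i], b[i]);
        }
        return result;
    }

    // Loop shape from the review suggestion: visit every element and accumulate the outcome.
    static boolean isEqualAccumulated(short[] a, short[] b) {
        boolean result = true;
        for (int i = 0; i < a.length; i++) {
            result = result & compareBits(a[i], b[i]);
        }
        return result;
    }
}
```

The accumulating form is only required when the loop condition does not check `result`; without either the guard or the accumulation, a later match would overwrite an earlier mismatch.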


public final class VectorHalf implements TornadoCollectionInterface<ShortBuffer> {

private static final int ELEMENT_SIZE = 1;
Member

The element size is 2 bytes, correct?


public static final Class<VectorHalf16> TYPE = VectorHalf16.class;

private static final int ELEMENT_SIZE = 16;
Member

So, I am confused now. The element size then indicates the number of half-float elements in the vector, not the size of a half-float in bytes.

In this case, I suggest renaming this constant: ELEMENT_VECTOR_SIZE

Collaborator Author

Yes, that makes sense. I kept it like that for consistency, because this is how the field is named in all the other vector collection classes. I was thinking of opening a separate PR to refactor all the vector classes at some point, but I can rename this field for the new classes now.
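
The naming ambiguity can be illustrated with a small sketch that keeps the two quantities apart. The class and constant names below are hypothetical and chosen only for illustration; they are not the names used in the PR.

```java
final class VectorHalf16LayoutSketch {

    // Number of half-float lanes in the vector (what ELEMENT_SIZE = 16 appears to denote).
    static final int NUM_LANES = 16;

    // Storage size of one half-float value: 16 bits, the same as a Java short.
    static final int BYTES_PER_LANE = Short.BYTES;

    // Total storage for one vector, in bytes.
    static int totalBytes() {
        return NUM_LANES * BYTES_PER_LANE;
    }

    public static void main(String[] args) {
        System.out.println("lanes=" + NUM_LANES + ", bytesPerLane=" + BYTES_PER_LANE + ", totalBytes=" + totalBytes());
    }
}
```

Separating the lane count from the per-lane byte size makes it clear which of the two a given constant refers to.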

for (Node vectorElement : vectorValueNode.inputs()) {
    if (vectorElement instanceof VectorLoadElementNode) {
        VectorLoadElementNode vectorLoad = (VectorLoadElementNode) vectorElement;
        VectorLoadElementNode vectorLoadShort = new VectorLoadElementNode(SPIRVKind.OP_TYPE_FLOAT_16, vectorLoad.getVector(), vectorLoad.getLaneId());
Member

In this case, FLOAT16 is used instead of SHORT, which is what we saw in the OpenCL backend.
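
For context on the SHORT versus FLOAT16 distinction: a half-float occupies 16 bits, so the same storage can be typed either as a short (raw bits) or as a 16-bit float. The sketch below decodes IEEE 754 binary16 bits by hand. `halfBitsToFloat` is a hypothetical helper written for illustration only; it is not TornadoVM code and does not reflect how either backend performs the conversion.

```java
final class HalfBitsSketch {

    // Decode a 16-bit IEEE 754 binary16 pattern (1 sign, 5 exponent, 10 mantissa bits).
    static float halfBitsToFloat(short bits) {
        int h = bits & 0xFFFF;
        int sign = (h >>> 15) & 0x1;
        int exp = (h >>> 10) & 0x1F;
        int mant = h & 0x3FF;

        float value;
        if (exp == 0) {
            // Zero or subnormal: mant * 2^-24
            value = Math.scalb((float) mant, -24);
        } else if (exp == 31) {
            // Infinity or NaN
            value = (mant == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            // Normal number: (1 + mant / 2^10) * 2^(exp - 15)
            value = Math.scalb(1.0f + mant / 1024.0f, exp - 15);
        }
        return sign == 0 ? value : -value;
    }

    public static void main(String[] args) {
        System.out.println(halfBitsToFloat((short) 0x3C00));
        System.out.println(halfBitsToFloat((short) 0xC000));
    }
}
```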

Member

@jjfumero jjfumero left a comment

On a second review, LGTM. I do not have access to the SPIR-V backend on OSx, so I will check the latest changes on my other laptop by Monday.

@@ -0,0 +1,11 @@
package uk.ac.manchester.tornado.api.internal.annotations;
Member

Add License Header

LeftShiftNode leftShiftNode = index.inputs().filter(LeftShiftNode.class).first();
ConstantNode currentOffset = leftShiftNode.inputs().filter(ConstantNode.class).first();
// if the shifting is by 3 (for float values)
if (currentOffset.getValue().toValueString().equals("3")) {
Member

Why is the shift by 3 for a float value? Can we generalize this?

Collaborator Author

The code comment above is wrong. The shift is not due to float types; it is because the JavaKind for half is Object (8 bytes). This was done because otherwise we were having issues with the stamp. I will update the comment to reflect that.
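
As a rough illustration of the arithmetic behind this exchange: a left shift by 3 multiplies an index by 8, which matches an 8-byte element. One hypothetical way to generalize the check is to derive the shift from the element size rather than hard-coding "3". The helper names below are assumptions for illustration and are not part of the TornadoVM or Graal APIs.

```java
final class IndexShiftSketch {

    // Shift amount for a power-of-two element size: log2(elementBytes).
    static int shiftFor(int elementBytes) {
        if (elementBytes <= 0 || Integer.bitCount(elementBytes) != 1) {
            throw new IllegalArgumentException("element size must be a positive power of two");
        }
        return Integer.numberOfTrailingZeros(elementBytes);
    }

    // Byte offset of an element, computed as index << log2(elementBytes).
    static long byteOffset(long index, int elementBytes) {
        return index << shiftFor(elementBytes);
    }

    public static void main(String[] args) {
        System.out.println("shift for 8-byte element: " + shiftFor(8));
        System.out.println("shift for 2-byte element: " + shiftFor(Short.BYTES));
        System.out.println("offset of index 5 with 8-byte elements: " + byteOffset(5, 8));
    }
}
```

Under this scheme an index computed with the 8-byte shift would need to be rewritten with the 2-byte shift when the actual storage is a half-float.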

Member

@jjfumero jjfumero left a comment

Minor comments

Member

@mikepapadim mikepapadim left a comment

LGTM

@jjfumero jjfumero merged commit ac476de into beehive-lab:develop Apr 15, 2024
2 checks passed
@mairooni mairooni deleted the feat/vectorfloat16 branch April 16, 2024 08:04
jjfumero added a commit to jjfumero/TornadoVM that referenced this pull request Apr 30, 2024
Improvements
~~~~~~~~~~~~~~~~~~

- [beehive-lab#369](beehive-lab#369): Introduction of Tensor types in TornadoVM API and interoperability with ONNX Runtime.
- [beehive-lab#370](beehive-lab#370): Array concatenation operation for TornadoVM native arrays.
- [beehive-lab#371](beehive-lab#371): TornadoVM installer script ported for Windows 10/11.
- [beehive-lab#372](beehive-lab#372): Add support for ``HalfFloat`` (``Float16``) in vector types.
- [beehive-lab#374](beehive-lab#374): Support for TornadoVM array concatenations from the constructor-level.
- [beehive-lab#375](beehive-lab#375): Support for TornadoVM native arrays using slices from the Panama API.
- [beehive-lab#376](beehive-lab#376): Support for lazy copy-outs in the batch processing mode.
- [beehive-lab#377](beehive-lab#377): Expand the TornadoVM profiler with power metrics for NVIDIA GPUs (OpenCL and PTX backends).
- [beehive-lab#384](beehive-lab#384): Auto-closable Execution Plans for automatic memory management.

Compatibility
~~~~~~~~~~~~~~~~~~

- [beehive-lab#386](beehive-lab#386): OpenJDK 17 support removed.
- [beehive-lab#390](beehive-lab#390): SapMachine OpenJDK 21 supported.
- [beehive-lab#395](beehive-lab#395): OpenJDK 22 and GraalVM 22.0.1 supported.
- TornadoVM tested with Apple M3 chips.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- [beehive-lab#367](beehive-lab#367): Fix for Graal/Truffle languages in which some Java modules were not visible.
- [beehive-lab#373](beehive-lab#373): Fix for data copies of the ``HalfFloat`` types for all backends.
- [beehive-lab#378](beehive-lab#378): Fix free memory markers when running multi-thread execution plans.
- [beehive-lab#379](beehive-lab#379): Refactoring package of vector api unit-tests.
- [beehive-lab#380](beehive-lab#380): Fix event list sizes to accommodate profiling of large applications.
- [beehive-lab#385](beehive-lab#385): Fix code check style.
- [beehive-lab#387](beehive-lab#387): Fix TornadoVM internal events in OpenCL, SPIR-V and PTX for running multi-threaded execution plans.
- [beehive-lab#388](beehive-lab#388): Fix of expected and actual values of tests.
- [beehive-lab#392](beehive-lab#392): Fix installer for using existing JDKs.
- [beehive-lab#389](beehive-lab#389): Fix ``DataObjectState`` for multi-thread execution plans.
- [beehive-lab#396](beehive-lab#396): Fix JNI code for the CUDA NVML library access with OpenCL.