Add Windows support to automatic installation script #371

Merged
merged 22 commits into beehive-lab:develop on Apr 22, 2024

Conversation

otabuzzman
Contributor

Description

The PR adds Windows support to the bin/tornadovm-installer automatic installation script. The documentation (readthedocs) has been updated accordingly.

Problem description

n/a

Backend/s tested

Mark the backends affected by this PR.

  • OpenCL
  • PTX
  • SPIRV

OS tested

Mark the OS where this PR is tested.

  • Linux
  • OSx
  • Windows

See results summary of test suite below.

Did you check on FPGAs?

If it is applicable, check your changes on FPGAs.

  • Yes
  • No

How to test the new patch?

On a Windows box:

  • Install Visual Studio Community 2022 and Python (using the respective Windows installer for each)
  • Run the installer script with the Python interpreter inside a venv:
    python -m venv .venv
    .venv\Scripts\activate.bat
    
    python bin\tornadovm-installer --jdk jdk21 --backend "opencl,ptx,spirv"
    
  • Set up the environment with the command setvars.cmd
  • List the devices with the command python %TORNADO_SDK%\bin\tornado --devices
  • Run the test suite with the command nmake /f Makefile.mak tests
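
The device-listing step depends on TORNADO_SDK having been set by setvars.cmd. A minimal sketch of that dependency, assuming a hypothetical helper `devices_command` (not part of the installer), builds the same command and fails early when the variable is missing:

```python
import os
import sys


def devices_command(env=None):
    """Build the command that lists TornadoVM devices.

    Hypothetical helper for illustration only; it is not part of the installer.
    """
    env = os.environ if env is None else env
    sdk = env.get("TORNADO_SDK")
    if not sdk:
        raise RuntimeError("TORNADO_SDK is not set; run setvars.cmd first")
    # Same shape as the manual step: python <TORNADO_SDK>/bin/tornado --devices
    return [sys.executable, os.path.join(sdk, "bin", "tornado"), "--devices"]
```

The returned list could be passed to `subprocess.run(..., check=True)` from an activated venv.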

Results summary (failures) of TornadoVM test suite.

Note: The suite was run twice consecutively on each configuration, with TornadoVM previously compiled for a single specific backend.

Windows 11 Home on Lenovo IdeaPad

OpenCL backend, 1st run

Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>

OpenCL backend, 2nd run

Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>

PTX backend, 1st run

Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>
Running test: testTornadoMathSinPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.
Running test: testTornadoMathCosPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.

PTX backend, 2nd run

Running test: testCopyInWithDevice       ................  [FAILED]
        \_[REASON] expected:<452661.0> but was:<513877.0>
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>
Running test: testTornadoMathSinPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.
Running test: testTornadoMathCosPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.

SPIR-V backend, 1st run

Running test: testReduction01            ................  [FAILED]
        \_[REASON] expected:<-966107475> but was:<-464861690>
Running test: testIrregularSize03        ................  [FAILED]
        \_[REASON] expected:<336.0173> but was:<337.83768>
Running test: testCopyInWithDevice       ................  [FAILED]
        \_[REASON] expected:<114088.0> but was:<100620.0>
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>
Running test: test50MBInteger            ................  [FAILED]
        \_[REASON] Unable to allocate 50000024 bytes of memory.
Running test: test04                     ................  [FAILED]
        \_[REASON] Unable to compile task s0.t0 - badCascadeKernel4

SPIR-V backend, 2nd run

Running test: testReduction01            ................  [FAILED]
        \_[REASON] expected:<1533689219> but was:<1584861366>
Running test: testIrregularSize03        ................  [FAILED]
        \_[REASON] expected:<424.82053> but was:<427.6317>
Running test: testCopyInWithDevice       ................  [FAILED]
        \_[REASON] expected:<108992.0> but was:<95368.0>
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>
Running test: test04                     ................  [FAILED]
        \_[REASON] Unable to compile task s0.t0 - badCascadeKernel4

Amazon Linux on EC2 (g4dn.xlarge)

OpenCL backend, 1st run

Running test: testComputePi              ................  [FAILED]
        \_[REASON] expected:<3.14> but was:<6.576169967651367>
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>

OpenCL backend, 2nd run

Running test: testComputePi              ................  [FAILED]
        \_[REASON] expected:<3.14> but was:<6.476556777954102>
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>

PTX backend, 1st run

Running test: testComputePi              ................  [FAILED]
        \_[REASON] expected:<3.14> but was:<6.341561794281006>
Running test: testCopyInWithDevice       ................  [FAILED]
        \_[REASON] expected:<163063.0> but was:<138806.0>
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>
Running test: testTornadoMathCosPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.
Running test: testTornadoMathSinPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.

PTX backend, 2nd run

Running test: testComputePi              ................  [FAILED]
        \_[REASON] expected:<3.14> but was:<7.097877502441406>
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>
Running test: testTornadoMathCosPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.
Running test: testTornadoMathSinPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.

Windows Server 2022 on EC2 (g4dn.xlarge)

Note: Missing SSL certificates required manual Maven download.

OpenCL backend, 1st run

Running test: testComputePi              ................  [FAILED]
        \_[REASON] expected:<3.14> but was:<3.9345362186431885>
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>

OpenCL backend, 2nd run

Running test: testComputePi              ................  [FAILED]
        \_[REASON] expected:<3.14> but was:<5.727963447570801>
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>

PTX backend, 1st run

Running test: testCopyInWithDevice       ................  [FAILED]
        \_[REASON] expected:<155511.0> but was:<132241.0>
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>
Running test: testTornadoMathSinPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.
Running test: testTornadoMathCosPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.

PTX backend, 2nd run

Running test: testComputePi              ................  [FAILED]
        \_[REASON] expected:<3.14> but was:<4.928632736206055>
Running test: testCopyInWithDevice       ................  [FAILED]
        \_[REASON] expected:<152087.0> but was:<178420.0>
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>
Running test: testTornadoMathSinPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.
Running test: testTornadoMathCosPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.

macOS Sonoma on MacBook Air (2015)

Note: Running the test suite required removing the enable-assertions option from the test target in the Makefile, probably due to Sonoma compatibility issues on the unsupported MacBook Air (2015). The assertion in question is in .../uk/ac/manchester/tornado/runtime/interpreter/TornadoVMInterpreter.java (device != null).

OpenCL backend, 1st run

Running test: test06                     ................  [FAILED] 
        \_[REASON] Bailout is disabled. 
Running test: testBatchNotEven           ................  [FAILED] 
        \_[REASON] expected:<5000000.0> but was:<0.0>
Running test: test512MB                  ................  [FAILED] 
        \_[REASON] Unable to allocate 512000024 bytes of memory.

OpenCL backend, 2nd run

Running test: test06                     ................  [FAILED] 
        \_[REASON] Bailout is disabled. 
Running test: test512MB                  ................  [FAILED] 
        \_[REASON] Unable to allocate 512000024 bytes of memory.
Running test: testBatchNotEven           ................  [FAILED] 
        \_[REASON] expected:<5000000.0> but was:<0.0>
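
Because each configuration was run twice, comparing the two failure sets shows which failures are stable and which appear in only one run. The sketch below is one way to do that, assuming the log format shown above; the regular expression and function name are illustrative, and the sample data is the PTX output from the Windows 11 runs:

```python
import re

# Matches lines such as "Running test: testBatchNotEven  ........  [FAILED]".
FAILED_LINE = re.compile(r"Running test:\s+(\S+)\s+\.+\s+\[FAILED\]")


def failed_tests(log_text):
    """Return the set of test names reported as [FAILED] in a log excerpt."""
    return set(FAILED_LINE.findall(log_text))


first_run = r"""
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>
Running test: testTornadoMathSinPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.
Running test: testTornadoMathCosPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.
"""

second_run = r"""
Running test: testCopyInWithDevice       ................  [FAILED]
        \_[REASON] expected:<452661.0> but was:<513877.0>
Running test: testBatchNotEven           ................  [FAILED]
        \_[REASON] expected:<5000000.0> but was:<0.0>
Running test: testTornadoMathSinPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.
Running test: testTornadoMathCosPIDouble ................  [FAILED]
        \_[REASON] Bailout is disabled.
"""

first = failed_tests(first_run)
second = failed_tests(second_run)
stable = first & second      # failed in both runs
unstable = first ^ second    # failed in exactly one run
print(sorted(unstable))      # ['testCopyInWithDevice']
```

A test that fails in only one of two consecutive runs is a candidate for a nondeterministic failure and is probably not caused by the installer itself.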

@CLAassistant

CLAassistant commented Apr 5, 2024

CLA assistant check
All committers have signed the CLA.

@jjfumero
Member

jjfumero commented Apr 6, 2024

Thank you @otabuzzman for this patch. We will take a look in the following days.
In the meantime, it seems some of the commits were made with another account, for which the CLA assistant detects that the CLA has not been signed.

Regarding the errors in the unit tests, some of them are expected and depend on the driver used. But we will take a closer look as well.

@otabuzzman
Contributor Author

otabuzzman commented Apr 6, 2024 via email

@jjfumero
Member

jjfumero commented Apr 8, 2024

Hi Jürgen. I understand. Unfortunately, we can't merge if the CLA is not signed for all commits. So my take here, as you mentioned, is to open a new PR with the diff using the account/email that you have already used for the CLA for the previous merges. Sorry for the inconvenience.

On our side, half of the team will be at a conference from the middle of this week, so we could start the review next week.

@otabuzzman
Contributor Author

otabuzzman commented Apr 8, 2024 via email

@jjfumero
Member

I have tested on Windows 11 with the OpenCL backend and it works perfectly. Thank you, this is great work.

Many of the failing tests are expected, especially those related to SPIR-V and PTX. Thank you for the report too. On Linux, many of these are passing.

@jjfumero
Member

jjfumero commented Apr 18, 2024

I got an error when running the native test:

==================================================
              Unit tests report
==================================================

{'[PASS]': 610, '[FAILED]': 10, '[UNSUPPORTED]': 45}
Coverage [PASS/(PASS+FAIL)]: 98.39%
Coverage [PASS/(PASS+FAIL+UNSUPPORTED)]: 91.73%

==================================================

Total Time(s): 179.11767625808716

NMAKE : fatal error U1077: 'python %TORNADO_SDK%\bin\tornado-test --ea --verbose' : return code '0x1'
Stop.Stop.
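
The two coverage figures in the report follow directly from the counts. A minimal sketch of the arithmetic, assuming two-decimal rounding (this is not the actual tornado-test code):

```python
def coverage(counts):
    """Return (PASS/(PASS+FAIL), PASS/(PASS+FAIL+UNSUPPORTED)) in percent."""
    passed = counts["[PASS]"]
    failed = counts["[FAILED]"]
    unsupported = counts["[UNSUPPORTED]"]
    strict = 100.0 * passed / (passed + failed)
    overall = 100.0 * passed / (passed + failed + unsupported)
    # Rounding to two decimals is an assumption that matches the report above.
    return round(strict, 2), round(overall, 2)


report = {"[PASS]": 610, "[FAILED]": 10, "[UNSUPPORTED]": 45}
print(coverage(report))  # (98.39, 91.73)
```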

Member

@jjfumero jjfumero left a comment

Looks good. I have minor comments.

if not os.path.exists(directoryName):
## clone the repo with the OpenCL Headers
import shutil
Member

Why is there no need to create a temporary directory?

Contributor Author

There is no naming conflict between the top directory of the OpenCL header repository and other pre-existing files. Therefore, I don't think a temporary directory is necessary.

Furthermore, I skipped the removal of the OpenCL header directory after installation, just in case someone wants to look inside later for whatever reason. But after all, it's a matter of taste. If you prefer a temporary directory with removal, I'm fine with that.

Member

Ok, thanks for the clarification. Let's keep it then.
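
The trade-off discussed in this thread can be sketched as follows. The helper and the directory name are hypothetical and are not taken from the installer: a directory created only when missing persists for later inspection, whereas a `tempfile.TemporaryDirectory` is removed as soon as its block exits.

```python
import os
import tempfile


def ensure_directory(path):
    """Create `path` only if it is missing; return True if it was created."""
    if os.path.exists(path):
        return False
    os.makedirs(path)
    return True


# The temporary directory stands in for the installer's working directory.
with tempfile.TemporaryDirectory() as scratch:
    # "opencl-headers-checkout" is a placeholder name for this illustration.
    headers = os.path.join(scratch, "opencl-headers-checkout")
    created_first = ensure_directory(headers)   # True: directory was missing
    created_again = ensure_directory(headers)   # False: kept from first call

# Everything under `scratch` has been deleted at this point.
scratch_removed = not os.path.exists(scratch)
```

The existence check makes a repeated run skip the creation step, which is the behavior the persistent variant relies on.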

- __DIRECTORY_DEPENDENCIES__ = "etc/dependencies"
- __VERSION__ = "v1.0.1-dev"
+ __DIRECTORY_DEPENDENCIES__ = os.path.join("etc", "dependencies")
+ __VERSION__ = "v1.0.3"
Member

@jjfumero jjfumero Apr 18, 2024

[update] Good catch. Actually, it should be v1.0.4-dev. Let's change it to that.

Contributor Author

I set it to v1.0.4-dev. Your value v0.1.0.4-dev is a typo, right? (Note the 0 after the v.)

Member

Yes, sorry, it was a typo.
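
The snippet under review also replaces the hard-coded "etc/dependencies" with os.path.join. A short illustration of why that matters for Windows support: os.path is an alias for ntpath on Windows and posixpath elsewhere, so the same call yields the native separator on each platform. Both modules are called directly here so the result does not depend on the machine running the example:

```python
import ntpath
import posixpath

parts = ("etc", "dependencies")

# What os.path.join returns on Windows.
windows_path = ntpath.join(*parts)    # 'etc\\dependencies'
# What os.path.join returns on Linux and macOS.
posix_path = posixpath.join(*parts)   # 'etc/dependencies'

print(windows_path)
print(posix_path)
```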

@jjfumero
Member

Linux and OSx also work.

@otabuzzman
Contributor Author

> I got an error when running the native test:
>
> ==================================================
>               Unit tests report
> ==================================================
>
> {'[PASS]': 610, '[FAILED]': 10, '[UNSUPPORTED]': 45}
> Coverage [PASS/(PASS+FAIL)]: 98.39%
> Coverage [PASS/(PASS+FAIL+UNSUPPORTED)]: 91.73%
>
> ==================================================
>
> Total Time(s): 179.11767625808716
>
> NMAKE : fatal error U1077: 'python %TORNADO_SDK%\bin\tornado-test --ea --verbose' : return code '0x1'
> Stop.Stop.

I think this is expected behavior: the tornado-test script returns 0x1 if any error occurs that is not whitelisted, and nmake treats any nonzero return value as an error. I therefore guess that at least one of the 10 tests that failed in the above report is not whitelisted.

The same goes for make. I checked it on Linux with a PTX backend that has non-whitelisted errors, and the behavior is the same:

==================================================
              Unit tests report
==================================================

{'[PASS]': 566, '[FAILED]': 9, '[UNSUPPORTED]': 56}
Coverage [PASS/(PASS+FAIL)]: 98.43%
Coverage [PASS/(PASS+FAIL+UNSUPPORTED)]: 89.7%

==================================================

Total Time(s): 254.25732254981995

make: *** [tests] Error 1
[ec2-user@ip-172-31-18-17 TornadoVM]$
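
The behavior described above can be sketched as follows. This is an illustration under assumptions and not the actual tornado-test code; the function name and the whitelist contents are hypothetical:

```python
def exit_code(failed, whitelist):
    """Return 0 when every failed test is whitelisted, otherwise 1."""
    unexpected = set(failed) - set(whitelist)
    return 1 if unexpected else 0


# Hypothetical whitelist; the test names are taken from the logs above.
whitelist = {"testBatchNotEven"}

print(exit_code(["testBatchNotEven"], whitelist))                   # 0
print(exit_code(["testBatchNotEven", "testComputePi"], whitelist))  # 1
```

A script that ends with `sys.exit(exit_code(...))` hands the nonzero status to make or nmake, which then abort the target, as seen in both reports.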

Collaborator

@stratika stratika left a comment

Is the bin/tornadovm-installer.cmd file still needed? To my understanding, this script can be removed.

Collaborator

@stratika stratika left a comment

LGTM. I tested it on Windows, macOS and Linux. My only comment: if we deprecate .\bin\tornadovm-installer.cmd, installation.rst should be updated as well.

Question: Have you also tried the MSYS2 workflow?

Co-authored-by: Thanos Stratikopoulos <34061419+stratika@users.noreply.github.com>
@otabuzzman
Contributor Author

> LGTM. I tested it on Windows, macOS and Linux. My only comment: if we deprecate .\bin\tornadovm-installer.cmd, installation.rst should be updated as well.

I think it would be ok to deprecate .\bin\tornadovm-installer.cmd. The original automatic installer now works (almost) the same on all supported operating systems. The respective paragraph in installation.rst could therefore be removed.

> Question: Have you also tried the MSYS2 workflow?

Good and obvious question. I didn't think of it and thus didn't test with MSYS2. Should I? The documentation does not foresee running bin\tornadovm-installer in the MSYS2 workflow.

@stratika
Collaborator

>> LGTM. I tested it on Windows, macOS and Linux. My only comment: if we deprecate .\bin\tornadovm-installer.cmd, installation.rst should be updated as well.

> I think it would be ok to deprecate .\bin\tornadovm-installer.cmd. The original automatic installer now works (almost) the same on all supported operating systems. The respective paragraph in installation.rst could therefore be removed.

Yes, I feel the same. Let's remove it to keep it simple. Can you please include this change in the PR?

>> Question: Have you also tried the MSYS2 workflow?

> Good and obvious question. I didn't think of it and thus didn't test with MSYS2. Should I? The documentation does not foresee running bin\tornadovm-installer in the MSYS2 workflow.

That's fine. I spoke with @jjfumero and we believe that the MSYS2 installation and workflow should be deprecated, since we now have a working native workflow.

Thank you very much @otabuzzman, once the script is removed we can proceed to merge it.

@otabuzzman
Contributor Author

... (text removed)
> Yes, I feel the same. Let's remove it to keep it simple. Can you please include this change in the PR?

Sure. Done.

... (text removed)
> That's fine. I spoke with @jjfumero and we believe that the MSYS2 installation and workflow should be deprecated, since we now have a working native workflow.

I think that the section on manual Windows installation is worth keeping in an adapted version, and I have provided a proposal (my last commits). I also removed the section on Windows issues, since they were all related to MSYS2.

Collaborator

@stratika stratika left a comment

Thank you! LGTM

@jjfumero jjfumero merged commit 80d9a61 into beehive-lab:develop Apr 22, 2024
2 checks passed
jjfumero added a commit to jjfumero/TornadoVM that referenced this pull request Apr 30, 2024
Improvements
~~~~~~~~~~~~~~~~~~

- [beehive-lab#369](beehive-lab#369): Introduction of Tensor types in TornadoVM API and interoperability with ONNX Runtime.
- [beehive-lab#370](beehive-lab#370): Array concatenation operation for TornadoVM native arrays.
- [beehive-lab#371](beehive-lab#371): TornadoVM installer script ported for Windows 10/11.
- [beehive-lab#372](beehive-lab#372): Add support for ``HalfFloat`` (``Float16``) in vector types.
- [beehive-lab#374](beehive-lab#374): Support for TornadoVM array concatenations from the constructor-level.
- [beehive-lab#375](beehive-lab#375): Support for TornadoVM native arrays using slices from the Panama API.
- [beehive-lab#376](beehive-lab#376): Support for lazy copy-outs in the batch processing mode.
- [beehive-lab#377](beehive-lab#377): Expand the TornadoVM profiler with power metrics for NVIDIA GPUs (OpenCL and PTX backends).
- [beehive-lab#384](beehive-lab#384): Auto-closable Execution Plans for automatic memory management.

Compatibility
~~~~~~~~~~~~~~~~~~

- [beehive-lab#386](beehive-lab#386): OpenJDK 17 support removed.
- [beehive-lab#390](beehive-lab#390): SapMachine OpenJDK 21 supported.
- [beehive-lab#395](beehive-lab#395): OpenJDK 22 and GraalVM 22.0.1 supported.
- TornadoVM tested with Apple M3 chips.

Bug Fixes
~~~~~~~~~~~~~~~~~~

- [beehive-lab#367](beehive-lab#367): Fix for Graal/Truffle languages in which some Java modules were not visible.
- [beehive-lab#373](beehive-lab#373): Fix for data copies of the ``HalfFloat`` types for all backends.
- [beehive-lab#378](beehive-lab#378): Fix free memory markers when running multi-thread execution plans.
- [beehive-lab#379](beehive-lab#379): Refactoring package of vector api unit-tests.
- [beehive-lab#380](beehive-lab#380): Fix event list sizes to accommodate profiling of large applications.
- [beehive-lab#385](beehive-lab#385): Fix code check style.
- [beehive-lab#387](beehive-lab#387): Fix TornadoVM internal events in OpenCL, SPIR-V and PTX for running multi-threaded execution plans.
- [beehive-lab#388](beehive-lab#388): Fix of expected and actual values of tests.
- [beehive-lab#392](beehive-lab#392): Fix installer for using existing JDKs.
- [beehive-lab#389](beehive-lab#389): Fix ``DataObjectState`` for multi-thread execution plans.
- [beehive-lab#396](beehive-lab#396): Fix JNI code for the CUDA NVML library access with OpenCL.
4 participants