UnsatisfiedLinkError - PyTorch 2.0 on Windows 11 #2552

Closed
enpasos opened this issue Apr 18, 2023 · 10 comments
Labels
bug Something isn't working

Comments

@enpasos
Contributor

enpasos commented Apr 18, 2023

Very cool that you have started to support PyTorch 2.0!
Here is a minor bug ...

Description

Running an application with DJL 0.22.0-SNAPSHOT (as of today, 18 April 2023), I get:

Caused by: java.lang.UnsatisfiedLinkError: C:\Users\enpasos\.djl.ai\pytorch\2.0.0-cu118-win-x86_64\nvfuser_codegen.dll: Can't find dependent libraries
        at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
...
        at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53)

Workaround

After deleting .djl.ai\pytorch\2.0.0-cu118-win-x86_64\nvfuser_codegen.dll, I only get a warning:

[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\jit\codegen\cuda\interface.cpp:47] Warning: Loading nvfuser library failed with: error in LoadLibrary for nvfuser_codegen.dll. WinError 126: The specified module could not be found.
 (function LoadingNvfuserLibrary)
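
For anyone who wants to script the workaround rather than delete the file by hand, here is a minimal sketch in plain Java. It is not part of DJL. The class name `NvfuserWorkaround` and the helper `removeNvfuser` are made up for illustration, and the default path assumes the cache lives under `<user.home>\.djl.ai` with the same flavor directory as in the stack trace above. Pass a different directory as the first argument if yours differs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class NvfuserWorkaround {

    /**
     * Deletes nvfuser_codegen.dll from the given directory if it is present.
     * Returns true if a file was deleted, false if there was nothing to delete.
     */
    public static boolean removeNvfuser(Path nativeDir) throws IOException {
        return Files.deleteIfExists(nativeDir.resolve("nvfuser_codegen.dll"));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: default cache location and flavor directory as seen in this issue.
        Path nativeDir = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"),
                        ".djl.ai", "pytorch", "2.0.0-cu118-win-x86_64");

        boolean removed = removeNvfuser(nativeDir);
        System.out.println(removed
                ? "Deleted nvfuser_codegen.dll from " + nativeDir
                : "No nvfuser_codegen.dll found in " + nativeDir);
    }
}
```

This has to run before the PyTorch engine is loaded in the same JVM, or as a separate step before starting the application.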

How to Reproduce?

git clone https://github.com/enpasos/muzero.git
cd muzero 
gradlew build -x test
java -jar games/tictactoe/build/libs/tictactoe-0.6.0-SNAPSHOT-exec.jar  

Environment Info

  • DJL: 0.22.0-SNAPSHOT (automatically installed with the app)
    • PYTORCH: 2.0.0
  • Java: Corretto-17.0.6 (needs to be installed)
  • CUDA (needs to be installed)
    • cudnn: 8.9
    • CUDA SDK: 11.8
    • GPU Driver: 517.89
  • OS: Microsoft Windows 11
  • Hardware
    • GPU: NVIDIA GeForce RTX 4090
    • CPU: Intel Core i9-13900K
    • RAM: 128 GB
@enpasos enpasos added the bug Something isn't working label Apr 18, 2023
@frankfliu
Contributor

PyTorch 2.0.0 with CUDA 11.8 has multiple issues. We are waiting for the 2.0.1 release.

@farzad-845

@frankfliu The same error occurs on my workstation. How can I downgrade to the previous version, and which version works well with Windows 11?

@enpasos
Contributor Author

enpasos commented May 2, 2023

@farzad-845 I am happy with PyTorch 2.0.0 and CUDA SDK 11.8, and I use the workaround of simply deleting
C:\Users\myuser\.djl.ai\pytorch\2.0.0-cu118-win-x86_64\nvfuser_codegen.dll.
Previously I was working with

pytorch = "1.13.1" with djl-pytorch-native-cu117 = { module = "ai.djl.pytorch:pytorch-native-cu117", version.ref = "pytorch" }

Hope this helps.

@farzad-845

farzad-845 commented May 2, 2023

@enpasos Thank you. I deleted this file and got the warning, but after the warning I get another error saying "Can't load this .dll":

C:\Users\User\.djl.ai\pytorch\2.0.0-cu118-win-x86_64\a.out: Can't load this .dll (machine code=0x4810) on a AMD 64-bit platform

The Full Error:

Exception in thread "main" ai.djl.engine.EngineException: Failed to load PyTorch native library
	at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:90)
	at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40)
	at ai.djl.engine.Engine.getEngine(Engine.java:187)
	at ai.djl.Model.newInstance(Model.java:99)
	at ai.djl.repository.zoo.BaseModelLoader.createModel(BaseModelLoader.java:191)
	at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:154)
	at ai.djl.repository.zoo.Criteria.loadModel(Criteria.java:172)
	at ai.djl.repository.zoo.ModelZoo.loadModel(ModelZoo.java:141)
Caused by: java.lang.UnsatisfiedLinkError: C:\Users\User\.djl.ai\pytorch\2.0.0-cu118-win-x86_64\a.out: Can't load this .dll (machine code=0x4810) on a AMD 64-bit platform
	at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
	at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:388)
	at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:232)
	at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:174)
	at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2389)
	at java.base/java.lang.Runtime.load0(Runtime.java:755)
	at java.base/java.lang.System.load(System.java:1953)
	at ai.djl.pytorch.jni.LibUtils.loadNativeLibrary(LibUtils.java:347)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
	at ai.djl.pytorch.jni.LibUtils.loadLibTorch(LibUtils.java:146)
	at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:78)
	at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53)
	... 10 more
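
The trace above shows the loader being handed a file named `a.out` from the native cache directory. One way to check for other unexpected files is to list everything in that directory that does not end in `.dll`. This is only a diagnostic sketch in plain Java: the class name `NativeDirCheck` and the helper `findNonDllFiles` are made up for illustration, and the idea that a stray non-DLL file is the cause is a reading of the stack trace, not something confirmed in this thread. The sketch only reports files and deletes nothing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NativeDirCheck {

    /** Returns the sorted names of regular files in nativeDir that do not end in ".dll". */
    public static List<String> findNonDllFiles(Path nativeDir) throws IOException {
        try (Stream<Path> files = Files.list(nativeDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(name -> !name.toLowerCase(Locale.ROOT).endsWith(".dll"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: default cache location and flavor directory as seen in this issue.
        Path nativeDir = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"),
                        ".djl.ai", "pytorch", "2.0.0-cu118-win-x86_64");

        if (!Files.isDirectory(nativeDir)) {
            System.out.println("Directory not found: " + nativeDir);
            return;
        }
        List<String> suspects = findNonDllFiles(nativeDir);
        System.out.println("Non-DLL files in " + nativeDir + ": " + suspects);
    }
}
```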

This is my pom file:

        <!-- https://mvnrepository.com/artifact/ai.djl/api -->
        <dependency>
            <groupId>ai.djl</groupId>
            <artifactId>api</artifactId>
            <version>0.22.1</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/ai.djl.huggingface/tokenizers -->
        <dependency>
            <groupId>ai.djl.huggingface</groupId>
            <artifactId>tokenizers</artifactId>
            <version>0.22.1</version>
        </dependency>


        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-model-zoo</artifactId>
            <version>0.22.1</version>
        </dependency>

        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-native-cu118</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-jni</artifactId>
            <version>2.0.0-0.22.1</version>
            <scope>runtime</scope>
        </dependency>


        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-engine</artifactId>
            <version>0.22.1</version>
            <scope>runtime</scope>
        </dependency>

@enpasos
Contributor Author

enpasos commented May 22, 2023

@farzad-845, sorry for the late reply. I do not know what is causing your problem. However, version 2.0.1 of PyTorch has been released. According to @frankfliu's comment, I hope the problems we are experiencing may be fixed once the new version is integrated into DJL.

@enpasos
Contributor Author

enpasos commented Jul 11, 2023

Just tested with version 2.0.1.
I see the same UnsatisfiedLinkError as in 2.0.0.
Runs for me with the same workaround as in 2.0.0.

@eleven-dimension

eleven-dimension commented Sep 11, 2023

@farzad-845 I am happy with PyTorch 2.0.0 and CUDA SDK 11.8 and use the workaround to simply delete C:\Users\myuser\.djl.ai\pytorch\2.0.0-cu118-win-x86_64\nvfuser_codegen.dll. Previously I was working with

pytorch = "1.13.1" with djl-pytorch-native-cu117 = { module = "ai.djl.pytorch:pytorch-native-cu117", version.ref = "pytorch" }

Hope this helps.

Same error on Windows 10. Strange as it may seem, this method works. Thank you, @enpasos!

@enpasos
Contributor Author

enpasos commented Oct 20, 2023

Maybe we can get rid of this UnsatisfiedLinkError with PyTorch 2.1.0, released October 4, 2023.

https://github.com/pytorch/pytorch/releases/tag/v2.1.0

@frankfliu
Contributor

#2868

@enpasos
Contributor Author

enpasos commented Nov 27, 2023

Thanks a lot for the fix and for supporting PyTorch 2.1.1!
It works nicely on my stack:

https://github.com/enpasos/muzero  with  
djl = "0.26.0-SNAPSHOT"
pytorch = "2.1.1-SNAPSHOT"
on
cudnn-windows-x86_64-8.9.6.50_cuda12
cuda_12.1.1_531.14
Windows 11 Pro, 23H2, 22631.2715

@enpasos enpasos closed this as completed Nov 27, 2023