
StableDiffusion / Non-zero status code returned while running Add node. Name:'Add_221' #142

Open
imranypatel opened this issue May 30, 2024 · 5 comments


@imranypatel

While trying the Basic Stable Diffusion Example from https://www.nuget.org/packages/OnnxStack.StableDiffusion, at the following line in the code:

    // Run Pipeline
    var result = await pipeline.GenerateImageAsync(promptOptions);

the following exception is raised:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException
  HResult=0x80131500
  Message=[ErrorCode:RuntimeException] Non-zero status code returned while running Add node. Name:'Add_221' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2482)\onnxruntime.DLL!00007FFC280E7AA5: (caller: 00007FFC280E712D) Exception(3) tid(1b6c) 80004005 Unspecified error

  Source=Microsoft.ML.OnnxRuntime
  StackTrace:
   at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus)
   at Microsoft.ML.OnnxRuntime.InferenceSession.<>c__DisplayClass75_0.<RunAsync>b__0(IReadOnlyCollection`1 outputs, IntPtr status)
--- End of stack trace from previous location ---
   at Microsoft.ML.OnnxRuntime.InferenceSession.<RunAsync>d__75.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<EncodePromptTokensAsync>d__39.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<GeneratePromptEmbedsAsync>d__40.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<CreatePromptEmbedsAsync>d__37.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<RunInternalAsync>d__31.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<GenerateImageAsync>d__26.MoveNext()
   at TestOnnxStack.TestStableDiffusion.<Test01>d__1.MoveNext() in ...\TestOnnxStack\TestStableDiffusion.cs:line 42
   at Program.<<Main>$>d__0.MoveNext() in ...i\TestOnnxStack\Program.cs:line 6

To reproduce:

  1. Create a .NET 8 console project.
  2. Add the NuGet packages Microsoft.ML.OnnxRuntime.DirectML (1.17.3) and OnnxStack.StableDiffusion (0.31.0).
  3. Copy the code into Program.cs from the Basic Stable Diffusion Example in the https://www.nuget.org/packages/OnnxStack.StableDiffusion documentation.
  4. Create a d:\model folder and run git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 -b onnx into it.
  5. Change the model path in the code, e.g. var pipeline = StableDiffusionPipeline.CreatePipeline(@"D:\model\stable-diffusion-v1-5");
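For reference, the reproduction program looks roughly like this. This is a sketch based on the package's Basic Stable Diffusion Example; exact type and method names (e.g. SaveAsync on the result) may differ between OnnxStack versions:

```csharp
using OnnxStack.StableDiffusion.Config;
using OnnxStack.StableDiffusion.Pipelines;

// Create the pipeline from the local ONNX model folder
var pipeline = StableDiffusionPipeline.CreatePipeline(@"D:\model\stable-diffusion-v1-5");

// Prompt options: the text to turn into an image
var promptOptions = new PromptOptions
{
    Prompt = "Photo of a cat"
};

// Run pipeline -- this is the line that throws on the affected machine
var result = await pipeline.GenerateImageAsync(promptOptions);

// Save the generated image to disk
await result.SaveAsync("Result.png");
```

The program needs the full ONNX model folder (tokenizer, text encoder, UNet, VAE) on disk, so it cannot run standalone.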

Platform

  • Windows 10
  • .NET 8.0.200
  • OnnxStack.StableDiffusion 0.31.0
  • Microsoft.ML.OnnxRuntime.DirectML Version="1.17.3"
  • Target CPU x64
@saddam213
Member

saddam213 commented May 30, 2024

That's an odd-looking error coming from deep within ONNX Runtime.

I'll download that model and see if it's a regression; it's been a while since I've used that version of the model.

I have this version on disk, and that seems to work ok following your steps
https://huggingface.co/TensorStack/stable-diffusion-v1-5-onnx

I will check the other model now and update you with what I find

EDIT:

Downloaded a fresh copy of the model you used and it seemed to work fine

[screenshot: generated image]

It must be another cause. A corrupt download, maybe?
What kind of GPU/Device are you using?

@imranypatel
Author

I downloaded the model from https://huggingface.co/TensorStack/stable-diffusion-v1-5-onnx.

Now I'm getting a slightly different error than the one above:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException
  HResult=0x80131500
  Message=[ErrorCode:RuntimeException] Non-zero status code returned while running Mul node. Name:'' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2482)\onnxruntime.DLL!00007FFC9DD77AA5: (caller: 00007FFC9DD7712D) Exception(3) tid(6c74) 80004005 Unspecified error

  Source=Microsoft.ML.OnnxRuntime
  StackTrace:
   at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus)
   at Microsoft.ML.OnnxRuntime.InferenceSession.<>c__DisplayClass75_0.<RunAsync>b__0(IReadOnlyCollection`1 outputs, IntPtr status)
--- End of stack trace from previous location ---
   at Microsoft.ML.OnnxRuntime.InferenceSession.<RunAsync>d__75.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<EncodePromptTokensAsync>d__39.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<GeneratePromptEmbedsAsync>d__40.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<CreatePromptEmbedsAsync>d__37.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<RunInternalAsync>d__31.MoveNext()
   at OnnxStack.StableDiffusion.Pipelines.StableDiffusionPipeline.<GenerateImageAsync>d__26.MoveNext()
   at TestOnnxStack.TestStableDiffusion.<Test01>d__1.MoveNext() in ...\TestOnnxStack\TestStableDiffusion.cs:line 42
   at Program.<<Main>$>d__0.MoveNext() in ...i\TestOnnxStack\Program.cs:line 6

Device/CPU:

[screenshot: CPU information]

GPU:

[screenshot: GPU information]

I'm kind of stuck at the moment, as I'm just a beginner in the ML.NET/ONNX/DirectML space.

Thank you for your support.

@saddam213
Member

saddam213 commented May 30, 2024

Unfortunately, your GPU may not have enough VRAM for Stable Diffusion; at minimum you would need 3-4 GB for an FP16 model.

You can switch to CPU mode by using the CPU execution provider and see if that works:

var pipeline = StableDiffusionPipeline.CreatePipeline(@"D:\models\test\stable-diffusion-v1-5", executionProvider: ExecutionProvider.Cpu);

It's ok, I'm just learning ML too :)

@imranypatel
Author

I was expecting better error reporting, or at least some troubleshooting help, from the .NET framework and the ML APIs built on top of it. The situation seems not much different from the Python platforms.

I tried the CPU execution provider, only to see a similar problem. I'm not copying the details here because, based on your response, I think I'd better first get my hardware/software platform in order for ML on Windows/.NET.

Based on your learning so far, could you suggest or point me to platform requirements (laptop, CPU, GPU, RAM, Windows OS, etc.) for a developer looking to explore ML in general and the ML.NET space in particular?

Good luck on your learning ride!

Thank you.

@imranypatel
Author

I tried on another machine, with success.

Typical GPU state during image generation:

[screenshot: Task Manager GPU utilization]

dxdiag system:

[screenshot: dxdiag system information]

dxdiag AMD Radeon GPU:

[screenshot: dxdiag display information]

It takes about 8 minutes per image generation on average.

Now I'm looking into how to reduce that time, which is of course not acceptable at the moment.

I would welcome any suggestions in that direction!
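One common lever is the number of scheduler inference steps: each step is a full UNet pass, so fewer steps cut generation time roughly proportionally, at some cost in image quality. A sketch, assuming OnnxStack exposes a SchedulerOptions overload of GenerateImageAsync as shown in its package documentation (property names and defaults may differ between versions):

```csharp
using OnnxStack.StableDiffusion.Config;
using OnnxStack.StableDiffusion.Pipelines;

var pipeline = StableDiffusionPipeline.CreatePipeline(@"D:\model\stable-diffusion-v1-5");

var promptOptions = new PromptOptions { Prompt = "Photo of a cat" };

// Fewer steps => fewer UNet evaluations => faster generation (lower quality).
// 15 and 7.5f are illustrative values, not recommendations from this thread.
var schedulerOptions = new SchedulerOptions
{
    InferenceSteps = 15,
    GuidanceScale = 7.5f
};

var result = await pipeline.GenerateImageAsync(promptOptions, schedulerOptions);
await result.SaveAsync("Result.png");
```

Beyond step count, the usual options are a faster execution provider (DirectML/CUDA instead of CPU) or a distilled model such as LCM, which needs far fewer steps.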
