Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[0x80004005 - unknown error] 'internal error: no storage type for block output" #54

Closed
gaiastellar opened this issue Nov 19, 2020 · 19 comments
Labels

Comments

@gaiastellar
Copy link

gaiastellar commented Nov 19, 2020

hi,
came across this while writing a compute shader on Unity. After extensive testing the error only occurs when using FastNoiseLite.
Initialisation of noise and settings variables with values works, like this:

fnl_state noise = fnlCreateState(1337);
noise.noise_type = 2;
noise.frequency = 0.02f;
noise.fractal_type = 2;
etc etc

no problem, but using variables to set the parameters causes the error, like this:

float myseed = 1212;
float myfrequency = 0.02f;
int mynoisetype = 1;
int myrotationtype = 1;
int myfractaltype = 2;
float mygain = 0.5f;

fnl_state noise = fnlCreateState(myseed);
noise.frequency = myfrequency;
noise.noise_type = mynoisetype;
noise.rotation_type_3d = myrotationtype;
noise.fractal_type = myfractaltype;
noise.gain = mygain;

In reality, im sending the variable paremeters from CPU side, which in Unity is SetFloat("mygain", 0.5f), which generally works fine, but when setting parameters like this and one of them is fractal_type, throws the error. if fractal_type is set with a definate value the error doesnt occur eg: noise.fractal_type = 2;
I originally thought this was an error in Unity in the way it was handling variables, but i wrote something using variables in a similar way, not using the FastNoiseLite HLSL library and i couldnt replicate the error, so im guessing theres something in the library in the way it is utilising gpu memory that is throwing the error, but this really isnt my area, so im only guessing.
I want to be able to send the values for noise parameters from CPU side into the compute shader, so this is really limiting the way i can use it.
many thanks

@Rover656
Copy link
Contributor

Rover656 commented Nov 19, 2020

Did you receive any form of error messages or stack traces (beyond what you stated in the title)? If so could you please share them in the form of a gist? Thank you!

@gaiastellar
Copy link
Author

gaiastellar commented Nov 19, 2020 via email

@Rover656
Copy link
Contributor

Okay, after researching this I have found that if you set more than 4 fields in the state struct from global compute shader parameters, the compiler dislikes it. I'm going to reach out to some Unity developers over the weekend when I have more time and will discuss this issue with them. I will get back to you. In the meantime the workaround would be to use as many hardcoded values (locally scoped, i.e. within your main function, not outside) until this issue can be resolved.

@gaiastellar
Copy link
Author

gaiastellar commented Nov 20, 2020

hi reece,
after some more testing, you can use more than 4 global variables - using my test code, i imported all 15 of the noise structs settings in an array, and as long as i dont set fractal_type or cellular_distance_func , it works fine. I will paste the code below. I think i have found the reason behind it, but i dont fully understand it.. to quote microsoft: Additionally, HLSL packs data so that it does not cross a 16-byte boundary. the full page on hlsl data packing is here: https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-packing-rules, but in short, when it gets to the end of a 16bytes block it will bump the next value to the start of the next block, so I am wondering if the underlying structure for fractal_type and cellular_distance_func is more than 16 bytes, in which case it when it is attempting to split the struct data in 16 byte blocks, it will get stuck on those... or something like that anyway... i also read somewhere that while the struct is just used to pass data internally, it is merely 'sugar' (thats what they called it), and doesnt effect the underlying process, but when it has to interact with something external, the data blocking becomes important...

@gaiastellar
Copy link
Author

void CSMain (uint3 id : SV_DispatchThreadID)
{
fnl_state noise = fnlCreateState(settings[0]);
noise.frequency = settings[1];
noise.noise_type = settings[2];
noise.rotation_type_3d = settings[3];
noise.fractal_type = 1;
noise.octaves = settings[5];
noise.lacunarity = settings[6];
noise.gain = settings[7];
noise.weighted_strength = settings[8];
noise.ping_pong_strength = settings[9];
noise.cellular_distance_func = 1;
noise.cellular_return_type = settings[11];
noise.cellular_jitter_mod = settings[12];
noise.domain_warp_amp = settings[13];
noise.domain_warp_type = settings[14];

float thisnoise = Remap(fnlGetNoise2D(noise,id.x % 256, id.x / 256), -1, 1, 0, 1);
Result[id.x] = float4(thisnoise, thisnoise, thisnoise, 1.0f);

@gaiastellar
Copy link
Author

hi, further testing: i created a switch statement that created a new noise instance and hardcoded the fractal type, depending on what the fractal type was, which worked until i tried to then pass a variable into the switch statement to decide which new instance to create, and then it threw the same error. I'll post the code below if i explained that badly. Im wondering if there is something in the fractal algorithm and distance function that has to be precompiled and cant be changed once set?
the only work around i can think of that might work is to create a seperate compute shader pragma for each fractal and distance function choice, thus each choice is precompiled.

fnl_state CustomfnlCreateState(float settings[15], int fractal)
{

switch (fractal)
{
case 0:
    fnl_state newState0;
    newState0.seed = settings[0];
    newState0.frequency = settings[1];
    newState0.noise_type = (int)settings[2];
    newState0.rotation_type_3d = (int)settings[3];
    newState0.fractal_type = 0;
    newState0.octaves = settings[5];
    newState0.lacunarity = settings[6];
    newState0.gain = settings[7];
    newState0.weighted_strength = settings[8];
    newState0.ping_pong_strength = settings[9];
    newState0.cellular_distance_func = 0;
    newState0.cellular_return_type = settings[11];
    newState0.cellular_jitter_mod = settings[12];
    newState0.domain_warp_amp = settings[13];
    newState0.domain_warp_type = settings[14];
    return newState0;

case 1:
    fnl_state newState1;
    newState1.seed = settings[0];
    newState1.frequency = settings[1];
    newState1.noise_type = (int)settings[2];
    newState1.rotation_type_3d = (int)settings[3];
    newState1.fractal_type = 1;
    newState1.octaves = settings[5];
    newState1.lacunarity = settings[6];
    newState1.gain = settings[7];
    newState1.weighted_strength = settings[8];
    newState1.ping_pong_strength = settings[9];
    newState1.cellular_distance_func = 0;
    newState1.cellular_return_type = settings[11];
    newState1.cellular_jitter_mod = settings[12];
    newState1.domain_warp_amp = settings[13];
    newState1.domain_warp_type = settings[14];
    return newState1;

case 2:
    fnl_state newState2;
    newState2.seed = settings[0];
    newState2.frequency = settings[1];
    newState2.noise_type = (int)settings[2];
    newState2.rotation_type_3d = (int)settings[3];
    newState2.fractal_type = 2;
    newState2.octaves = settings[5];
    newState2.lacunarity = settings[6];
    newState2.gain = settings[7];
    newState2.weighted_strength = settings[8];
    newState2.ping_pong_strength = settings[9];
    newState2.cellular_distance_func = 0;
    newState2.cellular_return_type = settings[11];
    newState2.cellular_jitter_mod = settings[12];
    newState2.domain_warp_amp = settings[13];
    newState2.domain_warp_type = settings[14];
    return newState2;

case 3:
    fnl_state newState3;
    newState3.seed = settings[0];
    newState3.frequency = settings[1];
    newState3.noise_type = (int)settings[2];
    newState3.rotation_type_3d = (int)settings[3];
    newState3.fractal_type = 3;
    newState3.octaves = settings[5];
    newState3.lacunarity = settings[6];
    newState3.gain = settings[7];
    newState3.weighted_strength = settings[8];
    newState3.ping_pong_strength = settings[9];
    newState3.cellular_distance_func = 0;
    newState3.cellular_return_type = settings[11];
    newState3.cellular_jitter_mod = settings[12];
    newState3.domain_warp_amp = settings[13];
    newState3.domain_warp_type = settings[14];
    return newState3;

default:
    fnl_state newState4;
    newState4.seed = 1337;
    newState4.frequency = 0.01f;
    newState4.noise_type = 1;
    newState4.rotation_type_3d = 1;
    newState4.fractal_type = 1;
    newState4.octaves = 3;
    newState4.lacunarity = 2.0f;
    newState4.gain = 0.5f;
    newState4.weighted_strength = 0.0f;
    newState4.ping_pong_strength = 2.0f;
    newState4.cellular_distance_func = 1;
    newState4.cellular_return_type = 1;
    newState4.cellular_jitter_mod = 1.0f;
    newState4.domain_warp_amp = 30.0f;
    newState4.domain_warp_type = 1;
    return newState4;
}

@Auburn Auburn added the HLSL label Jan 24, 2021
@FHomps
Copy link

FHomps commented Apr 8, 2021

Same issue here. I have tried refactoring Fastnoise to use two arrays of ints and floats as parameters instead of a state struct to no avail. Really weird stuff.

@FHomps
Copy link

FHomps commented Apr 9, 2021

Alright, took me a while but I found a workaround that doesn't require rewriting the entire library. Here's how to use this shader with Unity with the added benefits of it compiling. Buckle up, it's a journey.

A bit of context first; for my personal use case, I am using a Unity compute shader to fill up a RenderTexture with values from this shader; this texture is then passed around to a few other compute shaders for them to change values for a mesh I'm rendering, all without leaving the GPU.

First problem: Unity's compute shaders apparently don't support structs as parameters, at all. Not entirely sure about the reason but since it's Unity, it's probably a bad one.
So, first thing to do was change the library from using its state structure to two arrays of respectively ints and floats that I pass as two parameters. It was either that or change every function that uses state to pass every single individual argument, and I don't have time for this.

Now, I have two arrays which make up the state of my noise generation in my compute shader. Using Unity's SetInts() and SetFloats() to fill them up sounds good, right? Wrong, as of today they're bugged to hell and back and won't pass your values properly to anything that's not an array of int4s or float4s [EDIT: It has come to my attention that what I thought was a bug is in fact the documented padding requirements of SetInts() and SetFloats(). My bad]. So, pass int / float values individually with SetInt() and SetFloat() (or use a structured buffer, I haven't tried yet but it should work), then fill up your state arrays in your shader calls. It's inefficient, sure, but you're using a library with runtime nested switches in a shader, so you're probably not hung up on efficiency anyway.

Then comes the error that started this issue ; what is a block output and why is there no storage type for it? After some trial and error I also narrowed it down to switches - for some reason the switches for noise type and cellular distance functions just don't want to compile; hardcoding the return values instead worked. Looking at the HLSL documentation, Microsoft states that "the compiler may use a hardware switch or emit a series of if statements", probably depending on driver implementation.

Little-known feature of HLSL (at least, to me), you can pass attributes to switches to force a certain optimization behavior. Tried it with [flatten] or [branch] on the offending switches and bingo, the shader compiles. Now with [forcecase], which forces a hardware switch, and we get our errors again. So, my guess is there is some kind of bug in my driver which makes it have trouble organizing the switch instruction blocks - for info, I have an Intel HD 620 with up-to-date drivers, so I am not some kind of rare homebrew graphics card user.

Now, I won't make a pull request for this since it's kind of a dirty hack and removes the state struct, but attached are the fixed library file and my compute shader as an example.
shaders.zip

@Rover656
Copy link
Contributor

Rover656 commented Apr 9, 2021

Thank you for looking into this, when I get the time I'll sit down and look into a way of reworking the hlsl port taking into account these things you have found, and attempt to implement them in a hackless, or at least less-hacky, manner.

@FHomps
Copy link

FHomps commented Apr 9, 2021

I'm thinking the best way of doing this is to create a separate, Unity-optimized variant of FNL (with a .cginc extension while we're at it) that:

  • does away with state and simply uses multiple variables, this should also help with the redundancy I've seen in the internal functions which really don't need all of the state
  • gets rid of all the enum variables and switches, replacing them with preprocessor directives, making use of Unity's local keyword multicompilation; this will solve the original issue while probably improving performance quite a bit.

I'm going to try my hand at this, I will get back to you soon.

@FHomps
Copy link

FHomps commented Apr 10, 2021

A few hours of investigating later...
Multicompiling / using shader variants would be much trickier than expected. Bruteforcing all enum combinations into different compiled shaders results in a whopping 9072 different shader variants, which would make build times very, very long. Multiple options here:

  • Optimizing the shader variants by removing impossible combinations; what comes to mind is making all the different cellular options into different fnl_noise_type options (FNL_NOISE_CELL_EUC_DIST, FNL_NOISE_CELL_EUC_DIST2, etc.). Obviously impractical but effective; the traditional user interface would be implemented on the c# side, probably with an EditorGUI wrapper like the OG FastNoise did.
  • Putting the domain warping in a different shader called separately by the c# stuff
  • Keeping switches for most things and switching to shader variants only for the switches Unity has trouble with, which would make the API inconsistent (a mix of SetInt and EnableKeyword on the c# side)
  • Keeping switches for everything and using the switch attributes I mentioned before to make Unity happy.

After thinking about it for a while, I'm getting increasingly convinced by the last solution. Not only is it much faster to implement, but I feel like FNL's philosophy is more of a playing ground for noise testing than something meant for very efficient noise production (which would probably just use hardcoded function choices).

@gaiastellar
Copy link
Author

hi, FHomps et al, did anyone manage to make any progress on this? i'm afraid its too complex for my level of shader language to fix myself, but i still need a HLSL noise library for unity, if anyone has a version of this library that doesnt piss unity off...

@FHomps
Copy link

FHomps commented Feb 11, 2022

Hi @gaiastellar,
I haven't worked on this in a while; if memory serves, as mentioned before you can put [flatten] or [branch] before the switches that cause trouble to make the lib work with Unity. It's not an optimal solution, but it works (and was good enough for me on the project I was working on at the time).

@Akeyn
Copy link

Akeyn commented Sep 3, 2022

@Rover656 @FHomps @gaiastellar Hi all, I have the same error, after reading the discussion, I added changes to the FastNoiseLite.hlsl library and error disappeared
fix_hlsl_port
fix_hlsl_port_2
fix_hlsl_port_3
fix_hlsl_port_4
fix_hlsl_port_5

@Auburn
Copy link
Owner

Auburn commented Sep 8, 2022

This would be great if it does fix the issue, I don't really know much about HLSL to verify or test if this is the correct fix. @Rover656 what are your thoughts on the fix?

@Rover656
Copy link
Contributor

Rover656 commented Sep 8, 2022

I'm not familiar with the deeper details of the HLSL compiler, I can try and get a unity app up and running to see for myself but if it works I'm not opposed to it being brought in. From some basic research all this is doing is reducing the amount of branching in the compiled output, which I guess is helping with this odd storage error.

@reveriejake
Copy link
Contributor

@Akeyn Thanks! That worked for me 👍

@Auburn
Copy link
Owner

Auburn commented Nov 10, 2022

I'm happy to take this change if someone wants to PR it

@Auburn
Copy link
Owner

Auburn commented Nov 11, 2022

This should now be fixed with 95900f7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants