Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sm lights branching #255

Merged
merged 23 commits into from Jan 6, 2022
Merged

Conversation

dyunchik
Copy link
Contributor

@dyunchik dyunchik commented Jan 3, 2022

SM lights branching technique decreases possible shader variants for point and spot shadow-casting lights. The technique is very suitable when you have many regular (point or spot) shadow-casting lights in the scene and their combination may change in different scene locations. Default inline technique requires new shaders for every possible shadow-casting lights combination.
SM lights branching technique creates two separate loops in PS shader - one for point shadow-casting lights and one for spot lights. Make sure your lights are ordered and grouped correctly(point lights first, then - spots) if you have your own lights management.
Current implementation works for Metal, DirectX and Vulkan render systems. Default inline technique is used for OpenGL/OpenGLES.

To enable SM lights branching technique use Ogre::HlmsPbs::setMaxShadowMapLights() method before any PBS materials creation.
I've modified Sample_ShadowMapFromCode for demo and debug purposes. Use USE_STATIC_BRANCHING_FOR_SHADOWMAP_LIGHTS macro to turn on/off the technique.

@darksylinc
Copy link
Member

Great works! Thanks!

Two things:

  1. Don't worry about the merge conflicts. I'll see what I can do about that (our warning fixes + C++11-ification + clang format pretty much broke everything merging 2.3 -> master)
  2. Wouldn't it be better to use the array samplers on all APIs? I see on OpenGL and Vulkan you went for:
	layout( ogre_t@value(texShadowMap0) ) uniform @insertpiece( TEXTURE2DSHADOW ) texShadowMap[@value(hlms_num_shadow_map_textures)];
@else
	@foreach( hlms_num_shadow_map_textures, n )
		uniform @insertpiece( TEXTURE2DSHADOW ) texShadowMap@n;@end
@end

But on HLSL and Metal you kept the Texture2D<float> texShadowMap@n. IMO it would be more consistent if all APIs follow the same path; btw the Metal syntax for arrays is a bit awkward: array< texture2d<float>, num_textures > texShadowMap [[texture(slot)]].

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 3, 2022

  1. Wouldn't it be better to use the array samplers on all APIs?

I thought about making more consistent shaders. But, I'm not sure about efficiency using the same approach with TEXTURE2DSHADOW arrays for HLSL and Metal. When I get this solution integrated in our project I'll be able to test real performance on huge scene. Approach for glslvk uses two array lookups: first gets texture index from shadowmap index and second gets real sampler from texture index. Approach for HLSL and Metal uses just one array lookup to get TEXTURE2DSHADOW. So, I think we can start with current implementation and then test on real huge scenes both approaches.

@darksylinc
Copy link
Member

OHHHHH know I understand how you approach it! You solved the problem with an extra level of indirection!

AFAIK when it comes to shaders both approaches should give the same level of performance since they should theoretically compile to the same code (either unrolled or not).

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 4, 2022

OHHHHH know I understand how you approach it! You solved the problem with an extra level of indirection!

AFAIK when it comes to shaders both approaches should give the same level of performance since they should theoretically compile to the same code (either unrolled or not).

Hm, I tried to rewrite hlsl shader in the same way as glslvk and stuck with shader compiler error:
OGRE EXCEPTION(-2147467259:RenderingAPIException): Cannot compile D3D11 high-level shader 100000003PixelShader_ps Errors:
C:\Work\Projects\orge-next\build\bin\Debug\100000003PixelShader_ps.hlsl(1522,2-52): error X3511: forced to unroll loop, but unrolling failed.

Also tried all possible 'for' statement attributes (unroll, loop, fastopt) - no way. Very strange, I've gave up and leave first approach with single indirection.

Nevertherless, it looks like I managed to get it works for OpenGL.

@darksylinc
Copy link
Member

I've pushed a few fixes:

  • ShadowMapFromCode was broken when USE_STATIC_BRANCHING_FOR_SHADOWMAP_LIGHTS wasn't defined
  • All APIs are now using the same path. I didn't get the unroll error you mention
    • And... now I get it. I'll try to fix it.
  • I fixed the array syntax which was unreadable. Fortunately we had macros to deal with it :)

There is an error I've only seen it manifesting on (Linux) Vulkan yet: very infrequently at random ShadowMapFromCode will render differently (not everything is green and the pattern changes on multiple runs, when it's broken).

I'm not sure what's causing it, I suspect either an uninitialized variable or std::sort running on pointers and the order of lights is causing it.

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 4, 2022

  • And... now I get it. I'll try to fix it.

I suspect we use old shader model, probably SM2.0 for 9.x compatibility.
we can try this pragma:
#pragma exclude_renderers d3d11_9x

also we can suggest compiler unroll
@property( syntax == hlsl )[unroll(@value(hlms_lights_spot)-@value(hlms_lights_directional_non_caster))]@end

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 4, 2022

https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/specifying-compiler-targets?redirectedfrom=MSDN

When you use the *_4_0_level_9_x HLSL shader profiles, you implicitly use of the Shader Model 2.x profiles to support Direct3D 9 capable hardware. Shader Model 2.x profiles support more limited flow control behavior than the Shader Model 4.x and later profiles.

@darksylinc
Copy link
Member

I've been trying to fix the "forced to unroll" without success (we may just be forced to use your approach on D3D11).

also we can suggest compiler unroll
@Property( syntax == hlsl )[unroll(@value(hlms_lights_spot)-@value(hlms_lights_directional_non_caster))]@EnD

For the unroll to work we'd need to provide an upper bound e.g. for( ; light_idx<sm_point_lights_end || light_idx < @value( hlms_lights_point ); light_idx++ ) but it doesn't seem to be working (maybe I need to combine it with unroll like you said).

But I also stumbled with the issue that hlms_lights_point is unset to avoid variants from exploding.

#pragma exclude_renderers d3d11_9x

I tried the pragma but got:

warning X3568: 'exclude_renderers' : unknown pragma ignored

Anyway, we should be compiling for PS 5.0 unless the FL is lower.

My main confusion is that it does work fine on WINE. The thing is, WINE uses the MS compiler (e.g. d3dcompiler_47.dll and co.), so if it fails on Windows it should be failing to compile on Wine, unless our the exes are linking against different compiler versions

@darksylinc
Copy link
Member

AH!!!!!!! Mystery solved!!!

It works on Release, fails on Debug. I suspect after the optimizer, the compiler is able to realize it doesn't need to unroll the loop.

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 4, 2022

Indeed, hlms_lights_point was unset to avoid shader variants explosion. So we can't use it in this technique.

@darksylinc
Copy link
Member

I took a look at the generated assembly.

The optimizer is just able to realize it can remove the indirection and replace it with texShadowMap[1] for all point & spot lights, which is why it works.

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 4, 2022

I took a look at the generated assembly.

The optimizer is just able to realize it can remove the indirection and replace it with texShadowMap[1] for all point & spot lights, which is why it works.

The code should work when texShadowMap[0] is used for point & spot as well.
I've changed shadownode setup for testing purposes. But generally there may be more than two shadow map textures.

@darksylinc
Copy link
Member

darksylinc commented Jan 4, 2022

The code should work when texShadowMap[0] is used for point & spot as well.
I've changed shadownode setup for testing purposes. But generally there may be more than two shadow map textures.

Yes I know, I was just speaking out loud on why it was randomly working.

Which got me thinking if your alternative version was working because even debugger was able to realize that. So I decided to add an extra shadow map inserting a shadowParam.atlasId = 2; on the 3rd light and now I got a new compiler error:

Vulkan GLSL compiler error in 100000011PixelShader_ps:
ERROR: 0:1218: 'hlms_shadowmap3_uv_length' : undeclared identifier 
ERROR: 0:1219: '' : compilation terminated 
ERROR: 2 compilation errors.  No code generated.

I'm still into it

Update: That hlms_shadowmap3_uv_length error is fixed

Some shadow map setups would have inconsistent coverage of the UV atlas.
When useStaticBranchShadowMapLights is on, we need
hlms_shadowmapN_uv_length to always be defined
@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 4, 2022

I've just got the latest changes and got crash on Metal:

Metal SL Compiler Error in 100000003PixelShader_ps: program_source:957:4: error: expected ')' array<depth2d<float>, 2> ^

I can fix it, I'm on Mac right now.
Looks like syntax error in hlms

@darksylinc
Copy link
Member

OK just as I suspected: I tried your version of doing (in HLSL):

const Texture2D<float> shadowmapIdx[7] =
	OGRE_ARRAY_START( Texture2D<float> )
		hlms_shadowmap0
		
		, hlms_shadowmap1
		, hlms_shadowmap2
		, hlms_shadowmap3
		, hlms_shadowmap4
		, hlms_shadowmap5
		, hlms_shadowmap6
	OGRE_ARRAY_END;

This works on Debug & Release BUT turns out the compiler is just being smarter and doesn't need the optimizer to realize point & spot always use texShadowMap[1]

If I add one more atlas:

			// Third light, directional, spot or point
#ifdef USE_STATIC_BRANCHING_FOR_SHADOWMAP_LIGHTS
			shadowParam.atlasId = 2; // ----> New
#endif
#ifdef USE_STATIC_BRANCHING_FOR_SHADOWMAP_LIGHTS
            shadowParam.atlasStart[0].y = 2048u;
#else
			shadowParam.atlasStart[0].y = 2048u + 1024u + 2048u;
#endif
            shadowParams.push_back( shadowParam );

The unroll errors come back.

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 4, 2022

I've fixed syntax error in Metal shader

@darksylinc
Copy link
Member

I'll be taking a short break.

I reviewed my notes, it's clear we need to force-unroll on D3D11.

If there's up to 40 lights; we need to unroll 40 point and 40 spot.

We can mitigate by putting an upper limit: Up to 20 point and up to 20 spot. Thus we can unroll 20 point and 20 point, while having 40 lights.

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 4, 2022

Ok, we can add loop upper constant limit:

&& light_idx<@value(hlms_lights_spot)

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 4, 2022

Or even better:
&& cur_shadow_map<@value(hlms_num_shadow_map_lights)

Buy I'm not sure it'll help

@darksylinc
Copy link
Member

OK it looks like FXC can't unroll because apparently cur_shadow_map is a divergency. I've manually unrolled the code and it still complains!

This can be compiled with fxc.exe /T ps_5_0 /O3 /E main Test.hlsl:

Texture2D<float> myTex[8];
SamplerState mySampler;

uniform int maxValues;

struct PSInput
{
	float4 color : COLOR;
};

float4 main(PSInput input) : SV_TARGET
{
	float4 retVal = float4(0,0,0,0);

	int i=0;
	int cur_shadow_map = 0;
	if( i < maxValues )
	{
		retVal += myTex[cur_shadow_map].Sample( mySampler, input.color.xy ); // OK
		++cur_shadow_map;
	}
	++i;
	if( i < maxValues )
	{
		retVal += myTex[cur_shadow_map].Sample( mySampler, input.color.xy ); //error X3512: sampler array index must be a literal expression
		++cur_shadow_map;
	}

	return retVal;
}

It looks like FXC didn't like that previous "++cur_shadow_map;" happening inside the branch.
If we move cur_shadow_map outside:

	int i=0;
	int cur_shadow_map = 0;
	if( i < maxValues )
	{
		retVal += myTex[cur_shadow_map].Sample( mySampler, input.color.xy ); // OK
	}
	++cur_shadow_map;
	++i;
	if( i < maxValues )
	{
		retVal += myTex[cur_shadow_map].Sample( mySampler, input.color.xy ); //OK
		++cur_shadow_map;
	}

Now it works.

This is a valid solution (because if i < maxValues is false, then i + 1 < maxValues is definitely false).
But the problem is when we later evaluate spotlights:

	// Point lights
	int i=0;
	int cur_shadow_map = 0;
	if( i < maxValues )
	{
		retVal += myTex[cur_shadow_map].Sample( mySampler, input.color.xy ); // OK
	}
	++cur_shadow_map;
	++i;
	if( i < maxValues )
	{
		retVal += myTex[cur_shadow_map].Sample( mySampler, input.color.xy ); //OK
		++cur_shadow_map;
	}

	// Now spotlights
	cur_shadow_map = ....;	// Umm what goes here? This is dynamic, but DX11 asks the value to be
				// a literal; it can't be passed as a uniform
	if( i < maxValues )
	{
		retVal += myTex[cur_shadow_map].Sample( mySampler, input.color.xy );
	}
	++cur_shadow_map;
	++i;
	if( i < maxValues )
	{
		retVal += myTex[cur_shadow_map].Sample( mySampler, input.color.xy ); //OK
		++cur_shadow_map;
	}

The only solutions to this problem I can see is:

  1. Add restrictions. All pointlights must use the same atlas, all spotlights must use the same atlas. The atlas being used by spot can be different from point ones.
  2. Restrict shadow maps to one type. Example 20 shadow maps exclusively for point and 20 exclusively for spot. If user has 10 point lights and 30 spot lights, only 20 spot lights can be rendered correctly

With the 2nd option the code would be like this:

// Point lights
int cur_shadow_map = 0;
for( int i=0; i<8; ++i )
{
	if( i < maxValues )
	{
		retVal += myTex[cur_shadow_map].Sample( mySampler, input.color.xy );
	}
	++cur_shadow_map;
}

// Spot lights
cur_shadow_map = 4; // Spotlight start
for( int i=0; i<8; ++i )
{
	if( i < maxValues )
	{
		retVal += myTex[cur_shadow_map].Sample( mySampler, input.color.xy );
	}
	++cur_shadow_map;
}

Which will be considered valid code by FXC.

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 5, 2022

I think we can stay with one atlas for points and spots. Modern graphics cards allow to have very huge textures up to 16Kx16K. If we have 2Kx2K per light then we can support up to 64 shadow-casting lights - this is enough big amount. So we can relay on texShadowMap[0] for Directional light and texShadowMap[1] for points and spots

@darksylinc
Copy link
Member

That's true! The only issue I can see is static shadow maps, as it's often better to separate them into two atlases (because we can only clear the whole atlas, partial clears are often inefficient).

But then I remembered we support texture2darrays! I don't think we tested shadowmaps on texture2darrays so it may need fixing a few minor bugs. But other than that it should work.

If we add this restriction it may also be worth wondering: do we even need to place the shadow map in an array of textures? (Likely not)

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 5, 2022

We use separate atlases for dir light and regular(point/spot) lights in our project Live Home 3D. And we have custom shadow node creation and light management. To clear each sm-rect in the regular lights atlas we use quad pass with trivial custom material that clears depth:

if(needClearWholeAtlas)                    
{//we clear whole atlas only first one - for directional light, all other atlases serve for static spot & point lights

CompositorPassDef *passDef = targetDef->addPass( PASS_CLEAR );
CompositorPassClearDef *passClear = static_cast<CompositorPassClearDef*>( passDef );
passClear->setAllClearColours(clearColour);
passClear->mClearDepth = 1.0f;
passClear->mShadowMapIdx = shadowMapIdx;
passClear->mShadowMapFullViewport = atlasId==0;
                    
}else if(needSeparateClear)                    
{//clear the rect in the atlas with special quad pass

CompositorPassDef *passDef = targetDef->addPass( PASS_QUAD );
CompositorPassQuadDef *passClearQuad = static_cast<CompositorPassQuadDef*>( passDef );
passClearQuad->setAllLoadActions( LoadAction::Load );
passClearQuad->setAllClearColours( clearColour );
passClearQuad->mClearDepth = 1.0;
passClearQuad->mMaterialIsHlms = false;
passClearQuad->mMaterialName = "Ogre/Depth/Clear";
passClearQuad->mShadowMapIdx = shadowMapIdx;
passClearQuad->mShadowMapFullViewport = false;                   

}

So, our project is really ready for two atlases :)

Remove array of textures for texShadowMap now that these restrictions
are in place
@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 5, 2022

If we add this restriction it may also be worth wondering: do we even need to place the shadow map in an array of textures? (Likely not)

Sure we can avoid using arrays. I think it would be helpful to inform shader, using new properties, which texShadowMap we have to use for point&spot shadow casting lights. This will give us more freedom. For example if we stay with one texShadowMap0 we can pass 0, and pass 1 to use texShadowMap1, etc. So, the only limit - all points&spot lights should use one texture.
Approach with texture2darrays will require more development and I don't know if we can render to texture in texture2darrays to get depth.

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 5, 2022

Aha, I see your latest commit - nice job! Ignore my previous comment :)

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 5, 2022

I think this final solution is even much better. It simply does not allow to grab too much vram for shadow maps and solves main problem - reduces shaders variants ;)

Samples/2.0/ApiUsage/ShadowMapFromCode/ShadowMapFromCodeGameState.h
@darksylinc
Copy link
Member

I've merged your branch with master.

It now has no conflicts :)

There is one more thing I see: setMaxShadowMapLights asks for a uint16 but it is only ever used as a boolean i.e. if( getMaxShadowMapLights() > 0 )

I think this makes sense once we realize it's just a boolean settings and the max number of lights is determined by the active shadow node.

My only question is whether we are correctly handling everything i.e. what happens if:

  • The shadow node supports 5 lights but only 4 or less are used
  • The shadow node supports 5 lights but 6 or more are used
  • Lights are added/removed from the scene at runtime

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 5, 2022

I've merged your branch with master.

It now has no conflicts :)

Great! Tomorrow I'll get it in our Live Home 3D project and I'll test the technique in real use cases.

There is one more thing I see: setMaxShadowMapLights asks for a uint16 but it is only ever used as a boolean i.e. if( getMaxShadowMapLights() > 0 )

I think this makes sense once we realize it's just a boolean settings and the max number of lights is determined by the active shadow node.

I reserved MaxShadowMapLights as uint16 by intuition for possible future use.

My only question is whether we are correctly handling everything i.e. what happens if:

  • The shadow node supports 5 lights but only 4 or less are used
  • The shadow node supports 5 lights but 6 or more are used
  • Lights are added/removed from the scene at runtime

I think the technique is quite safe. We collect lights by iterating through active shadow maps. So in first case(The shadow node supports 5 lights but only 4 or less are used) we'll get just another one shader with 4 lights only. The second case will produce 5 shadow casting-lights and 6th light will be skipped.
The third case should be handled by client custom code to manage lights adding/removing. For example in our Live Home 3D project we track static lights attach/detach and assign them to shadow maps using setLightFixedToShadowMap(). non static lights are managed by Ogre, I don't see any problem there.

@darksylinc darksylinc merged commit 6eb0d72 into OGRECave:master Jan 6, 2022
@darksylinc
Copy link
Member

I think the technique is quite safe. We collect lights by iterating through active shadow maps. So in first case(The shadow node supports 5 lights but only 4 or less are used) we'll get just another one shader with 4 lights only. The second case will produce 5 shadow casting-lights and 6th light will be skipped.
The third case should be handled by client custom code to manage lights adding/removing. For example in our Live Home 3D project we track static lights attach/detach and assign them to shadow maps using setLightFixedToShadowMap(). non static lights are managed by Ogre, I don't see any problem there.

I see! So your main problem was the combinatorial explosion caused by point & spot lights changing even though the max number of lights stayed the same.

Yes, you could bring it down even more if Hlms always produces max N lights (and avoids i.e. crashing due to nullptr lights), but as long as you're ok. Great!

I reserved MaxShadowMapLights as uint16 by intuition for possible future use.

I had to revert that to a boolean to avoid confusion and YAGNI.

I actually thought about it: The max number of lights is strongly tied with the active shadow node. Thus that value should be auto-calculated (cached in the shadow node?) when starting the pass.

This is different from setMaxNonCasterDirectionalLights because there could be an arbitrary number of directional lights created regardless of the active shadow node.

Anyway, your PR has been merged.

Thank you for your contribution!

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 7, 2022

Thank you!

@darksylinc
Copy link
Member

darksylinc commented Jan 7, 2022

Btw I ran into the Vulkan issue again and I was able to fix it. Update: The fix was incomplete. Now it's done.

@dyunchik
Copy link
Contributor Author

dyunchik commented Jan 8, 2022

Btw I ran into the Vulkan issue again and I was able to fix it. Update: The fix was incomplete. Now it's done.

Great!
Is that what you saw earlier?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants