Any plan to reduce register usage? #17

racingpht opened this Issue · 3 comments



Hello, we have a complex shader (a deferred lighting pass) that fails to compile because it ends up using a 9th temporary register. My colleague tried declaring the temporaries explicitly (vec4 temp0, temp1, .., temp7) and reusing them in the GLSL, but without success.

The same shader logic can work in AGAL assembly.

AGAL has a very restrictive temporary register set and weak instructions (no modifiers, for example), but a better instruction count limit than ps_2_0, so I think trading instruction count for register count should be acceptable. Are there any plans to optimize register usage further? It would be very beneficial for quick development.



I made some changes recently that should have reduced register usage. If you still have a shader that has problems, can you share the code, or a cut-down version of the shader that shows the same issue, so I can investigate it?

alexmac was assigned

Hi Alex, here's some sample code that doesn't compile with the latest swc.

#define float4 vec4
#define float3 vec3
#define float2 vec2
#define frac fract

uniform mat4 g_ViewMatrixInv;
uniform mat4 g_ShadowMapMatrix;

uniform float4 g_CameraPosition;

uniform float4 g_LightDirection;

uniform float4 g_LightColor;

uniform float4 g_AmbientSHR;

uniform float4 g_AmbientSHG;

uniform float4 g_AmbientSHB;

uniform float4 g_ParametersPS;
uniform sampler2D gbuffer0;

uniform sampler2D gbuffer1;

uniform sampler2D shadowmap;
uniform sampler2D randommap;

uniform float4 g_DecodeShadowDepthValues;
uniform float4 g_DecodeGBufferParameters;
uniform float4 g_TexCoordHelpers;
uniform float4 g_Helpers;

#define positionES tempD

void main()
// First we read the gbuffer, then we decode it, since its attributes are tightly packed
float4 gb0= texture2D(gbuffer0, gl_TexCoord[0].xy);
float4 gb1= texture2D(gbuffer1, gl_TexCoord[0].xy);
float4 tempA = texture2D(shadowmap, gl_TexCoord[0].xy);

//float shadowValue = tempA.x;//dot(tempA.xyzw, tempA.xyzw);

float4 tempB;

// GBuffer attributes to decode
float4 albedo;
float4 normalWS;

float4 positionWS;
//float specularPower = 32;
//float specularFactor = 1;

// Lets decode the normal in World Space = * - g_DecodeGBufferParameters.yyy;
normalWS.w= g_DecodeGBufferParameters.y;
// Lets decode the Albedo and specular attributes

// The Albedo is packed using the lower 6 bits of rgb gbuffer1 channel

// The specular factor and power use the higher 2 bits of gbuffer1 rgb channels
// The specular factor is split between rg and the power is on the blue channel
albedo.rgb= fract(gb1.rgb * g_DecodeGBufferParameters.www);

//float3 specAux = tempA.rgb - albedo.rgb;

//specularPower = specAux.r + specAux.g * g_Helpers.w;
//specularFactor = specAux.b;

// Lets reconstruct the fragment world position using the encoded depth
tempA.w = gb0.w + gb1.w * g_DecodeShadowDepthValues.y;

tempB.x = gl_TexCoord[1].x * tempA.w;

tempB.y = gl_TexCoord[1].y * tempA.w;

tempB.z = gl_TexCoord[1].z * tempA.w;

tempB.w = g_DecodeGBufferParameters.y;

positionWS = tempB * g_ViewMatrixInv;

// Calculate Shadows
tempB= positionWS * g_ShadowMapMatrix;
//tempA.xy = texture2D(randommap, gl_TexCoord[0].zw).xy *;

tempB.xy = (tempB.xy / tempB.w) * g_TexCoordHelpers.xy + g_TexCoordHelpers.xx;// + tempA.xy;

float4 sm = texture2D(shadowmap, tempB.xy);
float smDepth = dot(,;
float shadowValue = 1;//smDepth - tempB.z;//smDepth;

if(smDepth < tempB.z) shadowValue = 0;

// Now that we have all the GBuffer attributes decoded now lets do the lighting and shading stuff

float NdotL = dot(,;

NdotL *= shadowValue;

// Accumulate lighting = NdotL *;

tempA.x+= dot(normalWS, g_AmbientSHR);

tempA.y+= dot(normalWS, g_AmbientSHG);

tempA.z+= dot(normalWS, g_AmbientSHB);

float3 finalColor;
finalColor.rgb = albedo.rgb *;

// Calculate the eye and half vector
tempA.w= dot(normalize(normalize( - +,;

gl_FragColor = finalColor.rgbg + vec4(pow(tempA.w, 32));


As a temporary workaround you can replace this:

tempA.x+= dot(normalWS, g_AmbientSHR);
tempA.y+= dot(normalWS, g_AmbientSHG);
tempA.z+= dot(normalWS, g_AmbientSHB);

with:

+= vec3(
dot(normalWS, g_AmbientSHR),
dot(normalWS, g_AmbientSHG),
dot(normalWS, g_AmbientSHB));

But some work on the optimizer should be able to fix this.
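
For reference, here is a minimal, cut-down sketch of the workaround applied in a complete fragment shader. It is not the original shader: only the ambient term is kept. The `tempA.xyz` left-hand side is an assumption inferred from the three per-component lines being replaced, and the gbuffer0 normal decoding (`* 2.0 - 1.0`) is purely illustrative. The idea, as far as I can tell, is that a single vector add gives the compiler one write to `tempA` instead of three partial ones, which may let it get by with fewer temporaries.

```glsl
// Cut-down sketch: ambient SH term only, using the uniform names from this thread.
uniform vec4 g_AmbientSHR;
uniform vec4 g_AmbientSHG;
uniform vec4 g_AmbientSHB;
uniform sampler2D gbuffer0;

void main()
{
    // Hypothetical decoding: assumes gbuffer0.xyz stores a normal in [0, 1].
    vec4 normalWS;
    normalWS.xyz = texture2D(gbuffer0, gl_TexCoord[0].xy).xyz * 2.0 - 1.0;
    normalWS.w = 1.0;

    vec4 tempA = vec4(0.0);

    // One vector add instead of three separate adds to tempA.x, tempA.y, tempA.z.
    // The tempA.xyz target is assumed, not taken verbatim from the thread.
    tempA.xyz += vec3(
        dot(normalWS, g_AmbientSHR),
        dot(normalWS, g_AmbientSHG),
        dot(normalWS, g_AmbientSHB));

    gl_FragColor = vec4(tempA.xyz, 1.0);
}
```

Both forms compute the same values, so the rewrite should be safe to apply wherever the three component-wise adds appear together.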
