r300 FP: Compiler Error: r300_fragprog_emit.c::emit_alu(): Too many ALU instructions on Radeon X1050 (RV370) #344

Closed
illwieckz opened this issue Jul 7, 2020 · 8 comments

Comments

@illwieckz
Member

illwieckz commented Jul 7, 2020

The Unvanquished download page states the minimal requirement on GPU side is an OpenGL 2.1 GPU.

I ran some tests to check that, using the to-be-released 0.52 engine. I tested both the Intel GMA965 X3100 (integrated in Core 2 Duo L7500 CPU) and the ATI X1050 (discrete PCIe card). They're both from around the year 2006.

While the test succeeded with the Intel GMA965 chip (though I had to reduce texture resolution), it failed miserably with the ATI X1050 card.

The ATI X1050 GPU uses the architecture that existed before TeraScale (papers may have named it “Unified Superscalar Shader Architecture” at the time).

I got this error in Dæmon's output:

r300 FP: Compiler Error:
../src/gallium/drivers/r300/compiler/r300_fragprog_emit.c::emit_alu(): Too many ALU instructions
Using a dummy shader instead.
r300 FP: Compiler Error:
build_loop_info: Cannot find condition for if
Using a dummy shader instead.

I read about a similar issue in another project, and the description of one of the fixes says:

The number instructions in a shader is limited to 64 on r300 hardware,
the fade shader in StScrollViewFade was ending up using 97 instructions
which is way over the limit.

So refactor the shader to use less instructions by precomputing as many
values as possible outside of the conditionals. The resulting shader
ends up using 34 instructions which is well within the hardware limits.

I still don't understand why in this example doing this avoids the issue:

result1 = compute1();
result2 = compute2();
if (something) {
  result = result1;
}
else {
  result = result2;
}

but doing this would fail:

if (something) {
  result = compute1();
}
else {
  result = compute2();
}

So, I don't really know how to fix the issue myself.
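
One possible explanation, offered as an assumption rather than a confirmed diagnosis: R300-class fragment hardware has no real flow control, so the driver has to flatten an if/else into straight-line code that evaluates both arms and then keeps one result. Under that model the two forms above would cost about the same, and the savings in the quoted fix would come mostly from hoisting shared sub-expressions out of the arms. The C++ sketch below only mimics that flattening on the CPU; compute1, compute2 and selectValue are hypothetical stand-ins, not driver or engine code.

```cpp
// CPU-side illustration of branch flattening ("if-conversion").
// compute1/compute2 are hypothetical stand-ins for shader math.
float compute1(float x) { return x * 2.0f + 1.0f; }
float compute2(float x) { return x * x - 3.0f; }

// Branchy form, as written in the shader source.
float branchy(bool something, float x)
{
    if (something) {
        return compute1(x);
    } else {
        return compute2(x);
    }
}

// Stand-in for a CMP-like select instruction: keeps one of two values.
float selectValue(bool cond, float a, float b) { return cond ? a : b; }

// What hardware without flow control would effectively run (assumption):
// both arms are always evaluated, then one value is selected.
float flattened(bool something, float x)
{
    float result1 = compute1(x); // always executed
    float result2 = compute2(x); // always executed
    return selectValue(something, result1, result2);
}
```

Both functions return the same value for the same inputs; the difference is only in how many instructions are issued, which is what the ALU limit counts.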

Anyway, since Xonotic/DarkPlaces runs on that GPU there is no reason Unvanquished/Dæmon cannot. This issue keeps track of this bug if someone wants to hack on it.

In any case, we would want to catch the issue properly and at least return to the main menu with a proper message. That's for another ticket (see #345).

On a side note, I experienced hangs with that GPU, but maybe I just need to reduce texture resolution like I had to do with the Intel X3100 GPU. That's for another ticket too (see #346).

@illwieckz
Member Author

Note that the game starts and displays the menu properly; the compiler error occurs when loading maps.

@illwieckz illwieckz changed the title r300 FP: Compiler Error: r300_fragprog_emit.c::emit_alu(): Too many ALU instructions (ATI X1050) r300 FP: Compiler Error: r300_fragprog_emit.c::emit_alu(): Too many ALU instructions on Radeon X1050 (RV370) Jul 7, 2020
@illwieckz
Member Author

illwieckz commented Jul 7, 2020

We may also want to print the faulty shader name on console and dump its code to make debugging easier.

@illwieckz
Member Author

All of this seems to be printed by the driver itself:

r300 FP: Compiler Error:
../src/gallium/drivers/r300/compiler/r300_fragprog_emit.c::emit_alu(): Too many ALU instructions
Using a dummy shader instead.
r300 FP: Compiler Error:
build_loop_info: Cannot find condition for if
Using a dummy shader instead.

So I don't know how to intercept it.
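
If the driver writes these messages to the process's stderr (an assumption, not verified here), one way to intercept them on POSIX systems could be to redirect file descriptor 2 to a pipe around the suspicious call and scan what was written. The sketch below is only an illustration: captureStderr and hasDriverCompilerError are hypothetical names, and it assumes the captured output is small enough to fit in the pipe buffer.

```cpp
#include <cstdio>
#include <string>
#include <unistd.h>

// Runs fn while stderr (fd 2) is redirected to a pipe and returns what was
// written. Assumes the output fits in the pipe buffer; a real implementation
// would drain the pipe from another thread.
template <typename Fn>
std::string captureStderr(Fn fn)
{
    int fds[2];
    if (pipe(fds) != 0) { fn(); return ""; }

    std::fflush(stderr);
    int saved = dup(STDERR_FILENO);
    if (saved < 0) { close(fds[0]); close(fds[1]); fn(); return ""; }
    dup2(fds[1], STDERR_FILENO);
    close(fds[1]);

    fn();

    std::fflush(stderr);
    dup2(saved, STDERR_FILENO); // restores stderr and closes the pipe's write end
    close(saved);

    std::string out;
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<std::string::size_type>(n));
    }
    close(fds[0]);
    return out;
}

// Looks for the marker seen in the driver output quoted above.
bool hasDriverCompilerError(const std::string& text)
{
    return text.find("Compiler Error") != std::string::npos;
}
```

This only detects the message after the fact; it does not tell which shader triggered it, and it hides the captured text from the terminal unless the caller prints it again.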

@illwieckz
Member Author

Unfortunately, the problem does not occur when compiling shaders but when using them, so #354 does not help identify the shader that fails.

Also, when using Low (1:4) texture size, the computer does not hang (see #346), but Dæmon exits by itself after this error without giving any other information.
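
Since the error shows up when a shader is used rather than when it is compiled, one way to narrow it down might be to print the name of the currently bound shader right before each draw, so the last name printed before the driver message points at the suspect. The sketch below is a hypothetical helper, not part of Dæmon; DrawLogger, bindShader and beforeDraw are made-up names.

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical helper: remembers the name of the shader bound last and
// prints it once, before the first draw that uses it.
class DrawLogger
{
public:
    void bindShader(std::string name)
    {
        current_ = std::move(name);
        announced_ = false;
    }

    // Call right before the draw call. Returns true when a line was
    // printed, which keeps the log short for repeated draws.
    bool beforeDraw()
    {
        if (announced_) { return false; }
        // stderr, so the line interleaves in order with driver messages.
        std::fprintf(stderr, "drawing with shader: %s\n", current_.c_str());
        announced_ = true;
        return true;
    }

    const std::string& current() const { return current_; }

private:
    std::string current_ = "(none)";
    bool announced_ = false;
};
```

The idea is to call bindShader wherever the engine binds a GLSL program and beforeDraw immediately before the draw call.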

@illwieckz
Member Author

More logs about this GPU:

SDL_Init( SDL_INIT_VIDEO )... 
Using SDL Version 2.0.10
SDL using driver "x11"
Initializing OpenGL display
Display aspect: 1.250
...setting mode -1:
 1280 720
^3Warn: SDL_GL_CreateContext failed: Could not create GL context: GLXBadFBConfig

Using 24 Color bits, 24 depth, 8 stencil display.
Using GLEW 2.1.0
Using enhanced (GL3) Renderer in GL 2.x mode...
Available modes: '1280x1024 640x480 800x600 1024x768 '
GL_RENDERER: ATI RV370
Detected graphics driver class 'integrated'
Detected graphics hardware class 'generic'
Initializing OpenGL extensions
...ignoring GL_ARB_debug_output
...found shading language version 120
...using GL_ARB_half_float_pixel
...using GL_ARB_texture_float
...GL_EXT_gpu_shader4 not found
...GL_EXT_texture_integer not found
...using GL_ARB_texture_rg
...GL_ARB_texture_gather not found
...using GL_EXT_texture_filter_anisotropic
...using GL_ARB_half_float_vertex
...using GL_ARB_framebuffer_object
...using GL_ARB_get_program_binary
...using GL_ARB_buffer_storage
...GL_ARB_uniform_buffer_object not found
...using GL_ARB_map_buffer_range
...using GL_ARB_sync

[…]

GL_VENDOR: X.Org R300 Project
GL_RENDERER: ATI RV370
GL_VERSION: 2.1 Mesa 20.0.8

[…]

^5Debug: GL_EXTENSIONS: GL_ARB_multisample GL_EXT_abgr GL_EXT_bgra GL_EXT_blend_color GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_copy_texture GL_EXT_subtexture GL_EXT_texture_object GL_EXT_vertex_array GL_EXT_compiled_vertex_array GL_EXT_texture GL_EXT_texture3D GL_IBM_rasterpos_clip GL_ARB_point_parameters GL_EXT_draw_range_elements GL_EXT_packed_pixels GL_EXT_point_parameters GL_EXT_rescale_normal GL_EXT_separate_specular_color GL_EXT_texture_edge_clamp GL_SGIS_generate_mipmap GL_SGIS_texture_border_clamp GL_SGIS_texture_edge_clamp GL_SGIS_texture_lod GL_ARB_multitexture GL_IBM_multimode_draw_arrays GL_IBM_texture_mirrored_repeat GL_ARB_texture_cube_map GL_ARB_texture_env_add GL_ARB_transpose_matrix GL_EXT_blend_func_separate GL_EXT_fog_coord GL_EXT_multi_draw_arrays GL_EXT_secondary_color GL_EXT_texture_env_add GL_EXT_texture_filter_anisotropic GL_EXT_texture_lod_bias GL_INGR_blend_func_separate GL_NV_blend_square GL_NV_light_max_exponent GL_NV_texgen_reflection GL_NV_texture_env_combine4 GL_S3_s3tc GL_SUN_multi_draw_arrays GL_ARB_texture_border_clamp GL_ARB_texture_compression GL_EXT_framebuffer_object GL_EXT_texture_compression_s3tc GL_EXT_texture_env_combine GL_EXT_texture_env_dot3 GL_MESA_window_pos GL_NV_packed_depth_stencil GL_NV_texture_rectangle GL_ARB_depth_texture GL_ARB_occlusion_query GL_ARB_shadow GL_ARB_texture_env_combine GL_ARB_texture_env_crossbar GL_ARB_texture_env_dot3 GL_ARB_texture_mirrored_repeat GL_ARB_window_pos GL_ATI_fragment_shader GL_EXT_stencil_two_side GL_EXT_texture_cube_map GL_NV_fog_distance GL_APPLE_packed_pixels GL_ARB_draw_buffers GL_ARB_fragment_program GL_ARB_fragment_shader GL_ARB_shader_objects GL_ARB_vertex_program GL_ARB_vertex_shader GL_ATI_draw_buffers GL_ATI_texture_env_combine3 GL_ATI_texture_float GL_EXT_shadow_funcs GL_EXT_stencil_wrap GL_MESA_pack_invert GL_MESA_ycbcr_texture GL_ARB_fragment_program_shadow GL_ARB_half_float_pixel GL_ARB_occlusion_query2 GL_ARB_point_sprite GL_ARB_shading_language_100 
GL_ARB_sync GL_ARB_texture_non_power_of_two GL_ARB_vertex_buffer_object GL_ATI_blend_equation_separate GL_EXT_blend_equation_separate GL_OES_read_format GL_ARB_pixel_buffer_object GL_ARB_texture_float GL_ARB_texture_rectangle GL_EXT_pixel_buffer_object GL_EXT_texture_compression_dxt1 GL_EXT_texture_mirror_clamp GL_EXT_texture_rectangle GL_EXT_texture_sRGB GL_ARB_framebuffer_object GL_EXT_framebuffer_blit GL_EXT_framebuffer_multisample GL_EXT_packed_depth_stencil GL_ARB_vertex_array_object GL_ATI_separate_stencil GL_ATI_texture_mirror_once GL_EXT_gpu_program_parameters GL_EXT_texture_sRGB_decode GL_OES_EGL_image GL_ARB_copy_buffer GL_ARB_half_float_vertex GL_ARB_instanced_arrays GL_ARB_map_buffer_range GL_ARB_texture_rg GL_ARB_vertex_array_bgra GL_EXT_vertex_array_bgra GL_NV_conditional_render GL_ARB_ES2_compatibility GL_ARB_debug_output GL_ARB_draw_elements_base_vertex GL_ARB_explicit_attrib_location GL_ARB_fragment_coord_conventions GL_ARB_provoking_vertex GL_ARB_sampler_objects GL_EXT_provoking_vertex GL_EXT_texture_snorm GL_MESA_texture_signed_rgba GL_NV_texture_barrier GL_ARB_get_program_binary GL_ARB_robustness GL_ARB_separate_shader_objects GL_EXT_direct_state_access GL_ANGLE_texture_compression_dxt3 GL_ANGLE_texture_compression_dxt5 GL_ARB_compressed_texture_pixel_storage GL_ARB_internalformat_query GL_ARB_map_buffer_alignment GL_ARB_texture_storage GL_EXT_framebuffer_multisample_blit_scaled GL_AMD_shader_trinary_minmax GL_ARB_clear_buffer_object GL_ARB_explicit_uniform_location GL_ARB_invalidate_subdata GL_ARB_program_interface_query GL_ARB_vertex_attrib_binding GL_KHR_debug GL_KHR_texture_compression_astc_ldr GL_ARB_buffer_storage GL_ARB_internalformat_query2 GL_ARB_multi_bind GL_ARB_shading_language_include GL_ARB_texture_mirror_clamp_to_edge GL_ARB_clip_control GL_ARB_get_texture_sub_image GL_ARB_texture_barrier GL_KHR_context_flush_control GL_ARB_parallel_shader_compile GL_KHR_no_error GL_KHR_texture_compression_astc_sliced_3d 
GL_ARB_texture_filter_anisotropic GL_KHR_parallel_shader_compile GL_EXT_EGL_image_storage GL_EXT_texture_sRGB_R8 GL_EXT_EGL_sync 
^5Debug: GL_MAX_TEXTURE_SIZE: 2048
GL_SHADING_LANGUAGE_VERSION: 1.20
GL_MAX_VERTEX_UNIFORM_COMPONENTS 1024
^5Debug: GL_MAX_VERTEX_ATTRIBS 16
^5Debug: 64 occlusion query bits
^5Debug: GL_MAX_DRAW_BUFFERS: 4
^5Debug: GL_TEXTURE_MAX_ANISOTROPY_EXT: 16.000000
^5Debug: GL_MAX_RENDERBUFFER_SIZE: 2048
^5Debug: GL_MAX_COLOR_ATTACHMENTS: 4
^5Debug: 
PIXELFORMAT: color(24-bits) Z(24-bit) stencil(8-bits)
^5Debug: MODE: -1, 1280 x 720 windowed hz:
^5Debug: N/A
^5Debug: texturemode: GL_LINEAR_MIPMAP_NEAREST
^5Debug: picmip: 0
^5Debug: Using S3TC (DXTC) texture compression
Using GPU vertex skinning with max 41 bones in a single pass

[…]

^4ill^7wie^1ckz [granger]^* entered the game
r300 FP: Compiler Error:
../src/gallium/drivers/r300/compiler/r300_fragprog_emit.c::emit_alu(): Too many ALU instructions
Using a dummy shader instead.
r300 FP: Compiler Error:
build_loop_info: Cannot find condition for if
Using a dummy shader instead.

Then Dæmon exits without saying anything more.

On a side note, I notice this R300 GPU is not recognized as R300 (see #352; #353 was not applied yet).

@illwieckz
Member Author

We know hardware from that era like the Radeon X1950 PRO (RV570) can render the game, provided the hardware supports a large enough number of instructions (and provided we are not too demanding on it, see #364). So it would be very good to know which shader is at fault, and see whether we can do without it or ifdef-out some code to reduce its size on older hardware.
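
As a rough sketch of the ifdef-out idea: the engine could insert a macro definition into the GLSL source before compiling it, and the shader could wrap its expensive parts in `#ifndef` guards. The helper below is hypothetical (withDefine is not a Dæmon function, and a macro name such as LOW_ALU_HARDWARE is made up); it places the define right after the `#version` line, because GLSL requires `#version` to come before any other directive.

```cpp
#include <string>

// Hypothetical helper: inserts "#define NAME 1" right after the #version
// line of a GLSL source, or at the very top when there is no #version line.
std::string withDefine(const std::string& source, const std::string& name)
{
    const std::string define = "#define " + name + " 1\n";

    // No #version directive at the start: the define can simply go first.
    if (source.compare(0, 8, "#version") != 0) {
        return define + source;
    }

    // Keep the #version line first and put the define on the next line.
    const std::string::size_type eol = source.find('\n');
    if (eol == std::string::npos) {
        return source + "\n" + define;
    }
    return source.substr(0, eol + 1) + define + source.substr(eol + 1);
}
```

A shader could then guard an optional effect with `#ifndef LOW_ALU_HARDWARE` and `#endif`, so the same source compiles to fewer instructions on R300-class hardware.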

@illwieckz
Member Author

illwieckz commented Oct 4, 2020

As confirmed by this page: https://dri.freedesktop.org/wiki/R300ToDo/

the R300 has small ALU instruction limits:

Vertex shader limits

      ALU   Const  Temp  Alt.Temp
R300  256   256    32
R400  256   256    32
R500  1024  256    32

Fragment shader limits

      ALU  TEX  Const  Temp  Samplers  TEX Indirections
R300  64   32   32     32    16
R400  512  512  32     64    16
R500  512  512  256    128   16

These limits are small, but we know DarkPlaces/Xonotic runs on this hardware under the same constraints.
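
To make the budget concrete, the fragment shader ALU column of the table above can be turned into a small lookup. This is only a sketch: RadeonFamily, maxFragmentAluInstructions and fitsAluBudget are hypothetical names, and the instruction counts fed to it would have to come from somewhere else, for example the 97 and 34 figures quoted in the first comment.

```cpp
// Fragment shader ALU instruction limits, copied from the table above.
enum class RadeonFamily { R300, R400, R500 };

int maxFragmentAluInstructions(RadeonFamily family)
{
    switch (family) {
    case RadeonFamily::R300: return 64;
    case RadeonFamily::R400: return 512;
    case RadeonFamily::R500: return 512;
    }
    return 0; // unknown family: assume nothing fits
}

// True when a shader with that many ALU instructions fits the family's limit.
bool fitsAluBudget(RadeonFamily family, int aluInstructions)
{
    return aluInstructions <= maxFragmentAluInstructions(family);
}
```

With these numbers, a 97-instruction shader does not fit on R300 but a 34-instruction one does, and both fit on R500.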

@illwieckz
Member Author

illwieckz commented Oct 5, 2020

To my surprise, after commenting out the glDrawArrays call here:

void Tess_DrawArrays( GLenum elementType )
{
	if ( tess.numVertexes == 0 )
	{
		return;
	}

	// move tess data through the GPU, finally
	glDrawArrays( elementType, 0, tess.numVertexes );

	backEnd.pc.c_drawElements++;
	backEnd.pc.c_indexes += tess.numIndexes;
	backEnd.pc.c_vertexes += tess.numVertexes;

	if ( glState.currentVBO )
	{
		backEnd.pc.c_vboVertexes += tess.numVertexes;
		backEnd.pc.c_vboIndexes += tess.numIndexes;
	}
}

the map loads, there is no r300 compiler error, memory usage is sane, and everything seems to be rendered properly:

[Four screenshots: r300 small alu]

This code was added by @gimhael in commit 0bb3550 which, interestingly, is part of the tiled renderer, but this code seems to be used even if dynamic lighting is disabled.

illwieckz added a commit to illwieckz/Daemon that referenced this issue Oct 5, 2020
illwieckz added a commit to illwieckz/Daemon that referenced this issue Oct 5, 2020
illwieckz added a commit to illwieckz/Daemon that referenced this issue Oct 8, 2020
illwieckz added a commit to illwieckz/Daemon that referenced this issue Oct 12, 2020
illwieckz added a commit to illwieckz/Daemon that referenced this issue Oct 12, 2020
illwieckz added a commit to illwieckz/Daemon that referenced this issue Oct 14, 2020
illwieckz added a commit to illwieckz/Daemon that referenced this issue Oct 19, 2020