I tried Portal as it was the first in the beta and is the most "clean" example.
The lag is very similar to so-called input lag. When turning with the mouse, the screen doesn't follow at once and you end up at a different angle than intended. In-game it makes you walk like a drunk who can't hit the door.
First I checked whether "mouse filter" was enabled. It was disabled.
I tried Raw input, but there was no change.
Since I'm overriding the audio output (to set alsa), I added "-nosound" to the command line parameters of Portal. The lag was not affected.
I tried lowering all graphics settings one by one, but for the most part there was no change. Disabling "Motion Blur" gave a noticeable improvement. "Multicore Rendering" was disabled; enabling it had no measurable impact, so I left it disabled.
Lowering the resolution from 1920x1080 to 1280x720 eliminated the lag completely. Using intermediate resolutions introduced the lag in gradual steps.
Next, I set all graphic options to their default values, kept "Antialiasing", "Multicore Rendering" and "Motion Blur" disabled, restored 1920x1080 resolution and walked around a little. I enabled cl_showfps 1 too.
There are places where the lag is quite severe and places where it is hard to notice. Interestingly, the lag is always present when the fps is high (above 150). The higher the fps, the higher the lag. At the extreme I got 200fps and was visibly walking slower. Later, when I tried different fglrx drivers, the same location never got more than 150fps.
Limiting the fps with "Wait for vertical sync" eliminates the lag. Opening infinite portals also eliminates the lag/stutter (it lowers fps).
Then I disabled fullscreen so I could see some system diagnostics while playing. This had a side effect: the lag turned into stutter. Later I discovered that the stutter is synchronized with the graph gadget that draws cpu usage in the wm taskbar.
One of my old quicksaves was on chamber 13, in the corridor right out of the elevator. It proved to be a nice test case. Facing the elevator and walking down the corridor shows stutter; facing the chamber and doing the same shows no stutter at all.
I tried net_graph 4 and there is nothing interesting on the graph, no visible peaks correspond with lag/stutter, the graph looks the same with and without lag/stutter.
First thing I noticed was that when there is lag/stutter the cpu graph (from my wm) shows a lot of kernel load. top indicated that "migration/..." threads were getting about 10%cpu each (4 of them). I found a way to disable process migration, but the load moved to "kworker/..." threads.
I suspected a kernel problem related to automatic page migration and consolidation, so I built kernels without them. Then I tried disabling NO_HZ, then enabling "Preemptible Kernel (Low-Latency Desktop)" and setting the timer interrupt frequency to 1000Hz. I also tried the stock distribution kernel (3.8.8 at the time), and I built my own vanilla 3.9 and 3.9.1. None of these made any noticeable difference.
I also changed the cpu governor to "performance" at runtime - no change, either.
I tried to capture the stutter with a screen recording program (ffmpeg). I hoped to use cl_showpos 1 to see if the position jumps at the stutter. Unfortunately, while the capture program is running there is no stutter at all (or it is too fine-grained to see). I tried capturing without transcoding (vcodec rawvideo) and discarding the output to /dev/null; there was no stutter. I tried to max out the cpu load by transcoding a video, and the stutter was as usual. This clearly shows that the act of capturing the screen somehow affects the stutter.
I tried playing back a video while playing; the draw operation seems to smooth the lag/stutter.
I tried video playback through opengl (mplayer -vo gl) and noticed that playback slows down when the game lags.
perf top when facing chamber - no lag.
24.97% fglrx_dri.so [.] 0x01c3a078
10.75% client.so [.] 0x0048f6bf
5.93% shaderapidx9.so [.] 0x00020ad1
5.51% engine.so [.] 0x00424a55
3.77% materialsystem.so [.] 0x0008be1d
2.49% [fglrx] [k] 0x00003651
1.91% server.so [.] 0x0046d081
1.65% stdshader_dx9.so [.] 0x00061439
0.86% studiorender.so [.] 0x00020a20
0.84% vguimatsurface.so [.] 0x00172b55
0.84% [kernel] [k] __kernel_text_address
0.73% [kernel] [k] system_call
0.67% libtogl.so [.] IDirect3DDevice9::DrawIndexedPrimitive(_D3DPRIMITIVETYPE, int, unsigned int, unsigned int, unsigned int, un
0.66% [kernel] [k] __schedule
0.61% [kernel] [k] sub_preempt_count
0.58% vgui2_s.so [.] 0x0007993e
0.55% datacache.so [.] 0x00045fae
0.54% [kernel] [k] get_typical_interval
0.54% [kernel] [k] print_context_stack
0.54% [kernel] [k] is_module_text_address
0.50% libc-2.17.so [.] memcpy
0.47% [kernel] [k] find_busiest_group
perf top when facing elevator - lag and stutter.
41.70% fglrx_dri.so [.] 0x0100f516
31.50% [fglrx] [k] 0x00008101
8.04% client.so [.] 0x0034cce9
2.59% engine.so [.] 0x003cdca4
1.99% shaderapidx9.so [.] 0x0005a731
1.34% materialsystem.so [.] 0x000331e5
0.93% vguimatsurface.so [.] 0x00172b46
0.64% stdshader_dx9.so [.] 0x00061439
0.56% libm-2.17.so [.] __sin
0.50% server.so [.] 0x00b4c7d1
0.31% vphysics.so [.] 0x000423f1
0.30% libm-2.17.so [.] feraiseexcept@@GLIBC_2.2
0.26% vgui2.so [.] 0x00017d7a
0.24% libtogl.so [.] IDirect3DDevice9::DrawIndexedPrimitive(_D3DPRIMITIVETYPE, int, unsigned int, unsigned int, unsigned int, un
0.22% datacache.so [.] 0x00032ff8
0.21% libc-2.17.so [.] _int_malloc
0.20% [kernel] [k] system_call
0.16% studiorender.so [.] 0x0002588a
0.14% libc-2.17.so [.] _int_free
0.13% libc-2.17.so [.] memcpy
0.12% libtogl.so [.] 0x0001b2bf
0.11% libc-2.17.so [.] malloc
When running latencytop and having lag&stutter, hl2_linux gets the amazing max 247.3ms lag. The case with most lag is "[down]" and the backtrace is:
(down is a kernel semaphore function)
I tried to capture the opengl stream with apitrace, but the result was not replayable.
The last thing I tried was an fglrx regression test. Catalyst 13.3-beta3 had no lag. That driver, however, draws an "AMD-beta" emblem on the screen. I eliminated the logo by short-circuiting the 0041d2c0 <atiddxEnableLogo> function (32bit fglrx_drv.so); the hacked version without the visible logo is also lag-free. I tried Catalyst 13.1 too, and it also ran nice and smooth (it only needed the fglrx.ko source patched for the 3.9.2 kernel).
I also tried half of the gl_* console commands, but I had no luck in finding workaround that way.
I've played TF2 and CS:S before, without noticing any such issues.
Processor: Intel(R) Core(TM) i3 CPU 530 @2.93GHz
Motherboard: Gigabyte GA-P55-UD3L
Chipset: Intel P55 Express Chipset
System Memory: 4GB DDR3
Graphics: Gigabyte Radeon HD 5670
Audio: Realtek ALC888 (Intel HDA rev 06)
Monitor: 1x Dell U2311H
Screen Dimensions: 1920 x 1080 pixels
OS: Slackware current (post 14.0)
Kernel: 3.9.1 (x86_32)
Sound: alsa 1.0.27 (x86_32)
Display Server: xorg-x11-server 1.13.2 (x86_32) (Composite disabled)
Mesa Libs: 9.1 (x86_32)
Catalyst: 13.4 (x86_32)
OpenGL: 4.2.12217 Compatibility Profile Context 8.961
Recent Failure Reports:
Mon Apr 22 15:58:59 2013 GMT: file ''/tmp/dumps/assert_20130422185857_1.dmp'', upload yes: ''CrashID=bp-9cb030db-e057-4cdb-8773-a8a762130422''
EP1 and EP2 run the exact same binaries.... Got any idea on the variance you see between them?
I confirm this report (also on 13.4). CS:S also has this lag, but it all depends on scene complexity and graphics settings (see #431).
I want to add something. I had disabled game-overlay for Portal, it doesn't seem to have an impact on the lag.
@alfred-valve, I played only the first 5 minutes of EP1 and EP2, so the scene complexity is quite plausible explanation.
There is a location in first level of EP1 that shows huge lag
maps/ep1_citadel_00.bsp pos: -6477.44 6043.07 -36.07 ang: 0.00 36.37 0.00
(I think DOG throws the van at my position: ) It lags when I face the stone wall that is lit by fire. I have 120fps while lagging. Looking at ang: 0.00 110.0 0.00 shows almost no lag.
There are other fires that don't cause massive lag.
Also, I tried a few more levels in CS:S. There are levels that have no lag, levels that have lag on some places and levels that have huge lag from the start.
Also, I tried the benchmark in CS:S in non-fullscreen window. That should have turned the lag into stutter, but I saw no stutter.
If you haven't reproduced the problem yourself and want me to run some other programs or do some other tests, feel free to ask me.
I can confirm for 13.4 (Ubuntu 12.10 64-bit, Radeon 6670, Half-Life 2). I logged GPU usage using https://github.com/clbr/radeontop tool.
The part with a huge lag:
Almost the same place but without lag:
Look at the Event Engine (ee) usage. It goes up when the game is lagging, while everything else (Primitive Assembly, Vertex Grouper + Tessellator etc.) goes down.
I can deliver detailed logs if they would be useful. BTW there is no lag at all using open source drivers.
The Catalyst 13.6-beta is already out and it also shows the same symptoms. They are present even when the beta logo is visible on the screen corner.
There is one more thing, I suspect that the problem is reproducible with a simple glxgears. The program runs normally when started (~6400fps), but once maximized it starts to stutter (~1200fps).
The same program under 13.3-beta3 runs with ~2400fps (normal), ~1200fps (maximized) and no stutter at all.
I experienced the same lag with other games, but not all of them: for example, Awesomenauts exhibits it, but Oil Rush does not. Interestingly, Awesomenauts author Joost has written an article about this (http://joostdevblog.blogspot.cz/2011/10/what-no-one-told-you-about-videocard.html - see the fifth picture), and it mentions fencing.
I can assure you that this issue is significantly worse in the fglrx drivers for GCN (Radeon HD 7xxx) hardware. TF2 is completely unplayable. All you can do is play 2D games, old Source games, and indie games with little graphics. Open source radeon driver also causes a lot of black boxes to flicker all over the screen. There's no place for AMD in gaming on Linux at the moment.
I've investigated a bit more and it seems to be an engine's problem, not driver's, after all. I am not sure if glXSwapBuffers should empty the command buffer but on catalyst it simply does not.
As a partial remedy, I hijacked elfhacks project and created an LD_PRELOAD hook that fixes this: https://github.com/volca02/glsync
As this forces a flush of the command buffer on every frame, it is clearly not ideal: expect an FPS drop (we could, for example, arrange it so that every frame finishes at the end of the next one). Also, unfortunately, it does not work with Left4Dead2 (but it works with other games).
Edit: fixed the sync to be done on end of next frame - the FPS seems quite similar to original, but without the lag.
This is a driver bug; it shouldn't be getting that many frames ahead. SwapBuffers() is specified to cause a command buffer flush. I know the NVIDIA driver has a configuration knob that lets you set how many frames you can have queued in the command buffer; the default value is 3, I believe. The catalyst driver has an internal setting called the 'flip queue', but it looks like it isn't properly kicking in in this case.
Then is there any way that we can access this flip queue property, much like RadeonPro does in Windows? It was able to expose all the finer parameters for users to control, so I would imagine it is possible to do the same in Linux, even if by some clever hack. However, I suppose we need Valve to contact them and tell them about the problem. AMD doesn't listen to little people like myself. I made a post about these issues in their forum without a single response from AMD. It just got ignored, I guess.
Hmm, are you sure glFlush (and thus glXSwapBuffers) should cause any CPU synchronization? Nothing says it guarantees that any of the commands have been executed before the call returns, only that they will be executed in "finite time" (ASAP). What I gather from the glFlush entry in the OpenGL documentation is that the call guarantees the commands will get executed as soon as the GL environment allows. Nothing is said about the call blocking until the command buffer empties (which of course would be inefficient, since we'd have to wait for the CPU to enter all commands of the next frame and the GPU would starve in the meantime).
The way I see it is that the GL driver should provide at least a weak guarantee, much like the said maximal count of frames in the command queue - but it is up to the user to provide additional synchronization as needed. But there is nothing in the article (http://www.opengl.org/sdk/docs/man/xhtml/glFlush.xml) enforcing the synchronization, in fact the note says the opposite.
@Plagman, would you please add a gl_finish variable that forces glFinish() somewhere in the rendering cycle? It could be right before or after glXSwapBuffers().
I tried both in the glxgears.c and it seems to eliminate the extra buffering/render lag/stutter and it still manages to provide 3000fps with vsync disabled.
Doom3 does have r_finish to force similar behavior (disabled by default).
I checked wine, it seems that wined3d issues glFinish before glXSwapBuffers if there is more than 1 context.
I'm afraid that @volca02 may be right. You need something that guarantees synchronization.
Technically if you rely on undefined behavior, it is not a driver bug.
I think that glFinish might be slightly too aggressive - this is why I used fences in the hack, that way I always have GPU do work without starving/waiting for next frame to be entered (e.g Frame 1 is sent, Frame 2 is sent, Frame 1 is waited for to be finished, Frame 3 is sent, Frame 2 is waited for to be finished...).
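The ordering described above can be sketched without any GL calls. This is only a toy model, not the glsync code: the "send" and "wait" events stand in for submitting a frame's commands and waiting on that frame's fence, and the function name and the max_in_flight parameter are hypothetical.

```python
from collections import deque

def pipeline_events(num_frames, max_in_flight=2):
    """Return the order of ('send', n) / ('wait', n) events when at most
    max_in_flight frames may be queued before the CPU has to wait."""
    events = []
    in_flight = deque()  # frames sent but not yet waited for
    for frame in range(1, num_frames + 1):
        # If the queue is full, wait for the oldest frame before sending a new one.
        if len(in_flight) == max_in_flight:
            events.append(("wait", in_flight.popleft()))
        events.append(("send", frame))
        in_flight.append(frame)
    # Drain whatever is still queued at the end.
    while in_flight:
        events.append(("wait", in_flight.popleft()))
    return events

for kind, frame in pipeline_events(3):
    print(kind, frame)
```

With max_in_flight=2 the GPU always has one frame queued behind the one it is working on, so it does not starve, while the CPU can never run more than two frames ahead.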
I know I'm terribly late, but just dropping in to confirm this. Ubuntu 13.04 64-bit, Catalyst 13.6 Beta (xorg-edgers), Radeon HD 5830.
It's reproducible with glxgears as @iiv3 noted earlier. Setting fps_max to 60 seems to be an effective workaround in at least Garry's Mod and Half-Life 2: Deathmatch. I'm getting terrible performance in Team Fortress 2 for some reason so I can't be sure if it had any effect there.
Those games benefit because you are already getting more than 60 FPS. However, TF2 is quite horrible, especially when you join an online server. With 24 bots the framerate is a bit sluggish with a bit of input lag, but at least it is smooth to a degree. However, join an online server and say hi to 1FPS.
@volca02, yes of course, glFenceSync() provides much finer control. The only problem is that it requires OpenGL 3.2 or newer. I'm not sure what is the minimum required version for Source1 games.
There is something more. Sometimes lower latency is more important than high framerates. One such case is in multiplayer games. For example 3 frames delay at 60Hz is 50ms (that is on top of network lag).
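The arithmetic behind that figure, as a small sketch (the function name is just for illustration): each queued frame adds one refresh period of delay.

```python
def queue_latency_ms(frames_queued, refresh_hz):
    """Extra delay, in milliseconds, added by frames waiting in the queue."""
    return frames_queued * 1000.0 / refresh_hz

print(queue_latency_ms(3, 60))  # 3 frames at 60Hz
print(queue_latency_ms(1, 60))  # a single queued frame, for comparison
```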
Another case that may prefer lower latency is Oculus Rift (the 3D Virtual Reality goggles).
Players that have sufficiently powerful hardware may want to get the lower lag.
The best solution would be to check for "ARB_sync" extension. If it is available then allow configurable delay using the fences. If it is not available then allow enabling of 0 frame delay using the glFinish.
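A rough sketch of that selection logic, assuming the extension list is available as the usual space-separated string. The function name, the return values and the max_frames_ahead parameter are made up for illustration; only the GL_ARB_sync extension name is real.

```python
def choose_sync_strategy(extensions, max_frames_ahead):
    """Pick a throttling strategy from a space-separated GL extension string.

    Returns ("fence", n) when fences are available, meaning the CPU may run
    up to n frames ahead, or ("finish", 0) when only glFinish can be used.
    """
    # Split into whole names so a longer extension name cannot match by accident.
    if "GL_ARB_sync" in extensions.split():
        return ("fence", max_frames_ahead)
    return ("finish", 0)

print(choose_sync_strategy("GL_ARB_sync GL_ARB_vertex_buffer_object", 1))
print(choose_sync_strategy("GL_ARB_vertex_buffer_object", 1))
```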
Wanted to post this link: http://www.nvidia.in/object/General_FAQ.html#G4
Apparently this is a problem also present on windows.
A little bit more info about the bug.
Please, read these first:
It is important to note that the primary function of SwapBuffers is to provide a new working buffer. The presentation is secondary to that. This is important to note for the case of vsync blocking.
(SwapBuffers is not explicitly defined to be working like this... but it is the most logical way.)
Let's examine a case where we have 2 buffers. We have buffer#1 already rendered and visible. We issue commands for drawing buffer#2. When SwapBuffers is called, it is not supposed to block until vsync (that blocking is actually a side effect); it is supposed to block until buffer#1 is freed (i.e. hidden). For this to happen, the drawing commands (for buffer#2) must finish AND THEN buffer#2 must be made visible by vsync (thus hiding and freeing buffer#1).
To avoid the above blocking, OpenGL implementations can use triple buffering. Then, instead of blocking right away, the extra buffer#3 is allocated and commands can be issued while we wait for the rendering and switching of buffer#2. On the next SwapBuffers we block only if buffer#2 still hasn't been displayed.
What I have found.
The problem with fglrx is an "optimization" that allows SwapBuffers to work asynchronously. It allows the CPU to continue issuing OpenGL commands even when there is no free buffer available (like borrowing a buffer we don't have yet). This lets the CPU run ahead and fill up the input command queue and then also the GPU command queue.
In a sense Source1 games trigger this bug because they are too well optimized and avoid issuing of any commands that cause synchronization events (mentioned in the second link), just like glxgears.
To confirm this, I used the fence query function glGetSynciv() to measure the render lag in glxgears -fullscreen. With some tweaking (e.g. calling draw_gears() twice) I managed to reach a peak of 248 frames. At 1920x1080*3 (rgb) bytes per frame that amounts to more than my total video ram, so there is no way it really allocates that many buffers.
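For reference, the size of 248 full-screen frames can be worked out like this. It is only the arithmetic, assuming 3 bytes per pixel as stated above and no padding or alignment; the function name is for illustration.

```python
def framebuffer_bytes(width, height, bytes_per_pixel, frames):
    """Memory needed to hold `frames` uncompressed color buffers."""
    return width * height * bytes_per_pixel * frames

total = framebuffer_bytes(1920, 1080, 3, 248)
print(total, "bytes")
print(round(total / 2**20), "MiB")
```

That is roughly 1.5 billion bytes of color buffers alone, before counting textures or anything else.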
Also glGetIntegerv(GL_RENDERBUFFER_FREE_MEMORY_ATI, memory_info); usually returns same values (649916,455936,1883763,3600) even at peak lag. (VBO/TEXTURE/RENDERBUFFER seem to return the same values). Meaning, no additional memory is allocated for render buffers.
There is however one problem with this explanation. If SwapBuffers was entirely asynchronous, then vsync would not work or it would cause even bigger render lag. This means that SwapBuffers still blocks at vertical retrace, but that is the only thing it blocks at. Once the retrace is cleared it would continue without checking for available buffer.
I'm still able to get render lag with enabled vsync, but it is a lot harder to trigger. I did manage to get render lag of 5 frames using glxgears and calling draw_gears() 200 times. This means, it lags when rendering time is bigger than display duration.
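The effect can be illustrated with a toy model. This is a simulation under made-up timings, not a measurement of fglrx: the CPU submits a frame every submit_interval_ms, the GPU needs render_time_ms per frame, and the optional max_queued limit stands in for a swap that waits for an older frame to finish. All names are hypothetical.

```python
def frames_behind(num_frames, submit_interval_ms, render_time_ms, max_queued=None):
    """Count frames submitted but not yet finished at the last submit."""
    finish_times = []
    gpu_free_at = 0   # time at which the GPU finishes its current work
    t = 0             # earliest time the CPU can submit the next frame
    last_submit = 0
    for i in range(num_frames):
        # Optional throttle: wait until the frame max_queued submissions ago is done.
        if max_queued is not None and i >= max_queued:
            t = max(t, finish_times[i - max_queued])
        last_submit = t
        start = max(t, gpu_free_at)          # GPU renders frames in order
        gpu_free_at = start + render_time_ms
        finish_times.append(gpu_free_at)
        t += submit_interval_ms
    return sum(1 for f in finish_times if f > last_submit)

print(frames_behind(100, 10, 20))                # GPU slower, no throttle
print(frames_behind(100, 10, 20, max_queued=2))  # GPU slower, throttled
print(frames_behind(100, 20, 10))                # GPU faster than the CPU
```

Without a limit the backlog keeps growing for as long as rendering takes longer than the submit interval; with a limit it stays at max_queued.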
I added a gl_finish cvar to force a sync in drivers that don't ever throttle swapBuffers, as requested. It should ship as part of the next update.
I saw the update of Portal and TF2 and it seems to work just fine.
Any plans to update the rest of the Source games?
This is live in HL2 beta now too, and will be soon for our other Multi-player titles.
I see, it's borderline playable now in servers. There's still some lag and framerates keep dipping to 19 with an average of 29 FPS but at least there isn't 2 second input lag.
@iiv3 you are probably talking about this issue: http://ati.cchtml.com/show_bug.cgi?id=832#c13
thanks for the report; we found a regression in OGL to control the lag indeed. this will be fixed shortly.
Hello, I'm trying the gl_finish 1 command to see if it fixes the lag in Portal. If the portal render depth is > 0, I get an engine error: "Too many verts for dynamic vertex buffer (106370>32768) Tell a programmer to up VERTEX_BUFFER_SIZE". What does it mean?