nxagent entering high CPU load while session is suspended. #671

Closed
Ionic opened this issue Mar 5, 2018 · 29 comments

@Ionic
Member

commented Mar 5, 2018

With 3.5.99.14 on a Debian Unstable box, I started a MATE session with one maximized mate-terminal and a running Firefox instance showing the default first-run page (https://www.mozilla.org/en-US/firefox/54.0/firstrun/), then detached from it.

While at first nothing happened, after an hour I was able to reproduce nxagent producing high CPU load.

This seems to be consistent with reports from other X2Go users.

Reattaching to the session was not possible at first, but after a few more tries a new x2goagent process showed up in the process list(?), x2goserver listed the session as running and CPU usage was back to normal. Changing the state from running to suspended led to the other x2goagent process vanishing after a minute and the original x2goagent process continuing to consume CPU resources.

In all cases, nxproxy is reporting the (unhelpful) error message

Error: Failure negotiating the session in stage '7'.
Error: Wrong version or invalid session authentication cookie.

I'll try to fire up the perf tool to see if I can get a reasonable callgraph when this issue appears.

@Ionic

Member Author

commented Mar 5, 2018

perf:

# Total Lost Samples: 0
#
# Samples: 40K of event 'cycles:ppp'
# Event count (approx.): 18204830777
#
# Children      Self  Shared Object    
# ........  ........  .................
#
    94.41%    65.13%  nxagent          
            |          
            |--43.28%--GetTimeInMillis
            |          |          
            |           --28.14%--__vdso_gettimeofday
            |                     |          
            |                      --5.92%--0x977
            |          
            |--21.03%--ScreenSaverTimeoutExpire
            |          
            |--12.10%--DPMSSet
            |          
            |--11.82%--TimerSet
            |          
            |--3.39%--DoTimer
            |          
             --2.45%--WaitForSomething

    34.52%    34.37%  [vdso]           
            |          
            |--28.81%--GetTimeInMillis
            |          |          
            |           --28.02%--__vdso_gettimeofday
            |                     |          
            |                      --5.92%--0x977
            |          
             --5.70%--__vdso_gettimeofday

     0.50%     0.50%  [kernel.kallsyms]
     0.00%     0.00%  libc-2.26.so

Looks like it's caused by the screensaver and the code changes related to that.
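
For anyone who wants to capture a comparable profile, a minimal sketch follows. It assumes `perf` is installed on the server and that you are allowed to sample the agent process; the function name `profile_pid` and the output path are made up for illustration.

```shell
# profile_pid PID [SECONDS]
# Sample a running process for a while, then print a call graph grouped by
# shared object (similar in shape to the report above).
profile_pid() {
    pid=$1
    secs=${2:-30}
    if [ -z "$pid" ]; then
        echo "usage: profile_pid PID [SECONDS]" >&2
        return 2
    fi
    data="${TMPDIR:-/tmp}/nxagent.perf.data"   # hypothetical output location
    # -g records call chains; "sleep" only bounds the sampling window.
    perf record -g -p "$pid" -o "$data" -- sleep "$secs" || return 1
    perf report -i "$data" --stdio --children --sort dso
}
```

Call it with the PID of the spinning agent process, for example `profile_pid 12345 60`.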

@uli42

Member

commented Mar 5, 2018

GetTimeInMillis could also be related to the new logging code. Although nxagent is not using it explicitly, it is still using nxcomp.

@Ionic

Member Author

commented Mar 5, 2018

Yeah, but the screensaver timeout and DPMS stuff? I remember having merged code changes for that between 3.5.99.13 and 3.5.99.14, maybe something is amiss with that.

It's spinning in these functions way too often and never sleeping, so will need to look into this more carefully.

I'll also try disabling MATE screensaver in the session, then disconnect and see how nxagent behaves then.

@Hybrid512

commented Mar 5, 2018

Hi,

I got some strange issues with screensaver stuff too.
At first, I thought it was triggered by openbox (for my TCEs), but reading your reports, it's not limited to that situation.
I had to step back to the previous release: once my users left their desktop for more than 10 minutes, they were unable to resume their session, nothing worked correctly and we had to kill their sessions one by one.
There is definitely something broken there regarding the Xorg screensaver stuff.

@Ionic

Member Author

commented Mar 5, 2018

Disabling both the screensaver and DPMS makes nxagent behave sanely again. At least the issue didn't show up for 1.5 hours.

It seems to be difficult to actually provoke this, though. So far, neither forcing DPMS power modes nor forcing the screensaver on provoked nxagent into this mode.

After setting both timeouts to a minute and detaching, nothing happened for almost 50 minutes. A quicker way to trigger it than waiting for hours would be great to have...
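
Since the spike can take 30 minutes or more to appear, a small watcher that logs when it starts may save some manual checking. The sketch below is only an illustration: `cpu_ticks`, `cpu_percent` and `watch_pid` are hypothetical helper names, and it assumes a Linux `/proc` filesystem.

```shell
# cpu_ticks PID: print utime + stime (in clock ticks) from /proc/PID/stat.
cpu_ticks() {
    # Drop "pid (comm) " first, since comm may contain spaces.
    sed 's/.*) //' "/proc/$1/stat" | awk '{ print $12 + $13 }'
}

# cpu_percent T1 T2 SECONDS HZ: average CPU usage between two tick samples.
cpu_percent() {
    awk -v a="$1" -v b="$2" -v s="$3" -v hz="$4" \
        'BEGIN { printf "%d\n", (b - a) * 100 / (s * hz) }'
}

# watch_pid PID [INTERVAL] [LIMIT]: log a line whenever the process used at
# least LIMIT percent of one CPU during the last INTERVAL seconds.
watch_pid() {
    pid=$1; interval=${2:-30}; limit=${3:-90}
    [ -r "/proc/$pid/stat" ] || return 1
    hz=$(getconf CLK_TCK)
    prev=$(cpu_ticks "$pid")
    while sleep "$interval" && [ -r "/proc/$pid/stat" ]; do
        cur=$(cpu_ticks "$pid")
        pct=$(cpu_percent "$prev" "$cur" "$interval" "$hz")
        if [ "$pct" -ge "$limit" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') pid $pid at ${pct}% CPU"
        fi
        prev=$cur
    done
}
```

Running `watch_pid <agent PID> 30 90 >> spike.log` in a separate shell on the server would then record the time at which the agent starts spinning.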

@uli42

Member

commented Mar 5, 2018

Suggestion: reduce the default timeouts in the code

@4nanook

commented Mar 5, 2018

@RobbieTheK

commented Mar 5, 2018

Would this change do the same thing?

One option is to edit the vlc.desktop file in the /usr/share/applications folder.

Replace the original command
Exec=/usr/bin/vlc --started-from-file %U

with the following line:
Exec=sh -c "xscreensaver-command -exit; /usr/bin/vlc --started-from-file %U; xscreensaver -nosplash"

@Hybrid512

commented Mar 5, 2018

Are we talking about screensaver/DPMS on the server side or the client side?
I tried with updated libs on both and encountered the problem.
I reverted to the previous release on the client side and still had the problem.
Only reverting both sides to the previous release fixed my problem.

@Hybrid512

commented Mar 5, 2018

okay ... seems really server side ....
Just leave a session alone for 10 minutes and you'll lose the connection.
Once disconnected, you won't be able to resume the session.

@Ionic

Member Author

commented Mar 5, 2018

More testing results:

Timeouts set to 1 minute each (if enabled).

[✘] MATE Screensaver [✘] Display Power Saving: no CPU spike
[✔️] MATE Screensaver [✔️] Display Power Saving: 100% CPU after 30 - 40 minutes
[✔️] MATE Screensaver [✘] Display Power Saving: 100% CPU after ~30 minutes
[✘] MATE Screensaver [✔️] Display Power Saving: 100% CPU after unknown time. Left it running for about 7h, CPU time currently is at 7:08h, so likely also within ~30 minutes

So it seems to be caused by either feature; only when both are turned off does everything work fine again.

@Ionic

Member Author

commented Mar 5, 2018

> Would this change do the same thing?

If you're running xscreensaver, yes. The better option would be to disable the screensaver and DPMS in your desktop environment's settings.

> okay ... seems really server side ....

Definitely server-side (only).

@RobbieTheK

commented Mar 5, 2018

> If you're running xscreensaver, yes. The better option would be to disable the screensaver and DPMS in your desktop environment's settings.

I wanted to apply this to all users so I followed this suggestion,
In your /etc/X11/xorg.conf file, add this section:

Section "ServerFlags"
    Option    "blank time" "0"
    Option    "standby time" "0"
    Option    "suspend time" "0"
    Option    "off time" "0"
EndSection

And the command below that worked without any warning/error.

gconftool-2 --direct --config-source xml:readwrite:/etc/gconf/gconf.xml.mandatory --type bool --set /apps/gnome-screensaver/idle_activation_enabled false

Hoping one of these helps...

@Ionic

Member Author

commented Mar 5, 2018

> I wanted to apply this to all users so I followed this suggestion,
> In your /etc/X11/xorg.conf file, add this section:

That won't do anything useful for nxagent. It doesn't read a server config file, and certainly not this one. You can safely revert this change.

> And the command below that worked without any warning/error.

That will work if you're using GNOME/GNOME screensaver. Other DEs will be unaffected. Some DEs might not even have a global configuration file, though I figure most do. MATE, for instance, doesn't have something like that directly, but you could add an autostart .desktop file that disables the relevant option via dconf. The command for that would be different, though.
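
As a rough sketch of the autostart approach for MATE, the snippet below writes a `.desktop` file that runs `gsettings` at login. The schema and key (`org.mate.screensaver`, `idle-activation-enabled`) are assumptions about MATE's settings layout and should be checked with `gsettings list-keys org.mate.screensaver`; the file name and the `write_autostart` helper are hypothetical.

```shell
# write_autostart DIR: create an autostart entry in DIR
# (system-wide this would typically be /etc/xdg/autostart).
write_autostart() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/disable-mate-screensaver.desktop" <<'EOF'
[Desktop Entry]
Type=Application
Name=Disable MATE screensaver idle activation
# Assumed schema/key; verify before deploying.
Exec=gsettings set org.mate.screensaver idle-activation-enabled false
OnlyShowIn=MATE;
NoDisplay=true
EOF
}
```

For a quick trial, `write_autostart "$HOME/.config/autostart"` limits the change to one user. Whether this setting alone avoids the CPU issue is not established here.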

@Ionic

Member Author

commented Mar 5, 2018

This is getting weird. Disabling both MATE screensaver and DPMS in MATE power settings, then restarting the session and disconnecting led to 100% CPU usage after ~30 minutes.

Previously that seemed to work for 1.5 hours, but maybe my poking around by directly forcing DPMS power modes did something back then.

Guess I'll have to dig into screensaver and DPMS code and find out what is happening.

@4nanook

commented Mar 5, 2018

@4nanook

commented Mar 5, 2018

@Ionic

Member Author

commented Mar 5, 2018

> I disabled both screen saver and DPMS and left a server connected via x2go up all night and the connection stayed up all night and was still serviceable this morning.

That's what I thought as well, but I cannot reproduce nxagent running fine with both DPMS and MATE Screensaver disabled, which is weird.

> /usr/bin/xset -dpms s off s noblank s 0 0 s noexpose -dpms
> This worked, system stayed connected all night.

Theoretically, that should do the same thing, but maybe MATE is messing up here. I'll see how that goes in a new session.
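
For reference, the quoted one-liner can be split into its individual `xset` settings, and `xset q` can confirm what the server ended up with. This is a sketch: `disable_blanking` and `blanking_state` are hypothetical helper names, and the parser assumes the usual `xset q` layout (a `timeout:` line under `Screen Saver:` and a `DPMS is Enabled/Disabled` line).

```shell
# The same settings as the one-liner, one per call. Run inside the session,
# with DISPLAY pointing at the nxagent display.
disable_blanking() {
    xset -dpms        # disable DPMS (Energy Star) features
    xset s off        # turn the screen saver functions off
    xset s noblank    # prefer a pattern over blanking the video
    xset s 0 0        # screen saver timeout and cycle set to 0 seconds
    xset s noexpose   # do not allow the saver to cause window exposures
}

# blanking_state: read `xset q` output on stdin and print the screen saver
# timeout and the DPMS state.
blanking_state() {
    awk '
        /^Screen Saver:/ { in_saver = 1 }
        in_saver && $1 == "timeout:" { print "saver_timeout=" $2; in_saver = 0 }
        /DPMS is/ { print "dpms=" tolower($3) }
    '
}
```

After `disable_blanking`, running `xset q | blanking_state` should report a timeout of 0 and DPMS as disabled; comparing that with the output after changing the MATE GUI settings might show where the two differ.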

@4nanook

commented Mar 5, 2018

@Ionic

Member Author

commented Mar 6, 2018

Turning off everything through xset directly like you did seems to alleviate the problem. I don't quite understand why the MATE GUI settings don't do exactly this as well, but alright.

I've noticed that some code backports are missing, will test to see if that helps.

@Ionic

Member Author

commented Mar 6, 2018

Thanks to @sunweaver, I figured out (probably) that disabling DPMS support completely within nxagent also works around that issue.

This is not really what I had in mind though, so will continue digging.

@pongraczi

commented Mar 6, 2018

I also experienced this kind of issue with the latest x2goserver + Linux Mint 18.3 + MATE.
I will try to solve the issue with the xset command shown above.
I had already turned off DPMS and the screensaver in the MATE control panel, but that seems to have had no effect.

@4nanook

commented Mar 7, 2018

@sunweaver

Member

commented Mar 7, 2018

@pongraczi

commented Mar 7, 2018

So, xset on command line (or in .profile) did the trick, x2go session is up and running. Thanks.
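
A guarded variant for `~/.profile` might look like the sketch below, so the command only runs inside X2Go sessions. `X2GO_SESSION` is assumed to be set by the X2Go server in the session environment (check with `env | grep X2GO`); the function name and the `XSET` override are hypothetical.

```shell
# Hypothetical ~/.profile fragment.
x2go_disable_blanking() {
    # Only act inside an X2Go session that has a display.
    if [ -z "$DISPLAY" ] || [ -z "$X2GO_SESSION" ]; then
        return 0
    fi
    # XSET can be overridden (e.g. XSET=echo) for a dry run.
    "${XSET:-xset}" -dpms s off s noblank s 0 0 s noexpose
}
x2go_disable_blanking
```

The guard keeps local logins on the same machine untouched; drop it if the settings should apply everywhere.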

@4nanook

commented Mar 7, 2018

@4nanook

commented Mar 7, 2018

@Ionic Ionic closed this in 76e7d26 Mar 7, 2018

@uli42

Member

commented Mar 7, 2018

@Ionic

Member Author

commented Mar 7, 2018

The DPMS display mode, i.e., on (default state), standby, suspend, off (increasing power reduction but also response time from left to right).

@sunweaver sunweaver added this to the 3.6.0.0 milestone Mar 8, 2018
