Performance issues with extension running longer periods of time #7

Open
jure opened this Issue Jul 3, 2014 · 12 comments

Member

jure commented Jul 3, 2014

Relevant issues:

peers/peerjs#218
tsujio/webrtc-chord#6

It appears there is a "bug" in Chrome: it does not garbage collect any PeerConnection created during the runtime of the page, even after you close() it. This results in 100% CPU usage and gigabytes of memory being used after Scholar Ninja has been running for a short while, since it opens thousands of connections in that time.

This behaviour is supposedly according to spec:
https://code.google.com/p/chromium/issues/detail?id=356605
https://code.google.com/p/chromium/issues/detail?id=373690

Ideas about how to solve it are welcome.

One I've thought about is just reloading background.js every x seconds, to force garbage collection of all previous connections. Obviously not ideal, as you leave and then rejoin the network, artificially increasing the churn rate (which is already plenty high).
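A minimal sketch of the periodic-reload idea, for illustration only. The names `scheduleBackgroundReload`, `leaveNetwork` and `reloadPage` are hypothetical and not taken from the Scholar Ninja code; the sketch also assumes that leaving the network can be triggered synchronously.

```javascript
// Sketch only: `leaveNetwork` and `reloadPage` are hypothetical callbacks
// supplied by the caller. `setTimer` is injectable so the logic can be
// exercised without waiting for a real timer.
function scheduleBackgroundReload(options) {
  const intervalMs = options.intervalMs;
  const leaveNetwork = options.leaveNetwork;
  const reloadPage = options.reloadPage;
  const setTimer = options.setTimer || setTimeout;

  if (!(intervalMs > 0)) {
    throw new RangeError('intervalMs must be a positive number');
  }

  return setTimer(function () {
    try {
      // Leave the network first so that other peers see a clean departure.
      leaveNetwork();
    } catch (err) {
      // A failed leave should not prevent the reload.
    } finally {
      // Reloading the page is what discards the old connections.
      reloadPage();
    }
  }, intervalMs);
}
```

In a background page, `reloadPage` could plausibly be `function () { location.reload(); }`, with the startup code joining the network again after the reload. Each reload still counts as one leave and one join, so the interval is a trade-off between resource usage and churn.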

blahah commented Jul 3, 2014

What if you force garbage collection? Does it still not get collected?

http://stackoverflow.com/questions/13950394/forcing-garbage-collection-in-google-chrome

Member

jure commented Jul 3, 2014

Even if it did, it would be very impractical to force users to start their browser with a command-line flag.

Luckily, I won't have to debug this any further, I hope. It seems that this bug has been resolved in https://code.google.com/p/chromium/issues/detail?id=373690, about 2 weeks ago. I'm testing it right now.
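For reference, a small sketch of what forcing a collection looks like in practice. Chrome only exposes a global `gc()` function when it is launched with `--js-flags="--expose-gc"`, which is the command-line flag mentioned above. The helper name `tryForceGc` is made up for this example.

```javascript
// Sketch only: requests a garbage collection if the runtime exposes gc().
// Returns true if a collection was requested, false if gc() is unavailable
// (the normal case for users who start the browser without the flag).
function tryForceGc(scope) {
  const target = scope || globalThis;
  if (typeof target.gc === 'function') {
    target.gc();
    return true;
  }
  return false;
}
```

Because the function is absent in a normally started browser, this is only useful as a debugging aid, not as something an extension can rely on.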

blahah commented Jul 4, 2014

OK yeah that was dumb - I didn't read the SO answers carefully enough (just googled to see if manually triggering GC was possible).

Great to see the bug is being fixed upstream.

Member

jure commented Jul 6, 2014

Based on some comments from @juberti, it looks like this particular fix will be landing in stable (Chrome 37) sometime in September. The more I think about it and the more I consider the 20 currently running peers at 100% CPU right now, the more I think that forcing a refresh of the background page every X hours is not a terrible idea. Cycle chord.leave and chord.join and we're good again.

I have to say that I'm seeing 100% CPU after a while with the latest Chromium build too, which supposedly already contains this fix. Not sure exactly what's going on there, but it looks like the same pathology as with current Chrome:

  • script, according to the timeline, isn't doing anything, 99% idle
  • profiler says 10 MB of memory,
  • Chromium helper is using gigs of memory and 100% CPU.

So it's likely I'm doing something wrong which is preventing garbage collection of the connections, but on the other hand, maybe this aspect of the issue has not been fixed.
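One way to rule out application code holding on to connections is to route every connection through a small registry that closes it and drops all references in one place. The sketch below is an assumption-laden illustration, not code from Scholar Ninja or webrtc-chord: `createConnectionRegistry` is a made-up name, and nothing here establishes that lingering references are the cause of the behaviour described above.

```javascript
// Sketch only: tracks connections so each one is closed and unreferenced
// exactly once. `conn` is anything with a close() method, for example an
// RTCPeerConnection.
function createConnectionRegistry() {
  const open = new Set();

  function track(conn) {
    open.add(conn);
    return conn;
  }

  function release(conn) {
    if (!open.delete(conn)) {
      return false; // unknown or already released
    }
    // Detach handlers so the connection no longer references app closures.
    conn.onicecandidate = null;
    conn.ondatachannel = null;
    conn.oniceconnectionstatechange = null;
    conn.close();
    return true;
  }

  function releaseAll() {
    let count = 0;
    Array.from(open).forEach(function (conn) {
      if (release(conn)) {
        count += 1;
      }
    });
    return count;
  }

  function size() {
    return open.size;
  }

  return { track: track, release: release, releaseAll: releaseAll, size: size };
}
```

If `size()` stays near zero while the helper process keeps growing, the retained objects are probably not being held by the script itself.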

juberti commented Jul 6, 2014

Lack of GC should cause mem bloat but not CPU. If it is chewing CPU it must be something else.

A chrome tracing dump on the helper process would be helpful.

Member

jure commented Jul 7, 2014

@juberti: Those were my general thoughts as well, but I don't know the internals of Chrome at all.

What I see is increasing CPU utilization when running a script. I have a fairly simplified replication of it in this gist: https://gist.github.com/jure/1da505b52bc8d1d33675

And a bit more description: It's a simple script where 2 peers join and then they disconnect. After a very short while (seconds), Chrome is using 3.5 GB of memory and 100% CPU. If you profile the script, it looks like nothing unusual is going on and the stack size is just over 8 MB. Even weirder, if you pause execution of the JavaScript, CPU usage and memory usage remain equally high.

@juberti let me know what you need and I'll gladly provide it. How do I generate a Chrome tracing dump?

Member

jure commented Jul 7, 2014

@juberti, at the point when CPU is at 100% (after running Scholar Ninja for a while) I took a spindump, a sample (using Activity Monitor) and a trace using Xcode's Instruments and the Counter tool: https://www.dropbox.com/s/pyudb9ab389hcy0/Chromium-trace.zip

image

As you can see, the script is not really doing much at this point:
image

The same behaviour is present as before, although there is a big difference in memory usage (Chromium uses a lot less of it, presumably because of the fix): if I pause script execution in the developer console, CPU remains at 100%. Only if I disable the extension (or close the tab running the test) does CPU usage drop back to normal.

Hope this elucidates the nature of this problem a bit more.

Member

jure commented Jul 7, 2014

Looking at other issues on Chromium, I guess you really meant the chrome://tracing trace, so here it is:
https://www.dropbox.com/s/jz36vu9a8whjblx/trace.json.zip

Looks like the libjingle worker thread is in overdrive.
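A trace saved from chrome://tracing is JSON in the Trace Event Format, so a rough per-thread summary can be computed with a few lines of script. The sketch below is only an illustration under simplifying assumptions: `summarizeThreadTime` is a made-up helper, it only counts complete events (`ph: "X"`, with `dur` in microseconds), it ignores begin/end (`B`/`E`) pairs, and nested events are counted more than once.

```javascript
// Sketch only: sums the duration of complete ("X") events per thread.
// Accepts either the array form or the { traceEvents: [...] } object form.
// Returns [{ thread, totalMs }], sorted by totalMs in descending order.
function summarizeThreadTime(trace) {
  const events = Array.isArray(trace) ? trace : trace.traceEvents || [];
  const names = new Map();  // "pid:tid" -> thread name from metadata events
  const totals = new Map(); // "pid:tid" -> summed duration in microseconds

  events.forEach(function (ev) {
    const key = ev.pid + ':' + ev.tid;
    if (ev.ph === 'M' && ev.name === 'thread_name' && ev.args) {
      names.set(key, ev.args.name);
    } else if (ev.ph === 'X' && typeof ev.dur === 'number') {
      totals.set(key, (totals.get(key) || 0) + ev.dur);
    }
  });

  return Array.from(totals.entries())
    .map(function (entry) {
      // Fall back to "pid:tid" when the trace has no name for the thread.
      return { thread: names.get(entry[0]) || entry[0], totalMs: entry[1] / 1000 };
    })
    .sort(function (a, b) {
      return b.totalMs - a.totalMs;
    });
}
```

Run over the parsed contents of a trace file, the first entries of the result show which threads account for the most recorded time, which is the kind of observation made above.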

juberti commented Jul 7, 2014

Can you file a chromium bug on this? I think this is a different issue.

Member

jure commented Jul 9, 2014

https://code.google.com/p/chromium/issues/detail?id=392651&thanks=392651&ts=1404944415

@juberti: Let me know if there's anything more I can do. I'm not experienced with Chromium bug reporting.

jure added a commit that referenced this issue Jul 16, 2014

Rename exported functions to fit webrtc-chord's naming, no reason why they should be different. Introduce a hacky way of keeping CPU usage low: keep reloading the background page every hour (issue #7). Have a way of letting other modules know that dht.js has joined the network (so things like restoring documents from cache can happen).

yarikoptic commented Sep 19, 2015

Is ScholarNinja still kicking? Sad to see it leave :-(

Member

jure commented Sep 20, 2015

It's not dead, but it's not particularly alive either. I haven't tested the current situation, but I would expect that performance issues with lots of WebRTC connections in both Firefox and Chrome are now non-existent, or at least significantly reduced. Might be worth trying the experiment again!
