
Memory Leak on Render #164

Closed
l-cornelius-dol opened this issue Jul 10, 2017 · 53 comments

@l-cornelius-dol
Collaborator

l-cornelius-dol commented Jul 10, 2017

I've been chasing a severe memory leak since Saturday that seems to boil down to references internal to DOMVM. The situation is a relatively simple but large (4148 elements) list of pairs of strings rendered in a DIV (translations: source text and target text).

Every render grows memory significantly, until I crash the 32-bit browser at a little under 4 GB.

The usage pattern in the debugger from a heap differential looks like this, where it all seems to end with initElement, as if the elements are being cached and never discarded:

image

Any thoughts on where to go with this? Is there something my code might be doing to cause old vNode elements to be retained from render to render?

Version 3.0.2.

@l-cornelius-dol
Collaborator Author

Also, dev-mode yields nothing interesting; only a warning about an unkeyed input, which is in fact uniquely keyed by its containing div.

@leeoniya
Member

the only things in userspace that would affect this are holding references to vnodes or vms in your app. are you doing either of these?

it probably won't come as a big surprise to you that a repro would be 1,000x more useful :)

the SSR bench [1] creates a decent-sized tree with ~2000 nodes and ~200 sub-views. it may be worth setting a redraw interval of 1000ms and watching what happens to the RAM. the dbmonster bench @ 0 mutations generates ~3k nodes on each redraw and is still pretty gentle on the GC. any serious leaks would have been caught there very quickly. it makes me think it's an issue with sub-views, if it's internal at all.

[1] https://github.com/leeoniya/domvm/tree/3.x-dev/demos/bench/ssr
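for example, a rough soak-test sketch (SomeView/someData are placeholders for whatever you mount; the 1000ms interval is the one suggested above):

var vm = domvm.createView(SomeView, someData).mount(document.body);

// redraw the whole tree once a second and watch the memory / DOM node
// graphs in devtools while it runs
setInterval(function() {
  vm.redraw();
}, 1000);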

@l-cornelius-dol
Collaborator Author

holding references to vnodes or vms in your app

I'm not, but I'm used to much more capable JVM debuggers for identifying this kind of thing, so I'll triple check.

it probably won't come as a big surprise to you that a repro would be 1,000x more useful :)

Yep, getting there. I was hoping for some low-hanging fruit to check for first.

it makes me think it's an issue with sub-views, if it's internal at all.

I don't have any sub-views in this one -- it's all a single view, one of many that will eventually form a larger overall application as I re-engineer it all to be an AJAX based DOMVM app.

I'll check the things you've suggested and report back here.

@l-cornelius-dol
Collaborator Author

Actually, I am thinking it could be an accidental closure capturing each rendered graph. The speed of growth is definitely proportional to the number of elements displayed.

@leeoniya
Member

leeoniya commented Jul 10, 2017

check out the attached tests.

  1. single top level redraw @100 ms
  2. single top level redraw @100 ms + "Content" subview redraw @73ms
  3. single top level redraw @100 ms + "Link" subviews redraw (all 200 of them) @73ms

all show a normal sawtooth GC pattern and no uncontrolled growth. the benchmarks are ultra-sensitive to poor GC & mem leaks, so i think anything this severe would have shown up by now :/

memtest.zip

@l-cornelius-dol
Collaborator Author

Hmmm. On every redraw, even while idle, two listeners are added, which surely explains why the DOM trees remain reachable. However, I only use onxxx attributes for listeners, AFAICT.

@leeoniya
Member

does the problem go away if you re-create the listeners on each redraw inside render?

@l-cornelius-dol
Collaborator Author

You mean with a construct like onclick: [xltPairClick,nsp,xpt]? That's the only kind I have. Ten handlers, all recreated with every redraw.

@leeoniya
Member

leeoniya commented Jul 11, 2017

no, if you reference a shared function and the params stay the same then nothing is recreated or rebound. parameterized handlers diff that array between redraws to determine if anything needs to be recreated/rebound.

try using an inline xltPairClick.bind(...). you can also try setting xltPairClick._raw = true to prevent domvm from doing any kind of handler wrapping, so it will receive just e like a bare event handler.

also try the option below, which will just attach the raw handler but still store the data on the vnode:

function xltPairClick(e) {
  var data = e.target._node.data;
  // data.nsp
  // data.xpt
}

xltPairClick._raw = true;

el("div", {onclick: xltPairClick, _data: {nsp:..., xpt:...}}, "moo")

if you look at what the event wrapper does [1], you'll see that it doesn't actually enclose anything except the passed args. both the vnode and the vm are retrieved dynamically via e.target._node each time the handler fires. so i don't think this would be a leak source :/

[1] https://github.com/leeoniya/domvm/blob/3.x-dev/src/view/patchEvent.js#L12-L33

@l-cornelius-dol
Collaborator Author

So, I've narrowed it down to div elements:

image

This is on a display that started with 26, was expanded to 6508, then to about 12K, then collapsed back to 26.

It looks like all of the div elements remain reachable. But I can't see how to get a browser to show me how they are reachable. That is, via which object(s).

@leeoniya
Member

i'd offer help, but there's too little info here :/

if you remove all ev handlers and just invoke redraw externally (or let's say on an internal timer) does the same thing happen?

@l-cornelius-dol
Collaborator Author

i'd offer help, but there's too little info here :/

I appreciate that; at this point I am documenting the adventure for "future generations". I will pursue the avenues you've suggested after I complete the remaining functionality, so at least I can tell the bosses that I'm technically "done".

Also, it looks like it's only a problem with Firefox. I can't replicate the same fault in Chrome.

@leeoniya
Member

leeoniya commented Jul 11, 2017

do you have views [with many divs] that are designed to unmount/remount, rather than being recycled? for that much dom to be hanging around, it has to be created in the first place, which suggests you're opting not to recycle combined with something not allowing the discarded dom to be GCd?

@l-cornelius-dol
Collaborator Author

l-cornelius-dol commented Jul 11, 2017

The app is for facilitating application translations. So the general structure is:

Namespace, containing a set of translation pairs, where each of several namespaces can be collapsed out of the way when it's complete. Visually:

image

When the namespace is collapsed, none of its children are rendered, only a one-liner:

image

Each namespace is a div containing its translations, and each translation is a div pair of "source language text" then "target language text". Each namespace is keyed and each pair is keyed. No other elements are keyed. Since pairs may be omitted from the next view by either collapsing the namespace or applying a filter, I keyed them to allow efficient identification.
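To make that concrete, here is a rough sketch of the structure (names and data shape are illustrative, not my actual code; keying is via domvm's _key attr):

var el = domvm.defineElement;

function render(vm, state) {
  return el("div", state.namespaces.map(function(ns) {
    if (ns.collapsed)
      return el("div", {_key: ns.id}, ns.name);           // collapsed one-liner

    return el("div", {_key: ns.id}, ns.pairs.map(function(xpt) {
      return el("div", {_key: xpt.id}, [                  // keyed translation pair
        el("div", xpt.src),                               // source language text
        el("div", xpt.tgt),                               // target language text
      ]);
    }));
  }));
}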

Does that answer the question?

@leeoniya
Member

doesn't seem like anything too crazy. if you get a chance to recreate the minimal structure you use for this, it would help a lot. from your description, this UI is technically simpler than TodoMVC.

@l-cornelius-dol
Collaborator Author

l-cornelius-dol commented Jul 12, 2017

First, a simple replication of the structure on jsfiddle shows no evidence of leaking; I'll need to continue to build it out until it matches my application more closely.

Second, I am not seeing a leak in Chrome, only in Firefox.

Third, the size/pace of the leaking is directly proportional to the DOM elements rendered. Thus, if I collapse all namespaces, or hide all completed translations, the leak is proportionally slower. So there must be some twist that retains DOM references that I am overlooking.

@leeoniya
Member

First, a simple replication of the structure on jsfiddle shows no evidence of leaking;

i really hope that's an artificial stress test and not your actual dom size, btw 😨

@l-cornelius-dol
Collaborator Author

Oh, yeah. Typical total translations in all namespaces for a large app is under 5000. I just pulled numbers more or less out of my butt for this.

@l-cornelius-dol
Collaborator Author

Not that we expected it to, but just FYI, 3.0.3 has not affected this problem.

@l-cornelius-dol
Collaborator Author

l-cornelius-dol commented Jul 13, 2017

I've pretty much confirmed that the leak does not exist in Chrome -- it just has some unexpectedly relaxed GC behavior under load that resembles a leak. But, given a minute to breathe, all the extra memory is reclaimed; also, it doesn't climb nearly as rapidly as Firefox does under the same load.

@leeoniya
Member

leeoniya commented Aug 4, 2017

@lawrence-dol have you had any more time to investigate this?

@l-cornelius-dol
Collaborator Author

@leeoniya : Not at present, as I had to deliver it, even with the leak. But I will definitely return to it ASAP.

@leeoniya
Member

leeoniya commented Aug 5, 2017

it would be good to get a minimal testcase and maybe even file a "Firefox leaks, Chrome does not" bug in Bugzilla to get some help and maybe get a fix into Gecko in the process :)

@l-cornelius-dol
Collaborator Author

@leeoniya : I agree and that's my next course of action.

@leeoniya
Member

leeoniya commented Aug 5, 2017

i did some poking and got a pretty simple repro. i haven't tried it in Firefox, but Chrome definitely shows a buildup of nodes over time. the version that removes/inserts unrecycled nodes leaks, while the static dom node count w/recycling stays flat.

the first thing to try is to do some extra old vtree un-referencing here: https://github.com/leeoniya/domvm/blob/3.x-stable/src/view/dom.js#L62

i wonder if it's enough to set the el._node = vnode.el = null

var el = domvm.defineElement;

function ListView() {
  return (vm, list) =>
    el("ul", list.map(item =>
      el("li", item.val)
    ));
}

function randInt(min, max) {
  return Math.floor(Math.random()*(max-min+1)+min);
}

function randColor() {
  return '#'+Math.floor(Math.random()*16777215).toString(16);
}

function makeList(qty) {
  var items = [];
  while (qty--)
    items.push({val: randColor()});
  return items;
}

// no leaks
setTimeout(function() {
  var vm = domvm.createView(ListView, makeList(100)).mount(document.body);

  setInterval(function() {
    vm.update(makeList(100));
  }, 100);
}, 5000);

/* leaks
setTimeout(function() {
  var vm = domvm.createView(ListView, makeList(100)).mount(document.body);

  setInterval(function() {
    vm.update(makeList(randInt(0, 100)));
  }, 100);
}, 5000);
*/

@leeoniya
Member

leeoniya commented Aug 5, 2017

none of the suspects turned out to have any effect :/

the strangest part is that the heap snapshots in chrome don't show any detached nodes, while according to the perf profile, there should be thousands: http://imgur.com/a/YATqh

i asked for some help here: https://www.reddit.com/r/javascript/comments/6rpnj9/memleak_discrepancy_between_chromes_perf_timeline/

@leeoniya
Member

leeoniya commented Aug 5, 2017

adding node.el = el._node = node.parent = node.parent.body[node.idx] = null; into _removeChild does help the leaky case reclaim mem from old vnodes, but does nothing to help the Node accumulation. considering that old vtree reclamation is not an issue at all in the non-leaky case, i don't think this partial remedy is worth keeping, as it should resolve itself when we fix the main cause.
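roughly, the nulling went here (a sketch, not the actual source):

function _removeChild(parEl, el) {
  var node = el._node;                 // vnode bound to this element
  parEl.removeChild(el);
  // break the links between the discarded vnode, its element and the old vtree
  node.el = el._node = node.parent = node.parent.body[node.idx] = null;
}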

before:
leak

after:
improved-leak

another thing to note is that if i manually trigger the GC, everything does get collected, which means it actually can be collected, but for some reason it isn't being done automatically:

http://imgur.com/a/89YRz

@leeoniya
Member

leeoniya commented Aug 8, 2017

i believe you, but i'm flying blind. the Node leak i seem to have found isn't showing up in devtools and the leak you've found i simply can't reproduce. i'm gonna leave this until you can come up with at least a demo that can continuously accumulate mem as you've described.

@leeoniya
Member

leeoniya commented Aug 9, 2017

i did an ivi impl and it looks like this over ~120secs:

ivi

domvm:

domvm

i'll need to start picking things apart to see where the gradual accumulation comes from. i suspect there's a sneaky closure somewhere.

@leeoniya
Member

leeoniya commented Aug 9, 2017

@lawrence-dol can you check if the latest dev version has any effect on your case?

@l-cornelius-dol
Collaborator Author

l-cornelius-dol commented Aug 9, 2017

In Firefox it doesn't appear any different. It leaks at a rate of about 100 MB/sec. After about 2.5 GB total memory, the browser starts becoming non-responsive, during which periods memory continues increasing, though at about 200 KB/sec.

EDIT: Note, FF eventually recovers the memory after closing the tab; it just remains locked up for a long while. I noticed it was back to good after experimenting in Chrome.

However, in Chrome, the application memory is pretty stable overall (viewed in Task Manager) and the heap seems to grow and be reclaimed, but the number of nodes and listeners grows continuously.

image

The corresponding activity is (approximate start times in seconds):

  • 0 : Idle, all translation rows collapsed.
  • 5 : Expand all translation rows.
  • 10 : Actively cycling down translation rows with a slight pause around 22 seconds on a special row that needs me to tab before continuing.
  • 30 : Idle.
  • 35 : GC.
  • 40 : Active.
  • 47 : Idle.
  • 51 : GC.
  • 65 : Collapse translation rows and GC.

It makes me wonder if Firefox leaks heap attached to listeners, but Chrome somehow manages to clean up unreachable heap more effectively. Thus, the core problem is the listeners remaining.

I'll see if adding the listeners to my fiddle replicates this.

@leeoniya
Member

leeoniya commented Aug 9, 2017

the core issue is the Nodes somehow remaining, which in turn leaves the listeners as well. the two trend-lines basically trace each other. we know that the nodes remaining has nothing to do with the listeners because in my tests there are no listeners at all.

i'm gonna take the pico build and start stripping out any functionality not needed by my simple test (eg. refs, attrs, hooks, etc) while it still exhibits the node buildup. i've already tried a few things like an un-referencing pass over all vnode.el in the old vtree, in case it was still somehow reachable after a redraw - made no difference.
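the un-referencing pass was along these lines (a sketch):

function unrefTree(node) {
  if (node.body instanceof Array)
    node.body.forEach(unrefTree);
  node.el = null;                      // drop the old vnode -> dom reference
}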

the whole situation is 10x as annoying because a manual GC invocation cleans everything up, so there shouldn't be anything reachable remaining that would justify the accumulation. and none of these phantom nodes show up in the Containment/dom areas in devtools. really frustrating.

@leeoniya
Member

leeoniya commented Aug 9, 2017

mithril does well, too:

mithril

@l-cornelius-dol
Collaborator Author

the whole situation is 10x as annoying because [...] there shouldn't be any reachable code

I completely agree.

But if nodes are accumulating, why isn't heap growing as well? The heap GC pattern seems normal and the overall size stable.

@l-cornelius-dol
Collaborator Author

I've refactored my app code to put the listener on the group element instead of every individual element, which has slowed the leak down by a factor of 100, now leaking about 1 MB/sec instead of 100 MB/sec.

The usage graph now looks like this:

image

Notice that the listeners still grow linearly during the period of actively switching from translation to translation. Notice also that the final explicit GC does not reclaim all the listeners, though it appears to reclaim all nodes and heap.

The initial spike in nodes, and then reduction, is from expanding and then collapsing all the translation items.

What happens during "activity" is that a new row of the list becomes editable, so an input field with a change listener is discarded from the previously editable item and added to the newly editable item.
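For clarity, the delegation looks roughly like this (a sketch; the attribute name and selectPair/renderPair are illustrative, not my actual code; el = domvm.defineElement as before):

function grpClick(e) {
  // walk up from the clicked element to the translation pair it belongs to
  var pairEl = e.target.closest("[data-xpt]");
  if (pairEl != null)
    selectPair(pairEl.getAttribute("data-xpt"));
}
grpClick._raw = true;                  // bare handler, per the earlier suggestion

// one listener on the group div instead of one per pair
el("div", {onclick: grpClick}, pairs.map(renderPair))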

@leeoniya
Member

it appears that the heap only measures js objects, and dom nodes are measured separately. when i watch the Task Manager in Chrome during the test, the tab as a whole does climb in memory but JavaScript Memory stays pretty flat.

@l-cornelius-dol
Collaborator Author

I seem to have replicated the fault with this fiddle. I hope it's of some help.

image

@leeoniya
Member

so far i have the pico build stripped down to just the essentials needed for the test and the node buildup still exists. until i find why this happens and fix it, more complex cases won't shed more light on the situation. i have high confidence that when this is fixed, the leaks will go away.

thanks a lot for the repro! it'll help validate or invalidate any "fixes".

@leeoniya
Member

leeoniya commented Aug 10, 2017

i've stripped down everything to the absolute bones needed to accommodate the contrived testcase. the entire lib + test is 245 lines [1] while still exhibiting the leak. now it's small and simple enough for additional eyeballs.

the next step will be to try replacing node.el and el._node with es5 getters/setters backed by Map or WeakMap objects and see if it helps. both of these constructs are too slow for actual use in the lib, but it might help track down exactly where the issue is.

[1] https://gist.github.com/leeoniya/7622890577f1bc36eee70da368f69dd7

@leeoniya
Member

different combos of the following had either a negative impact or no effect on both perf and mem (both Map and WeakMap flavors):

var mEls = new WeakMap();

Object.defineProperty(VNode.prototype, "el", {
  get: function() {
    return mEls.get(this);
  },
  set: function(el) {
    if (el == null)
      mEls.delete(this);
    else
      mEls.set(this, el);
  },
});

var mNodes = new WeakMap();

Object.defineProperty(Element.prototype, "_node", {
  get: function() {
    return mNodes.get(this);
  },
  set: function(node) {
    if (node == null)
      mNodes.delete(this);
    else
      mNodes.set(this, node);
  },
});

@l-cornelius-dol
Collaborator Author

l-cornelius-dol commented Aug 10, 2017

Here is another, slightly altered fiddle which moves the listeners to the container, eliminating the listener-per-list-element, and reducing the number of listeners to a static and relatively small number.

The results still show the node growth, but prove that it's not associated with discarded listeners (I think).

image

@l-cornelius-dol
Collaborator Author

Another potentially useful bit of information is that my other app, an 80x24 grid for terminal emulation, does not exhibit any problems reclaiming nodes when running on either 3.0.6 or 3.x-dev. So this is not a blanket problem with DOMVM; there's something unique about our test case.

The test run changes just under 1000 emulation display screens, where a large percentage of cells in the grid change on each transition. Yet the number of nodes stays flat until the end, where it jumps by 44, before an explicit GC reclaims most everything.

image

@leeoniya
Member

leeoniya commented Aug 10, 2017

the issue happens when dom nodes are created and then removed. the removed nodes (and their directly bound events) stick around without being garbage collected, despite being unreferenced.

there's no issue when dom nodes are reused & mutated in the live dom. you'll see the leak with eg keyed nodes that cannot be recycled and any dom nodes which have to be removed.
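in terms of the earlier repro, that's exactly the difference between the two cases:

// recycled in place: list length never changes, every <li> is mutated, no leak
vm.update(makeList(100));

// removed nodes: when the list shrinks, the surplus <li> elements are removed,
// and those removed nodes are what accumulate without being collected
vm.update(makeList(randInt(0, 100)));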

leeoniya added a commit that referenced this issue Aug 11, 2017
@leeoniya
Member

@lawrence-dol i think this issue is now fixed.

domvm-fixed

apparently delete el._node was needed rather than el._node = null. delete is likely a stronger signal for the GC to do its thing.
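in the element-removal path the change amounts to this:

// before (leaked): the expando stayed on the discarded element, just nulled
el._node = null;

// after (fixed): remove the expando property entirely
delete el._node;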

can you please verify?

@l-cornelius-dol
Collaborator Author

l-cornelius-dol commented Aug 12, 2017

Hmmm. I am still seeing slow memory growth in Firefox and Chrome, with Chrome's graph still showing nodes steadily climbing. My fiddle with listeners only on the group is using https://cdn.rawgit.com/leeoniya/domvm/3.x-dev/dist/micro/domvm.micro.js.

image

And I see the same pattern in my app, having updated my DOMVM branch to get your latest commit.

@l-cornelius-dol
Collaborator Author

One thing I notice now in my app is that growth in nodes exactly tracks growth in listeners. Perhaps that suggests a different, separate bug. In my fiddle, the number of listeners stays flat (because I don't have a change listener on the input field).

image

@l-cornelius-dol
Collaborator Author

l-cornelius-dol commented Aug 12, 2017

OK, after adding a listener to my fiddle, it now shows listeners and nodes tracking together as well. Which, I think, confirms only that elements are leaking, causing any listeners associated with them to leak as well.

image

@leeoniya
Member

leeoniya commented Aug 12, 2017

i'm seeing very different behavior between running the code in jsfiddle and the same code just in a test.html file. can you try running the test code outside of jsfiddle and see if you get the following:

nofiddle

the listeners do seem to be climbing slowly at a rate of about +1/toggle. As they leak, they appear to be leaking some nodes with them (or vice-versa). i'll look into this.

@leeoniya
Member

nevermind. incognito mode shows a profile consistent with yours. non-incognito gives me different profiles almost every other run.

i'm gonna close this thread since it's getting long. i'll open another as a followup.
