Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single-use callback functions (filters, for example) can leak memory in a Membrane #141

Closed
ajvincent opened this Issue Dec 7, 2017 · 4 comments

Comments

Projects
None yet
2 participants
@ajvincent
Copy link
Owner

ajvincent commented Dec 7, 2017

Example:
https://developer.mozilla.org/en-US/docs/Web/API/document/createTreeWalker

let walker = document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT, function(element) {
  // this is the dangerous part:  every node which is an element gets a proxy created...
  let classList = element.className.split(/\s+/g);
  return classList.includes("foo") ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_SKIP;
  // ... which the filter may store a reference to the element outside this function.
}, true);

while (walker.nextNode()) {
  // do something
}

Here, the intent of the API is that the filter is a "look once if you have to" pure function. The problem is, it creates a proxy object for each element that goes into the filter, which lives long beyond the filter invocation. In other words, the proxy effectively is a memory leak, until the element is no longer attached to the DOM - and then it's up to the garbage collector to find that and realize that.

There are no light-weight solutions I can see for this. A few solutions I have considered are:

  • Invoke the filter in a disposable object graph, which is revoked when the user finishes. First, this potentially violates the identity property of the membrane. Second, nothing prohibits the filter function from storing a reference to the proxy elsewhere. Third, if the user never explicitly releases the filter function (or the walker it's associated with, in this example), then the memory leak at least doubles: two object graphs instead of one, where the second may never get deleted.

  • Require the filter be wholly constructed on the same object graph as the walker. In other words, don't allow the end-user to pass in an untrusted callback function. This would work, but the code gets ugly fast. The user basically has to implement the filter in a foreign language. In pseudocode, it might look like this:

const filter = originGraphObject.createObjectFilter();
{
  let element = filter.addArgument(); // returns proxy
  let className = element.className; // a lambda getter
  let classList = className.split(/\s+/g); // another lambda getter
  let ifCondition = filter.addCondition(classList.includes("foo"));
  filter.addReturn(ifCondition.true, NodeFilter.FILTER_ACCEPT);
  filter.addReturn(true, NodeFilter.FILTER_SKIP);
}

let walker = document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT, filter, true);
while (walker.nextNode()) {
  // do something
}
  • Verify the filter is a pure function. This would involve something like Esprima. If the filter is a pure function, then we can safely say the arguments need no wrapping.

Verifying purity might be really hard! In the example that started this issue, we refer to NodeFilter.FILTER_ACCEPT and NodeFilter.FILTER_SKIP. These are just numbers (1 and 3, respectively), but the pureness validation code must be aware of that.

The TreeWalker API does allow passing in an object with an acceptNode method, but it's a little unclear whether referencing "this" in a method violates purity in JavaScript. That becomes a problem because the filter object then could be referenced outside the TreeWalker... and that makes it unsafe for conversion.

  • A combination of the previous two techniques:
const filter = document.createNodeFilter();
filter.acceptNode = function(element) {
  if (element.parentNode == filter.lastParent)
    return NodeFilter.FILTER_REJECT;
  filter.lastParent = element;
  return NodeFilter.FILTER_ACCEPT;
};

let walker = document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT, filter, true);
while (walker.nextNode()) {
  // do something
}

Technically, the acceptNode method of the filter isn't "pure", because it refers to the filter object... but the filter comes from the same object graph as the element, so it should be safe. (See #65 for a similar idea.) Specifically, the filter here is a proxy to an unwrapped filter object, whose API the origin graph can safely define. Even if the origin graph provides a vanilla object ({}), the membrane takes care of all object wrapping and unwrapping, so again, it's safe.

If the user sets an unsafe property on the filter object, and the filter's acceptNode method relies on that property, then we're incurring the creation of additional proxies, coming back to the same problem. If the user sets a callback function that the acceptNode method calls, and the callback isn't pure, that brings back the original problem: forcing the membrane to create proxies that might never be used again.

@ajvincent ajvincent added the advisory label Dec 7, 2017

@erights

This comment has been minimized.

Copy link

erights commented Dec 11, 2017

I don't understand the starting problem and scenario. Can you sketch a diagram showing the reference graph? Tom's graphical notation would be good here. Thanks.

@ajvincent

This comment has been minimized.

Copy link
Owner Author

ajvincent commented Mar 17, 2018

My apologies - I've been too busy to do that diagram. Let me try again, in very short English:

The third argument to createTreeWalker is a callback function. Every time a DOM node meets the other criteria of the tree walker, a proxy for that node must be passed to the callback. Often that means the proxy must be created and maintained within memory until the DOM is dismissed. If the proxy is never referenced again by any code, that's effectively a memory leak.

The proxy can't be GC'd because:

  1. Other nodes in the DOM tree hold a reference to the underlying node
  2. The node holds a reference (by being a key in a WeakMap) to a ProxyMapping object
  3. The ProxyMapping object holds a reference (by object graph name) to the proxy.

Now, if the callback function could be verified as safe to run directly (that is, not wrapped) on the same object graph as the original DOM tree, then we wouldn't need to create all these proxies. Proxies, at least in my membrane's model, are created mostly as needed, so the less proxies created, the better. That's what this is all about.

@ajvincent

This comment has been minimized.

Copy link
Owner Author

ajvincent commented Mar 17, 2018

I may be completely wrong about this logic, which is based on the following:

var a = new WeakMap();
var b = {};
var c = {};
a.set(b, c);
c = null;
// assume gc
a.get(b) // returns an object 

Why a.get(b) returns an object - and this is my assumption - is because:

  • there is a strong reference to the WeakMap (a)
  • there is a strong reference to the map's key (b)

If one of these two conditions for some reason becomes false, I believe the object which was earlier returned may be garbage-collected, if no other strong references to that object exist.

@ajvincent

This comment has been minimized.

Copy link
Owner Author

ajvincent commented Mar 25, 2018

I've thought about this some more, and I've decided: this is a "first world problem", not a genuine one. If we create additional singleton proxies as a result of walking a tree, so be it.

@ajvincent ajvincent closed this Mar 25, 2018

@ajvincent ajvincent removed the advisory label Mar 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.