Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to not retain validator source code #293

Merged
merged 2 commits into from Sep 6, 2016

Conversation

rf
Copy link
Contributor

@rf rf commented Aug 31, 2016

What issue does this pull request resolve?
Programs with many schemas eat a lot of memory in the form of validator source strings.

What changes did you make?
Added an option to retain the sourceCode property and changed the eval to a new Function which is indirect and therefore does not retain references to the current context. This required explicitly passing in all variables that the compiled functions depend on. The option is called retainSourceCode and it defaults to true.

This has the added benefit of explicitly showing what variables are being used by the validator functions.

@MadanThangavelu @Matt-Esch

I noticed that in a program with many schemas, a large amount of memory
was taken up by the source code strings used to compile the validators.
This commit adds an option, `retainSourceCode`, default true, which
allows source code strings to be left out of the heap.
@epoberezkin
Copy link
Member

epoberezkin commented Aug 31, 2016

Thanks for the PR!
This essentially contains two [independent] changes:

  1. optionally not retaining the source code (I'd like to see that it reduces the memory usage - you can get the source code of the function only by calling .toString() method, or this property adds additional overhead? If that's the case this option can become the default in 5.0.0).
  2. different way of creating the closure (rather than scoped to the context, it is scoped to the wrapping function)

Not sure I understand why the 2nd is needed... All the shared objects that the closures are using will be shared anyway, all the closure-specific objects will be different anyway...

The inventory of these variables is useful in any case, as I keep thinking about some "easy" way to solve #156

The build fails because of ESLint

@@ -100,7 +101,7 @@ function compile(schema, root, localRefs, baseId) {

sourceCode = vars(refVal, refValCode) + vars(patterns, patternCode)
+ vars(defaults, defaultCode) + vars(customRules, customRuleCode)
+ sourceCode;
+ sourceCode + 'return validate;';

if (opts.beautify) {
/* istanbul ignore else */
Copy link
Member

@epoberezkin epoberezkin Aug 31, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this still affect .toString()code?

@epoberezkin
Copy link
Member

epoberezkin commented Aug 31, 2016

Also I think in case the second part is needed (new Function instead of eval) this line should be var validate = ....

@rf
Copy link
Contributor Author

rf commented Aug 31, 2016

With a direct call to eval, the validate function you get back retains a reference to the sourceCode in its closure. That looks like this:

screen shot 2016-08-31 at 4 44 22 pm

The blue edges are "context" references, which is what happens when a variable is closed over. Once you remove the explicit property, it looks like this:

screen shot 2016-08-31 at 4 46 40 pm

Only blue context edges left. Once you remove those, by using an indirect eval:

screen shot 2016-08-31 at 4 47 15 pm

The string is no longer retained.

These are heap dumps obtained with the heapdump module and viewed in Chrome's inspector, by the way.

@rf
Copy link
Contributor Author

rf commented Aug 31, 2016

The validate variable is declared because I passed it into the new Function() constructor on line 131 which makes it an argument to the function being created.

@rf
Copy link
Contributor Author

rf commented Sep 1, 2016

So it turns out that the strings will still be retained by the code object to support .toString() on the function.

However, it may make sense to keep the new Function() change as it may make it easier to minify the generated javascript, which could potentially reduce the memory usage of the source strings. I'm going to investigate that tomorrow.

@epoberezkin
Copy link
Member

epoberezkin commented Sep 1, 2016

The validate variable is declared because I passed it into the new Function() constructor on line 131 which makes it an argument to the function being created.

I see. It's a bit unusual to pass an undefined value simply to declare the local variable.

@epoberezkin
Copy link
Member

epoberezkin commented Sep 1, 2016

As far as I understand it, only the variables used inside closure are retained on the heap. So from the memory utilisation point of view it shouldn't make any difference whether you're using eval or function constructor...

Am I wrong?

It may or may not have difference on the time it takes to create this closure and also to execute it. But 1) closure creation time is not very important, 2) the difference can be negative - we will need to benchmark it to see if there is a positive difference.

@epoberezkin
Copy link
Member

epoberezkin commented Sep 1, 2016

So it turns out that the strings will still be retained by the code object to support .toString() on the function.

It's a different string though, so probably keeping it in the property as well doubles the size

Re eval vs new Function I am not sure it has any difference.

@rf
Copy link
Contributor Author

rf commented Sep 1, 2016

It is unusual to define the variable this way, but I wanted to make the change in the least intrusive way possible.

So in fact, there's one context created for the outer function compile and any functions inside it will refer to this context. So even though the generated validate function doesn't use the sourceCode variable, it will still retain it. This is precisely what Vyacheslav Egorov describes in this blog post.

I'm working on my own heap dump analysis tool; before my changes here, we see this:

 validate =  (functi(28735) type:string size:2176 deepsize:-1 ownership:
   retained as source type:internal in (code,28733)
   retained as sourceCode type:context variable in (closure,32415)
   retained as validateCode type:context variable in (closure,32415)
   retained as sourceCode type:property in (closure,32415)
   retained as (Compilation cache) type:hidden in system / Context(hidden,32517)
   retained as (Builtins) type:hidden in system / Context(hidden,32517)
   retained as (Extensions) type:hidden in (object properties)(array,32567)

There's a source edge, of type internal, from a code object. There's also a property edge from a closure, as well as context edges from that closure. So the string retained for toString() functionality is the same one that is used by eval() to compile the code.

It is most likely desirable to not create this closure at all. This indirect eval approach will likely save some amount of memory, though unfortunately it won't cause those strings to not be retained at all. After this change, I found this in the heap:

(function(refVal,co,(29693) type:string size:2304 deepsize:-1 ownership:
   retained as source type:internal in (code,29691)

This is the new Function() generated code; and there's still an internal edge from a code object. But at least no closure was created.

For information on new Function() vs eval() and why this matters, see this blog post. Essentially, any "indirect" call to eval() will cause the code to be executed in the global scope instead of in the current scope. This means you can't access any local variables (which is why I had to pass them in) and it also won't close over any variables in the current scope.

I think it's beneficial to keep this change; it will create one less closure, and it also makes it easier to potentially minify the generated JS to save memory or create portable validation functions as was requested in #156.

@epoberezkin
Copy link
Member

epoberezkin commented Sep 2, 2016

Thank you, it's very interesting.
I am going to check that capturing the whole context with eval usage is still the case in node and modern browsers (Egorov's post is 4yo). That's a bit surprising - I thought v8 would be a bit more clever and only capture what's used :)

@epoberezkin
Copy link
Member

epoberezkin commented Sep 2, 2016

It does indeed capture everything with eval :)

Could you please:

Then it's good to go.
Thank you again!

@epoberezkin
Copy link
Member

epoberezkin commented Sep 6, 2016

going to merge/complete...

@epoberezkin epoberezkin merged commit 792590b into ajv-validator:master Sep 6, 2016
epoberezkin added a commit that referenced this pull request Sep 6, 2016
epoberezkin added a commit that referenced this pull request Sep 6, 2016
epoberezkin added a commit that referenced this pull request Sep 6, 2016
@rf
Copy link
Contributor Author

rf commented Sep 6, 2016

Oh hey, I actually just finished addressing your comments, haha. But if you've done it already that's fine.

@epoberezkin
Copy link
Member

epoberezkin commented Sep 6, 2016

in 4.7.0 :)

@epoberezkin
Copy link
Member

epoberezkin commented Sep 6, 2016

Thank you!

@epoberezkin
Copy link
Member

epoberezkin commented Sep 18, 2016

@rf
Copy link
Contributor Author

rf commented Sep 18, 2016

Nice article! Thanks for the shout out! 💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants