During code generation, the supplied schema is treated as untrusted input (but in JSON form).
While DoS safety of an untrusted schema can't be guaranteed even with complexity checks, both the process of code generation and the generated validator/parser code are supposed to be safe against arbitrary code execution, even when both the schema and the data being validated were received as untrusted JSON input.
Note that supplying untrusted schemas is not recommended.
Even though they should not cause arbitrary code execution, they can cause DoS. Also, security issues can arise at any point along the path, and the best way to avoid the problem is to keep untrusted input out of the code generation process, if possible.
To protect code generation against code injection from schemas, the following approach is taken:
- All code generated from the schema goes through the `format(template, ...args)` function, which embeds the supplied arguments into a template string. It typechecks the supplied arguments, and everything that is not code must be wrapped in JSON.
- The first argument of the `format()` function (the template) is trusted and must always be a literal string or come from a wrapper where it is a literal string. That way, it can't come from a runtime variable and can't be affected by untrusted input directly. Custom ESLint rules additionally check this in the code so that it won't be accidentally broken.
- Trusted code can be produced only in a limited number of ways:
  - Output emitted by `format()` is treated as code.
  - Safe generated variable ids are also treated as code.
  - Logical and (`&&`) and or (`||`) operations on a variable-length list of code arguments are also treated as code. These are used in complex block conditions.
  - Certain explicitly listed constant values can be inserted into the `format()` template, but only at designated places.

  Everything else is rejected by `format()` if it is passed for concatenation into the template without proper escaping.
- No string transformations are performed on the generated code, as any non-context-aware transformations of generated code are unsafe. See below for more information.

  As AST transformation would be too complex here, all optimizations (eliminating empty blocks, for example) have to come before the code is prepared. The approach used here wraps block body generators so that when generating the body of a block does not emit any code, the whole block is excluded.

Source code for that can be found in safe-format.js.
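The rules in the list above can be illustrated with a minimal sketch. This is not the actual safe-format.js code: the `SafeCode` class, the `%s`/`%j` placeholder semantics, `jsonToCode` and `safeAnd` are all hypothetical names chosen for illustration, and the "template must be a literal" rule is not enforceable at runtime (the document relies on ESLint rules for that).

```javascript
// Hypothetical sketch, not the actual safe-format.js implementation.
// Marker class for strings that are known to be trusted code.
class SafeCode {
  constructor(text) {
    this.text = text
  }
  toString() {
    return this.text
  }
}

// JSON-encode a value so it can be embedded into JS source as a literal.
const jsonToCode = (value) => {
  const json = JSON.stringify(value)
  if (typeof json !== 'string') throw new TypeError('value is not JSON-serializable')
  return json
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029')
    .replace(/([{,])"__proto__":/g, '$1["__proto__"]:')
}

// Assumed placeholders: %s inserts trusted code only, %j inserts any
// JSON-encoded value, %% inserts a literal percent sign.
function format(template, ...args) {
  if (typeof template !== 'string') throw new TypeError('template must be a string')
  let index = 0
  const text = template.replace(/%[%sj]/g, (match) => {
    if (match === '%%') return '%'
    if (index >= args.length) throw new Error('not enough arguments')
    const arg = args[index++]
    if (match === '%s') {
      if (!(arg instanceof SafeCode)) throw new TypeError('%s accepts only trusted code')
      return arg.text
    }
    return jsonToCode(arg)
  })
  if (index !== args.length) throw new Error('too many arguments')
  return new SafeCode(text)
}

// Joining trusted code fragments with && keeps the result trusted.
// Returning 'true' for an empty list is an assumption of this sketch.
const safeAnd = (...parts) => {
  for (const part of parts) {
    if (!(part instanceof SafeCode)) throw new TypeError('expected trusted code')
  }
  if (parts.length === 0) return new SafeCode('true')
  return new SafeCode(parts.map((part) => `(${part.text})`).join(' && '))
}

// Usage: dataVar stands in for a safe generated variable id.
const dataVar = new SafeCode('data')
const check = format('typeof %s === %j', dataVar, 'string')
console.log(String(check)) // typeof data === "string"
```

In this sketch a plain string passed to `%s` throws, so untrusted text can reach the output only through `%j`, where it is JSON-encoded first.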
Separate logic is used for function stringification (e.g. formats), but those do not come from the schema and are assumed to be trusted input (and are typechecked to be functions).
Templates for `format()` should be treated with care.
E.g. `format("'%j'", arg)` (where `%j` is a JSON-escaped variable) is an arbitrary code execution vulnerability if `arg` is untrusted, as the contents of `arg` can close the single quotes and escape from the string.
Variables should be inserted only in those contexts where inserting any JSON-encoded value is safe against code execution.
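The quoting problem can be reproduced without `format()` at all. The sketch below uses plain string building and `new Function` purely for demonstration; `pwned` and the attacker-controlled value are hypothetical.

```javascript
// Hypothetical attacker-controlled value: JSON.stringify does not escape
// single quotes, so they survive into the generated code unchanged.
const arg = "' + globalThis.pwned() + '"
const encoded = JSON.stringify(arg)

// Unsafe context: the JSON text is placed inside single quotes.
const unsafeCode = `return '${encoded}'`
// Safe context: the JSON text is itself a complete JS string literal.
const safeCode = `return ${encoded}`

let calls = 0
globalThis.pwned = () => {
  calls++
  return 'x'
}

const unsafeResult = new Function(unsafeCode)() // runs pwned()
const callsAfterUnsafe = calls
const safeResult = new Function(safeCode)() // only returns the string
const callsAfterSafe = calls

console.log({ unsafeResult, callsAfterUnsafe, safeResult, callsAfterSafe })
```

In the unsafe variant the injected expression is executed while the code runs; in the safe variant the same value comes back as an inert string.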
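The characters `\u2028` and `\u2029` should be escaped when embedding JSON-wrapped objects into code.
JSON allows them unescaped inside strings, but before ES2019 JS treated them as line terminators that could not appear in a string literal. The behavior changed only in ES2019, so it's best not to rely on it.
See https://v8.dev/features/subsume-json#security for more details.
The `format()` function handles that.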
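A minimal sketch of such escaping is shown below. The helper name `escapeLineSeparators` is hypothetical and this is not necessarily how `format()` implements it.

```javascript
// Replace raw U+2028 / U+2029 with their escape sequences so that the JSON
// text is a valid JS literal on pre-ES2019 engines as well.
const escapeLineSeparators = (json) =>
  json.replace(/\u2028/g, '\\u2028').replace(/\u2029/g, '\\u2029')

const value = { text: 'a\u2028b\u2029c' }
const raw = JSON.stringify(value) // still contains the raw characters
const code = escapeLineSeparators(raw)

// Evaluating the escaped text as JS yields the original value.
const roundTripped = new Function(`return ${code}`)()
console.log(code)
```

The escape sequences decode back to the same characters, so the embedded value is unchanged.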
Even with the `\u2028` and `\u2029` difference resolved in newer ECMAScript specification versions and by post-processing, there is one more parsing difference between JSON and JS contexts which has to be accounted for before including JSON-stringified variables into a JS context.
`{"__proto__": ...` parses differently because JS has special handling for it, which JSON ignores:
- ECMA 262, 24.5.1 `JSON.parse ( text [ , reviver ] )`
- ECMA 262, B.3.1 `__proto__` Property Names in Object Initializers
To account for that, all occurrences of `{"__proto__":` should be replaced with `{["__proto__"]:`, and all occurrences of `,"__proto__":` with `,["__proto__"]:`, after each `JSON.stringify` call.
The replacement above works given that `JSON.stringify` is used without the `space` formatting option. The full regex pattern for properties that need replacement is `/[^\\]"__proto__":/g`.
The `format()` function handles that.
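The difference and the replacement can be observed with a short sketch. The helper `protoSafeStringify` is a hypothetical name, and its regex is a capture-group variant of the two literal replacements described above rather than the exact pattern quoted from the source.

```javascript
// After JSON.stringify (without the space option), rewrite "__proto__" keys
// into computed keys, which JS treats as ordinary own properties.
const protoSafeStringify = (value) =>
  JSON.stringify(value).replace(/([{,])"__proto__":/g, '$1["__proto__"]:')

// JSON.parse creates an own property named __proto__.
const parsed = JSON.parse('{"__proto__":{"polluted":true},"list":["__proto__"]}')
const naiveCode = JSON.stringify(parsed)
const fixedCode = protoSafeStringify(parsed)

const evaluate = (code) => new Function(`return (${code})`)()
const hasOwnProto = (obj) => Object.prototype.hasOwnProperty.call(obj, '__proto__')

const naiveObject = evaluate(naiveCode) // prototype is replaced
const fixedObject = evaluate(fixedCode) // own property, as in JSON.parse

console.log(fixedCode)
console.log(hasOwnProto(naiveObject), naiveObject.polluted)
console.log(hasOwnProto(fixedObject), fixedObject.polluted)
```

Only keys are rewritten: the `"__proto__"` string inside the array is not followed by a colon, so it is left as is.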
Using the `/regexp/` form, produced by converting a `RegExp` object to a string, is not safe.
Consider this on Node.js 10 and below: `console.log(String(new RegExp("\n")))`.
That behaviour differs between platforms and versions.
If the pattern is arbitrary input from the schema and the `RegExp` object is stringified by converting it to a string (e.g. `` `${regexp}` ``, `regexp + ''`, `regexp.toString()` or `String(regexp)`), it can break the generated code in certain cases.
Instead, convert those to code in the form of `new RegExp(pattern, flags)` (where `pattern` and `flags` should be properly escaped, as with other string variables).
Stringification of `RegExp` objects is supported in the `format()` function.
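A minimal sketch of that conversion is shown below. The helper names `stringToCode` and `regexpToCode` are hypothetical; this illustrates the `new RegExp(pattern, flags)` form and is not taken from the `format()` implementation.

```javascript
// Encode a string as a JS string literal (JSON plus line separator escaping).
const stringToCode = (str) =>
  JSON.stringify(str).replace(/\u2028/g, '\\u2028').replace(/\u2029/g, '\\u2029')

// Emit constructor-call code instead of relying on RegExp#toString().
const regexpToCode = (regexp) => {
  if (!(regexp instanceof RegExp)) throw new TypeError('expected a RegExp')
  return `new RegExp(${stringToCode(regexp.source)}, ${stringToCode(regexp.flags)})`
}

// A pattern containing a slash and a newline, as it could arrive from a schema.
const original = new RegExp('^a/b\n$', 'i')
const regexpCode = regexpToCode(original)
const rebuilt = new Function(`return ${regexpCode}`)()

console.log(regexpCode)
console.log(rebuilt.test('A/b\n'))
```

Because both arguments are emitted as escaped string literals, a raw newline or slash in the pattern cannot terminate the literal early, whatever the engine returns from `source`.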
Non-context-aware string transformations of generated code are unsafe. That includes any kind of replacement performed on the code after its generation.
Even a seemingly harmless replacement of `else {}` with an empty string can be abused in some situations, because that string could occur in a non-code context, e.g. inside an object property name.
That also includes altering newlines in the code.
The only safe way to transform generated code is by parsing it to AST, transforming the AST and re-generating the code from AST.
Trusted code concatenation is fine without being context aware, though.
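The `else {}` case can be reproduced with a short sketch. The property name and the shape of the generated code are hypothetical and are built here with plain string concatenation only for demonstration.

```javascript
// Hypothetical property name supplied by a schema.
const propertyName = 'else {}'

// Generated code that checks for the property and has an empty else branch.
const generated = `if (${JSON.stringify(propertyName)} in data) { found = true } else {}`

// Naive "optimization": strip empty else blocks with a string replacement.
const optimized = generated.replace(/ ?else \{\}/g, '')

const run = (code) =>
  new Function('data', `let found = false; ${code}; return found`)

const data = { 'else {}': 1 }
console.log(generated) // if ("else {}" in data) { found = true } else {}
console.log(optimized) // if ("" in data) { found = true }
console.log(run(generated)(data), run(optimized)(data))
```

The replacement also rewrites the string literal, so the optimized code silently checks a different property than the one the schema asked for.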