Lazy validation of functions #1464
Comments
That said, I know that some of us were actually hoping that we could remove this eventually, since it is gross, and was mainly introduced to address a single engine that now is no longer maintained. I would be rather sad if V8 were to start picking it up. I understand the advantages of lazy compilation, but does it need lazy validation? |
+1 to @rossberg's point |
Wouldn't this actually be useful for large libraries? It's not hard to imagine how this would lead to performance improvements. |
I would be concerned that if lazy validation is started to be widely used in engines, we may eventually find all engines are required to implement it to remain web-compatible. A concrete issue would be if a widely used module has a function with a validation error in it, but the function is never called in practice. On an engine with lazy validation, the module would run fine. On an engine with eager validation, the module would fail to compile and the web page would be broken. In order to resolve this, the second engine would need to implement lazy validation even if it doesn't make sense in their compilation model. |
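The compatibility concern above can be made concrete with a toy model. This is only a sketch: `Function`, `run`, `CompileError`, and `Trap` are hypothetical names invented for illustration, and the `valid` flag stands in for the result of real Wasm validation. It does not reflect how any engine is implemented.

```python
from dataclasses import dataclass


class CompileError(Exception):
    """Raised when a module is rejected at compile time."""


class Trap(Exception):
    """Raised when execution reaches an invalid function."""


@dataclass(frozen=True)
class Function:
    name: str
    valid: bool  # stand-in for the outcome of real Wasm validation


def run(functions, called, lazy_validation):
    """Compile a toy module, then call the functions named in `called`."""
    by_name = {f.name: f for f in functions}
    if not lazy_validation:
        # Eager: every function body is validated before anything runs.
        for f in functions:
            if not f.valid:
                raise CompileError(f"invalid function: {f.name}")
    for name in called:
        # Lazy: a function is only validated when it is actually called.
        if lazy_validation and not by_name[name].valid:
            raise Trap(f"invalid function called: {name}")
    return "ok"


# A module with one invalid function that is never called in practice.
module = [Function("main", True), Function("rarely_used", False)]
```

With this model, `run(module, ["main"], lazy_validation=True)` returns `"ok"`, while `run(module, ["main"], lazy_validation=False)` raises `CompileError` for the same module and the same calls. That is the divergence between engines described in the comment above.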
There are other concerns too about validation of the wasm if cached. Perhaps this isn't relevant, but a strict validation of the wasm code is going to require more than a hash validation. Are these two different validations? They could be, but probably shouldn't be. |
There is only one Wasm validation, but the question is whether or not it is OK to delay validating a function if it is not currently getting called. On a side note, the engine that introduced this is, strictly speaking, maintained, but it does not meet our requirements for a "web runtime" as it is not part of a browser.
This can be a way to support standard instructions and instructions from a proposal in a single module, though the check "is the proposal supported" would have to be externalized. However, you are right that it would create a situation when this module is both broken and correct at the same time. @gahaas do you have any data that you can share about this? |
@penzn here is some data I measured: I measured validation times for a wasm module of about 40MB with a bit more than 100,000 functions, on a 4-core machine (I configured my workstation to use only 4 cores). When validation is done in a separate step, it takes about 125ms. When done as part of compilation, it adds only 22ms of overhead to compilation without validation (with validation: 758ms, without validation: 736ms). With lazy compilation, validation has to be done separately, so validation causes around 100ms of overhead. Note that the actual overhead is higher, especially during startup, because most functions don't get executed during startup and would therefore not have to be validated. For the module I measured, only about 20% of the code gets executed during startup. On weaker devices we see validation times of more than 1 second for the same module. We see in big applications like Photoshop that the CPU is a bottleneck, especially during startup. Therefore we try to reduce the use of the CPU for code compilation, validation, and optimization. The advantage of lazy compilation and validation is not just that compilation and validation are postponed; most of the compilation does not even happen, because many functions never get executed in the first place. |
I may be misunderstanding, what does this number mean? How are you compiling without validating?
So am I reading this right that in lazy compilation mode for V8 on this module, skipping validation for functions that are not called during startup reduces startup time by 100ms on your machine? |
For the experiment I compiled my own version of V8 where no function validation is happening. I wanted to measure how much overhead is introduced when validation is happening during compilation. So I compiled two versions of V8, one where validation is happening during compilation, and one where validation was not done at all. The performance difference between the two configurations was 22ms.
I guess I mixed up some thoughts in this paragraph. As I wrote above, if validation is done as part of compilation, it only takes 22ms in my benchmark. If validation is done in a separate step, then validation takes 125ms. With eager validation and lazy compilation, validation has to happen in a separate step, so lazy compilation has to spend 125ms on validation during module initialization. Later, during module execution, when functions get compiled lazily, function compilation can be 3% faster because no validation is needed anymore. However, even if all functions of a module were executed, this 3% speedup would only result in savings of 22ms. So altogether, eager validation adds a performance penalty of 125ms to lazy compilation while only providing the potential to save 22ms later. |
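The arithmetic behind these figures can be checked in a few lines. The constants below are the numbers quoted in the thread for one module on one machine; the variable names are made up for this sketch.

```python
# Figures quoted in the thread (milliseconds, one ~40MB module, 4 cores).
COMPILE_WITH_VALIDATION_MS = 758
COMPILE_WITHOUT_VALIDATION_MS = 736
SEPARATE_VALIDATION_MS = 125

# Cost of validation when it is fused into compilation.
fused_overhead_ms = COMPILE_WITH_VALIDATION_MS - COMPILE_WITHOUT_VALIDATION_MS

# The same cost expressed relative to compilation without validation.
fused_overhead_pct = 100 * fused_overhead_ms / COMPILE_WITHOUT_VALIDATION_MS

# Extra time spent when validation runs as its own eager pass instead.
extra_cost_ms = SEPARATE_VALIDATION_MS - fused_overhead_ms

print(f"validation fused into compilation: {fused_overhead_ms} ms")
print(f"relative to compilation: {fused_overhead_pct:.1f}%")
print(f"extra cost of a separate eager pass: {extra_cost_ms} ms")
```

This gives 22 ms, about 3.0%, and 103 ms, which matches the "3%" and "around 100ms" quoted in the comments.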
@rossberg @lukewagner Can you say more than "it's gross" about your concerns with lazy validation? If it was acceptable at Wasm's inception it's not clear to me why it would be less acceptable today. |
Lazy validation was a compromise to get 4 browsers to agree to ship a 1.0 release; Chakra sort of forced the issue. It seemed fine then because, if only 1 of 4 engines was doing the lazy thing, we wouldn't end up in the situation that @eqrion describes above. If we did end up in that state, we'd probably need to specify deterministically-lazy validation semantics. Implementation-wise, I don't think this would be much of a problem (since an AOT compiler can just compile an invalid function body to |
For all practical purposes, allowing lazy validation means multiple runtime behaviours: programs that succeed in one place may fail in another. It's essentially non-deterministic. As @lukewagner says, we could require lazy validation, but that again seems undesirable for many engines and use cases. @gahaas, if you say it costs 100ms, how much is that relative to the overall initialisation time? Also, has the implementation of validation been optimised for that purpose? |
@rossberg With lazy compilation, validation accounts for more than 60% of the time spent by I read again through #719, and I think the assumptions made there to reject lazy validation turned out to be wrong:
On the contrary, Chakra's arguments for lazy validation turned out to be correct. Fast startup turned out to be more important than reaching peak performance fast. Also, most functions don't get executed during startup, or even at all, so validating and compiling them turned out to be a waste of resources. That's why V8 is now switching to lazy compilation. |
It does feel a bit unfortunate to back off what was initially one of the original wins of wasm in the browser, which was this smooth startup based on streaming parallel AOT compilation. I fully believe that for many/most of the practical use cases you're looking at, lazy compilation is a net win. But it's sad that certain workloads would permanently lose the ability to AOT in the cases where it would've been beneficial. I wonder if specifying deterministic lazy validation could complement another old idea that we used to discuss about optimizing load time: allowing producer toolchains to indicate which functions to compile eagerly vs. lazily. As a custom section of hints, this quickly opens up Pandora's box, and so we haven't. But if we're talking about this semantically-visible lazy validation, then if lazy validation was specifiable per-function, perhaps that could maintain the good parts of the predictable cost model, by allowing engines to meaningfully say when they really do want eager treatment. I'm not sure this is a good idea, but it does seem related to the basic eager-vs-lazy discussion, so I wanted to see if it seemed beneficial to the folks actually measuring this now. |
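As a rough illustration of the per-function idea floated above, the sketch below validates producer-marked functions at compile time and everything else on first call. It is purely hypothetical: no such hint mechanism is specified anywhere in this thread, and `HintedModule`, `is_valid`, and the `"bad"` marker are invented for the example.

```python
class CompileError(Exception):
    """Module rejected up front."""


class Trap(Exception):
    """Invalid function reached at run time."""


def is_valid(body):
    # Placeholder for real validation; the string "bad" marks an invalid body.
    return body != "bad"


class HintedModule:
    def __init__(self, bodies, eager):
        self.bodies = bodies   # function name -> toy function body
        self.checked = set()   # names that have already passed validation
        # Functions the producer marked as eager are validated at compile time.
        for name in sorted(eager):
            if not is_valid(bodies[name]):
                raise CompileError(name)
            self.checked.add(name)

    def call(self, name):
        # Every other function is validated on its first call, then remembered.
        if name not in self.checked:
            if not is_valid(self.bodies[name]):
                raise Trap(name)
            self.checked.add(name)
        return f"ran {name}"
```

For example, `HintedModule({"main": "ok", "tool": "bad"}, eager={"main"})` compiles, and only `call("tool")` raises `Trap`; marking `"tool"` as eager as well makes construction raise `CompileError` instead. Under this assumption the outcome depends only on the module and its hints, not on the engine.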
@camio, I remembered we had this discussion after the talk you gave at the CG meeting. Large applications are one of the cases where this can make a difference. For a "palette" type of GUI app, we don't really expect every tool in it to be used every single time the user opens the app, yet every instruction in every tool has to pass Wasm validation, even if it is not going to be called. @gahaas has some synthetic benchmark data above; you can probably compare that to function and instruction count estimates for the apps you are dealing with. Another place where that might have an impact is code that distinguishes between x86 and Arm: every function optimized via this technique would have two versions, one of which is bound to never run, but both have to be validated. For a large kernel library this might be measurable, though I don't think we have data on that. |
Already in 2016 an agreement was made (see #719 (comment)) to allow lazy validation of functions. This means that if a WebAssembly module contains an invalid function, module compilation would still be allowed to succeed, but executing the invalid function would trap.
My question now is, where is lazy validation of functions mentioned in the spec? I cannot find it anywhere.
If lazy validation was left out of the spec back then by accident, should we add it now? In V8 we did performance measurements with lazy validation, and, especially in combination with lazy compilation, lazy validation shows interesting performance gains.