Alternatives to i31ref wrt compiling parametric polymorphism on uniformly-represented values (OCaml) #100
Comments
Although you've ruled it out, I would actually suggest (4) for OCaml, albeit with some optimization. First, the optimization I have in mind is to have OCaml function definitions operating on There are a few reasons I suggest this:
So in short, I suspect the short-term costs of this approach are more acceptable than you might expect (with some local/composable optimization techniques), and I suspect it is more likely to adapt well to integrate the improvements that will be made to WebAssembly over time. There's also a meta-point to consider: having WebAssembly modules compiled from OCaml in this manner will give the CG realistic programs to experiment with and analyze, which can be used to determine if it's worth the effort (at all levels of the process) to give modules more control over their low-level representations. Right now we can only hypothesize. That all said, you know your specific needs much better than I, and even if this approach might work well for the OCaml community broadly, it still might not be well-suited to more specialized concerns you might have. |
Fair enough, (4) is close enough to (1) to look like the next best solution in terms of effort needed and we could use |
I'm wondering if I can do this by first compiling to "WASM GC MVP with i31ref" and then translating that code to "WASM GC MVP without i31ref". At least, this way, I will not have to go all the way back to unbox all these integers again when a feature that is equivalent to i31ref arrives. Has anyone done that? |
Even with The more difficult thing to change would be using 31-bit vs. 32-bit arithmetic. It sounds like putting in the effort for 31-bit arithmetic is only worth it if |
That would be the smallest problem. More importantly, as you note in your edit, static specialisation kills any form of first-class polymorphism, of which OCaml has plenty (polymorphic methods and record fields, first-class modules, GADTs, ...). So I think this approach is simply impossible to use. The same is true for any other language with a rich enough type system.
That is not a scalable approach. In languages like OCaml, you have polymorphic type inference, and by design that results in way more polymorphism than in traditional languages. You also use lots of abstract types. In particular, many functions are polymorphic in multiple independent types, and refer to multiple abstract types. To be efficient, the approach you suggest would require an exponential number of specialisations to be generated upfront for each such function. There also is the practical problem of porting. If the GC proposal does not support established implementation techniques, but instead requires a major rewrite of a pre-existing compiler and runtime, then it will not be a plausible target for many existing languages. |
The advice I gave here is specific to OCaml. There is only one type we were talking about doing this for: |
There has been a good amount of research work on type-directed or shape-directed unboxing, supported by code specialization. For a recent example, see the work on call-graph-based specialization of (boxed) polymorphic functions as part of the 2017 thesis of Dmitry Petrashko on Scala. But this work typically requires a fair amount of sophistication. In contrast, having an efficient way to "box" immediate scalars without going through the heap is easy for backends to allow, and fairly easy to compile into for types that fit the smaller width. The two approaches are complementary (clever specialization optimizations have other benefits), and yes, boxing everything is of course possible if the low-level substrate does not provide better, but getting reasonable performance requires much more sophistication on the user side. Many of the successful language implementations work by keeping a tight check on their complexity budget: they go for the 20% of sophistication that brings 80% of the benefits in as many areas as possible. OCaml was faster than most MLs, or Chicken Scheme than most Schemes, not through extreme optimizations, but rather thanks to a few very important optimizations (static function calls) and good data-representation choices that ensured that typical programs were fast by default. It is also possible to build fast systems that are slow by default but fast through excellent, sophisticated engineering (good JVMs or CLR implementations, GHC, Graal+Truffle, etc.), but I agree with @rossberg that it is important to ensure that the 20%-80% approaches are available first. |
The issue is "alternatives to |
Consider a function like this one from the Map interface:
This is polymorphic in two type variables and two abstract types. Each of them could either be an integer or something else. So even with just int, that makes 2^4 = 16 different ways in which you'd have to specialise the code according to your suggestion (8 if the compilation scheme allows to fix the export type Or you reduce the number of specialisations, lump several dimensions together, and pay the extra cost of frequent boxing/unboxing conversions for the rest. Many variations of this have been investigated in highly polymorphic languages, and there are reasons few are used in practice. Leroy (among others) actually had a line of papers researching unboxing for polymorphic languages, yet OCaml, like many similar languages, uses a uniform representation instead. He observed that the conversion overhead often is higher than the benefit of unboxing, and that a minimal unboxing scheme produces better results for typical profiles. I'm not intimately familiar with how OCaml's native code compiler handles currying, but the byte code compiler handles it through clever dynamic stack techniques that are unrelated to compile-time unboxing. |
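The snippet this comment refers to did not survive in this copy of the thread. As an assumption, one `Map` function that fits the description (two type variables, two abstract types) is `fold`. The sketch below only illustrates the counting argument; `MAP_FRAGMENT`, `IntMap` and `specialisations` are hypothetical names introduced here.

```ocaml
(* Hypothetical fragment of a Map-like signature. [key] and ['a t] are
   abstract types; ['a] and ['b] are type variables. *)
module type MAP_FRAGMENT = sig
  type key
  type 'a t
  val fold : (key -> 'a -> 'b -> 'b) -> 'a t -> 'b -> 'b
end

(* The standard library's Map satisfies this fragment. *)
module IntMap : MAP_FRAGMENT = Map.Make (Int)

(* If each independent type position is either "immediate int" or
   "something else", specialising on all of them needs 2^positions variants. *)
let specialisations positions = 1 lsl positions

let () =
  Printf.printf "all four positions: %d variants\n" (specialisations 4);
  Printf.printf "with t fixed:       %d variants\n" (specialisations 3)
```

With a uniform representation, a single compiled copy of `fold` serves every instantiation, which is the point being made about why OCaml avoids specialisation here.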
No, they're not, and it's getting a bit boring at this point that I have to keep refuting that. So again: plain ints are just one particular use case. The general purpose is unboxing small scalars of any sort, for which there are many, many use cases, in many, many languages. (And of course, 30 bit ints would fit just fine as well.) |
As a bystander who follows these threads I wanted to share some feedback that this aggressiveness is quite off-putting. |
Unfortunately, WebAssembly does not support these techniques. So I believe coercions will be necessary instead. @sabine, have you considered this problem with currying? To clarify, I was not suggesting specializing polymorphic functions based on their type arguments. I was suggesting coercing monomorphic functions that work specifically on integers to polymorphic functions operating on the uniform representation (when used polymorphically). This could be done with an augmentation of the coercion system I was suggesting above for dealing with currying. |
Oh, okay, but that's probably even worse. You would have to box/unbox in all first-order contexts where you pass a value from polymorphic to monomorphic functions and vice versa, which is like all the time. And you'd still have the same problem: for a function of type int->int->int, there are 8 different possibilities of a polymorphic context to which it could be passed in a higher-order fashion. All would have a different calling convention under the approach you suggest. So either your function has 8 different specialisations, or you're creating wrapper functions at each polymorphic use site. |
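To make the wrapper problem concrete, here is a minimal sketch that models "uniform representation without unboxed integers" with an ordinary OCaml type. The type `uniform` and the wrapper names are hypothetical; a real backend would generate such wrappers in Wasm rather than in source code.

```ocaml
(* Hypothetical stand-in for an integer in the uniform (boxed) representation. *)
type uniform = Box of int

let box n = Box n
let unbox (Box n) = n

(* Monomorphic function with its native calling convention: raw ints. *)
let add (x : int) (y : int) : int = x + y

(* Wrapper for a context of shape 'a -> 'b -> 'c: every position is uniform. *)
let add_uuu (x : uniform) (y : uniform) : uniform =
  box (add (unbox x) (unbox y))

(* Wrapper for a context of shape 'a -> int -> int: only the first argument
   is uniform. The remaining shapes of a three-position type are analogous. *)
let add_uii (x : uniform) (y : int) : int = add (unbox x) y

let () =
  Printf.printf "%d %d\n" (unbox (add_uuu (box 2) (box 3))) (add_uii (box 2) 3)
```

Each position of `int -> int -> int` is independently raw or uniform, which gives the 2^3 = 8 calling conventions mentioned above.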
@sabine thanks for summarizing things so clearly. First, until we invent JIT and other features, the current Wasm, even when including the current GC proposal, is an entirely static, dare I say "closed world" system, with AOT compilation to a single There seems to be a lot of default fear of (5), that it is going to result in unreasonable code bloat. If you generate specializations under a closed world assumption, you only generate the ones that are actually used. This causes more inlining of single-use functions (not further increasing code size) followed by much more aggressive optimisation of inlined code working directly on specific naked (scalar) types, often shrinking the code back down significantly. It turns indirect calls into static calls, helping greatly with dead-code elimination. LLVM and binaryen can do endlessly more on code like this than code that stays generic. In my personal experience working with (5) a lot, the amount of code bloat can be remarkably small, and the resulting speedups due to code "collapsing" impressive. But @rossberg is correct that anyone who prefers (5) can make this happen, regardless of the presence of I'd like to see an example of polymorphism that is impossible to compile down to specialized code using (5), and requires a struct of all There will be people who have code size as their #1 concern, and don't believe my story above that it can actually help reduce code size, when done right. That's fine, because it is hard to make claims about this in the absence of specific languages with specific compilers and specific applications. Finally, are there downsides to having |
@aardappel, repeating what I pointed out above, and I believe many times before: static type specialisation does not work in a language with first-class polymorphism or polymorphic recursion. Simplest possible example in OCaml:
There is no way you can statically specialise anything here, since you don't know what polymorphic definition you are calling. It's completely dynamic. A client of The same is true for generic methods in a language like C#, which is why the CIL performs JIT-time type specialisation. And it is the reason why C++ does not allow "template virtual methods", because its simplistic compilation model cannot support them. Static specialisation also does not work in a language with polymorphic recursion, because the set of instantiations can be statically unbounded. If I could make a wish, it would be that folks in these discussions could accept the fact that there are very good reasons why certain classes of languages are implemented in certain ways, that there have been decades of engineering and research into it, that we are not smarter than all the many folks who did this, and that we need to support a mapping for those compilation techniques instead of hoping for some yet undiscovered trick to make these requirements go away. Please? |
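The original example is missing from this copy of the thread. As a stand-in, and not necessarily the code that was posted, here is a minimal sketch of first-class polymorphism using a polymorphic record field. The names `poly`, `singleton`, `twice`, `pick` and `client` are hypothetical.

```ocaml
(* A record field holding a polymorphic function: first-class polymorphism. *)
type poly = { f : 'a. 'a -> 'a list }

let singleton = { f = (fun x -> [ x ]) }
let twice = { f = (fun x -> [ x; x ]) }

(* Which definition is used is decided at run time; the flag could come
   from any dynamic input. *)
let pick flag = if flag then singleton else twice

(* The client instantiates the same field at int and at string, without
   knowing which definition it received. *)
let client (p : poly) = (p.f 1, p.f "one")

let () =
  List.iter
    (fun flag ->
      let ints, strings = client (pick flag) in
      Printf.printf "%d %d\n" (List.length ints) (List.length strings))
    [ true; false ]
```

Inside `client`, the call `p.f 1` cannot be bound to an int-specialised copy at compile time, because the function behind `p.f` is only known once `pick` has run.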
I'm confused, doesn't wasm allow producing modules and linking them together? I would expect compilers to use those to produce modules at the natural separate-compilation granularity of their source language. In particular, I would expect to compile each module/crate/package separately, without making assumptions on how other modules are going to use it. Of course you can still generate WASM code only after a pass of link-time-optimization, but I would naively expect that it can lead to code-distribution issues (if the generated WASM for my library depends on the clients it's compiled with, code caching etc. is going to be more delicate to handle).
It's easy to have examples where specialization leads to impressive blowups in code size. I mentioned the Scala specialization work by Dmitry Petrashko earlier; it starts by evaluating previous 2014/2015 work on call-graph-based type specialization in Scala on the codebase of the Dotty compiler, and finds that on average each method would be specialized 22 times. Then there is the example of polymorphic recursion, where the set of different type instantiations depends on runtime data (in particular it may be arbitrarily large). (In most cases I'm familiar with, the number of different "shapes" may be bounded, especially if you only distinguish immediates vs. pointers.) But then what about union types? Many language runtimes use representations that are either an immediate word or a pointer, with tagging to distinguish the two. This is ubiquitous for example in implementations of ML, Haskell, Scheme and what not. You can very easily have tuples or arrays of such immediate-or-pointers values, where the immediate-or-pointerness of any element is only known at runtime and can change on mutation. How would one operate on this data if functions are expected to know statically whether they take an immediate or a pointer value?
I'm not very familiar with Wasm (I was drawn into this discussion by chance), but I believe that:
If I understand correctly, Wasm implementations try to be safe from undefined behavior, so in particular they are careful to check that pointers go into allowed memory before dereferencing them. This monitoring discipline, and in general wasm sandboxing/CFI features, are going to completely drown out the cost of checking bitpatterns on a word (even if you decided to do checks or masking on each dereference). |
@rossberg your example is essentially a function pointer to a generic function if I'm not mistaken, which is indeed not possible in languages that rely entirely on monomorphic specialization. I'd expect a compiler to choose to force boxing on args for such functions (causing an extra heap alloc for I don't see why generic functions that are not used in this very dynamic fashion (which I would certainly hope are the vast majority) should suffer from the mere presence of this possibly very dynamic use, and certainly not why Wasm as an ecosystem should pay for this cost (if the mere presence of I am well aware that there are many people who have spent more time looking at this than I have, but just an "appeal to authority" is not going to be sufficient to stop me from at least questioning this direction. |
The Glasgow Haskell Compiler, which presumably is considered a production compiler within this class of languages, does not unbox |
Yes, but the types at the edges are currently very basic types that may well not make it possible to express the full set of type features of a language like ML, meaning separate compilation would only be "safe" if compiled from one version of the program, essentially not making it separate compilation anymore. This may change in the future but is TBD. Also, current linking of multiple Wasm modules is runtime-only, though static linking is planned.
Average? Each method in the entire compiler? Meaning many methods would be specialized hundreds of times (to account for the methods only used once)? I find it hard to understand how that is possible. Does this account for optimization and DCE of the now much more static blown-up code? It would be more useful to know how much a codebase would blow up in total, for example the ratio of AST nodes of a large program before and after specialization. In my tests for example, in cases with heavy nesting of higher order functions, that ratio was at most 2x (after optimization). All very anecdotal of course.
An (sandboxing is only done for linear memory pointers, but even there it has no cost typically due to use of memory mapping features) |
You don't know at its definition site which polymorphic function ends up being used that way, and with what degree of polymorphism. The composition can happen somewhere else entirely, e.g. in a different module. And it can be partially instantiated, i.e., you plug in a polymorphic function somewhere where a less polymorphic (but not monomorphic) type is expected. There are many degrees of freedom, and you generally need a compilation scheme that is both efficient and compositional for all of them. In the limit, the possibilities for polymorphic composition are almost similar to those in a dynamic language, which is why they make similar representation choices. (Though an important difference is that there usually is no observable type dispatch, because polymorphism is fully parametric.) As I have stated many times before, whether it's i31 or i30 is largely immaterial. This is an internal representation type. For the vast majority of use cases its max size doesn't matter. However, we can't easily make extra pointer bits available to user space across engines anyway, so AFAICS, we can just as well provide 31 bit range for tagged ints. But if there are reasons to pick another width, then that's totally fine as well. I just haven't heard them yet. There are many languages that use (wordsize-1) bit integers in their implementations one way or the other. Whether a language implementation exposes that to users is a completely separate question. Some do (not just OCaml), some also expose values like 30, 29, or 28. But where they do, the language definition typically does not guarantee an actual size. For example, OCaml explicitly states that
Huh? I'd fully expect that a language with a proper module system can (and should!) compile its modules to Wasm modules separately, and without any loss of generality. Wasm's module system allows you to import/export anything, so such a compilation scheme should be perfectly possible.
You can't deref an anyref. You first need to cast it to something concrete. That necessarily involves a check, with or without i31ref. |
That is indeed the one place where adding i31ref has a non-zero cost: when |
I don't think anyone contributing here wants to come off as aggressive (I certainly don't want to 😄). This I'm trying to summarize and sometimes comment on some of the points brought up. If you feel that I missed your point or something needs clarification, do speak up. Concerning the topic of separate compilation of modules: So far, we are not aware of any problems with separate compilation of modules for OCaml, in the presence of To get back to the topic of alternatives to Traditional hardware architectures have a memory model that lets us fulfill these requirements by pointer tagging integers at the cost of losing one bit on the integer representation - but there may be other reasonable alternatives that I don't see right now, that work for us, and that other languages could make better use of than I believe that these two requirements are all that we need in order to compile all the crazy polymorphism in OCaml. I think @rossberg is right that, OCaml, in the long term, can work with 30-bit integers or 31-bit integers, or even, crazy as it may seem, 29-bit integers. Requirement (2) can possibly be lifted after a major refactor of the OCaml compiler (which is not 100% guaranteed to succeed, and that we don't have resources for right now, but at least there are people optimistic that it could work). Having both requirements (1) and (2), it looks to me like I need to double-box integers in a MVP without I'm very interested in other languages who would use However, I expect that as soon as |
Thanks for the great post, @sabine!
I was just about to ask you this question, so thanks for beating me to it! This is extremely useful, as it's the kind of perspective we can only get from language implementers. Can I ask you for some more of your perspective along this line? Something I've been wondering is, if we do change the number of bits for guaranteed-unboxed integers, what's a non-arbitrary number to change it to? The number that comes to mind is 24 bits because that's 3 bytes (so just enough for RGB and a little more than enough for some Unicode encodings). So I'm interested to hear specifically your thoughts on whether that would work for OCaml. That is, do you know of any OCaml applications or implementation strategies that would suggest a useful lower bound on the number of bits needed for unboxed integers? |
JS implementations faced a similar dilemma with the need to optimize simple integers. The dominant technique there seems to be NaN-boxing, which in the language of i31ref would be equivalent to i52ref. |
It is my assumption that if you compile down to a single
I was thinking of engine code, like the GC traversing objects. And checks can be more or less expensive.
That bit check may involve a missed branch, but yes, for code that doesn't use I'll take your word for it for now that cost will be negligible, though I hope at some point we'll have some numbers, particularly a real world GC benchmark (not containing any |
It's not just about how many bits we can fit in a pointer, it is also about whether we want to force the need to check these bits upon languages that don't need it (or languages that may need it, for code where it can be statically known that it's not needed). 64-bit wasm is mostly about 64-bit operands to linear memory load/stores, and thus larger linear memories, not much else. The size of an Conversely, does the GC proposal put some expected upper bound on the amount of GC objects that can be addressed? If this can be >2GB, then likely it must use 64-bit pointers internally, and we might as well use Actually, even an |
One could make the same argument about separate compilation when linking is done at the assembly level, or LLVM level, or JVM bytecode level, yet those are things that implementations do in practice, because whole-program compilation is extremely inconvenient in various ways. When you generate code for a library into a module, you don't know what client modules you will be linked against, so it is hard to tell what specialized instances you need to generate. It is of course possible to design whole-program tree-shakers that reduce code size by propagating whole-program information about usage, and this is common practice in some communities, but that does not remove the practical importance of also having an open-world compilation model.
Haskell is in an exceptional situation due to the pervasiveness of laziness: many values are thunks, and GHC's evaluation model performs an indirect jump on inspection, even to find out that a value is already evaluated. The benefit of pointer tagging is not to avoid a dereference (which you need to do in most cases to get the constructor arguments anyway), but to avoid this indirect jump. The way they avoid the cost of excessive boxing is through very aggressive inlining and optimizations in general; again the "sufficiently smart compiler" approach, which I don't think should be held as a goal for language implementors. Note that there is a lot more Haskell code written in the strict subset these days, so it is not completely clear to me that this pointer-tagging choice is still the right design. At the time it was introduced, pointer tagging provided a 5% performance improvement over a strategy without pointer tagging. At the risk of going completely into guesswork: this suggests that it is reasonable to consider implementing a Haskell runtime without pointer tagging, and get realistic performance. (You may even see interesting gains by using tagged integers, which may offset the absence of tagging.) On the other hand, when implementing a strict language, pointer tagging does not help, and not having tagged immediate values makes you fall off a performance cliff.
I am not aware of a performance comparison between pointer tagging and immediate-integers tagging for Haskell programs.
Chicken Scheme uses more than one bit of tag for value shapes, but only immediate integers use the least bit set to 1. In this model you can have I believe that in practice, if you used much less than 31 bits for immediate values, people would not use this for an integer type (24 is too small for your main type of machine-length integers), only for other immediate values. This might be a workable compromise (in particular, maybe people want 60-plus integers nowadays anyway), but it is not completely clear what the benefits are.
Ahead-of-time full-program monomorphization is known to be impossible for languages that support polymorphic recursion, including OCaml and Haskell (but not SML, see MLton), Agda, Idris, etc. If you have a JIT you can monomorphize "on the fly", this is what the dotnet runtime does. This point was made in this earlier comment of @rossberg: #53 (comment) |
@RossTate here is another argument that you may find interesting, in terms of finding a "principled" argument for choosing one size or the other. We are talking about splits in the data space of values that can fit one word. You may want to have tag bits/patterns for either pointers (GHC pointer tagging, or other tricks used by other implementations) or scalars (non-pointers). For example, the Chez Scheme runtime has a bitpattern for fixed-sized numbers but also for booleans, pairs (the "cons tag" mentioned in the discussion), symbols, vectors, etc. The GC only needs to know the fine-grained structure of the pointers, so from the GC perspective we may need many categories of pointers, but only one category of values (which language implementations can then subdivide with more tagging). Given that both sides may need tag space depending on the language, it makes sense to divide the space evenly between pointers and non-pointers: this is exactly the |
I don't think that a non-arbitrary number of bits per se exists. In an ideal world, we would have garbage collection in hardware, and we wouldn't need to chop off bits from a hardware word in order to implement efficient garbage collection in software. However, a specification for garbage collection in hardware mostly faces the same challenges as the WASM GC spec: there are a lot of different ideas of how efficient GC should work, with nearly every language bringing both some acquired tastes and some fundamental invariants to the table. I agree with @gasche that, if the number of bits gets too low, we will not use this to implement integer types: the better trade-off in that situation is to implement the integer types via boxed values instead of these unboxed integers, work on optimizing that representation, and enjoy native 32-bit arithmetic. Note, though, that the implementation of integer types is only one of the situations where guaranteed-unboxed integers are used in the data representation of OCaml. My impression is that the OCaml compiler can make good use of guaranteed-unboxed integers with fewer bits in most, if not all, of these other situations. (I need to check back with people tomorrow to make a list of all the situations where boxing the unboxed is considered to be particularly bad and confirm if this assessment is correct.)
Are there any resources that show what a simple JIT compiler that can monomorphize code containing polymorphic recursion looks like? I do believe Andreas Rossberg when he says it's not possible to do ahead-of-time full-program monomorphization because of polymorphic recursion. I would like to understand the argument in detail: why this is the case, and where exactly things break down when trying to monomorphize ahead of time. |
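As a small illustration of where ahead-of-time monomorphization gets stuck, here is a sketch of polymorphic recursion on a nested datatype. The type `nested` and the functions `build` and `depth` are hypothetical names chosen for this example.

```ocaml
(* A nested datatype: each Nest level doubles the element type. *)
type 'a nested = Flat of 'a | Nest of ('a * 'a) nested

(* Polymorphic recursion: the recursive call is at type 'a * 'a, not 'a. *)
let rec build : 'a. int -> 'a -> 'a nested =
 fun n x -> if n = 0 then Flat x else Nest (build (n - 1) (x, x))

let rec depth : 'a. 'a nested -> int = function
  | Flat _ -> 0
  | Nest rest -> 1 + depth rest

let () =
  (* n is a constant here, but nothing stops it from being read from input. *)
  let n = 3 in
  Printf.printf "%d\n" (depth (build n 0))
```

Calling `build n 0` uses `build` at `int`, `int * int`, `(int * int) * (int * int)`, and so on, one instantiation per level. A monomorphizer would need one copy per instantiation, and when `n` is only known at run time there is no static bound on how many copies that is. With a uniform representation, one compiled copy of `build` and `depth` handles every level.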
I think #94 should be fleshed out as a summarized proposal; right now people are discussing it as a single proposal while taking some points in the middle of the discussion, it is confusing/difficult to keep track of what they mean. @sabine: I think the simplest way to extend what-I-understand-as-#94 with |
@sabine @gasche I would say that if In the context of the MVP however, (with coercion, not subtyping), allowing several scalar types could be worth it, as the cost of an In fact, this modularizes the It will have the downside that access to the JS world is defined entirely in terms of |
@sabine @gasche The arguments I gave are not about types, they're about values. The purpose of using smaller scalars like |
@aardappel For OCaml, I think it is enough in #94 to have a runtime type for the heap block that specifies the type of values in the reference array (tagged with 64 bits, tagged with 32 bits, or primref). I agree that maintaining three size fields would be too much. Explicit coercion between There's nothing wrong with having Though,
Ah, the purpose you assume is more specific than what we actually need: In the OCaml heap model, we let 31-bit and 63-bit scalar values occupy spaces that can be occupied by either scalars or references. We do not need our scalars to fit in the same space as a generic reference value. What we do need is a type that can store both scalars and references with reasonable efficiency. |
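For readers less familiar with this heap model, the following sketch shows the usual tagging arithmetic (modelled on plain ints) and uses the `Obj` module to ask the native runtime which values are immediates and which are heap blocks. The helpers `tag`, `untag`, `is_immediate_word` and the type `shape` are hypothetical; `Obj` is used for illustration only and is not something ordinary code should rely on.

```ocaml
(* Tagging arithmetic, modelled on ordinary ints: an immediate n is stored
   as 2n+1, so the low bit distinguishes it from an aligned pointer. *)
let tag n = (n lsl 1) lor 1
let untag w = w asr 1
let is_immediate_word w = w land 1 = 1

(* Empty is represented as an immediate, Circle as a heap block. *)
type shape = Empty | Circle of int

let () =
  assert (untag (tag (-7)) = -7);
  assert (is_immediate_word (tag 42));
  (* One array whose slots hold a mix of immediates and references. *)
  let slots =
    [| Obj.repr 42; Obj.repr "text"; Obj.repr Empty; Obj.repr (Circle 1) |]
  in
  Array.iter
    (fun v -> print_endline (if Obj.is_int v then "immediate" else "block"))
    slots;
  Printf.printf "int has %d bits on a %d-bit word\n" Sys.int_size Sys.word_size
```

The array `slots` is the situation described above: each slot may hold either a scalar or a reference, and only a run-time check on the value tells them apart.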
@RossTate I still don't understand your argument. In a design where The existence of In the current MVP proposal, (In my intermediate proposal |
Y'all are assuming that In other words, while |
It means we commit to checking that bit in any |
@RossTate we could think about having, for example, My impression is that (Again, the |
@gasche, you seem to be describing a heterogeneous system, meaning there is no global tagging structure. But then you need
We're working on it, but it will not have a |
Out of curiosity, I manually boxed an OCaml program performing integer arithmetic, in order to evaluate the performance overhead of systematic integer boxing on the current runtime. (The function I used is what I call "sumtorial", like factorial but with a sum instead of a product, basically a complex way to compute n*(n+1)/2.)
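The program itself is missing from this copy of the thread. A minimal sketch of what such an experiment could look like follows; it is a reconstruction under assumptions, not the code that was measured. The type `boxed` and both function names are hypothetical, and an optimizing compiler may remove some of the boxing, so any timing would need to be checked against the generated code.

```ocaml
(* Unboxed version: plain immediate ints. *)
let rec sumtorial n = if n = 0 then 0 else n + sumtorial (n - 1)

(* Manually boxed version: every integer lives in a one-field heap block. *)
type boxed = Box of int

let rec sumtorial_boxed (Box n) =
  if n = 0 then Box 0
  else
    let (Box rest) = sumtorial_boxed (Box (n - 1)) in
    Box (n + rest)

let () =
  let n = 10_000 in
  let (Box b) = sumtorial_boxed (Box n) in
  (* Both versions agree with the closed form n*(n+1)/2. *)
  assert (sumtorial n = b && b = n * (n + 1) / 2);
  Printf.printf "sumtorial %d = %d\n" n b
```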
On my machine, the |
Quick update: as of just now, In case anyone has a demo where they suspect that significant (or at least measurable) time is being spent on taggedness-checks even though the demo doesn't use i31ref, it would be simple to create a custom build (or introduce a runtime flag) to turn off i31ref support and verify this suspicion. I won't do that until/unless someone asks me to though :-) (Disclaimer: this is not a statement of opinion on whether i31ref should exist in the spec, or in what form. I find the flexibility of a generalized form like |
Ok, if I get this right, you say that having I see two cases here:
Since we cannot type cast between It looks to me like there is no difference for the GC walking the heap, when adding So, there are three entities here:
Is there anything that prevents the engine from assuming different semantics for the implementation of
It looks to me like adding |
This
|
So what you're describing is a heterogeneous approach. Unfortunately, the current MVP is designed around a homogeneous approach. For example, its compilation model for imported/exported types is designed around a universal representation. As such, you would not be able to use But if you want to go with a heterogeneous approach to enable custom low-level representations, then it's better to take things further than just For completeness, I should note that |
Thanks for bringing up the type imports spec, I hadn't read that in all details yet. 👍 Okay, the type imports proposal says "As far as a Wasm module is concerned, imported types are abstract. Due to Wasm's staged compilation/instantiation model, an imported type's definition is not known at compile time." (https://github.com/WebAssembly/proposal-type-imports/blob/c9700ff6267571f4a52151c8a46e800f8534f923/proposals/type-imports/Overview.md) One paragraph later it says "However, an import may specify a subtype constraint by giving a supertype bound with the import" If I understand this correctly, this means that all type imports/exports must be subtypes of the type Okay, now looking at this from a practical perspective for OCaml: why would we need to import/export types for When importing something like a global, a function, or a table, I don't need to have a nominal type that wraps I could still import/export
You mean, some garbage-collection implementations don't put meta-information in the bit-representation of the small object, but instead they put meta-information in the bit-representation of the pointer pointing to the small object?
How does that work with arrays of
I understand that as "The SOIL initiative proposal aims to give the producer a way to influence the tagging scheme the engine uses." To achieve that, unavoidably, there is some added amount of complexity that the current GC MVP does not have (in particular, it adds complexity to the engine implementation while producers get to pick what they need from the spec). Is there potential for simplifying the SOIL initiative proposal, or do you think it cannot get simpler than it is? Can you give feedback on the scheme I sketched in #100 (comment)? I was confused about some things, in particular, how to represent arrays. Is |
@RossTate I believe that forcing imported/exported types to be ref types (by requiring they are deftypes) is a stopgap to avoid the whole polymorphism issue. I believe strongly that we should not preclude true parametric polymorphism in that proposal but should design for the most general case of importing types of unknown representation. That is necessitated by embeddings that need to reference and manipulate values of host types that do not fit into Wasm's type system at all. What about making |
Oops, so sorry @sabine! I let this get buried.
Well, ideally a WebAssembly module built with OCaml would be able to export its values for others to use. If, say, you built an efficient hashmap (as a toy example), then you'd like to export your hashmap type abstractly in WebAssembly (just like you would in OCaml) along with WebAssembly functions for operating on your hashmaps (just like you would in OCaml).
Yep.
There's a wide variety of techniques, and which one a GC can use depends on the invariants it can maintain. Some techniques are to always use the same tagging convention throughout the system (this is what OCaml does). Another technique is to have a descriptor at the top of any struct/array informing the GC how to work with that structure. That descriptor can be encoded as bits in a variety of ways, or point to some meta information (or even a code address to jump to).
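To make the contrast between those two techniques concrete, here is a minimal sketch of how a collector might find the pointers in a heap block under each one. The function names, the bitmask encoding of the descriptor, and the sample addresses are all assumptions made for illustration; real collectors encode this information in many different ways.

```python
def pointers_by_tag_bit(words):
    """Uniform tagging convention (OCaml-style): a low bit of 1 marks an
    unboxed integer, a low bit of 0 marks a word-aligned pointer."""
    return [w for w in words if w & 1 == 0]


def pointers_by_descriptor(pointer_mask, fields):
    """Descriptor at the top of the block: bit i of the (hypothetical)
    pointer_mask tells the GC whether field i holds a pointer, so the
    fields themselves need no tag bits."""
    return [f for i, f in enumerate(fields) if (pointer_mask >> i) & 1]
```

With the tag-bit convention the collector can scan any block without knowing its layout, at the cost of one bit per integer. With a descriptor, fields can hold full-width untagged integers, but every block must carry (or point to) layout information.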
Yep. We've been waiting for there to be a proper discussion and round of feedback to determine which simplifications we should explore.
Ah! Even more sorry! You went through all that work and I somehow missed it entirely. I'm happy to give some feedback. Also, note that we did a case study on how to do typed functional languages here. Its handling of closures is a little outdated though, since the call-tags proposal now provides a better way to deal with the complications caused by currying.
Instead of using
For arrays, you use
Besides that (and insignificant syntactic things), your sketch looks good to me. |
@RossTate Thanks for the feedback! It does look like things are representable in the proposal. Instead of RTTs, there are schemes, and it is possible to test whether a gcref belongs to a certain scheme, similar to being able to test whether a reference has a certain RTT.
It seems nice, from a language user's (someone who writes code in a language that compiles to WebAssembly) perspective, to have the ability to create and link modules that expose opaque values and functions that operate on them. I don't know whether the ability to express and handle these "opaque types" must or should be provided at the WebAssembly level. It looks to me like it is still an open question whether WebAssembly should provide all this infrastructure, or whether languages should invent their own abstraction on top of WebAssembly for this. The only place where I so far know about built-in opaque values being strictly necessary is in the case of WebAssembly system interfaces, where, for security reasons, handles to operating system resources must not be modifiable by a WebAssembly program.
I would guess that, if one could get many different languages to represent their heap in that model, it should be possible to quantify which of the features are not used (or so rarely used that it doesn't make sense to implement them in an MVP). If there were an interpreter implementation of the SOIL initiative model, that would enable some experimentation. However, that's a fairly large effort, even though the performance of the interpreter is not important at all. Another way to look at the model is to look at every piece and try to answer the question "what would be |
It's only a trap if a byte is accessed out-of-bounds, not if the destination or source index is out-of-bounds. This only makes a difference when the length is zero, in which case the previous behavior would trap, and the new behavior will be a no-op.
@eqrion, you may find the opening post of this issue interesting, especially the part where @sabine says that compiling OCaml to linear memory would be preferable to having to box everything if i31 were not available. Apart from that, most of this discussion is outdated or could be more productive in new issues now that we have folks actually working on compiling OCaml to WasmGC, so I'll close this issue. |
As `i31ref` doesn't seem to be a unanimously agreed-on (see #53) part of the GC MVP spec, I am very interested in discussing what the concrete alternatives to it are in the context of parametric polymorphism on uniformly-represented values. (I would appreciate if the answer doesn't immediately read as "use your own GC on linear memory".)

To give some (historical) context: Why does OCaml use 31-bit integers in the first place?

Generally, it is possible to have a model of uniform values where every value is "boxed" (i.e. lives in its own, individually allocated, heap block). Then, every value is represented by a pointer to the heap and can be passed in a single register when calling a function. A heap block always consists of a header (for the GC), and a sequence of machine words (values). From an expressiveness standpoint, this is fine. However, when even simple values such as integers are always boxed (i.e. require a memory access to "unbox" them), performance suffers.

Design constraints for the representation of unboxed integers were: a) need to be able to pass unboxed integer values in a single register, b) need a means for the GC to distinguish (when crawling the heap) whether a value in a heap block represents an unboxed integer or a pointer to another heap block, and c) being as simple as possible for the sake of maintainability.

In OCaml, the compromise between performance and simplicity that was chosen is to unbox integer values by shifting them left by one bit and adding one. Since pointers are always word-aligned, this makes it trivial to distinguish unboxed integers from values that live behind heap pointers. While this is not the best-performing solution (because all integer arithmetic has to operate on tagged values), it is a simple one.
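As an illustration of the tagging scheme just described, here is a minimal sketch that simulates it on 32-bit words. The names `tag_int`, `untag_int`, `is_int`, and `add_tagged` are made up for this example, and the 32-bit word size is an assumption; this is not how the OCaml runtime is written, only a model of the arithmetic.

```python
WORD_BITS = 32                      # assumption: 32-bit machine words
WORD_MASK = (1 << WORD_BITS) - 1


def tag_int(n):
    """Encode n as 2n + 1 (shift left by one, set the low bit),
    truncated to one machine word."""
    return ((n << 1) | 1) & WORD_MASK


def is_int(word):
    """Word-aligned pointers have a low bit of 0, tagged integers 1."""
    return word & 1 == 1


def untag_int(word):
    """Reinterpret the word as signed, then shift right arithmetically."""
    if word & (1 << (WORD_BITS - 1)):
        word -= 1 << WORD_BITS
    return word >> 1


def add_tagged(a, b):
    """Add two tagged integers without untagging them:
    (2x + 1) + (2y + 1) - 1 == 2(x + y) + 1."""
    return (a + b - 1) & WORD_MASK
```

The extra `- 1` in `add_tagged` is the kind of overhead meant by "all integer arithmetic has to operate on tagged values", and the lost top bit is why such integers have only 31 bits of range on a 32-bit word.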
Note that there exist compilation targets of OCaml that use 32-bit integer arithmetic, and the OCaml ecosystem largely accounts for that. Having libraries also consider the case where integers have 64 bits seems feasible. Some code will get faster if we can use native 32-bit integer arithmetic.
Ideally, for the sake of simplicity, we would like to emit one type `Value` to WebAssembly, which represents an OCaml value, which is either:

`anyref` or some address in the linear memory of another module)

A heap block of OCaml traditionally consists of
The most trivial representation (i.e. the one matching most closely the existing one) that I see when I look at the MVP spec is an `anyref` array that holds both references to other heap blocks and `i31ref` values. So, from the viewpoint of having to do as little work as possible in order to compile to WebAssembly and keeping the implementation simple, `i31ref` is certainly looking very attractive for an OCaml-to-WASM compiler MVP.

In #53 (comment), @rossberg summarized:
From OCaml's perspective, I think that (2) and (4) don't seem acceptable as a long-term solution in terms of performance. Here, compiling to the WASM linear memory and shipping our own GC seems a more attractive choice.
So, that leaves (3) and (5).
(3) seems fairly complex. If the WebAssembly engine did the runtime code specialization, or if we could reuse some infrastructure from another, similar language, it could be worthwhile for us to work with that. It currently seems unlikely that OCaml in general will switch to (3) in the foreseeable future, unless we can come up with a simple model of runtime code specialization. I expect that implementing runtime code specialization in a WebAssembly engine goes way beyond an MVP, so it seems unlikely this will happen.
(5) is simpler than (3) in the sense that we do not have to ship a nontrivial runtime. If we analyze the whole program in order to emit precise types (`struct` instead of `anyref array`) for our heap blocks on WebAssembly, we wouldn't need to use `i31ref` and we can reap the other benefits of whole-program optimization (e.g. dead-code elimination, operating with native unboxed values, no awkward 31-bit arithmetic). Still, this will be a sizeable amount of work (possibly too much to do it right away).

I also can't say how bad the size of the emitted code will be in terms of all the types we need to emit. Instead of emitting a single `Value` type, we need to emit one `struct` type for every "shape" of heap block that can occur in the program. To keep this manageable, we need to unify all the types whose heap block representations have the same shape.

Then, static code specialization kills one nice feature of OCaml: separate compilation of modules. However, instead of doing static code specialization before emitting WebAssembly, maybe it is possible to implement a linker for our emitted WebAssembly modules that does code specialization at link time if we emit some additional information to the compiled WebAssembly modules? This kind of linker could possibly be interesting to other languages that are in a similar position as us, as well. Obviously, link time will be slower than we are used to. I haven't thought this through in detail at all.
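The shape unification mentioned above (one `struct` type per distinct heap-block shape, shared by all source types with that shape) could be sketched roughly as follows. Everything here is hypothetical: the function name `unify_shapes`, the representation of a shape as a tuple of field kinds, and the example type names are assumptions for illustration, not part of any existing compiler.

```python
from typing import Dict, Tuple

# A "shape" is modelled as a tuple of field kinds, e.g. ("i32", "ref").
Shape = Tuple[str, ...]


def unify_shapes(
    type_layouts: Dict[str, Shape],
) -> Tuple[Dict[Shape, int], Dict[str, int]]:
    """Assign one struct type index per distinct shape and map every
    source-level type to the index of the struct type it shares."""
    shape_ids: Dict[Shape, int] = {}
    type_to_struct: Dict[str, int] = {}
    for name, shape in type_layouts.items():
        if shape not in shape_ids:
            shape_ids[shape] = len(shape_ids)   # first time we see this shape
        type_to_struct[name] = shape_ids[shape]
    return shape_ids, type_to_struct


# Hypothetical program with four source types but only three shapes.
EXAMPLE_LAYOUTS: Dict[str, Shape] = {
    "int_pair": ("i32", "i32"),
    "point": ("i32", "i32"),
    "cons_cell": ("i32", "ref"),
    "boxed_ref": ("ref",),
}
```

In this example `int_pair` and `point` end up sharing one emitted struct type, so the number of emitted types grows with the number of distinct shapes rather than with the number of source-level types.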
It seems likely that these issues are manageable, if enough effort is put into them.
Edit: while the previous paragraph sounds fairly optimistic, looking into whole-program monomorphization (turning one polymorphic function into several non-polymorphic ones) more closely, it is definitely not trivial to implement. Types that we would need for this are no longer present at the lower compilation stages. When I look at the MLton compiler (a whole-program-optimizing compiler for Standard ML), it seems that it is a good idea to monomorphize early, in order to be able to optimize based on the types of parameters. Features like GADTs, and the ability to store heterogeneous values or polymorphic functions in hash maps (or other data types), do not make it simpler. It looks to me like this would mean almost a full rewrite of the existing compiler, and it is not obvious whether every function can be monomorphized (without resorting to runtime type dispatch).
Are we missing something here? Are there other techniques that we have been overlooking so far? Feel free to drop pointers to good papers on the topics in general, if you know some.
Also, I am very interested in what perspective other languages with similar value representations have on this and whether there is interest in collaborating on a code-specializing and dead-code-eliminating linker.