Meta: Musings on our design process and goals #111

Closed

jakobkummerow opened this issue Aug 7, 2020 · 19 comments
@jakobkummerow
Contributor

After following the various discussions here for some time, and participating in some of them, I've had the following thoughts, which I wanted to share with you. To give credit where it's due: I first came upon this line of reasoning in my conversations with @tebbi.

I. Observation: there are several fundamentally different approaches to GC design.

There are several fundamentally different approaches to designing WasmGC, at least including the following examples:
(1) statically typed, optimized for AOT compilation (i.e.: all information required for compiling optimized native code is available at validation time), designed such that extremely low overhead can be achieved (e.g. no checks at all before accessing a field of a struct with statically known type). Requires more smarts in the Wasm producer, whose work is somewhat similar to compiling the respective source language to machine code. Allows comparatively simple engine implementations, or viewed differently: prefers primitives where there's a fairly obvious way to implement them efficiently. (The current proposal is somewhat close to this concept.)
(2) higher level (e.g. might provide built-in vtable primitives, letting the engine take care of the implementation details), more reliant on JIT-compilation (e.g.: allowing import of compilation-relevant things from other modules, i.e. not known before instantiation time), allowing more flexibility (e.g. not requiring recompilation when dependencies/libraries change). Inlining together with the other optimizations that depend on it could then be done by the engine on the client. This allows simpler Wasm producers, but requires more complicated engines. It probably leads to smaller modules in many cases. (This concept would be somewhat comparable to compiling to JVM bytecode.)
(3) very simple, flexible design, e.g. "butterfly objects": there's only one static type "object", each object has a number of pointer fields and a number of raw data bytes, the size is set at allocation time, each access requires a bounds check. Great fit for some scenarios (especially untyped/polymorphic/generic source languages), arguably decently compatible with many scenarios due to simplicity and lack of need to contort one language's semantics into another system's opinionated object model. Allows fairly simple engines, but also doesn't provide much potential for engine-side optimizations, i.e. might (likely?) have both better worst-case behavior and a lower peak performance ceiling than the other approaches. (Issue #109 explores this direction.)
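To make approach (3) concrete, here is a minimal C sketch of what an engine-side butterfly object could look like. The names and exact layout are assumptions for illustration, not the representation specified in #109; the essential properties are just the two lengths fixed at allocation time and a bounds check on every access:

```c
#include <stdint.h>
#include <stdlib.h>

/* One universal object type: num_ptrs GC'd pointer slots followed by
   num_bytes of raw data, both sizes fixed when the object is allocated. */
typedef struct Obj Obj;
struct Obj {
    uint32_t num_ptrs;
    uint32_t num_bytes;
    Obj     *slots[];   /* pointer slots first, raw bytes after them */
};

static Obj *obj_alloc(uint32_t num_ptrs, uint32_t num_bytes) {
    Obj *o = calloc(1, sizeof(Obj) + num_ptrs * sizeof(Obj *) + num_bytes);
    o->num_ptrs = num_ptrs;
    o->num_bytes = num_bytes;
    return o;
}

/* There is only one static type, so every access is bounds-checked;
   an out-of-bounds access traps (modeled here as abort). */
static Obj *obj_get_ptr(Obj *o, uint32_t i) {
    if (i >= o->num_ptrs) abort();
    return o->slots[i];
}

static uint8_t obj_get_byte(Obj *o, uint32_t i) {
    if (i >= o->num_bytes) abort();
    return ((uint8_t *)&o->slots[o->num_ptrs])[i];
}
```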

(Side note: rather than looking at such individual points in a multi-dimensional spectrum, one could try to tease apart the one-dimensional, mostly-orthogonal "design axes" spanning the design space. Here's an attempt to start that; it's probably missing several additional axes:
Axis 1: low-level, assembly-like instructions (i.e. very few CPU instructions per Wasm instruction), with predictable behavior/performance. Modules build their own abstractions/flexibilities on top of that, likely increasing module size and Wasm producer implementation effort. <-> higher-level (more declarative/flexible) instructions, where engines have lots of freedom in how to implement/optimize. Simpler Wasm producers, smaller modules, potentially higher performance ceiling in the limit, but also less reliable performance; it might take many more years until Wasm producers can safely assume that most engines implement most optimizations, and even then basic implementation design choices might mean that pattern A will always be slower than B on engine 1, and on engine 2 it'll be the other way around.
Axis 2: support optimized AOT compilation <-> have features that require information gathered at instantiation time or even run time in order to get maximally optimized. (This is notably different from axis 1!)
Axis 3: optimize for cases where most types are known statically and few run-time type checks are needed <-> assume that most types will not be known statically, optimize for cheapest possible run-time type checks.
Axis 4: pragmatism: aim for (an initially small set of) currently-popular languages, take inspiration from currently-successful engine designs, i.e. standardize what's useful and easy to implement today. Ship something to users soon, even though it will be limited in features/applicability/reach as well as possibly performance <-> idealism: try to find the perfect language-agnostic platform that can do anything, and in particular that can support the most popular languages as well as VM research insights of the 2030s (which we don't know yet). Might take years (or decades…) of discussion.
Axis 5: ???
Observe that for every single one of these axes, both opposite extremes are worthy goals with certain arguments going for them. We probably want to do our best to find compromises where possible, but in some cases we might have to make fundamental either-or decisions. It is natural to expect different people to assign different priorities to the various goals. Talking about these abstract goals, and then looking at specific design questions in light of them, will hopefully bring some structure to the consensus-finding process, e.g. by highlighting which specific decisions are (i) philosophically well-aligned, (ii) diverse but compatible, or (iii) outright incompatible with each other.
An example of possible compromises is axis 2: not only can Wasm have both some instructions that are suitable for AOT optimization and some that rely on JIT compilation, the latter can also be designed such that they provide decent baseline performance before optimization kicks in, or in engines that choose to prefer lightweight simplicity over heavy optimization machinery.
-- end of side note.)

II. Consequence: we're not going to agree on one concept, so we should design a union.

Given that these different approaches have different strengths and weaknesses, we're realistically not going to agree on an overall "winner". (As one commenter put it: "AOT, flexible separate compilation, good performance -- choose 2." Notably that means that a given module or set of related modules can only get two of these three properties, but we can aim to make Wasm flexible enough that each module creator can choose their desired set of benefits.)

Being a platform seeking popularity, our goal is to cater to as many use cases as possible.
Also, being a platform targeted by tools (as opposed to a programming language written by humans), Wasm has far less need than other languages to be aesthetically pleasing or consistent or elegantly concise.
So the way forward is to try to find a compromise, a hybrid solution, a union of designs that's flexible (and large!) enough to contain several of these approaches.

One guiding principle for that is to avoid "bad features", defined as: adding one feature (significantly) slows down other features that didn't care for it, or even rules out other features. If a feature is not "bad" in this sense, then there's no harm in adding it if some use case benefits from it, even if other use cases don't need it at all.

III. Translating to specific action: talk about additions, not competing replacement designs.

So tactically, in order to make progress towards finding a GC proposal that the majority of us can agree with (and which we can hence implement and ship, which is the prerequisite for it being useful at all), I suggest that we all focus on incremental changes, and in particular: additions to the current proposal to make it serve more needs; as opposed to trying to decide between a bunch of competing proposals that aim to replace each other.
(That doesn't necessarily mean that we have to cram everything into the first version: some of these additions we may postpone to post-MVP versions; as long as we have a vision/plan for them that's okay.)

IV. Example.

As a specific example of how to apply this tactic: the current proposal could be slightly extended to cover the ideas in #109. It already has (array anyref), which is the "just n pointers" part of a butterfly object. It also has (array int8), which is the "just n bytes" part. We could add instructions that read i32/f64/etc from int8-arrays, then a (struct (field ref (array anyref)) (field ref (array int8))) would be semantically equivalent to a butterfly object. My understanding is that the plan is to allow some form of object nesting post-MVP, i.e. at some point we'll be able to combine an (array int8) and an (array anyref) into a single object, avoiding the need for the struct-plus-two-arrays split. Similarly, an assert_bounds instruction could be added.
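To make the comparison concrete, here is a rough C sketch (names invented; layouts assumed, not taken from either proposal) of what this paragraph describes. The MVP emulation pays one extra allocation and one extra dependent load per access relative to the fused object that post-MVP nesting could enable:

```c
#include <stdint.h>

/* MVP emulation: a struct pointing at two separate array allocations,
   i.e. (struct (field ref (array anyref)) (field ref (array int8))). */
typedef struct { uint32_t len; void    *elems[]; } PtrArray;
typedef struct { uint32_t len; uint8_t  bytes[]; } ByteArray;
typedef struct { PtrArray *ptrs; ByteArray *data; } Emulated;

/* Reading a byte goes through the struct and then the array:
   two dependent loads plus the bounds check. */
static uint8_t emulated_get_byte(Emulated *e, uint32_t i) {
    return i < e->data->len ? e->data->bytes[i] : 0; /* real code would trap */
}

/* With post-MVP object nesting, both arrays could live inline in one
   allocation, which is exactly a butterfly object: one load (plus the
   bounds check) per access, and one object header instead of three. */
typedef struct {
    uint32_t num_ptrs;
    uint32_t num_bytes;
    /* num_ptrs pointer slots, then num_bytes raw bytes, all inline */
} Fused;
```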

(To clarify: my understanding is that there are two primary motivations behind #109: (a) come up with an as-simple-as-possible first version of WasmGC, to be extended in the future; (b) design a system that works well for some use cases, and in fact might be the best possible system for some use cases. By suggesting to merge it into the existing proposal, motivation (a) is obviously discarded, but motivation (b) is well served. If folks think that (b) alone is not sufficient motivation, then we can of course also not merge it unless and until there is evidence of demand.)

As another example: @tebbi is working on some (likely post-MVP) ideas to add vtable instructions, bringing some of concept (2) to the concept (1)-inspired MVP.

@aardappel

I would agree that at this point a plan for "how we make progress with the design" is more important than trying to further debate which design is better, not just for GC, but for a number of the ongoing Wasm proposals. We have an explosion of options that seems inversely proportional to our progress in decision making.

I'd suggest:

  • We ensure each proposal has a clear champion who is actively driving things forward. This person's goal is making forward progress, not necessarily defending their original design.
  • From the discussions, this person collects a list of the most salient design-space parameters that people appear to disagree on. Subtyping vs. coercion? Modularity support at the Wasm level, yes or no? Should tagged integers be a thing? Etc. (These are just random examples, please don't take them literally.)
  • This person then helps order these questions by which needs to be answered first to make the most progress towards a final design, and brings them in front of the (sub)committee. Each question gets voted on, the answer becomes binding, and that potentially makes the other choices easier, since they'll have to fit with this one. Repeat.

I'd agree with @jakobkummerow that once a proposal is significantly far ahead, any competing proposals should be formulated in terms of how exactly they would change the existing proposal rather than as clean-slate designs. So the butterfly proposal would be "should struct and array types be merged?" with a list of exactly all the consequences and changes in terms of the existing design. Then we can vote on it, and if the answer is no, we can move on.

@tlively
Member

tlively commented Aug 7, 2020

I am concerned about the idea of designing a union of multiple proposed designs.

  1. Saying we will investigate/implement many ideas eventually does not help us decide which to focus on now. We already know we don't have the resources to investigate many ideas in parallel and there is no certainty that deferred ideas will ever be followed up on.

  2. Saying we will investigate/implement many ideas makes it unclear whether any particular idea will ever get investigated/implemented. Clearly there will be some ideas that will not be implemented, so we would still have the problem of agreeing on whether or not an idea is worth investigating and implementing.

To ignore or defer these questions would be to give up on allocating resources to ideas based on technical merit, and I don't think that's something anyone wants to give up on. To address these questions, we basically get back to where we are today. I also think that despite being a compilation target, there is value in WebAssembly proposals being cohesive and self-consistent, because that makes them easier to understand, use, and implement.

Instead, I think that the best way to make forward progress is to discuss and have a vote on more specific design principles and goals for an initial proposal. Essentially, we should all agree on where we want to be on the axes identified in the opening post. From there, we would have a shared measure against which we would evaluate proposals. Our design principles and goals can be further refined as disagreements arise.

  • This person then helps order these questions by which needs to be answered first to make the most progress towards a final design, and brings them in front of the (sub)committee. Each question gets voted on, the answer becomes binding, and that potentially makes the other choices easier, since they'll have to fit with this one. Repeat.

This is similar to what I'm thinking, except that I think it's important that we strongly prefer voting on design principles and goals over specific features.

@RossTate
Contributor

RossTate commented Aug 7, 2020

I know you put it as a side note, but I really like your axes. I think it's useful to have those listed, and I bet we'll think of more as the project continues.

On the other hand, I'm concerned about the action plan of unioning. Language features generally don't compose so easily. There can be both compile-time complications and run-time complications with combining features. As a compile-time example, subtype polymorphism and parametric polymorphism are both very useful and work perfectly fine as independent features but are notoriously difficult to combine. The various idealistic combinations of the two are undecidable, and typed low-level systems often require substantial type annotation (i.e. as much annotation as executable code) to reliably bridge the decidability gap. One necessarily needs to restrict one or even both of the two features to accommodate the other, and it generally takes substantial exploration of the design space and thorough analysis of the application space to identify the right balance of tradeoffs for the specific constraints and needs at hand. As a run-time example, the most efficient casting mechanisms rely on having complete understanding and control over what values can be cast to and from, as well as an understanding of which casts are most frequent. The best cast mechanism for one application can be completely different than that for another, and so forcing applications to share the same cast mechanisms (especially in an unbiased manner) often results in a worst of all worlds.

This post is great in openly acknowledging and working to address the diversity in the design space. But, as much as I wish it weren't the case, I think we also need to openly acknowledge and work to address the known fundamental conflicts in the design space. With every feature that is proposed, we should acknowledge the features that it conflicts with. In prior collaborations, I generally would provide teams with various designs and describe what they can do well and what they likely prevent ever doing well.

A key aspect of #109 is not just what it can do well, but also that this second list is fairly empty, certainly relative to the current proposal and even relative to the SOIL Initiative's. It is designed to avoid conflicts with features we might want to add and design directions we might want to take forward.

So if you want to describe changes, you need to describe both what they add and what they block. Two designs can add the same features but with one blocking fewer features than the other, and not having that be part of the evaluation metric fails to recognize that WebAssembly is an evolving system in a novel space filled with uncertainty. And given that #109 is a sort of conflict-free minimal core, it makes more sense to describe changes on top of #109 than the current MVP.

@titzer
Contributor

titzer commented Aug 8, 2020

Thanks for the thoughts, Jakob and everyone. I wanted to reply to the part below before too much reasoning gets built on it, because I don't think it's quite true:

II. Consequence: we're not going to agree on one concept, so we should design a union.

Given that these different approaches have different strengths and weaknesses, we're realistically not going to agree on an overall "winner". (As one commenter put it: "AOT, flexible separate compilation, good performance -- choose 2." Notably that means that a given module or set of related modules can only get two of these three properties, but we can aim to make Wasm flexible enough that each module creator can choose their desired set of benefits.)

I don't agree that the only solution is to take the union. Rather to me that suggests that WebAssembly should be parameterizable in different ways and that we be careful about layering so that problems that are not solved at a lower level can be solved by another layer up the stack.

I don't know the context of the comment "AOT, flexible separate compilation, good performance -- choose 2." but I also don't agree this is true. For example, the JVM allows AOT, flexible separate compilation, and achieves good performance, but it does this by being a little less "AOT" and not compiling all the way to machine code. Instead it solves this problem with late binding--i.e. language types and operators are still in the bytecode and they are lowered dynamically to machine code (i.e. JITed). WebAssembly is the same, except the level of the bytecode is very close to the machine, so it's much "more" AOT by being lower-level, and lowering is almost a triviality. It's a difference in degree, not kind. WebAssembly is late bound. WebAssembly is JITed, but it has a very good caching story, so it almost seems like AOT.

The separate compilation issues being raised today are a consequence of considering a new set of requirements that were at first considered out of scope for the MVP. I think it's good that we look at those requirements now before we take the plunge. But I think I can now see to the end of this road, and I see our way out. Expecting to do late binding at the lowered Wasm level--even if it's lowered to the WebAssembly GC proposal as it is today--instantly creates an ABI problem that forces every language to contort itself into unholy implementation choices, trying to build ABIs that work around a giant hole: all of their language information is missing. And the ABI problem is actually not because the GC proposal is somehow deficient, is missing a feature, or even has too many features.

It's a fundamental problem that binding language modules requires language-level types and operations. Those need to be expressible in Wasm. Since we cannot enumerate all languages now and in the future, Wasm must be parameterizable over languages. We just need to make sure that late binding (which implies late lowering) is super efficient and we can get at least 2 and a half of the things mentioned. And, if we succeed in making a good caching story, it pretty much counts as all 3 things, IMO.

@titzer
Contributor

titzer commented Aug 8, 2020

@RossTate

So if you want to describe changes, you need to describe both what they add and what they block. Two designs can add the same features but with one blocking fewer features than the other, and not having that be part of the evaluation metric fails to recognize that WebAssembly is an evolving system in a novel space filled with uncertainty.

I like the idea of describing what you add and what you block, but to be honest, the most important criterion is: what problem does this solve? Not only that, but what classes of problems does it solve?

This was supposed to be a meta discussion, so I find advocacy for #109 here distracting. I don't think we should rathole on it here. It's best to keep discussion of that proposal over there.

@RossTate
Contributor

RossTate commented Aug 8, 2020

the most important criterion is: what problem does this solve?

I disagree. It is important to consider what problems a design solves and creates. As a toy example, we can solve the problem "WebAssembly needs to efficiently support class X of languages" by specializing the design of WebAssembly around class X. But that solution creates problems for many other classes of languages because now the design has been built around the wrong invariants for them. This then puts these language classes and related proposals in competition with each other.

At the meta level, it is these created problems that cause clashes. By being cognizant of, and ideally minimizing, the problems created by designs, it will be easier to strive for the suggested additive approach.

@titzer
Contributor

titzer commented Aug 9, 2020

I disagree. It is important to consider what problems a design solves and creates.

Tech is fractal; it is like the Hydra: every solution creates new problems. It is an inescapable law of human technological advancement. Too much hesitation in the face of potential problems can cause the whole decision making process to get locked in analysis paralysis, a failure mode we need to actively avoid. Yes, we need to be aware of the problems we create. This has been part of WebAssembly's design process since the very beginning. I think we've been good about avoiding mistakes because we do think ahead to the next generation of problems beyond. But our decision making process is not solely driven by avoiding mistakes. We need to focus on actual problems and solve them. Which is why I said that the most important criterion is: what problem does this solve?

The design of WebAssembly's GC support is a difficult and important problem. There are a lot of angles to consider and a danger of making a mistake. However, we sometimes get sidetracked talking about problems that are not even concrete enough to focus effort on, and too much disconnected reasoning is creating doubt and uncertainty that slows down the people who are actually trying to solve problems. Some recent discussions also make me wonder if the shared context of what problem we are actually trying to solve isn't crisp enough to get us all pulling in the same direction. Actually, to be honest, I don't wonder--we definitely aren't all pulling in the same direction.

The most productive discussions in my opinion have all centered around specific language implementation issues and strategies. Those are best had in the context of a concrete attempt to actually implement a language and make a working system. Those discussions have a narrower cone of uncertainty than vague notions of future problems. Basically, build some stuff. One complete but limping implementation of an important language is worth 20 long threads on subtyping, IMHO.

@tebbi

tebbi commented Aug 10, 2020

I don't think anyone suggested that taking a union of all possible proposals is a good idea. The union aspect is not about proposals or designs; it is about goals or abilities. Should Wasm have separate client-side compilation of modules or late binding? There are good arguments to be made for both. If we find a design that can achieve either, depending on the needs of the respective situation, that would be ideal, because it would make Wasm applicable and useful in more situations.

Of course, combining features is hard and sometimes impossible, and this doesn't make the design work easier. Sometimes it might be best to add two separate designs that achieve different goals. But if it is possible to find a single design that elegantly solves both goals at the same time without harming either of them, then I'd argue that this is even better.

To take the GC proposals as an example: there is the goal to use static types to reduce dynamic checking, which is addressed in the current MVP proposal. There is also the goal to not model source-language types at all but only have a very limited set of types, which makes lowering complex or dynamic type systems easier. It would be possible to add both separately. But since it’s possible to slightly extend the current MVP proposal to allow for unaligned access into byte arrays and the combination of two arrays in one object, we can get all the abilities with a single unified design, which is simpler and more uniform than the sum of both proposals.

@jakobkummerow
Contributor Author

Thanks for your thoughts, everyone! I'll try to respond to some of them:

So the butterfly proposal would be "should struct and array types be merged?"

Or, paraphrasing slightly, it can be "should we add an ability to have objects that contain two (or more) arrays?", with possible answers "Yes, now", "Yes, post-MVP", and "No, never", and the "union of features" strategy would imply that if there's evidence that having such objects would be great for some use cases (and nobody can see severe downsides of having them), then that eliminates the third option and it only boils down to when we add them. Notably, we don't need to throw out and replace the current proposal in order to add butterfly objects, we can extend it.

I am concerned about the idea of designing a union of multiple proposed designs. (-- @tlively)
I'm concerned about the action plan of unioning. Language features generally don't compose so easily. [...] With every feature that is proposed, we should acknowledge the features that it conflicts with. (-- @RossTate)

Certainly, we can't and shouldn't just throw everything together, for example:

  • we don't need two very similar mechanisms that solve the same problem
  • for some of the design axes, the opposite ends are incompatible with each other and we'll have to make an either-or decision
  • as I wrote before, some features impede or block (in terms of performance or feasibility) other features, and we must certainly be extremely careful about that.

To me, the two salient points are:
(1) As far as the design process is concerned, I do think that "unioning" our ideas into a single consolidated design proposal is more likely to lead to consensus than creating (further) competing proposals, or debating their relative pros and cons. (A variant of https://xkcd.com/927/ with s/standards/Wasm GC design proposals/, essentially.)
(2) As far as the resulting overall design, or maybe I should say "feature set", is concerned, I think that we widely agree that WebAssembly should support many use cases and support them well, and in cases where that is best done by "unioning" features into the proposal, I think we should feel free to do so. (If language A has no use for feature X, but language B really benefits from it, then that's fine and we should probably add it.)

I think many of the features that took up discussion time recently can just be added to each other. Should we have i31ref? Sure, if some use cases benefit from it, and we can make it work without harming other features, why not! Should we have "butterfly objects"? Sure, if some use cases benefit from them, and we can make them work nicely with other types/objects, why not! (I see this as something of a parallel to "should Wasm have i32 arithmetic? i64 arithmetic? float arithmetic? SIMD?", the answer to all of which is "sure, they're useful for some cases, and don't harm each other, so Wasm should have them all!")

To give an example for "very similar ideas": folks have been discussing alternative ideas for tagged scalars that would replace the current proposal's i31ref. To me, that seems to be a case where one mechanism to solve the problem would be enough. We can have this discussion (as some folks certainly did!) by framing it as "should we amend the current proposal by generalizing its i31ref feature to tagged i31 anyref (or whatever)?", rather than "in the great contest of competing proposals, its approach to tagged scalars is a reason to prefer contestant A over B".

Rather to me that suggests that WebAssembly should be parameterizable in different ways

I don't think that's so different from what I'm proposing: wherever possible (of course it's not always possible), we should avoid having to make either-or decisions by providing several ways of doing things, so that modules can choose what they need.

the JVM allows AOT, flexible separate compilation, and achieves good performance, but it does this by being a little less "AOT"

I think this characterization actually makes the JVM a datapoint in support of the original statement: no system can support full AOT, separate compilation, and perfect optimization all at once; but we can design compromises, where e.g. there's an AOT part that can be done separately, decent baseline performance, and a JIT part that takes care of "late binding" aspects and advanced optimizations (and cannot be done separately or updated incrementally because it bakes in whole-system knowledge). And we can also devise a system where modules can, for instance, choose either of "I want to be fully optimized AOT, here's all the type information the compiler needs" and "I want to provide a fast edit-compile-test developer experience by only recompiling what changed, please send me down the baseline+latebinding+JIT pipeline"; or maybe even a sort-of hybrid where engines choose what to do automatically based on the features that a module uses.

Too much hesitation in the face of potential problems can cause the whole decision making process to get locked in analysis paralysis, a failure mode we need to actively avoid.

+1. We'll try our best, and yet we'll inevitably make mistakes. We certainly should put in due diligence, but at some point we should also be pragmatic and go ahead with something. One tried-and-true way of dealing with aspects of initial specifications that later on turn out to have been mistakes is to add alternatives afterwards. That's part of what motivated the meta-suggestion here: when we're torn between options that don't strictly conflict with each other (which, again, applies to some but not all of the things we've been discussing), then the fastest way towards consensus is probably to include both options in the proposal.

@RossTate
Contributor

RossTate commented Aug 10, 2020

Again, you need to consider the conflicts in the space. @tebbi and @jakobkummerow both suggest combining features that conflict with each other.

But since it’s possible to slightly extend the current MVP proposal to allow for unaligned access into byte arrays and the combination of two arrays in one object, we can get all the abilities with a single unified design, which is simpler and more uniform than the sum of both proposals.

I wish this were true, but it is not. #109 not only introduces butterflies, it also introduces tagref. Many of the examples rely on coercions to and from tagref, and the efficiency of those examples relies on those coercions being efficient, especially for the canonical funcref and structref tags.

However, the current MVP requires all reference types to be subtypes of anyref. This then requires all references to use the same downcasting mechanism. This leaves us with two choices:

  1. Use the existing downcasting mechanism, namely rtts. But let's consider coercing to funcref. As @jakobkummerow pointed out in rtt implied by ref.func #110, the current design doesn't enable one to cast values to funcref. Let's suppose we fix that, say by having a canonical rtt 1 funcref value that all typed function references are subtypes of. So then we have to use ref.cast anyref funcref : [ anyref (rtt 1 funcref) ] -> [ funcref ]. This instruction loads an rtt array, then does an array-bounds check, then does another load, and then does a comparison. Contrast that with tagref's design, in which the worst case is a load and a comparison, and the best case is just a pointer-tag check. A similar problem comes up with structref, except now every structref needs to have an additional field storing its particular rtt, so there's space overhead as well.
  2. Add a new downcasting mechanism using tags. But now an rtt downcast is slower because it first has to check if the reference being cast was allocated with the rtt mechanism or with a different mechanism. Similarly, a downcast using tags has to check if the reference being cast was allocated with the tagging mechanism.
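To make the cost comparison between these two options concrete, here is a rough C sketch of how an engine might lower each check. The structures and names are assumptions for illustration (a depth-indexed supertype display is one common way to implement rtt casts, not something the proposal mandates):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed rtt representation: each rtt carries a supertype display,
   an array of its ancestors (and itself) indexed by subtyping depth. */
typedef struct Rtt {
    uint32_t depth;
    const struct Rtt **supers;  /* supers[d] is the ancestor at depth d */
} Rtt;

typedef struct { const Rtt *rtt; /* ...fields... */ } RefHeader;

/* Option 1, rtt downcast: load the object's rtt, bounds-check the
   display, load the display entry, compare. Four dependent steps. */
static bool rtt_cast_ok(const RefHeader *h, const Rtt *target) {
    const Rtt *r = h->rtt;                       /* load */
    if (target->depth > r->depth) return false;  /* bounds check */
    return r->supers[target->depth] == target;   /* load + compare */
}

/* Option 2, tag check, worst case: one load and one compare. */
typedef struct { uint32_t tag; /* ...payload... */ } TaggedHeader;
static bool tag_check_ok(const TaggedHeader *t, uint32_t expected) {
    return t->tag == expected;
}

/* Tag check, best case: the canonical tag sits in the reference's low
   bits, so no memory access is needed at all. */
static bool ptr_tag_ok(uintptr_t ref, uintptr_t expected_low_bits) {
    return (ref & 0x3u) == expected_low_bits;
}
```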

Notice how the design choice of having a top type (i.e. anyref) creates the problem here. It causes downcasting mechanisms to clash with each other (and right now "unifies" them by imposing the slowest of constant-time mechanisms upon everyone). anyref was introduced to address the issue that many languages need a "uniform" representation across their reference values. Certainly anyref solves that problem but creates another in doing so. If we were more cognizant of the problems created by designs, we would investigate where there were other ways to solve that problem without creating this issue with downcasting. (And in fact, there are known alternatives.)

One tried-and-true way of dealing with aspects of initial specifications that later on turn out to have been mistakes is to add alternatives afterwards.

This approach is hardly tried-and-true. In various collaborations, I've worked with language teams to solve problems that have an obvious alternative solution that they cannot adopt because it would not be compatible (in some sense or another) with the prior solution they regret adopting.

For example, once you design all of WebAssembly around the expectation that anyref is the top reference type, you cannot simply solve problems that creates by introducing an alternative. For example, even if you decided to not make anyref the top type anymore, the entire ecosystem was built around this assumption. All imported types in all preexisting modules are (constrained to be) subtypes of anyref, and so any modules built using an alternative would be incompatible with all those modules and the entire preexisting ecosystem.

To give an example for "very similar ideas": folks have been discussing alternative ideas for tagged scalars that would replace the current proposal's i31ref.

A lot of that discussion was on whether to have unboxed scalars in the context of a design using a uniform representation (e.g. the current proposal's anyref or #109's tagref). When you rephrase it as just "should we amend the current proposal by generalizing its i31ref feature to tagged i31 anyref (or whatever)?", you have dodged the hard question. In particular, you've avoided considering the problems created by having unboxed scalars, especially 31-bit ones. For example, for #109 it creates the problem that there's no longer enough free low bits in the pointer to have both the funcref and structref canonical tags be implemented through bit flagging. (This doesn't matter so much for the current MVP because it has already adopted slow casting to all reference types.) These sorts of conflicts are why systems with frequent casts generally do not guarantee 31 bits for unboxed scalars; they have generally found that those lower bits are more useful for other things.
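A small sketch of the bit arithmetic behind this point may help; the exact encodings below are hypothetical, but the counting argument is general. In a 32-bit reference with 4-byte-aligned objects, reserving a 31-bit scalar payload leaves only one spare bit for pointer tags:

```c
#include <stdint.h>

/* Hypothetical 32-bit reference encoding. If the low bit marks a
   31-bit unboxed scalar, the scalar payload consumes all remaining
   31 bits of the word: */
static inline uint32_t make_i31(int32_t v) { return ((uint32_t)v << 1) | 1u; }
static inline int32_t  get_i31(uint32_t r) { return (int32_t)r >> 1; }
static inline int      is_i31(uint32_t r)  { return r & 1u; }

/* Every pointer must then keep its low bit 0. With 4-byte-aligned
   objects, only bit 1 remains free, i.e. exactly two pointer-tag
   values: not enough to flag funcref, structref, and everything else
   by bit patterns alone. (8-byte alignment would buy one more bit,
   at the cost of heap padding.) */
#define PTR_TAG_MASK  0x2u
#define TAG_FUNCREF   0x0u   /* hypothetical assignment */
#define TAG_OTHER     0x2u   /* the one and only other available tag */

static inline int is_funcref(uint32_t r) {
    return !is_i31(r) && (r & PTR_TAG_MASK) == TAG_FUNCREF;
}
```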

Too much hesitation in the face of potential problems can cause the whole decision making process to get locked in analysis paralysis, a failure mode we need to actively avoid.

This is a mischaracterization of what I stated, which was

It is important to consider what problems a design solves and creates.

I was suggesting the importance of risk awareness, which should not be conflated with aversion. I was also reemphasizing this point:

Two designs can add the same features but with one blocking fewer features than the other

That is, we can consider multiple ways to solve the same problem, and then choose to prefer one over the other because it creates fewer problems. But if we focus only on the problem solved, the only way to address a problem that is created by the current proposal is to find a better way to solve the same problems.

I think the meta-level purpose of this thread was to reduce competition. That is certainly a worthwhile purpose. But to achieve that purpose, we need to address what creates competition. It is the problems created by designs that cause competition because they are what put design interests into conflict. If two groups develop designs that solve their problems in a way that creates problems for the other group, then that puts their designs in conflict. If, on the other hand, we were to collectively start recognizing that one way to improve a design is to solely reduce the problems it creates, then we could help these groups each find a way to satisfy their needs while not blocking the other's. I had tried this approach here before, and I gave up. But I'm happy to try it again if people indicate they will actually appreciate feature-conflict-reduction as a valuable improvement in its own right. So far, though, that's not what the conversation here indicates.

@binji
Member

binji commented Aug 11, 2020

@RossTate: I think the meta-level purpose of this thread was to reduce competition.

I agree. I believe we'll have to have some amount of competition to evaluate the designs, but ultimately we should align on one MVP GC proposal. That proposal doesn't need to solve every problem at first, but it should aim to eventually cover the union of "abilities", as @tebbi put it.

@tlively: Instead, I think that the best way to make forward progress is to discuss and have a vote on more specific design principles and goals for an initial proposal. Essentially, we should all agree on where we want to be on the axes identified in the opening post. From there, we would have a shared measure against which we would evaluate proposals. Our design principles and goals can be further refined as disagreements arise.

I think this is a good idea, and @fgmccabe brought up a similar idea before. But I think to fully evaluate an idea, we'll have to prototype it (at least some of it) before we can make a decision. I think this aligns with...

@aardappel: This person then helps order these questions by which needs to be answered first to make the most progress towards a final design, and brings them in front of the (sub)committee. Each question gets voted on, the answer becomes binding, and that potentially makes the other choices easier

... assuming that we are answering these questions by evaluating designs with prototypes, as necessary. In other words...

@titzer: Basically, build some stuff.

Which is what we've already done, to some extent. The current MVP proposal is implemented in V8 (and AFAIK partially in SpiderMonkey too), and there has been some effort to use it already. We should look at these implementations as experiments and evaluate them against our requirements. As other ideas become fleshed-out enough to prototype (such as #109), we should try to find someone to implement them.

@tebbi

tebbi commented Aug 11, 2020

I agree that we shouldn’t have both tags and RTTs. They solve more or less the same problem and are difficult to implement at the same time.
It would be easy to extend the RTT system slightly so it shares the advantages of tags:
We could add an instruction ref.cast_exact that checks if an object has exactly the provided RTT, independently of the subtyping hierarchy. This instruction would be very fast: It just checks equality of the RTT value in the object header of an object and the provided RTT, so it’s a single load and equality check. If we have certain well-known RTTs, the engine could even do encoding tricks to store the RTT information as pointer tags. That would certainly be possible for i31ref, but also for other types like structref (in the setting of the current MVP, structref would be an ordinary type, but implementations could still special-case it for performance reasons).
Note that it would also be possible to change the current MVP proposal to have a single RTT for all funcref types and to use a second function entry to dynamically check a call on a funcref type.
I’m not necessarily saying this is better, just that these are possible and orthogonal design decisions that are independent of the decision to have RTTs or not.
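As a sketch of why ref.cast_exact would be cheap: the following is an engine-internal pseudo-implementation with assumed names, not specified behavior. The check degenerates to comparing one word in the object header against the operand RTT, with an optional pointer-tag fast path for well-known RTTs:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { const void *rtt; /* ...fields... */ } GcHeader;

/* ref.cast_exact: succeed only if the object carries exactly this RTT,
   ignoring the subtyping hierarchy: a single load and equality check. */
static bool cast_exact_ok(const GcHeader *obj, const void *expected_rtt) {
    return obj->rtt == expected_rtt;
}

/* For a well-known RTT such as i31ref's, an engine could instead encode
   the RTT in the reference's tag bits, making the exact check a bit
   test with no memory access at all. */
static bool cast_exact_i31(uintptr_t ref) {
    return (ref & 1u) != 0;  /* hypothetical low-bit encoding */
}
```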

@fgmccabe

I believe that the primary issue 'against' the Rossberg MVP is not a run-time issue. Essentially, the issue (AFAICT) is one of type checking; or, more specifically, checking compatibility of types. (I.e., not necessarily the runtime cost of a cast instruction itself.)
This is inherent in the so-called structural typing model (coinductive subtyping anyone?).
There are a couple of resolutions to this, one of them involves exchanging a complex type equivalence problem for occasional run-time type checks. This is the (again, AFAICT) motivation for the 'butterfly' proposal.
Another resolution would be to introduce a full scheme of so-called nominal types - this would have the practical effect of simplifying individual type equivalence problems but at the cost of a more elaborate type system - especially when you need to include generic (parametric polymorphic) types.
Prototyping the run-time aspects of a GC system is not going to 'get at' this issue.

@aardappel

aardappel commented Aug 11, 2020

@binji ... assuming that we are answering these questions by evaluating designs with prototypes, as necessary.

As much as I am a fan of making decisions based on real data, that is often not feasible. To prove out a feature, you'd have to:

  • Prototype a Wasm backend for a production language compiler using the new feature.
  • Prototype that feature in a production engine (a simpler engine will not allow realistic performance testing).
  • Possibly prototype it in the common toolchain parts the language uses.
  • Port several representative programs to that language/feature to be able to benchmark.

But then you still don't know anything, because you essentially only have 1 data point! You additionally need:

  • Repeat the same for alternative features and additional languages.
  • Prove that your benchmarks are representative and treat competing features fairly, and that any inefficiencies measured are not due to the prototype nature of the compiler or engine.

To really have any data at all from which to draw definitive conclusions. Without that last point, the results may still be meaningless if the chosen benchmarks' inner loops use one proposal's features more or less than the other's, which seems impossible to balance fairly (if your goal were to make a point about the amount of dynamic checks needed, for example).

Then there is the question, even with the above data, of whether a future, slightly smarter non-prototype compiler or engine could remove whatever overhead you measured for one or more of your data points.

In all, this is not a realistic engineering burden you can put on anyone suggesting a new feature, and even if you did, it would yield remarkably little useful information.

So often we'll need to make decisions with nothing but the hunches of a group of people with certain past experiences in language implementation.

Most important is that we start agreeing on something, be that the decision making process, the desirable outcomes we want from a Wasm GC (and their relative weights), the relative importance of features/properties, or the relative importance of languages this could work well for, or less well.

@binji
Member

binji commented Aug 12, 2020

@aardappel Fair points, but I think it's too strong to say we shouldn't bother at all with experimentation. For example, I still think that one data point is better than zero data points. And I agree that there's a significant burden if we do full implementations, but I think we can find ways to experiment with partial, focused implementations too. It's possible that some heroic optimization can come along later to upset our results, but in general we've been trying to avoid those anyway.

Anyway, I agree that some of this will have to be a hunch. But right now we have disagreement from knowledgeable people about which design will be too slow. How do we resolve that? Part of the answer is deciding what matters to us as a group (the relative weights/importance you mention). But I think another part of that is doing some experiments.

@RossTate
Contributor

this is not a realistic engineering burden you can put on anyone suggesting a new feature

It is particularly frustrating to have this burden imposed upon you when the existing proposal has undergone no such evaluation. The existing proposal is even the only one to have been developed before considering any in-depth case studies, and it still has yet to develop one, let alone multiple case studies covering a variety of languages.

How do we resolve that?

We can walk through implementation strategies of various languages and compare. Ideally these strategies would be informed by experts on the specific language's implementation challenges and techniques as well as experts on the strengths and weaknesses of the proposals so that we can collaboratively develop solutions each well fit for the various proposals.

This comment through this comment is a discussion of how to implement OCaml in two proposals.

Here is a (single) comment comparing two proposals on a Java program and on a C# program, going into detail about counting loads and such.

Now no single comparison is going to be perfect, including the ones I linked above. But collectively they can provide a pretty reasonable impression. Beyond evaluating, they can also serve as useful documentation to future language implementers on strategies to consider and pitfalls to be aware of, since WebAssembly will be fairly different from other backends they've targeted, and they can identify useful opportunities for extension (or revision).

Even if there weren't disagreements, these sorts of walkthroughs should be part of the design process. In the other language-design teams I have worked with, designs are generally not solidified or advanced to implementation until such case studies have been performed (at least informally) because they are often quite informative.

@aardappel

aardappel commented Aug 13, 2020

Apologies in advance.. but maybe.. we should all take the hint.

Already at this stage, the diversity of opinion on how we should approach GC is substantial, and we're only just getting started. It is becoming clear that, whatever design we end up choosing, it is going to be rather easy to find implementation techniques for which it is unsuitable, and it will create unnecessary overheads for many/most languages.

If you look at the literature for GC techniques for various languages, and the associated ways a language can choose to represent its types, the diversity of representations and algorithms acting on them is staggering. And of course it can be this diverse because on traditional architectures we can choose for every bit in memory what it will represent and how it will be modified. It is clear any "middle ground" GC design is going to fall well short of these abilities.

So maybe we should take a step back and ask: why are we doing this? Why are we undertaking this impossible design exercise that is going to arbitrarily restrict the most expressive device we have for representing languages (linear memory)? Can we solve our goals in a different way?

  • Certainly, "Wasm modules produced by GC languages should not be bloated by a runtime" is easiest to solve if we focused more on dynamic linking, such that we can guarantee that language runtime version X is going to be pre-compiled on the user's machine, guaranteeing small downloads, quick startup times, and best of all: full flexibility in implementation techniques. We will need to do this anyway, since language runtimes can be rather large even once you "subtract" the GC.

  • The second pillar we need is to restore some of the flexibility those native CPU GCs had, because we moved the stack out of linear memory. There are already proposals for stack walking / inspection that can help with this. Maybe there are other common GC implementation techniques we can help with some extra instructions. This in general seems easy to make forward progress on, since it only expands our options rather than limiting them, and should be less controversial. We can even add more of these once we have experience with initial GC languages in the wild; there's no need to hurry.

  • Then the third, and hardest, pillar is of course host/JS interop. This is indeed going to be hard to find a satisfying solution for, but I can't believe that we're willing to give up all this flexibility of GC implementation and invent an entirely new place to store data that is not ideal for anyone, just because we don't think we can safely interop any other way. I would really encourage people to think about: if the above 2 pillars were solid, what would be the best design for interop we can come up with? I suggested using IT in Sketch: Linear memory GC with IT #78, but maybe a design that allows actual sharing is feasible. I think it'd be fine to put some limitations on how this interop may happen, given that we have not done any of it yet :)

Don't worry, I have no illusions the above is going to make us all change our plan of a host-managed GC, but I do want us all to be aware of what we're doing, the alternatives, and the cost. If you feel the cost is warranted (seamless interop is more important than language implementation flexibility), then we're good to continue as-is.

@dcodeIO

dcodeIO commented Aug 13, 2020

I would really encourage people to think about: if the above 2 pillars were solid, what would be the best design for interop we can come up with?

Regarding interop with JS specifically, I conducted a little thought experiment in #113 (comment) by basing GC around JS's object model. TL;DR: Close to ideal interop; reasonably close to the current MVP; Wasm JIT as a way to go full circle eventually for dynamic (parts of) languages.

Probably an unpopular take, as it conflicts with the notion that JS shouldn't get any special attention (which I btw think is a mistake). But who knows, maybe there's something good to extract from it.

@tlively
Member

tlively commented Feb 16, 2022

Since this discussion, we've established a strong implementer feedback loop for the proposal and have been able to make progress on multiple thorny design problems, so it seems we've settled into a good design process. I'll close this for now, but feel free to reopen if you have more to add to the discussion.

tlively closed this as completed Feb 16, 2022