Skip to content
This repository was archived by the owner on Apr 25, 2025. It is now read-only.

State principal types#91

Merged
rossberg merged 2 commits into
subsumptionfrom
principal
Feb 21, 2023
Merged

State principal types#91
rossberg merged 2 commits into
subsumptionfrom
principal

Conversation

@rossberg
Copy link
Copy Markdown
Member

@rossberg rossberg commented Feb 14, 2023

As discussed in the last meeting, this describes the relevant Principal Types properties for Wasm, both informally and as a formal statement. It is written in the form of a small new section for the spec Appendix.

It states two forms of principality: the general property, assuming nothing about the type, and a non-standard partial "forward" property, assuming the input types are already known (as in an engine validator).

I checked all typing rules. Mostly these properties are straightforward to show, but there are two familiar friends causing extra complication for general principality:

  • br_table: Requires a glb over the types of the labels, which in turn requires proving that subtyping forms a semi-lattice. That is currently the case, but may be worth stating explicitly as well.

  • select without annotation: Requires introducing a special case type variable that only ranges over number and vector types.

These properties also hold for the GC proposal, provided we go with source + target type annotation for br_on_cast and br_on_cast_fail (ref.test and ref.cast only need the target type, since they can be typed with the respective top type for input).

@tlively, @conrad-watt, PTAL.

Copy link
Copy Markdown
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me.


.. note::
For example, in isolation, the instruction :math:`\REFASNONNULL` has the type :math:`[(\REF~\NULL~\X{ht})] \to [(\REF~\X{ht})]` for any choice of valid :ref:`heap type <syntax-type>` :math:`\X{ht}`.
Moreover, if the input type :math:`[(\REF~\NULL~\X{ht})]` is already determined, i.e., a specific :math:`\X{ht}` is given, then the output type :math:`[(\REF~\X{ht})]` is fully determined as well.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to clarify somewhere that "input type" can mean both type parameters and input stack types?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Input types only refers to stack operands here, i.e., the l.h.s. of an instruction type's arrow. Type immediates are always given, since they are a mandatory part of an instruction itself.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I had previously misunderstood the example. Makes sense!

@tlively
Copy link
Copy Markdown
Member

tlively commented Feb 15, 2023

  • br_table: Requires a glb over the types of the labels, which in turn requires proving that subtyping forms a semi-lattice. That is currently the case, but may be worth stating explicitly as well

I do think this is worth stating as well.

@rossberg
Copy link
Copy Markdown
Member Author

@tlively, done. Also added the symmetric statement about lubs, and a statement about the disjointness of subtyping hierarchies. PTAL.

@rossberg
Copy link
Copy Markdown
Member Author

PS: I wonder if there is a concise mathematical term for a structure that's a (semi)lattice up to the lack of a top element (i.e., where elements can be unrelated), but I couldn't find anything.

Copy link
Copy Markdown
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I'm happy to see conditional LUBs in there, since we heavily depend on their existence in Binaryen.

The trick of saying "this would be a lattice if you added this provisional top type" is what I would expect to see, FWIW. I don't know of a single term for that, either.

@tlively
Copy link
Copy Markdown
Member

tlively commented Feb 15, 2023

Do you think it would be worth presenting these properties briefly to the larger CG as an FYI since they in principle will constrain the design of other proposals as well?

@rossberg
Copy link
Copy Markdown
Member Author

Yes, presenting them to the CG probably makes sense. I'll add an agenda item.

@rossberg
Copy link
Copy Markdown
Member Author

@conrad-watt, please let me know if you still want to review this as well, otherwise I'll land it.

@askeksa-google
Copy link
Copy Markdown

(ref.test only needs the target type, since it can be typed with the respective top type for input).

Isn't this also the case for ref.cast? The output type of ref.cast is exactly the target type, independently of its input.

@rossberg
Copy link
Copy Markdown
Member Author

@askeksa-google, yes, you're right. I corrected the comment.

@tlively
Copy link
Copy Markdown
Member

tlively commented Feb 16, 2023

@rossberg, multiple stakeholders on the V8 team still aren't sure the principle types property is worth the extra code size from annotations. Would you be able to connect us to a researcher (or anyone else) who has benefited or will benefit from this property?

@rossberg
Copy link
Copy Markdown
Member Author

rossberg commented Feb 17, 2023

There is a meet-up of Wasm researchers next month, where I could try to gather some infos.

That said, I don't think it's productive to ask for reopenening the debate. We made the decision, it took a long time to agree, there are other non-trivial issues that depend on the decision made, and most importantly, it is a conservative choice that could be amended later, which is not true the other way round.

The usual criterion for revisiting a decision is significant new data. The only new discovery right now is the need for an additional annotation on two relatively cold and heavyweight instructions, which is likely in the noise. I don't think that justifies restarting the entire debate all over again, and after the prior resolution has already been implemented by everybody. We are on the final stretch of the MVP and I much rather focus on finishing up.

@tlively
Copy link
Copy Markdown
Member

tlively commented Feb 17, 2023

There is a meet-up of Wasm researchers next month, where I could try to gather some infos.

That would be great, thanks!

FWIW, we've measured these annotations to have a code size impact of a couple percentage points on top of the ~5% from the previous round of annotations, so the impact is small but not in the noise, and the cumulative impact of the annotations is moving out of "small" territory. This is a very real cost our users will be paying, so we'd just like to understand who benefits from it and how.

@tlively
Copy link
Copy Markdown
Member

tlively commented Feb 17, 2023

I just measured with the latest sheets calc engine binary and the numbers are much better than the estimates I reported above:

uncompressed uncompressed ratio gzip gzip ratio brotli brotli ratio
no annotations 7186993 1 1862429 1 1244783 1
current annotations 7513274 1.04539882 1928221 1.03532591 1285588 1.032780814
all annotations 7513274 1.04539882 1928221 1.03532591 1285588 1.032780814

Edit: I just discovered that the module does not actually contain any br_on_cast{_fail} instructions and the reported size increase above is simply due to roundtripping through binaryen without performing optimizations 🙃 I've edited the table above to remove that effect.

This points to missing optimization opportunities in Binaryen since it is not transforming any ref.test + br_if + ref.cast patterns into br_on_cast{_fail} instructions. However there are only 796 instances of ref.test in the module I'm looking at, so even if we managed to optimize all of them to br_on_cast, the effect of making br_on_cast larger would be negligible.

@rossberg
Copy link
Copy Markdown
Member Author

Thanks for the (updated) numbers! I take it then that you are fine with including the extra annotations on br_on_cast[_fail]?

@askeksa-google
Copy link
Copy Markdown

Corresponding numbers for a module produced by dart2wasm:

uncompressed uncompressed ratio gzip gzip ratio brotli brotli ratio
no annotations 1584286 1 524384 1 395927 1
current annotations 1655357 1.044860 551765 1.052216 415750 1.050067
all annotations 1655373 1.044870 551703 1.052097 416245 1.051318

The code section (which is the only one affected by the annotations) is about 73% of the module size.

As can be seen, the br_on_cast[_fail] instructions are also very sparingly used here. Due to type canonicalization, they are generally not very useful as a component in source-level type checking.

@tlively
Copy link
Copy Markdown
Member

tlively commented Feb 21, 2023

Thanks for the (updated) numbers! I take it then that you are fine with including the extra annotations on br_on_cast[_fail]?

Yes, this PR LGTM.

@rossberg rossberg merged commit 314a335 into subsumption Feb 21, 2023
@rossberg rossberg deleted the principal branch February 21, 2023 17:20
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants