.NET Notes #77

Open
vargaz opened this issue Feb 11, 2020 · 14 comments

@vargaz

vargaz commented Feb 11, 2020

I was asked to add some notes based on the needs of Microsoft .NET implementations wrt GC in WebAssembly.

  1. Object layout
    .NET objects in memory usually consist of a header followed by object data. The header contains data such as:
  • the vtable/type pointer
  • a sync word for locking on the object
  • a length field for arrays/strings
  • for multi-dimensional arrays, a pointer to a struct describing the dimensions.

The type system of the current proposal doesn't seem to be able to support this layout, i.e. a header followed by array data. Also, .NET supports arrays of structs, e.g. an array of {ref, non-ref} structs would look like this in memory: [ref, non-ref, ref, non-ref, ...].

In general, it seems very difficult to model all possible object layouts used by GC'd languages.
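
For readers less familiar with this kind of layout, here is a minimal C sketch of "a header followed by array data" where the elements are structs stored inline. The names (`ObjHeader`, `Pair`, `PairArray`, `pair_array_new`) and the exact fields are assumptions made up for illustration; they are not taken from any actual .NET runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical object header; field names are illustrative only. */
typedef struct {
    const void *vtable;   /* vtable/type pointer */
    uint32_t    sync;     /* sync word used for locking */
} ObjHeader;

/* One array element: a reference stored next to a non-reference. */
typedef struct {
    void   *ref;          /* reference a collector would have to trace */
    int32_t value;        /* plain data a collector would have to skip */
} Pair;

/* Header, then length, then the elements inline in the same allocation. */
typedef struct {
    ObjHeader header;
    uint32_t  length;
    Pair      items[];    /* C99 flexible array member */
} PairArray;

/* Allocate a zeroed array object with `length` inline elements.
   Sketch only: it asserts that the allocation succeeded. */
PairArray *pair_array_new(uint32_t length) {
    PairArray *a = calloc(1, sizeof(PairArray) + (size_t)length * sizeof(Pair));
    assert(a != NULL);
    a->length = length;
    return a;
}
```

The elements alternate ref, non-ref, ref, non-ref in one contiguous block, which is the interleaving described above.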

  2. Interior pointers
    In .NET, it's quite common to have pointers into the middle of objects (arrays), and pointers to one past the end of an array. anyref doesn't seem to be able to support this.

  3. Interop with C/C++ code
    The .NET runtimes are written in C/C++ and assume that object references are normal C pointers which point to linear memory, and that objects can be accessed from C code as pointers to C structs. The current proposal places allocated objects outside linear memory and adds new accessors to read/write their contents. Allowing manipulation of these objects from C code would require extensions to the C compilers.

  4. Finalization
    The .NET runtime needs to be notified somehow when an object with a finalizer dies.

  5. Weak references
    .NET supports multiple kinds of weak references which might not be supported by the underlying JS GC.

  6. Non-web runtimes
    Non-web runtimes would need to add a GC implementation, since GC is such a core feature that it probably cannot be treated as an optional feature like SIMD.

  7. LLVM support
    The new types/type constructs don't exist in LLVM, and it is not clear how they can be added.

@rossberg
Member

  1. Object layout
    [...]
    The type system of the current proposal doesn't seem to be able to support this layout, i.e. a header followed by array data. Also, .NET supports arrays of structs, e.g. an array of {ref, non-ref} structs would look like this in memory: [ref, non-ref, ref, non-ref, ...].

This form of nested aggregates is supported (and, I agree, essential) as a Post-MVP feature. We cut it from the MVP because it can always be replaced with an indirection, so it isn't strictly required for functional completeness. That decision could be reversed, of course, but it is tricky keeping the MVP small.

  2. Interior pointers
    In .NET, it's quite common to have pointers into the middle of objects (arrays), and pointers to one past the end of an array. anyref doesn't seem to be able to support this.

Interior references are part of the nested aggregates extension described above, so equally Post-MVP atm. They will be a distinct type from regular references (which can be converted into them). The reason is that this allows engines to represent them differently, e.g., as fat pointers. In past discussions, .NET folks believed this would probably be good enough for .NET, because inner pointers only arise in specific contexts, but this requires further investigation.
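
As a rough illustration of the fat-pointer idea, an interior reference can be modelled as a reference to the enclosing object plus an element index. This is only a sketch under assumed names (`IntArray`, `InteriorRef`, `interior_ref`, `interior_load`, `interior_advance`); it is not how the proposal or any engine defines interior references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array object: a length followed by inline elements. */
typedef struct {
    uint32_t length;      /* number of valid elements, at most 8 here */
    int32_t  data[8];
} IntArray;

/* Fat interior reference: the enclosing object plus an element index.
   A collector would only need to trace `base`. */
typedef struct {
    IntArray *base;
    uint32_t  index;      /* may equal base->length (one past the end) */
} InteriorRef;

/* Create an interior reference; one-past-the-end is allowed. */
InteriorRef interior_ref(IntArray *base, uint32_t index) {
    assert(base != NULL && index <= base->length);
    InteriorRef r = { base, index };
    return r;
}

/* Load through the reference; one-past-the-end must not be dereferenced. */
int32_t interior_load(InteriorRef r) {
    assert(r.index < r.base->length);
    return r.base->data[r.index];
}

/* Pointer-arithmetic equivalent: move forward by n elements. */
InteriorRef interior_advance(InteriorRef r, uint32_t n) {
    return interior_ref(r.base, r.index + n);
}
```

Because the base reference travels with the index, the enclosing object stays identifiable even when the reference points one past the end of the array.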

  3. Interop with C/C++ code
    The .NET runtimes are written in C/C++ and assume that object references are normal C pointers which point to linear memory, and that objects can be accessed from C code as pointers to C structs.

Yes, this is a known limitation. It is unlikely that much can be done about it directly. However, interface types may be able to emulate this interop.

  4. Finalization
  5. Weak references

These are tough ones, and we don't have a good idea yet how to support the myriads of different finalisation semantics out there without creating a zoo. Very likely Post-post-MVP, but suggestions are welcome.
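
To make the finalization requirement concrete, here is a minimal C sketch of one possible notification scheme: a runtime-managed collector whose sweep phase calls a finalizer callback before reclaiming an unreachable object. The names (`Cell`, `Finalizer`, `sweep`, `count_finalizer`) are hypothetical, the mark phase is omitted, and this is just one of the many possible semantics alluded to above, not what .NET or the proposal specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*Finalizer)(void *payload);

/* Hypothetical heap cell tracked by a runtime-managed collector. */
typedef struct {
    void     *payload;
    Finalizer finalizer;  /* NULL if the object has no finalizer */
    bool      marked;     /* set by a mark phase (not shown) */
    bool      live;       /* false once the cell has been reclaimed */
} Cell;

/* Sweep: reclaim unmarked cells, notifying their finalizers first.
   Returns the number of finalizers that ran. */
size_t sweep(Cell *cells, size_t count) {
    assert(cells != NULL || count == 0);
    size_t finalized = 0;
    for (size_t i = 0; i < count; i++) {
        Cell *c = &cells[i];
        if (!c->live) continue;          /* already reclaimed */
        if (c->marked) {
            c->marked = false;           /* reset for the next cycle */
            continue;
        }
        if (c->finalizer != NULL) {
            c->finalizer(c->payload);    /* the "object died" notification */
            finalized++;
        }
        c->live = false;
    }
    return finalized;
}

/* Example finalizer: increments the int counter that payload points to. */
void count_finalizer(void *payload) {
    (*(int *)payload)++;
}
```

With an engine-provided GC, the runtime no longer owns the sweep, so it would need some engine-level hook in place of the direct callback shown here.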

  6. Non-web runtimes
    Non-web runtimes would need to add a GC implementation, since GC is such a core feature that it probably cannot be treated as an optional feature like SIMD.

It is a stated goal that GC remains an optional feature. We have been very careful to design this and other features such that there are no unwanted dependencies on the presence of GC.

  7. LLVM support
    The new types/type constructs don't exist in LLVM, and it is not clear how they can be added.

True, but that's a toolchain problem that needs solving and that is fundamentally unavoidable.

@Horcrux7

@vargaz There is an alternative GC suggestion, https://github.com/soil-initiative/gc/pull/1, that should be a better match for .NET.

The example for an OO language may be of interest: https://github.com/soil-initiative/gc/blob/103eb72aaa7f3a7ecb3a436ce95ae9f108311799/proposals/gc/NomOO.md

@aardappel

(3) is interesting to me, in the sense that I've mentioned it before as a major issue with the current GC design. Most languages come with significant C or C++ runtimes that assume direct access to language data. These runtimes would need to be rewritten in a Wasm-GC aware language, which in some cases may be impractical. If these languages wish to keep using their existing runtime, they will be forced to do their own GC in linear memory, losing out on many benefits.

I have no good solutions either, I just think we should be more aware of this tradeoff. I can imagine that an alternative GC proposal that allows objects to live in linear memory would work much better for many existing languages, but is likely much harder to make work with host interop.

In some sense, the current GC proposal favors new language implementations, or even new languages/dialects.

@tlively
Member

tlively commented Feb 14, 2020

I am interested in the LLVM support problem, but I'm not familiar with the .NET ecosystem. Does .NET currently use LLVM, or is the question of LLVM support only coming up because LLVM is the only compiler toolkit that currently targets WebAssembly?

@vargaz
Author

vargaz commented Feb 14, 2020

What I meant in (7) is that if this proposal is implemented, then LLVM would have to add all these type constructs to its IR somehow, and it's not clear how that can be done. Perhaps by using LLVM metadata on types which is read by the wasm backend.

@tlively
Member

tlively commented Feb 14, 2020

Right, but I'm wondering how LLVM relates to the .NET ecosystem. It's actually possible that we will not end up implementing GC or other features in LLVM if for instance no LLVM frontend could feasibly make use of those features. If you have an LLVM frontend that would like to use GC types, that would be very useful information.

@vargaz
Author

vargaz commented Feb 14, 2020

Currently, our .NET for WebAssembly project is built on top of LLVM, i.e. we compile .NET bytecode to LLVM bitcode to wasm.

@aykevl

aykevl commented Mar 20, 2021

@tlively A late reply from my side, now that I've discovered this issue.

I'm hitting issues very similar to .NET's with TinyGo (which is based on LLVM), which also has a runtime that assumes objects are in linear memory. While the proposal seems interesting, I have a hard time imagining how this would fit in LLVM. It will probably also require major changes to the TinyGo compiler.

I really wish I could use the WebAssembly GC in TinyGo as using the regular TinyGo GC has many problems (such as circular references), but as it is now I don't see how that would feasibly work.

@tlively
Member

tlively commented Jul 25, 2022

FWIW, @pmatos and @asb are working on adding support for reference types (eventually including GC types) to LLVM and even clang, so it is likely that LLVM-based languages and languages with C-based runtimes will eventually be able to use WasmGC, although it will still require some source changes.

@pmatos

pmatos commented Jul 26, 2022

@aykevl As mentioned by @tlively, LLVM already has reference types support and we are in the process of adding support for this in Clang, see:
https://reviews.llvm.org/D122215
https://reviews.llvm.org/D128440
https://reviews.llvm.org/D123510
https://reviews.llvm.org/D124162

Next our work will focus on implementing the GC proposal in LLVM (something @asb has already started thinking about), and bringing that proposal to Clang.

@bashor

bashor commented Sep 22, 2022

@pmatos, @asb
I'm wondering which use cases you intend to support with the GC proposal in LLVM. In other words, who is the target client/audience of the feature?
What would it look like for projects that want to use it? Will it be implemented on top of generic GC support in LLVM or something else?

(I'm not an expert in LLVM)

@asb

asb commented Sep 23, 2022

@bashor primarily sharing object graphs across the wasm boundary.

Whether it will be built on LLVM's GC support is a good question (and a common one). Wasm's GC types and instructions are actually at quite a different level - you could imagine how it might have taken a different path where wasm code generators had to communicate information about the locations of GC types in memory (which would make LLVM's GC support more relevant), but that's not the direction it went in. Wasm GC types are heavily restricted in that you can't store them in linear memory, only store them in Wasm tables, locals, globals, and function params/returns. See the Wasm GC overview for a little bit more background.
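
A common way to live with that restriction is to keep the reference in a table slot and store only the slot index in linear memory. The C sketch below simulates this with an ordinary array standing in for a Wasm table; `ref_table`, `handle_acquire`, `handle_get`, `handle_release`, and `NativeRecord` are hypothetical names, and in real Wasm the slots would hold opaque references rather than C pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16

/* Stand-in for a Wasm table of references (assumption: simulated in C). */
static const void *ref_table[TABLE_SIZE];

/* Store a reference in the first free slot; returns its index, or -1 if full. */
int32_t handle_acquire(const void *ref) {
    assert(ref != NULL);
    for (int32_t i = 0; i < TABLE_SIZE; i++) {
        if (ref_table[i] == NULL) {
            ref_table[i] = ref;
            return i;
        }
    }
    return -1;
}

/* Look up the reference behind a handle (NULL if the slot is empty). */
const void *handle_get(int32_t handle) {
    assert(handle >= 0 && handle < TABLE_SIZE);
    return ref_table[handle];
}

/* Clear the slot so the table no longer keeps the object alive. */
void handle_release(int32_t handle) {
    assert(handle >= 0 && handle < TABLE_SIZE);
    ref_table[handle] = NULL;
}

/* A linear-memory struct stores the integer handle, not the reference. */
typedef struct {
    int32_t owner_handle;
    int32_t value;
} NativeRecord;
```

The cost of this pattern is that handles must be released explicitly, since a collector cannot see an integer in linear memory as a reference.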

Hope that helps.

@malekbr

malekbr commented Sep 16, 2023

  4. Finalization
  5. Weak references

These are tough ones, and we don't have a good idea yet how to support the myriads of different finalisation semantics out there without creating a zoo. Very likely Post-post-MVP, but suggestions are welcome.

Is there documentation/a discussion log of the different finalization semantics to consider?

@rossberg
Member

@malekbr, AFAICT, nobody has made any concrete suggestions so far. As mentioned in my reply, it's rather non-obvious. Now that the GC MVP is done, this topic could use a champion to investigate as a separate proposal.
