Implementation-defined limits #335

Closed
tlively opened this issue Nov 1, 2022 · 26 comments · Fixed by #360

Comments

@tlively
Member

tlively commented Nov 1, 2022

The JS spec has a section on implementation-defined limits. Several new limits will have to be introduced for the GC proposal.

  1. The limit on the number of types is currently one million. We should define limits on the number of rec groups and the size of each rec group. Should we reinterpret the current limit as the limit on rec groups, or should we additionally enforce a limit on the total number of types across all rec groups? Should we put the total number of types into the binary format of the type section so engines can continue to check this limit eagerly?

  2. We should define a limit on the number of fields allowed in a struct.

  3. We should define a limit on the number of arguments array.new_fixed can take (keeping in mind that this will be used to initialize very large global arrays, assuming #260 (Initializing array from data or element segment) is not solved for the MVP).

  4. We should define runtime limits on the size of arrays.

And I am probably missing additional limits as well.

@titzer
Contributor

titzer commented Nov 2, 2022

I don't think we should limit (4); it's more akin to a limit on the number of runtime memory pages, or an OOM situation. The other examples are limitations on the size of modules the engine will accept.

For (1), I suggest we keep the limit of 1 million (total) types and additionally limit the number of types in a recursion group to 1 million.

@tlively
Member Author

tlively commented Nov 2, 2022

> I don't think we should limit (4); it's more akin to a limit on the number of runtime memory pages, or an OOM situation.

FWIW this section of the spec also specifies that at runtime "the maximum number of pages of a memory is 65536" so there is precedent and opportunity to specify runtime limits as well. I don't feel strongly that it's necessary to do so, though.

@bvibber

bvibber commented Nov 3, 2022

The memory size limit is the maximum addressable size, so isn't really a policy limit. :)

With arrays there are two problems: exhausting the heap with a single object that can't be allocated (especially when using 32-bit heaps, such as with compressed pointers), or allocating and using so much memory that it causes the client machine to swap to death (if the arrays themselves are stored outside the heap, like ArrayBuffers in JS, or with a 64-bit heap). In general, browser and Node environments handle these cases poorly, but everybody assumes it's fine to crash the tab or hang the computer.

@eqrion
Contributor

eqrion commented Dec 7, 2022

Another limit we'll need to decide is the maximum subtype depth for type definitions. This is useful for preventing O(n^2) metadata size growth in implementations that store all transitive supertypes in a vector for fast subtyping checks. I believe V8 has a limit of 31, and SM has followed that for our casting implementation.
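
A minimal sketch of that scheme, for illustration only (TypeDef and the field names are invented, not any engine's actual code): each type stores its full ancestor chain, so a cast check is one bounds check plus one load and compare, but a chain of depth d costs d entries per type, which is where the quadratic metadata growth comes from.

#include <cstdint>
#include <vector>

struct TypeDef {
  uint32_t depth;                      // this type's subtyping depth (0 for a root type)
  std::vector<const TypeDef*> supers;  // supers[d] = ancestor at depth d; supers[depth] == this type
};

// Constant-time subtyping check used for casts, independent of hierarchy depth.
bool IsSubtypeOf(const TypeDef* actual, const TypeDef* expected) {
  return expected->depth < actual->supers.size() &&
         actual->supers[expected->depth] == expected;
}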

@tlively
Member Author

tlively commented Dec 7, 2022

31 seems low. In my brief searching I couldn't find any good measurements of maximum Java class depth seen in the wild, but this post from 2006 suggests that at that point the JVM supported a depth of 60: https://www.javaspecialists.eu/archive/Issue121-How-Deep-is-Your-Hierarchy.html. Would a limit of 64 (or 63) be acceptable?

@eqrion
Contributor

eqrion commented Dec 7, 2022

That seems fine to us. I'm not sure if @jakobkummerow has a different opinion though as 31 came from V8.

Also regarding dynamic array size limits, we have one internally to simplify some code generation logic. It's a byte length limit of ~1.9 GiB (1987654321 bytes). We could probably round that up to 2 GiB, which would align with our 32-bit platform ArrayBuffer limit.

@eqrion
Contributor

eqrion commented Dec 7, 2022

Regarding recursion group and type definition limits, I would suggest we limit the total number of recursion groups to 1 million (effectively re-interpreting the existing limit) and add a limit of 1 million total types (when recursion groups are flattened to form the type index space). This will ensure that either index space (recursion groups or type definitions) has a nice upper bound, which is where these limits help out the most IMO.

A pre-declaration of the total number of type definitions would be nice for pre-sizing some buffers, but I don't think that's easily addable to the binary format, so I'm fine with the limit not being checked eagerly.

@jakobkummerow
Contributor

We picked 31 as the subtyping depth limit back when we needed to store value kind, heap type index, and subtyping depth in a single 32-bit field (for explicit "rtt with depth" values, which have since been scrapped from the proposal). With the current design there is no shortage of bits, so I have no hard objections to raising the subtyping depth limit to 63 (or 60, if we think the ability to store internal sentinel values might come in handy). That said, I also haven't yet heard of a case where 31 would have been insufficient, so I'm also fine with keeping 31 until someone actually runs into that limit.

@eqrion
Contributor

eqrion commented Dec 8, 2022

If we add explicit RTTs back, will they require the depth to be part of the value type as before? If so, then I'd prefer to start with limiting to 31 and expand if needed in the future. We ran into the bit packing problem as well, and it's nice to spare the bits.

@tlively
Member Author

tlively commented Mar 3, 2023

I've scheduled this for discussion at Tuesday's subgroup meeting. It would be great if we could close this issue out at that meeting.

Right now the leading proposed limits are:

  • 31 as the maximum subtyping depth limit
  • 1 million recursion groups
  • 1 million total types (unchanged from existing limit)
  • No limit on array sizes (besides what is addressable)

We have not discussed limits on:

  • number of struct fields
  • number of operands to array.new_fixed

@rossberg
Member

rossberg commented Mar 4, 2023

Probably won't be able to attend the next meeting, but I'm generally agnostic to the specifics of limits. Only the 31 subtyping depth limit seems very low -- if I understood @chambart and @zapashcanon correctly, they currently use much deeper hierarchies for their OCaml backend.

@jakobkummerow
Contributor

Array sizes: I'm fine with not specifying a particular limit. FWIW, V8 currently has a general per-object size limit of 1 GiB on its managed heap.

Struct fields: We initially tried to get by with a limit of 999 struct fields, but got feedback from partners that that wasn't enough. Our currently implemented limit is 2,000, and that appears to be sufficient. I wouldn't object to raising it a little further; that said, for implementing fast trap-handler-based null checks it's convenient if the maximum struct size isn't overly large, so I wouldn't raise the limit much further than necessary.

array.new_fixed: We initially had the same 999 limit there, and got feedback from partners that that wasn't enough, so we raised it to 10,000, which appears to be sufficient. The operands are accumulated on the value stack, which is typically implemented on the machine stack, which imposes much lower limits than the resulting heap-allocated array: 10,000 is fine, 50,000 would probably also still be fine, 100,000 is seriously doubtful, more than that would cause major headaches. (Somewhat related: Wasm limits the number of locals in a function to 50,000.) So I'd vote for 10,000 until evidence emerges that we need to increase it a bit.

Subtyping depth: I have nothing to add to the discussion above (the handful of posts dated December 2022).

@askeksa-google

For its closure representation, Dart currently uses a subtyping depth corresponding to the (maximum) number of positional parameters. This could be optimized to skip parameter counts that do not occur in the program. Still, 31 seems low.

@chambart

chambart commented Mar 7, 2023

For OCaml we have different representation strategies for structs.
One is obviously to encode an OCaml struct as a Wasm struct, but we need the struct of size n + k to be a subtype of the struct of size n. So we have a very long subtyping chain like

(type $s1 (struct (field (ref eq))))
(type $s2 (sub $s1 (struct (field (ref eq)) (field (ref eq)))))
(type $s3 (sub $s2 (struct (field (ref eq)) (field (ref eq)) (field (ref eq)))))
...

And the largest 'structs' in OCaml are compilation units (which are also modules that the user can manipulate). Those tend to be quite large and can have hundreds or thousands of fields.

For that compilation strategy we would need long subtyping chains. The alternative would be to compile every struct to an array, which would require bounds checks for every field access and limit the precision of the types we can represent, since every field would have to be of the same type.

@manoskouk
Contributor

Since we have some minor concerns about the struct field limit, we might be incentivised to keep it low. Implementations can always implement larger structs with an additional indirection.

@tlively
Member Author

tlively commented Mar 7, 2023

In the subgroup meeting today we discussed these limits:

  • Struct size: 10k fields
  • array.new_fixed operands: 10k operands
    • We should try to support larger arrays by making array.new_data and array.new_elem allowed in constant expressions post-MVP

For subtype depth, there were concerns about allowing extremely deep subtyping hierarchies because the space or time costs of casts would become unreasonable. @chambart, how high would you need the subtype depth limit to be for your compilation scheme to work on real-world code?

I'm also pretty sure it would be possible to implement casting for deep hierarchies with a logarithmic time and space cost. In addition to the array of k immediate supertypes used for constant time casting, each RTT would store an array of exponentially more distant supertypes. For example, each RTT could store up to 32 immediate supertypes, then the 64th, 128th, 256th, etc. supertypes. To find the 100th supertype, it would suffice to find the 4th supertype of the 32nd supertype of the 64th supertype. Obviously this would be more complicated, but it would be a significant improvement over linear search for large hierarchies.
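
A rough sketch of that lookup, purely hypothetical (RttInfo, near, and far are invented names, not an engine's actual representation): near holds the 32 closest supertypes and far holds the exponentially spaced ones (the 64th, 128th, and so on), so finding the 100th supertype takes the three hops described above.

#include <cstdint>
#include <vector>

struct RttInfo {
  std::vector<const RttInfo*> near;  // near[i] = supertype (i + 1) levels up, at most 32 entries
  std::vector<const RttInfo*> far;   // far[j]  = supertype 64 * 2^j levels up, where it exists
};

// Find the ancestor `k` levels above `rtt`, or nullptr if the chain is shorter than k.
const RttInfo* AncestorAt(const RttInfo* rtt, uint32_t k) {
  while (k > 0) {
    if (k <= rtt->near.size()) return rtt->near[k - 1];  // within the dense window: one hop
    if (rtt->near.size() < 32) return nullptr;           // chain ends before reaching level k
    // Take the largest stored jump that does not overshoot; fall back to a 32-level hop.
    uint32_t jump = 32;
    const RttInfo* next = rtt->near[31];
    for (size_t j = 0; j < rtt->far.size(); ++j) {
      uint32_t span = 64u << j;
      if (span <= k) { jump = span; next = rtt->far[j]; }
    }
    rtt = next;
    k -= jump;
  }
  return rtt;
}

A cast "is x an instance of Y" would then compare AncestorAt(rtt_of(x), depth(x) - depth(Y)) against Y's RTT, so each RTT stores only the 32 near entries plus a logarithmic number of far links rather than its entire supertype chain.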

@chambart

chambart commented Mar 8, 2023

I don't know exactly which limit would be the point where I consider switching, but if it is in the thousands I would be happy with it.
Also, if there is this kind of strategy for implementing RTT checks, it is probably a good idea to specify the limits within which casts are expected to be fast and where they are not. Otherwise the cost model of different runtimes could potentially differ a lot.

@tlively
Member Author

tlively commented Mar 9, 2023

After talking with the V8 team today, it sounds like we would be OK with having a higher limit (e.g. 1000), but we probably wouldn't spend much time optimizing for deep hierarchies right away. We're most likely to keep constant-time casting without any additional casting schemes to start, so deep hierarchies would consume a good deal of memory. Does that sound like an acceptable outcome?

Edit: alternatively, we might switch over to linear-time casting for large hierarchies (e.g. depth more than about 100), which would have a performance downside rather than a space downside.

@chambart, unfortunately the spec doesn't have any precedent or mechanism to specify algorithmic complexity or performance. In principle we could add a non-normative note, but I think typically we try to stay out of the business of specifying performance.

@tlively
Member Author

tlively commented Mar 9, 2023

@rossberg, since you have historically been a proponent of predictable performance, I'd be curious to hear your thoughts on how to balance the desire to support deep casting hierarchies with the time/space trade-off that would require.

@chambart

@tlively I kind of feel that, on the contrary, the cost model has always been obvious for everything. The GC and exception proposals are the first ones where there are real choices in potential implementation strategies that would affect the algorithmic cost. For instance, until now it has always been quite clear that a cast is a cheap constant-time operation.

I would rather have a somewhat low limit where I know the cost of casting than a high limit where the best compilation strategy depends on the runtime.

Of course for my purposes it is simpler to have a high limit and not care about the size of structs, but with short subtyping chains we could still emulate large structs with trees of structs. This would have a logarithmic cost, like what you are suggesting, with a cast per tree level. But given that the vast majority of structs allocated are small, very few trees would be created and the overall overhead would be quite small.

@tlively
Member Author

tlively commented Mar 13, 2023

Thanks for the input, @chambart. In that case, it sounds like we'll get the strongest consensus for a depth limit of 63 for now. We can consider raising it in the future and discuss complexity/performance/memory trade offs at that point.

@rossberg
Member

Yes, I don't think engines should do a linear fallback. Better to have a hard limit. 63 still seems rather low, though.

Out of curiosity: all other limits are powers of 10, I believe, so why not this one?

@tlively
Member Author

tlively commented Mar 15, 2023

63 comes from a desire to make it possible for the depth to be encoded in some of the representation bits of a future explicit RTT type without an extra indirection. I'm not sure exactly how all the bits are allocated, so I don't know if 127 or 255 would also work, for example.

@eqrion and @jakobkummerow, how many bits do we have available, assuming we want to maintain the ability to use bit packing for future RTT types?

So far we have not seen any use cases that need a depth greater than 63 but also do not need depths that would require linear or logarithmic fallbacks, so it seems like a fine initial limit. We can always consider raising it later if a need arises.

@jakobkummerow
Contributor

The bit packing we had (back when we supported RTT-with-depth) for a 32-bit ValueType representation was:

  • 5 bits for the "kind" (i32|...|ref|ref null|...). Checking this today, I think we currently only store 12 different values there, and I don't remember why we reserved more than 4 bits; possibly for shorthands like funcref|eqref|etc.
  • 20 bits for the referenced type index (if "kind" is a reference type). This is capped by 1 million max types, which is just under 1<<20, conveniently leaving space to encode a handful of generic types using sentinel values.
  • 5 bits for the depth (if "kind" is "rtt with depth"), leading to a max depth of (1<<5)-1 == 31 (which apparently was enough for anyone who used our prototype back then).
  • 2 unused bits for future use (e.g. more "kinds"), one of which would be used up for increasing the max depth from 31 to 63.
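
For concreteness, here's a hypothetical rendering of that packing as bit fields (the widths are the ones listed above; the names and C++ form are invented, not V8's actual code):

#include <cstdint>

// 5 (kind) + 20 (type index) + 5 (rtt depth) + 2 (spare) == 32 bits.
struct ValueType32 {
  uint32_t kind : 5;        // i32 | ... | ref | ref null | ...
  uint32_t heap_index : 20; // capped by 1,000,000 max types, just under 1 << 20
  uint32_t depth : 5;       // max depth (1 << 5) - 1 == 31
  uint32_t spare : 2;       // one of these would be consumed by raising the max depth to 63
};
static_assert(sizeof(ValueType32) == sizeof(uint32_t), "packs into a single 32-bit word");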

We could probably come up with a different packing scheme if necessary, but why make it necessary without a driving use case?

Most but not all other limits are powers of 10. Module size, function body size, number of locals, and number of memory pages are not.

@eqrion
Contributor

eqrion commented Mar 15, 2023

We currently use 64 bits for ValType, with:

  • 8 bits for 'type code' of i8-f64, externref, anyref, type reference, etc
  • 1 bit for nullability
  • 48 bits for a pointer to a canonicalized type definition for when the code is a type reference
  • 2 bits reserved for tagging of common cases of ResultType/BlockType

That leaves 5 bits free currently, for a 0-31 range of RTT depths. There is some slop, though; we could probably rework the type code and nullability bit to free a bit. We could also free one to three bits from the type definition pointer by taking better advantage of alignment and pointer ranges.

So we're comfortable with a limit of 63, as 6 bits for depth seems reasonably future-proof, with the ability to raise it later.
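
The same arithmetic in bit-field form, again purely illustrative (names invented, not SpiderMonkey's actual ValType):

#include <cstdint>

// 8 (code) + 1 (nullable) + 48 (type pointer) + 2 (tag) == 59 bits, leaving 5 spare,
// i.e. a 0-31 depth range today; freeing one more bit anywhere yields the 0-63 range.
struct ValType64 {
  uint64_t type_code : 8;  // i8-f64, externref, anyref, type reference, ...
  uint64_t nullable : 1;
  uint64_t type_def : 48;  // pointer to the canonicalized type definition
  uint64_t tag : 2;        // reserved for ResultType/BlockType tagging
  uint64_t spare : 5;
};
static_assert(sizeof(ValType64) == sizeof(uint64_t), "packs into a single 64-bit word");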

tlively added a commit that referenced this issue Mar 22, 2023
Based on discussion in #335 and subgroup meetings.
tlively added a commit that referenced this issue Mar 23, 2023
Based on discussion in #335 and subgroup meetings.
@tlively
Member Author

tlively commented Mar 23, 2023

Resolved by #360. If there are any other limits we should specify, please open a new issue.
