VM: Introduce a new method of encoding arrays. #199
This introduces an experimental new feature called "direct arrays". One of the most confusing and complicated aspects of SourcePawn is how multi-dimensional arrays are encoded. Each non-terminal level requires an "indirection vector" - an intermediate array that represents the next set of arrays. That's normal, and is similar to how any language implements multi-level array pointers. The difference with SourcePawn is that the addresses are encoded relative to the base address that was last computed. It makes the compiler code *extremely* gross, it adds extra instructions on the array access path, and it makes the VM support code for GENARRAY very complicated. Lastly, it makes it very difficult to swap two slots in a nested dimension: all the relative calculations are wrong.
Direct arrays are much simpler. Addresses are absolute. Generating the indirection vectors is trivial, and so is accessing slots in the array.
Unfortunately, the old scheme did have an advantage: it made it possible to memcpy() an initializer out of DAT, and into the stack, to populate an array. Since all the internal addresses were relative, nothing needed to be rebased or fixed up.
With this new scheme we have to introduce a new opcode called REBASE. REBASE takes an address in PRI, and three constants: the DAT offset where the array initializer lives, the size of the indirection vector table, and the size of the terminal dimension data. REBASE performs a memcpy() over to the stack address, and then goes through and applies an offset to fix each address in the indirection vector section. It will be slightly less performant, but it should still be quite fast, and I'd like to inline this later on if/when the feature becomes permanent.
Compilers that use this scheme must error when using a native that takes a multi-dimensional array. The old algorithms for accessing them no longer work. I'll propose an alternative and give sorting.inc a treatment before merging this.
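To make the difference between the two encodings concrete, here is a rough sketch of a two-dimensional array under each scheme. It is a toy model, not the VM's actual code: the Memory type, the Layout*/Elem*/SwapRowsDirect helpers, and the layout (indirection vector followed immediately by row data) are all made up for illustration. The relative variant assumes each entry stores an offset from its own address, which is one reading of "relative to the base address that was last computed".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using cell_t = int32_t;
constexpr cell_t kCell = sizeof(cell_t);

// Toy VM memory (hypothetical): byte addresses, cell-sized storage.
struct Memory {
    std::vector<cell_t> cells;
    explicit Memory(std::size_t n) : cells(n, 0) {}
    cell_t& at(cell_t addr) {
        assert(addr >= 0 && addr % kCell == 0);
        assert(static_cast<std::size_t>(addr / kCell) < cells.size());
        return cells[addr / kCell];
    }
};

// Assumed layout at `base`: [rows indirection entries][rows * cols data cells].

// Old scheme (as assumed here): entry i holds an offset from its own address.
void LayoutRelative(Memory& m, cell_t base, cell_t rows, cell_t cols) {
    cell_t data = base + rows * kCell;
    for (cell_t i = 0; i < rows; i++) {
        cell_t entry = base + i * kCell;
        m.at(entry) = (data + i * cols * kCell) - entry;
    }
}

// Direct arrays: entry i holds the absolute address of row i.
void LayoutDirect(Memory& m, cell_t base, cell_t rows, cell_t cols) {
    cell_t data = base + rows * kCell;
    for (cell_t i = 0; i < rows; i++)
        m.at(base + i * kCell) = data + i * cols * kCell;
}

// Relative access needs an extra add to turn the offset into an address.
cell_t& ElemRelative(Memory& m, cell_t base, cell_t i, cell_t j) {
    cell_t entry = base + i * kCell;
    cell_t row = entry + m.at(entry);
    return m.at(row + j * kCell);
}

// Direct access just loads the row address.
cell_t& ElemDirect(Memory& m, cell_t base, cell_t i, cell_t j) {
    cell_t row = m.at(base + i * kCell);
    return m.at(row + j * kCell);
}

// With absolute addresses, swapping two rows is a plain swap of entries.
void SwapRowsDirect(Memory& m, cell_t base, cell_t a, cell_t b) {
    std::swap(m.at(base + a * kCell), m.at(base + b * kCell));
}
```

In the relative variant the same swap cannot be done by exchanging the raw entries: each offset is only valid at the address where it was stored, so both entries would have to be recomputed.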
Currently, no compiler supports this new array schema. When it does, it will be illegal to call old natives like "SortCustom2D" that take a multi-dimensional array. Natives like this are extremely rare since there is no API to decode the indirection vectors. It has to be done by hand. Nonetheless, at least two exist, and they'll have to be versioned so they can use the correct API. My tentative plan is to extend The binding logic in PluginRuntime would need to be updated to make sure a native is only bound if it can satisfy the feature level a plugin requested. This scheme is a bit complicated since it requires a new version of the |
so multi-dim arrays are implemented similar to an |
tools/smxtools/smxdasm/Headers.cs
// This case is easy... we can just read the rest of the file.
rd.Read(header.Data, Size, header.ImageSize - Size);
var new_stream = new MemoryStream(header.Data, Size, header.ImageSize - Size);
This is redundant since that exact code was moved below the switch already.
eb4a2e0
Thanks, I missed that.
@assyrianic That would be a fine thing to do under other circumstances... it's not clear to me it would work in Pawn. You can coerce an array to a set of pointers, which you can't do in C (it has the data layout you mentioned):
This wouldn't work since the sizes of the intermediate dimensions are not known. We couldn't embed the sizes either, because Pawn supports "slicing" arrays. Another thing is it would make supporting garbage-collected arrays much more difficult. You wouldn't be able to pass around interior references without holding the entire base object alive.
Leaving sorting.inc for later. I think the simplest thing is to decorate the name.
This introduces an experimental new feature called "direct arrays". One of the most confusing and
complicated aspects of SourcePawn is how multi-dimensional arrays are encoded. Each non-terminal
level requires an "indirection vector" - an intermediate array that represents the next set of
arrays. That's normal, and is similar to how any language implements multi-level array pointers.
The difference with SourcePawn is that the addresses are encoded relative to the base address that
was last computed. It makes the compiler code extremely gross, it adds extra instructions on the
array access path, and it makes the VM support code for GENARRAY very complicated. Lastly, it makes
it very difficult to swap two slots in a nested dimension: all the relative calculations are wrong.
Direct arrays are much simpler. Addresses are absolute. Generating the indirection vectors is
trivial, and so is accessing slots in the array.
Unfortunately, the old scheme did have an advantage: it made it possible to memcpy() an initializer
out of DAT, and into the stack, to populate an array. Since all the internal addresses were
relative, nothing needed to be rebased or fixed up. Even when a multi-dimensional array has no
initializer, it still has a "template" in DAT that needs to be memcpy'd to lay out the levels of
indirection.
With this new scheme we have to introduce a new opcode called REBASE. REBASE takes an address in
PRI, and three constants: the DAT offset where the array initializer lives, the size of the
indirection vector table, and the size of the terminal dimension data. REBASE performs a memcpy()
over to the stack address, and then goes through and applies an offset to fix each address in the
indirection vector section. It will be slightly less performant, but it should still be quite fast,
and I'd like to inline this later on if/when the feature becomes permanent.
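As a rough illustration of the copy-then-fix-up behavior described above, here is a minimal sketch. It is not the VM's implementation: Rebase, ReadCell and WriteCell are hypothetical helpers, memory is modelled as one flat byte vector holding both DAT and the stack, both sizes are assumed to be in bytes, and the template in DAT is assumed to store its indirection entries relative to the start of the template, so the fix-up is simply "add the destination address".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using cell_t = int32_t;
constexpr cell_t kCell = sizeof(cell_t);

// Hypothetical helpers for reading and writing one cell at a byte address.
cell_t ReadCell(const std::vector<uint8_t>& memory, cell_t addr) {
    cell_t value;
    std::memcpy(&value, memory.data() + addr, sizeof(value));
    return value;
}

void WriteCell(std::vector<uint8_t>& memory, cell_t addr, cell_t value) {
    std::memcpy(memory.data() + addr, &value, sizeof(value));
}

// Sketch of REBASE. `dest` plays the role of the address in PRI; the other
// three parameters are the opcode's constants. Assumes the template and the
// destination do not overlap.
void Rebase(std::vector<uint8_t>& memory, cell_t dest, cell_t dat_offset,
            cell_t iv_size, cell_t data_size) {
    assert(dest >= 0 && dat_offset >= 0 && iv_size >= 0 && data_size >= 0);
    assert(iv_size % kCell == 0);
    std::size_t total = static_cast<std::size_t>(iv_size) +
                        static_cast<std::size_t>(data_size);
    assert(static_cast<std::size_t>(dat_offset) + total <= memory.size());
    assert(static_cast<std::size_t>(dest) + total <= memory.size());

    // Step 1: copy the whole template (indirection vectors + data).
    std::memcpy(memory.data() + dest, memory.data() + dat_offset, total);

    // Step 2: turn each template-relative entry into an absolute address.
    for (cell_t off = 0; off < iv_size; off += kCell)
        WriteCell(memory, dest + off, ReadCell(memory, dest + off) + dest);
}
```

Only the indirection vector section is walked in the second step; the terminal data is left exactly as copied, which is why the opcode needs the two sizes separately.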
Compilers that use this scheme must error when using a native that takes a multi-dimensional array.
The old algorithms for accessing them no longer work. I'll propose an alternative and give
sorting.inc a treatment before merging this.