VM: Introduce a new method of encoding arrays. #199
This introduces an experimental new feature called "direct arrays". One of the most confusing and complicated aspects of SourcePawn is how multi-dimensional arrays are encoded. Each non-terminal level requires an "indirection vector" - an intermediate array that represents the next set of arrays. That's normal, and is similar to how any language implements multi-level array pointers.
The difference with SourcePawn is that the addresses are encoded relative to the base address that was last computed. It makes the compiler code *extremely* gross, it adds extra instructions on the array access path, and it makes the VM support code for GENARRAY very complicated. Lastly, it makes it very difficult to swap two slots in a nested dimension: all the relative calculations are wrong.
Direct arrays are much simpler. Addresses are absolute. Generating the indirection vectors is trivial, and so is accessing slots in the array.
Unfortunately, the old scheme did have an advantage: it made it possible to memcpy() an initializer out of DAT, and into the stack, to populate an array. Since all the internal addresses were relative, nothing needed to be rebased or fixed up.
With this new scheme we have to introduce a new opcode called REBASE. REBASE takes an address in PRI, and three constants: the DAT offset where the array initializer lives, the size of the indirection vector table, and the size of the terminal dimension data. REBASE performs a memcpy() over to the stack address, and then goes through and applies an offset to fix each address in the indirection vector section. It will be slightly less performant, but it should still be quite fast, and I'd like to inline this later on if/when the feature becomes permanent.
Compilers that use this scheme must error when using a native that takes a multi-dimensional array. The old algorithms for accessing them no longer work. I'll propose an alternative and give sorting.inc a treatment before merging this.
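To make the REBASE description concrete, here is a minimal C++ sketch of the copy-then-fix-up step and of a two-dimensional access on a direct array. It is not the VM's implementation. The names `Rebase`, `Load2D`, and `kCellSize` are hypothetical, all addresses and sizes are assumed to be byte offsets into one linear memory buffer, and the indirection cells in the initializer are assumed to hold offsets relative to the start of the initializer image (the description above does not say what the initializer stores).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using cell_t = int32_t;
constexpr cell_t kCellSize = sizeof(cell_t);

// Hypothetical model of REBASE. dest_addr plays the role of PRI; the other
// three parameters are the opcode's constants. Assumption: every indirection
// cell in the initializer is an offset from the start of the initializer.
void Rebase(std::vector<cell_t>& mem, cell_t dest_addr, cell_t dat_addr,
            cell_t iv_size, cell_t data_size) {
  assert(dest_addr % kCellSize == 0 && dat_addr % kCellSize == 0);
  assert(iv_size % kCellSize == 0 && data_size % kCellSize == 0);

  char* bytes = reinterpret_cast<char*>(mem.data());
  // Step 1: copy the whole image (indirection vectors, then terminal data).
  std::memcpy(bytes + dest_addr, bytes + dat_addr, iv_size + data_size);

  // Step 2: turn each image-relative offset into an absolute address.
  cell_t* iv = mem.data() + dest_addr / kCellSize;
  for (cell_t i = 0; i < iv_size / kCellSize; i++)
    iv[i] += dest_addr;
}

// Reads a[i][j] from a two-dimensional direct array located at `base`.
// With absolute addresses this is two plain loads and no relative math.
cell_t Load2D(const std::vector<cell_t>& mem, cell_t base, cell_t i, cell_t j) {
  cell_t row_addr = mem[base / kCellSize + i];  // absolute address of row i
  return mem[row_addr / kCellSize + j];
}
```

Under these assumptions, the initializer for `{{1, 2, 3}, {4, 5, 6}}` would be the cells `8, 20, 1, 2, 3, 4, 5, 6`. Rebasing it to address 128 rewrites the first two cells to 136 and 148, after which `Load2D(mem, 128, 1, 2)` reads 6.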
Currently, no compiler supports this new array scheme. Once one does, it will be illegal to call old natives like "SortCustom2D" that take a multi-dimensional array. Natives like this are extremely rare since there is no API to decode the indirection vectors; it has to be done by hand. Nonetheless, at least two exist, and they'll have to be versioned so they can use the correct API.
My tentative plan is to extend
The binding logic in PluginRuntime would need to be updated to make sure a native is only bound if it can satisfy the feature level a plugin requested.
This scheme is a bit complicated since it requires a new version of the
@assyrianic That would be a fine thing to do under other circumstances... it's not clear to me it would work in Pawn.
You can coerce an array to a set of pointers, which you can't do in C (it has the data layout you mentioned):
This wouldn't work since the sizes of the intermediate dimensions are not known. We couldn't embed the sizes either, because Pawn supports "slicing" arrays.
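One way to read this objection, sketched in C++ rather than Pawn: a flat, C-style layout can only be indexed if the callee knows the row stride, while an indirection vector needs only the row addresses. A slice is just an interior address, so there is no slot in front of it where a size could be stored. The names `LoadFlat`, `LoadIndirect`, and `Slice` are hypothetical and are not taken from the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using cell_t = int32_t;

// Flat layout: the callee must be told the row stride to find element [i][j].
cell_t LoadFlat(const cell_t* base, size_t stride, size_t i, size_t j) {
  assert(j < stride);
  return base[i * stride + j];
}

// Indirection layout: the callee only follows the row address.
cell_t LoadIndirect(const cell_t* const* rows, size_t i, size_t j) {
  return rows[i][j];
}

// A slice is an interior address. The cell before it is ordinary element
// data, so a size header cannot live there.
const cell_t* Slice(const cell_t* array, size_t start) {
  return array + start;
}
```

With the six cells `1..6` viewed as two rows of three, `LoadFlat` returns the right element only when it is passed a stride of 3. `LoadIndirect` works with whatever row addresses it is handed, including rows produced by `Slice`.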
Another thing is it would make supporting garbage-collected arrays much more difficult. You wouldn't be able to pass around interior references without keeping the entire base object alive.