You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I've done some initial protoyping for (reified) generics in nikic/php-src#3. I want to discuss some implementation aspects here. Some additional reading on relevant issues:
Some considerations on why I don't think a purely monomorphized implementation is going to fly: Monomorphized generics #44
Important foundational question on when generic type arguments are resolved. A lot of implementation considerations would ride on this: When are generic arguments resolved? #43
The prototype focussed on making something minimal work, which is "purely abstract" generics involving simple inheritance:
Note that only the abstract class has unbound type parameters. The concrete class doesn't, so there are no generic parameters on the object instantiation.
Generic arguments are stored using the same structure as union types:
A big part of implementing generics is replacing existing zend_class_entry* pointers with a structure that holds both zend_class_entry* and the list of generic type arguments:
As most class references are expected to be to classes without generic arguments, we apply the following optimization for that case: Each class entry is allocated with an additional header, that effectively looks like this:
This allows cheaply converting any class entry into a class reference without generic arguments. Additionally, accesses to the class entry through such a reference should be rather cheap, because the header will be in the same cache line as start of the class entry.
Unfortunately, there are quite a few cases where we don't work on class entries, but on string names of classes. Important examples are unresolved type declarations, as well as pre-inheritance parent/interface references and class references inside opcodes. The structure for this type of reference is principally the same, just replacing a class entry with a name:
However, in this case we do not have such a convenient way to represent name references without arguments. After all, we do not want to add an additional header to each interned string.
Instead, we define an additional "packed name reference" (abbreviated PNR in macros and function names):
typedef uintptr_t zend_packed_name_reference;
This is an align-tagged union between a simple zend_string* (no arguments) or zend_name_reference (arguments).
A quick grep tells me that php-src has 1500 references to zend_class_entry, and probably quite a few more uses than that through members, and a large part of these need migration to zend_class_reference. I only migrated parent references and type declarations in my prototype.
Apart from storing generic type arguments on class references, we also need to deal with references to generic type parameters, such as in public function method(T $param). Each generic parameter is assigned an parameter ID, which is stored as a new zend_type kind.
We now need to consider how we can access the actual bound type argument efficiently. Consider the following example:
Pre-inheritance, the generic parameter ID of T1, T2 and T3 is 0, as they are each the first generic parameter of their respective class. During inheritance, we shift the generic parameter ID by the number of bound generic parameters on the parent. This means that T1 has ID 0, T2 has ID 1 and T3 has ID 2. Class D would store the following array of bound generic arguments:
bound_generic_args[0]: int
bound_generic_args[1]: string
bound_generic_args[2]: array
This includes the necessary elements for parent and grandparent classes as well.
While this scheme is sufficient for the purpose of simple abstract inheritance, as implemented in the protoype, we should also consider how we can extend it to other situations.
First, for interfaces we do not have to worry about resolution of generic type parameters at runtime, as interfaces are only checked during inheritance.
For traits, generic parameter IDs cannot be shifted in this manner, as there may be multiple traits. Instead, we would probably actually perform some form of monomorphization for this particular case, where we copy the trait methods and replace type parameters.
One larger problem is what to do with generic parameter references in opcodes, for example for $x instanceof T. As opcodes are immutable during inheritance, we cannot shift the parameter in this case. I don't have an idea on how to make this really efficient. I believe this will involve looking up the correct offset to use at runtime, for the particular declaring class and the particular calling class.
Finally, this only considers generic arguments bound in the class declaration. What about free generic arguments bound on instantiation? These would be stored in the class reference on the object. A generic parameter lookup would have to picked one or the other using something like:
uint32_t num_bound_generic_args = ce_ref->ce->num_bound_generic_args;
if (param_id < num_bound_generic_args) {
type = &ce_ref->ce->bound_generic_args[param_id];
} else {
type = &ce_ref->args.types[param_id - num_bound_generic_args];
}
The alternative is to copy the class arguments in the object class reference as well. This might be a non-trivial increase in memory usage though, if deep hierarchies are involved.
The text was updated successfully, but these errors were encountered:
Another thing to consider is how we create and memoize the zend_class_references. In my prototype they're only used for the list of parent references, of which there are few, so I just allocate them. If they're used for each object (those class has free generic parameters), these will have to be memoized.
For that purpose, I would suggest to use a hashtable mapping from type string to class reference on a best-effort basis. The class references will not be unique, for example Collection<Foo> and Collection<Bar> might refer to an equal class reference if Foo and Bar are aliases.
The zend_name_reference could get an extra field that stores the full type string and will be used to look up an existing class reference, before running full type resolution. This class reference could also be cached in an opline runtime cache easily, because the opline doesn't own it.
A caveat here are generic parameters used inside the type arguments, such as Collection<T>. I think the way to go about that would be to use an encoded string like `"Collection<\0{generic-param-id}>" and look up the class reference that still contains the parameter ID (as that can be cached) and then create the actual class reference from that ... somehow. Possibly such a class reference would hold a hashtable for specific type substitutions.
I've done some initial protoyping for (reified) generics in nikic/php-src#3. I want to discuss some implementation aspects here. Some additional reading on relevant issues:
The prototype focussed on making something minimal work, which is "purely abstract" generics involving simple inheritance:
Note that only the abstract class has unbound type parameters. The concrete class doesn't, so there are no generic parameters on the object instantiation.
Generic arguments are stored using the same structure as union types:
A big part of implementing generics is replacing existing
zend_class_entry*
pointers with a structure that holds bothzend_class_entry*
and the list of generic type arguments:As most class references are expected to be to classes without generic arguments, we apply the following optimization for that case: Each class entry is allocated with an additional header, that effectively looks like this:
This allows cheaply converting any class entry into a class reference without generic arguments. Additionally, accesses to the class entry through such a reference should be rather cheap, because the header will be in the same cache line as start of the class entry.
Unfortunately, there are quite a few cases where we don't work on class entries, but on string names of classes. Important examples are unresolved type declarations, as well as pre-inheritance parent/interface references and class references inside opcodes. The structure for this type of reference is principally the same, just replacing a class entry with a name:
However, in this case we do not have such a convenient way to represent name references without arguments. After all, we do not want to add an additional header to each interned string.
Instead, we define an additional "packed name reference" (abbreviated PNR in macros and function names):
This is an align-tagged union between a simple
zend_string*
(no arguments) orzend_name_reference
(arguments).A quick grep tells me that php-src has 1500 references to
zend_class_entry
, and probably quite a few more uses than that through members, and a large part of these need migration to zend_class_reference. I only migrated parent references and type declarations in my prototype.Apart from storing generic type arguments on class references, we also need to deal with references to generic type parameters, such as in
public function method(T $param)
. Each generic parameter is assigned an parameter ID, which is stored as a new zend_type kind.We now need to consider how we can access the actual bound type argument efficiently. Consider the following example:
Pre-inheritance, the generic parameter ID of
T1
,T2
andT3
is 0, as they are each the first generic parameter of their respective class. During inheritance, we shift the generic parameter ID by the number of bound generic parameters on the parent. This means thatT1
has ID 0,T2
has ID 1 andT3
has ID 2. ClassD
would store the following array of bound generic arguments:This includes the necessary elements for parent and grandparent classes as well.
While this scheme is sufficient for the purpose of simple abstract inheritance, as implemented in the protoype, we should also consider how we can extend it to other situations.
First, for interfaces we do not have to worry about resolution of generic type parameters at runtime, as interfaces are only checked during inheritance.
For traits, generic parameter IDs cannot be shifted in this manner, as there may be multiple traits. Instead, we would probably actually perform some form of monomorphization for this particular case, where we copy the trait methods and replace type parameters.
One larger problem is what to do with generic parameter references in opcodes, for example for
$x instanceof T
. As opcodes are immutable during inheritance, we cannot shift the parameter in this case. I don't have an idea on how to make this really efficient. I believe this will involve looking up the correct offset to use at runtime, for the particular declaring class and the particular calling class.Finally, this only considers generic arguments bound in the class declaration. What about free generic arguments bound on instantiation? These would be stored in the class reference on the object. A generic parameter lookup would have to picked one or the other using something like:
The alternative is to copy the class arguments in the object class reference as well. This might be a non-trivial increase in memory usage though, if deep hierarchies are involved.
The text was updated successfully, but these errors were encountered: