Idea: new prioritisation scheme for pyconvert rules #661

@cjdoris

Current status

Currently a pyconvert rule consists of:

  • the source Python type t, which the rule can convert from;
  • the target Julia type T, which the rule can convert to;
  • the priority of the rule; and
  • the function func implementing the rule.

When pyconvert(R, x) runs, it first filters the list of rules according to t and T (roughly pyisinstance(x, t) and typeintersect(R, T) != Union{}). The rules are then ordered first by priority, then by the specificity of t, then by the order the rules were defined.

The priorities are:

  • jlwrap: for wrapped Julia objects, which are converted by just unwrapping them;
  • array: for array-like objects (buffers, numpy arrays, ...);
  • canonical: for the canonical conversion for a type, e.g. float to Float64;
  • normal: for all other reasonable conversions.

The priorities are a bit of a hack to work around the fact that ordering by specificity of t isn't quite right. For example, we always want to convert Julia objects by unwrapping them, so their rules need to come first: even if the object also happens to be a Mapping and we are converting to Dict, we don't want to use the generic Mapping to Dict rule. And if the object is array-like, we want to convert by getting at the underlying memory instead of using the generic Sequence to Array rule.
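
To make the current ordering concrete, here is a rough Julia sketch. It is not PythonCall's actual implementation: OldRule, Priority and old_candidates are made-up names, Python types are modelled as plain strings, and the MRO of pytype(x) is passed in as a vector (most specific first).

```julia
# Priorities in the order they are tried (hypothetical enum mirroring the list above).
@enum Priority JLWRAP ARRAY CANONICAL NORMAL

struct OldRule
    t::String            # Python source type, modelled here as its name
    T::Type              # Julia target type
    priority::Priority
end

# `mro` stands in for the MRO of pytype(x), most specific first.
function old_candidates(rules, mro, R)
    kept = collect(enumerate(rules))
    # Filter: x must be an instance of t, and T must overlap the requested type R.
    filter!(p -> p[2].t in mro && typeintersect(R, p[2].T) != Union{}, kept)
    # Order: priority, then specificity of t (position in the MRO), then definition order.
    sort!(kept; by = p -> (Int(p[2].priority), findfirst(==(p[2].t), mro), p[1]))
    return last.(kept)
end

rules = [
    OldRule("Mapping", Dict, NORMAL),
    OldRule("JuliaAnyValue", Any, JLWRAP),
]

# A wrapped Julia Dict whose MRO happens to list Mapping before JuliaAnyValue.
mro = ["JuliaDictValue", "Mapping", "JuliaAnyValue", "object"]
ordered = old_candidates(rules, mro, Dict)
```

Even though Mapping comes earlier in this MRO and was defined first, the jlwrap priority puts the unwrapping rule at the front, which is exactly the job the priorities do today.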

The proposal

So my proposal is to remove priority and replace it with:

  • the scope julia type S, which must be a supertype of T.

We further filter rules by S (R <: S, except if R isa Union then just one component has to match).

For ordering rules, we no longer order by priority, but do order by specificity of S, i.e. we prefer the smallest S that contains R.

You are only allowed to create rules where you "own" either t or S.

Discussion

This means you can only have S=Any if you own t. PythonCall will continue to "own" the Python standard library, and most rules in PythonCall will have S=Any. The exception is for some things currently in the normal priority. For example, we convert None to Nothing canonically but can also go to Missing. In the new system, the rules will have T=Nothing, S=Any and T=S=Missing, so you generically get Nothing but can get Missing if you ask for it. Similarly, tuple canonically converts to Tuple but can also go to Array. The rules for these will become T=Tuple, S=Any and T=Array, S=AbstractArray, so you will get an Array if you specify Array or AbstractArray.

If you don't own t, then you must own S. This lets you define e.g. a generic conversion rule for list to some new MyArray you invented. But you can only use the rule if you specify pyconvert(MyArray, x). Doing pyconvert(AbstractArray, x) or pyconvert(Any, x) will not use the rule. Hence we have well-scoped rules, avoid piracy, and avoid cases where the conversion rules applied depend on which packages are loaded.
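
A small sketch of the scope filter applied to the MyArray example, under the assumption that "R <: S, or one component of a union R is" is all the filter does. MyArray, ScopedRule and in_scope are hypothetical names, the filter on T is left out for brevity, and Base.uniontypes is an unexported Base helper that splits a Union into its components.

```julia
# Hypothetical third-party array type, for illustration only.
struct MyArray <: AbstractVector{Float64} end

struct ScopedRule
    T::Type   # target type
    S::Type   # scope type, a supertype of T
end

# The scope filter: R <: S, or (for a Union) at least one component of R is <: S.
in_scope(rule, R) = any(C -> C <: rule.S, Base.uniontypes(R))

# The third party owns MyArray but not list, so the scope has to be MyArray itself.
my_rule = ScopedRule(MyArray, MyArray)

in_scope(my_rule, MyArray)                  # true: asked for explicitly
in_scope(my_rule, Union{MyArray, Nothing})  # true: one union component matches
in_scope(my_rule, AbstractArray)            # false: too general to opt in
in_scope(my_rule, Any)                      # false: default conversions are unaffected
```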

In particular, since passing Python objects to Julia in JuliaCall normally uses pyconvert(Any, x), only rules created by the "owner" of pytype(x) are applied. This makes passing Python values around predictable - some third-party package defining their list to MyArray rule will not affect how list gets passed to Julia by default.

What about jlwrap and array?

I don't think this new scheme lets us still enforce doing these rules first. For example, we have a general rule t=JuliaAnyValue, T=Any, S=Any which unwraps the Julia object and converts it to the target type. We also have a general rule t=Mapping, T=Dict, S=Any. If we have a JuliaDictValue, which subtypes both JuliaAnyValue and Mapping, and do pyconvert(Any, x), then filtering keeps both rules, and ordering by specificity of t is arbitrary - the MRO could be either way around. The only distinction is in T - does it make sense to order by the least specific T too?

Let's consider arrays, where we have t=<buffer>, T=PyArray, S=Any but also t=Sequence, T=PyList, S=Any. Again the ordering is arbitrary. In this case, we actually insert the pseudo-type <buffer> into the type ordering ourselves - currently we put it high up (near object, less specific) but maybe we should put it at the bottom (most specific) so it's always picked? Or do this but with a second <high-priority-buffer> pseudo-type.

Or we just continue to use priorities, or special-case handling of rules for jlwrap and array - assume they are always high-priority. I don't know. Either way, all of this should be internal - the user-exposed functionality for rules should only specify t, T, S and func.

Worked examples

Here are some rules for t=list:

  • T=PyArray, S=Any: canonical conversion to a PyArray, used if you specify converting to PyArray or AbstractArray or Any.
  • T=Array, S=DenseArray: used if you specify converting to Array or DenseArray, but AbstractArray gets you a PyArray.
  • T=Set, S=AbstractSet: used if you specify converting to Set or AbstractSet.
  • T=Tuple, S=Tuple: used if you specify converting to Tuple.

Some rules for t=None:

  • T=Nothing, S=Any: canonical
  • T=Missing, S=Missing: specify Missing (or Union{Missing, Foo})

Some rules for t=float:

  • T=Float64, S=Any: canonical
  • T=Float32, S=Float32: specify another float type
  • T=Number, S=Number: specify another non-float number type such as Integer
  • T=Missing, S=Missing (for NaN)
  • T=Nothing, S=Nothing (for NaN)

Some examples for converting a float:

  • to Any: only rule 1 applies (filtering on S)
  • to Float32: rules 2 and 3 apply (rule 1 ignored due to T, others due to S), and Float32 <: Number so rule 2 is tried first.
  • to Integer: only rule 3 applies (rule 1 is excluded by T, the others by S).
  • to Union{Integer, Missing}: rules 3 and 4 apply (rule 1 is excluded by T, rules 2 and 5 by S). There is no subtype relationship between Number and Missing, so we fall back to definition order and rule 3 is tried first.
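
The four float cases above can be reproduced with a toy model of the proposed filtering and ordering. This is only a sketch under stated assumptions: Rule, applies, order_rules, candidates and rule_names are hypothetical names, only a single source type t is modelled (so the ordering looks at S alone), and Base.uniontypes is an unexported Base helper.

```julia
# Hypothetical model of a rule under the proposal; not PythonCall's real internals.
struct Rule
    name::String
    T::Type   # target type
    S::Type   # scope type, a supertype of T
end

# Filter on T (must overlap R) and on S (R, or one of its union components, is <: S).
applies(rule, R) = typeintersect(R, rule.T) != Union{} &&
                   any(C -> C <: rule.S, Base.uniontypes(R))

# Most specific S first; rules with unrelated scopes keep their definition order.
function order_rules(rules)
    remaining = collect(rules)
    ordered = Rule[]
    while !isempty(remaining)
        # First remaining rule with no strictly more specific scope still waiting.
        i = findfirst(r -> !any(q -> q.S <: r.S && q.S != r.S, remaining), remaining)
        push!(ordered, remaining[i])
        deleteat!(remaining, i)
    end
    return ordered
end

candidates(rules, R) = order_rules(filter(r -> applies(r, R), rules))

# The rules for t=float, in definition order.
FLOAT_RULES = [
    Rule("1: Float64", Float64, Any),
    Rule("2: Float32", Float32, Float32),
    Rule("3: Number",  Number,  Number),
    Rule("4: Missing", Missing, Missing),
    Rule("5: Nothing", Nothing, Nothing),
]

rule_names(R) = [r.name for r in candidates(FLOAT_RULES, R)]

rule_names(Any)                      # ["1: Float64"]
rule_names(Float32)                  # ["2: Float32", "3: Number"]
rule_names(Integer)                  # ["3: Number"]
rule_names(Union{Integer, Missing})  # ["3: Number", "4: Missing"]
```

order_rules picks rules one at a time instead of calling sort with a custom comparison, because "S is a strict subtype of S′" is only a partial order and a plain comparison sort is not guaranteed to respect it.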

Pros and cons

Pros:

  • Strict ownership of rules - avoids piracy.
  • Return type of pyconvert more predictable.
  • Clearer semantics/rule ordering than currently.
  • Rule ordering on t and S will be mostly unique - definition order is mostly for unions.
  • The number of applicable rules is massively cut down by filtering on S (usually to 1).
  • Easy to "opt in" to a conversion rule by being more specific about what you are converting to (see the MyArray example above).

Cons:

  • People might still pirate (i.e. make rules with S=Any for which they don't own t).
  • pyconvert(Union{AbstractArray,MyArray}, x) does not do what you might expect (use the generic AbstractArray rules plus the special MyArray rule) because the union gets normalised down to AbstractArray first, so the MyArray rule is never considered. You need to use a more specific union such as Union{PyArray,Array,MyArray}, which is annoying. We could make a helper function to create such a union for you.
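
The union normalisation in the last point is plain Julia behaviour and is easy to check. MyArray below is a hypothetical stand-in for a third-party array type, and Array is used in place of PyArray so the snippet does not need PythonCall.

```julia
# Hypothetical third-party array type, for illustration only.
struct MyArray <: AbstractVector{Float64} end

# MyArray <: AbstractArray, so Julia collapses the union and MyArray disappears.
collapsed = Union{AbstractArray, MyArray}
collapsed === AbstractArray          # true

# Array and MyArray are unrelated, so both components survive and can be matched.
kept = Union{Array, MyArray}
kept isa Union                       # true
length(Base.uniontypes(kept))        # 2
```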
