
[RFC] Multiple dispatch with existing inline cache infrastructure and abstractions for dispatch specifiers #2430

Draft: wants to merge 45 commits into master

@timor (Contributor) commented Feb 14, 2021

Motivation

This is intended to provide a backend implementation for #2364 which re-uses the
existing inline caching and generics infrastructure as much as possible.

Modifications

classes.dispatch

dispatch-type is a new classoid which captures the notion of having distinct types for
describing dispatch at generic call sites. It is intended as a basis for
making dispatch specifiers more expressive and for adding new kinds of them in a
modular way. This PR uses covariant tuples to implement regular
multiple dispatch, and adds eql specializers and class methods as examples.
As a preliminary syntax, MM: generic ( <typed-effect> ) ... ; is used for
multi-method definitions, and a D( ) construct, written as M: D( ... ) generic ... ;,
illustrates how more advanced dispatch specifiers could be written down if it
turns out that not everything is easily expressible in the stack effect.
M: D( class1 class2 ) generic ... ; is the same as
MM: generic ( x: class1 y: class2 -- ... ) ... ;.

Constructor protocols defined explicitly on classes also come to mind (UPDATE: see example below).

Covariant Tuples

This is the dispatch specifier, implemented as a dispatch-type, that captures
the notion of covariant subtyping of call-site parameter tuples
(inspired by Julia). It also allows reusing the existing method definition
infrastructure and class-algebra functionality, because internally,
multi-methods are specialized on an instance of it.

This has nothing to do with Factor's tuple classes, and is only an
instance of a dispatch-type.
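To make the ordering concrete, here is a minimal sketch in Python rather than Factor, since the idea is language-neutral. It assumes a specifier is simply a tuple of classes and uses issubclass as a stand-in for class<=; CovariantTuple, accepts, Foo and Bar are hypothetical names, not part of the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CovariantTuple:
    """Hypothetical model of a covariant-tuple dispatch specifier:
    one class per dispatch position."""
    classes: tuple

    def __le__(self, other):
        # (A1, ..., An) <= (B1, ..., Bn) iff Ai <= Bi at every position.
        # Specifiers of different arity are never comparable in this model.
        if len(self.classes) != len(other.classes):
            return False
        return all(issubclass(a, b) for a, b in zip(self.classes, other.classes))

    def accepts(self, args):
        # The argument tuple at a call site is an instance of the specifier
        # if every argument is an instance of the class at its position.
        return len(args) == len(self.classes) and all(
            isinstance(x, c) for x, c in zip(args, self.classes))


class Foo:
    pass


class Bar(Foo):
    pass


print(CovariantTuple((Bar, int)) <= CovariantTuple((Foo, object)))   # True
print(CovariantTuple((Bar, object)) <= CovariantTuple((Foo, int)))   # False
print(CovariantTuple((Foo, int)).accepts((Bar(), 3)))                # True
```

Because the order is pointwise, two specifiers can be incomparable, e.g. (Bar, object) and (Foo, int); these are exactly the cases the ambiguity handling has to deal with.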

Eql specializers

The eql specializer is incorporated into this with \=; e.g.,
M: D( \= fixnum ) generic ... ; would dispatch on the literal word
fixnum being on the stack. This can be used to dispatch on special cases of a class's instances.
The main difference from predicate classes is that this
is specified ad hoc, in the method definition.

Example:

USE: classes.dispatch.syntax
IN: scratchpad MGENERIC: answer? ( question -- ? )
IN: scratchpad M: object answer? drop f ;
IN: scratchpad M: D( \= 42 ) answer? drop t ;
IN: scratchpad 111 answer? .
f
IN: scratchpad 42 answer? .
t
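The same example can be modelled in a few lines of Python to show why an eql specializer always wins over a class that contains its value. This is a hypothetical sketch: Eql, spec_matches, spec_le and most_specific are invented names, and matching by equality (rather than identity) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eql:
    """Hypothetical eql specializer: matches exactly one literal value."""
    value: object


def spec_matches(spec, x):
    if isinstance(spec, Eql):
        return x == spec.value  # assumption: equality, not identity
    return isinstance(x, spec)


def spec_le(a, b):
    """Is specializer a at least as specific as specializer b?"""
    if isinstance(a, Eql):
        # A single value is below every specializer that accepts it.
        return spec_matches(b, a.value)
    if isinstance(b, Eql):
        return False  # a class is never below a single value in this model
    return issubclass(a, b)


def most_specific(methods, x):
    applicable = [s for s in methods if spec_matches(s, x)]
    # Pick the applicable specializer that is <= all the other applicable ones.
    for s in applicable:
        if all(spec_le(s, t) for t in applicable):
            return methods[s]
    raise LookupError("no unique most specific method")


answer_methods = {object: lambda q: False, Eql(42): lambda q: True}


def answer(q):
    return most_specific(answer_methods, q)(q)


print(answer(111))  # False
print(answer(42))   # True
```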

Class specializers and Class Methods

A class-specializer dispatch type is defined which can be used to specialize methods on class words, respecting inheritance. This is used to define an example syntax for class methods.

Here is a simple example of constructor inheritance. CM: is like M:, but dispatches on the class itself, not on an instance.

USE: classes.dispatch.syntax
IN: scratchpad TUPLE: foo a ;
IN: scratchpad TUPLE: bar < foo b ;
IN: scratchpad GENERIC: <frob> ( class -- instance ) ! no multi decorator needed: CM: turns this into a multi-combination generic
IN: scratchpad CM: foo <frob> new 42 >>a ;
IN: scratchpad CM: bar <frob> call-next-method 69 >>b ;
IN: scratchpad foo <frob>

--- Data stack:
T{ foo f 42 }
IN: scratchpad bar <frob>

--- Data stack:
T{ foo f 42 }
T{ bar f 42 69 }
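As a rough, non-authoritative model of what CM: does, the Python sketch below dispatches on the class object itself, walks the inheritance chain, and hands each method a thunk that plays the role of call-next-method. The method table and the class_method decorator are hypothetical; Python's __mro__ stands in for Factor's class hierarchy.

```python
class Foo:
    def __init__(self):
        self.a = None


class Bar(Foo):
    def __init__(self):
        super().__init__()
        self.b = None


# Hypothetical method table for a <frob>-like generic, keyed by the class
# each method is specialized on (the class itself, not an instance).
frob_methods = {}


def class_method(cls):
    """Register a method specialized on the class object `cls` (like CM:)."""
    def register(fn):
        frob_methods[cls] = fn
        return fn
    return register


def frob(cls):
    return _call_from(cls, cls.__mro__)


def _call_from(cls, mro):
    # Walk the inheritance chain and run the first method found; the method
    # receives a thunk that continues the walk (like call-next-method).
    for i, c in enumerate(mro):
        if c in frob_methods:
            rest = mro[i + 1:]
            return frob_methods[c](cls, lambda: _call_from(cls, rest))
    raise LookupError(f"no applicable method for {cls.__name__}")


@class_method(Foo)
def _frob_foo(cls, call_next):
    obj = cls()   # like `new`: instantiate the class that was dispatched on
    obj.a = 42
    return obj


@class_method(Bar)
def _frob_bar(cls, call_next):
    obj = call_next()   # reuse Foo's constructor logic, then extend it
    obj.b = 69
    return obj


foo = frob(Foo)
bar = frob(Bar)
print(type(foo).__name__, foo.a)          # Foo 42
print(type(bar).__name__, bar.a, bar.b)   # Bar 42 69
```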

For multiple dispatch with the D( ) syntax, class specializers reuse the wrapper syntax (\ foo).
So, to implement double dispatch on classes, the following would currently be used:

IN: scratchpad M: D( \ foo ) <frob> new 42 >>a ;
! Same as CM: foo <frob> new 42 >>a ; for single dispatch

! Double dispatch example
IN: scratchpad MGENERIC: 2class-thing ( class class -- obj )
IN: scratchpad M: D( \ foo \ bar ) 2class-thing nip new ;
IN: scratchpad M: D( \ object \ bar ) 2class-thing 2drop 47 ;
IN: scratchpad foo bar 2class-thing .
T{ bar }
IN: scratchpad tuple bar 2class-thing .
47

classes.algebra

(class<=) is extended to dispatch into dispatch<= if one of the two
arguments is a dispatch-type. This is just a convenient way to reuse most
of the class algebra and caching infrastructure. Conceptually, the dispatch
types probably live in their own space...

generic, generic.single

Added a lookup-methods generic word, which is used to find all methods that depend on a specific class. For single generics, this is simply the old behavior, while for multi-generics, a list of methods can be returned for a single class. This is used to correctly forget all methods which have that class in their dispatch specification. The alternative would be to generate a new classoid for each dispatch specifier, which would then be forgotten along with the class it contains, taking the corresponding methods with it.
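A minimal Python sketch of what lookup-methods has to do for a multi-generic, assuming the method table is keyed by tuples of classes (a simplification of the real word properties). lookup_methods and forget_class are hypothetical names; the alternative of creating a classoid per dispatch specifier is not modelled.

```python
class A:
    pass


class B:
    pass


class C:
    pass


# Multi-generic method table: specializer tuple -> implementation name.
methods = {
    (A, B): "a-b",
    (B, A): "b-a",
    (C, C): "c-c",
}


def lookup_methods(methods, cls):
    """Hypothetical analogue of lookup-methods: every specializer that
    mentions `cls` at any dispatch position (a list, not a single entry)."""
    return [spec for spec in methods if cls in spec]


def forget_class(methods, cls):
    # Collect first, then delete, so the dict is not mutated while iterating.
    for spec in lookup_methods(methods, cls):
        del methods[spec]


forget_class(methods, A)
print(len(methods))  # 1: only the (C, C) method remains
```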

generic.multi

This vocab implements a new method combination, multi-combination, and some
very basic syntax support, but mostly the dispatch code generation logic.

The generation of the multiple dispatch decision tree is inspired by, amongst
others, this paper. However, no optimization is implemented yet.

The basic idea is to insert a step which converts the "methods" word property,
which contains specializers of type covariant-tuple, into nested regular
dispatch decision trees, inserting indirections (similar to predicate engine
words) which, however, have their own inline caches. For this purpose, a new
dispatch engine is provided which is basically the same as the
tag-dispatch-engine, but compiles intermediate tag dispatchers instead of only
top-level ones.
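The nested decision tree can be sketched as a chain of dispatchers, one per argument position, each with its own cache. This is a hypothetical Python model under simplifying assumptions: the cache dict stands in for the per-level inline/megamorphic cache, specializers are tuples of classes, and make_dispatcher and select_most_specific are invented names. The rock/paper/scissors classes echo the cached-md-beats? example further down.

```python
def covariant_le(s, t):
    return all(issubclass(a, b) for a, b in zip(s, t))


def select_most_specific(subset):
    # Leaf: every remaining specializer accepts the argument classes, so
    # pick the one that is <= all others, or report an ambiguity.
    for spec, impl in subset.items():
        if all(covariant_le(spec, other) for other in subset):
            return impl
    raise LookupError("ambiguous methods: " + repr(list(subset)))


def make_dispatcher(methods, pos=0):
    """Dispatch on the class of args[pos], then hand off to a nested
    dispatcher for the next position, one level per argument."""
    arity = len(next(iter(methods)))
    cache = {}   # stands in for this level's own inline/megamorphic cache

    def dispatch(*args):
        cls = type(args[pos])
        target = cache.get(cls)
        if target is None:
            # Keep only the methods whose specializer at `pos` accepts cls.
            subset = {spec: impl for spec, impl in methods.items()
                      if issubclass(cls, spec[pos])}
            if not subset:
                raise LookupError(f"no applicable method at position {pos}")
            if pos + 1 == arity:
                target = select_most_specific(subset)
            else:
                target = make_dispatcher(subset, pos + 1)
            cache[cls] = target
        return target(*args)

    return dispatch


class Thing: pass
class Paper(Thing): pass
class Scissors(Thing): pass
class Rock(Thing): pass


beats = make_dispatcher({
    (Scissors, Paper): lambda a, b: True,
    (Paper, Rock): lambda a, b: True,
    (Rock, Scissors): lambda a, b: True,
    (Thing, Thing): lambda a, b: False,
})

print(beats(Rock(), Scissors()))   # True
print(beats(Scissors(), Rock()))   # False
```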

Compiler

A new entry point was added to allow combinations to save compilation errors.
This is used to detect ambiguous methods, which can then be resolved in later
compilation units. Maybe some definition warning infrastructure would be better
suited for that?

Optimizing Compiler

Based on the code for inlining single-generics, the same thing is done for
multi-method call sites, so that the most specific method is also selected if
possible.
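One way to state the condition for choosing a method at compile time, sketched in Python under the assumption of single inheritance (two classes either nest or are disjoint): given the inferred argument classes, a method can be selected statically only if it accepts the inferred tuple and is at least as specific as every method that could apply to some subtype of it. static_method and meet are hypothetical helpers; this is not the PR's inlining code.

```python
class Shape: pass
class Circle(Shape): pass
class Square(Shape): pass


def le(s, t):
    return all(issubclass(a, b) for a, b in zip(s, t))


def meet(s, t):
    """Pointwise intersection of two specializer tuples, assuming two
    classes either nest or are disjoint; None if the intersection is empty."""
    out = []
    for a, b in zip(s, t):
        if issubclass(a, b):
            out.append(a)
        elif issubclass(b, a):
            out.append(b)
        else:
            return None
    return tuple(out)


def static_method(specs, inferred):
    """Return the specializer that every call with argument classes <=
    `inferred` would select, or None if dispatch must stay dynamic."""
    overlapping = [s for s in specs if meet(s, inferred) is not None]
    for m in specs:
        if le(inferred, m) and all(le(m, s) for s in overlapping):
            return m
    return None


specs = [(Shape, Shape), (Circle, Shape)]
print(static_method(specs, (Circle, Square)) == (Circle, Shape))   # True
print(static_method(specs, (Shape, Square)))   # None: a Circle may show up
```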

What works

Defining Multimethods

Use MGENERIC: syntax to define a multi-combination generic word. For MM:, the classes are specified in the effect of the
method definition, as with the TYPED: syntax. Methods can be defined using MM:, CM:, and M:, including the special dispatch syntax M: D( ... ) method described above.

e.g.

MGENERIC: foo ( x x -- x )
MM: foo ( x: fixnum y: tuple -- x ) ... ;

! And
MM: foo ( x: object y: object -- x ) ... ;
! Is the same as
M: object foo ... ;

Generation of Dispatch logic

Dispatch is a naive successive lookup on the arguments, using the megamorphic inline
cache and PIC mechanism. Note that the nested dispatchers by default do not
generate a PIC stub, only a mega-cache-lookup. PIC stub generation can be enabled, but I have
only done 2 tests on this, and for one of them the additional indirection
actually seemed to have a negative impact on performance.

Note that most of the benefits only apply to tuple and built-in class dispatch,
and multiple dispatch on predicate classes (including singletons) will generate nested
predicate engine words like before.

Detection of ambiguous method definition

This PR adopts the "symmetric multi-dispatch" approach, where it is an error if
there are two equally specific methods for a set of values to dispatch on. This
is in contrast to e.g. CLOS, and maybe also to the existing single dispatch
mechanism, where there is a "tie-breaker" order imposed in these cases.
This is not a strict requirement for this code though, and it should be
possible to switch to asymmetric dispatch without much effort.

However, in my experience it is better to get explicit warnings for ambiguous dispatch, because these kinds of errors can be hard to detect.
Currently, this PR includes #2452, so DISJOINT: and CLASSIFY can be used to explicitly handle ambiguous definitions.
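A hypothetical Python sketch of a symmetric ambiguity check: two incomparable specializers that can apply to the same arguments are reported unless a method is defined exactly on their overlap. It assumes single inheritance, so two classes either nest or are disjoint; for predicate classes, that is roughly the kind of information DISJOINT: declarations can supply. find_ambiguities and meet are invented names.

```python
class Shape: pass
class Circle(Shape): pass


def le(s, t):
    return all(issubclass(a, b) for a, b in zip(s, t))


def meet(s, t):
    """Pointwise intersection, assuming classes either nest or are
    disjoint; None if no argument tuple can match both."""
    out = []
    for a, b in zip(s, t):
        if issubclass(a, b):
            out.append(a)
        elif issubclass(b, a):
            out.append(b)
        else:
            return None
    return tuple(out)


def find_ambiguities(specs):
    """Report pairs of incomparable specializers that can both apply to the
    same arguments without a method defined exactly on their overlap."""
    specs = list(specs)
    problems = []
    for i, s in enumerate(specs):
        for t in specs[i + 1:]:
            if le(s, t) or le(t, s):
                continue   # one is more specific: no tie possible
            overlap = meet(s, t)
            if overlap is not None and overlap not in specs:
                problems.append((s, t, overlap))
    return problems


specs = [(Circle, Shape), (Shape, Circle)]
print(len(find_ambiguities(specs)))   # 1: both apply to (Circle, Circle)
specs.append((Circle, Circle))        # a disambiguating method
print(len(find_ambiguities(specs)))   # 0
```

Within this model, only a method placed exactly on the overlap resolves the tie: any method below both specializers is also below their overlap.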

What does not work

  • This branch does not contain anything in the direction of partial inlining.
  • There is no attempt to simplify the decision tree.
  • There is no support for hook combinations.
  • Bootstrapping, probably due to circular dependencies. (Fixed by doing without smart combinators, unfortunately. It would also be nice to have some way to check modifications for vocab usage changes that introduce cycles at bootstrap time.)
  • There is no ambiguity check yet. If there are two equally applicable methods, one will be selected based on lexicographic ordering. For symmetric dispatch, this should throw an error or a warning.
  • Predicate generation for eql specializers in the predicate engines is actually wrong.
  • There are no useful error messages yet when no applicable method is found at a dispatch position other than the top of the stack.
  • Probably a lot more... this needs to be tested more thoroughly on existing code.

Things to test

  • Impact on compile time. For single-dispatch generics, this should currently be unchanged.
  • Efficiency of generated code compared to math combinations, e.g. when using
    this to implement math operations on non-math classes like vectors, matrices, etc.

Some more ideas

More explicit dispatch specification

EDIT: This is incorporated as a proof of concept in classes.dispatch.

I think one of the things that is even more suitable for reducing
redundancy is Julia's notion of parametric dispatch. This
paper, although quite old,
also describes an interesting mechanism for specifying dispatch based on
predicates.

If, for example, singletons and mixins were based on tuples, they could take advantage of
the existing tuple dispatch engines due to the inheritance mechanism. Also, it would be interesting to either
infer or provide special types of predicate classes (on tuples) which can be
proven to be subtypes. A predicate class which tests a certain tuple slot, for
example, will always be a subtype of the corresponding tuple class.

This kind of thinking was also one motivation for adding the covariant-tuple
classoid, as a device to separate dispatch specification from parameter data
types a bit.

Get rid of nested dispatch tests for tuple-only participants

This is related to the previous point. If all participating classes are known
at compile time to be built-in or tuple classes, it should be possible to generate the fast-hash
structure over the combination of all the objects involved at a call site
instead of successive tests. That way, there would be virtually no overhead when
dispatching only on a limited number of combinations of tuple classes. This
would probably involve some modifications to the lookup-method code inside the VM.
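A hypothetical Python sketch of the single-hash idea: one cache keyed on the tuple of all argument classes replaces the per-position tests. make_flat_dispatcher is an invented name, and a Python dict is only a stand-in for a VM-level lookup structure.

```python
class Thing: pass
class Paper(Thing): pass
class Rock(Thing): pass


def le(s, t):
    return all(issubclass(a, b) for a, b in zip(s, t))


def make_flat_dispatcher(methods):
    """One cache keyed by the tuple of all argument classes, instead of one
    nested cache per dispatch position."""
    cache = {}

    def dispatch(*args):
        key = tuple(type(a) for a in args)
        impl = cache.get(key)
        if impl is None:
            # Slow path, taken once per combination of argument classes.
            applicable = [s for s in methods if le(key, s)]
            best = [s for s in applicable if all(le(s, t) for t in applicable)]
            if len(best) != 1:
                raise LookupError(f"no unique method for {key}")
            impl = cache[key] = methods[best[0]]
        return impl(*args)

    return dispatch


beats = make_flat_dispatcher({
    (Paper, Rock): lambda a, b: True,
    (Thing, Thing): lambda a, b: False,
})

print(beats(Paper(), Rock()))   # True
print(beats(Rock(), Paper()))   # False
```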

(edit: updated examples with D( ) preliminary syntax, added example for eql specializers)
(edit: added the multi decorator)
(edit: removed the multi decorator, need to use MGENERIC: now, explain disambiguation with DISJOINT:)

@timor force-pushed the multi-method-nested-dispatch branch from f2e11b9 to 6c551a6 on February 15, 2021 10:22

@timor (Contributor, Author) commented Feb 17, 2021

Edited description to reflect changes in detection of ambiguous methods.

@timor force-pushed the multi-method-nested-dispatch branch from 5e4031d to b2a104b on February 17, 2021 12:23

@timor (Contributor, Author) commented Feb 17, 2021

I am confused about bootstrapping:
What does work is:

  1. Check out master, build master image
  2. Switch to this branch
  3. refresh-all, without dependency problems
  4. execute make-my-image
  5. build.sh bootstrap with that image

What does not work is:

  1. Check out this branch
  2. build.sh net-bootstrap

The second variant dies with

*** Stage 2 early init... done
Loading resource:basis/bootstrap/stage2.factor
Loading resource:basis/command-line/command-line.factor
Loading resource:basis/debugger/debugger.factor
Loading resource:basis/compiler/errors/errors.factor
Loading resource:basis/grouping/grouping.factor
Loading resource:basis/io/styles/styles.factor
Loading resource:basis/colors/colors.factor
Loading resource:basis/delegate/delegate.factor
Loading resource:basis/delegate/protocols/protocols.factor
Loading resource:basis/deques/deques.factor
Loading resource:basis/io/streams/string/string.factor
Loading resource:basis/present/present.factor
Loading resource:basis/alien/c-types/c-types.factor
Loading resource:basis/cpu/architecture/architecture.factor
Loading resource:basis/strings/tables/tables.factor
Loading resource:basis/libc/libc.factor
Loading resource:basis/alien/destructors/destructors.factor
Loading resource:basis/functors/functors.factor
Loading resource:basis/functors/backend/backend.factor
Loading resource:basis/interpolate/interpolate.factor
Loading resource:basis/multiline/multiline.factor
Loading resource:basis/alien/syntax/syntax.factor
Loading resource:basis/alien/enums/enums.factor
Loading resource:basis/alien/libraries/libraries.factor
Loading resource:basis/alien/libraries/unix/unix.factor
Loading resource:basis/alien/parser/parser.factor
Loading resource:basis/alien/arrays/arrays.factor
Loading resource:basis/libc/linux/linux.factor
Loading resource:basis/prettyprint/prettyprint.factor
Loading resource:basis/colors/constants/constants.factor
Loading resource:basis/ascii/ascii.factor
Loading resource:basis/hints/hints.factor
Loading resource:core/generic/multi/multi.factor
Loading vocab:bootstrap/bootstrap-error.factor
Loading resource:basis/debugger/debugger.factor
Loading resource:basis/prettyprint/prettyprint.factor
Loading resource:basis/colors/constants/constants.factor
Loading resource:basis/ascii/ascii.factor
Loading resource:basis/hints/hints.factor
Loading resource:core/generic/multi/multi.factor
You have triggered a bug in Factor. Please report.
critical_error: The die word was called by the library.: 0
Starting low level debugger...
Basic commands:
  q ^D             -- quit Factor
  c                -- continue executing Factor - NOT SAFE
  t                -- throw exception in Factor - NOT SAFE
  .s .r .c         -- print data, retain, call stacks
  help             -- full help, including advanced commands

>

I noticed this behavior whenever I had a vocab dependency problem, but that was usually accompanied by refresh-all failing with the fresh image in the first variant above. Is there some way to debug vocab dependencies from the low-level debugger?

@mrjbq7 (Member) commented Feb 17, 2021 via email

@timor (Contributor, Author) commented Feb 17, 2021

> If the first works then your process is pretty clean. That implies you added code that requires new boot images. And so a self-bootstrap works but a net-bootstrap doesn’t.

Thanks. Are there certain criteria as to what kind of changes require new boot images?

@mrjbq7 (Member) commented Feb 17, 2021 via email

@timor marked this pull request as draft on February 18, 2021 18:58

@timor (Contributor, Author) commented Feb 19, 2021

I factored out all the covariant tuple code into its own logic behind an abstract dispatch-type.

This PR basically now contains two separate concepts:

  1. Generating nested inline caches from the class tuples of multi-methods for dispatch
  2. Proof of Concept for a modular approach of parametrizing dispatch.

@timor force-pushed the multi-method-nested-dispatch branch from 6490460 to ba0cd5a on February 19, 2021 20:57
@timor changed the title "Extend inline cache mechanism to support multiple dispatch" to "Multiple Dispatch wih existing inline cache infrastructure and abstractions for dispatch specifciers" on Feb 20, 2021
@timor changed the title "Multiple Dispatch wih existing inline cache infrastructure and abstractions for dispatch specifciers" to "[RFC] Multiple dispatch with existing inline cache infrastructure and abstractions for dispatch specifciers" on Feb 20, 2021
kusumotonorio added a commit to kusumotonorio/factor that referenced this pull request Feb 21, 2021
…r#2430 "Multiple dispatch with existing inline cache infrastructure and abstractions for dispatch specifciers" and made some changes to use the cache in multiple dispatch without hook variables. His PR continues to grow and is already different from the latest version.

Currently, it is not the default because it is for testing purposes, and it works when the multi-method generic word without hook variables is not specified as `inline` or `partial-inline`, but is specified as `cached-multi`. Depending on whether `cached-multi` is added or not, it is possible to compare with conventional multi-dispatch.

e.g.
```factor
MGENERIC: cached-md-beats? ( obj1 obj2 -- ? ) cached-multi

MM: cached-md-beats? ( :paper :scissors -- ? ) 2drop t ;
MM: cached-md-beats? ( :scissors :rock -- ? )  2drop t ;
MM: cached-md-beats? ( :rock :paper -- ? )  2drop t ;
MM: cached-md-beats? ( :thing :thing -- ? )  2drop f ;
```
@timor changed the title "[RFC] Multiple dispatch with existing inline cache infrastructure and abstractions for dispatch specifciers" to "[RFC] Multiple dispatch with existing inline cache infrastructure and abstractions for dispatch specifiers" on Mar 8, 2021
@timor mentioned this pull request on Mar 9, 2021

@timor (Contributor, Author) commented Apr 3, 2021

I added classes.dispatch.class which illustrates how this can be used to implement class methods. Updated the description to incorporate an example using some quickly invented CM: word for simple constructor inheritance.

@timor (Contributor, Author) commented Apr 6, 2021

@kusumotonorio I just pushed a fix in 1890f5a, which you would need to apply to the corresponding code in your branch, if you want to stay synchronized.

@kusumotonorio (Contributor) commented Apr 6, 2021

@timor Roger that. I'm going to differentiate between those that actively follow your changes and those that don't by creating a new branch.

@timor (Contributor, Author) commented Apr 6, 2021

@kusumotonorio I really should split this PR into the dispatch-type abstraction stuff and the inline-cache implementation stuff, just haven't had time yet.

@timor force-pushed the multi-method-nested-dispatch branch 2 times, most recently from f5a61f6 to 8df71a9 on April 7, 2021 13:54
timor added 2 commits May 6, 2021 13:42
Predicate classes can be declared disjoint from one another.  This is inherited,
so predicate classes that specialize on a predicate class with an existing
disjoint specification will inherit the set of declared-disjoint classes.

Adds the `DISJOINT:` syntax word for that purpose.
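The inheritance rule can be modelled in a few lines; this is a hypothetical Python sketch in which PredicateClass, declare_disjoint and disjoint are invented names, not the Factor implementation. Declarations made on an ancestor are inherited, so a subclass of positive is disjoint from negative.

```python
class PredicateClass:
    """Hypothetical stand-in for a Factor predicate class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.declared_disjoint = set()

    def ancestors(self):
        c = self
        while c is not None:
            yield c
            c = c.parent

    def disjoint_set(self):
        # Declarations made on any ancestor are inherited.
        result = set()
        for c in self.ancestors():
            result |= c.declared_disjoint
        return result


def declare_disjoint(*classes):
    """Rough analogue of a DISJOINT: declaration over `classes`."""
    for c in classes:
        c.declared_disjoint |= {d for d in classes if d is not c}


def disjoint(a, b):
    # a and b are disjoint if either one inherits a declaration that
    # names the other or one of the other's ancestors.
    return (any(x in a.disjoint_set() for x in b.ancestors())
            or any(x in b.disjoint_set() for x in a.ancestors()))


integer = PredicateClass("integer")
positive = PredicateClass("positive", integer)
negative = PredicateClass("negative", integer)
small_positive = PredicateClass("small-positive", positive)

declare_disjoint(positive, negative)
print(disjoint(small_positive, negative))   # True
print(disjoint(small_positive, positive))   # False
```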
@timor force-pushed the multi-method-nested-dispatch branch from 594969b to 5e21bc0 on May 6, 2021 12:21
timor and others added 28 commits May 6, 2021 14:39
…s...

... with unions and intersections.  If anonymous intersections/unions of these
are never created, that code can be removed.
This generates compilation errors without preventing compilation of the methods.
The idea is that vocabs loaded afterwards can define additional methods which
resolve these ambiguities, in which case the compiler error should be cleared.
of classes.algebra and generic.multi into separate vocabs.
This was used when arity had to be known for comparisons, and when classes could
be used to compare methods in dispatch positions.
... old code from when it was possible to compare regular classes with dispatch
types for method ordering.
…tch-types...

... in specializer position for multi-generics.  That way no comparisons between
concrete classes and covariant tuples are possible by default, separating the dispatch type
space from the concrete type space.
… hierarchy

Uses the classes.dispatch protocol.  Is created by using wrappers in the
`D{ }` helper syntax word.
…ix eql-spec

A bit of a workaround because of missing multi-methods.
Correct class algebra for eql specializers.
If only one method was applicable, we took a short cut to that method, without
inserting checks which would dispatch to the default method, if it was called
with something non-defined.
Corresponding CLOS test code:

```lisp
(defclass t0 () ())
(defclass t1 (t0) ())
(defclass t2 (t1) ())
(defclass t3 (t2) ())

(defmethod m-cl ((c1 t3) (c2 t0) (c3 t2)) (format t "No.1-") (call-next-method))
(defmethod m-cl ((c1 t2) (c2 t2) (c3 t2)) (format t "No.2-") (call-next-method))
(defmethod m-cl ((c1 t2) (c2 t0) (c3 t0)) (format t "No.3-") (call-next-method))
(defmethod m-cl ((c1 t0) (c2 t0) (c3 t0)) (format t "No.4~%"))

* (m-cl (make-instance 't3) (make-instance 't2) (make-instance 't0))
No.3-No.4
NIL
* (m-cl (make-instance 't2) (make-instance 't2) (make-instance 't2))
No.2-No.3-No.4
NIL
* (m-cl (make-instance 't3) (make-instance 't2) (make-instance 't2))
No.1-No.2-No.3-No.4
NIL
```
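For comparison, CLOS's default (asymmetric) ordering can be reproduced in Python: applicable methods are sorted by comparing argument positions left to right. This sketch assumes all classes lie on one single-inheritance chain, so a longer __mro__ means a more specific class; it models CLOS, not this PR, and the names are illustrative.

```python
class T0: pass
class T1(T0): pass
class T2(T1): pass
class T3(T2): pass


METHODS = [
    ("No.1", (T3, T0, T2)),
    ("No.2", (T2, T2, T2)),
    ("No.3", (T2, T0, T0)),
    ("No.4", (T0, T0, T0)),
]


def applicable(spec, args):
    return all(isinstance(x, c) for x, c in zip(args, spec))


def clos_order(spec):
    # CLOS-style asymmetric ordering: compare argument positions left to
    # right. Every class here is on one single-inheritance chain, so a
    # longer MRO means a more specific class; negate it to sort ascending.
    return tuple(-len(c.__mro__) for c in spec)


def m_cl(*args):
    """Names of the applicable methods, most specific first, i.e. the
    order in which call-next-method would visit them."""
    chain = sorted((m for m in METHODS if applicable(m[1], args)),
                   key=lambda m: clos_order(m[1]))
    return "-".join(name for name, _ in chain)


print(m_cl(T3(), T2(), T0()))   # No.3-No.4
print(m_cl(T2(), T2(), T2()))   # No.2-No.3-No.4
print(m_cl(T3(), T2(), T2()))   # No.1-No.2-No.3-No.4
```

Under a purely symmetric covariant ordering, No.1 (t3 t0 t2) and No.2 (t2 t2 t2) are incomparable: No.1 is more specific in the first position but less specific in the second, so the third call has no single most specific method in that model.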
Was only installing under the word decision tree, while it needs to be installed
for the classoid part of the decision tree.
Also fix test
Also make `CM:` turn a single-generic into a multi-generic.
Looks more like a stack effect, so round parens make more sense
Use with `GENERIC:` to make sure it is a multi-generic.
Used to determine whether a dispatch specifier depends on a specific class.
Used for dependency checking.
Was causing problems with redefinition.  Since the "combination" word-prop did
not match, the generic was always reset, removing existing methods.
Make `?lookup-method` generic for now, but regular single-dispatch lookup should
return the correct dispatch engine instead, probably.
@timor force-pushed the multi-method-nested-dispatch branch from 5e21bc0 to c67ed94 on May 6, 2021 12:48
4 participants