Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polymorphic operators #55

Closed
denismerigoux opened this issue Jan 6, 2021 · 2 comments
Closed

Polymorphic operators #55

denismerigoux opened this issue Jan 6, 2021 · 2 comments
Assignees
Labels
🔧 compiler Issue concerns the compiler 💬 discussion Further discussion is needed ✨ enhancement New feature or request

Comments

@denismerigoux
Copy link
Contributor

denismerigoux commented Jan 6, 2021

Right now, binary and unary operators are specialized depending on their type : +$ for money addition, -@ for date substraction, etc. This is cumbersome and ideally we would want to write + for adding money, dates, integers, decimals, etc.

Why is it like this?

The need for specialized operators stems form Catala's type system, and more precisely from the default calculs typing algorithm. It uses a classical Hindley-Milner inference and the W algorithm. In classical Hindley-Milner inference, operators either operate on a fixed type, or are completely polymorphic. This is the typing procedure for Catala operators right now :

let op_type (op : A.operator Pos.marked) : typ Pos.marked UnionFind.elem =
let pos = Pos.get_position op in
let bt = UnionFind.make (TLit TBool, pos) in
let it = UnionFind.make (TLit TInt, pos) in
let rt = UnionFind.make (TLit TRat, pos) in
let mt = UnionFind.make (TLit TMoney, pos) in
let dut = UnionFind.make (TLit TDuration, pos) in
let dat = UnionFind.make (TLit TDate, pos) in
let any = UnionFind.make (TAny (Any.fresh ()), pos) in
let array_any = UnionFind.make (TArray any, pos) in
let any2 = UnionFind.make (TAny (Any.fresh ()), pos) in
let arr x y = UnionFind.make (TArrow (x, y), pos) in
match Pos.unmark op with
| A.Ternop A.Fold -> arr (arr any2 (arr any any2)) (arr any2 (arr array_any any2))
| A.Binop (A.And | A.Or) -> arr bt (arr bt bt)
| A.Binop (A.Add KInt | A.Sub KInt | A.Mult KInt | A.Div KInt) -> arr it (arr it it)
| A.Binop (A.Add KRat | A.Sub KRat | A.Mult KRat | A.Div KRat) -> arr rt (arr rt rt)
| A.Binop (A.Add KMoney | A.Sub KMoney) -> arr mt (arr mt mt)
| A.Binop (A.Add KDuration | A.Sub KDuration) -> arr dut (arr dut dut)
| A.Binop (A.Sub KDate) -> arr dat (arr dat dut)
| A.Binop (A.Add KDate) -> arr dat (arr dut dat)
| A.Binop (A.Div KMoney) -> arr mt (arr mt rt)
| A.Binop (A.Mult KMoney) -> arr mt (arr rt mt)
| A.Binop (A.Lt KInt | A.Lte KInt | A.Gt KInt | A.Gte KInt) -> arr it (arr it bt)
| A.Binop (A.Lt KRat | A.Lte KRat | A.Gt KRat | A.Gte KRat) -> arr rt (arr rt bt)
| A.Binop (A.Lt KMoney | A.Lte KMoney | A.Gt KMoney | A.Gte KMoney) -> arr mt (arr mt bt)
| A.Binop (A.Lt KDate | A.Lte KDate | A.Gt KDate | A.Gte KDate) -> arr dat (arr dat bt)
| A.Binop (A.Lt KDuration | A.Lte KDuration | A.Gt KDuration | A.Gte KDuration) ->
arr dut (arr dut bt)
| A.Binop (A.Eq | A.Neq) -> arr any (arr any bt)
| A.Binop A.Map -> arr (arr any any2) (arr array_any any2)
| A.Unop (A.Minus KInt) -> arr it it
| A.Unop (A.Minus KRat) -> arr rt rt
| A.Unop (A.Minus KMoney) -> arr mt mt
| A.Unop (A.Minus KDuration) -> arr dut dut
| A.Unop A.Not -> arr bt bt
| A.Unop A.ErrorOnEmpty -> arr any any
| A.Unop (A.Log _) -> arr any any
| A.Unop A.Length -> arr array_any it
| A.Unop A.IntToRat -> arr it rt
| Binop (Mult (KDate | KDuration)) | Binop (Div (KDate | KDuration)) | Unop (Minus KDate) ->
Errors.raise_spanned_error "This operator is not available!" pos

Why can't we make all operators completely polymorphic ? Because it does not make any sense to + two enumerations together, or * two dates.

Hindley-Milner inference with constraints

What we really want is a type system where we say : + operates only on a fixed list of types. This corresponds to Hindley-Milner inference with constraints, a notoriously hard problem. Some libraries exist to perform the typing, like https://gitlab.inria.fr/fpottier/inferno, but overall it is much more complicated from the classical inference I'm doing right now.

Because I don't want to spend too much time on Catala's type system, I prefer we stick to classical inference and hence we're stuck with the default calculus operators as they are.

A compromise

But this does not mean we can't have what we want ! Indeed, we can keep the differentiated default calculus operators but have undifferentiated surface syntax operator. We can write |01/01/2020| - |01/01/2019| and have the compiler figure out that - is actually -@ during desugaring.

This system is nice because it does not complicate the default calculus. Instead, we rely on a conservative desugaring pass that would sit somewhere in Desugared or Scopelang and that would specialize the operators with the information provided by a conservative, high-level additional type checking pass done in Desugared or Scopelang.

If this pass does not manage to determine the type of the operator, we send an error message back to the user saying that desambiguation is needed and the user actually has to write -@.

What are your thoughts on this?

@denismerigoux denismerigoux added ✨ enhancement New feature or request 💬 discussion Further discussion is needed 🔧 compiler Issue concerns the compiler labels Jan 6, 2021
@AltGr
Copy link
Contributor

AltGr commented Jan 6, 2021

That last suggestion sounds great; I'd add that this desugaring pass would be based on a simple type propagation, so that variables and expression could be handled too, if there is a direct path to get their type.

I would go further and advise using something like x as a date - |01/01/2019| rather than x -@ |01/01/2019| in the surface syntax: from a user point of view, requiring the explicit -@ instead of - only sometimes could actually be more confusing than requiring it always, even if it improves readability in general.
(because it implies that the user understands that both are the same but they are allowed a shortcut, depending on higher order concerns on the structure of the expression)

Having generic type constraints, on the other hand, should be something we can do in a way that is clearly understood (e.g. x$ or x@ or x as a date (but that sounds like a conversion rather than a constraint, so maybe not)?)

Probably the best indicator of whether this is worth it will be the quality of the error messages we are able to print when either we are missing information, or the type is incorrect.

@AltGr
Copy link
Contributor

AltGr commented Oct 17, 2022

Now that we can run our typer on earlier passes, it may sound reasonable to call it on expressions as soon as desugaring takes place, to resolve the actual operator to insert in the AST. The simplicity of Catala (explicitely typed variables, no polymorphism) should ensure that we can uniquely resolve the operator from the type of its operands using a pre-defined table.

Using the typer this way at that stage would also have application in the resolution of struct/enum names from ambiguous field/variant names.

@AltGr AltGr closed this as completed Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔧 compiler Issue concerns the compiler 💬 discussion Further discussion is needed ✨ enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants