New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: List append operator proposal #68

Closed
Gabriel439 opened this Issue Jun 15, 2017 · 15 comments

Comments

Projects
None yet
3 participants
@Gabriel439
Collaborator

Gabriel439 commented Jun 15, 2017

This is a request for comments on a proposal to add support for a list append operator

Right now there is no list append operator. The closest thing Dhall has to such an operator is ./Prelude/List/concat (which is defined in terms of List/build and List/fold). The original reason for this was to encourage efficient concatenation of lists since lists are stored under the hood as Vectors, and if you want to concatenate N vectors the most efficient way to do so is to concatenate them all at once (which is O(N) time complexity) instead of pair-wise (which is O(N^2) time complexity).

However, there are some times when all you want to do is to append two lists. For example, I ran into this issue when trying to translate the second Jsonnet example on this page (and that was the original motivation for this RFC), where I had to append two lists and awkwardly had to use something like ./Prelude/List/concat Text [x, y] when I would have preferred x ++ y or something similar.

The reason that Text provides a built-in ++ operator is that there is no similar efficiency concern. Text is implemented under the hood as a Builder which has O(1) time complexity for append.

There is also a separate and orthogonal issue, which is what to name this list operator if Dhall were to provide support for one. ++ is already taken for appending Text, even though it is also the most natural thing to name the operator to append two lists.

So I will structure this three sets of orthogonal proposals for:

  • How to support the list append operator
  • What to name the operator
  • How to efficiently support the operator

In each set of proposals, I will also indicate which one I prefer. I also invite people to submit proposals of their own if they can think of a better way to support this.

Proposal set (A) - How to support the list operator

Proposal (A0) - Do nothing

Maybe this isn't such a big deal and users can live with using either ./Prelude/List/concat or a newly added ./Prelude/List/append that is not used as an infix operator.

Proposal (A1) - Add support for user-defined infix operators

Allow users to define new operators using the syntax:

let (++) = ./Prelude/List/append SomeType
in  ...

... and add ./Prelude/List/append to the ./Prelude

Proposal (A2) - Add a built-in list concatenation operator

This would add a new ListAppend constructor to the core syntax tree. See Proposal set (B) which covers what to name this operator

My preference

I prefer proposal (A2).

I don't recommend proposal (A1) because a user-defined list operator cannot take advantage of type inference. The type of ./Prelude/List/append would be:

(a : Type)  List a  List a  List a

... so you would need to instantiate a to a specific type each time you bound ./Prelude/List/append to the operator name. A built-in operator could take advantage of type inference when type-checking to infer the type of the list being concatenated.

I also don't think it hurts to add a new list append operator to the language since it's very simple to implement and I'm pretty sure some users will want to use this operator. It also is associative and has an identity, so it passes the bare minimum criteria for being an operator in Dhall. So this is why I don't think we should do nothing as in Proposal (A0).

Proposal set (B) - Name of the append operator

If we go with proposal (A1) then we should recommend a naming convention for this operator and if we go with proposal (A2) then we have to pick a name.

Proposal (B0) - Reuse the operator (++)

This would add new type-checking logic to infer which (++) operator the user meant by the types of the arguments and provide a helpful error message if there is a type mismatch.

Proposal (B1) - Name list append (++) and rename text append to (**)

This would make it unambiguous which append the user intended, which would improve error messages

Proposal (B2) - Name the list append or text append operator something else

Some other ideas for what to name either operator to avoid a conflict:

  • (<|>) (which is a valid list concatenation operator in Haskell, too)
  • (+++)
  • (#)
  • (%)

My preference

I prefer proposal (B1). The rationale behind these two names is to open the door in the future to (++) being overloaded to work on both List/Optional and (**) working on Text/List Text/Optional Text/List (List Text)/...

Or more, generally, using a mix of Dhall and Haskell pseudo-code, this is what I envision

class Alternative f where
    zero :: f a
    (++) :: f a -> f a -> f a

instance Alternative (List a) where
    zero = [] : List a
    (++) = Data.List.(++)

instance Alternative (Optional a) where
    zero = [] : Optional a

    -- Same as `(<|>)` in Haskell
    ([] : Optional a) ++ [x] = [x]
    [x] ++ _ = [x]

class Monoid m where
    one :: m
    (**) :: m -> m -> m

instance Monoid Text where
    one = ""
    (**) = Data.Text.append

instance Monoid m => Monoid (List m) where
    one = pure one
    (**) = liftA2 (**)

instance Monoid m => Monoid (Optional m) where
    one = pure one
    (**) = liftA2 (**)

These would have the nice property that (**) and (++) behave like multiplication and addition (thus the operator names):

x ** (y ++ z) = (x ** y) ++ (x ** z)

zero ** x = zero

... and all the other semiring laws.

I'm not sure that I will ever implement these more general overloaded versions of (++) and (**) but I would still like to have some underlying consistency explaining why they are named the way they are.

The other reason I prefer distinct names is that Dhall tries to encourage the user to express their intent as much as possible in order to improve error messages. Distinct operator names would make it clearer which concatenation operator the user meant to use.

The downside of this is that it would break existing code since it's definitely not a backwards-compatible change. However, Dhall is still a pretty young language so I think it's okay to still make breaking changes at this point.

Proposal set (C) - Performance

This section will discuss how to represent Dhall Lists internally and how that effects the time complexity of the most performance-sensitive operations that Dhall currently supports. All of the time complexities are in terms of E, the total number of list elements.

This decision is not as important to get right now since we can always change it later without changing Dhall language. This only affects the Dhall API.

Proposal (C0) - Continue to use Vector to store Lists

  • (++) - O(E)
  • List/length - O(1)
  • List/indexed - O(E)
  • List/last - O(1)

Proposal (C1) - Use Data.Sequence.Seq to store Lists

Better time complexity than Vector for append, but possibly worse constant factors:

  • (++) - O(log E)
  • List/length - O(1)
  • List/indexed - O(E) (Using Data.Sequence.mapWithIndex)
  • List/last - O(1)

Proposal (C2) - Use VectorBuilder.Builder.Builder to store Lists

Most efficient for append, but worst in other respects:

  • (++) - O(1)
  • List/length - O(E) (Converting to a Vector first)
  • List/indexed - O(E) (Converting to a Vector intermediate)
  • List/last - O(E) (Converting to a Vector first)

Proposal (C3) - Use a wrapper around VectorBuilder.Builder.Builder

This takes advantage of the fact that Dhall doesn't support very many operations, so we can use a final encoding to efficiently cache their results. This takes advantage of Haskell's laziness to only compute what we actually need:

data List a = List
    { head :: Maybe a
    , last :: Maybe a
    , length :: !Natural
    , elements :: VectorBuilder.Builder.Builder a
    , reversed :: VectorBuilder.Builder.Builder a
    }

The main operation that this doesn't improve the efficiency of is List/indexed. However, it might be possible to add more efficient support for that with an upstream patch to vector-builder

My preference

I'm partial to both (C1) and (C3) but I slightly lean towards (C1) (using Seq) because it's a simple and versatile data structure, and it's an improvement over Vector from a time-complexity standpoint for append. Constant factors don't matter as much for Dhall since it's not intended to be a high performance language.

Conclusion

So to summarize, my proposal is to:

  • Rename the text append operator to (**)
  • Add a new list append operator named (++)
  • Use Seq instead of Vector internally to represent Dhall lists to improve the efficiency of append

Anybody who is interested in this can suggest any other proposals. If nobody objects to the above proposal or suggests any other alternatives then I'll implement this after a week has passed.

@PierreR

This comment has been minimized.

Show comment
Hide comment
@PierreR

PierreR Jun 15, 2017

Contributor

Sorry to hijack the discussion but would you mind explaining a bit more how the haskell semigroup <|> operator related to + when the monoid append operation <> related to *.

Contributor

PierreR commented Jun 15, 2017

Sorry to hijack the discussion but would you mind explaining a bit more how the haskell semigroup <|> operator related to + when the monoid append operation <> related to *.

@Gabriel439

This comment has been minimized.

Show comment
Hide comment
@Gabriel439

Gabriel439 Jun 16, 2017

Collaborator

@PierreR: (<|>) is Haskell's Alternative operator, not the semigroup operator:

class Alternative f where
    (<|>) :: f a -> f a -> f a

    empty :: f a

The reason that it relates to (+) is that for any type F that is an Alternative, you can get a Monoid instance for free like this:

-- Assuming that we have:
instance Alternative F where ...

-- ... that implies that we also have:
instance Applicative F where ...
-- ... because `Applicative` is a superclass of `Alternative`

-- ... which means that we can also define this `Monoid` instance for `F`:
instance Monoid a => Monoid (F a) where
    mempty = pure mempty

    mappend = liftA2 mappend

... and if you have the above Alternative and Monoid instances then (<>) and (<|>) will usually have the following behavior:

x <> (y <|> z) = (x <> y) <|> (x <> z)  -- x * (y + z) = (x * y) + (x * z)

x <> empty = empty  -- x * 0 = 0

In other words, (<>) and (<|>) interact with each other in the same way as (*) and (+), respectively

Collaborator

Gabriel439 commented Jun 16, 2017

@PierreR: (<|>) is Haskell's Alternative operator, not the semigroup operator:

class Alternative f where
    (<|>) :: f a -> f a -> f a

    empty :: f a

The reason that it relates to (+) is that for any type F that is an Alternative, you can get a Monoid instance for free like this:

-- Assuming that we have:
instance Alternative F where ...

-- ... that implies that we also have:
instance Applicative F where ...
-- ... because `Applicative` is a superclass of `Alternative`

-- ... which means that we can also define this `Monoid` instance for `F`:
instance Monoid a => Monoid (F a) where
    mempty = pure mempty

    mappend = liftA2 mappend

... and if you have the above Alternative and Monoid instances then (<>) and (<|>) will usually have the following behavior:

x <> (y <|> z) = (x <> y) <|> (x <> z)  -- x * (y + z) = (x * y) + (x * z)

x <> empty = empty  -- x * 0 = 0

In other words, (<>) and (<|>) interact with each other in the same way as (*) and (+), respectively

@josefs

This comment has been minimized.

Show comment
Hide comment
@josefs

josefs Jun 28, 2017

I think the proposal looks fine. I just want to comment briefly on your remark on overloading.

Overloading the operators the way you suggest above would mean that the operator (**) would behave as bog standard concatenation for text but as an outer product for lists. That would be terribly confusing. If you want to introduce overloading, I'd rather you made sure that each operator intuitively has the same behavior for every type it is defined for.

josefs commented Jun 28, 2017

I think the proposal looks fine. I just want to comment briefly on your remark on overloading.

Overloading the operators the way you suggest above would mean that the operator (**) would behave as bog standard concatenation for text but as an outer product for lists. That would be terribly confusing. If you want to introduce overloading, I'd rather you made sure that each operator intuitively has the same behavior for every type it is defined for.

@Gabriel439

This comment has been minimized.

Show comment
Hide comment
@Gabriel439

Gabriel439 Jun 28, 2017

Collaborator

Having (**) behave as List concatenation would violate the distributivity law:

x ** (y ++ z) = (x ** y) ++ (x ** z)

Giving it outer product semantics is consistent with that law

The semantics of (**) are also consistent if you explain it as: (**) only concatenates Text and never concatenates any other type

Collaborator

Gabriel439 commented Jun 28, 2017

Having (**) behave as List concatenation would violate the distributivity law:

x ** (y ++ z) = (x ** y) ++ (x ** z)

Giving it outer product semantics is consistent with that law

The semantics of (**) are also consistent if you explain it as: (**) only concatenates Text and never concatenates any other type

@josefs

This comment has been minimized.

Show comment
Hide comment
@josefs

josefs Jun 28, 2017

Perhaps you can explain why you think that the distributivity law is important in this context. I certainly don't understand why it needs to be there.

To be clear, here's the law that I think would make sense:

textToList (x ** y) = textToList x ** textToList y

It involves the as yet undefined, but intuitively sensible, function textToList which converts Text to a list of characters. The above law is a homomorphism law between Text and lists that matches our intuition that Text can be thought of as simply a list of characters even though the actual implementation might differ. Your proposed implementation of (**) breaks this law.

josefs commented Jun 28, 2017

Perhaps you can explain why you think that the distributivity law is important in this context. I certainly don't understand why it needs to be there.

To be clear, here's the law that I think would make sense:

textToList (x ** y) = textToList x ** textToList y

It involves the as yet undefined, but intuitively sensible, function textToList which converts Text to a list of characters. The above law is a homomorphism law between Text and lists that matches our intuition that Text can be thought of as simply a list of characters even though the actual implementation might differ. Your proposed implementation of (**) breaks this law.

@Gabriel439

This comment has been minimized.

Show comment
Hide comment
@Gabriel439

Gabriel439 Jun 29, 2017

Collaborator

My reasoning goes something like this:

For the type List Text, there are two valid ways to combine that type:

  • (A) append lists
  • (B) append elements (i.e. outer product)

We obviously have to support appending lists (that's the whole point of this proposal!), but some users (like myself) would also like to be able to append elements. For example, I would like to be able to write something like this:

    let Text/concat = ...
in  let names = ["John", "Lucy", "Brad"]
in  let greeting = ["Hello", "Goodbye"]
in  Text/concat (greeting ** [" "] ** names ** ["!\n"])

The whole reason for having two separate operators for appending Text and appending Lists is so that if Dhall ever does need to provide an outer product then we don't need to introduce yet another operator because we can reuse the (**) operator.

As a bonus, (**) would naturally generalize to other types, too (like Optional Text, List (Optional Text), etc.) As another bonus, these two operators obey the distributive laws. However, these are just bonuses: the core reason for the two operators in the first place is to be able to support the outer product in the future without introducing yet another operator.

Regarding functor laws, the laws that I am proposing if there were a hypothetical textToList function would be:

textToList (x ** y) = textToList x ++ textToList y

textToList "" = [] : List Char
Collaborator

Gabriel439 commented Jun 29, 2017

My reasoning goes something like this:

For the type List Text, there are two valid ways to combine that type:

  • (A) append lists
  • (B) append elements (i.e. outer product)

We obviously have to support appending lists (that's the whole point of this proposal!), but some users (like myself) would also like to be able to append elements. For example, I would like to be able to write something like this:

    let Text/concat = ...
in  let names = ["John", "Lucy", "Brad"]
in  let greeting = ["Hello", "Goodbye"]
in  Text/concat (greeting ** [" "] ** names ** ["!\n"])

The whole reason for having two separate operators for appending Text and appending Lists is so that if Dhall ever does need to provide an outer product then we don't need to introduce yet another operator because we can reuse the (**) operator.

As a bonus, (**) would naturally generalize to other types, too (like Optional Text, List (Optional Text), etc.) As another bonus, these two operators obey the distributive laws. However, these are just bonuses: the core reason for the two operators in the first place is to be able to support the outer product in the future without introducing yet another operator.

Regarding functor laws, the laws that I am proposing if there were a hypothetical textToList function would be:

textToList (x ** y) = textToList x ++ textToList y

textToList "" = [] : List Char
@josefs

This comment has been minimized.

Show comment
Hide comment
@josefs

josefs Jun 29, 2017

I agree with everything you say up to the point of reusing the (**) for the outer product for lists. The reason you mention for the reuse is to not have to introduce yet another operator. I don't see that as a problem. What's wrong with another operator? Rather, I see reusing (**) as a problem as I explained above.

Furthermore, the fact that (**) would generalize to other types would exacerbate the problem in my eyes. It would not be a bonus.

josefs commented Jun 29, 2017

I agree with everything you say up to the point of reusing the (**) for the outer product for lists. The reason you mention for the reuse is to not have to introduce yet another operator. I don't see that as a problem. What's wrong with another operator? Rather, I see reusing (**) as a problem as I explained above.

Furthermore, the fact that (**) would generalize to other types would exacerbate the problem in my eyes. It would not be a bonus.

@Gabriel439

This comment has been minimized.

Show comment
Hide comment
@Gabriel439

Gabriel439 Jun 29, 2017

Collaborator

My reasoning is that operator reuse is good as long as the operator obeys mathematical laws. The reason why is that if the operator obeys laws then you can reason abstractly about the code's behavior independent of what type you instantiate the operator to work on.

The point of reuse is not just to avoid wasting operator namespace. It's also about preserving mathematical intuitions as we transition between different types. The whole reason that they are named (++) and (**) in the first place is to evoke their similarity to (+) and (*) so that the user can think "Aha! These operators behave just like addition and multiplication". Indeed, we have the following functor laws between the two sets of operators:

-- Given:
zero = [] : List Text

one = [""]

x, y : List Text

List/length Text (x ++ y) = List/length Text x + List/length Text y

List/length Text zero = +0

List/length Text (x ** y) = List/length Text x * List/length Text y

List/length Text one = +1

However, that still doesn't address the separate question of whether (++) or (**) should be the operator for appending Text. In my eyes, that basically boils down to whether or not you view Text as a primitive and opaque element type or as a List wrapping an even more primitive type: Char. The reason this distinction matters is that the semantics of (**) is essentially to "append the innermost element type", whereas the semantics of (++) is to "append the outermost type constuctor". If you view Text as List Char then there is no meaningful way to append Chars but there is a meaningful way to append Lists, so (++) is the operator you'd naturally prefer. However, if you view Text as the primitive type then (**) is the only operator that makes sense.

In this case, I prefer to view Text as a primitive irreducible type because it's principal type in Dhall is Text and it's not a type synonym for List Char. (Also, I'm not even sure List Char is semantically correct for Unicode)

Collaborator

Gabriel439 commented Jun 29, 2017

My reasoning is that operator reuse is good as long as the operator obeys mathematical laws. The reason why is that if the operator obeys laws then you can reason abstractly about the code's behavior independent of what type you instantiate the operator to work on.

The point of reuse is not just to avoid wasting operator namespace. It's also about preserving mathematical intuitions as we transition between different types. The whole reason that they are named (++) and (**) in the first place is to evoke their similarity to (+) and (*) so that the user can think "Aha! These operators behave just like addition and multiplication". Indeed, we have the following functor laws between the two sets of operators:

-- Given:
zero = [] : List Text

one = [""]

x, y : List Text

List/length Text (x ++ y) = List/length Text x + List/length Text y

List/length Text zero = +0

List/length Text (x ** y) = List/length Text x * List/length Text y

List/length Text one = +1

However, that still doesn't address the separate question of whether (++) or (**) should be the operator for appending Text. In my eyes, that basically boils down to whether or not you view Text as a primitive and opaque element type or as a List wrapping an even more primitive type: Char. The reason this distinction matters is that the semantics of (**) is essentially to "append the innermost element type", whereas the semantics of (++) is to "append the outermost type constuctor". If you view Text as List Char then there is no meaningful way to append Chars but there is a meaningful way to append Lists, so (++) is the operator you'd naturally prefer. However, if you view Text as the primitive type then (**) is the only operator that makes sense.

In this case, I prefer to view Text as a primitive irreducible type because it's principal type in Dhall is Text and it's not a type synonym for List Char. (Also, I'm not even sure List Char is semantically correct for Unicode)

@josefs

This comment has been minimized.

Show comment
Hide comment
@josefs

josefs Jun 29, 2017

I still disagree. But if I were to argue my position I'd just be repeating myself. I'm not swayed by what you're saying and you don't seem to be swayed by my arguments either. Let's just agree to disagree.

josefs commented Jun 29, 2017

I still disagree. But if I were to argue my position I'd just be repeating myself. I'm not swayed by what you're saying and you don't seem to be swayed by my arguments either. Let's just agree to disagree.

@Gabriel439

This comment has been minimized.

Show comment
Hide comment
@Gabriel439

Gabriel439 Jun 29, 2017

Collaborator

Alright, but I have just one last question before you go: the strongest alternative approach that I'm still considering is using (++) for both Text and List (but not for outer product). Would you object to that for the same reason: overloading the meaning of an operator?

Collaborator

Gabriel439 commented Jun 29, 2017

Alright, but I have just one last question before you go: the strongest alternative approach that I'm still considering is using (++) for both Text and List (but not for outer product). Would you object to that for the same reason: overloading the meaning of an operator?

@josefs

This comment has been minimized.

Show comment
Hide comment
@josefs

josefs Jun 29, 2017

Using (++) for concatenating both Text and List is a good use of overloading the way I see it. Even though the meaning is overloaded, there is still common intuition between the two meanings: concatenating sequences of elements. I think that would be a good design to shoot for. But I also understand your hesitation about introducing overloading in Dhall, since it would deviate somewhat for the very clean cut and simple design the language has now.

josefs commented Jun 29, 2017

Using (++) for concatenating both Text and List is a good use of overloading the way I see it. Even though the meaning is overloaded, there is still common intuition between the two meanings: concatenating sequences of elements. I think that would be a good design to shoot for. But I also understand your hesitation about introducing overloading in Dhall, since it would deviate somewhat for the very clean cut and simple design the language has now.

@Gabriel439

This comment has been minimized.

Show comment
Hide comment
@Gabriel439

Gabriel439 Jun 29, 2017

Collaborator

I still might go for (++) for both, mainly because I'm getting cold feet about breaking backwards compatibility. Give me some more time to reflect on it

Collaborator

Gabriel439 commented Jun 29, 2017

I still might go for (++) for both, mainly because I'm getting cold feet about breaking backwards compatibility. Give me some more time to reflect on it

Gabriel439 added a commit that referenced this issue Jun 30, 2017

@Gabriel439

This comment has been minimized.

Show comment
Hide comment
@Gabriel439

Gabriel439 Jun 30, 2017

Collaborator

So I put up a pull request to just overload (++) to work on Lists: #81

The deciding factor in my decision was the desire to not break existing Dhall code

I'll let that pull request sit for a week and if nobody objects then I'll merge

Collaborator

Gabriel439 commented Jun 30, 2017

So I put up a pull request to just overload (++) to work on Lists: #81

The deciding factor in my decision was the desire to not break existing Dhall code

I'll let that pull request sit for a week and if nobody objects then I'll merge

@Gabriel439 Gabriel439 closed this in 8c340c1 Jul 7, 2017

Gabriel439 added a commit that referenced this issue Jul 15, 2017

Revert "Overload `(++)` to work on `List`s. Fixes #68 (#81)"
This reverts commit 8c340c1.

The reason for reverting this change is that using the same
operator name for appending `Text` and `List`s complicates
compilation to other languages.  For example, the `dhall-to-nix`
compiler currently translates `TextAppend` to `+`, but if you
consolidate `List` and `Text` append under a single `Append`
constructor then the translation to Nix is harder: you'd have to
do type inference as part of the translation process to know which
Nix operator to translate to.
@Gabriel439

This comment has been minimized.

Show comment
Hide comment
@Gabriel439

Gabriel439 Jul 15, 2017

Collaborator

I ended up reverting this change temporarily because using the same operator name is problematic for downstream compilers (such as dhall-to-nix). Previously these compilers would unambiguous translate TextAppend to the text append operator, but now they would have to do type inference as part of the translation process to know which operator to select.

For now I'll just reopen this issue until I figure out how to resolve this. Most likely I will just end up selecting a unique operator and constructor name for appending Lists. For example, the operator for appending lists might end up being +++ or some unused symbol like # (since it resembles the shape of a plus symbol)

Collaborator

Gabriel439 commented Jul 15, 2017

I ended up reverting this change temporarily because using the same operator name is problematic for downstream compilers (such as dhall-to-nix). Previously these compilers would unambiguous translate TextAppend to the text append operator, but now they would have to do type inference as part of the translation process to know which operator to select.

For now I'll just reopen this issue until I figure out how to resolve this. Most likely I will just end up selecting a unique operator and constructor name for appending Lists. For example, the operator for appending lists might end up being +++ or some unused symbol like # (since it resembles the shape of a plus symbol)

@Gabriel439 Gabriel439 reopened this Jul 15, 2017

Gabriel439 added a commit that referenced this issue Jul 22, 2017

@Gabriel439

This comment has been minimized.

Show comment
Hide comment
@Gabriel439

Gabriel439 Jul 22, 2017

Collaborator

I introduced #90 to reintroduce the same change except with (#) as the operator name for appending two lists. If nobody objects to the pull request I'll merge after a few days

Collaborator

Gabriel439 commented Jul 22, 2017

I introduced #90 to reintroduce the same change except with (#) as the operator name for appending two lists. If nobody objects to the pull request I'll merge after a few days

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment