Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntax to break an expression over multiple lines? #819

Closed
ajfriend opened this issue Jul 17, 2022 · 26 comments · Fixed by #3408
Closed

Syntax to break an expression over multiple lines? #819

ajfriend opened this issue Jul 17, 2022 · 26 comments · Fixed by #3408
Labels
language-design Changes to PRQL-the-language

Comments

@ajfriend
Copy link

The PRQL syntax currently usually (with the exception of lists?) treats newlines as terminating characters for statements. (And please do correct me if I'm wrong about this!)

For long expressions (and perhaps also statements?), it would be nice to have a syntax that would allow users to break that expression up over multiple lines.

This idea is motivated by the discussion in #788.

Is there a current way to do this? If not, would parentheses be the natural way to express this?

@max-sixty
Copy link
Member

This is a good point.

I don't think we should need it that often — one of the benefits of PRQL should be to split up logic into modular pieces, represented by transforms on separate lines.

But it would be presumptuous to assume we can always do this, and #788 is a good case — even if the "split up into multiple filters" suggestion is reasonable, it doesn't cover the case of a long list of or-ed conditions.

Newlines have an important semantic meaning, so we've "used up" that syntax. So a couple of options:

  • Use \ to allow line breaks
  • Find the cases that force long lines and offer a solution to them (e.g. an any function to take a list of conditions and string them together with ors)

Any thoughts?

@ajfriend
Copy link
Author

My comment around parentheses was wondering if you could do something like what Python does to "free up" the newline syntax inside of parentheses:
Screen Shot 2022-07-18 at 10 50 55 AM

@mklopets
Copy link
Collaborator

\ also seems like a decent solution to me

@max-sixty
Copy link
Member

Yes, interesting. So we could have a rule that finishing the line on an operator (e.g. + / or / etc) causes a line continuation.

Are there any instances where we want to finish on an operator?

One concern is that while editing a line in the middle of a query, the error messages might be quite confusing:

from foo
derive bar = baz or<cursor>
aggregate (
  max fuz
)

...would this have a confusing error message saying "baz or aggregate is invalid"?

@ajfriend
Copy link
Author

ajfriend commented Jul 19, 2022

Yes, interesting. So we could have a rule that finishing the line on an operator (e.g. + / or / etc) causes a line continuation.

But that's maybe valid only inside a set of parens? (I'm not sure what the full Python parsing logic is.) Additionally, it might also be valid if the next line starts with an operator, which is that last Python example (inside parens).

@ajfriend
Copy link
Author

...would this have a confusing error message saying "baz or aggregate is invalid"?

"Unexpected newline"?

@max-sixty
Copy link
Member

"Unexpected newline"?

The challenge is that it wouldn't be parsed as a newline, it would be parsed as derive bar = baz or aggregate (, unless I'm misunderstanding something?

But that's maybe valid only inside a set of parens?

Yes, this could be possible. It starts to get a bit complicated — copy something from inside parens to outside parens and it doesn't work any longer.

But python is good prior art, I think this should be OK.

Additionally, it might also be valid if the next line starts with an operator, which is that last Python example (inside parens).

Yes, good idea. We'll have to see whether pest supports this, I think it should be possible with positive predicates

@ajfriend
Copy link
Author

Cool! And to be clear, I'm just offering my random thoughts as a fan in the peanut gallery. Appreciate all the work you've already done!

@max-sixty
Copy link
Member

Cool! And to be clear, I'm just offering my random thoughts as a fan in the peanut gallery. Appreciate all the work you've already done!

Thoughts generally welcome, and in the specific case they're also great ideas @ajfriend !

@aljazerzen
Copy link
Member

I'd argue against trying to parse newlines after operators or inside parenthesis. I like \ far more, regardless of the fact that I find it quite ugly.

That's because newline (as a token) is already "used up" on higher levels of parsing, as @max-sixty said. Precedence goes roughly like:

newline, |
and, or
<, >, <=, >=
+, -
*, /
(), .., []

Simplified, tokens are first split by newlines and |, then by and and or, and so on.

Now, if we would want to ignore newlines in () or after +, we'd have to parse () and + higher in precedence list, which would add a lot of complexity.

But why can python support this? Python has two precedence lists: one for expressions and one for statements. Newline is a part of statement precedence list and it means "end of statement". And because statements cannot be part of expressions, and because ( indicates unfinished expression, Python can only use expression precedence list and resolve newline to mean "whitespace".

Back to PRQL, newlines can be part of expressions - actually the whole pipeline is an expression! That's why even in (), newlines must mean "pipe".

I came up with another idea: newlines meaning "pipe" make sense only in pipelines. Pipelines can be just top-level plain function calls, or function calls wrapped in (). But they cannot be function calls in []! Which means that without much hussle, I think we can make this work:

derive [
  some + long + expr + 
    spans + multiple + lines
]

or

derive [some + long + expr + 
    spans + multiple + lines]

@sietse
Copy link

sietse commented Dec 23, 2022

I have a question, not about future multiline syntax, but about a potential workaround: perhaps in Prql 0.3.1, we can use currying to add arguments one at a time. Perhaps not.

This is not meant as a derailing feature request, only as a "can this workaround be made to work" query. If it can't, no worries.

Alright, the would-be workaround:

# This function takes three arguments, so maybe you want to write it on multiple lines
func addthree<column> a b c -> s"a + b + c"

# 'arg x' takes a curried function and adds x to the end of its argument list.
# addthree a | arg b  -> addthree a b
# because newlines are equal to pipes, this allows building up a function's arguments over multiple lines
func arg myarg myfunc -> ( myfunc myarg )

from mytable
derive [
  my_thing = (  # a multiline pipe that builds up a function call
    addthree apples  # addthree apples _ _
    arg bananas  # addthree apples bananas _
    arg citrus  # `addthree apples banana citrus`, a.k.a. `apples + bananas + citrus`
  )
]

Compiling this with 0.3.1 gives an error (below) that I think I can summarize as "I expected a column expression, but I found a function with signature infer -> infer". Is it possible to write the code differently so it works in 0.3.1? If not, no matter. Thanks for all you've done and are doing on PRQL!

# Linebreaks added. Character span not accurate.
Error {
  span: Some(span-chars-110-176),
  reason: Expected {
    who: Some("function std.derive, param `columns`"),
    expected: "type `column`",
    found: "type `func infer -> infer`"
  },
  help: None
}

@aljazerzen
Copy link
Member

Woah, a great workaround using currying! Maybe it was worth implementing 😄

Your example should work and the fact that it doesn't is a bug. I'll look into it. There seems to be a problem with pipelines, because normal function calls do work (but defeats the whole purpose of the workaround):

func addthree<column> a b c -> s"{a} + {b} + {c}"

func arg myarg myfunc -> ( myfunc myarg )

from mytable
derive [
  my_thing = (
    arg "citrus" (arg "bananas" (addthree "apples"))
  )
]

@aljazerzen
Copy link
Member

Cancel that, even my example produces a wrong output.

@sietse
Copy link

sietse commented Dec 29, 2022

Well, thank you very much for taking a look. PRQL is a beautiful thing. Have a good slide into the new year!

@sietse
Copy link

sietse commented Dec 29, 2022

A thought about backslash \ for line continuations: it can interact badly with line comments.

  • In languages where \ at the end of a line indicates a line continuation ...
  • ... and '#' comments all characters until the end of the line ...
  • ... in those languages, you can't have line comments on multiline expressions.

For example, this is invalid bash:

tar \
  -x  # extract \
  -z  # gzipped \
  -f  # read from file \
  myfile.tar.gz

and so is this:

mycommand \
  | head --bytes=-2  # remove spurious CR LF from end \
  | tail +2  # remove header 

Some potentional ways to allow comments AND line continuations, if that is important to you.

Contrived example featuring addthree. Find shows with free places, by adding up how many places each kind of ticket represents, and then comparing that to the theatre's capacity.

Flavour 1: 'backslash EOL' for line continuation. 'backslash space comment EOL' also for line continuation. 'EOL' for pipe.

from tickets_per_show
filter addthree \
    seats \
    couches * 2 \   # couches are double-sized
    standing / 2 \  # standing people take less room
    != max_capacity

Flavour 2: 'EOL space backslash' for line continuation. 'EOL' for pipe.

In other words, line that starts with spaces + backslash gets glued to previous line, after comment removal.

from tickets_per_show
filter addthree
    \ seats
    \ couches * 2   # couches are double-sized
    \ standing / 2  # standing people take less room
    \ != max_capacity

Flavour 3: infix comments

from tickets_per_show
filter addthree \
    seats \
    couches * 2   /* couches are double-sized */ \
    standing / 2  /* standing people take less room */ \
    != max_capacity

Flavour 4: line continuations allow separate-line comments, not end-of-line comments,

from tickets_per_show
filter addthree \
    seats \
    # couches are double-sized
    couches * 2   \
    # standing people take less room
    standing / 2  \
    != max_capacity

@aljazerzen
Copy link
Member

Good breakdown of options for the backslash escaping route. I prefer flavour 4 the most.

But as I said, backslash looks ugly (that's just my preference) and has to be adjusted when adding a new line at the end. It's like a trailing comma. And we cannot allow trailing backslashes...

@max-sixty
Copy link
Member

I agree for Flavour 4! (I think the example is missing +s — is that right? Or I'm misunderstanding the example.)

Overall I'd be fine with this; it doesn't conflict with other syntax. I think we should try and reduce the amount it's needed — but sometimes it will be, as the example shows.

And agree with @aljazerzen on this point; specifically that it's not "composable", in the same way as a trailing comma isn't:

has to be adjusted when adding a new line at the end. It's like a trailing comma. And we cannot allow trailing backslashes...

If I can suggest something to the design — trailing spaces after the backslash should be ignored. In bash, \ does not count as a trailing backslash, which can be really confusing.

@aljazerzen aljazerzen added the needs-discussion Undecided dilemma label Jan 8, 2023
@max-sixty
Copy link
Member

As discussed on Discord — are we broadly in agreement on this option?

Flavour 4: line continuations allow separate-line comments, not end-of-line comments,

from tickets_per_show
filter addthree \
    seats \
    # couches are double-sized
    couches * 2   \
    # standing people take less room
    standing / 2  \
    != max_capacity

Note:

(I think the example is missing +s — is that right? Or I'm misunderstanding the example.)

@aljazerzen
Copy link
Member

This does seem like the best option. Nevertheless, I'd prefer not add, for the reasons above.

I'm -0.5 on it.

@max-sixty
Copy link
Member

How would you represent the a long expression @aljazerzen ? I agree the trailing backslashes are ugly! But is there a better way?

FWIW this is longer the case:

But they cannot be function calls in []! Which means that without much hassle, I think we can make this work:

@aljazerzen
Copy link
Member

Within lists, we could make this work:

from a
select [
  sum
  b,
  avg
  b
]

... and an idea: non-pipelining parenthesis!

from a
select ((
  sum
  b
))

@max-sixty
Copy link
Member

... and an idea: non-pipelining parenthesis!

The abyss just stared back at me! (niche joke / reference!)

Within lists, we could make this work:

How about for a long expression with operators? I think that's the most likely long expression, rather than function calls. As above, we could allow ending a line with those (though it may create some odd error messages)?

I think that's possible. I'm not sure whether that's better than the trailing slashes.


I'm also OK saying "the current state isn't great. But the options we're looking at aren't good either, we're going to leave this open for a while and reflect, rather than implementing something bad", if we think that reflection might pay off.

Or "we're OK with X, but it's not a priority to implement for us, we'll accept a PR", which is an implicit filter without us getting in the way of anyone.

I think both of those are better than leaving something hanging for ages (as you pointed out on Discord / on the call!)

@snth
Copy link
Member

snth commented Jan 9, 2023

FWIW, my favourite is probably Flavour 1 and I'm not such a fan of Flavour 4 - it just seems harder to read to me (but I'm just speaking as a user, not an implementer).

Wouldn't Flavour 2 possibly address the "trailing comma" type issue as only the first line is different and that's special anyway. It wouldn't be my first choice as you can't tell on each line that there is more coming but to me it seems acceptable if it deals with the trailing commas issue. How would this flavour deal with commenting things out though?

Would the following work?

from tickets_per_show
filter addthree
    \ seats
    # \ couches * 2   # couches are double-sized
    \ couches * 3   # couches are triple-sized
    \ standing / 2  # standing people take less room
    \ != max_capacity

@max-sixty 's suggestion of "leave this open and accept a PR" seems good for now.

@max-sixty
Copy link
Member

max-sixty commented Jan 9, 2023

Would the following work?

from tickets_per_show
filter addthree
    \ seats
    # \ couches * 2   # couches are double-sized
    \ couches * 3   # couches are triple-sized
    \ standing / 2  # standing people take less room
    \ != max_capacity

This actually might work! It makes the parser ambiguous for one item, but that's OK I think.

Or are there things I'm missing?

(I'm going to take the executive decision of filling in the +, which I'm fairly convinced are missing in error and make the examples wrong, since no one corrected me above)

@snth
Copy link
Member

snth commented Jan 9, 2023

@max-sixty The reason there are no +s is because of the addthree function that was defined higher up (here).

@sietse
Copy link

sietse commented Jan 10, 2023

snth is right. I felt that a long expression like seats + couches + standing risked a sidetrack into "a line ending in + is obviously incomplete" territory, hence why I made up addthree seats couches standing for the examples.

@aljazerzen aljazerzen removed the needs-discussion Undecided dilemma label Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
language-design Changes to PRQL-the-language
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants