Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

standard functions sometimes can't be used after the as function #3127

Closed
2 tasks done
eitsupi opened this issue Jul 27, 2023 · 8 comments · Fixed by #3923
Closed
2 tasks done

standard functions sometimes can't be used after the as function #3127

eitsupi opened this issue Jul 27, 2023 · 8 comments · Fixed by #3923
Labels
bug Invalid compiler output or panic compiler

Comments

@eitsupi
Copy link
Member

eitsupi commented Jul 27, 2023

What happened?

From https://github.com/eitsupi/querying-with-prql/blob/4acff8fa395a9a2c33cad15ed61866dabac25157/docs/method_chaining.qmd#L128-L158

I rewrote the example that worked in PRQL 0.8 with the 0.9 syntax and an error occurred.

let time_to_datetime = string -> (FlightDate | as timestamp) + s"""
  TRY_CAST(regexp_replace({string}, '^2400$', '0000').substr(1, 2).concat(' hours') AS INTERVAL) +
  TRY_CAST(regexp_replace({string}, '^2400$', '0000').substr(3, 2).concat(' minutes') AS INTERVAL)
"""

from tab
select {
  FlightDate,
  DepTimeOld = DepTime
}
derive {
  DepTime = (time_to_datetime DepTimeOld)
}
take 5

For example, the following minimal example will result in the following error:

Error: 
   ╭─[:2:23]
   │
 2 │ select col1 = (col1 | as int) + 1
   │                       ───┬──  
   │                          ╰──── function std.add, param `left` expected type `int || float || timestamp || date`, but found type `scalar`
───╯

PRQL input

from tab
select col1 = (col1 | as int) + 1

SQL output

NA

Expected SQL output

SELECT CAST(col1 AS int) + 1 AS col1
FROM tab

MVCE confirmation

  • Minimal example
  • New issue

Anything else?

No response

@eitsupi eitsupi added the bug Invalid compiler output or panic label Jul 27, 2023
@eitsupi eitsupi changed the title standard functions can't be used after as standard functions sometimes can't be used after the as function Jul 27, 2023
@vthriller
Copy link
Contributor

Interestingly,

from tab
select col1 = (col1 | as int) * 2

works. It's std.add/std.sub that throws this error.

@max-sixty
Copy link
Member

Good find.

And to note, adding parentheses doesn't help:

from tab
select col1 = ((col1 | as int) + 1)
Error: 
   ╭─[:2:24]
   │
 2 │ select col1 = ((col1 | as int) + 1)
   │                        ───┬──  
   │                           ╰──── function std.add, param `left` expected type `int || float || timestamp || date`, but found type `scalar`
───╯

@eitsupi
Copy link
Member Author

eitsupi commented Jul 30, 2023

It seems that only add and sub are specifing types.
Perhaps scalar should also be allowed?

let mul = left right -> <int || float> internal std.mul
let div_i = left right -> <int || float> internal std.div_i
let div_f = left right -> <int || float> internal std.div_f
let mod = left right -> <int || float> internal std.mod
let add = left<int || float || timestamp || date> right<int || float || timestamp || date> -> <int || float || timestamp || date> internal std.add
let sub = left<int || float || timestamp || date> right<int || float || timestamp || date> -> <int || float || timestamp || date> internal std.sub

@eitsupi
Copy link
Member Author

eitsupi commented Jul 30, 2023

I have looked at this and it seems that allowing scalar will not pass the following test. (Errors will no longer occur)

#[test]
fn test_basic_type_checking() {
assert_display_snapshot!(compile(r#"
from foo
select (a && b) + c
"#)
.unwrap_err(), @r###"
Error:
╭─[:3:13]
3 │ select (a && b) + c
│ ───┬──
│ ╰──── function std.add, param `left` expected type `int || float || timestamp || date`, but found type `bool`
───╯
"###);
}

@PrettyWood
Copy link
Collaborator

So the main difference is that there are constraints on types for add (and sub) but not for mul (and div)

let add = left<int || float || timestamp || date> right<int || float || timestamp || date> -> <int || float || timestamp || date> internal std.add
let mul = left right -> <int || float> internal std.mul

Meaningselect col1 = (col1 | as int) * 2 is ok but select col1 = (col1 | as int) + 2 is not when it should be.
At the same time something like (a && b) * col1 is ok even though it's wrong and is rightly caught as an error with +.

So the question is : should we write
let add = left right -> internal std.add
in order to solve this problem even though some errors are not caught anymore?

If not, a clean fix needs to be found to allow add to work. If this fix is found, it will probably be available for mul and stuff and some constraints could be added to mul to stop allowing stuff like (a && b) * col1

@max-sixty
Copy link
Member

in order to solve this problem even though some errors are not caught anymore?

I would tend towards "fewer false positives, even if that means fewer false negatives" — particularly for instances where there's no escape hatch.

So I'm +0.2 on this. @aljazerzen understands it much better, so if he has a view, it likely outweighs mine...

@PrettyWood
Copy link
Collaborator

If the right fix is hard to implement I'm also in favor of a quick fix to allow valid syntax.
@aljazerzen or you are the best to say whether the proper fix is hard to implement

@aljazerzen
Copy link
Member

aljazerzen commented Dec 10, 2023

So, the problem here is that:

  1. Type of col1 | as int is inferred to be scalar, but it should be int.
  2. std.add has type-annotated params, so the type of argument is checked and we get a type error.
    (because scalar contains text and bool, which cannot be std.add-ed)

Problem 1 is a missing feature, since std.as does not do any type inference at all, but just compiles to an SQL cast. The "right fix" is to actually implement type casting. This would involve:

  • adding special grammar for parsing type casts, since we want to parse a type expression, which is not the same thing as "normal" expressions,
  • removing std.as in favor of a special compiler implementation,
  • add RQ operators for casting between different types.

This will take a bit of time and it also depends on inference of types of columns in relations, which I working on right now.

Problem 2 is not really a problem, it is expected behavior.


Re a quick fix: in this case an error is very inconvenient, so the quick fix of removing type annotation in std module seems a reasonable thing to do. They can easily be added back later. If we do that, make sure to add a test for this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Invalid compiler output or panic compiler
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants