Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'on' clause fails on white spaces around operators and on operators in variable names #3092

MarkusBonsch opened this issue Oct 3, 2018 · 4 comments · Fixed by #3094


Copy link

@MarkusBonsch MarkusBonsch commented Oct 3, 2018

The issue originally popped up her : #2931.
There are two issues from my perspective with 'on'.

White spaces around operators make it fail:

DT <- data.table(a = 1:3, b=1:3)
i <- data.table(ai = 1:3, bi = 1:3)
DT[i, on = "a >= ai"]
## throws error

Column names including operators make it fail

DT <- data.table(`a>=` = 1:3, b = 1:3)
i   <- data.table(ai = 1:3, bi = 1:3)
DT[i, on = "`a>=`==ai"]
## throws error
@MarkusBonsch MarkusBonsch self-assigned this Oct 3, 2018
Copy link

@HughParsonage HughParsonage commented Oct 3, 2018

To me the first issue is not a bug though perhaps a better error message would be nicer. (That said the first time I encountered it I immediately understood what was going on.)

Might also be worth considering these pathological beauties if only to settle the rules:

DT <- data.table(`a=` = 1:3, a = 1:3, `=` = 1:3, b = 1:3)
i   <- data.table(`=a` = 1:3, a = 1:3, `=` = 1:3, bi = 11:13)

Copy link

@MichaelChirico MichaelChirico commented Oct 3, 2018

We already have some verboten column names (.SD springs to mind); why not make these symbols verboten as well? Could be something that only triggers when on is used, so that people don't get annoying errors from fread for example (I have in mind spending 3 minutes freading a huge file only to have it crash at the end when lo and behold the file has dirty column names)...

Copy link

@franknarf1 franknarf1 commented Oct 3, 2018

I also think the no-whitespace thing is okay but maybe could be clearer.

@MichaelChirico Re verboten or late-verboten patterns (that the user only notices when joining), it seems more natural to me to allow most names (except .SD and whatever else is already blocked), but just support backticks, like in Markus' original attempt. That's where the syntax for by= ended up anyways:

DT = data.table(x = 1, "y,z" = 2)
DT[, .N, by=.(x, `y,z`)] # works
DT[, .N, by="x,`y,z`"]   # works

Backticks don't work in the string or .(...) forms of on= yet, so maybe just convert this issue into a FR to add it?

Misc comments: Without support for backticks, as far as I know, there's no way to handle Hugh's examples. Lack of support for backticks also forces use of the string form, but that's not a big deal. Maybe related: #1639

Copy link

@st-pasha st-pasha commented Oct 3, 2018

@MichaelChirico I think it's only forbidden if you go through the front entrance:

> data.table(`.SD` = 1:3)
Error in name_dots(...) : 
  A column may not be called .SD. That has special meaning.
> DT = fread(".SD,A\n1,2"); DT
     .SD     A
   <int> <int>
1:     1     2
> setnames(DT, c("Z", ".SD")); DT
       Z   .SD
   <int> <int>
1:     1     2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

6 participants