Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Special case Not, Between, Regex etc. before 1.0? #283

Closed
pdeffebach opened this issue Aug 4, 2021 · 7 comments
Closed

Special case Not, Between, Regex etc. before 1.0? #283

pdeffebach opened this issue Aug 4, 2021 · 7 comments

Comments

@pdeffebach
Copy link
Collaborator

Should we allow for

@select df Not(:a)

before we go for 1.0?

currently you have to do

@select df $(Not(:a))

which is ugly.

I would guess we should special case Not, Between, Regex, and r"...". But not special case 1, 2, 3... because integers are more likely to conflict with normal code doing analysis.

The thing to consider is whether or not the macros that accept arguments as-is (no :y = f(:x) requirement) like @orderby, @subset, and @with will be affected by this. I don't anticipate any issues.

This can be added post-1.0. But maybe we want to be very polished for that.

@pdeffebach
Copy link
Collaborator Author

This could open up the door to thinking about

vars = Not(:a)
@select df vars

which might be expected to work if @select df Not(:a) works. I don't like the idea of specializing too much on the exact syntax used. If only $Not(:a) was allowed instead of needing $(Not(:a)).

@bkamins
Copy link
Member

bkamins commented Aug 4, 2021

I went though documentation on main and docstrings and my conclusion is that unfortunately all has to be read through to make sure it is up to date (as we have some sentences that are outdated scattered). Unfortunately you are the only one that can do this reliably I am afraid 😢.

Some particular issues:

  • $ is typeset as \$ in rendered HTML in several places which can confuse users
  • we sometimes still refer to keyword arguments, see e.g.
    * `e` : keyword arguments specifying new columns in terms of existing columns

Finally I missed a very precise specification how expressions passed to macros are parsed. In particular:

  1. I am missing a very precise rule explaining what $ does (now only examples are given)
  2. I am missing a very precise rule explaining how columns are parsed - I assume that by default literal Symbols are treated as columns only - is this correct?

Having said these two things I do not understand why things like:

@select df r"x"

do not work, but

@select df $(r"x")

works.

E.g. I do not understand the rule for the following:

julia> @macroexpand @subset df r"x"
quote
    (DataFrames.subset)(df, begin
            DataFramesMeta.make_source_concrete([]) => begin
                    ()->begin
                            r"x"
                        end
                end
        end; skipmissing = true)
end

julia> @macroexpand @subset df $(r"x")
quote
    (DataFrames.subset)(df, r"x"; skipmissing = true)
end

and it was not clear for me how DataFramesMeta.jl decided that it does not need to create an anonymous function.

@pdeffebach
Copy link
Collaborator Author

Okay, I will open up a PR to work on this.

I do not know how to fix the \$ issue. If we do not escape inside the markdown strings, we get errors that certain variables do not exist because it tries to interpolate whatever is after the $. I will explore what other packages do to get around this problem.

With regards to r"x", the rule is here:

if is_column_expr(ex)

This line says that if an argument is just $(...), then it gets passed directly as is. Let's focus on @subset. Arguments to @subset do not need to be in a keyword-argument format, i.e while @transform needs @transform df :y = f(:x), the expressions passed to @subset can be arbitrary.

x = [true, true, false]
@subset df :y .== x

gets the same code path as

@subset df x .== true # @subset df x actually fails due to a bug...

We don't require users to reference any columns at all, so arbitrary expressions are assumed to enter the anonymous function. Let's say we make non-keyword arguments default to column selectors. This works fine in @select because if something isn't a :y = f(:x) expression, we know anything else would throw an error and can assume the user wants to select a column.

But in @subset arbitrary expression are valid, so we can't make that assumption and need users to be explicit about $(...).

@bkamins
Copy link
Member

bkamins commented Aug 4, 2021

I do not know how to fix the $ issue.

have you tried raw strings?

@bkamins
Copy link
Member

bkamins commented Aug 4, 2021

Regarding the rules. What I think would be very good is if you written down all the rules in the documentation somewhere (something just like you have described above, but an exhaustive list). If I had such a list I could more easily check things.
What I mean by such a list is a list similar to what we have e.g. for indexing or for transformation minilanguage (this list with 7 cases accepted)

@pdeffebach
Copy link
Collaborator Author

Okay writing this up, two notes so far

  • We should allow multi-argument selectors in @orderby
  • We should make it easier to write :a => fun inside @with.

@pdeffebach
Copy link
Collaborator Author

Added in #372

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants