Consider changing the column identifier syntax to $column #187
Thanks for this proposal. I think this is a very good idea, particularly with the use of strings with spaces. I was also thinking of having some syntax like One problem is that I think my general reaction is to just discourage using Note that something like
In my example I use symbols as data just to show the problem. I think it's much more common that functions you want to apply to your data need some symbols passed to them to define which behavior is chosen. The symbols don't have to come out of the dataframe for the syntax collision to be annoying for the user. In my view it's valuable to conflict with as little standard syntax as possible, and I'd argue the $ syntax helps with that.
I wonder if actually we want
Yes, I always felt that the Although I still feel that it's nicer to mark column references with the unusual
Ah, one more idea...
There is another symbol which is sometimes used for macros like these, which would work:
@transform(df, valid = (&group .== :alpha) .& (&"total weight" .< 0.5))
@jkrumbiegel how does DFMacros do string interpolation and broadcasting? I was pleasantly surprised to learn that
works on
is a pretty useful pattern. If we do
This works in DFMacros:
df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])
@transform(df, "$(:a)$(:b)")
3×3 DataFrame
 Row │ a      b      a_b_function
     │ Int64  Int64  String
─────┼────────────────────────────
   1 │     1      4  14
   2 │     2      5  25
   3 │     3      6  36
The
julia> Meta.@dump "$a$b$c"
Expr
  head: Symbol string
  args: Array{Any}((3,))
    1: Symbol a
    2: Symbol b
    3: Symbol c
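The dump above hints at why string interpolation can cooperate with a macro: the parser hands over an ordinary expression with head `string` whose arguments are the interpolated pieces, so a macro can inspect and rewrite each piece. A minimal sketch, using only `Meta.parse` from Base, that reproduces this outside of any DataFrames package (the variable names `ex` and `ex2` are just for illustration):

```julia
# Parse an interpolated string literal without evaluating it.
ex = Meta.parse("\"\$a\$b\$c\"")

ex.head    # the head is :string
ex.args    # the interpolated names arrive as Symbols: a, b, c

# Literal text between interpolations shows up as plain String arguments.
ex2 = Meta.parse("\"\$(a)_\$(b)\"")
ex2.args   # :a, "_", :b
```

Since the pieces are plain expression arguments, a macro is free to transform `:a` or `$y` inside the string the same way it does elsewhere.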
And this also works:
df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])
x = 5
y = :b
@transform(df, "$(x * :a)$($y)")
3×3 DataFrame
 Row │ a      b      a_b_function
     │ Int64  Int64  String
─────┼────────────────────────────
   1 │     1      4  54
   2 │     2      5  105
   3 │     3      6  156
Ah, I see. So the order of operations works itself out. That's good to know. How about broadcasting?
What about it?
I want to make sure users can use the
Ah I see, I have never used that, is it a common expression? I think that's too niche for me to consider in DFMacros at least. But in principle, one could use
Yeah, I've never used it either, but I think it might end up as an important performance consideration. Check out the docs here where I outline a few gotchas that are probably relevant for DFMacros.jl. I was going to make an example, but it looks like I have a bug with the use of
Ah here's an MWE that isn't too niche
Closed via #266
I don't know if this is something that is even up for debate. But one aspect of DataFramesMeta which I think could be improved is that I can't use symbols as symbols in computation expressions without wrapping them in ^(). To use columns with string identifiers, I have to use cols("columnname"), or for integer indices cols(123). I think this could be more streamlined by deciding on one way to use all three types.

I think the $ syntax lends itself to this very nicely, because $ is a symbol that is not used in normal Julia code aside from string interpolation (where it's a direct syntax transformation), so it doesn't conflict with other things.

My proposal is to make this change:

This also works with integers like $1 .* $20 and with any other expression that can result in a column identifier like $("column" * "1").

I know that this package has existed for a long time with the old syntax, but I think with this change it's clearer where macro transformations are taking place (the unusual $ symbols stick out) and there is no confusion with real symbols, or the cols function.
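
To make the idea concrete, here is a rough sketch of how a macro might treat `$` as a column marker. It is not DataFramesMeta's or DFMacros' implementation: `getcol`, `replace_dollar`, and `@withcols` are hypothetical names invented for this illustration, and a plain `NamedTuple` of vectors stands in for a data frame. The sketch assumes that `$expr` means "evaluate `expr` and use the result as a column identifier", while ordinary symbols such as `:alpha` are left untouched.

```julia
# Hypothetical helper: look up a column by Symbol, String, or integer position.
getcol(tbl::NamedTuple, id::Symbol) = getproperty(tbl, id)
getcol(tbl::NamedTuple, id::AbstractString) = getproperty(tbl, Symbol(id))
getcol(tbl::NamedTuple, id::Integer) = tbl[id]

# Recursively replace every `$x` node with a call to `getcol(tbl, x)`.
replace_dollar(ex, tbl) = ex                 # literals, Symbols, etc. pass through
function replace_dollar(ex::Expr, tbl)
    if ex.head == :$
        return :(getcol($tbl, $(ex.args[1])))
    end
    return Expr(ex.head, map(a -> replace_dollar(a, tbl), ex.args)...)
end

# Hypothetical macro: evaluate `body` with `$`-marked references resolved against `tbl`.
macro withcols(tbl, body)
    esc(replace_dollar(body, tbl))
end

# Stand-in table; one column name contains a space.
tbl = NamedTuple{(:group, Symbol("total weight"))}(
    ([:alpha, :beta, :alpha], [0.2, 0.3, 0.9]),
)

# `:alpha` stays a real Symbol; only `$`-marked parts become column lookups.
valid = @withcols(tbl, ($(:group) .== :alpha) .& ($"total weight" .< 0.5))
# elementwise result: true, false, false

# Any expression that yields a column identifier can follow `$`.
light = @withcols(tbl, $("total " * "weight") .< 0.5)
# elementwise result: true, true, false
```

Because the rewrite only touches `$` nodes, symbols passed as data or as function arguments need no escaping, which is the collision the proposal is trying to avoid.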