@pdeffebach
Collaborator

@pdeffebach pdeffebach commented Apr 22, 2021

With this PR we have

julia> df = DataFrame(a = 1, b = 2);

julia> @transform df begin 
           x = 1
           y = 2
           z = 5
       end
1×5 DataFrame
 Row │ a      b      x      y      z     
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     1      2      1      2      5

@bkamins
Member

bkamins commented Apr 22, 2021

If I understand this correctly you want to bring a similar syntax to what Chain.jl offers?

@pdeffebach
Collaborator Author

The implementation is similar, yeah. Turn a collection of args into a :block.

I need to think harder about what I do and do not want to allow, as I think being too flexible adds a lot of complication. I'm surprised that tests fail; they passed locally.
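
The idea of turning a collection of arguments into a `:block` can be sketched in plain Julia. This is only an illustration under stated assumptions, not the PR's implementation: `statements` is a hypothetical helper name, and `quote ... end` stands in for the `Expr(:block, ...)` that a macro would receive from a `begin ... end` argument.

```julia
# Hypothetical helper (not from the PR): normalize macro-style arguments so
# that one `begin ... end` block and several separate arguments look the same.
function statements(args...)
    if length(args) == 1 && args[1] isa Expr && args[1].head == :block
        # A block interleaves LineNumberNodes with its statements; skip them.
        return Any[a for a in args[1].args if !(a isa LineNumberNode)]
    else
        return Any[args...]
    end
end

# `quote ... end` produces an Expr with head :block, the same shape a macro
# sees when it is passed a `begin ... end` argument.
block = quote
    x = 1
    y = 2
    z = 5
end

from_block = statements(block)
from_args = statements(:(x = 1), :(y = 2), :(z = 5))
```

Under these assumptions, both calls yield the same three assignment expressions, so the rest of a macro would not need to know which form the user wrote.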

@pdeffebach
Collaborator Author

pdeffebach commented Apr 22, 2021

Chain.jl seems to have figured this out, actually. cc @jkrumbiegel

For instance, the following works in Chain.jl

julia> using Chain

julia> x = 1
1

julia> @chain x begin 
           begin 
               t = _
               p = t + 100
           end
           first
       end
101

@pdeffebach
Collaborator Author

This roadblock has been solved. It was an optimization that was introduced in the anonymous functions PR. I have fixed it by removing LineNumberNodes in more places. Will have to figure out how bad this is.

Collaborator Author

@pdeffebach pdeffebach left a comment

Ready for a review!

# recursive step for begin :a + :b end
if function_expr isa Expr &&
    function_expr.head == :block &&
    length(function_expr.args) == 2 # omitting the line number node
Collaborator Author

I just call Base.remove_linenums! wherever possible, which changed the way we index expressions in a few places.
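
To illustrate why removing line numbers changes how a block is indexed, here is a small sketch using only Base. It is not taken from the PR's diff; the variable names are made up for the example.

```julia
# A one-statement block, like the `begin :a + :b end` case discussed above.
ex = quote
    :a + :b
end

# Before stripping, the block holds a LineNumberNode followed by the statement.
n_before = length(ex.args)

# Base.remove_linenums! mutates the expression, dropping LineNumberNodes.
Base.remove_linenums!(ex)

# After stripping, only the statement is left, so it moves to index 1.
n_after = length(ex.args)
first_stmt = ex.args[1]
```

Any code that previously looked for the statement at `args[2]` would have to look at `args[1]` once line numbers are stripped.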

@pdeffebach
Collaborator Author

Okay I've cleaned up the diff a bit.

One thing in particular to look at is the transform tests. We should nail those down before I copy and paste for every macro.

@pdeffebach
Collaborator Author

I've simplified the implementation and changed the docstring for @transform. Requesting a review from @nalimilan

@nalimilan
Member

Interesting! Have you thought about the possibility to support advanced syntax allowed by @eachrow in the future, like @newcol x::Vector{Float64} and if .... x = 1; else; x = 2; end? That would allow simplifying the API by deprecating @eachrow in favor of @transform.

> This roadblock has been solved. It was an optimization that was introduced in the anonymous functions PR. I have fixed it by removing LineNumberNodes in more places. Will have to figure out how bad this is.

Isn't it a problem to remove line numbers? Aren't they necessary to print accurate stacktraces in case of errors? (That's really a strong point compared to dplyr where I remember having a hard time finding out which lines generate errors.)

Finally, I guess the idea is to extend this to @transform!, @select, @select!, etc.?

@pdeffebach
Collaborator Author

> Interesting! Have you thought about the possibility to support advanced syntax allowed by @eachrow in the future, like @newcol x::Vector{Float64} and if .... x = 1; else; x = 2; end? That would allow simplifying the API by deprecating @eachrow in favor of @transform.

I don't think so. @eachrow is a for-loop and so you can do all sorts of things inside it, not strictly create columns.

This leads me to a design question: Is it weird to have @transform begin ... end with a :block but require that expressions only start with y = ...? Is this against the philosophy of using a :block?

In @eachrow for example, you can do anything you want inside an @eachrow block.
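
One way to picture the restriction being discussed is a check that every statement in a block is an assignment. The sketch below is hypothetical and not part of the PR: `is_assignment` and `all_assignments` are illustrative names, and the PR may enforce (or not enforce) this differently.

```julia
# Hypothetical check (not from the PR): accept a block only if every
# statement has the form `name = expression`.
is_assignment(ex) = ex isa Expr && ex.head == :(=)

function all_assignments(block::Expr)
    # Ignore the LineNumberNodes that the parser inserts between statements.
    stmts = [a for a in block.args if !(a isa LineNumberNode)]
    return all(is_assignment, stmts)
end

good_block = quote
    x = 1
    y = 2
end

mixed_block = quote
    x = 1
    println("side effect")
end

ok = all_assignments(good_block)
not_ok = all_assignments(mixed_block)
```

A block that mixes assignments with arbitrary statements, which is fine inside @eachrow, would be rejected by a check like this.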

> Finally, I guess the idea is to extend this to @transform!, @select, @select!, etc.?

Yes. If you think the tests and docstring are good for @transform, I will copy them over to everything else.

@pdeffebach
Collaborator Author

> Isn't it a problem to remove line numbers? Aren't they necessary to print accurate stacktraces in case of errors? (That's really a strong point compared to dplyr where I remember having a hard time finding out which lines generate errors.)

You would think so, but we are no worse than in 0.6.0, before the parsing re-write where I dropped line numbers.

I think part of this might be due to Chain. We seem to get the line number of the start of a @chain block, but not commands inside of it. Which is really bad! But no worse than before. We can detect errors okay outside of @chain.

This needs a separate investigation later on. If we find that these line number removals are causing problems, I can fix them later. But it's likely upstream.

@pdeffebach
Collaborator Author

pdeffebach commented May 10, 2021

W.r.t. line numbers, I think we are in the clear.

With this script

using Chain

foo(x, y) = x * y

x = [1, 2]

@chain x begin
    identity
    identity
    identity
    foo([3, 4])
end

We get the error

julia> include("linenumbers_test.jl")
ERROR: LoadError: MethodError: no method matching *(::Vector{Int64}, ::Vector{Int64})
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...) at operators.jl:560
  *(::StridedMatrix{T}, ::StridedVector{S}) where {T<:Union{Float32, Float64, ComplexF32, ComplexF64}, S<:Real} at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:44
  *(::StridedVecOrMat{T} where T, ::LinearAlgebra.Adjoint{var"#s832", var"#s831"} where {var"#s832", var"#s831"<:LinearAlgebra.LQPackedQ}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/lq.jl:254
  ...
Stacktrace:
 [1] foo(x::Vector{Int64}, y::Vector{Int64})
   @ Main ~/Documents/Development/DataFramesMeta/linenumbers_test.jl:3
 [2] top-level scope
   @ ~/Documents/Development/DataFramesMeta/linenumbers_test.jl:11
 [3] include(fname::String)
   @ Base.MainInclude ./client.jl:444
 [4] top-level scope
   @ REPL[58]:1
in expression starting at /home/peterwd/Documents/Development/DataFramesMeta/linenumbers_test.jl:7

In [2] it correctly identifies the line number as line 11. Unfortunately, it points to the block beginning on line 7 as the most highlighted error. But oh well, at least it's catching line 11.

@pdeffebach pdeffebach changed the title from "Make block" to "Make block, deprecate @byrow" May 10, 2021
@pdeffebach
Collaborator Author

@nalimilan this PR should be ready for a review, if not merging.

I added docstrings and tests for everything.

I didn't change index.md in the docs because there aren't many examples of multi-argument transforms. Though eventually I want to get rid of all commas and parentheses in the docs. But I will wait until after @byrow is added to do that re-write.

@pdeffebach pdeffebach changed the title from "Make block, deprecate @byrow" to "Make block" May 10, 2021
@pdeffebach
Collaborator Author

Fixed conflicts with master branch after #247

Still ready for a review.

Member

@bkamins bkamins left a comment

Looks good.

Member

@nalimilan nalimilan left a comment

Sorry for the delay. Just small details about the docs.

src/macros.jl Outdated
group, and returns a fresh `DataFrame` containing the rows
for which the generated values are all `true`.
Inputs to `@where` can come in two formats: a `block`, in which case each
Member

"Block" isn't a language keyword, right? It also isn't super easy to understand. Maybe say "as a begin ... end block", and say "line" instead of "argument"?

(Same for other functions.)

Collaborator Author

Fixed all of these.

pdeffebach and others added 3 commits May 16, 2021 13:31
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@pdeffebach
Collaborator Author

Thanks for the review. I have fixed all the issues. Once tests finish this should be ready for merging.

@pdeffebach pdeffebach merged commit 6149846 into JuliaData:master May 16, 2021
@pdeffebach
Collaborator Author

Thanks!
