transform should expand a data frame when it has 0 rows. #3301

Closed
pdeffebach opened this issue Mar 14, 2023 · 3 comments

@pdeffebach
Contributor

pdeffebach commented Mar 14, 2023

This would be convenient behavior when building up a DataFrame iteratively:

julia> df = DataFrame()
0×0 DataFrame

julia> transform(df, [] => (() -> rand(10)) => :y)
ERROR: ArgumentError: length 10 of vector returned from function #35 is different from number of rows 0 of the source data frame.

My use case was something along the lines of:

julia> df2 = @chain DataFrame() begin
           @transform :x_star = rand(N)
           @rtransform :y = begin
               ϵ = rand()
               y = f(:x_star, 0) + f(ϵ, 0)
           end
       end;

and expected this to work, especially since

julia> df
0×0 DataFrame

julia> insertcols!(df, :b => rand(10))

works.

@bkamins bkamins added this to the 1.6 milestone Mar 14, 2023
@bkamins
Member

bkamins commented Mar 14, 2023

This is intentional. Why would you expect this to work? insertcols! is different. Note the docstrings:

  • transform:
    The result is guaranteed to have the same number of rows as df.
  • insertcols!:
    If df isa DataFrame that has no columns and only values other than AbstractVector
    are passed then it is used to create a one-element column. If df isa DataFrame that
    has no columns and at least one AbstractVector is passed then its length is used to
    determine the number of elements in all created columns. In all other cases the
    number of rows in all created columns must match nrow(df).

(Of course I would be open to changing it, but the key reason we have both functions is that they behave differently.)
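
The contrast between the two docstrings can be sketched as follows. This is only a minimal illustration, assuming DataFrames.jl 1.x; the column names `:a`, `:b`, and `:y` and the variable `threw` are made up for the example.

```julia
using DataFrames

# insertcols! on a data frame with no columns: the vector length sets the row count.
df = DataFrame()
insertcols!(df, :b => rand(10))
@assert nrow(df) == 10

# With only a non-vector value, insertcols! creates a one-element column.
df_scalar = insertcols!(DataFrame(), :a => 1)
@assert nrow(df_scalar) == 1

# transform must keep the row count of its source, so a length-10 result
# on a 0-row data frame is rejected with an ArgumentError.
threw = try
    transform(DataFrame(), [] => (() -> rand(10)) => :y)
    false
catch err
    err isa ArgumentError
end
@assert threw
```

The `try`/`catch` is there only so the snippet runs to completion; in the REPL the `transform` call simply prints the error quoted in the issue.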

@pdeffebach
Contributor Author

Fair enough. I wanted to avoid insertcols! because of its different syntax (and there is no DataFramesMeta.jl macro for it).

Your comment made me realize that I can use combine for this, since one of the main differences is that combine can change the number of rows.


julia> using DataFramesMeta
julia> @chain DataFrame() begin
           @combine :x = rand(100)
       end
100×1 DataFrame
 Row │ x         
     │ Float64   
─────┼───────────
   1 │ 0.226262
   2 │ 0.859944
...
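
For readers not using DataFramesMeta, roughly the same thing can be written with plain DataFrames.jl. This is a sketch under the assumption that `@combine :x = rand(100)` corresponds to the `[] => fun => :x` form used in the original `transform` call; the `:y` column and the doubling function are hypothetical, added only to show that `transform` works once the rows exist.

```julia
using DataFrames

# combine may return any number of rows, so it can "grow" an empty data frame.
df = combine(DataFrame(), [] => (() -> rand(100)) => :x)
@assert size(df) == (100, 1)

# Once the rows exist, transform works as usual because the row count is preserved.
df = transform(df, :x => ByRow(x -> 2x) => :y)
@assert size(df) == (100, 2)
```

Starting the pipeline with `combine` and continuing with `transform` keeps the guarantee that `transform` never changes the number of rows.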

@bkamins
Member

bkamins commented Mar 14, 2023

Ah - you are right. So closing :).

@bkamins bkamins closed this as completed Mar 14, 2023