setindex!/broadcast! design #1645
Comments
Regarding the broadcasting support, the reason why I consider leaving "automatic broadcasting" for
When we would disable automatic broadcasting then .
Overall I'd answer "yes" to most of your questions. A few remarks:
Given that we allow creating a column with a new name, which will add it at the end, isn't it consistent to allow indexing the
Can you develop? Does this function automatically broadcast?
Strictly speaking, even the first method implies broadcasting if we assume the similarity with matrices. So maybe we should also require
Regarding your second comment, it's indeed annoying that we can't allow
So I'm not sure whether we should disallow implicit broadcasting. That would be more consistent with Base, but maybe less convenient. Or maybe we should allow
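For reference, the contrast with Base mentioned above can be seen on a plain Vector. This is a minimal sketch using only Base (Julia 1.0 or later assumed); it does not involve DataFrames.jl and says nothing about what the package should do.

```julia
x = [1, 2, 3]

# Base does not broadcast implicitly: assigning one scalar to several
# positions with plain indexed assignment throws an error.
err = try
    x[1:2] = 0
    nothing
catch e
    e
end
println(typeof(err))

# The explicit broadcasting form is accepted and works in place.
x[1:2] .= 0
println(x)
```

Implicit broadcasting in setindex! for a DataFrame therefore deviates from this Base convention, which is the consistency trade-off discussed in this thread.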
I would disallow it because
Now tell me what is
which is not intuitive. As I have mentioned as an alternative to append a column in-place I would export
The problem is the following:
(this is a bug that we should fix anyway - it was introduced by me and I did not catch this special case), but in general after this
writes into these columns,
also leaves them but changes their values and it is natural to expect that
This is the main problem I have. I.e. if we treat
This would leave standard
Makes sense.
Unfortunately I don't think we can support
In general not only
The general reason is that broadcasting is designed not to mutate the size of the target and for
This means that (unless we do heavy tweaking of the broadcasting mechanism) we have to keep:
So the question is: do we want to go for it, given that we will most likely have to keep the inconsistency anyway? The alternative is to keep doing implicit broadcasting as we do currently (this would be the recommended style) and not care about explicit broadcasting (unless it already works). I have the possible rules for broadcasting written down if you want to see them. The only benefit would be that the following would work, whereas now it fails:
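The point that broadcasting does not change the size of its target can be checked on a plain Vector. The following is a small sketch using only Base, not DataFrames.jl code: an in-place broadcast keeps the length of the destination, and a longer right-hand side is rejected rather than growing it.

```julia
x = [1, 2, 3]

# In-place broadcast: the destination keeps its length.
x .= 10 .* x
println(x)

# A right-hand side that does not fit is an error; the target is never resized.
err = try
    x .= [1, 2, 3, 4]
    nothing
catch e
    e
end
println(typeof(err))
println(length(x))
```

Adding a column to a data frame changes the size of the target, which is why it does not fit this model without extra work.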
at the cost of problems when users want to add columns via broadcasting.
Are you sure that adding new columns requires more work than supporting
Yes - try running it yourself - but in general the problem is that
Notice that even in https://docs.julialang.org/en/latest/manual/interfaces/#man-interfaces-broadcasting-1 it is recommended to define only
As I have said - it is possible to work around it, e.g. like this (this is not a full solution and it is messy, but it roughly shows how you can do it):
This isn't a huge concern for me. It's not like
This is the whole problem - in DataFrames.jl
Thanks for the explanation. I didn't expect this to be so easy. Actually your example above is quite convincing about the fact that we can allow broadcasting to create columns. ;-) Anyway, I guess we can start implementing standard broadcasting support, keeping the automatic broadcasting via
OK - I will push a PR with a specification of a target functionality so that we can work on a file.
and
seems to allow us to intercept and process whatever we want. CC: @mbauman - if you can spare some time, could you please comment on whether this is an appropriate way to specialize broadcasted assignment in DataFrames.jl? The challenge we have is that we have non-standard indexing (e.g. by column names) and that we want a broadcasted assignment to be able to add columns (in which case it will not be in place, but will have to allocate these new columns).
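To illustrate the kind of interception described above, here is a toy sketch. It is not the DataFrames.jl implementation and not necessarily the approach proposed in this comment. It assumes that `t[name] .= v` lowers to a call of `Base.materialize!` on the result of `Base.dotview(t, name)`, and it overloads `Base.dotview`, which is an internal, unexported function. `ToyTable` and its fields are hypothetical names introduced only for this example.

```julia
# Hypothetical toy container: columns stored by name, fixed number of rows.
struct ToyTable
    nrows::Int
    cols::Dict{Symbol,Vector{Float64}}
end
ToyTable(nrows::Int) = ToyTable(nrows, Dict{Symbol,Vector{Float64}}())

# Assumption: `t[name] .= v` calls `Base.dotview(t, name)` to obtain the
# broadcast destination. Returning the column, and allocating it first if it
# is missing, lets a broadcasted assignment create a new column.
function Base.dotview(t::ToyTable, name::Symbol)
    return get!(() -> zeros(t.nrows), t.cols, name)
end

t = ToyTable(3)
t[:a] .= 1.5               # column :a is allocated, then filled
t[:a] .= t.cols[:a] .* 2   # existing column is updated in place
println(t.cols[:a])
```

The broadcast itself still never resizes anything: the allocation happens in the hook that produces the destination, before the broadcast runs.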
Thanks. Though I'd avoid dispatching on
All is done here.
Here is a review of the current API (#1571 not included, but accounted for in the comments). I have added decision fields - sometimes I am pretty sure what to do. Sometimes it is a question. The biggest question I have is whether we actually want to support broadcasting when the LHS is a DataFrame at all. Maybe it is OK to leave the current mechanics as they are, since a DataFrame is not an AbstractArray. I am not really sure what is best. The benefit of adding broadcasting is that we would be more consistent with Base in terms of notation (currently, when setindex! is called, broadcasting is done implicitly).
setindex!(df::DataFrame, v::AbstractVector, col_ind::ColumnIndex)
setindex!(df::DataFrame, v, col_ind::ColumnIndex)
setindex!(df::DataFrame, new_df::DataFrame, col_inds::AbstractVector{Bool})
setindex!(df::DataFrame, new_df::DataFrame, col_inds::AbstractVector{<:ColumnIndex})
setindex!(df::DataFrame, v::AbstractVector, col_inds::AbstractVector{Bool})
setindex!(df::DataFrame, val::Any, col_inds::AbstractVector{Bool})
setindex!(df::DataFrame, val::Any, col_inds::AbstractVector{<:ColumnIndex})
setindex!(df::DataFrame, v, ::Colon)
setindex!(df::DataFrame, v::Any, row_ind::Real, col_ind::ColumnIndex)
setindex!(df::DataFrame, v::Any, row_ind::Real, col_inds::AbstractVector{Bool})
setindex!(df::DataFrame, v::Any, row_ind::Real, col_inds::AbstractVector{<:ColumnIndex})
setindex!(df::DataFrame, new_df::DataFrame, row_ind::Real, col_inds::AbstractVector{Bool})
setindex!(df::DataFrame, new_df::DataFrame, row_ind::Real, col_inds::AbstractVector{<:ColumnIndex})
setindex!(df::DataFrame, v::AbstractVector, row_inds::AbstractVector{Bool}, col_ind::ColumnIndex)
setindex!(df::DataFrame, v::AbstractVector, row_inds::AbstractVector{<:Real}, col_ind::ColumnIndex)
setindex!(df::DataFrame, v::Any, row_inds::AbstractVector{Bool}, col_ind::ColumnIndex)
setindex!(df::DataFrame, v::Any, row_inds::AbstractVector{<:Real}, col_ind::ColumnIndex)
setindex!(df::DataFrame, new_df::DataFrame, row_inds::AbstractVector{Bool}, col_inds::AbstractVector{Bool})
setindex!(df::DataFrame, new_df::DataFrame, row_inds::AbstractVector{Bool}, col_inds::AbstractVector{<:ColumnIndex})
setindex!(df::DataFrame, new_df::DataFrame, row_inds::AbstractVector{<:Real}, col_inds::AbstractVector{Bool})
setindex!(df::DataFrame, new_df::DataFrame, row_inds::AbstractVector{<:Real}, col_inds::AbstractVector{<:ColumnIndex})
setindex!(df::DataFrame, v::AbstractVector, row_inds::AbstractVector{Bool}, col_inds::AbstractVector{Bool})
setindex!(df::DataFrame, v::AbstractVector, row_inds::AbstractVector{Bool}, col_inds::AbstractVector{<:ColumnIndex})
setindex!(df::DataFrame, v::AbstractVector, row_inds::AbstractVector{<:Real}, col_inds::AbstractVector{Bool})
setindex!(df::DataFrame, v::AbstractVector, row_inds::AbstractVector{<:Real}, col_inds::AbstractVector{<:ColumnIndex})
setindex!(df::DataFrame, v::Any, row_inds::AbstractVector{Bool}, col_inds::AbstractVector{Bool})
setindex!(df::DataFrame, v::Any, row_inds::AbstractVector{Bool}, col_inds::AbstractVector{<:ColumnIndex})
setindex!(df::DataFrame, v::Any, row_inds::AbstractVector{<:Real}, col_inds::AbstractVector{Bool})
setindex!(df::DataFrame, v::Any, row_inds::AbstractVector{<:Real}, col_inds::AbstractVector{<:ColumnIndex})
setindex!(df::DataFrame, new_df::DataFrame, row_inds::Colon, col_inds::Colon=Colon())
setindex!(df::DataFrame, v, ::Colon, ::Colon)
setindex!(df::DataFrame, v, row_inds, ::Colon)
setindex!(df::DataFrame, v, ::Colon, col_inds)
I am not sure what is the best way to work with this as the list is really long and there are many interconnected decisions to be made.
@nalimilan, @coreywoodfield:
In particular, in relation to #1571, note that the current setindex! mechanics uses only position, not name, of the RHS if the RHS is a DataFrame.