Avoid aliasing columns on assignment #1052

TotalVerb · 2016-09-04T20:35:59Z

A Stack Overflow user has reported behaviour that I thought was strange, with regard to aliasing between columns: http://stackoverflow.com/questions/39320783/julia-dataframe-changing-one-cell-changes-entire-row

This makes upgrade_vector always copy its argument and also unhoists the broadcasting, so that columns no longer alias each other. Of course, the performance of these operations will be much worse.
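
For readers unfamiliar with the problem, here is a minimal sketch of column aliasing using plain Julia vectors rather than DataFrames internals. The names `shared`, `aliased` and `independent` are hypothetical, and the syntax assumes a recent Julia (1.x), not the 0.4/0.5 releases current when this PR was opened.

```julia
# Hoisted: one vector is created once and stored as every "column".
shared = fill(0.0, 3)
aliased = Vector{Float64}[shared, shared]
aliased[1][2] = 1.0            # write to one cell of column 1 ...
@show aliased[2][2]            # ... and column 2 changes too

# Unhoisted: a fresh vector is allocated for each column.
independent = Vector{Float64}[fill(0.0, 3) for _ in 1:2]
independent[1][2] = 1.0
@show independent[2][2]        # column 2 is unaffected
```

The second form costs one allocation per column, which is the performance trade-off mentioned above.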

Fixes #1051.

nalimilan · 2016-09-04T20:42:01Z

Thanks! Indeed, it's more consistent to always return a newly-allocated vector from upgrade_vector.

ararslan · 2016-09-04T20:50:12Z

Looks like this is causing this line to fail on 0.4:

df = DataFrame(A = Array(String, 3))

andreasnoack · 2016-09-04T20:58:17Z

Two comments:

1. I don't think we should copy when it's not necessary. If we follow that policy everywhere we'll end up as inefficient as R. The single change in line 414 seems to be sufficient to fix the issue without introducing copying anywhere else. See also the discussion in JuliaLang/julia#15546 (implement copy(::Void) = nothing). As I read that issue, the general view is that aliasing can depend on types but not values, and here it's about types.
2. Slightly off topic, sorry. As mentioned elsewhere, I think we should get rid of this kind of 2d indexing of DataFrames. It's bad R heritage, it's inefficient, and it leads to issues like the one we are fixing here.

TotalVerb · 2016-09-04T21:01:50Z

@andreasnoack I'll separate out the single change in line 414 and keep this open for discussion.

nalimilan · 2016-09-05T07:56:25Z

Actually, I find it weird that df[:, 1:end]=0.0 replaces the original columns with new vectors. I'd rather have it modify existing columns in place (and fail when conversion isn't possible), to be consistent with what happens with standard arrays.

We could also deprecate this indexing method, but probably better keep this discussion separate.
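
As a point of comparison, this is roughly how a standard array behaves under the same kind of assignment. It is only a sketch, assuming a recent Julia (1.x), where scalar assignment to a slice is written with `.=`; the variable names are illustrative.

```julia
A = [1 2; 3 4]                 # Matrix{Int}
same = A                       # second reference to the same array

# In-place assignment: the existing storage is overwritten.
A[:, 1:end] .= 0
@show same                     # shows the zeros, because nothing was replaced

# A value that cannot be converted to the element type is rejected.
failed = try
    A[:, 1:end] .= 0.5
    false
catch err
    err isa InexactError
end
@show failed
```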

andreasnoack · 2016-09-05T14:00:03Z

Good point. I think you are right here. It is how we usually do this and it is also likely to be more efficient.

TotalVerb · 2016-09-20T19:38:36Z

So what's the decision on setting columns to vectors? Should that copy the data into the existing vector, or replace (and thus alias)?

andreasnoack · 2016-09-20T20:40:01Z

I'd vote for copying the data into the existing vector and failing if that is not possible.
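
The two options under discussion can be sketched with a plain vector of columns. This is not the DataFrames implementation; `columns`, `existing` and `incoming` are hypothetical names, and the code assumes a recent Julia (1.x).

```julia
existing = [1.0, 2.0, 3.0]
incoming = [4.0, 5.0, 6.0]
columns = Vector{Float64}[existing]

# Option 1: replace the column. The stored column *is* the caller's vector.
columns[1] = incoming
replaced_is_alias = columns[1] === incoming
incoming[1] = 40.0             # also visible through columns[1]

# Option 2: copy the data into the existing column. No aliasing with `incoming`.
columns[1] = existing
columns[1] .= incoming
copied_is_existing = columns[1] === existing
incoming[2] = 50.0             # not visible through columns[1]

@show replaced_is_alias copied_is_existing columns[1]
```

With the second option, later mutation of `incoming` no longer leaks into the stored column, at the cost of one element-wise copy per assignment.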

TotalVerb · 2017-02-27T02:03:11Z

Closing, as it seems like the redesign will make a lot of these issues redundant.

nalimilan · 2017-02-27T13:05:17Z

I don't think we have precise plans to fix this in a different way, and DataFrames still exists, so better keep it open.

TotalVerb · 2017-02-27T16:38:48Z

This PR does not implement the suggestion of @andreasnoack. I suspect that suggestion can be worded as simply

---
 src/dataframe/dataframe.jl | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/dataframe/dataframe.jl b/src/dataframe/dataframe.jl
index 4ce59b3..2ec54aa 100644
--- a/src/dataframe/dataframe.jl
+++ b/src/dataframe/dataframe.jl
@@ -325,15 +325,15 @@ function insert_single_column!(df::DataFrame,
     end
     if haskey(index(df), col_ind)
         j = index(df)[col_ind]
-        df.columns[j] = dv
+        @compat df.columns[j] .= dv
     else
         if typeof(col_ind) <: Symbol
             push!(index(df), col_ind)
-            push!(df.columns, dv)
+            push!(df.columns, upgrade_vector(dv))
         else
             if isnextcol(df, col_ind)
                 push!(index(df), nextcolname(df))
-                push!(df.columns, dv)
+                push!(df.columns, upgrade_vector(dv))
             else
                 error("Cannot assign to non-existent column: $col_ind")
             end
@@ -380,7 +380,7 @@ end
 function Base.setindex!(df::DataFrame,
                 v::AbstractVector,
                 col_ind::ColumnIndex)
-    insert_single_column!(df, upgrade_vector(v), col_ind)
+    insert_single_column!(df, v, col_ind)
 end
 
 # df[SingleColumnIndex] = Single Item (EXPANDS TO NROW(DF) if NCOL(DF) > 0)
-- 
2.9.3

which is a more minimal change than this PR, albeit possibly disruptive.

nalimilan · 2017-03-06T20:49:45Z

Could you open a PR against DataTables? Disruptive changes are appreciated there. ;-)

quinnj · 2017-09-07T18:08:08Z

upgrade_vector no longer exists, so I don't think this is an issue anymore.

nalimilan · 2017-09-07T19:55:39Z

We still have upgrade_scalar, and the tests added in the PR still fail on master.

nalimilan · 2018-09-22T11:31:10Z

See #1528.

Avoid aliasing columns on assignment

7afce9f

Fix v0.4 test failure

34c80af

TotalVerb mentioned this pull request Sep 4, 2016

Avoid aliasing scalars on assignment #1053

Closed

TotalVerb closed this Feb 27, 2017

nalimilan reopened this Feb 27, 2017

quinnj closed this Sep 7, 2017

nalimilan reopened this Sep 7, 2017

nalimilan mentioned this pull request Sep 22, 2018

Avoid aliasing columns when assigning vector #1528

Merged

nalimilan closed this Sep 22, 2018
