Preventing problems with aliased columns #1885
Comments
Thank you for this report and detailed analysis. Actually it goes hand in hand with what @oxinabox has been calling for for some time now (and we keep patching it; see the new indexing rules and the fact that constructors copy columns by default). What we plan to do is:
My current thinking (and this would go into the #1845 implementation) is to:
CC @nalimilan, @oxinabox, @quinnj |
Thanks for the quick and comprehensive response! I meant to say that I did see #1845 when searching, but wasn't sure how closely related it was to this; good to know that the two are related. Dumb newbie question here, but why would we only check for data frame consistency before expensive operations? I'd normally assume that if it's not an expensive operation then we'd "have even more time" for these checks. Is it because non-expensive operations tend to be applied more often, and therefore the overhead of these checks would add up much more quickly? |
You do non-expensive operations in a loop usually like this:
adding a check to
Also note that in your original code the "proper" way to create a new column is:
which avoids aliasing (this of course does not solve the problem, but I just wanted to note the intended pattern for such cases). |
OK, good to know. I was also wondering about the proper way of creating a new column, short of using copy(x1.b), so that's helpful. Thanks. |
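For readers following along, here is a minimal sketch of the difference between an aliased and a non-aliased new column. The data frame `df` and its column names are hypothetical (they are not the `x1` from the original report), and the behaviour shown assumes a recent DataFrames.jl release, where `df.b = v` stores `v` without copying and `df[:, :d] = v` creates a new column holding a copy.

```julia
using DataFrames

# Hypothetical example data, not taken from the original report.
df = DataFrame(a = [1, 2, 3])

# Aliased: column b is the very same vector object as column a.
df.b = df.a

# Not aliased: an explicit copy gives column c its own storage.
df.c = copy(df.a)

# Not aliased (assumption: recent DataFrames.jl): assigning with the `:` row
# selector to a column that does not exist yet stores a copy of the right-hand side.
df[:, :d] = df.a

# `===` tests object identity, so it tells aliased columns apart from equal ones.
println(df.b === df.a)
println(df.c === df.a)
println(df.d === df.a)
```

The identity check with `===` is a quick way to confirm whether two columns share storage before calling a mutating function such as append!().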
see #1887 |
In your actual use case, did you call |
We have aliasing detection code already, as it was needed for broadcasting. |
FWIW, that's what we do by default in recent releases e.g. with |
@bkamins Aliasing detection when broadcasting only needs to check columns between data frames involved in the operation, just like the |
I agree, but my major point was that we do not need to detect aliasing anyway. Aliasing detection is very expensive, so we do not want to do it; it is much cheaper to check afterwards whether something got broken. |
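To illustrate the trade-off, here is a rough sketch in plain Julia, with columns modelled as a vector of vectors. The helpers `columns_consistent` and `aliased_pairs` are hypothetical names made up for this example and are not part of DataFrames.jl. The first is the cheap "check afterwards" test (one pass over the columns); the second is a naive pairwise identity check whose cost grows quadratically with the number of columns.

```julia
# Hypothetical helper: cheap post-hoc check that all columns have the same length.
columns_consistent(cols) =
    isempty(cols) || all(c -> length(c) == length(first(cols)), cols)

# Hypothetical helper: naive aliasing detection by comparing every pair of
# columns for object identity. This only catches identical vectors, not views
# or partially overlapping memory.
function aliased_pairs(cols)
    found = Tuple{Int,Int}[]
    for i in eachindex(cols), j in (i + 1):lastindex(cols)
        cols[i] === cols[j] && push!(found, (i, j))
    end
    return found
end

a = [1, 2, 3]
cols = [a, a, [4, 5, 6]]   # columns 1 and 2 are the same vector

# Simulate a column-by-column append: the aliased vector is extended twice.
for c in cols
    append!(c, [7, 8])
end

println(length.(cols))             # the aliased columns grew to 7, the third to 5
println(columns_consistent(cols))  # false: the damage is visible after the fact
println(aliased_pairs(cols))       # the pair (1, 2) is reported as aliased
```

The length check runs in time proportional to the number of columns, which is why checking for breakage after an operation can be much cheaper than detecting aliasing up front.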
Ah sorry I thought that in the OP the aliasing was between |
A note in the documentation for append!() says:
Unfortunately, I only found this note, and understood it, after many hours of confusion and debugging (I have only been using Julia for a couple of months). I was getting strange errors that didn't seem to be related to append!(), and therefore took a while to track down. It turns out that appending a DataFrame to another DataFrame with aliased columns caused the problem. Here's a reproducible example:
I know the note in the documentation technically warns users of this behavior, but many users who are less familiar with Julia, and aren't thinking of object references/copying and so forth, will also be very confused. I think it would help immensely to add some basic checks on DataFrames to ensure that their columns are consistent sizes or that there aren't problems created by aliased columns. These checks would go a long way towards protecting Julia's reputation of being safe and transparent.
I feel like there are a few options for resolving this, but I'm not sure what's preferred: