BUG: fix a bug where columns with dtype=object wouldn't be properly deep-copied using copy.deepcopy #15871
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
👋 Thank you for your draft pull request! Do you know that you can use |
Force-pushed from 3db5b7f to 59f9482
Note to reviewers: both tests fail on main and are fixed by the same patch. Their redundancy is intentional to some extent, but I could drop the parametrization (for one or both of them) since it's a bit heavy-handed: only 1 out of 4 scenarios was actually fixed, and the other 3 already worked as intended.
@pllim Do you see a reason not to backport this to 6.0.x?
It is a subtle behavior change. Some people might unknowingly rely on the old behavior, so backporting would break them. But if @taldcroft et al. think the risk is low and the benefit of backporting outweighs the risk, then they are free to change the milestone. Hope this clarifies the matter!
Thanks! I'm wondering, though, if this goes a bit too far? I think Column as an ndarray subclass should work the same way as ndarray.
My own sense is that we rely on users being able to do deepcopy(...) and get the right result. For Column, that is the case already, but for Table, that needs an adjustment of __deepcopy__. And then we can mention in the Table docstring under copy that if one has object columns and wants those copied too, then one should use copy.deepcopy(table) (and also under Column.copy and Table.copy).
@taldcroft - what do you think?
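For reference, the plain ndarray behavior that Column is being compared to can be sketched as follows. This is a minimal illustration using only NumPy and the standard library; the variable names are made up for the example and it does not touch astropy at all.

```python
import copy

import numpy as np

# A one-element object array holding a mutable Python object.
arr = np.array([{"k": 1}], dtype=object)

# ndarray.copy() duplicates the array of references, not the objects.
shallow = arr.copy()
shallow_shares_element = shallow[0] is arr[0]

# copy.deepcopy() also copies the objects stored in the array.
deep = copy.deepcopy(arr)
deep_shares_element = deep[0] is arr[0]

print(shallow_shares_element, deep_shares_element)
```

Under this convention, a method-level copy stays shallow with respect to the stored objects, and copy.deepcopy is the explicit way to duplicate them.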
assert c2 is not c1
assert c2[0] is not c1[0]

c3 = table.Column(c1, copy=True)
So, if we go with my suggestion, for this case one would have assert c3[0] is c1[0] for the object column.
astropy/table/tests/test_column.py
Outdated
@pytest.mark.parametrize(
    "data",
    [
        np.array([1], dtype=np.int32),
Maybe keep the test focussed just on object? I'd do np.array([1], dtype=object) and [object()] only.
astropy/table/tests/test_table.py
Outdated
@pytest.mark.parametrize(
    "data",
    [
        np.array([1], dtype=np.int32),
Here too, I'd parametrize just some examples of object.
p.s. The test itself would be independent of my suggested change.
Thanks for your input, I didn't realize that |
Force-pushed from 59f9482 to dcad1dd
Locally I still see 2 failing tests. Specifically it seems to be the ones that exercise copying a |
astropy/table/table.py
Outdated
@@ -3654,6 +3654,9 @@ def copy(self, copy_data=True):
        deepcopied regardless of the value for ``copy_data``.
        """
        out = self.__class__(self, copy=copy_data)
        if copy_data:
Hmm, I still think we should only adjust __deepcopy__ - right now, Table.copy() does the same thing as dict.copy() (which makes a shallow copy of the values = the data), and I think that's fine.
That will probably remove all errors, but obviously means one has to add extra tests...
But perhaps good to get @taldcroft's opinion first!
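The dict side of that analogy can be sketched with the standard library alone. This is only an illustration of how dict.copy() differs from copy.deepcopy(); the dictionary below is a made-up stand-in and says nothing about how Table stores its columns.

```python
import copy

# A made-up mapping from column name to a list holding an arbitrary object.
columns = {"a": [object()]}

# dict.copy() builds a new dict but keeps referring to the same values.
shallow = columns.copy()
shallow_shares_value = shallow["a"] is columns["a"]

# copy.deepcopy() recurses into the values and the objects they hold.
deep = copy.deepcopy(columns)
deep_shares_object = deep["a"][0] is columns["a"][0]

print(shallow_shares_value, deep_shares_object)
```

Keeping .copy() shallow and reserving the recursive behavior for copy.deepcopy matches what users of the built-in containers would expect.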
My mistake! If I understand correctly, what you're suggesting is more or less this patch?
diff --git a/astropy/table/table.py b/astropy/table/table.py
index 2c6e6c4a40..ca00fbdb69 100644
--- a/astropy/table/table.py
+++ b/astropy/table/table.py
@@ -3663,7 +3663,10 @@ class Table:
         return out
 
     def __deepcopy__(self, memo=None):
-        return self.copy(True)
+        out = self.copy(True)
+        for name in out.colnames:
+            out[name] = deepcopy(out[name])
+        return out
 
     def __copy__(self):
         return self.copy(False)
There's probably some duplicated work with this version (but no more than in the current state of this PR). I also note that it hits the exact same 2 failures, which I still don't quite get.
Yes, that's what I meant.
And let me see if I can figure out the test...
I cannot say I necessarily understand it completely, but the following does work:
out = self.copy(copy_data=False)
for name in out.colnames:
    out.columns.__setitem__(name, deepcopy(self[name]), validated=True)
return out
Note that the strange line is just the very final line of replace_column - the rest is not needed since in this case we know the data is OK. An alternative that works as well is:
out = self.copy(copy_data=False)
out._replace_cols({name: deepcopy(self[name]) for name in self.colnames})
return out
Maybe @taldcroft can advise on what would be the best procedure!?
EDIT: I use copy_data=False since we are making a deep copy of the data right after.
Thank you very much! I pushed the first approach just so everyone can see that it indeed passes all tests, but I'm happy to change it again if Tom advises we use a different approach.
Force-pushed from dcad1dd to 77f870b
Thanks! From my side, this looks all good, but would like @taldcroft's opinion as well.
Let's gently remind @taldcroft about this one :)
Sorry, I entirely missed this. I'll have a look but I'm stepping away for the afternoon now. One quick thing is with |
To answer my question, despite the documentation
This stems from #8404. It would appear in that PR that we entirely failed to update related docstrings about the behavior of |
out = self.copy(False)
for name in out.colnames:
    out.columns.__setitem__(name, deepcopy(self[name]), validated=True)
return out
I think here you just need to add:
out.meta = deepcopy(self.meta)
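The overall pattern discussed in this thread (shallow-copy the container, then swap in deep copies of each column and of the metadata) can be sketched with a toy class. ToyTable and its attributes are hypothetical and exist only for this illustration; it does not use astropy's Table, its columns.__setitem__ signature, or _replace_cols.

```python
from copy import deepcopy


class ToyTable:
    """Hypothetical stand-in for a table; NOT astropy's Table."""

    def __init__(self, columns=None, meta=None):
        self.columns = dict(columns or {})
        self.meta = dict(meta or {})

    def copy(self):
        # New container whose columns and meta values are shared (shallow).
        return ToyTable(self.columns, self.meta)

    def __deepcopy__(self, memo=None):
        # Start from a shallow copy, then replace every column and the
        # metadata with deep copies, so nothing is copied twice.
        out = self.copy()
        for name in list(out.columns):
            out.columns[name] = deepcopy(self.columns[name], memo)
        out.meta = deepcopy(self.meta, memo)
        return out


t1 = ToyTable({"a": [object()]}, meta={1: object()})
t2 = deepcopy(t1)

# Neither the column contents nor the meta values are shared after deepcopy.
print(t2.columns["a"][0] is t1.columns["a"][0], t2.meta[1] is t1.meta[1])
```

Starting from the shallow copy mirrors the copy_data=False choice above: the data is deep-copied right afterwards, so copying it first would be wasted work.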
astropy/table/tests/test_column.py
Outdated
)
def test_deepcopy_object_column(data):
    # see https://github.com/astropy/astropy/issues/13435
    c1 = table.Column(data)
Since we are here, you should define meta = {1: object()} and then ensure that the copy for meta is deep as well.
astropy/table/tests/test_table.py
Outdated
)
def test_deepcopy_object_column(data):
    # see https://github.com/astropy/astropy/issues/13435
    t1 = Table({"a": data})
Same comment as for Column: add a meta attribute and check the deepcopy.
We should take this opportunity to clean up the docs. In the
Update the
Thanks @taldcroft, I've added your suggestions to the batch!
Ah, got a conflict... too bad, I'll just rebase and rewrite the history after all.
(For reference, the conflict was with #16002.)
Force-pushed from de5a1a4 to a7cd7be
@neutrinoceros - sorry for the long delay. This looks good now and I see that @mhvk also approved. It looks like this needs a rebase to get CI picking up the latest. I'll set it to squash-merge once CI passes.
…eep-copied using copy.deepcopy
Head branch was pushed to by a user without write access
Force-pushed from a7cd7be to 0f15259
Rebased!
Done, thanks!!
Description
Fixes #13435