BUG [table]: fix a bug where column names would be lost when instanciating Table from a list of Row objects #15735

neutrinoceros · 2023-12-14T17:39:45Z

Description

Fixes #5923
This is a somewhat inelegant minimal patch; I'll be happy to iterate over it to improve quality if requested !

ping @hamogu @taldcroft

By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

github-actions · 2023-12-14T17:40:14Z

github-actions · 2023-12-14T17:40:19Z

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

pllim · 2023-12-14T17:42:47Z

astropy/table/table.py

@@ -593,7 +620,7 @@ class Table:
        Copy the input data. If the input is a Table the ``meta`` is always
        copied regardless of the ``copy`` parameter.
        Default is True.
-    rows : numpy ndarray, list of list, optional
+    rows : numpy ndarray, list of list, sequence of ``Row``, optional


Addition of new accepted type as input also feel like a feature, so I didn't mark this as backport but Table maintainers can always change that if they want.

Thanks!

Initializing from a sequence of Row was always intended to work, so this is just documenting that correctly (and fixing a bug).

So, @taldcroft , should we backport?

Yes on the backport in concept, though to be honest I don't know where it will be backported. I thought there is only 6.0.1 which is the next bugfix release forward from 6.0.0.

Also, of course, I haven't reviewed the code yet!

neutrinoceros · 2023-12-14T17:48:51Z

There's one failure I didn't catch locally. I'll get back to fix it in the morning, most likely.

neutrinoceros · 2023-12-14T17:53:45Z

astropy/table/table.py

+        return names
+
+    for row in rows[1:]:
+        if not isinstance(row, Row) or len(row.colnames) != len(names):


note to reviewers: I'm assuming that passing rows of different sizes would raise an exception sooner or later during initialization, but I think this function should be as resilient as possible, so raising that exception is not its responsibility.

neutrinoceros · 2023-12-14T17:55:55Z

astropy/table/tests/test_table.py

+    t1["a"] = [1, 2, 3]
+    t1["b"] = [2.0, 3.0, 4.0]
+
+    rows = [row for row in t1]  # noqa: C416


I used a noqa comment here to keep this intention of that line very clear. Otherwise, the linter would push for
rows = list(t1), which is equivalent, but a bit more opaque in my opinion.

I could also create a list of Rows manually. Here I just took the example from @astrofrog for easier comparison between the test and its linked issue.

hamogu · 2023-12-15T14:10:21Z

We should backport.

hamogu · 2023-12-15T14:11:21Z

Since I don't review many PRs, I don't know what I need tp do about that though. Set the backport label?

pllim · 2023-12-15T14:13:06Z

Change the milestone and set the backport label, yes. I already did it here. Thanks for the feedback!

taldcroft · 2023-12-18T19:48:06Z

@neutrinoceros - I did a little digging and noticed a solution that has a much smaller footprint, in fact it reduces the line count. Basically this relies on Row being mostly dict-like, though sadly list(row) gives the values not the keys. (If we could go back 11 years...).

This also fixes some old inelegant code for getting the column order which was written way before Python had an ordered dict. It's very easy now to get the column order as simply the order in which they appear in successive rows. The previous alphabetical sort was not very nice.

One slight unfortunate thing about this solution is that it could change the column ordering in code so it would count as an API change for the unusual circumstance where the first row does not have all the columns. So I think this would need to wait for the next release, but the original issue is 6 years old so it can wait a little longer.

diff --git a/astropy/table/table.py b/astropy/table/table.py
index ac5a06d791..65631d9b0a 100644
--- a/astropy/table/table.py
+++ b/astropy/table/table.py
@@ -205,9 +205,9 @@ def _get_names_from_list_of_dict(rows):
     if rows is None:
         return None
 
-    names = set()
+    names = dict()
     for row in rows:
-        if not isinstance(row, Mapping):
+        if not isinstance(row, (Mapping, Row)):
             return None
         names.update(row)
     return list(names)
@@ -1169,17 +1169,7 @@ class Table:
         MISSING = object()
 
         # Gather column names that exist in the input `data`.
-        names_from_data = set()
-        for row in data:
-            names_from_data.update(row)
-
-        if set(data[0].keys()) == names_from_data:
-            names_from_data = list(data[0].keys())
-        else:
-            names_from_data = sorted(names_from_data)
-
-        # Note: if set(data[0].keys()) != names_from_data, this will give an
-        # exception later, so NO need to catch here.
+        names_from_data = _get_names_from_list_of_dict(data)
 
         # Convert list of dict into dict of list (cols), keep track of missing
         # indexes and put in MISSING placeholders in the `cols` lists.

neutrinoceros · 2023-12-18T20:46:39Z

@taldcroft thank you so much, I agree your solution is much more elegant and it still passes the test, so I went ahead and replaced mine with it (with you as a "co-author") !

pllim · 2023-12-19T21:59:28Z

As per Simon's recommendation, I moved milestone to 7.0, added label to request What's New entry, and turned this into draft so we don't merge before v6.1.

pllim · 2023-12-19T22:00:37Z

Re: #15735 (comment)

@saimn , I have not been super careful about API change and v6.1. Do we have to revert anything in https://github.com/astropy/astropy/pulls?q=is%3Apr+is%3Amerged+label%3A%22API+change%22+milestone%3Av6.1.0

hamogu · 2023-12-19T22:09:12Z

I think pushing this to 7.0 is going too far. See #15735 (comment)
This was always intended to work -in my view that fact that it did not is a bug that's fixed here. In that sense it's similar to #15774 - it breaks stuff if you relied on undocumented, uncodified behavior that the was not intended. In the same vain, I don't think a what'-s new is justified for this bug fix.

@taldcroft ?

neutrinoceros · 2023-12-20T08:44:28Z

it breaks stuff if you relied on undocumented, uncodified behavior that the was not intended.

@hamogu not exactly: the behaviour that's changed here was documented, even though it wasn't ideal (and it's a side effect from @taldcroft's patch that I adopted, not it's primary goal)
In that sense, I agree that it makes sense to push it to 6.1 in this kind of minor breakage was acceptable (it sounds like they're not) or 7.0 otherwise.

My own patch was backward compatible, albeit a lot less elegant than @taldcroft , so I think I can propose a way forward that should be satisfactory to all parties:

reinstate my own backward-compatible patch that just fixes the targeted bug (including a regression test) and move back this PR to 6.0.1
open a separate PR with @taldcroft's cleanup patch, aimed at 7.0

@saimn I am not sure how to go about adding a deprecation warning in this case: ideally users should have some sort of control to opt-in the new behaviour before it becomes the default, so they're able to dodge the warning once they are aware of it. In this case in do not see how to provide such an interface without adding a flag to Table.__init__'s signature, but the flag itself would then need to be deprecated too. All in all, is it really worth it ? I think there's a case to be had that a clean (documented !) break in 7.0 would help a lot with maintainability.

saimn · 2023-12-20T09:30:43Z

@hamogu - From what I understand, and @neutrinoceros confirms it above, there is both a bugfix (supporting list of Row) and an API change (ordering of the columns, which affects also instantiating from a list of dict). If the second part can be avoided then the bugfix can even go to 6.0.1, and maybe the API change can be delayed to 7.0, with a warning if possible.

@neutrinoceros - Not sure for the warning, it also depends if it makes sense to have an option to keep the old behavior. One option is to issue the warning when columns are sorted, stating that the default behavior will change in 7.0, and people can filter the warning if they want. Or add an option to choose the sort behavior which would allow to force the old behavior if it make sense to do so ?

neutrinoceros · 2023-12-20T09:58:36Z

Or add an option to choose the sort behavior which would allow to force the old behavior if it make sense to do so ?

What's great about @taldcroft's patch is that it decreases complexity. If we were to support multiple behaviours then complexity would go back up again, and I'm honestly not sure it's worth it.

taldcroft · 2023-12-20T10:21:36Z

This is exactly the sort of situation I was thinking about in discussions for APE-21 which resulted in the statement about being pragmatic. Historically, API changes like this have been common in minor releases. Although we justified them by pointing to the LTS, in practice the project recognized that very few people were actually using the LTS.

Breakage could only occur for a pretty unusual situation:

Initializing a table from a list of dict where the first row does not contain all of the columns.
Downstream code that depends on the column ordering.

The pragmatic part is accepting that this situation can occur but it is unlikely. Not only does (1) need to be true, but it needs to always be true for all input data sets. Basically the current code is somewhat randomly changing the column ordering based on an arbitrary test.

It is true that we could test for situation (1), issue a warning and sort the columns alphabetically. But then users in that situation need to modify their code anyway to suppress the warning. And if they want to permanently keep alphabetical ordering they have to add code to do that sorting so it still works after 7.0, and then they are left with orphaned code to suppress the warning. Users that want the more natural ordering now would be out of luck unless we add some new configuration option.

This is just a LOT of complexity for a very little gain.

hamogu · 2023-12-20T11:52:23Z

Like #15774 this sounds like space-bar heating: https://xkcd.com/1172/. I don't think we have to backport, but I suggest a 6.1 milestone. We also don't have to do the extra work of breaking this up into two PRs.

taldcroft · 2023-12-20T12:21:19Z

In my day job (Chandra Flight Director dealing directly with NASA) I deal with process control authority, A LOT. In this case, there is a delegation of authority:

CoCo has ultimate authority for the release contents.
CoCo does not have time to deal with each PR, so they delegate to the release managers.
Release managers do not have time or (in many cases) the domain knowledge to assess each PR, so they delegate to the package maintainers.
Package maintainers normally have authority to make a decision about the impact of an API change and the target milestone.

Only in unusual circumstances would the release managers or the CoCo override the recommendation of the package maintainer(s). Of course a healthy discussion is always encouraged!

saimn · 2023-12-20T13:59:58Z

Not sure I understand where this discussion is going but, sure in the end it's up to the maintainers to decide.
But on the other side the change of philosophy of the release cycle is recent and it's easy to forget (e.g. #14977) that we decided to avoid breaking changes in minor versions. It's sometimes difficult to avoid, or sometimes justified to break something in order to fix a bug, fine.
Here the first version was backward compatible, and creating a Table from a list of dict is pretty common I guess, so it seems reasonable to ask if we can find another way to proceed.

neutrinoceros · 2023-12-20T14:23:08Z

It's not up to me to decide and ultimately I'm happy to re-iterate as long as it takes to get everyone happy but I think that going with the patch as is and aiming it as 6.1 is a reasonable way forward and indeed falls into the "pragmatic" kind of low impact breakage that APE 21 defines as acceptable for a minor release.

taldcroft · 2023-12-20T14:47:03Z

@saimn - agreed to try avoiding API changes, I'm definitely 100% behind not breaking code unnecessarily. That said, in this case we have talked it through and I believe the complexity that is needed to maintain perfect back-compatibility outweighs the potential impact. Creating Tables from list of dict is common, but having both (1) and (2) from above is not.

taldcroft

@neutrinoceros - this is converging! I made a number of suggestions on the docs. The code and tests look fine. I think the CI failures are just doc tests (at least from the first one) but you should check them all.

docs/table/construct_table.rst

docs/whatsnew/6.1.rst

neutrinoceros · 2024-02-10T07:46:53Z

rebased to resolve a merge conflict

…ble from a list of Row objects Co-authored-by: Tom Aldcroft <taldcroft@gmail.com>

neutrinoceros · 2024-04-30T10:17:32Z

rebased again to resolve conflicts. This time I also squashed my commits to minimize the number of conflicts I need to resolve in a single rebase.

github-actions bot added table visualization labels Dec 14, 2023

pllim added this to the v6.1.0 milestone Dec 14, 2023

pllim added the Bug label Dec 14, 2023

pllim reviewed Dec 14, 2023

View reviewed changes

neutrinoceros marked this pull request as ready for review December 14, 2023 17:43

neutrinoceros requested a review from taldcroft as a code owner December 14, 2023 17:43

neutrinoceros changed the title ~~BUG: fix a bug where column names would be lost when instanciating Table from a list of Row objects~~ BUG [table]: fix a bug where column names would be lost when instanciating Table from a list of Row objects Dec 14, 2023

neutrinoceros marked this pull request as draft December 14, 2023 17:48

neutrinoceros commented Dec 14, 2023

View reviewed changes

neutrinoceros force-pushed the table/bug/table_from_rows branch from b83cb50 to 2381367 Compare December 14, 2023 17:56

pllim removed the visualization label Dec 14, 2023

neutrinoceros force-pushed the table/bug/table_from_rows branch from 2381367 to e3386c8 Compare December 15, 2023 09:46

neutrinoceros marked this pull request as ready for review December 15, 2023 10:23

pllim modified the milestones: v6.1.0, v6.0.1 Dec 15, 2023

pllim added the 💤 backport-v6.0.x on-merge: backport to v6.0.x label Dec 15, 2023

neutrinoceros force-pushed the table/bug/table_from_rows branch 2 times, most recently from 715ae5a to 3bea311 Compare December 18, 2023 20:45

taldcroft added the API change PRs and issues that change an existing API, possibly requiring a deprecation period label Dec 19, 2023

taldcroft removed this from the v6.0.1 milestone Dec 19, 2023

pllim added the whatsnew-needed label Dec 19, 2023

pllim marked this pull request as draft December 19, 2023 21:58

hamogu removed the whatsnew-needed label Dec 20, 2023

hamogu modified the milestones: v7.0.0, v6.1.0 Dec 20, 2023

neutrinoceros force-pushed the table/bug/table_from_rows branch from 57b4c3a to cd2692a Compare December 20, 2023 18:41

taldcroft requested changes Dec 21, 2023

View reviewed changes

taldcroft marked this pull request as ready for review December 21, 2023 12:13

neutrinoceros force-pushed the table/bug/table_from_rows branch from 660e796 to c02340a Compare December 21, 2023 12:35

taldcroft added the whatsnew-needed label Dec 22, 2023

neutrinoceros force-pushed the table/bug/table_from_rows branch from dbba51f to 1545980 Compare February 10, 2024 07:46

astrofrog modified the milestones: v6.1.0, v7.0.0 Apr 4, 2024

neutrinoceros and others added 2 commits April 30, 2024 12:15

TST: add a regression test for bug 5923

f7a32a9

BUG: fix a bug where column names would be lost when instanciating Ta…

7339e2b

…ble from a list of Row objects Co-authored-by: Tom Aldcroft <taldcroft@gmail.com>

neutrinoceros force-pushed the table/bug/table_from_rows branch from 53ab42e to 7339e2b Compare April 30, 2024 10:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG [table]: fix a bug where column names would be lost when instanciating Table from a list of Row objects #15735

BUG [table]: fix a bug where column names would be lost when instanciating Table from a list of Row objects #15735

neutrinoceros commented Dec 14, 2023

github-actions bot commented Dec 14, 2023

github-actions bot commented Dec 14, 2023

pllim Dec 14, 2023

taldcroft Dec 15, 2023

pllim Dec 15, 2023

taldcroft Dec 18, 2023

neutrinoceros commented Dec 14, 2023

neutrinoceros Dec 14, 2023

neutrinoceros Dec 14, 2023

neutrinoceros Dec 15, 2023

hamogu commented Dec 15, 2023

hamogu commented Dec 15, 2023

pllim commented Dec 15, 2023

taldcroft commented Dec 18, 2023

neutrinoceros commented Dec 18, 2023

pllim commented Dec 19, 2023

pllim commented Dec 19, 2023

hamogu commented Dec 19, 2023

neutrinoceros commented Dec 20, 2023

saimn commented Dec 20, 2023

neutrinoceros commented Dec 20, 2023

taldcroft commented Dec 20, 2023 •

edited

hamogu commented Dec 20, 2023

taldcroft commented Dec 20, 2023

saimn commented Dec 20, 2023

neutrinoceros commented Dec 20, 2023

taldcroft commented Dec 20, 2023 •

edited

taldcroft left a comment

neutrinoceros commented Feb 10, 2024

neutrinoceros commented Apr 30, 2024

BUG [table]: fix a bug where column names would be lost when instanciating Table from a list of Row objects #15735

Are you sure you want to change the base?

BUG [table]: fix a bug where column names would be lost when instanciating Table from a list of Row objects #15735

Conversation

neutrinoceros commented Dec 14, 2023

Description

github-actions bot commented Dec 14, 2023

github-actions bot commented Dec 14, 2023

pllim Dec 14, 2023

Choose a reason for hiding this comment

taldcroft Dec 15, 2023

Choose a reason for hiding this comment

pllim Dec 15, 2023

Choose a reason for hiding this comment

taldcroft Dec 18, 2023

Choose a reason for hiding this comment

neutrinoceros commented Dec 14, 2023

neutrinoceros Dec 14, 2023

Choose a reason for hiding this comment

neutrinoceros Dec 14, 2023

Choose a reason for hiding this comment

neutrinoceros Dec 15, 2023

Choose a reason for hiding this comment

hamogu commented Dec 15, 2023

hamogu commented Dec 15, 2023

pllim commented Dec 15, 2023

taldcroft commented Dec 18, 2023

neutrinoceros commented Dec 18, 2023

pllim commented Dec 19, 2023

pllim commented Dec 19, 2023

hamogu commented Dec 19, 2023

neutrinoceros commented Dec 20, 2023

saimn commented Dec 20, 2023

neutrinoceros commented Dec 20, 2023

taldcroft commented Dec 20, 2023 • edited

hamogu commented Dec 20, 2023

taldcroft commented Dec 20, 2023

saimn commented Dec 20, 2023

neutrinoceros commented Dec 20, 2023

taldcroft commented Dec 20, 2023 • edited

taldcroft left a comment

Choose a reason for hiding this comment

neutrinoceros commented Feb 10, 2024

neutrinoceros commented Apr 30, 2024

taldcroft commented Dec 20, 2023 •

edited

taldcroft commented Dec 20, 2023 •

edited