No way to efficiently add many rows to astropy.table.Table #9212
Comments
If this is for use within the astropy
Thinking about the idea of |
For astropy.utils.iers this isn't strictly necessary, or even (arguably) a good idea: we have a new table with the right columns already, so I just delete all the columns and add the new columns all at once. No need to have incompatible column lengths at any point. Plus it's not guaranteed that the old data won't change. (But it is what got me looking for insert_rows.) But yes, providing the flexibility of insert_row is cumbersome and maybe not essential. Are tables necessarily one-dimensional, or can a "row" be an array of values per column? This shape validation, including broadcasting, definitely complicates numpy's insert function.
A table is essentially two-dimensional in the sense that the table "shape" must be M rows by N columns where M and N are scalar integers. Columns can have any shape so long as col.shape[0] is M. In addition there are mixin columns (e.g. Time) which are not numpy subclass arrays but do provide an |
Ah, that does complicate things - if a column is higher-dimensional we're going to have to be careful with broadcasting and input validation. If it's an arbitrary subtype, we might have to fall back on the present method (use insert() a lot), though duck typing might save the day.
If you want to modify a Table in-place, so that anyone who holds the Table object will see your changes, you can use add_row to add a single row. If you want to add multiple rows, for example in astropy.utils.iers.IERS_Auto when a new table is available, you have to call add_row multiple times, producing a reallocation every time. The same is true for insert_row, although remove_rows can already remove multiple rows at once. I suggest new methods insert_rows and add_rows (the latter would just call insert_rows).
As of numpy 1.8, numpy.insert can do multiple insertions at once, but even before that the code is mostly a matter of input validation. Columns may be freely reallocated, and insert_row already does so, so users already cannot hang on to Column objects and expect to see changes.
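To make the "mostly input validation" point concrete, here is a rough sketch on plain numpy arrays rather than on Table itself. The helper name insert_rows_into_arrays and its dict-of-arrays interface are made up for illustration and are not astropy API. It assumes no broadcasting: each new block must have shape (K,) + column.shape[1:], and mixin columns are not handled.

```python
import numpy as np


def insert_rows_into_arrays(columns, index, new_values):
    """Hypothetical helper: return new arrays with K rows inserted before `index`.

    columns    : dict of name -> ndarray, all with the same shape[0] (M rows)
    new_values : dict of name -> array-like of shape (K,) + column.shape[1:]
    """
    # All existing columns must agree on the number of rows M.
    lengths = {len(col) for col in columns.values()}
    if len(lengths) != 1:
        raise ValueError("columns are empty or have inconsistent lengths")
    n_rows = lengths.pop()
    if not 0 <= index <= n_rows:
        raise IndexError("index out of range")
    if set(new_values) != set(columns):
        raise ValueError("new_values must supply exactly the existing columns")

    # Validate every column before touching any of them.
    prepared = {}
    for name, col in columns.items():
        vals = np.asarray(new_values[name], dtype=col.dtype)
        if vals.ndim != col.ndim or vals.shape[1:] != col.shape[1:]:
            raise ValueError(f"bad shape for column {name!r}: {vals.shape}")
        prepared[name] = vals
    counts = {len(v) for v in prepared.values()}
    if len(counts) != 1:
        raise ValueError("new columns have inconsistent lengths")
    k = counts.pop()

    # One np.insert call per column: K insertions at the same position,
    # so each column is reallocated once instead of K times.
    where = np.full(k, index)
    return {name: np.insert(col, where, prepared[name], axis=0)
            for name, col in columns.items()}
```

The originals are left untouched and new arrays are returned, which matches the observation that columns get reallocated on insertion anyway.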
Currently if you want to make major changes to a table without creating a new table object, you're probably best removing all its columns and then adding a new set of columns at the new, longer length.