Adding a row to an astropy Table is slow #7629

Open
taldcroft opened this issue Jul 9, 2018 · 2 comments

Comments

@taldcroft

From a response in the astropy performance survey.

@taldcroft

A quick line-profiler run shows that about 2/3 of the time is spent inserting values into the respective columns (the `col.insert` call). So realistically one might get a factor-of-two speed-up in row insertion if the column insert could be made faster.

Line #      Hits         Time  Per Hit   % Time  Line Contents
  2576         1          9.0      9.0      0.3          columns = self.TableColumns()
  2577         1          2.0      2.0      0.1          try:
  2578                                                       # Insert val at index for each column
  2579        11         30.0      2.7      0.9              for name, col, val, mask_ in zip(colnames, self.columns.values(), vals, mask):
  2580                                                           # If the new row caused a change in self.ColumnClass then
  2581                                                           # Column-based classes need to be converted first.  This is
  2582                                                           # typical for adding a row with mask values to an unmasked table.
  2583        10         34.0      3.4      1.0                  if isinstance(col, Column) and not isinstance(col, self.ColumnClass):
  2584                                                               col = self.ColumnClass(col, copy=False)
  2585                                           
  2586        10       2290.0    229.0     66.2                  newcol = col.insert(index, val, axis=0)
  2587        10         23.0      2.3      0.7                  if not isinstance(newcol, BaseColumn):
  2588                                                               newcol.info.name = name
  2589                                                               if self.masked:
  2590                                                                   newcol.mask = FalseArray(newcol.shape)
  2591                                           
  2592        10         20.0      2.0      0.6                  if len(newcol) != N + 1:
  2593                                                               raise ValueError('Incorrect length for column {0} after inserting {1}'
  2594                                                                                ' (expected {2}, got {3})'
  2595                                                                                .format(name, val, len(newcol), N + 1))
  2596        10        472.0     47.2     13.6                  newcol.info.parent_table = self
  2597                                           
  2598                                                           # Set mask if needed
  2599        10         28.0      2.8      0.8                  if self.masked:
  2600                                                               newcol.mask[index] = mask_
  2601                                           
  2602        10         46.0      4.6      1.3                  columns[name] = newcol
  2603                                           
  2604                                                       # insert row in indices
  2605         1        144.0    144.0      4.2              for table_index in self.indices:
  2606                                                           table_index.insert_row(index, vals, self.columns.values())
  2607                                           
  2608                                                   except Exception as err:
  2609                                                       raise ValueError("Unable to insert row because of exception in column '{0}':\n{1}"
  2610                                                                        .format(name, err))
  2611                                                   else:
  2612         1        307.0    307.0      8.9              self._replace_cols(columns)
  2613                                           
  2614                                                       # Revert groups to default (ungrouped) state
  2615         1          2.0      2.0      0.1              if hasattr(self, '_groups'):
  2616                                                           del self._groups

@taldcroft

One thing to note is that for a table with 25 columns, adding a row to an astropy Table is about 3 times faster than adding one to a pandas DataFrame.
