Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TimeSeries slicing breaks remove_row() #10732

Open
dhomeier opened this issue Sep 15, 2020 · 3 comments
Open

TimeSeries slicing breaks remove_row() #10732

dhomeier opened this issue Sep 15, 2020 · 3 comments

Comments

@dhomeier
Copy link
Contributor

dhomeier commented Sep 15, 2020

Description

A (row) slice of a TimeSeries returns a TimeSeries object with broken remove_row() property: the function can be called just once, on the next call either on the same or a different row in the table it fails with a ValueError.

Expected behavior

>>> from astropy.time import Time; from astropy.timeseries import TimeSeries
>>> ts = TimeSeries(time=Time(np.arange(1500, 1510), format='mjd'))
>>> ts.remove_row(2)
>>> ts.remove_row(2)
>>> ts.remove_row(1)
>>> ts
<TimeSeries length=7>
 time 
object
------
1500.0
1504.0
1505.0
1506.0
1507.0
1508.0
1509.0
>>> ts = TimeSeries(time=Time(np.arange(1500, 1510), format='mjd'))
>>> ts_short = ts[3:8]
>>> ts_short.remove_row(2)
>>> ts_short
<TimeSeries length=4>
 time 
object
------
1503.0
1504.0
1506.0
1507.0
>>> ts_short.remove_row(2)

should remove again the (current) row 2, i.e. the mjd=1506.0 entry.

Actual behavior

>>> ts_short.remove_row(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/sw/lib/python3.8/site-packages/astropy/table/table.py", line 2227, in remove_row
    self.remove_rows(index)
  File "/sw/lib/python3.8/site-packages/astropy/table/table.py", line 2275, in remove_rows
    index.remove_rows(row_specifier)
  File "/sw/lib/python3.8/site-packages/astropy/table/index.py", line 554, in remove_rows
    self.copy().remove_rows(row_specifier)
  File "/sw/lib/python3.8/site-packages/astropy/table/index.py", line 228, in remove_rows
    self.remove_row(row, reorder=False)
  File "/sw/lib/python3.8/site-packages/astropy/table/index.py", line 248, in remove_row
    raise ValueError(f"Could not remove row {row} from index")
ValueError: Could not remove row 2 from index
>>> ts_short.remove_row(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/sw/lib/python3.8/site-packages/astropy/table/table.py", line 2227, in remove_row
    self.remove_rows(index)
  File "/sw/lib/python3.8/site-packages/astropy/table/table.py", line 2275, in remove_rows
    index.remove_rows(row_specifier)
  File "/sw/lib/python3.8/site-packages/astropy/table/index.py", line 554, in remove_rows
    self.copy().remove_rows(row_specifier)
  File "/sw/lib/python3.8/site-packages/astropy/table/index.py", line 228, in remove_rows
    self.remove_row(row, reorder=False)
  File "/sw/lib/python3.8/site-packages/astropy/table/index.py", line 248, in remove_row
    raise ValueError(f"Could not remove row {row} from index")
ValueError: Could not remove row 1 from index

Steps to Reproduce

See above for Astropy 4.1rc2 and 4.2.dev591+g5fc518eb8.d20200909 (and per lightkurve/lightkurve#836 on 4.1rc1/Linux)

System Details

>>> import platform; print(platform.platform())
macOS-10.12.6-x86_64-i386-64bit
>>> import sys; print("Python", sys.version)
Python 3.8.5 (default, Aug 28 2020, 19:55:15) 
[Clang 9.0.0 (clang-900.0.39.2)]
>>> import numpy; print("Numpy", numpy.__version__)
Numpy 1.19.1
>>> import astropy; print("astropy", astropy.__version__)
astropy 4.1rc2
>>> import scipy; print("Scipy", scipy.__version__)
Scipy 1.5.2
>>> import matplotlib; print("Matplotlib", matplotlib.__version__)
Matplotlib 3.3.0
@i-am-b-soto
Copy link
Contributor

There appears to be an issue with the Index of the underlying Table not properly updating.
It appears we can do multiple removes, so long as the positions from the initial table and the spliced table matchup

For example:

        >>> ts = TimeSeries(time=Time(np.arange(1500, 1510), format='mjd'))
	>>> ts_short = ts[0:5]
	>>> ts_short.remove_row(4)
	>>> ts_short.remove_row(3)
	>>> ts_short.remove_row(2)
	>>> ts_short.remove_row(1)
        >>> ts_short
 time 
------
1500.0

@i-am-b-soto
Copy link
Contributor

i-am-b-soto commented Oct 23, 2020

Okay! I see two problems here.

  1. The columns of the Index get updated to reflect the columns of the Table after the first remove. But before that they keep the same column values as the parent data table. This causes an incorrect data row to be lost in the Index.

We could fix this by updating the columns of the SlicedIndex immidiately after its/(their) creation in Table._new_from_slice() to match the column values of the sliced Table.
i.e. we add these two lines at line 1199 in Table._new_from_slice()

            for index in newcol.info.indices:
                index.columns[index.col_position(col.info.name)] = col[slice_]
  1. In SortedArray.find_pos() there is this condition:
        if not self.unique:
        #    # consider the row value as well
            key = key + (data,)
            num_cols += 1

If the data isn't unique it will add the row number/ row id (confusingly labeled 'data' in this example) to the key, or the row to find. This is problematic because the row number + key being passed into this function will be different to the row number assigned to the key in the underlying SortedArray data. Since the data remains the same even after being 'sliced'. So this function will never find the 'key'. If you comment out those lines of code you can 'remove' without error. (Though the underlying data will be wrong without the fix mentioned above)

This I'm not quite sure what the fix for this one would be...

I'll play around with this more and post here if I find anything else

EDIT:
I think ultimately there's a design flaw here. SlicedIndex keeps the original indices' column values by design. To my understanding this makes taking a 'slice of a slice' possible (I'm looking at the test_col_get_slice test in table/tests/test_index.py). However when Table.remove_row() is called the indices' column values get replaced manually to the updated Table's columns (see Table.replace_cols()). It seems to be inconsistant behavior?

I'm hoping this explination all makes sense and might be useful.

@i-am-b-soto
Copy link
Contributor

Another work around:

SortedArray has the property cols. If we use this property to iterate over the columns, rather than the index.columns (so we iterate over the data columns and not the index's columns), we can get the desirable results. Namely, line 256 in index.remove_row:

if not self.data.remove(tuple([col[row] for col in self.data.cols]), row)
instead of
if not self.data.remove(tuple([col[row] for col in self.columns]), row)

Now of course, the problem with this is not all data engines have a 'cols' attribute

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants