Fail to index table with iloc[] after dump/load table into pickle file #11332

Open
jorgemarpa opened this issue Feb 17, 2021 · 12 comments

@jorgemarpa

jorgemarpa commented Feb 17, 2021

EDIT: Temporary Workaround

See #11332 (comment)

Description

I am saving a Table object to a pickle file and loading it back, but after loading, the table cannot be indexed using iloc. I found that the problem is that the loaded table no longer has its Table.primary_key attribute set (it is None).

Steps to Reproduce

from astropy.table import Table
import pickle

t = Table([(1, 2, 3, 4), (10, 1, 9, 9)], names=('a', 'b'), dtype=['i8', 'i8'])
t.add_index('a')
print(t.iloc[2])   #--> [3, 9]

with open("t_test.pkl" , "wb") as f:
    pickle.dump(t, f, protocol=0)
    
t_ = pickle.load(open("t_test.pkl", "rb"))

print(t_.iloc[2])

leads to the following error:

-----------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-173-7e3638d00d00> in <module>
     11 t_ = pickle.load(open("t_test.pkl", "rb"))
     12 
---> 13 print(t_.iloc[2])

~/.pyenv/versions/adap/lib/python3.8/site-packages/astropy/table/index.py in __getitem__(self, item)
    953         else:
    954             key = self.table.primary_key
--> 955         index = self.indices[key]
    956         rows = index.sorted_data()[item]
    957         table_slice = self.table[rows]

~/.pyenv/versions/adap/lib/python3.8/site-packages/astropy/table/index.py in __getitem__(self, item)
    809             raise IndexError(f"No index found for {item}")
    810 
--> 811         return super().__getitem__(item)
    812 
    813 

TypeError: list indices must be integers or slices, not NoneType

and here is the result of accessing the primary_key attribute of the original and post-pickle tables:

t.primary_key, t_.primary_key
# (('a',), None)

System Details

macOS-10.15.7-x86_64-i386-64bit
Python 3.8.6 (default, Jan 5 2021, 15:15:33)
[Clang 12.0.0 (clang-1200.0.32.28)]
Numpy 1.19.5
astropy 4.2
Scipy 1.6.0
Matplotlib 3.3.3

@pllim pllim added the table label Feb 17, 2021
@pllim
Member

pllim commented Feb 17, 2021

Thanks for reporting this. I also see the same behavior with 4.3.dev. Adding t_.add_index('a') doesn't help either.

@pllim pllim added the Bug label Feb 17, 2021
@taldcroft
Member

@jorgemarpa - with apologies, this is a known limitation of astropy Table indexing. The indexes are not saved in any of the supported formats such as pickle, FITS, HDF5, ECSV, etc. See #6925.

In my test right now it is possible to regenerate the index after pickling with t_.add_index('a'). I don't understand why this did not work for @pllim.

@pllim
Member

pllim commented Feb 18, 2021

Huh... It didn't work for me but I can try again if you want. I did notice after I did that 3 times, I see multiple entries of the same stuff under t_.indices. 🤷

But if it works for everyone else, then maybe it is just me.

@taldcroft
Member

@pllim - I must have made a mistake in my test because I am reproducing the failure you reported. The oddity is that it only fails if t has had an index created. Based on my recollection of how pickling works for tables this makes no sense, but obviously my recollection is not serving me well. 😄

Here is my minimum failing example:

In [20]: from astropy.table import Table
In [21]: import pickle
In [22]: t = Table([[1,2], [3,4]])
In [23]: t.add_index('col0')
In [24]: t2 = pickle.loads(pickle.dumps(t))
In [25]: t2.loc[1]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: list indices must be integers or slices, not NoneType

@pllim
Member

pllim commented Feb 18, 2021

@taldcroft , thanks for double-checking! Yes, that is consistent with the errors reported here.

@taldcroft
Member

OK, just looked at the code. The indices which live on the columns are getting pickled, but the index object on the table is getting lost. This may be fixable...
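
To illustrate how table-level state can be lost while column-level state survives, here is a minimal toy model in plain Python. It is not astropy's code: `ToyTable` and `ToyColumn` are hypothetical stand-ins, and the assumption that the table is rebuilt from its columns on unpickling is made only for the sake of the sketch.

```python
import pickle


class ToyColumn:
    """Hypothetical stand-in for a column that carries its own index list."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)
        self.indices = []  # column-level bookkeeping, pickled with the column


class ToyTable:
    """Hypothetical stand-in for a table that is rebuilt from its columns."""

    def __init__(self, columns):
        self.columns = {col.name: col for col in columns}
        self.primary_key = None  # table-level attribute

    def add_index(self, name):
        self.columns[name].indices.append(("index", name))
        if self.primary_key is None:
            self.primary_key = (name,)

    def __getstate__(self):
        # Only the columns are serialized; primary_key is not part of the state.
        return {"columns": list(self.columns.values())}

    def __setstate__(self, state):
        # Re-running __init__ resets primary_key to None.
        self.__init__(state["columns"])


t = ToyTable([ToyColumn("a", [1, 2, 3, 4]), ToyColumn("b", [10, 1, 9, 9])])
t.add_index("a")
t2 = pickle.loads(pickle.dumps(t))

print(t.primary_key, t2.primary_key)  # ('a',) None
print(t2.columns["a"].indices)        # [('index', 'a')]
```

In this toy model, the fix would be to include `primary_key` in the pickled state and restore it in `__setstate__`. Whether the same change is enough for astropy is not established in this thread.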

@taldcroft
Member

A workaround for now is to set the table primary_key attribute to a tuple with the column names comprising the index, e.g.

t_.primary_key = ('a',)

After doing this the t_.iloc[1] statement should succeed. The root problem is that the primary key is not pickled.
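
The workaround above could be wrapped in a pair of small helpers so that the key travels with the pickle. This is only a sketch: `dumps_with_primary_key` and `loads_with_primary_key` are hypothetical names, not part of astropy, and the demo uses a `SimpleNamespace` stand-in rather than a real Table.

```python
import pickle
from types import SimpleNamespace


def dumps_with_primary_key(table):
    """Pickle `table` together with its primary_key (hypothetical helper)."""
    return pickle.dumps((table, getattr(table, "primary_key", None)))


def loads_with_primary_key(data):
    """Unpickle and put back primary_key if it came back as None."""
    table, primary_key = pickle.loads(data)
    if primary_key is not None and getattr(table, "primary_key", None) is None:
        table.primary_key = primary_key
    return table


# Stand-in for a table whose primary_key was lost during unpickling.
lost = SimpleNamespace(primary_key=None)
blob = pickle.dumps((lost, ("a",)))
restored = loads_with_primary_key(blob)
print(restored.primary_key)  # ('a',)
```

With a real Table the call would presumably be `t_ = loads_with_primary_key(dumps_with_primary_key(t))`, which amounts to setting `t_.primary_key = ('a',)` by hand as described above.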

@jorgemarpa
Author

Yes @taldcroft, I also found that the primary_key attribute is lost after pickling, so for now I am using that workaround to keep my code from crashing.

@embray
Member

embray commented Feb 22, 2021

I know this issue is about pickles, but I wonder about other formats (HDF5, FITS). Should table reader/writer interfaces get (optional) methods for reading/writing indices? Perhaps there should also be a warning when dumping tables with indices if they cannot be saved.
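
The warning suggested above might look roughly like the following check, run before a table is written. `warn_if_indexed` is a hypothetical helper, not an existing astropy function. It relies only on the object exposing an `indices` attribute (the thread mentions `t_.indices`), and the demo uses `SimpleNamespace` stand-ins instead of real tables.

```python
import warnings
from types import SimpleNamespace


def warn_if_indexed(table):
    """Warn if `table` has indices that the writer will not save.

    Hypothetical helper; returns True when a warning was issued.
    """
    if getattr(table, "indices", None):
        warnings.warn(
            "Table has indices that will not be saved; "
            "re-create them after loading.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# Demo with stand-in objects: one with an index, one without.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned = warn_if_indexed(SimpleNamespace(indices=["index on a"]))
    quiet = warn_if_indexed(SimpleNamespace(indices=[]))

print(warned, quiet, len(caught))  # True False 1
```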

@taldcroft
Copy link
Member

taldcroft commented Feb 23, 2021

@embray - I've been thinking about this recently and it should not be too difficult, at least for the default SortedArray index type. This can happen after #11155 which adds some necessary infrastructure. When that is in place then the indices can be treated as column attributes that get flattened out to additional columns.

The sorted containers tree is a bit less obvious for serialization and I'm not immediately sure how to handle that.
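
For the sorted-array case, one possible reading of "flattened out to additional columns" is to store the sort order of the indexed column as an extra column, so that it can be read back without re-sorting. The sketch below is plain Python under that assumption. The name `build_sort_order` and the `b__sort_order` column name are made up for illustration and do not reflect how astropy stores or would serialize its indices.

```python
def build_sort_order(values):
    """Return row numbers ordered by value (a stable argsort)."""
    return sorted(range(len(values)), key=values.__getitem__)


b = [10, 1, 9, 9]
order = build_sort_order(b)
print(order)  # [1, 2, 3, 0]

# The sort order is kept next to the data as one more column (assumed layout).
stored = {"b": b, "b__sort_order": order}

# After reading back, position k in sorted order maps straight to a row.
row = stored["b__sort_order"][0]
print(row, stored["b"][row])  # 1 1
```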

@toihr

toihr commented Mar 30, 2021

I'm not sure if this has something to do with it, but this problem also happens when you create a new TimeSeries out of an old one, like so:
ts2 = ts["time", "Test"]
ts2.iloc[:]
This leads to the same error. I have figured out that, for some reason, doing this doesn't assign a new primary key, so things work if you set it by hand, but before that it fails.
I hope this helps. If it's not helpful to this conversation then I apologize; it just seemed like a closely related problem.
