problem with to_hdf5() #689

Closed
amnona opened this issue Jan 28, 2016 · 3 comments

amnona commented Jan 28, 2016

When trying to save a BIOM table with observation metadata, I get the following error:

/Users/amnon/Python/git/heatsequer/heatsequer/experiment/io.py in savetobiom(expdat, filename, format)
    581         if format=='hdf5':
    582                 with biom.util.biom_open(filename, 'w') as f:
--> 583                         tab.to_hdf5(f, "heatsequer")
    584         elif format=='json':
    585                 with open(filename,'w') as f:

/Users/amnon/anaconda/lib/python2.7/site-packages/biom/table.pyc in to_hdf5(self, h5grp, generated_by, compress, format_fs)
   3533                   self.ids(axis='observation'),
   3534                   self.metadata(axis='observation'),
-> 3535                   self.group_metadata(axis='observation'), 'csr', compression)
   3536         axis_dump(h5grp.create_group('sample'), self.ids(),
   3537                   self.metadata(), self.group_metadata(), 'csc', compression)

/Users/amnon/anaconda/lib/python2.7/site-packages/biom/table.pyc in axis_dump(grp, ids, md, group_md, order, compression)
   3505                     # Create the dataset for the current category,
   3506                     # putting values in id order
-> 3507                     formatter[category](grp, category, md, compression)
   3508
   3509             # Create the group for the group metadata

/Users/amnon/anaconda/lib/python2.7/site-packages/biom/table.pyc in vlen_list_of_str_formatter(grp, header, md, compression)
    272             continue
    273         value = np.asarray(m[header])
--> 274         data[i, :len(value)] = value
    275     # Change the None entries on data to empty strings ""
    276     data = np.where(data == np.array(None), "", data)

TypeError: len() of unsized object

This problem does not happen when no metadata is added to the BIOM table.
The metadata is added using:
table.add_metadata(taxdict, axis='observation')
where taxdict is of the form:
taxdict[OBSID] = {'taxonomy': 'unknown'}

Note that to_json() works fine with this table, but I get the same error later if I try to convert the resulting JSON file to HDF5.
The JSON file is attached:
test.txt

wasade commented Jan 29, 2016

The expectation is that taxonomy is a list of str. The JSON format is much more flexible on metadata, and this is a known issue. Work on refactoring the HDF5 formatters and parsers is deferred until the Table migrates to skbio.
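
To illustrate why the traceback ends in `TypeError: len() of unsized object`, here is a minimal sketch that uses only NumPy and no biom calls (the variable names are illustrative). It mirrors the `np.asarray(m[header])` line from the traceback: a bare string becomes a zero-dimensional array, which has no length, while a one-element list becomes a one-dimensional array.

```python
import numpy as np

# A bare string, as in taxdict[OBSID] = {'taxonomy': 'unknown'}
bare = np.asarray('unknown')
# The same value wrapped in a list of str
wrapped = np.asarray(['unknown'])

print(bare.ndim)                   # number of dimensions of the bare-string array
print(wrapped.ndim, len(wrapped))  # the wrapped value is sized

error_message = None
try:
    len(bare)  # same failure mode as len(value) in the traceback
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```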

09:57:08 (daniel@sandbar):~/Downloads> t = load_table('test.txt')

09:57:21 (daniel@sandbar):~/Downloads> md = {i: {'taxonomy': [d['taxonomy']]} for i, d in zip(t.ids(axis='observation'), t.metadata(axis='observation'))}

09:57:31 (daniel@sandbar):~/Downloads> t.add_metadata(md, axis='observation')

09:57:39 (daniel@sandbar):~/Downloads> f = h5py.File('baz.txt', 'w')

09:57:46 (daniel@sandbar):~/Downloads> t.to_hdf5(f, 'asd')

09:57:52 (daniel@sandbar):~/Downloads> f.close()
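
The session above wraps each taxonomy string in a list before writing to HDF5. The same idea can be applied to the metadata dict before `add_metadata()` is called. The following is a minimal pure-Python sketch, not part of biom: the helper name `listify_taxonomy` and the sample observation IDs are hypothetical, and it assumes Python 3 `str` values.

```python
def listify_taxonomy(metadata, key='taxonomy'):
    """Return a copy of `metadata` where string values under `key`
    are wrapped in one-element lists (hypothetical helper)."""
    fixed = {}
    for obs_id, md in metadata.items():
        md = dict(md)  # shallow copy so the caller's dict is not modified
        if isinstance(md.get(key), str):
            md[key] = [md[key]]
        fixed[obs_id] = md
    return fixed


# Illustrative input: one bare string, one value that is already a list
taxdict = {
    'OTU1': {'taxonomy': 'unknown'},
    'OTU2': {'taxonomy': ['k__Bacteria', 'p__Firmicutes']},
}
fixed = listify_taxonomy(taxdict)
print(fixed)

# With biom installed, the result would be passed on as in the issue:
# table.add_metadata(fixed, axis='observation')
```

Values that are already lists pass through unchanged, so the helper can be run on mixed metadata.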

wasade closed this as completed Jan 29, 2016

amnona commented Jan 29, 2016

Cool. That explains it :)
Maybe worth updating the doc for the add_metadata() function to state that?

Thanks!
Amnon

wasade commented Jan 29, 2016

Sure, are you able to issue a PR?
