problem with to_hdf5() #689

amnona · 2016-01-28T07:24:22Z

When trying to save a biom table with observation metadata, I get the following error:

/Users/amnon/Python/git/heatsequer/heatsequer/experiment/io.py in savetobiom(expdat, filename, format)
    581         if format=='hdf5':
    582                 with biom.util.biom_open(filename, 'w') as f:
--> 583                         tab.to_hdf5(f, "heatsequer")
    584         elif format=='json':
    585                 with open(filename,'w') as f:

/Users/amnon/anaconda/lib/python2.7/site-packages/biom/table.pyc in to_hdf5(self, h5grp, generated_by, compress, format_fs)
   3533                   self.ids(axis='observation'),
   3534                   self.metadata(axis='observation'),
-> 3535                   self.group_metadata(axis='observation'), 'csr', compression)
   3536         axis_dump(h5grp.create_group('sample'), self.ids(),
   3537                   self.metadata(), self.group_metadata(), 'csc', compression)

/Users/amnon/anaconda/lib/python2.7/site-packages/biom/table.pyc in axis_dump(grp, ids, md, group_md, order, compression)
   3505                     # Create the dataset for the current category,
   3506                     # putting values in id order
-> 3507                     formatter[category](grp, category, md, compression)
   3508
   3509             # Create the group for the group metadata

/Users/amnon/anaconda/lib/python2.7/site-packages/biom/table.pyc in vlen_list_of_str_formatter(grp, header, md, compression)
    272             continue
    273         value = np.asarray(m[header])
--> 274         data[i, :len(value)] = value
    275     # Change the None entries on data to empty strings ""
    276     data = np.where(data == np.array(None), "", data)

TypeError: len() of unsized object

This problem does not happen without the metadata added to the biom table.
The metadata is added using:
table.add_metadata(taxdict,axis='observation')
where taxdict is of the form:
taxdict[OBSID]={'taxonomy': 'unknown'}

Note that to_json() works fine with this table, but I later get the same error if I try to convert the resulting JSON file to HDF5.
The JSON file is attached:
test.txt

wasade · 2016-01-29T17:58:41Z

The expectation is that taxonomy is a list of str. The JSON format is much more flexible on metadata, and this is a known issue. Work on refactoring the HDF5 formatters and parsers is deferred until the Table migrates to skbio.

09:57:08 (daniel@sandbar):~/Downloads> t = load_table('test.txt')

09:57:21 (daniel@sandbar):~/Downloads> md = {i: {'taxonomy': [d['taxonomy']]} for i, d in zip(t.ids(axis='observation'), t.metadata(axis='observation'))}

09:57:31 (daniel@sandbar):~/Downloads> t.add_metadata(md, axis='observation')

09:57:39 (daniel@sandbar):~/Downloads> f = h5py.File('baz.txt', 'w')

09:57:46 (daniel@sandbar):~/Downloads> t.to_hdf5(f, 'asd')

09:57:52 (daniel@sandbar):~/Downloads> f.close()
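
The same workaround can be sketched as a small reusable helper, applied to the metadata dict before it is passed to add_metadata(). This is only a sketch under a few assumptions: the metadata is a plain dict of the form shown earlier in this thread (observation ID mapped to a dict of categories), the code runs on Python 3, and `as_taxonomy_lists` is a hypothetical name, not part of biom.

```python
def as_taxonomy_lists(metadata, key="taxonomy"):
    """Return a copy of `metadata` in which every `key` value is a list of str.

    `metadata` maps observation ID -> dict of metadata categories, the same
    shape passed to add_metadata() in this thread. Hypothetical helper,
    not part of biom.
    """
    fixed = {}
    for obs_id, categories in metadata.items():
        categories = dict(categories)  # shallow copy; the input is not modified
        value = categories.get(key)
        if value is None:
            pass  # category missing or None: leave it as-is
        elif isinstance(value, str):
            # A bare string such as 'unknown' becomes a one-element list.
            categories[key] = [value]
        else:
            # Any other iterable is converted to a list of str.
            categories[key] = [str(v) for v in value]
        fixed[obs_id] = categories
    return fixed


taxdict = {
    "OTU1": {"taxonomy": "unknown"},
    "OTU2": {"taxonomy": ["k__Bacteria", "p__Firmicutes"]},
}
fixed = as_taxonomy_lists(taxdict)
```

The result would then be passed along as in the session above, i.e. t.add_metadata(fixed, axis='observation') followed by t.to_hdf5(...).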

amnona · 2016-01-29T18:03:24Z

Cool. That explains it :)
Maybe worth updating the doc for the add_metadata() function to state that?

Thanks!
Amnon


wasade · 2016-01-29T18:07:07Z

Sure, are you able to issue a PR?


wasade closed this as completed Jan 29, 2016
