metadata_to_dataframe order matters #855

guillaume-gricourt · 2021-02-10T12:04:49Z

Hi,
When you have this:
good.txt
it's OK.
When the order of the metadata is different:
bad.txt
You get:
ValueError: 2 columns passed, passed data had 6 columns
Maybe the parser could take the maximum number of values into account before parsing them?
biom-format v2.1.10
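The suggestion above amounts to sizing the taxonomy columns from the longest lineage instead of the first one. Below is a minimal plain-Python sketch of that idea. pad_lineages, the taxonomy_<i> column names and the example lineages are hypothetical; this is not biom-format's actual metadata_to_dataframe code.

```python
from typing import Dict, List, Tuple


def pad_lineages(
    lineages: Dict[str, List[str]], prefix: str = "taxonomy", fill: str = ""
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Pad every lineage to the depth of the longest one (hypothetical helper).

    Returns the column names and the padded rows, so the order of the input
    rows no longer decides how many columns are created.
    """
    # Size the columns from the deepest lineage, not from the first one seen.
    depth = max((len(v) for v in lineages.values()), default=0)
    columns = [f"{prefix}_{i}" for i in range(depth)]
    # Append `fill` to shorter lineages so every row has `depth` entries.
    padded = {
        obs_id: list(lineage) + [fill] * (depth - len(lineage))
        for obs_id, lineage in lineages.items()
    }
    return columns, padded


if __name__ == "__main__":
    # A short lineage listed before a deeper one, as in the bad.txt case.
    example = {
        "OTU1": ["k__Bacteria", "p__Firmicutes"],
        "OTU2": ["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria",
                 "o__Enterobacterales", "f__Enterobacteriaceae", "g__Escherichia"],
    }
    cols, rows = pad_lineages(example)
    print(cols)
    print(rows["OTU1"])
```

The padded rows could then go to pandas.DataFrame.from_dict(rows, orient='index', columns=cols) without a column-count mismatch, whatever the row order.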


wasade · 2021-02-10T16:19:07Z

Hi @guillaume-gricourt, that parser was designed to support classic OTU tables from QIIME1, where the lineages were guaranteed to be balanced, with placeholders for unidentified names. TSVs are not BIOM-Format; they are unstructured, which creates a wide range of edge cases.
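"Balanced" here means every lineage has the same number of ranks, with empty placeholders (Greengenes-style prefixes such as g__) for unassigned levels. A minimal sketch that rebalances a lineage that way before conversion follows. The seven-rank RANK_PREFIXES list and the balance_lineage helper are assumptions for illustration, not part of biom-format, and the sketch assumes the existing levels start at kingdom and are in rank order.

```python
from typing import List

# Assumed Greengenes-style rank prefixes, kingdom through species.
RANK_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]


def balance_lineage(lineage: str, sep: str = ";") -> List[str]:
    """Split a lineage string and append empty rank placeholders
    so that every lineage ends up with len(RANK_PREFIXES) levels."""
    levels = [level.strip() for level in lineage.split(sep) if level.strip()]
    if len(levels) > len(RANK_PREFIXES):
        raise ValueError(
            f"lineage has more than {len(RANK_PREFIXES)} levels: {lineage!r}"
        )
    # Fill the missing trailing ranks with bare prefixes such as 'g__'.
    return levels + RANK_PREFIXES[len(levels):]


if __name__ == "__main__":
    print("; ".join(balance_lineage("k__Bacteria; p__Firmicutes")))
```

Rewriting the taxonomy column this way before running biom convert would give every row the same depth, which is the shape the parser expects.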

As a workaround, you could parse the counts without metadata, parse the taxonomy separately, and add it in with biom.Table.add_metadata?
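The first half of that workaround, separating the counts from the taxonomy, can be sketched with the standard library alone. split_counts_and_taxonomy is a hypothetical helper. It assumes the TSV's first line is the header, that the metadata column is named taxonomy, and that lineage levels are separated by ';'.

```python
import csv
import io
from typing import Dict, List, Tuple


def split_counts_and_taxonomy(
    tsv_text: str, tax_col: str = "taxonomy", sep: str = ";"
) -> Tuple[str, Dict[str, Dict[str, List[str]]]]:
    """Return (counts-only TSV text, observation metadata dict)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)  # assumes the first line is the header
    tax_idx = header.index(tax_col)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Header without the taxonomy column.
    writer.writerow(header[:tax_idx] + header[tax_idx + 1:])
    metadata: Dict[str, Dict[str, List[str]]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        obs_id = row[0]
        # Split the lineage into levels, dropping empty or whitespace-only ones.
        lineage = [lvl.strip() for lvl in row[tax_idx].split(sep) if lvl.strip()]
        metadata[obs_id] = {"taxonomy": lineage}
        writer.writerow(row[:tax_idx] + row[tax_idx + 1:])
    return out.getvalue(), metadata
```

The counts text could be written to a file and converted with biom convert. The returned dict already has the {id: {'taxonomy': [...]}} shape that Table.add_metadata(md, axis='observation') takes.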

guillaume-gricourt · 2021-02-10T16:31:39Z

Yeah, it's a good workaround.
I create BIOM files from TSV files to load the data into the Phyloseq package, and this file is also my entry point for other analyses.
From now on, when I create this BIOM file, I'll check the order of the metadata in my TSV file.
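That check can be automated. The sketch below assumes, as the good.txt/bad.txt comparison and the "2 columns passed" error suggest, that the parser takes its column count from the first row. first_lineage_is_deepest is a hypothetical helper. It also assumes the first line is the header and the column is named taxonomy.

```python
import csv
import io


def first_lineage_is_deepest(tsv_text: str, tax_col: str = "taxonomy", sep: str = ";") -> bool:
    """Return True if the first data row carries the deepest lineage.

    Empty or whitespace-only levels are not counted toward the depth.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)  # assumes the first line is the header
    tax_idx = header.index(tax_col)
    depths = [
        sum(1 for level in row[tax_idx].split(sep) if level.strip())
        for row in reader
        if row  # skip blank lines
    ]
    # A table with no data rows trivially passes.
    return not depths or depths[0] == max(depths)
```

For example, first_lineage_is_deepest(open('bad.txt').read()) would be expected to return False for a file like the one attached above.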
Since this kind of BIOM file can be created, wouldn't it be a useful feature to implement?

wasade · 2021-02-10T16:55:06Z

I'd greatly welcome a pull request to resolve this feature request; otherwise, I'm not sure when I'll be able to get to it. A possible workaround is below.

```
$ biom convert -i bad.txt -o bad.biom --to-hdf5
$ python
Python 3.6.11 | packaged by conda-forge | (default, Aug  5 2020, 20:19:23)
[GCC Clang 10.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import biom
>>> # Read IDs as strings so they match the table's IDs (add_metadata skips unknown IDs).
>>> df = pd.read_csv('bad.txt', sep='\t', dtype={'#OTU ID': str})
>>> df.set_index('#OTU ID', inplace=True)
>>> t = biom.load_table('bad.biom')
>>> # Split each lineage on ';' and strip spaces: {id: {'taxonomy': [levels...]}}
>>> formatted = {k: {'taxonomy': [x.strip() for x in v.split(';')]} for k, v in df['taxonomy'].items()}
>>> t.add_metadata(formatted, axis='observation')
>>> with biom.util.biom_open('okay.biom', 'w') as fp:
...   t.to_hdf5(fp, 'converted')
...
```
