Column header naming #10

peterjc · 2018-02-23T12:09:12Z

I was looking at updating the sample output in the Galaxy wrapper help text, and realised that the v0.0.3 changes went beyond adding new genus columns:

abaizan/kodoja_galaxy@0597e33

Old column headings:

Species
Tax_ID
kraken
kaiju
combined

New column names:

Species
Species TaxID
Species sequences
Species sequences (stringent)
Genus
Genus sequences
Genus sequences (stringent)

The new names are much longer, which can be a usability issue when viewing the output as a table (in Excel or in Galaxy), and yet to me they are not any clearer.

For the Galaxy wrapper, we would be able to explain the columns in the tool's help text. For other Kodoja users this can go in the tool's command line help, the README file or other documentation.

Previously you had species-level read counts from Kraken, Kaiju, and their intersection. Now there are just two counts, "Species sequences" and "Species sequences (stringent)", which is not as obvious.

What is not clear from the small test cases is what should be present in the genus columns (and whether this information is repeated if, for example, multiple species from that genus are identified). I wonder if two tables would be clearer, one for species-level reporting and one for genus-level reporting?

Also could/should the genus TaxID be included?


peterjc · 2018-02-23T12:51:41Z

Possible new help text (I have not yet checked the actual implementation of the counts vs stringent counts):

Updated after reading the code:

abaizan/kodoja_galaxy@8e4e133

peterjc · 2018-04-06T10:40:46Z

After discussion, the Genus sequences and Genus sequences (stringent) columns ought to be zero in the test data output (rather than blank). These are read counts assigned to the genus but not to ANY species.

Thus if we have 300 reads matching Ipomovirus, of which 45 matched Cassava brown streak virus, and 30 matched Ugandan cassava brown streak virus, both rows of the table would report genus numbers of 300 - 45 - 30 = 225 reads. (I wonder if putting the genus level total might be simpler to understand, here 300?)
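To make the arithmetic above concrete, here is a minimal Python sketch of the "genus but not any species" count, using the numbers from the Ipomovirus example. The function name `genus_only_reads` and the dictionary layout are hypothetical and chosen for illustration only; this is not Kodoja's actual implementation.

```python
def genus_only_reads(genus_total, species_counts):
    """Return reads assigned to the genus but not to ANY of its species.

    Hypothetical helper for illustration; not taken from the Kodoja code.
    """
    remainder = genus_total - sum(species_counts.values())
    if remainder < 0:
        raise ValueError("species counts exceed the genus total")
    return remainder


# Numbers from the example in this comment.
genus_total = 300  # all reads matching Ipomovirus
species_counts = {
    "Cassava brown streak virus": 45,
    "Ugandan cassava brown streak virus": 30,
}

genus_only = genus_only_reads(genus_total, species_counts)

# Every species row from the same genus repeats the same genus-level number.
rows = [
    {
        "Species": name,
        "Species sequences": count,
        "Genus": "Ipomovirus",
        "Genus sequences": genus_only,
    }
    for name, count in species_counts.items()
]

for row in rows:
    print("\t".join(str(value) for value in row.values()))
```

If every genus read is also assigned to a species, the helper returns 0, which matches the "zero rather than blank" convention discussed above. Reporting the genus-level total instead would simply mean writing `genus_total` into each row in place of `genus_only`.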

See discussion on #10 about this.

peterjc added the enhancement label Feb 23, 2018

peterjc added a commit that referenced this issue Sep 6, 2018

Zero not blank in cols 6 and 7 of virus_table.txt

2646bd5

See discussion on #10 about this.

peterjc mentioned this issue Aug 16, 2019

Species sequences and Genus sequences in virus_table.txt #35

Closed
