Column header naming #10

Open
peterjc opened this issue Feb 23, 2018 · 2 comments

Comments

@peterjc
Collaborator

peterjc commented Feb 23, 2018

I was looking at updating the sample output in the Galaxy wrapper help text, and realised that the v0.0.3 changes went beyond adding new genus columns:

abaizan/kodoja_galaxy@0597e33

Old column headings:

  • Species
  • Tax_ID
  • kraken
  • kaiju
  • combined

New column names:

  • Species
  • Species TaxID
  • Species sequences
  • Species sequences (stringent)
  • Genus
  • Genus sequences
  • Genus sequences (stringent)

The new names are much longer, which can be a usability issue for viewing the output as a table (in Excel or in Galaxy), and yet to me are not any clearer.

For the Galaxy wrapper, we would be able to explain the columns in the tool's help text. For other Kodoja users this can go in the tool's command line help, the README file or other documentation.

Previously you had species level read counts from Kraken, Kaiju, and their intersection. Now there are just two counts, "Species sequences" and "Species sequences (stringent)", which is not as obvious.

What is not clear from the small test cases is what should be present in the genus columns, and whether this information is repeated when, for example, multiple species from the same genus are identified. I wonder if two tables would be clearer: one for species level reporting and one for genus level reporting?

Also could/should the genus TaxID be included?

@peterjc
Collaborator Author

peterjc commented Feb 23, 2018

Possible new help text (I have not yet checked the actual implementation of the counts vs stringent counts):

Updated after reading the code:

abaizan/kodoja_galaxy@8e4e133

@peterjc
Collaborator Author

peterjc commented Apr 6, 2018

After discussion, the Genus sequences and Genus sequences (stringent) columns ought to be zero in the test data output (rather than blank). These are read counts assigned to the genus but not to ANY species.

Thus if we have 300 reads matching Ipomovirus, of which 45 matched Cassava brown streak virus, and 30 matched Ugandan cassava brown streak virus, both rows of the table would report genus numbers of 300 - 45 - 30 = 225 reads. (I wonder if putting the genus level total might be simpler to understand, here 300?)
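
To make the arithmetic above concrete, here is a minimal Python sketch of the "genus but not any species" count. It is only an illustration: the function name `genus_only_count` and the row layout are hypothetical and not taken from the Kodoja code, and it assumes the 300 genus-level reads include the reads that were also assigned to a species.

```python
def genus_only_count(genus_total, species_counts):
    """Reads assigned to the genus but not to ANY of its species.

    Hypothetical helper for illustration; not Kodoja's implementation.
    """
    return genus_total - sum(species_counts.values())


# Numbers from the example above.
genus_total = 300  # reads matching Ipomovirus
species_counts = {
    "Cassava brown streak virus": 45,
    "Ugandan cassava brown streak virus": 30,
}

genus_only = genus_only_count(genus_total, species_counts)

# Every species row from the same genus repeats the same genus-level value.
rows = [
    {
        "Species": name,
        "Species sequences": count,
        "Genus": "Ipomovirus",
        "Genus sequences": genus_only,
    }
    for name, count in species_counts.items()
]

for row in rows:
    print(row)
```

Under this definition both rows carry 225 in the genus column. Reporting the genus total instead would only mean replacing `genus_only` with `genus_total` (300) when building the rows.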
