Balance summary #172

Merged: qiyunzhu merged 25 commits into biocore:master from mortonjt:balance-summary on Jun 14, 2017

Conversation

@mortonjt mortonjt commented May 1, 2017

Addresses #181

Still a work in progress - need to figure out how to fix the scaling in the rendering.

This provides better summaries for the individual balances. It lets users visualize how a single balance relates to the metadata and which microbes the balance is composed of.
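For context, a balance in the ilr sense is a scaled log ratio of the geometric means of two groups of features. The sketch below is only an illustration of that quantity, not the gneiss implementation; the function names and the sample proportions are hypothetical.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def balance(numerator, denominator):
    """Scaled log ratio of the geometric means of two feature groups."""
    r, s = len(numerator), len(denominator)
    scale = math.sqrt(r * s / (r + s))
    return scale * math.log(geometric_mean(numerator) / geometric_mean(denominator))

# Hypothetical proportions for a single sample.
b = balance([0.4, 0.4], [0.1, 0.1])
```

A positive value means the numerator group is more abundant than the denominator group in that sample; zero means the two geometric means are equal.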

Below is an example of how to run it, along with the resulting visualization.

qiime gneiss balance-taxonomy \
    --i-model 88soils_regression_model.qza \
    --o-visualization 88soils_summary

[Screenshot of the resulting visualization, taken 2017-05-27]

Help menu

tests-MacBook-Pro-4:tests mortonjt$ qiime gneiss balance-taxonomy
Usage: qiime gneiss balance-taxonomy [OPTIONS]

  Visualize the distribution of a single balance and summarize its numerator
  and denominator components.

Options:
  --i-balances PATH               Artifact: FeatureTable[Balance]  [required]
                                  The table of balances resulting from the ilr
                                  transform.
  --i-tree PATH                   Artifact: Phylogeny[Rooted]  [required]
                                  The
                                  tree used to calculate the balances.
  --i-taxonomy PATH               Artifact: FeatureData[Taxonomy]  [required]
                                  Taxonomy information for the OTUs.
  --p-balance-name TEXT           [required]
                                  Name of the balance to summarize.
  --p-taxa-level [family|genus|kingdom|class|order|species|phyla]
                                  [default: phyla]
                                  Level of taxonomy to
                                  summarize.
  --m-metadata-file PATH          Metadata mapping file  [optional]
  --m-metadata-category TEXT      Category from metadata mapping file
                                  [optional]
  --o-visualization PATH          Artifact: Visualization  [required if not
                                  passing --output-dir]
  --output-dir DIRECTORY          Output unspecified results to a directory
  --cmd-config PATH               Use config file for command options
  --verbose                       Display verbose output to stdout and/or
                                  stderr during execution of this action.
                                  [default: False]
  --help                          Show this message and exit.

@nbokulich do you have any thoughts on the user interface here?

@coveralls

Coverage Status

Coverage decreased (-0.06%) to 98.679% when pulling 3b9e00e on mortonjt:balance-summary into 4e76bcd on biocore:master.

@nbokulich

I'd recommend making taxa_level an int as levels may change depending on the taxonomy.

Also, don't place an upper limit, as not all taxonomies will necessarily have 6 or 7 tidy levels. (E.g., if taxa_level > len(taxon), fall back to the last available level, taxon[-1].)

Looks great!
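The suggestion above can be sketched roughly as follows. This is a minimal sketch, not the code in this PR: it assumes semicolon-delimited lineage strings and a 1-based integer level, and `taxon_at_level` is a hypothetical helper name.

```python
def taxon_at_level(taxonomy, level, sep=";"):
    """Return the lineage truncated to `level` ranks (1-based, assumed).

    If the lineage has fewer ranks than requested, the full lineage is
    returned instead of raising an error.
    """
    ranks = [rank.strip() for rank in taxonomy.split(sep)]
    level = max(1, min(level, len(ranks)))  # clamp to the available depth
    return sep.join(ranks[:level])

shallow = taxon_at_level("k__Bacteria; p__Firmicutes", 7)
```

Clamping keeps features with short lineages in the summary rather than dropping them or failing on an out-of-range index.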

@coveralls

Coverage Status

Coverage decreased (-0.08%) to 98.704% when pulling f78b1a5 on mortonjt:balance-summary into 1baf580 on biocore:master.

@mortonjt
Collaborator Author

Finally this passed! @qiyunzhu would you mind taking a look at this?

@mortonjt
Collaborator Author

@nbokulich @ebolyen do you know how to force the ordering of options on the command line?
For instance here, it would be great if we could force the ordering ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species'].

@ebolyen
Member

ebolyen commented May 31, 2017

@mortonjt the underlying choices are stored as a set, so there's not a way to do that at this time.
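To illustrate the point about sets: a Python set has no defined iteration order, so the order has to be imposed from a separate canonical list. The sketch below shows this in plain Python; it does not change how the command line interface renders its options, and `RANK_ORDER` and `ordered_choices` are hypothetical names.

```python
# Canonical rank order; illustrative only, not part of the CLI framework.
RANK_ORDER = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

def ordered_choices(choices):
    """Sort an unordered collection of rank names into canonical order."""
    return sorted(choices, key=RANK_ORDER.index)

# A set, as described above: iteration order is not guaranteed.
choices = {"family", "genus", "kingdom", "class", "order", "species", "phylum"}
print(ordered_choices(choices))
```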

Collaborator

@qiyunzhu qiyunzhu left a comment

@mortonjt Great job! Only trivial comments.


def setUp(self):
    self.results = "results"
    os.mkdir(self.results)
Collaborator

Will it cause problems if a temporary directory isn't used?

Collaborator Author

Nope - because I tear it down right after creation. The main reason why I do this is because it makes debugging a heck of a lot easier.

It is making the assumption that the current directory is clean and doesn't have the folder results. This is a safe assumption, particularly given that these file directories are created on the fly.
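For comparison, the temporary-directory alternative raised in the question could look roughly like the sketch below. It is not the test code in this PR; the class and test names are hypothetical, and only standard library calls are used.

```python
import os
import shutil
import tempfile
import unittest

class TestWithScratchDir(unittest.TestCase):
    def setUp(self):
        # A unique directory per test, so a stale "results" folder cannot collide.
        self.results = tempfile.mkdtemp(prefix="results_")

    def tearDown(self):
        # Remove the directory and everything written into it.
        shutil.rmtree(self.results)

    def test_writes_index(self):
        path = os.path.join(self.results, "index.html")
        with open(path, "w") as index_f:
            index_f.write("<h1>Balance Taxonomy</h1>\n")
        self.assertTrue(os.path.exists(path))
```

The trade-off is the one described above: a fixed directory name is easier to inspect while debugging, whereas a generated name avoids relying on a clean working directory.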

index_f.write('<h1>Balance Taxonomy</h1>\n')
index_f.write('<img src="barplots.svg" alt="barplots">\n\n')
index_f.write(('<h3>Numerator taxa</h3>\n'
               '<a href="numerator.csv">\n'
Collaborator

Trivial. These two HTML lines can be merged into one.

Collaborator Author

bam done!

num_clade = st.children[NUMERATOR]
denom_clade = st.children[DENOMINATOR]
if num_clade.is_tip():
    num_ = pd.DataFrame(
Collaborator

This num_ variable is not declared before the if block?

Collaborator Author

That doesn't matter - since it is declared in both the if and the else.
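A small standalone illustration of that point, using a hypothetical function rather than the code under review: Python needs no declaration before the branch, as long as every path assigns the name before it is read.

```python
def describe(is_tip):
    # No declaration is needed before the branch: each path binds `label`.
    if is_tip:
        label = "single feature"
    else:
        label = "clade"
    return label
```

If one branch skipped the assignment, reading `label` afterwards would raise UnboundLocalError on that path, which is the case the reviewer's question guards against.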

@coveralls

Coverage Status

Coverage decreased (-0.08%) to 98.701% when pulling 7515c4e on mortonjt:balance-summary into 1baf580 on biocore:master.

Contributor

@antgonza antgonza left a comment

Code looks good, a couple of questions:

  • Should NUMERATOR and DENOMINATOR be parameters?
  • Agree with @nbokulich's comment, but I will add that depending on the target gene or dataset, the taxonomy (should it be feature-description?) might have more than 7 levels.

@mortonjt
Collaborator Author

mortonjt commented Jun 8, 2017

  1. The NUMERATOR and DENOMINATOR can't be parameters -- those quantities are fixed constants. Basically, this performs a mapping such that the right child maps to the numerator of the balance, and the left child maps to the denominator of the balance. These constants are crucial for making sure that the two aren't mixed up (it is surprisingly easy to mix up the numerator with the denominator).
  2. Most definitely - we would like to have this generalized to handle feature-metadata in general. However, there are multiple structural obstacles to making this happen, namely a fixed file format for accepting this sort of object and q2 compatibility. Right now, there doesn't exist a feature-metadata object in qiime2, but there is a FeatureData[Taxonomy] object. For now, we are only concerned with the FeatureData[Taxonomy] object, which only has taxonomy. I am not familiar with taxa strings with more than 7 levels, or what the semantics of that would look like. @nbokulich @antgonza do you have an example readily available?
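The child-to-role mapping described in point 1 can be sketched as follows. The constant values are an assumption (children stored as [left, right], so index 1 is the right child); the thread does not show the actual definitions, and `Node` is a tiny hypothetical stand-in rather than the tree class used in the code.

```python
# Assumed values: children are stored as [left, right].
NUMERATOR = 1    # right child
DENOMINATOR = 0  # left child

class Node:
    """Minimal stand-in for a bifurcating tree node (hypothetical)."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def is_tip(self):
        return not self.children

    def tips(self):
        """Names of all tips below this node, left to right."""
        if self.is_tip():
            return [self.name]
        return [tip for child in self.children for tip in child.tips()]

def split_balance(node):
    """Return (numerator tips, denominator tips) for an internal node."""
    return (node.children[NUMERATOR].tips(),
            node.children[DENOMINATOR].tips())

# Left child is the clade (a, b); right child is the tip c.
y0 = Node("y0", [Node("y1", [Node("a"), Node("b")]), Node("c")])
num, denom = split_balance(y0)
```

Indexing through named constants rather than bare 0 and 1 keeps the convention in one place, which is the safeguard against swapping the two sides.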

@nbokulich

An example of a taxonomy with > 7 levels is the raw SILVA taxonomy. It has the same format as greengenes but contains more levels, e.g.:

AB001038.1.1721	Eukaryota;Archaeplastida;Chloroplastida;Chlorophyta;Chlorophyceae;Chlamydomonadales;Polytoma;Chlamydomonas pulsatilla

But there are other taxonomies that contain fewer than 7 levels, e.g., RDP taxonomy I believe is 6 levels.

Another example (to consider non-taxonomic feature data) would be gene pathway/ontology data. E.g., picrust data reports KEGG pathways that are 3 levels deep.
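As a quick check of the depth of the SILVA example above, the line can be split on the tab and then on semicolons. This is a minimal sketch assuming that tab-delimited layout; `parse_taxonomy_line` is a hypothetical helper, not a QIIME 2 function.

```python
# The SILVA example from above: feature ID, a tab, then the lineage.
line = ("AB001038.1.1721\tEukaryota;Archaeplastida;Chloroplastida;Chlorophyta;"
        "Chlorophyceae;Chlamydomonadales;Polytoma;Chlamydomonas pulsatilla")

def parse_taxonomy_line(line):
    """Split one line into the feature ID and a list of ranks."""
    feature_id, lineage = line.rstrip("\n").split("\t", 1)
    return feature_id, [rank.strip() for rank in lineage.split(";")]

feature_id, ranks = parse_taxonomy_line(line)
depth = len(ranks)  # number of levels in this lineage
```

Here `depth` comes out as 8, one more than the seven ranks the original option list assumed.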

@mortonjt
Collaborator Author

Ok! I have made the taxa level integer valued, so that multiple levels can be accessed.

@coveralls

Coverage Status

Coverage decreased (-0.08%) to 98.7% when pulling d9d2640 on mortonjt:balance-summary into 1baf580 on biocore:master.

@qiyunzhu qiyunzhu merged commit a6d1c61 into biocore:master Jun 14, 2017