Loads tree in NewickFormat, uses BP to parse tree faster #42

ahdilmore · 2021-03-18T17:21:39Z

Uses BP to parse the tree, which improves Phylo-RPCA's memory usage. I also updated setup.py to account for the new dependencies.

uses BP to parse tree faster

gwarmstrong · 2021-03-18T17:33:17Z

gemelli/rpca.py

@@ -39,6 +42,11 @@ def phylogenetic_rpca(table: biom.Table,
       gemelli.
    """

+    # loads NewickFormat tree as bp.BP tree; loads & parses tree faster
+    phylogeny = get_bp(phylogeny)


get_bp is only about two lines, and it is pretty easy to get burned by importing from private modules in Python. I would recommend just duplicating that code here.
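
As a rough illustration of what inlining such a helper could look like, here is a minimal sketch. It assumes the bp package provides parse_newick (the parser referred to as iow.parse_newick elsewhere in this thread). The name read_newick_as_bp and the injectable parser argument are hypothetical, added only so the sketch can be exercised without bp installed; this is not the code that was merged.

```python
def read_newick_as_bp(path, parser=None):
    """Read a Newick file and hand its contents to a Newick parser.

    `parser` is any callable that accepts a Newick string. When it is
    omitted, this sketch assumes `bp.parse_newick` is importable.
    """
    if parser is None:
        # Assumption: the bp (improved-octo-waddle) package is installed.
        from bp import parse_newick as parser
    with open(str(path)) as treefile:
        # Strip surrounding whitespace such as a trailing newline.
        return parser(treefile.read().strip())
```

Keeping the parser injectable is purely a convenience for testing; in the plugin itself the two lines inside the function would be enough.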

ElDeveloper

Looks good. Thanks so much @ahdilmore. Since get_bp isn't a public function, I would suggest just copying the two lines directly into the code.

Also, NewickFormat can probably not be included here because the standalone CLI will not work if q2_types isn't installed. @cameronmartino or @gibsramen any suggestions on how to handle this? Short of writing a wrapper for this function that's decorated with the relevant types, etc?

gemelli/rpca.py

gibsramen · 2021-03-18T18:10:40Z

Not super familiar with how gemelli handles trees (or bp for that matter) but why does this necessitate changing phylogeny from a TreeNode object? Is the tree passed to phylogenetic_rpca changed when using bp?

gwarmstrong · 2021-03-18T18:13:44Z

> NewickFormat can probably not be included here because the standalone CLI will not work if q2_types isn't installed.

As a hack, you could probably use importlib.

>>> import importlib
>>> foo = importlib.import_module('random')
>>> foo.randrange(0, 10)
4
>>> bar = getattr(foo, 'randrange')
>>> bar
<bound method Random.randrange of <random.Random object at 0x000001AE50C48F20>>
>>> from random import randrange
>>> bar == randrange
True

So something like

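try:
    # NewickFormat is defined in the q2_types.tree submodule
    q2_tree = importlib.import_module('q2_types.tree')
    NewickFormat = getattr(q2_tree, 'NewickFormat')
except ImportError:
    # q2_types is not installed (standalone use)
    NewickFormat = str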

def phylo_rpca(..., tree: NewickFormat, ...):
    ...

Could work. Though I would want to check integration with qiime2.

Alternatively, it is pretty easy to include only qiime2 and q2-types in a conda install by adding -c qiime2 without doing a full qiime2 installation, so that NewickFormat could be used for the standalone without a full Qiime2 installation.
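
The importlib fallback described above can be wrapped in a small reusable helper. The sketch below is only an illustration: optional_attr is a hypothetical name, and the assumption that NewickFormat is exposed by q2_types.tree should be checked against the installed q2-types version, as should the behavior under QIIME 2 when the annotation falls back to str.

```python
import importlib


def optional_attr(module_name, attr_name, default):
    """Return `module.attr` if it can be imported, otherwise `default`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The optional dependency is not installed.
        return default
    # The module exists but may not define the attribute.
    return getattr(module, attr_name, default)


# Assumption: NewickFormat lives in q2_types.tree; fall back to str.
NewickFormat = optional_attr("q2_types.tree", "NewickFormat", str)
```

Catching only ImportError keeps unrelated failures visible, which a bare except would hide.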

gwarmstrong · 2021-03-18T18:15:47Z

> Not super familiar with how gemelli handles trees (or bp for that matter) but why does this necessitate changing phylogeny from a TreeNode object? Is the tree passed to phylogenetic_rpca changed when using bp?

iow.parse_newick is orders of magnitude faster than the skbio TreeNode parser. See https://github.com/wasade/improved-octo-waddle/blob/master/ipynb/performance%20comparison.ipynb

gibsramen · 2021-03-18T18:30:41Z

Sure but in that case why not specify that phylogeny should be of type bp.BP?

Aside from that

> Alternatively, it is pretty easy to include only qiime2 and q2-types in a conda install by adding -c qiime2 without doing a full qiime2 installation, so that NewickFormat could be used for the standalone without a full Qiime2 installation.

I think this is a decent solution.

gwarmstrong · 2021-03-18T18:43:43Z

> Sure but in that case why not specify that phylogeny should be of type bp.BP?

Good point. In that case I think a transformer would need to be defined for the q2 plugin.

cameronmartino · 2021-04-01T21:36:35Z

Thanks for the contribution @ahdilmore, this tree import is way faster. Also, good points from @gibsramen, @ElDeveloper, and @gwarmstrong.

I made a few changes:

I moved the tree import to preprocessing.py as a function bp_read_phylogeny. That way we can use it in multiple commands (rclr, ctf, and rpca).
I added this import to the phylo ctf and rclr commands.
The QIIME2 functions now all work fine with the NewickFormat 👍

It looks like everything is now passing. If someone wants to take a second look at everything, I think we should be good to merge.

ElDeveloper

Looks great, thanks @ahdilmore and @cameronmartino! Just one suggested change because the comment seems out of place.

ElDeveloper · 2021-04-03T03:47:00Z

gemelli/preprocessing.py

+        # The file will still be closed even though we return from within the
+        # with block: see https://stackoverflow.com/a/9885287/10730311.


This comment seems a bit outdated.

Suggested change (remove these lines):

# The file will still be closed even though we return from within the
# with block: see https://stackoverflow.com/a/9885287/10730311.

cameronmartino · 2021-04-03

Good catch @ElDeveloper. That comment is now removed. Thanks!

ElDeveloper · 2021-04-05T17:24:04Z

Thanks @cameronmartino!
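
For context, the behavior that the removed comment described (a file opened in a with block is closed even when the function returns from inside the block) can be checked with a small standard-library sketch. The function name first_line_and_handle and the temporary file are hypothetical, used only for this demonstration.

```python
import os
import tempfile


def first_line_and_handle(path):
    """Return the first line and the file object from inside `with`."""
    with open(path) as fh:
        return fh.readline(), fh


# Write a small Newick string to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".nwk", delete=False) as tmp:
    tmp.write("((a,b),c);\n")

line, handle = first_line_and_handle(tmp.name)
# The context manager's exit ran as part of the return.
closed_after_return = handle.closed
os.remove(tmp.name)
```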

ahdilmore and others added 2 commits March 18, 2021 13:09

uses BP to parse tree faster

e7d5708

Merge pull request #1 from ahdilmore/add-newick-format

d6213e5

uses BP to parse tree faster

gwarmstrong reviewed Mar 18, 2021

ElDeveloper requested changes Mar 18, 2021

Moved get_bp code here

12fded9

cameronmartino added 5 commits April 1, 2021 13:24

add bp import and work into phylo rpca command

c490a0a

fix bp input for CTF

dc4d400

extend bp import to standalone text

b7d114f

fix imports in ctf/rclr

68399e7

fix flake8

67aeab8

cameronmartino requested a review from ElDeveloper April 2, 2021 00:15

ElDeveloper approved these changes Apr 3, 2021

remove out of place comment

8b82c67

cameronmartino merged commit 9dfa1d1 into biocore:phylo-rclr Apr 4, 2021
