Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize 'compare' with joblib #666

Open
wants to merge 20 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@olgabot
Copy link
Contributor

commented Apr 8, 2019

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

ugh I have to remove those 10x barcode commits

@olgabot olgabot force-pushed the czbiohub:olgabot/parallelize-compare branch 2 times, most recently from 700d506 to 3618423 Apr 8, 2019

@codecov

This comment has been minimized.

Copy link

commented Apr 8, 2019

Codecov Report

Merging #666 into master will decrease coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #666      +/-   ##
==========================================
- Coverage   89.26%   89.18%   -0.08%     
==========================================
  Files          27       28       +1     
  Lines        4238     4264      +26     
  Branches       39       39              
==========================================
+ Hits         3783     3803      +20     
- Misses        455      461       +6
Impacted Files Coverage Δ
sourmash/commands.py 89.01% <100%> (-0.08%) ⬇️
sourmash/compare.py 100% <100%> (ø)
sourmash/signature_json.py 95.62% <0%> (-1.25%) ⬇️
sourmash/lca/lca_utils.py 95.23% <0%> (-0.8%) ⬇️
sourmash/lca/command_index.py 90.3% <0%> (-0.45%) ⬇️
sourmash/sbt.py 85.32% <0%> (-0.4%) ⬇️
sourmash/sbt_storage.py 86.36% <0%> (+0.97%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab0279a...4e02e03. Read the comment docs.

@olgabot olgabot force-pushed the czbiohub:olgabot/parallelize-compare branch from f587f3c to 73d3a48 Apr 8, 2019

@olgabot

This comment has been minimized.

Copy link
Contributor Author

commented Apr 8, 2019

Addresses: #638

@olgabot

This comment has been minimized.

Copy link
Contributor Author

commented Apr 17, 2019

Hello! This is ready from my end. The travis build is failing on the wheel building:

Screen Shot 2019-04-16 at 3 05 01 PM

Is there anything else I should do to prepare the PR for submission?

@luizirber

This comment has been minimized.

Copy link
Member

commented Apr 23, 2019

Travis build working again (just restarted the job, it was a network error during a dependency installation).

luizirber and others added some commits Apr 23, 2019

@olgabot olgabot force-pushed the czbiohub:olgabot/parallelize-compare branch from 099e6c2 to 130e5ab Apr 26, 2019

names = [x.name() for x in siglist]

if ignore_abundance:
true_similarities = np.array(

This comment has been minimized.

Copy link
@olgabot

olgabot Apr 26, 2019

Author Contributor

Hello, I'm confusing myself here. I think I'm missing something because I'm having trouble getting the tests working and it seems that the similarity matrices are different depending on which platform they're made from. I added a conditional with hardcoded values of the true similarities depending on whether ignore_abundance is True or not, where the hardcoded values are generated from running them locally on my mac laptop. However, when I run on a remote ubuntu machine to match what Travis is doing, the arrays aren't the same anymore.

Here is the similarity calculated on MacOS:

(Pdb) true_similarities
array([[1.   , 0.   , 0.074, 0.   , 0.   , 0.078, 0.086],
       [0.   , 1.   , 0.   , 0.382, 0.364, 0.   , 0.   ],
       [0.074, 0.   , 1.   , 0.   , 0.   , 0.072, 0.078],
       [0.   , 0.382, 0.   , 1.   , 0.386, 0.   , 0.   ],
       [0.   , 0.364, 0.   , 0.386, 1.   , 0.   , 0.   ],
       [0.078, 0.   , 0.072, 0.   , 0.   , 1.   , 0.356],
       [0.086, 0.   , 0.078, 0.   , 0.   , 0.356, 1.   ]])
(Pdb) similarities
array([[1.   , 0.   , 0.356, 0.078, 0.   , 0.072, 0.   ],
       [0.   , 1.   , 0.   , 0.   , 0.382, 0.   , 0.386],
       [0.356, 0.   , 1.   , 0.086, 0.   , 0.078, 0.   ],
       [0.078, 0.   , 0.086, 1.   , 0.   , 0.074, 0.   ],
       [0.   , 0.382, 0.   , 0.   , 1.   , 0.   , 0.364],
       [0.072, 0.   , 0.078, 0.074, 0.   , 1.   , 0.   ],
       [0.   , 0.386, 0.   , 0.   , 0.364, 0.   , 1.   ]])

So running on my mac, ignore_abundance=False fails, while on the ubuntu system, ignore_abundance=True fails. Is there something else I should be accounting for in this test?

olgabot added some commits May 8, 2019

@olgabot

This comment has been minimized.

Copy link
Contributor Author

commented May 9, 2019

Looks like it's just Python 2.7 that's failing now: https://travis-ci.com/dib-lab/sourmash/jobs/198732207

@olgabot

This comment has been minimized.

Copy link
Contributor Author

commented May 10, 2019

Somehow the from . import signature as sig line in commands.py isn't getting recognized in Python 2.7. If one adds the import statement to every function, it works for me, meaning PyCharm stops complaining, but we'll see if Travis stops

@luizirber

This comment has been minimized.

Copy link
Member

commented May 10, 2019

Python 2 and 3 behave differently about imports, I think we need to add an
from __future__ import absolute_import to the beginning of that file

@olgabot

This comment has been minimized.

Copy link
Contributor Author

commented May 10, 2019

Cool, just added that

@luizirber

This comment has been minimized.

Copy link
Member

commented May 10, 2019

Hmm, still failing. This is very weird =/


labeltext.append(E.name())
labeltext = [sig.name() for sig in siglist]

This comment has been minimized.

Copy link
@luizirber

luizirber May 12, 2019

Member

now, THAT'S a fun bug! Somehow this sig here replaces the sig module imported in the beginning of the file, and so the loop works for the first file but fails for the second!

If you change the name to something else, then everything works:

Suggested change
labeltext = [sig.name() for sig in siglist]
labeltext = [item.name() for item in siglist]

And that's very confusing behavior! I'll try to find the source, but I'm pretty sure Python 3 change the lexical scope for this and then it's not a bug anymore (and the reason why it works in 3.x, but not in 2.7)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.