plans for sourmash 4.0 #835

Open
ctb opened this issue Jan 8, 2020 · 5 comments

@ctb ctb commented Jan 8, 2020

Excerpted from #762, now that 3.0 is out --

Thoughts for 4.0 include:

  • follow @standage's hints in #785 and deprecate some subcommands
  • consider making --scaled the default instead of --num-hashes (this is controversial, though :)
  • do a better job of simulations and theory before we release sourmash 4.0, or at least get this on the radar. We need to start understanding (and explaining) where the basic scaled approach works well and where it does not.
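
For readers weighing the --scaled versus --num-hashes question, here is a minimal Python sketch of the two sketching strategies. It is an illustration only, not sourmash's implementation: `hash_kmer`, `num_sketch`, and `scaled_sketch` are hypothetical helper names, BLAKE2b stands in for whatever 64-bit hash sourmash uses, and reverse complements are ignored.

```python
import hashlib
import random

MAX_HASH = 2**64  # size of the 64-bit hash space


def kmers(seq, k):
    """Yield every k-mer (substring of length k) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Stand-in 64-bit hash (assumption: NOT the hash sourmash uses)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def num_sketch(seq, k, num):
    """Keep the `num` smallest distinct hashes: a fixed-size sketch."""
    hashes = {hash_kmer(km) for km in kmers(seq, k)}
    return set(sorted(hashes)[:num])


def scaled_sketch(seq, k, scaled):
    """Keep every hash below MAX_HASH / scaled: size grows with the input."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < threshold}


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(20000))
    fragment = genome[:2000]
    for name, seq in (("fragment", fragment), ("genome", genome)):
        print(name,
              "num:", len(num_sketch(seq, 21, 50)),
              "scaled:", len(scaled_sketch(seq, 21, 100)))
    # Every hash kept for the fragment is also kept for the genome.
    print(scaled_sketch(fragment, 21, 100) <= scaled_sketch(genome, 21, 100))
```

The fixed-size sketch stays at 50 hashes regardless of input length, while the scaled sketch keeps a roughly constant fraction of all k-mers. Because the scaled threshold does not depend on the input, the sketch of a subsequence is always a subset of the sketch of the full sequence, which is the property that makes containment-style comparisons straightforward.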

@standage standage commented Jan 8, 2020

Drop Python 2.7 support?

@luizirber luizirber commented Jan 8, 2020

There are some leftovers in https://github.com/dib-lab/sourmash/projects/2 for sourmash 3.0. Should we create another project to track 4.0, or just use issues/labels, since the projects are not being used anyway? 😬

@satishv satishv commented Jan 10, 2020

We would love it if you could improve gather's performance.

Here are some stats; it looks like v3 may be slower than v2?

Should we file a task? We are happy to help. Perhaps in v4? Is a smaller release on top of the v3 codebase planned anytime soon?

thanks

```
sourmash compute -k 31 --scaled 5000 -o testv2.sig test.01M_R1_L001.fastq.gz
0m17.907s

sourmash gather -k 31 --scaled 5000 -o gatherv2 testv2.sig db/ecolidb.sbt.json
0m1.896s

sourmash v3.0.1
sourmash compute -k 31 --scaled 5000 -o testv3.sig test.01M_R1_L001.fastq.gz
0m42.243s

sourmash gather -k 31 --scaled 5000 -o gatherv3 testv3.sig db/ecolidb.sbt.json
0m4.429s
```
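
To put numbers on the difference, the following Python sketch converts the reported timings to seconds and computes the slowdown for each step. It assumes the first pair of timings comes from a 2.x release (inferred only from the `testv2.sig` and `gatherv2` file names) and the second pair from v3.0.1; `parse_time` is a hypothetical helper.

```python
def parse_time(text):
    """Convert a `time`-style string such as '0m17.907s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# (older run, v3.0.1 run) as reported above; the older run is assumed to be v2.
timings = {
    "compute": ("0m17.907s", "0m42.243s"),
    "gather": ("0m1.896s", "0m4.429s"),
}

slowdown = {
    step: parse_time(new) / parse_time(old)
    for step, (old, new) in timings.items()
}

for step, ratio in slowdown.items():
    print(f"{step}: {ratio:.2f}x the older runtime")
```

Both steps come out at roughly 2.3 to 2.4 times the older runtime. These are single runs on one input, so the ratios are an indication rather than a benchmark.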

@ctb ctb commented Jan 11, 2020

@ctb ctb commented Jan 13, 2020

Regarding versions, optimizations, and speed, per @satishv's comment -- a collection of random thoughts.

3.x should be compatible with 2.x in terms of databases and core functionality, although we may be adding new command line flags. So you can always use 2.x for now!

In terms of optimization, my personal perspective (not necessarily shared by others :) is that functionality & correctness, maintainability, user experience, and memory usage come before speed. There are no hard and fast rules here, of course, but we have finite attentional resources and have to prioritize somehow.

That having been said, we are always happy to take contributions. The move to Rust in 3.0 is opening up a lot of potential optimizations, since Rust is (among other things) threadsafe and robust, and we would be happy to receive PRs for specific optimizations. We are also enthusiastic about benchmarking that highlights problem areas, because more information is always better - so thanks, Satish!

@luizirber luizirber added the 4.0 label Jan 14, 2020
@luizirber luizirber added this to the 4.0 milestone Jan 14, 2020