This repository has been archived by the owner on Nov 9, 2023. It is now read-only.

Include SortMeRNA index #24

Open
wasade opened this issue Oct 7, 2015 · 7 comments

Comments

@wasade
Member

wasade commented Oct 7, 2015

Computing the index on the 97% representative sequences is expensive, and easy to forget to do. It would be nice if a precomputed index were available in this repo.

@colinbrislawn

👍 I was just going to ask for that.

@colinbrislawn

These indexed files are big, even when zipped.

38M     97_otus.fasta.gz

310M    97_otus.bursttrie_0.dat.gz
730K    97_otus.kmer_0.dat.gz
525M    97_otus.pos_0.dat.gz
674K    97_otus.stats.gz

Do we need to include all of them? Is the size justified given that SortMeRNA is not used by default? @wasade @gregcaporaso, is SortMeRNA about to become much more important in qiime?
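For a rough sense of scale, the listing above can be totalled with a few lines of Python. This is only a sketch: it assumes the sizes are `du -h` style values in binary units (K = 1024 bytes, M = 1024² bytes), which the listing itself does not state.

```python
# Compressed sizes copied from the listing above.
SIZES = {
    "97_otus.fasta.gz": "38M",
    "97_otus.bursttrie_0.dat.gz": "310M",
    "97_otus.kmer_0.dat.gz": "730K",
    "97_otus.pos_0.dat.gz": "525M",
    "97_otus.stats.gz": "674K",
}

# Assumption: binary units, as printed by `du -h`.
UNITS = {"K": 1024, "M": 1024 ** 2}


def to_bytes(size):
    """Convert a string such as '310M' or '730K' to a byte count."""
    return int(size[:-1]) * UNITS[size[-1]]


# The index is everything except the reference FASTA itself.
index_bytes = sum(to_bytes(v) for k, v in SIZES.items()
                  if k != "97_otus.fasta.gz")
total_bytes = sum(to_bytes(v) for v in SIZES.values())

print("index only: %.1f MiB" % (index_bytes / float(UNITS["M"])))
print("with fasta: %.1f MiB" % (total_bytes / float(UNITS["M"])))
```

Under that assumption, the four index files alone come to roughly 836 MiB compressed, and the two `.dat` files for the burst trie and positions account for almost all of it.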

@wasade
Member Author

wasade commented Dec 3, 2015

What about placing them on ftp.microbio.me?


@colinbrislawn

Sure, that would work. But would that defeat the purpose of qiime-default-reference? If SortMeRNA ends up replacing the uclust tax assigner as the default, I think this is well worth it.

These files are large enough that we would have to store them using Git LFS. Do we want to introduce that?

@jairideout
Member

Note that PyPI limits the size of packages, and we are nearing that limit with qiime-default-reference. I can't find the limit published anywhere, but release uploads will fail if the package is too large.

@wasade
Member Author

wasade commented Dec 3, 2015

Doesn't defeat the purpose as setup.py could just source the files.
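A minimal sketch of what "sourcing the files" from setup.py could look like, assuming the index is hosted somewhere reachable by URL. The helper name `fetch_missing` and its parameters are hypothetical, and the thread names only the host ftp.microbio.me, not a path, so the base URL is left for the caller to supply.

```python
import os
from urllib.request import urlretrieve


def fetch_missing(filenames, base_url, dest_dir, retrieve=urlretrieve):
    """Download each file that is not already present in dest_dir.

    base_url is supplied by the caller; no real path is assumed here.
    retrieve is injectable so the logic can be exercised without a network.
    Returns the list of filenames that were actually fetched.
    """
    fetched = []
    for name in filenames:
        dest = os.path.join(dest_dir, name)
        if not os.path.exists(dest):
            retrieve("%s/%s" % (base_url.rstrip("/"), name), dest)
            fetched.append(name)
    return fetched
```

Called from setup.py before the package data is collected, this would keep the large files out of the PyPI upload while still installing them next to the other defaults.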


@colinbrislawn

> Doesn't defeat the purpose as setup.py could just source the files.

True. I thought having the defaults in one repo was preferable, but idk about the original goals. It is functionally the same.

Another idea: what if we index these files the first time they are used and save them alongside greengenes? Indexing takes about 20-40 minutes, but it would only have to be done once, and it avoids adding 700 MB to everyone's base qiime distribution. We already build the index every time assign_taxonomy.py -m sortmerna is run; the code is all there, but assign_taxonomy.py removes the index when it finishes. Why not save it?
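The build-once idea could be sketched roughly as follows. This is not how assign_taxonomy.py works today; the function names are hypothetical, the index suffixes are taken from the file listing earlier in this thread, and the `indexdb_rna --ref FASTA,INDEX_BASE` invocation is an assumption about the installed SortMeRNA version.

```python
import os
import subprocess

# Suffixes as they appear in the listing earlier in this thread.
INDEX_SUFFIXES = ("bursttrie_0.dat", "kmer_0.dat", "pos_0.dat", "stats")


def index_exists(index_base):
    """True only if every expected index file is already on disk."""
    return all(os.path.exists("%s.%s" % (index_base, suffix))
               for suffix in INDEX_SUFFIXES)


def run_indexdb_rna(fasta_fp, index_base):
    # Assumption: indexdb_rna is on PATH and accepts --ref FASTA,INDEX_BASE.
    subprocess.check_call(
        ["indexdb_rna", "--ref", "%s,%s" % (fasta_fp, index_base)])


def get_or_build_index(fasta_fp, builder=run_indexdb_rna):
    """Return the index base path, building the index only if it is missing.

    The index is written next to the reference FASTA, so
    /refs/97_otus.fasta yields /refs/97_otus.bursttrie_0.dat and so on.
    """
    index_base = os.path.splitext(fasta_fp)[0]
    if not index_exists(index_base):
        builder(fasta_fp, index_base)
    return index_base
```

The first call pays the indexing cost; later calls find the files and return immediately. One design question this leaves open is what to do when the reference directory is not writable, for example in a system-wide install.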

I'm interested in taking this, either for 1.9.2 or for 2. If this is going to be used as a default, either for OTU picking or tax assignment, I'm very interested.
