Major question/concern: The order in which the filters are applied affects the results. Currently, the filters are applied in the order Fraction, Fragment, then Taxonomy. Which sequences pass the Fraction filter is essentially arbitrary and indifferent to the biology; that filter blindly removes sequences based on a modulo check. The sequences that remain may still fail the other filters. This has the potential to really hamstring an analysis.
Applying the Fraction filter last would let the other filters thoroughly remove the biologically uninteresting sequences first, leaving behind a set of sequences that we know the user is interested in. The Fraction filter can then subsample that set in its arbitrary, biology-agnostic fashion.
However, changing the order of the filtering operations may return different results from those produced by the EFI v1 code base.
Originally posted by @rbdavid in #151 (comment)
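To make the concern concrete, here is a minimal toy sketch of the two orderings. It is not the EFI implementation: the `Seq` record, the three filter functions, and the assumption that the Fraction filter keeps every N-th sequence by list position are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seq:
    # Hypothetical record; the real EFI data model may differ.
    seq_id: str
    is_fragment: bool
    taxon: str


def fraction_filter(seqs, fraction):
    # Assumed behavior: keep every `fraction`-th sequence by position.
    return [s for i, s in enumerate(seqs) if i % fraction == 0]


def fragment_filter(seqs):
    # Drop sequences flagged as fragments.
    return [s for s in seqs if not s.is_fragment]


def taxonomy_filter(seqs, taxon):
    # Keep only sequences from the requested taxon.
    return [s for s in seqs if s.taxon == taxon]


seqs = [
    Seq("A", True, "Bacteria"),
    Seq("B", False, "Bacteria"),
    Seq("C", False, "Archaea"),
    Seq("D", False, "Bacteria"),
    Seq("E", True, "Bacteria"),
    Seq("F", False, "Bacteria"),
]

# Current order: Fraction, Fragment, Taxonomy.
fraction_first = taxonomy_filter(
    fragment_filter(fraction_filter(seqs, 2)), "Bacteria"
)

# Proposed order: Fragment, Taxonomy, Fraction.
fraction_last = fraction_filter(
    taxonomy_filter(fragment_filter(seqs), "Bacteria"), 2
)

print([s.seq_id for s in fraction_first])  # []
print([s.seq_id for s in fraction_last])   # ['B', 'F']
```

In this toy input, applying the Fraction filter first happens to keep only sequences that the later filters reject, so nothing survives. Applying it last subsamples from the three sequences that already passed the biological filters.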
I would treat this as a low-priority question to ask John/Remi later this summer. It can be added as a low-priority issue if you want.
Originally posted by @nilsoberg in #151 (comment)