Commit
Improves %id filtering, and adds `easel filter` miniapp.
Tom suggests improving %id filtering so it doesn't select a fragment over a full length sequence. The previous rule was to keep the lower-indexed sequence (earlier in the file); I now call this the "origorder" preference rule.

Reimplemented esl_msaweight_IDFilter(), adding esl_msaweight_IDFilter_adv() and esl_msaweight_IDFilter_txt(), along the same lines as the revised PB weights. The _adv() version, for digital mode alignments, implements a "conscover" preference rule: within the "span" of an aseq (from its first to its last residue), how many consensus columns does it cover? The intent of the rule is to favor full length sequences without introducing a lot of bias in insertion/deletion statistics. _adv() can also optionally be configured to use the "origorder" preference rule, or random preference. _txt(), for text mode alignments, stays with the original implementation and the origorder rule.

Wrote the `easel filter` miniapp in cmd_filter.c, with man page documentation in cmd_filter.md. The documentation is in Markdown instead of nroff, which seems like a good long term switch to start making.

Reorganized unit tests in esl_msaweight, and added utest_idfilter().
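To illustrate the "conscover" rule described above, here is a minimal standalone sketch. It is not the code in esl_msaweight.c. The function names `conscover_score()` and `prefer_conscover()`, the text-mode `aseq` strings, and the `is_consensus` mask are all hypothetical. It assumes that every consensus column inside the span counts, including ones where the sequence has a deletion, and that ties fall back to origorder.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not an Easel function): count the consensus
 * columns inside the span of one aligned sequence. The span runs from
 * the first residue to the last residue. Any alphabetic character is
 * treated as a residue; everything else is treated as a gap.
 * is_consensus[i] is nonzero if column i is a consensus column.
 */
static int
conscover_score(const char *aseq, const int *is_consensus, int alen)
{
  int first = -1;
  int last  = -1;
  int n     = 0;
  int i;

  assert(aseq != NULL && is_consensus != NULL);

  /* Find the span: first and last residue positions. */
  for (i = 0; i < alen; i++)
    if (isalpha((unsigned char) aseq[i])) {
      if (first < 0) first = i;
      last = i;
    }
  if (first < 0) return 0;   /* no residues at all */

  /* Count consensus columns in the span, whether or not this sequence
   * has a residue there (assumption: deletions inside the span count). */
  for (i = first; i <= last; i++)
    if (is_consensus[i]) n++;
  return n;
}

/* Hypothetical preference rule: return 0 to keep sequence a, 1 to keep
 * sequence b. Sequence a is assumed to come earlier in the file, so a
 * tie keeps a, which is the origorder rule used as a fallback here.
 */
static int
prefer_conscover(const char *a, const char *b, const int *is_consensus, int alen)
{
  int sa = conscover_score(a, is_consensus, alen);
  int sb = conscover_score(b, is_consensus, alen);
  return (sb > sa) ? 1 : 0;
}
```

Under these assumptions, a fragment that appears earlier in the file loses to a later full length sequence that is above the %id threshold with it, because the full length sequence spans more consensus columns.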