Improves %id filtering, and adds easel filter miniapp.
Tom suggests improving %id filtering so it doesn't select a fragment
over a full-length sequence. The previous rule was to keep the
lower-indexed sequence (earlier in the file); I now call this the
"origorder" preference rule.

Reimplemented esl_msaweight_IDFilter(), adding
esl_msaweight_IDFilter_adv() and esl_msaweight_IDFilter_txt(), along
the same lines as the revised PB weights. The _adv() version, for
digital mode alignments, implements a "conscover" preference rule:
within the "span" of an aseq (from its first to its last residue), how
many consensus columns it covers. The intent of the rule is to favor
full-length sequences without introducing much bias in
insertion/deletion statistics. _adv() can optionally be configured to
use the "origorder" preference rule, or a random preference, instead.
_txt(), for text mode alignments, stays with the original
implementation and the origorder rule.
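
To illustrate the "conscover" rule described above, here is a minimal
sketch in plain C. It is not the code in esl_msaweight.c and uses no
Easel API: conscover(), prefer_conscover(), is_gap(), and the is_cons
mask are hypothetical names, and it works on text rows rather than
digital mode alignments for brevity. It assumes that every consensus
column inside the span counts, whether or not the sequence has a
residue there, and that ties fall back to origorder; both are
assumptions, not confirmed by this commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: characters treated as gaps in an aligned text row. */
static int is_gap(char c)
{
  return c == '-' || c == '.' || c == '~' || c == '_';
}

/* Count the consensus columns that fall inside the span of one aligned
 * sequence, where the span runs from its first to its last residue.
 *   aseq    : aligned row, alen characters
 *   is_cons : is_cons[i] is nonzero if column i is a consensus column
 * Columns inside the span count even where the row has a gap (assumed).
 * Returns 0 for an all-gap row. */
int conscover(const char *aseq, const int *is_cons, int alen)
{
  int first = 0;
  int last  = alen - 1;
  int n     = 0;
  int i;

  assert(aseq != NULL && is_cons != NULL && alen >= 0);

  while (first < alen  && is_gap(aseq[first])) first++;  /* skip leading gaps  */
  while (last >= first && is_gap(aseq[last]))  last--;   /* skip trailing gaps */

  for (i = first; i <= last; i++)
    if (is_cons[i]) n++;
  return n;
}

/* Return nonzero if sequence a should be kept in preference to b.
 * Higher coverage wins; on a tie the lower index wins (assumed tie-break,
 * equivalent to the origorder rule). */
int prefer_conscover(int cover_a, int idx_a, int cover_b, int idx_b)
{
  if (cover_a != cover_b) return cover_a > cover_b;
  return idx_a < idx_b;
}
```

With this sketch, a fragment that spans only a few consensus columns
loses to a full-length sequence when the two are too similar for both
to be kept, while internal deletions do not lower a sequence's score.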

Wrote `easel filter` miniapp in cmd_filter.c, with man page
documentation in cmd_filter.md -- in Markdown instead of nroff, which
seems like a good long term switch to start making.

Reorganized unit tests in esl_msaweight, and added utest_idfilter().
cryptogenomicon committed Apr 2, 2019
1 parent f952523 commit bd21cdd
Showing 6 changed files with 947 additions and 157 deletions.