More cleaning/filtration steps #43

Open
fanninpm opened this issue May 24, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@fanninpm

I'm in the middle of making an IRMA module for Adenoviruses. I came across your repo today and thought it would be useful for that purpose (I'm definitely thinking of using it to generate consensus sequences.) The IRMA paper mentions a few filtration steps that I thought would be a natural fit (in the "Methods" section, in the "Datasets" sub-section, in the "Influenza alignment dataset" sub-sub-section, second paragraph). In particular, they mentioned:

  • Removing duplicate sequences
    This should be the (second-)easiest of the bunch.
  • Removing sequences with greater than N ambiguous nucleotides
    In the paper, the authors specified N=5, which may be a good default setting for Influenza A/B segments.
  • Removing sequences causing frame-shifts
    I think this may be relatively difficult to calculate, compared to the others.
  • Removing short sequences
    This functionality is already implemented (--remove_short), but it may be nice to have the ability to specify a percentage of the alignment as a cutoff.
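The first, second, and fourth filters could look roughly like the sketch below. This is only an illustration in plain Python, not this project's code: `filter_sequences`, `count_ambiguous`, `max_ambiguous`, and `min_fraction` are hypothetical names, the input is assumed to be a list of `(name, aligned_sequence)` pairs of equal length, and `-` is assumed to be the only gap character.

```python
def count_ambiguous(seq):
    """Count characters that are neither A/C/G/T/U nor a gap ('-')."""
    unambiguous = set("ACGTU-")
    return sum(1 for base in seq.upper() if base not in unambiguous)


def filter_sequences(records, max_ambiguous=5, min_fraction=0.0):
    """Filter (name, aligned_sequence) pairs taken from one alignment.

    A record is dropped if its ungapped sequence duplicates an earlier kept
    record, if it has more than max_ambiguous ambiguous characters, or if its
    ungapped length is below min_fraction of the alignment length.
    """
    if not records:
        return []
    alignment_length = len(records[0][1])
    seen = set()
    kept = []
    for name, seq in records:
        ungapped = seq.replace("-", "").upper()
        if ungapped in seen:
            continue  # duplicate of a sequence that was already kept
        if count_ambiguous(seq) > max_ambiguous:
            continue  # too many ambiguous nucleotides
        if len(ungapped) < min_fraction * alignment_length:
            continue  # too short relative to the alignment
        seen.add(ungapped)
        kept.append((name, seq))
    return kept


if __name__ == "__main__":
    example = [
        ("a", "ACGT--ACGT"),
        ("b", "ACGT--ACGT"),  # duplicate of a
        ("c", "ACNNNNNNGT"),  # six ambiguous characters
        ("d", "AC------GT"),  # only 4 of 10 columns filled
        ("e", "ACGTTTACGT"),
    ]
    for name, _ in filter_sequences(example, max_ambiguous=5, min_fraction=0.5):
        print(name)
```

With `max_ambiguous=5` a sequence is removed only when it has six or more ambiguous characters, matching "greater than N" above. Duplicates are compared after stripping gaps, so two rows that differ only in gap placement count as the same sequence; that choice is an assumption and could be made stricter.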
@KatyBrown
Owner

I'm sorry it's taken such a long time to reply! We will look at incorporating these features. All except the frameshift filter seem reasonably straightforward. I'll look into it and get back to you.
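For the frameshift case, one rough heuristic is to compare each row against a reference row and flag any internal insertion or deletion run whose length is not a multiple of three. The sketch below is an assumption-laden illustration, not the IRMA method and not this project's implementation: `has_frameshift` is a hypothetical name, the aligned region is assumed to be coding with the reference in frame, and `-` is assumed to be the gap character.

```python
def has_frameshift(ref, seq, gap="-"):
    """Return True if seq has an indel run, relative to ref, whose length is
    not a multiple of 3. Only the span covered by both rows is examined, so
    terminal gaps from incomplete sequences are not counted as deletions."""
    if len(ref) != len(seq):
        raise ValueError("rows must have the same aligned length")
    ref_cols = [i for i, base in enumerate(ref) if base != gap]
    seq_cols = [i for i, base in enumerate(seq) if base != gap]
    if not ref_cols or not seq_cols:
        return False
    start = max(ref_cols[0], seq_cols[0])
    end = min(ref_cols[-1], seq_cols[-1])

    run_kind, run_length = None, 0
    for i in range(start, end + 1):
        ref_gap = ref[i] == gap
        seq_gap = seq[i] == gap
        if ref_gap and seq_gap:
            continue  # column is uninformative for this pair of rows
        if ref_gap:
            kind = "insertion"
        elif seq_gap:
            kind = "deletion"
        else:
            kind = None
        if kind != run_kind:
            # A run just ended: check its length before starting a new one.
            if run_kind is not None and run_length % 3 != 0:
                return True
            run_kind, run_length = kind, 0
        if kind is not None:
            run_length += 1
    return run_kind is not None and run_length % 3 != 0


if __name__ == "__main__":
    reference = "ATGAAACCCGGGTAA"
    print(has_frameshift(reference, "ATGAAA---GGGTAA"))  # 3-base deletion
    print(has_frameshift(reference, "ATGAAA--CGGGTAA"))  # 2-base deletion
```

This treats every run independently, so a one-base insertion followed later by a compensating one-base deletion is still flagged. Choosing the reference row (for example, a curated sequence or the consensus) is left open here.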

@KatyBrown KatyBrown added the enhancement New feature or request label Sep 13, 2022