Description
We noticed that AMPlify strictly accepts only the 20 standard amino acids in input sequences and skips any sequence that contains other characters, as stated in its help message:
$ AMPlify -h
[...]
AMPlify v2.0.0
------------------------------------------------------
Predict whether a sequence is AMP or not.
Input sequences should be in fasta format.
Sequences should be shorter than 201 amino acids long,
and should not contain amino acids other than the 20 standard ones.
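The two constraints in the help message can be restated as a small per-sequence check. This is a sketch based only on the help text above (fewer than 201 residues, only the 20 standard one-letter codes), not AMPlify's actual validation code; `check_sequence` is a hypothetical name.

```python
from typing import Optional

# Hypothetical restatement of the input rules from AMPlify's help text;
# this is not AMPlify's own code.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
MAX_LEN = 200  # "shorter than 201 amino acids"


def check_sequence(seq: str) -> Optional[str]:
    """Return the reason a sequence would be skipped, or None if it passes."""
    if len(seq) > MAX_LEN:
        return f"too long ({len(seq)} > {MAX_LEN})"
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        return "non-standard characters: " + "".join(bad)
    return None


print(check_sequence("GLFDIVKKVVGALGSL"))   # None
print(check_sequence("GLFDIVKKVVGALGSL*"))  # non-standard characters: *
print(check_sequence("A" * 250))            # too long (250 > 200)
```

Under these rules, a trailing * alone is enough for an otherwise valid peptide to be skipped.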
So far, so clear. However, a sequence is skipped even when its only non-standard character is the asterisk * commonly used to mark a stop codon. I believe this behaviour is probably not intended, because several sequence annotation tools (e.g. Pyrodigal, Prodigal, Bakta, Prokka) append the * by default, and for Prodigal, Prokka, and Bakta the * stop codon indicator cannot even be switched off. As a result, the output of these annotation tools cannot be used directly as AMPlify input; all * characters must be removed first.
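Until AMPlify handles this itself, one user-side workaround is to remove the asterisks before running it. Below is a minimal standard-library Python sketch, assuming a plain (uncompressed) protein FASTA file; the function names `strip_stop_codons` and `clean_fasta_file` are hypothetical. It removes every * from sequence lines and leaves header lines untouched.

```python
from typing import Iterable, Iterator


def strip_stop_codons(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with every '*' removed from sequence lines.

    Header lines (starting with '>') are passed through unchanged, because
    a '*' there would be part of the description, not the sequence.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.replace("*", "")


def clean_fasta_file(in_path: str, out_path: str) -> None:
    """Write a copy of in_path with all '*' removed from sequence lines."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_stop_codons(src))


# Example on in-memory lines: a record whose last line ends with '*'
record = [">seq1\n", "MKLVAAGL\n", "TRAQ*\n"]
print("".join(strip_stop_codons(record)), end="")
```

The cleaned file written by `clean_fasta_file` can then be passed to AMPlify with -s as in the example below. Removing every * rather than only a trailing one matches "removing all *" above; for Prodigal-style output, where the * only marks the end of each protein, the two are equivalent.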
My feature request is therefore that AMPlify accept sequences carrying a stop codon indicator and remove the asterisk internally where necessary.
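One possible way to implement this inside AMPlify, sketched under assumptions: strip a single trailing * before applying the existing rules, but keep rejecting a * anywhere else, since only the trailing marker is what annotation tools append. `normalize_sequence` is a hypothetical helper, not part of AMPlify, and skipping empty sequences is an additional assumption.

```python
from typing import Optional

STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
MAX_LEN = 200  # "shorter than 201 amino acids", per the help text


def normalize_sequence(seq: str) -> Optional[str]:
    """Return the sequence ready for prediction, or None if it must be skipped.

    A single trailing '*' (stop codon marker) is removed first; the length
    and alphabet rules from the help text are then applied to the rest.
    """
    seq = seq.strip()
    if seq.endswith("*"):
        seq = seq[:-1]
    # Assumption: empty sequences are skipped as well.
    if len(seq) == 0 or len(seq) > MAX_LEN:
        return None
    if not set(seq) <= STANDARD_AA:
        return None  # also catches any '*' that was not at the very end
    return seq


print(normalize_sequence("GLFDIVKKVVGALGSL*"))  # GLFDIVKKVVGALGSL
print(normalize_sequence("GLFD*IVKK"))          # None
```

Checking the length after stripping means a 200-residue peptide followed by * is still accepted.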
Minimal reproducible example:
- Download this FASTA file: amplify-failed-genes.faa.gz (contains two sequences: one that is too long and one with *)
zcat amplify-failed-genes.faa.gz > amplify-failed-genes.faa
AMPlify -s amplify-failed-genes.faa
I'll link another issue where this behaviour was observed.