unambiguous_codes
Replacing ambiguous codes in FASTA/FASTQ files.
The genomic nucleotides are adenine, cytosine, guanine and thymine, denoted as A, C, G and T. When the identity of a nucleotide is uncertain, other characters can be used to denote it (see IUPAC Codes). However, many bioinformatics tools do not accept FASTA/FASTQ files that use these codes.
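Each ambiguous code is replaced with a base selected at random from the bases that code allows (for example, R stands for A or G, so it is replaced with one of those two).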
The total number of replacements as well as the number of replacements per ambiguous code are reported.
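The replacement and counting described above can be sketched in Python as follows. This is an illustration and not the tool's actual code: the function name replace_ambiguous, the IUPAC table layout and the rng parameter are assumptions made for this example, and the sketch assumes that every allowed base is equally likely and that replacements are written in upper case.

```python
import random
from collections import Counter

# Standard IUPAC ambiguity codes and the bases each one allows.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def replace_ambiguous(seq, rng=random):
    """Return (new_seq, counts) for one sequence (hypothetical helper).

    counts maps each ambiguous code to the number of times it was replaced.
    Characters that are not IUPAC ambiguity codes are copied unchanged.
    """
    counts = Counter()
    out = []
    for base in seq:
        allowed = IUPAC.get(base.upper())
        if allowed is None:
            out.append(base)  # A, C, G, T or anything else: keep as-is
        else:
            out.append(rng.choice(allowed))  # assumption: uniform choice
            counts[base.upper()] += 1
    return "".join(out), counts


# Usage: a seeded generator makes the result reproducible.
new_seq, counts = replace_ambiguous("ACGTNRYA", random.Random(0))
print(new_seq)               # the sequence with N, R and Y replaced
print(sum(counts.values()))  # total number of replacements
print(dict(counts))          # replacements per ambiguous code
```

Passing a seeded random.Random instance is a choice made for this sketch so that runs can be repeated; whether the tool itself offers a seed is not stated here.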
Additionally, an "uncertainty" value is reported, which is calculated as:
usage: unambiguous_codes [-h] [-t THREADS] in_file out_file

Replacing ambiguity codes in FASTA/FASTQ with A,C,G or T.

positional arguments:
  in_file
  out_file

options:
  -h, --help            show this help message and exit
  -t THREADS, --threads THREADS
                        Number of parallel threads [default: 1]
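
For readers who want to see how an interface like this is typically declared, the following is a minimal argparse sketch that accepts the same arguments. It is an assumption about how such a command line could be built, not the tool's source; the function name build_parser and the example file names are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser matching the usage text (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="unambiguous_codes",
        description="Replacing ambiguity codes in FASTA/FASTQ with A,C,G or T.",
    )
    parser.add_argument("in_file")
    parser.add_argument("out_file")
    parser.add_argument(
        "-t", "--threads", type=int, default=1,
        help="Number of parallel threads [default: 1]",
    )
    return parser


# Usage: parse an example command line; the file names are placeholders.
args = build_parser().parse_args(["-t", "4", "reads.fastq", "reads.clean.fastq"])
print(args.in_file, args.out_file, args.threads)
```

The same call on the command line would be: unambiguous_codes -t 4 reads.fastq reads.clean.fastq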