-
Notifications
You must be signed in to change notification settings - Fork 8
Description
Hi,
Thanks for the useful tool. Would it be possible to add the option to modify the headers of the output fastq files? I can see in aligner.py that the (final) command for generating clean reads with paired-end data is:
f" | samtools fastq --threads 4 -c 6 -N -1 '{fastq1_out_path}' -2 '{fastq2_out_path}'"
This results in read headers in the form of "@<read_id>/1". It would be useful to have the option to output clean reads with the Illumina Casava 1.8 format entry, which is an option in samtools fastq:
-i add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
<truncated>
--barcode-tag TAG
Barcode tag [BC]
--quality-tag TAG
Quality tag [QT]
--index-format STR
How to parse barcode and quality tags
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
Minimally, the command could be:
f" | samtools fastq --threads 4 -c 6 -n -i --index-format "i*" -1 '{fastq1_out_path}' -2 '{fastq2_out_path}'"
This results in read headers in the form of "@<read_id> 1:N:0:0".
In my case this is useful because my previous method of host decontamination (bowtie2 ... --un-conc-gz id.unmapped.fastq.gz ...) resulted in read headers in that format, and hence downstream read manipulation is based on that format. However, I can understand if this might not be a priority to implement. If not, could there be an option to save a *.bam file with clean reads, allowing users to also extract reads to file as they see fit?
Thanks