Allow adding of Illumina Casava 1.8 format entry to fastq headers #41

@charlesfoster

Hi,

Thanks for the useful tool. Would it be possible to add the option to modify the headers of the output fastq files? I can see in aligner.py that the (final) command for generating clean reads with paired-end data is:

f" | samtools fastq --threads 4 -c 6 -N -1 '{fastq1_out_path}' -2 '{fastq2_out_path}'"

This results in read headers in the form of "@<read_id>/1". It would be useful to have the option to output clean reads with the Illumina Casava 1.8 format entry, which is an option in samtools fastq:

  -i           add Illumina Casava 1.8 format entry to header (eg 1:N:0:ATCACG)
  <truncated>
  --barcode-tag TAG
               Barcode tag [BC]
  --quality-tag TAG
               Quality tag [QT]
  --index-format STR
               How to parse barcode and quality tags

      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]

Minimally, the command could be:

f" | samtools fastq --threads 4 -c 6 -n -i --index-format 'i*' -1 '{fastq1_out_path}' -2 '{fastq2_out_path}'"

This results in read headers in the form of "@<read_id> 1:N:0:0".
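To make the request concrete, here is a minimal sketch of how the header style could be made switchable. The function name `build_fastq_command` and the `casava` parameter are hypothetical and are not taken from aligner.py; only the two samtools command strings come from the snippets above.

```python
def build_fastq_command(fastq1_out_path, fastq2_out_path, casava=False, threads=4):
    """Return the trailing `samtools fastq` stage of the pipeline as a string.

    Hypothetical helper for illustration; not the project's actual code.
    """
    if casava:
        # -n: do not append /1 and /2 to read names
        # -i: add the Illumina Casava 1.8 entry (e.g. 1:N:0:ATCACG)
        name_flags = "-n -i --index-format 'i*'"
    else:
        # -N: always append /1 and /2 to read names (current behaviour)
        name_flags = "-N"
    return (
        f" | samtools fastq --threads {threads} -c 6 {name_flags}"
        f" -1 '{fastq1_out_path}' -2 '{fastq2_out_path}'"
    )
```

With `casava=False` the returned string matches the command currently used, so the default behaviour would not change.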

In my case this is useful because my previous method of host decontamination (bowtie2 ... --un-conc-gz id.unmapped.fastq.gz ...) resulted in read headers in that format, and hence downstream read manipulation is based on that format. However, I can understand if this might not be a priority to implement. If not, could there be an option to save a *.bam file with clean reads, allowing users to also extract reads to file as they see fit?
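As a stopgap, the headers could also be rewritten after the fact. The sketch below is only an illustration under my own assumptions: the helper names are made up, the input is assumed to be standard four-line FASTQ records, and the index field is filled with the placeholder "0" to match the header shown above.

```python
import re

# Matches a header such as "@read_id/1" or "@read_id/2" (no trailing newline).
_SUFFIX = re.compile(r"^(@\S+)/([12])$")


def to_casava_header(header, index="0"):
    """Convert "@<read_id>/1" to "@<read_id> 1:N:0:<index>".

    Headers that do not end in /1 or /2 are returned unchanged.
    """
    match = _SUFFIX.match(header)
    if match is None:
        return header
    return f"{match.group(1)} {match.group(2)}:N:0:{index}"


def rewrite_fastq_lines(lines):
    """Yield FASTQ lines with converted headers.

    Only the first line of each four-line record is touched, because
    quality strings may also begin with "@".
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield to_casava_header(line) if i % 4 == 0 else line
```

This is slower than having samtools write the headers directly, which is why a built-in option would still be preferable.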

Thanks

Labels: enhancement (New feature or request)
