Remap previously mapped reads to a new genome, preserving BAM tags #939

J-Moravec · 2020-06-11T01:06:39Z

I have an older mapped BAM file from a previous 10x experiment, but I don't have the original reference genome, so I need to remap the reads for the GATK SNV discovery pipeline.

However, this is scRNAseq data and I would like to preserve the BAM tags.

I used samtools fastq to get fastq files, which I then fed into STAR. I tried to preserve the tags by passing the -T option to samtools with a list of the tags, which put the tags into the read header, but STAR seems to ignore them even when I specify --outSAMattributes.

Is there a way to remap the file while preserving the BAM tags?

Thanks


alexdobin · 2020-06-12T20:15:59Z

Hi J-Moravec

you can use the BAM file as input to STAR directly, with
--readFilesType SAM SE --readFilesIn File.bam --readFilesCommand samtools view
This is for single-end reads, which is the case for 10X since the 2nd read is the barcode and is recorded with the SAM tags CR, CY, UR, UY.
All the tags from the BAM file will be output to the new Aligned.out.bam file. This may create duplicate tags, so it may be advisable to filter out the unnecessary tags before mapping.
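
As a rough sketch of the above, the Python snippet below assembles the STAR command for BAM input and optionally strips tags with `samtools view -x` before mapping. The function name, the index directory `new_genome_index`, the thread count, and the choice of tags to drop (NH, HI, AS, nM, which STAR writes by default) are assumptions for illustration and are not from this thread; adjust them to your data.

```python
import shlex


def star_bam_remap_cmd(bam, genome_dir, strip_tags=(), threads=4):
    """Return the STAR argument list for remapping single-end reads from a BAM.

    strip_tags: tags removed by samtools before STAR sees the reads, so that
    the remapped BAM does not carry them twice (assumed list, adjust as needed).
    """
    read_cmd = ["samtools", "view"]
    for tag in strip_tags:
        # samtools view -x TAG drops that tag from every record
        read_cmd += ["-x", tag]
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesType", "SAM", "SE",
        "--readFilesIn", bam,
        "--readFilesCommand", *read_cmd,
        "--outSAMtype", "BAM", "Unsorted",
    ]


cmd = star_bam_remap_cmd(
    "File.bam", "new_genome_index", strip_tags=("NH", "HI", "AS", "nM")
)
# Print the command for inspection; run it with subprocess.run(cmd, check=True)
print(shlex.join(cmd))
```

Building the command as a list keeps each option and value separate, so it can be passed to `subprocess.run` without shell quoting issues. Tags such as CB, UB, CR, and UR are left untouched here and should carry over to the new BAM as described above.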

For 10X data specifically, you can also use their own bamtofastq converter which will re-create read1/2 FASTQs:
https://support.10xgenomics.com/docs/bamtofastq

Cheers
Alex

J-Moravec · 2020-06-12T22:52:06Z

Ah, that would save me so much time!

I am aware of bamtofastq, but I don't particularly understand how to feed the 3 different fastq files for each readgroup back to STAR. Could you help me here please?

Thanks,
Jirka

alexdobin · 2020-06-12T23:02:30Z

Hi Jirka,

I think there should be one fastq with the cDNA read (~100 bases) and one with the barcode read (26 or 28 bases); those are the two you feed to STAR. The 3rd one is probably the "index" read with Illumina library barcodes, which is not really needed.
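
If it is unclear which file is which, checking the read length of the first record is usually enough. The sketch below does that in Python; the function names and the 50-base cutoff for calling a read "cDNA" are assumptions for illustration, chosen only to separate the lengths mentioned above.

```python
import gzip


def first_read_length(path):
    """Length of the first sequence in a FASTQ file (plain or gzipped)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        handle.readline()  # header line, starts with '@'
        return len(handle.readline().strip())


def guess_role(length):
    """Guess the role of a FASTQ file from its read length (heuristic)."""
    if length in (26, 28):
        return "barcode"
    if length >= 50:  # assumed cutoff, cDNA reads are typically ~100 bases
        return "cDNA"
    return "index or unknown"
```

For example, `guess_role(first_read_length(path))` can be applied to each FASTQ of a read group to pick out the cDNA and barcode files.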

Cheers
Alex

J-Moravec · 2020-06-13T00:21:49Z

Thanks. You are much more helpful than the 10x people themselves.

J-Moravec · 2020-06-24T05:02:51Z

Just to be clear @alexdobin, when using the two-read option (read fastq and barcode fastq), do I need to use STARsolo?

alexdobin · 2020-06-25T14:49:01Z

Hi Jiří,

--readFilesIn generally takes two read files. For scRNA-seq with STARsolo, the first read should be the cDNA read and the 2nd read the barcode read. For bulk RNA-seq, without the STARsolo option, both reads are cDNA reads. So if you want to map scRNA-seq without STARsolo, you would supply only the cDNA read fastq.
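
The two cases can be sketched as a small Python helper that builds the relevant STAR arguments. The function name and file names are hypothetical; `--soloType CB_UMI_Simple` and `--soloCBwhitelist` are the STARsolo options assumed here for 10x-style data, and other settings (for example `--soloUMIlen` for chemistries with a longer UMI) may also be needed depending on the library.

```python
def read_files_args(cdna_fastq, barcode_fastq=None, whitelist=None):
    """Build the --readFilesIn part of a STAR command.

    Without barcode_fastq: plain mapping, only the cDNA read is supplied.
    With barcode_fastq: STARsolo, cDNA read first and barcode read second.
    """
    args = ["--readFilesIn", cdna_fastq]
    if barcode_fastq is not None:
        if whitelist is None:
            raise ValueError("STARsolo needs a barcode whitelist file here")
        args.append(barcode_fastq)
        args += ["--soloType", "CB_UMI_Simple", "--soloCBwhitelist", whitelist]
    return args


# Plain mapping of scRNA-seq reads, no STARsolo
plain = read_files_args("cdna.fastq")
# STARsolo: cDNA read first, barcode read second
solo = read_files_args("cdna.fastq", "barcode.fastq", "whitelist.txt")
```

These lists would be appended to the rest of the STAR command (`--genomeDir`, output options, and so on).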

Cheers
Alex

alexdobin added the "add to FAQ" (question with answer that should be added to FAQ) and "question" labels Jun 12, 2020

J-Moravec closed this as completed Jun 13, 2020

J-Moravec mentioned this issue Jun 24, 2020

Invalid and mismatch CIGAR after remapping BAM file #952

Closed
