
Remap previously mapped reads to a new genome, preserving BAM tags #939

Closed
J-Moravec opened this issue Jun 11, 2020 · 6 comments
Labels
add to FAQ question with answer that should be added to FAQ question


@J-Moravec

I have an older mapped BAM file from a previous 10x experiment, but I don't have the original reference genome, so I need to remap the reads for the GATK SVN discovery pipeline.

However, this is scRNAseq data and I would like to preserve the BAM tags.

I used samtools fastq to get FASTQ files, which I then fed into STAR. I tried to preserve the tags by passing a list of them to the samtools -T option, which put the tags into the read header, but STAR seems to ignore them even when I specify --outSAMattributes.

Is there a way to remap the file while preserving the BAM tags?

Thanks

@alexdobin alexdobin added add to FAQ question with answer that should be added to FAQ question labels Jun 12, 2020
@alexdobin
Owner

Hi J-Moravec

you can use the BAM file as input to STAR directly, with
--readFilesType SAM SE --readFilesIn File.bam --readFilesCommand samtools view
This is for single-end reads, which is the case for 10X: the 2nd read is the barcode read, and it is recorded in the SAM tags CR, CY, UR and UY.
All the tags from the BAM file will be output to the new Aligned.out.bam file. This may create duplicate tags, so it may be advisable to filter out the unnecessary tags before mapping.
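One way to filter tags before mapping is to decode the BAM to SAM text, drop the unwanted optional fields, and re-encode it. The sketch below is a minimal illustration, not part of STAR: the whitelist `KEEP_TAGS` is a hypothetical choice that keeps the raw barcode/UMI tags mentioned above (CR, CY, UR, UY) plus the corrected ones 10X usually writes (CB, UB), and drops everything else, for example alignment tags such as NH, HI, AS and nM that an older alignment may carry and that STAR writes again by default.

```python
import sys

# Hypothetical whitelist: 10X barcode/UMI tags worth carrying over to the new alignment.
KEEP_TAGS = {"CR", "CY", "UR", "UY", "CB", "UB"}


def filter_sam_tags(line: str, keep=KEEP_TAGS) -> str:
    """Return one SAM line with only whitelisted optional tags kept.

    Header lines (starting with '@') are passed through unchanged.
    """
    if line.startswith("@"):
        return line
    fields = line.rstrip("\n").split("\t")
    # Columns 1-11 are the mandatory SAM fields; TAG:TYPE:VALUE fields follow.
    core, tags = fields[:11], fields[11:]
    kept = [t for t in tags if t[:2] in keep]
    return "\t".join(core + kept) + "\n"


def main() -> None:
    """Stream SAM text from stdin to stdout, dropping non-whitelisted tags."""
    for record in sys.stdin:
        sys.stdout.write(filter_sam_tags(record))
```

Saved as a script (hypothetical name `filter_tags.py`) with a `main()` call at the bottom, it could sit in a pipeline such as `samtools view -h old.bam | python filter_tags.py | samtools view -b -o filtered.bam -`, and `filtered.bam` would then go to `--readFilesIn`.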

For 10X data specifically, you can also use their own bamtofastq converter which will re-create read1/2 FASTQs:
https://support.10xgenomics.com/docs/bamtofastq

Cheers
Alex

@J-Moravec
Author

Ah, that would save me so much time!

I am aware of bamtofastq, but I don't quite understand how to feed the 3 different FASTQ files for each read group back to STAR. Could you help me here, please?

Thanks,
Jirka

@alexdobin
Owner

Hi Jirka,

I think there should be one fastq with cDNA read (~100b), one with barcode (26b or 28b) - those you feed to STAR. The 3rd one is probably the "index" read with Illumina library barcodes which is not really needed.
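Since the three files differ mainly in read length, one way to tell them apart is to look at the first record of each. This is a minimal sketch under assumptions: the cut-offs (26 or 28 b for the barcode read, at least 50 b for the cDNA read, anything else treated as the index read) are a hypothetical rule of thumb derived from the lengths quoted above, not something bamtofastq or STAR defines.

```python
import gzip


def first_read_length(path) -> int:
    """Length of the sequence line of the first FASTQ record (plain or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        fh.readline()                      # @read-name header line
        return len(fh.readline().strip())  # sequence line


def classify_fastq(path) -> str:
    """Guess the role of a 10X FASTQ from its first read length.

    Hypothetical thresholds: 26/28 b -> barcode (cell barcode + UMI),
    >= 50 b -> cDNA, anything else -> Illumina index read (not needed).
    """
    n = first_read_length(path)
    if n in (26, 28):
        return "barcode"
    if n >= 50:
        return "cdna"
    return "index"
```

Only the files classified as `cdna` and `barcode` would then be passed to STAR; checking more than one record would make the guess more robust if reads were trimmed.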

Cheers
Alex

@J-Moravec
Copy link
Author

Thanks. You are much more helpful than the 10x people themselves.

@J-Moravec
Copy link
Author

J-Moravec commented Jun 24, 2020

Just to be clear @alexdobin, when using the two read files (cDNA read FASTQ and barcode FASTQ), do I need to use STARsolo?

@alexdobin
Copy link
Owner

Hi Jiří,

--readFilesIn generally takes two read files. In the case of scRNA-seq, the first read should be the cDNA read and the 2nd read the barcode read. For bulk RNA-seq, without the STARsolo option, both reads are cDNA reads. So if you want to map scRNA-seq without STARsolo, you would supply only the cDNA read FASTQ.
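The rule above (cDNA first, barcode second with STARsolo; cDNA only without it) can be captured in a small helper that builds the read-input part of the STAR command line. This is a sketch, not an official wrapper: the function name is hypothetical, `--soloType CB_UMI_Simple` assumes a STAR version that accepts that name for 10X-style barcodes, and `--readFilesCommand zcat` assumes gzipped input on a system with `zcat`. A real STARsolo run also needs options such as `--genomeDir` and `--soloCBwhitelist`, which are left out here.

```python
def star_read_args(cdna, barcode=None, solo=False):
    """Build the read-input arguments for a STAR command line (hypothetical helper).

    With STARsolo, --readFilesIn takes the cDNA read first and the barcode
    read second; without STARsolo, only the cDNA read is supplied.
    """
    if solo:
        if barcode is None:
            raise ValueError("STARsolo needs the barcode read as the 2nd file")
        args = ["--readFilesIn", cdna, barcode, "--soloType", "CB_UMI_Simple"]
    else:
        args = ["--readFilesIn", cdna]
    # Assumes gzipped FASTQs are decompressed on the fly with zcat.
    if cdna.endswith(".gz"):
        args += ["--readFilesCommand", "zcat"]
    return args
```

For example, `star_read_args("R2.fastq.gz", "R1.fastq.gz", solo=True)` puts the cDNA file (R2 in typical 10X naming, itself an assumption) before the barcode file, which is the order STARsolo expects.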

Cheers
Alex

2 participants