Provided example bams are invalid #8

Open
yfarjoun opened this issue Dec 3, 2021 · 2 comments

yfarjoun commented Dec 3, 2021

Hello,

Using samtools v1.14, the BAM supplied in the example appears to be invalid. Running the example code results in the following error:

[E::sam_hdr_sanitise] Malformed SAM header at line 2
samtools index: failed to create index for "example/data/control_scramble_3_unique_kcnq2.bam"

When manually inspecting the BAM, it seems that the second header line is missing its @RG record type:

$ zless example/data/control_scramble_3_unique_kcnq2.bam
BAM,@HD	VN:1.3	SO:coordinate
ID:control_scramble_3	PL:Illumina
@SQ	SN:chr1	LN:248956422
@SQ	SN:chr2	LN:242193529
<SNIP>
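
To check whether other header lines (or other example files) have the same problem, a minimal sketch along these lines could flag records that lack a type code. It assumes the header has already been extracted to plain text, one record per line, and is fed on stdin; `check_sam_header` is a hypothetical helper name, not part of samtools.

```shell
# Hypothetical helper: report SAM header lines that do not start with one of
# the record type codes (@HD, @SQ, @RG, @PG, @CO) followed by a tab.
# Exits non-zero if any such line is found.
check_sam_header() {
  awk '
    !/^@(HD|SQ|RG|PG|CO)\t/ {
      printf "line %d: missing or unknown record type: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }
  '
}

# Example input modelled on the first three header lines shown above.
printf '@HD\tVN:1.3\tSO:coordinate\nID:control_scramble_3\tPL:Illumina\n@SQ\tSN:chr1\tLN:248956422\n' \
  | check_sam_header || echo "header needs repair"
```

On this input the helper reports line 2, the same line samtools complains about.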
yfarjoun commented Dec 3, 2021

Looking at the @PG line, it seems that during BAM generation the STAR aligner was run with --outSAMheaderHD ID:control_scramble_2 PL:Illumina instead of the --outSAMattrRGline argument.

From the STAR aligner manual (PDF):

--outSAMattrRGline
    default: -
    string(s): SAM/BAM read group line. The first word contains the read group
    identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy
    "DS:z z z".
    xxx will be added as RG tag to each output alignment. Any spaces in the tag
    values have to be double quoted.
    Comma separated RG lines correspond to different (comma separated) input
    files in --readFilesIn. Commas have to be surrounded by spaces, e.g.
    --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy

--outSAMheaderHD
    default: -
    strings: @HD (header) line of the SAM header
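
For illustration, a STAR call that passes the read group through --outSAMattrRGline might look roughly like the sketch below. The index path, FASTQ name, output prefix, and the `star_with_rg` wrapper are all assumptions made for the example; the parameters actually used to build the example BAMs are not known from this thread. The function only prints the command so it can be reviewed before running.

```shell
# Hypothetical wrapper: print a STAR command line that sets the read group
# via --outSAMattrRGline. Paths and file names below are placeholders.
star_with_rg() {
  sample=$1
  echo STAR \
    --genomeDir /path/to/star_index \
    --readFilesIn "${sample}.fastq.gz" \
    --readFilesCommand zcat \
    --outSAMtype BAM SortedByCoordinate \
    --outFileNamePrefix "${sample}_" \
    --outSAMattrRGline "ID:${sample}" PL:Illumina
}

# Dry run: shows the command instead of executing STAR.
star_with_rg control_scramble_3
```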

yfarjoun changed the title from "provided example bam is invalid" to "Provided example bams are invalid" on Dec 3, 2021
yfarjoun commented Dec 3, 2021

I can fix the bam files with a samtools reheader command:

e.g.:

zless example/data/TDP43_knockdown_1_unique_kcnq2.bam   | \
  sed -n '1{s/^.*@HD/@HD/p}; 2{s/^/@RG\t/p}; 3,/@CO/p' > TDP43_knockdown_1.header.sam
samtools reheader TDP43_knockdown_1.header.sam example/data/TDP43_knockdown_1_unique_kcnq2.bam > temp.bam
mv temp.bam example/data/TDP43_knockdown_1_unique_kcnq2.bam
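
The sed expression above assumes the broken record is exactly line 2, and `\t` in a sed replacement may not be treated as a tab by non-GNU sed. A somewhat more portable sketch of just the header-repair step, assuming the only defect is a read-group record missing its `@RG` prefix (a line starting with `ID:`), could be written with awk. `fix_rg_header` is a hypothetical helper that works on plain header text; its output would still be passed to samtools reheader as in the command above.

```shell
# Hypothetical helper: prepend "@RG<TAB>" to header lines that start with
# "ID:" (a read-group record that lost its type code); pass all other
# lines through unchanged.
fix_rg_header() {
  awk '
    /^ID:/ { print "@RG\t" $0; next }   # restore the missing record type
    { print }                           # every other line is kept as-is
  '
}

# Example input modelled on the broken header shown earlier in this issue.
printf '@HD\tVN:1.3\tSO:coordinate\nID:control_scramble_3\tPL:Illumina\n@SQ\tSN:chr1\tLN:248956422\n' \
  | fix_rg_header
```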
