This repository has been archived by the owner on Jan 28, 2020. It is now read-only.

exome seq: Error parsing SAM header #6

Closed
johnbradley opened this issue Jun 26, 2017 · 4 comments

Comments

@johnbradley
Contributor

When running the exome seq pipeline, I got this error at the job sort step.
It seems to be related to the read_group_header.

Details of the error.

[Mon Jun 26 14:38:22 UTC 2017] picard.sam.SortSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=253231104
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. 
Problem parsing @PG key:value pair ID:1 clashes with ID:bwa. Line:
@PG ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:bwa mem -R @RG   ID:1    LB:LIBRARY  PL:illumina 
SM:sample1  PU:AB1234 -t 8 
/var/lib/cwl/stge2216d72-3f93-4c68-b9d8-0bcbc77abc9f/human_g1k_v37.fasta 
/var/lib/cwl/stg2c10d4eb-e5ae-4ca2-af5d-265b7e9c0960/SRR099967_1_first_100000_val_1.fq.gz
 /var/lib/cwl/stg3d122656-3166-4580-a1c469aef96956da/SRR099967_2_first_100000_val_2.fq.gz;
 ; Line number 86
    at 
htsjdk.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:238)

If I change the ID to bwa in the read_group_header, the error doesn't occur.

@dleehr
Member

dleehr commented Jun 26, 2017

Trying to debug/diagnose. The command line that runs is:

[job sort] /home/ubuntu/bespin-cwl/data/exomeseq-01-preprocessing-cache/8edc5d76c8aa6619aa1b2b60276ed4bf$ docker \
    run \
    -i \
    --volume=/home/ubuntu/bespin-cwl/data/exomeseq-01-preprocessing-cache/8edc5d76c8aa6619aa1b2b60276ed4bf:/var/spool/cwl:rw \
    --volume=/tmp/tmpZBxMQd:/tmp:rw \
    --volume=/home/ubuntu/bespin-cwl/data/exomeseq-01-preprocessing-cache/c8bfecf40918361b1da6f945dd85712e/mapped.bam:/var/lib/cwl/stge2085ee2-2ffa-4855-9a23-7656e2fd722c/mapped.bam:ro \
    --workdir=/var/spool/cwl \
    --read-only=true \
    --user=1000 \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/var/spool/cwl \
    dukegcb/picard \
    java \
    -jar \
    /usr/picard/picard.jar \
    SortSam \
    INPUT= \
    /var/lib/cwl/stge2085ee2-2ffa-4855-9a23-7656e2fd722c/mapped.bam \
    OUTPUT= \
    sorted.bam \
    SORT_ORDER= \
    coordinate

@dleehr
Member

dleehr commented Jun 26, 2017

See also lh3/bwa#83. The issue was that we passed the read group to bwa mem with literal tab characters on the command line. When bwa writes its output header, it records the full command line in the CL field of the @PG line. Since our command line contained tabs, picard later interpreted those tabs as key:value delimiters when it read the header, so ID:1 from the read group was parsed as a second ID on the @PG line and clashed with ID:bwa.

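The solution is to escape the backslashes in the CWL JSON for the read_group_header, so that bwa receives the two characters \t rather than a real tab (bwa mem -R converts \t to a tab itself when it writes the @RG line):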

"read_group_header": "@RG\tID:1\tLB:LIBRARY\tPL:illumina\tSM:sample1\tPU:AB1234"

becomes

"read_group_header": "@RG\\tID:1\\tLB:LIBRARY\\tPL:illumina\\tSM:sample1\\tPU:AB1234"
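The difference between the two forms can be checked with any JSON parser. The following is a minimal sketch in Python, not part of the pipeline; the variable names are made up for illustration. It shows that the single-backslash form decodes to real tab characters, while the double-backslash form decodes to a literal backslash followed by t.

```python
import json

# The two variants of the job input, written as raw JSON text.
# Raw strings (r'...') keep the backslashes exactly as they appear in the file.
single = r'{"read_group_header": "@RG\tID:1\tLB:LIBRARY\tPL:illumina\tSM:sample1\tPU:AB1234"}'
double = r'{"read_group_header": "@RG\\tID:1\\tLB:LIBRARY\\tPL:illumina\\tSM:sample1\\tPU:AB1234"}'

rg_single = json.loads(single)["read_group_header"]
rg_double = json.loads(double)["read_group_header"]

# JSON turns \t into a real tab, which then ends up on the bwa command line.
print("single backslash contains a tab:", "\t" in rg_single)
# JSON turns \\t into backslash + t, so no tab reaches the command line.
print("double backslash contains a tab:", "\t" in rg_double)
```

With the second form, the command line recorded in @PG CL contains no tab characters.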

In short, bwa wrote a @PG (program) header line whose CL field contained tabs, because the command line itself contained tabs.
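To illustrate why the tabs break header parsing, here is a rough sketch of a tab-splitting header parser. It is a simplified stand-in written for this thread, not htsjdk's actual code; the function name and the shortened ref.fasta path are placeholders. It assumes a repeated key only counts as a clash when the values differ, which would be consistent with the observation above that setting the read group ID to bwa avoids the error.

```python
def parse_header_line(line):
    """Split a SAM header line on tabs into a record type and key:value tags.

    Simplified illustration only: raises ValueError when a key repeats
    with a different value.
    """
    record_type, *fields = line.rstrip("\n").split("\t")
    tags = {}
    for field in fields:
        key, _, value = field.partition(":")
        if key in tags and tags[key] != value:
            raise ValueError(f"{key}:{value} clashes with {key}:{tags[key]}")
        tags[key] = value
    return record_type, tags


# @PG line as written when the command line contained real tabs (path shortened).
bad = ("@PG\tID:bwa\tPN:bwa\tVN:0.7.12-r1039\tCL:bwa mem -R @RG\tID:1\tLB:LIBRARY"
       "\tPL:illumina\tSM:sample1\tPU:AB1234 -t 8 ref.fasta")
# Same line when the command line contained a literal backslash-t instead.
good = ("@PG\tID:bwa\tPN:bwa\tVN:0.7.12-r1039\tCL:bwa mem -R @RG\\tID:1\\tLB:LIBRARY"
        "\\tPL:illumina\\tSM:sample1\\tPU:AB1234 -t 8 ref.fasta")

try:
    parse_header_line(bad)
except ValueError as err:
    print("bad header:", err)

print("good header:", parse_header_line(good)[1]["ID"])
```

In the good line the whole command stays inside the single CL value, so the only ID on the @PG line is bwa.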

@dleehr
Member

dleehr commented Jun 26, 2017

Fixed on exome-seq in 4f868d2

@dleehr
Member

dleehr commented Aug 11, 2017

Fixed by #11 in aforementioned commit
