ema align output SAM parsing error #39
Hi @pontushojer, Couple quick questions:
Hi @arshajii, No worries!
I am running a pre-built version from conda, version 0.6.2 build h8b12597_1.
The datasets have each been about 400-500 M read pairs, and so far I have had issues with about three of them. I have so far been unable to generate a smaller dataset that replicates the issue: if I extract the read pairs for barcodes surrounding the entry that causes the error in the full dataset, the run completes without error. I will keep trying to generate a subset since, as you say, it would help narrow this down. I can check about sending a full dataset...
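The barcode-based extraction described above could be sketched roughly as follows. This is only an illustration, not the script actually used: it assumes plain four-line FASTQ records in which every read header carries a `BX:Z:<barcode>` tag, and the names `fastq_records` and `subset_by_barcode` are hypothetical.

```python
import re

# Assumption: the barcode is stored in the header as "BX:Z:<barcode>".
BX_RE = re.compile(r"BX:Z:(\S+)")


def fastq_records(lines):
    """Group an iterable of FASTQ lines into 4-line records."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def subset_by_barcode(lines, barcodes):
    """Yield only the records whose BX:Z barcode is in `barcodes`."""
    for header, seq, plus, qual in fastq_records(lines):
        match = BX_RE.search(header)
        if match and match.group(1) in barcodes:
            yield header, seq, plus, qual
```

Records without a `BX:Z` tag are dropped by this sketch; whether that is appropriate depends on how the aligner treats non-barcoded reads.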
@arshajii I have now managed to generate a smaller subset that can recreate the issue. Running the following:
outputs this to the
If I skip the pipe to
As you see the cigars are
Hope this helps to locate the issue!
Subset: failing.fastq.gz
Hi @arshajii. I was wondering if you have had the opportunity to look into this issue after I posted the subset?
I have been running ema (version 0.6.2) on reads in the longranger basic FASTQ format (BX:Z in header). I pipe the output directly to samtools sort. My command looks something like this:
Mostly this has been working fine, but every other run has failed because samtools sort gets a parsing error.
Either this one:
Or this one:
Both are related to the CIGAR, so I think there is some formatting error that creeps in every now and then. I looked at the SAM entries that caused the error in one of the runs and found some strange things. Below are four lines, of which the third (marked in bold) is causing the error.
As you can see, this SAM entry is just plain strange. It also contains a ^@ character for some reason. Interestingly, the SEQ and QUAL strings for this entry do not belong to the read with the QNAME ST-E00266:342:HYW32CCXY:1:1211:12439:69625:TTTGTTCCCTAAGTAACACG; instead they belong to the first entry in my example, named ST-E00266:342:HYW32CCXY:1:1212:14001:52854:AAAAAAAAAAAAAAAAAATG. This is also the read pair just before the one causing the error in my FASTQ.
I have tried to replicate the error on a smaller subset of my data but have so far been unsuccessful. For example, if I take the FASTQ entries corresponding to the failed SAM entries shown above, I don't get any error. So somehow this only happens when running the full dataset.
Do you have any idea what could be causing this?
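To locate such records without waiting for samtools sort to fail, the aligner output could be screened with a small check like the one below. It is a minimal sketch under stated assumptions: the ^@ shown above is taken to be a NUL byte (its usual caret notation), the CIGAR pattern follows the SAM specification, and the function names `check_sam_line` and `scan_sam` are hypothetical helpers, not part of ema or samtools.

```python
import re

# CIGAR per the SAM specification: "*" or one or more <length><op> pairs.
CIGAR_RE = re.compile(r"\*|(?:[0-9]+[MIDNSHP=X])+")
CIGAR_OP_RE = re.compile(r"([0-9]+)([MIDNSHP=X])")
# Operations that consume bases of the read (SEQ).
QUERY_OPS = frozenset("MIS=X")


def check_sam_line(line):
    """Return a list of problems found in one SAM alignment line."""
    problems = []
    if "\x00" in line:
        problems.append("NUL character in line")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        problems.append("fewer than 11 mandatory fields")
        return problems
    cigar, seq, qual = fields[5], fields[9], fields[10]
    if CIGAR_RE.fullmatch(cigar) is None:
        problems.append("malformed CIGAR")
    elif cigar != "*" and seq != "*":
        # Sum the lengths of the operations that consume read bases.
        query_len = sum(
            int(n) for n, op in CIGAR_OP_RE.findall(cigar) if op in QUERY_OPS
        )
        if query_len != len(seq):
            problems.append("CIGAR length does not match SEQ length")
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        problems.append("SEQ and QUAL lengths differ")
    return problems


def scan_sam(lines):
    """Yield (line_number, problems) for every suspicious alignment line."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("@"):  # skip header lines
            continue
        problems = check_sam_line(line)
        if problems:
            yield number, problems
```

Passing the lines of the aligner's SAM output to `scan_sam` would report the line numbers of malformed entries, which makes it easier to find the neighbouring reads in the FASTQ. It does not detect SEQ and QUAL strings that are well-formed but belong to a different read.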