chromosomal sequence could not be extracted error #68
Hi Michael, you should normally see this warning only very rarely: it appears when a sequence aligns right to the very end (or, for reverse-strand alignments, the very start) of a reference sequence, which typically happens on the MT. The error arises because Bismark needs to extract 2 bp beyond the end of the alignment to determine the cytosine context, and this is obviously not possible when there are no further 2 bp available. If you are using very short reference sequences, such as amplicons, you may see this a lot more frequently. As a quick fix, you could pad the sequences in your genome file with 2 bases of your liking on either side, e.g.: If you use NN, you make sure that the context of terminal Cs is reported as Unknown, which does not obfuscate further downstream events. I hope this helps,
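A minimal sketch of the padding workaround, assuming a plain multi-FASTA reference such as the attached myRef.fasta. The script name `pad_fasta.py` and its functions are hypothetical, not part of Bismark; it adds the same pad (NN by default, following the suggestion above) to both ends of every sequence and rewraps lines at 60 characters.

```python
"""Pad every sequence in a FASTA file on both ends (hypothetical pad_fasta.py).

Sketch of the workaround described above; "NN" as the default pad follows
the suggestion to use N so that terminal Cs get an Unknown context.
"""
import sys
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a multi-FASTA stream."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pad_fasta(src: TextIO, dst: TextIO, pad: str = "NN", width: int = 60) -> None:
    """Write each record with `pad` added to the start and end of its sequence."""
    for header, seq in read_fasta(src):
        padded = pad + seq + pad
        dst.write(f">{header}\n")
        for i in range(0, len(padded), width):
            dst.write(padded[i:i + width] + "\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python pad_fasta.py myRef.fasta > myRef.padded.fasta
    with open(sys.argv[1]) as fh:
        pad_fasta(fh, sys.stdout)
```

After padding, bismark_genome_preparation has to be run again on the padded file, and any positions reported against the padded reference are shifted by the pad length (2 bp for NN).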
Fixed! Thanks again!
Wow, that was quick! Thanks, Felix
Dear Felix, I am doing amplicon sequencing and facing the same problem. I tried padding with NN, and it worked for some sequences but not for others. After using
Hi sansense, the ** ** is indeed the markdown way of making things bold, but apparently this doesn't work in a code block, so I have edited it out of the previous comment. And yes, this should only appear when sequences align to the very edges of chromosomes and/or scaffolds (as Bismark does not perform soft-clipping). When you say you have padded your sequences, did you do this at both the start and the end?
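To illustrate why only alignments touching the very edges are affected, here is a toy model of cytosine context calling. It is an assumption for illustration, not Bismark's actual implementation: the context of a top-strand C is read from the C plus the next two reference bases, so a C in the last two positions has no context at all, while padding with NN turns that into an Unknown context. For reverse-strand alignments the same problem arises at the start of the sequence. The function name `cytosine_context` is hypothetical.

```python
"""Toy model of cytosine context calling at the edge of a reference.

Assumption for illustration only: the context of a top-strand C at 0-based
position `pos` is taken from the C plus the next two reference bases.
"""
from typing import Optional


def cytosine_context(ref: str, pos: int) -> Optional[str]:
    """Return 'CpG', 'CHG', 'CHH' or 'Unknown', or None if there is too
    little downstream sequence (the case behind the warning)."""
    if not 0 <= pos < len(ref) or ref[pos].upper() != "C":
        raise ValueError(f"no C at position {pos}")
    tri = ref[pos:pos + 3].upper()
    if len(tri) < 3:
        return None  # fewer than 2 bp follow the C: context cannot be extracted
    if tri[1] == "G":
        return "CpG"
    if "N" in tri[1:]:
        return "Unknown"  # an N (e.g. from padding) makes the context ambiguous
    return "CHG" if tri[2] == "G" else "CHH"


if __name__ == "__main__":
    amplicon = "ACGTTCAGCC"
    print(cytosine_context(amplicon, 9))                # C is the last base
    print(cytosine_context("NN" + amplicon + "NN", 11))  # same C after padding
```

The first call returns None (no context can be extracted), the second returns Unknown, which is why padding with N silences the warning without inventing a CpG/CHG/CHH call.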
Yes, I did it at both the start and the end, but I still get the same error.
Since Bismark doesn't do soft-clipping, would you suggest padding with more than 2 bases? If so, how many would be optimal? My read length is 301 bp.
The padding of short sequences really only affects the context calling. If your sequences or reads are too long in general, they should simply not align at all. Maybe you should check in a bit more detail what is going on for one or two sequences: Bismark should report the ID of the sequence, so you can grep for it in the FastQ file (e.g. using
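As an alternative to grep, a minimal sketch for pulling one read out of a FastQ file by the ID reported in the warning. It assumes a strict 4-line FASTQ layout and that the ID matches the first whitespace-delimited token of the header line; depending on how the ID appears in your Bismark output, you may need to adjust that comparison. The script name `find_read.py` and the function `find_read` are hypothetical.

```python
"""Print one FASTQ record by read ID (hypothetical find_read.py).

Assumes a strict 4-line FASTQ layout: @header, sequence, '+', quality.
The ID is compared with the first whitespace-delimited token of the header.
"""
import gzip
import sys
from typing import Optional, TextIO, Tuple


def find_read(handle: TextIO, read_id: str) -> Optional[Tuple[str, str, str]]:
    """Return (header, sequence, quality) for `read_id`, or None if absent."""
    while True:
        header = handle.readline()
        if not header:
            return None  # end of file, read not found
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\r\n")
        tokens = header[1:].split() if header.startswith("@") else []
        if tokens and tokens[0] == read_id:
            return header.rstrip("\r\n"), seq, qual


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_read.py reads.fastq.gz READ_ID
    path, wanted = sys.argv[1], sys.argv[2]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        record = find_read(fh, wanted)
    if record is None:
        sys.exit(f"{wanted} not found in {path}")
    print("\n".join((record[0], record[1], "+", record[2])))
```

The sequence it prints can then be compared by eye (or by a local alignment) against the padded reference to see where it lands relative to the sequence ends.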
I'm using a customized reference sequence.
When running Bismark, I'm getting tons of errors like "Chromosomal sequence could not be extracted for ... BIA 1"
Is it because my reference file is too short?
One example entry looks like this:
I'm using the default settings for bismark_genome_preparation and bismark.
Attached is my reference, as well as the bismark log.
myRef.fasta.txt
1.log.txt
Thanks in advance!
Michael