chromosomal sequence could not be extracted error #68

MichaelMW · 2016-10-14T21:36:08Z

I'm using a customized reference sequence.

When running bismark, I'm getting tons of error like "Chromosomal sequence could not be extracted for ... BIA 1"

Is it because my reference file is too short?
One example entry looks like this:

BIA
TTGGAGTTGAGTGGGTTGATGTAGGGTGTGACTAGTCGCCTGAGTGCCCAAAAGACGCCCTCAAGGACAGGGAAGCAGCTGCGATTCACTCAGCGATCACCAGCGCCCAGAACACCGTTTTTCAGGGCCGCCCAAGCGGAGGAGCAACACGTGGCCCAGCAAAAGGAGCTGGGTCCCGTGCAGCACAGCAACAAGAAACTGGCCTAGGAAGACGCTGGAGCGTCTTTCTTTCGCACGGAATTCAAGAGGGAGGAGGAGTAGGAAGGATTAGGTG

I'm using the default bismark_genome_preparation and bismark

bismark_genome_preparation --path_to_bowtie ~/tools/bowtie2-2.2.9/ --verbose myRef

Attached is my reference, as well as the bismark log.

myRef.fasta.txt
1.log.txt

Thanks in advance!

Michael

The text was updated successfully, but these errors were encountered:

FelixKrueger · 2016-10-14T22:30:45Z

Hi Michael,

You normally only see this warning message very rarely whenever a sequence aligns right to the vert end (or start for reverse sequences) of a reference genome, and typically this is on the MT. The error arises because Bismark would like to extract 2 bp further downstream of the sequence to determine the cytosine context but this is obviously not possible when there are no further 2bp available. If you are using very short sequences, such as amplicons, you may get these sequences a lot more frequently.

As a quick fix you could pad the sequences in your genome file with 2 bases of your liking on either side, e.g.:
>BIA NNTTGGAGTTGAGTGGGTTGATGTAGGGTGTGACTAGTCGCCTGAGTGCCCAAAAGACGCCCTCAAGGACAGGGAAGCAGCTGCGATTCACTCAGCGATCACCAGCGCCCAGAACACCGTTTTTCAGGGCCGCCCAAGCGGAGGAGCAACACGTGGCCCAGCAAAAGGAGCTGGGTCCCGTGCAGCACAGCAACAAGAAACTGGCCTAGGAAGACGCTGGAGCGTCTTTCTTTCGCACGGAATTCAAGAGGGAGGAGGAGTAGGAAGGATTAGGTGNN

If you use NN you make sure that the sequence context of terminal Cs would be found as Unknown which does not obfuscate further downstream events. I hope this helps,
Best, Felix

MichaelMW · 2016-10-14T22:34:38Z

Fixed! Thanks again!

FelixKrueger · 2016-10-14T22:36:52Z

Wow that was quick! Thanks, Felix

sansense · 2016-12-22T19:49:58Z

Dear Felix,
Did you suggest **NN** to pad the sequences (vs. NN) or is it just a formatting issue of github? What is the meaning of '**' in the former case?

I am doing amplicon sequencing and facing the same problem. I tried padding NN, and it worked for some sequences but then it didn't for some other. After using "**NN**", it works for many of the other sequences, however, it still shows the error "chromosomal sequence could not be extracted error". Could you elaborate further on this? My specific Q: is this error specific to NOT getting the cytosine context at the end of reference, or can it also occur in other case like when the read only partially maps to reference (and so there is not enough useful sequence remaining)?

FelixKrueger · 2016-12-22T20:03:49Z

Hi sansense, the ** ** is indeed the markdown way of making things bold, but apparently this doesn't work in a code block. I have therefore edited it out of the previous comment.

And yes this should only appear for when sequences align to the very edges of chromosomes and/or scaffolds (as Bismark does not perform soft-clipping). When you say you have padded your sequences, did you do this on the start and end?

sansense · 2016-12-22T20:04:42Z

Yes, I did it at both start and the end, but still get the same error

sansense · 2016-12-22T20:09:10Z

Since Bismark doesnt do soft clipping, do you suggest to pad more than 2-bases? Optimally how many? My read length is 301

FelixKrueger · 2016-12-22T20:18:05Z

The padding at short sequences really only affects the context calling. If your sequences or reads are too long in general they should simply not align at all. Maybe you should go and check in a bit more detail for one or two sequences what is going on, Bismark should report the ID of the sequence so you can grep for the sequence in the FastQ file (e.g. using zcat file.fastq.gz | grep -A 3 readID > test.fastq), and then align this sequence on its own (or using Bowtie2 against the Bismark converted genome sequence). I hope this helps.

MichaelMW closed this as completed Oct 14, 2016

FelixKrueger mentioned this issue Feb 27, 2017

Bismark alignment - Chromosomal sequence could not be extracted for --- question #94

Closed

jrmerritt6 mentioned this issue Oct 11, 2017

Cannot map PEAR merged reads #134

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chromosomal sequence could not be extracted error #68

chromosomal sequence could not be extracted error #68

MichaelMW commented Oct 14, 2016 •

edited

FelixKrueger commented Oct 14, 2016 •

edited

MichaelMW commented Oct 14, 2016

FelixKrueger commented Oct 14, 2016

sansense commented Dec 22, 2016 •

edited

FelixKrueger commented Dec 22, 2016

sansense commented Dec 22, 2016

sansense commented Dec 22, 2016

FelixKrueger commented Dec 22, 2016

chromosomal sequence could not be extracted error #68

chromosomal sequence could not be extracted error #68

Comments

MichaelMW commented Oct 14, 2016 • edited

FelixKrueger commented Oct 14, 2016 • edited

MichaelMW commented Oct 14, 2016

FelixKrueger commented Oct 14, 2016

sansense commented Dec 22, 2016 • edited

FelixKrueger commented Dec 22, 2016

sansense commented Dec 22, 2016

sansense commented Dec 22, 2016

FelixKrueger commented Dec 22, 2016

MichaelMW commented Oct 14, 2016 •

edited

FelixKrueger commented Oct 14, 2016 •

edited

sansense commented Dec 22, 2016 •

edited