about the manual : Resulting de Bruijn graph and customer set explanation #183

Open
TsorngWeiWu opened this issue Aug 15, 2018 · 2 comments

Comments

@TsorngWeiWu

(1)

I have read your manual and think the format of "de_bruijn_graph.dot" should be explained in more detail.

If I give a simple input, A.fasta: ATC and B.fasta: AT, and a custom parameter set as follows:

1
2 2

After executing the command Sibelia -k paraset -m 2 A.fasta B.fasta, the content of "de_bruijn_graph.dot" is:

[image: contents of de_bruijn_graph.dot]

I understand the meaning of the colors, but what about the rest, such as:

0->2?
1->0?
the content in curly brackets?

(2)
The example in the manual:
1st pair: ... K1 ABCD K2 ...
2nd pair: ... K1 FGHE K2 ...

If the distance between K1 and K2 within each pair is less than D, then "Sibelia" replaces FGHE with ABCD to obtain longer "synteny block":

1st pair: ... K1 ABCD K2 ...
2nd pair: ... K1 ABCD K2 ...

More concrete example. Suppose that K = 3, D = 5 and somewhere in the genome we find:

1st pair: ... act gaga ggc ...
2nd pair: ... act gatg ggc ...

As we see, distance between "act" and "ggc" is less than 5 nucleotides so we replace "gatg" by "gaga":

1st pair: ... act gaga ggc ...
2nd pair: ... act gaga ggc ...

My question is: what happens if there is also a 3rd pair ... act gattag ggc ...?

@iminkin
Member

iminkin commented Aug 22, 2018

Hi,

If I give a simple input, A.fasta: ATC and B.fasta: AT, and a custom parameter set as follows:

The sequences in the input files should be at least of length $k + 1$. My bad that Sibelia does not check for this; I will add it later. In your example, the second file contains a string of length 2.
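Until such a check exists in Sibelia itself, the length requirement can be verified before running it. The following is only a minimal sketch and not part of Sibelia: `parse_fasta` and `too_short` are hypothetical helper names, and it assumes that the parameter file line `2 2` means $k = 2$.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def too_short(records, k):
    """Return the headers of all sequences shorter than k + 1."""
    return [header for header, seq in records if len(seq) < k + 1]


# The two inputs from this issue, written as FASTA lines.
records = list(parse_fasta([">A", "ATC", ">B", "AT"]))
print(too_short(records, 2))
```

With $k = 2$ the sequence from B.fasta (length 2) is reported, while the one from A.fasta (length 3) passes. For real files, the lines can come from `open(path)`.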

I understand the meaning of the colors, but what about the rest, such as the content in curly brackets?

The language used in the output is DOT, there are online manuals describing it: https://en.wikipedia.org/wiki/DOT_(graph_description_language)
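In short, in DOT a statement like `0 -> 2` is a directed edge from vertex 0 to vertex 2, curly brackets enclose the body of the graph, and square brackets after an edge hold its attributes, such as the color. The sketch below reads edges out of a DOT string with a simple regular expression. The sample text is made up for illustration and is not actual Sibelia output, and `parse_edges` is a hypothetical helper, not a full DOT parser.

```python
import re

# Hypothetical DOT text for illustration only; not actual Sibelia output.
SAMPLE_DOT = """digraph G {
    0 -> 2 [color=red];
    1 -> 0 [color=blue];
}"""

# source -> target, optionally followed by an [attribute list]
EDGE = re.compile(r"(\w+)\s*->\s*(\w+)\s*(?:\[(.*?)\])?")


def parse_edges(dot_text):
    """Return (source, target, attributes) for every edge statement found."""
    edges = []
    for match in EDGE.finditer(dot_text):
        source, target, attrs = match.groups()
        edges.append((source, target, attrs or ""))
    return edges


for source, target, attrs in parse_edges(SAMPLE_DOT):
    print(f"edge from {source} to {target}, attributes: {attrs}")
```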

My question is: what happens if there is also a 3rd pair ... act gattag ggc ...?

Then the bubbles will be simplified one by one. Suppose that in your example D is 7:

Initially:

act gaga ggc
act gatg ggc
act gattag ggc

First step:

act gaga ggc
act gaga ggc
act gattag ggc

Second step:

act gaga ggc
act gaga ggc
act gaga ggc

The choice of sequences to fill the branches of the bubbles is arbitrary, but in this case it will result in the same synteny blocks no matter which branch is chosen.
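The steps above can be mimicked with a toy sketch on plain strings. This is not how Sibelia is implemented (it works on the de Bruijn graph); `simplify_bubbles` is a hypothetical function, it always keeps the first branch, and it assumes the distance between the flanking k-mers is simply the length of the branch between them.

```python
def simplify_bubbles(branches, max_distance):
    """Collapse branches one at a time onto the first branch.

    Returns the list of states: the initial one plus one per replacement.
    """
    current = list(branches)
    states = [list(current)]
    reference = current[0]  # arbitrary choice: keep the first branch
    for i in range(1, len(current)):
        both_short = (len(reference) < max_distance
                      and len(current[i]) < max_distance)
        if current[i] != reference and both_short:
            current[i] = reference
            states.append(list(current))
    return states


# The three branches between "act" and "ggc" from this thread, with D = 7.
for step, state in enumerate(simplify_bubbles(["gaga", "gatg", "gattag"], 7)):
    print("step", step)
    for branch in state:
        print("  act", branch, "ggc")
```

With D = 5 instead of 7, the branch "gattag" (6 nucleotides) is too long under this assumption, so only "gatg" is replaced.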

@iminkin
Member

iminkin commented Aug 22, 2018

I will think about how to improve the manual, thanks for your suggestions.
