How to represent Inversions #6
I'm confused as to how DNA can become reversed without being complemented. Can you give me an example?
I am not completely clear how this is possible either. That said, we should not limit ourselves to things we know are biologically normal. That would be like writing a FASTA spec that only allows biologically viable DNA sequences. I don't think inversions pose any problem if each node is treated as a virtual pair of nodes connected by a hidden link (we could imagine that this link carries the node's sequence label). Edges then always run from one end to another. You can also represent the deletion of a node without referring to its neighbors, which is very useful, and I can't see how this would cause problems for other uses. The simpler and more general we keep things, the fewer constraints the various uses will need to work around. However, if links have only + and - versions as they do now, we can't convey enough information to represent this. Graphviz has the DOT format, which can represent essentially any graph you can think up while staying simple to write for simple graphs. We should aim for that level of generality.
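A minimal sketch of the "node as a pair of ends" idea, assuming the usual bidirected convention that entering a node through its end side reads its sequence reverse-complemented. `Side`, `SideGraph` and `spell` are hypothetical names for illustration, not part of GFA or any existing tool, and the toy sequences are made up. Under this convention an inversion comes out reverse-complemented; a reverse-only reading would need a different traversal rule.

```python
from dataclasses import dataclass

# Complement table for the four canonical bases (assumption: plain ACGT input).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Side:
    node: str
    is_end: bool  # False = left side (start), True = right side (end)


class SideGraph:
    """Each node is a pair of sides joined by a hidden link carrying its sequence;
    every visible edge joins exactly two sides."""

    def __init__(self):
        self.seqs = {}      # node id -> forward-strand sequence
        self.edges = set()  # unordered pairs of sides

    def add_node(self, node: str, seq: str) -> None:
        self.seqs[node] = seq

    def add_edge(self, a: Side, b: Side) -> None:
        self.edges.add(frozenset((a, b)))

    def spell(self, walk):
        """walk is a list of (node, entered_at_start) pairs.

        Entering at the start side reads the sequence forward; entering at the
        end side reads it reverse-complemented (the bidirected convention).
        """
        out = []
        for i, (node, entered_at_start) in enumerate(walk):
            if i > 0:
                prev, prev_entered_at_start = walk[i - 1]
                # We leave the previous node through the side opposite the one we entered.
                exit_side = Side(prev, is_end=prev_entered_at_start)
                entry_side = Side(node, is_end=not entered_at_start)
                if frozenset((exit_side, entry_side)) not in self.edges:
                    raise ValueError(f"no edge between {exit_side} and {entry_side}")
            seq = self.seqs[node]
            out.append(seq if entered_at_start else reverse_complement(seq))
        return "".join(out)


g = SideGraph()
for name, seq in [("A", "ACG"), ("B", "GGA"), ("C", "TTT")]:
    g.add_node(name, seq)
g.add_edge(Side("A", True), Side("B", False))   # reference: A.end - B.start
g.add_edge(Side("B", True), Side("C", False))   # reference: B.end - C.start
g.add_edge(Side("A", True), Side("B", True))    # inversion: A.end - B.end
g.add_edge(Side("B", False), Side("C", False))  # inversion: B.start - C.start
g.add_edge(Side("A", True), Side("C", False))   # deletion of B: A.end - C.start

print(g.spell([("A", True), ("B", True), ("C", True)]))   # ACGGGATTT
print(g.spell([("A", True), ("B", False), ("C", True)]))  # ACGTCCTTT
print(g.spell([("A", True), ("C", True)]))                # ACGTTT
```

Every edge here simply names two sides, so an inversion needs no special operator: it is just two edges that attach to the "wrong" ends of `B`, and the deletion is one extra edge that skips `B`.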
These inversions happen when the molecular machinery goes wrong, and it's … I don't see a clean way of representing this without adding a new operator. Do you have a pointer to the ga4gh graph discussion about this?
I do not believe that it is possible for natural mutation to reverse a DNA sequence without also complementing it. It is not helpful to design a file format to handle cases that are not physically possible.
I've never heard of reverse-non-complement in vivo; chemically it makes no sense, since it would require breaking each individual 5'-to-3' bond and flipping it. The only time I've ever seen it used is as a control when searching for low-information regions within repetitive sequences. As a technical artifact arising in silico, though, it's easy to see.
From this diagram http://ghr.nlm.nih.gov/handbook/illustrations/inversion it looks like the region will be reverse-complemented.
Yes, correct: reversed and complemented. The …
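Since the thread hinges on the difference between reversing and reverse-complementing, here is a small sketch contrasting the two on a made-up segment. The helper names `reverse` and `reverse_complement` are illustrative; the complement table assumes plain ACGT/N input.

```python
# Watson-Crick base pairing; lowercase (soft-masked) bases keep their case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse(seq: str) -> str:
    """Reverse only: what a 'non-complemented inversion' would produce."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Reverse and complement: what a biological inversion produces,
    because the segment ends up read from the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


segment = "AACGT"
print(reverse(segment))             # TGCAA
print(reverse_complement(segment))  # ACGTT
```

In silico the reverse-only version is a one-line string slice, which is why it shows up easily as an artifact even though, as noted above, it has no chemical counterpart.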
I disagree that we should only design for things that are physically …
The use case that makes a lot of sense to me is describing the deletion of …
As for representing non-complemented inversions, it seems correct that …
@adamnovak, @benedictpaten, and @haussler have been strong proponents of …
Yes, that's correct. A deletion is represented like so:
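A minimal sketch of one way a deletion can look in GFA 1, assuming three hypothetical segments `A`, `B`, `C` where `B` is the segment that may be deleted, and blunt joins (`0M` overlaps). The parser below is a toy that reads only `S` and forward-strand `L` records; `parse` and `paths` are illustrative helpers, not part of any GFA library.

```python
# Hypothetical toy graph: segment B is present on one path and deleted on the other.
GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\tA\tACG",
    "S\tB\tGGA",
    "S\tC\tTTT",
    "L\tA\t+\tB\t+\t0M",
    "L\tB\t+\tC\t+\t0M",
    "L\tA\t+\tC\t+\t0M",  # the deletion: a single extra link that skips B
])


def parse(gfa_text):
    """Collect S sequences and forward-strand L links (other records ignored)."""
    segments, links = {}, {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            if fields[2] != "+" or fields[4] != "+":
                raise NotImplementedError("this sketch only handles + to + links")
            links.setdefault(fields[1], []).append(fields[3])
    return segments, links


def paths(links, start, end, prefix=()):
    """Depth-first enumeration of segment paths; assumes the graph is acyclic."""
    prefix = prefix + (start,)
    if start == end:
        yield list(prefix)
        return
    for nxt in links.get(start, []):
        yield from paths(links, nxt, end, prefix)


segments, links = parse(GFA)
for p in paths(links, "A", "C"):
    print(p, "".join(segments[s] for s in p))
# ['A', 'B', 'C'] ACGGGATTT
# ['A', 'C'] ACGTTT
```

Only one link is added for the deletion; the reference links through `B` stay untouched, so both alleles are spelled by the same graph.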
Biology has enough weirdness as it is. Let's prioritize handling the cases that are physically possible first.
Similarly, an (RC) inversion can be represented directly.
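A sketch of what "directly" can mean with the existing `+`/`-` orientations, assuming the same toy segments as a reference path `A+ B+ C+`: the inversion needs only the links `A + B -` and `B - C +`. The link set, `has_link` and `spell` are illustrative, and overlaps are assumed to be `0M`. The equivalence check follows the GFA 1 rule that a link and its flipped form (`A + B -` versus `B + A -`) denote the same adjacency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
FLIP = {"+": "-", "-": "+"}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


segments = {"A": "ACG", "B": "GGA", "C": "TTT"}

# L records as (from, from_orient, to, to_orient); all overlaps assumed 0M.
links = {
    ("A", "+", "B", "+"), ("B", "+", "C", "+"),  # reference
    ("A", "+", "B", "-"), ("B", "-", "C", "+"),  # inversion of B
}


def has_link(a, a_orient, b, b_orient):
    """GFA links are bidirected: 'A + B -' is the same link as 'B + A -'."""
    return ((a, a_orient, b, b_orient) in links
            or (b, FLIP[b_orient], a, FLIP[a_orient]) in links)


def spell(path):
    """Concatenate oriented segments, checking that each step follows a link."""
    out = []
    for i, (name, orient) in enumerate(path):
        if i > 0 and not has_link(*path[i - 1], name, orient):
            raise ValueError(f"no link from {path[i - 1]} to {(name, orient)}")
        seq = segments[name]
        out.append(seq if orient == "+" else reverse_complement(seq))
    return "".join(out)


print(spell([("A", "+"), ("B", "+"), ("C", "+")]))  # ACGGGATTT
print(spell([("A", "+"), ("B", "-"), ("C", "+")]))  # ACGTCCTTT
```

No intermediate nodes or new symbols are needed: the `-` orientation on `B` already means "reverse complement", which is exactly what a biological inversion produces.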
I think the case you are thinking of, adding intermediate nodes, does happen in de Bruijn graphs, but since you can specify …
This was raised in #3: how can the format represent inversions, and is this something we want?
GFA has three main use cases: assembly graphs, long reads, and variation graphs. Note that inversions are only explicitly needed in variation graphs.
An assembler would naturally construct two contigs for an inverted segment, and long reads would be unaffected.
The annoying part is that the inverted segments are not complemented, so we would need to come up with a new symbol or mechanism to denote this.
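To make the "new symbol" option concrete, here is a sketch with a HYPOTHETICAL third orientation `r` meaning "reverse, not complemented". Neither `r` nor the companion `c` (complement only) exists in any GFA specification; the names and toy sequences are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# "+" and "-" are the two GFA 1 orientations. "r" (reverse only) and "c"
# (complement only) are HYPOTHETICAL symbols, not part of any GFA specification.
ORIENT_OPS = {
    "+": lambda s: s,
    "-": lambda s: s.translate(COMPLEMENT)[::-1],
    "r": lambda s: s[::-1],
    "c": lambda s: s.translate(COMPLEMENT),
}
# Reading a path from the other strand maps each orientation to its partner.
FLIP = {"+": "-", "-": "+", "r": "c", "c": "r"}


def spell(segments, path):
    return "".join(ORIENT_OPS[o](segments[name]) for name, o in path)


def flip_path(path):
    """The same walk read from the opposite strand: reverse order, flip orientations."""
    return [(name, FLIP[o]) for name, o in reversed(path)]


segments = {"A": "ACG", "B": "GGA", "C": "TTT"}
rc_inversion = [("A", "+"), ("B", "-"), ("C", "+")]
reverse_only = [("A", "+"), ("B", "r"), ("C", "+")]

print(spell(segments, rc_inversion))             # ACGTCCTTT
print(spell(segments, reverse_only))             # ACGAGGTTT
print(spell(segments, flip_path(reverse_only)))  # AAACCTCGT
```

One design consequence the sketch surfaces: reading a reverse-only walk from the opposite strand turns `r` into complement-only, so keeping the format strand-symmetric would likely require two new symbols rather than one.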