How to represent Inversions #6

pmelsted · 2015-07-27T10:06:08Z

This was raised in #3: how can the format represent inversions, and is this something we want?

GFA has three main use cases (assembly graphs, long reads, and variation graphs), and it should be noted that inversions are only explicitly needed in variation graphs.

An assembler would naturally construct two contigs for an inverted segment and long reads would be unaffected.

The annoying part is that the inverted segments are not complemented, so we would need to come up with a new symbol or mechanism to denote this.

sjackman · 2015-07-27T23:38:22Z

I'm confused as to how DNA can become reversed without being complemented. Can you give me an example?

ekg · 2015-07-28T08:01:30Z

I am not completely clear how this is possible either.

That said, we should not limit ourselves to things that we know are biologically normal. That would be like writing a FASTA spec that says only biologically viable DNA sequences can be represented.

I don't think there is any problem for inversions if each node is like a virtual pair of nodes connected by a hidden link (we could imagine that this link carries the sequence label of the node). Then edges always come from one end and go to another.

You can also represent the deletion of a node without referring to its neighbors, which is very useful.
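The two-ended node model described above can be sketched roughly as follows. This is only an illustration: the side labels "L" and "R" and the helper names `link_to_sides` and `build_side_graph` are hypothetical, and the mapping assumes that a `+` or `-` orientation simply selects which end of a segment a link attaches to, which covers reverse-complement traversal only.

```python
from collections import defaultdict

# Hypothetical side labels: "L" is the start of a segment's forward
# sequence, "R" is its end. Neither label comes from the GFA format.

def link_to_sides(from_seg, from_orient, to_seg, to_orient):
    """Convert an oriented link into an edge between two segment ends."""
    # A segment read forward (+) is exited at its right end,
    # one read in reverse (-) is exited at its left end.
    out_side = "R" if from_orient == "+" else "L"
    # A segment read forward (+) is entered at its left end,
    # one read in reverse (-) is entered at its right end.
    in_side = "L" if to_orient == "+" else "R"
    return (from_seg, out_side), (to_seg, in_side)

def build_side_graph(links):
    """Build an undirected adjacency map keyed by (segment, side)."""
    adjacency = defaultdict(set)
    for link in links:
        a, b = link_to_sides(*link)
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

# Segment 0 links to both ends of segment 1: forward and inverted entry.
links = [("0", "+", "1", "+"), ("0", "+", "1", "-")]
graph = build_side_graph(links)
print(sorted(graph[("0", "R")]))
```

In this sketch every edge joins one end of a segment to one end of another, so the "hidden link" inside each node is implicit in the pair of sides that share a segment name.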

I also can't see how this would present a problem for other uses. The simpler and more general we keep things, the fewer constraints the various uses will need to work around.

However, if links have only + and - versions, as they do now, then we can't convey enough information to represent this.

Graphviz has the DOT format, which can represent pretty much any graph you can think up, and simple graphs are simple to write. We should aim for this level of generality.

pmelsted · 2015-07-28T10:50:50Z

These inversions happen when the molecular machinery goes wrong and it's usually bad for you.

I don't see a clean way of representing this without adding a new operator. We could use ~ (tilde) to denote non-complemented, so ~+ and ~- would mean ... ugh.

Do you have a pointer to the ga4gh graph discussion about this?

sjackman · 2015-07-28T21:45:20Z

I do not believe that it is possible by natural mutation to reverse a DNA sequence without also complementing it. It is not helpful to design a file format to handle cases that are not physically possible.

ababaian · 2015-07-28T21:56:05Z

I've never heard of reverse-non-complement in vivo. Chemically it makes no sense, since it would require breaking each individual 5' to 3' bond and flipping it. The only time I've ever seen it used is as a control when searching for low-information regions within repetitive sequences.

As a technical artifact arising in silico, though, that's easy to see.

pmelsted · 2015-07-28T22:10:10Z

From this diagram http://ghr.nlm.nih.gov/handbook/illustrations/inversion it looks like the region will be reverse-complemented.

sjackman · 2015-07-28T22:20:58Z

Yes, correct. Reverse and complemented. The - value of the orientation field indicates reversed and complemented.
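To make the distinction in this thread concrete, here is a minimal Python sketch contrasting the two operations. The function names `reverse_complement` and `reverse_only` are illustrative and not part of any GFA tooling.

```python
# Translation table that maps each base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """What a '-' orientation means: complement every base, then reverse."""
    return seq.translate(COMPLEMENT)[::-1]

def reverse_only(seq):
    """The non-complemented reversal discussed above (no '-' equivalent)."""
    return seq[::-1]

print(reverse_complement("AAC"))  # GTT
print(reverse_only("AAC"))        # CAA
```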

ekg · 2015-07-29T08:27:15Z

I disagree that we should only design for things that are physically possible. The graphs we are all working with have no natural chemical basis. No genome will ever look like an overlap or de Bruijn graph, so a design rule of this type would preclude everything we are doing. Maybe I am taking the metaphor too far though :)

The use case that makes a lot of sense to me is describing the deletion of an entire node. If we cannot describe which end edges go from and to, then this cannot be done in a node-local sense. You would need to add edges between the inbound and outbound nodes where an intermediary has been deleted and a path that skips it is required.

As for representing non-complemented inversions, it seems correct that another operator would be required to clarify this. I guess an extension of the CIGAR concept would be sufficient? The reason for not duplicating these as reverse-complemented sequences is to enable unambiguous alignment to and annotation of the graph. With minor extensions to the exchange format, the inversion can be encoded in the graph without duplication.

@adamnovak, @benedictpaten, and @haussler have been strong proponents of this idea and maybe could better clarify what I am describing.

sjackman · 2015-07-29T16:44:11Z

> You would need to add edges between the inbound and outbound nodes where an intermediary has been deleted and a path that skips it is required.

Yes, that's correct. A deletion is represented like so:
Path 11 is AAACCCATA
Path 12 is AAAATA

S 0 AAA
S 1 CCC
S 2 ATA
L 0 + 1 + 0M
L 0 + 2 + 0M
L 1 + 2 + 0M
P 11 0+,1+,2+ 0M,0M,0M
P 12 0+,2+ 0M,0M
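As a rough check of the example above, the following Python sketch parses the S, L and P lines and confirms that each path only steps across declared links. It is a simplification under stated assumptions: fields are split on any whitespace (as in this thread), the CIGAR columns are ignored, a link is only matched in the direction it is written, and `parse_gfa` and `path_follows_links` are hypothetical helper names.

```python
GFA = """
S 0 AAA
S 1 CCC
S 2 ATA
L 0 + 1 + 0M
L 0 + 2 + 0M
L 1 + 2 + 0M
P 11 0+,1+,2+ 0M,0M,0M
P 12 0+,2+ 0M,0M
"""

def parse_gfa(text):
    """Collect segments, links and path steps from whitespace-separated lines."""
    segments, links, paths = {}, set(), {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # (from segment, from orientation, to segment, to orientation)
            links.add((fields[1], fields[2], fields[3], fields[4]))
        elif fields[0] == "P":
            # "1+" becomes ("1", "+"): name is everything but the last char.
            paths[fields[1]] = [(s[:-1], s[-1]) for s in fields[2].split(",")]
    return segments, links, paths

def path_follows_links(steps, links):
    """True if every consecutive pair of steps matches a declared link."""
    return all(
        (a, a_orient, b, b_orient) in links
        for (a, a_orient), (b, b_orient) in zip(steps, steps[1:])
    )

segments, links, paths = parse_gfa(GFA)
print(path_follows_links(paths["11"], links))  # True
print(path_follows_links(paths["12"], links))  # True
# Without the extra 0+ -> 2+ link, the deletion path has nothing to follow.
print(path_follows_links(paths["12"], links - {("0", "+", "2", "+")}))  # False
```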

sjackman · 2015-07-29T16:47:01Z

> I disagree that we should only design for things that are physically possible.

Biology has enough weirdness as it is. Let's prioritize first handling the cases that are physically possible.

pmelsted · 2015-07-30T00:19:00Z

Similarly, a reverse-complement (RC) inversion can be represented directly:

S 0 AAA
S 1 CCC
S 2 ATA
L 0 + 1 + 0M
L 0 + 1 - 0M
L 1 + 2 + 0M
L 1 - 2 + 0M
P 11 0+,1+,2+ 0M,0M,0M
P 12 0+,1-,2+ 0M,0M,0M

I think the case you have in mind, adding intermediate nodes, does happen in de Bruijn graphs, but since you can specify 0M as the overlap you don't need them here.
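For illustration, the sequences spelled by the two paths in this example can be worked out with a short Python sketch. It assumes every overlap is 0M, so oriented segments are simply concatenated; `oriented_sequence` and `spell_path` are hypothetical helper names.

```python
# Translation table that maps each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def oriented_sequence(seq, orient):
    """Return seq as-is for '+', or its reverse complement for '-'."""
    return seq if orient == "+" else seq.translate(COMPLEMENT)[::-1]

def spell_path(segments, steps):
    """Concatenate oriented segments; valid only when all overlaps are 0M."""
    return "".join(oriented_sequence(segments[name], o) for name, o in steps)

segments = {"0": "AAA", "1": "CCC", "2": "ATA"}
path_11 = [("0", "+"), ("1", "+"), ("2", "+")]
path_12 = [("0", "+"), ("1", "-"), ("2", "+")]

print(spell_path(segments, path_11))  # AAACCCATA
print(spell_path(segments, path_12))  # AAAGGGATA
```

Path 12 visits segment 1 as `1-`, so CCC is read as GGG, and no second copy of the segment is needed to describe the inversion.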

sjackman added the question label Jul 29, 2015

sjackman self-assigned this Aug 6, 2015

sjackman closed this as completed Aug 6, 2015
