Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GFF3 coordinate offsets don't make sense #112

Open
jblachly opened this issue Oct 1, 2021 · 3 comments
Open

GFF3 coordinate offsets don't make sense #112

jblachly opened this issue Oct 1, 2021 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@jblachly
Copy link
Member

jblachly commented Oct 1, 2021

When looking at this unit test:

unittest{
import dhtslib.coordinates;
auto rec = GTFRecord("chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tID \"ENSG00000223972.5\" ; gene_id ENSG00000223972.5 ; gene_id ENSG00000223972.5 ; gene_type transcribed_unprocessed_pseudogene ; gene_name DDX11L1 ; level 2 ; havana_gene OTTHUMG00000000961.2"); // @suppress(dscanner.style.long_line)
auto rec_neg= GTFRecord("chr1\tHAVANA\tgene\t11869\t14409\t.\t-\t.\tID \"ENSG00000223972.5\" ; gene_id ENSG00000223972.5 ; gene_id ENSG00000223972.5 ; gene_type transcribed_unprocessed_pseudogene ; gene_name DDX11L1 ; level 2 ; havana_gene OTTHUMG00000000961.2"); // @suppress(dscanner.style.long_line)
assert(rec.seqid=="chr1");
assert(rec.source=="HAVANA");
assert(rec.type=="gene");
assert(rec.start==11_869);
assert(rec.end==14_409);
assert(rec.score==-1.0);
assert(rec.strand()=='+');
assert(rec.phase==-1);
assert(rec["ID"] == "ENSG00000223972.5");
assert(rec["gene_id"] == "ENSG00000223972.5");
assert(rec["gene_type"] == "transcribed_unprocessed_pseudogene");
assert(rec["gene_name"] == "DDX11L1");
assert(rec["level"] == "2");
assert(rec["havana_gene"] == "OTTHUMG00000000961.2");
assert(rec.length == 2541);
assert(rec.relativeStart == 1);
assert(rec.relativeEnd == 2541);
// Test forward and backward offsets
assert(rec.coordinateAtOffset(2) == 11_870);
assert(rec_neg.coordinateAtOffset(2) == 14_408);

I was confused that an offset of 2 would translate from 11869 -> 11870

Looking at the code its clear that this is a mistake:

    /// Genomic coordinate at offset into feature, taking strandedness into account
    @property coordinateAtOffset(long offset) const
    {
        // GFF3 features are 1-based coordinates
        assert(offset > 0);
        offset--;   // necessary in a 1-based closed coordinate system
        
        // for - strand count from end; for +, ., and ? strand count from start
        immutable auto begin = (this.strand == '-' ? this.end : this.start);


        // for - strand count backward; for +, ., and ? strand count forward
        immutable auto direction = (this.strand == '-' ? -1 : 1);


        return OB(begin + (direction * offset));
    }

Agree that GFF3 is 1-based, and agree that GFF3 features are 1-based coordinates, but since you are dealing with OFFSETS (deltas), zero is still a reasonable input value.

Function should be revised to take >= 0 values, and should not offset--. Tests will then have to be updated.

@jblachly jblachly added the bug Something isn't working label Oct 1, 2021
@charlesgregory
Copy link
Contributor

Yeah all this code was from a while ago. When it was part of gff3d we used the offsets as 1-based (even though that doesn't really make sense). There are actually a couple more functions that are implemented incorrectly.

This should be OB(this.length - 1) since the end is inclusive in a one-based system.

@property relativeEnd() const { return OB(this.length); }

This should be this.coordinateAtOffset(0); since an offset should be zero-based.

@property coordinateAtBegin() const
{
    return this.coordinateAtOffset(1);
}

This should be this.length - 1 since the end is inclusive in a one-based system.

/// Genomic coordinate at end of feature, taking strandedness into account
@property coordinateAtEnd() const
{
    return this.coordinateAtOffset(this.length);
}

@charlesgregory
Copy link
Contributor

charlesgregory commented Oct 1, 2021

This should be this.length - 1 since the end is inclusive in a one-based system.

This is incorrect should be :

/// Genomic coordinate at end of feature, taking strandedness into account
@property coordinateAtEnd() const
{
    return this.coordinateAtOffset(this.length - 1);
}

@charlesgregory charlesgregory reopened this Oct 1, 2021
@charlesgregory
Copy link
Contributor

related to #114 and #115

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants