More consistent coordinate specification for Maf alignment access in MafIO.py #504

Closed
wants to merge 24 commits into from

blaiseli
Contributor

@blaiseli blaiseli commented Apr 3, 2015

Trying to fuse #503 with #350

I had started working with polyatail's version of MafIO.py, but after making my pull request today, I discovered that another one existed, based on a more thorough reworking of polyatail's code. So I tried to fuse both. I'm not an experienced git user, so I hope it's not too much of a mess.

I tried to make the MAF parser more internally consistent with respect to start and end coordinates, and checked that the alignments returned by MafIndex.search and MafIndex.get_spliced were correct.

I show a test in the commit message of 4be9fd1.

Maybe it's better to reproduce it here:


Fixed coordinate specification inconsistencies.

The coordinate specification system was inconsistent, as the following test shows:

The test.maf file ends with the following two blocks:

a score=-11454.000000
s hg19.chrY              13282 34 +  59373566 CCCCTCA-------------CCTTGACCCT------------CCCATTCTGCCCCACCT
s ponAbe2.chrX_random  9952046 34 +  10445060 CCTCTCA-------------CCTTGACCCT------------CCCATTCTGCCGCACCT
i ponAbe2.chrX_random C 0 C 0
s nomLeu3.chrX            1654 34 + 141252148 CCTCTCG-------------CCTTGACCCT------------CCCATTCTGCCCCACCC
i nomLeu3.chrX        C 0 C 0
s falChe1.KB397537        3106 24 -      4194 cccccca------------------agtcc------------cacttttcccca-----
i falChe1.KB397537    C 0 C 0
s ficAlb2.chr10       19660605 51 +  21346708 CTCTtcagggatgtgctggtgaGTGAGCGC---GGCCGGGGACCGGGCCAGCCG-----
i ficAlb2.chr10       C 0 C 0
s taeGut2.chr26        1809911 37 -   4907541 CTTCCCA-----GTGCCTCTCTGTGAATCC------------CAGCACCCTCCA-----
i taeGut2.chr26       C 0 C 0
s pseHum1.KB221448      102653 24 -    117795 cccccca------------------aatct------------gggacccctccc-----
i pseHum1.KB221448    C 0 C 0
s latCha1.JH129912       91397 46 -    184544 CCCCCCG-------------CTGTGCCCTTATGAACCGAGAACACCCCCTGCTACACCC
i latCha1.JH129912    N 0 C 0

a score=-43159.000000
s hg19.chrY               13316 51 +  59373566 ----------GTCAGGATCACAAGGACCCCCAGATCAGCA----GATGGGAACCGGACC------AAAAAG
s ponAbe2.chrX_random   9952080 51 +  10445060 ----------GTCAGGATCACAAGGACCCCCAGCTCAGCA----GATGGGAACCGGACC------AAAAAG
i ponAbe2.chrX_random C 0 C 0
s nomLeu3.chrX             1688 51 + 141252148 ----------GTCAGGATCACAAGGACCCCAAGCTCAGCA----AATGGCAAACGGACC------AAAAAG
i nomLeu3.chrX        C 0 C 0
s monDom5.chr2        536305954 43 + 541556283 ----------GTTCAGAGCAAAAGCACTGCCGGCTCCACG----GCCAGGGACTGGG--------------
i monDom5.chr2        N 0 C 0
s falChe1.KB397537         3130 12 -      4194 ----------caaaactcccct-------------------------------------------------
i falChe1.KB397537    C 0 C 0
s ficAlb2.chr10        19660656 39 +  21346708 ----------GGGGGAGTCACCGGGATCCCTGGG----------GATGTCACCCAGACC------------
i ficAlb2.chr10       C 0 I 7
s taeGut2.chr26         1809948 46 -   4907541 ----------ACGGCAGCTACCCCTACACCACAGTCACCAC---AGTGCAACCCAGTGC------------
i taeGut2.chr26       C 0 T 831
s pseHum1.KB221448       102677 53 -    117795 ----------caaaccgtgacccc--tccccaaatcagGATTGGGGtgggaccccctccccaaac------
i pseHum1.KB221448    C 0 C 0
s latCha1.JH129912        91443 49 -    184544 TTATGAACCAGGAACAACCAACCCTGCAGCGCACTTACAA----ACCAACAAC------------------
i latCha1.JH129912    C 0 C 0

Depending on how coordinates have to be specified, we can expect various
behaviour from MafIndex.search. But what is observed with the previous
implementation seems incoherent.

Tests using commit 02120d1:

>>> from Bio.AlignIO.MafIO import MafIndex
>>> idx = MafIndex("test.mafindex", "test.maf", "hg19.chrY")
>>> len(list(idx.search([13314], [13315])))
1
>>> len(list(idx.search([13315], [13316])))
2
>>> len(list(idx.search([13316], [13317])))
2
>>> len(list(idx.search([13317], [13318])))
1
>>> len(list(idx.search([13317], [13317])))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/data/home/herve.seitz/.local/lib/python2.7/site-packages/Bio/AlignIO/MafIO.py", line 401, in search
    raise ValueError("Exon coordinates invalid (%s >= %s)" % (exonstart, exonend))
ValueError: Exon coordinates invalid (13317 >= 13317)

If start == end, then there is an error. So we would assume that
one-character columns have to be specified with end = start + 1.

But then with start, end = 13315, 13316 and start, end = 13316, 13317, two
alignment blocks are yielded.

Tests using the present commit:

>>> from Bio.AlignIO.MafIO import MafIndex
>>> idx = MafIndex("test.mafindex", "test.maf", "hg19.chrY")
>>> len(list(idx.search([13314], [13315])))
1
>>> len(list(idx.search([13315], [13316])))
2
>>> len(list(idx.search([13316], [13317])))
1
>>> len(list(idx.search([13317], [13318])))
1
>>> len(list(idx.search([13317], [13317])))
1
>>> list(idx.search([13316], [13316]))
[<<class 'Bio.Align.MultipleSeqAlignment'> instance (9 records of length 71, SingleLetterAlphabet()) at 2085410>]
>>> list(idx.search([13315], [13315]))
[<<class 'Bio.Align.MultipleSeqAlignment'> instance (8 records of length 59, SingleLetterAlphabet()) at 2085390>]

Now we allow start and end to be the same, which specifies a single position.
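
As an illustration of the zero-based, end-inclusive reading used above, here is a minimal sketch. It is not the MafIO implementation: it only takes the hg19.chrY start and size fields from the two blocks shown earlier, and the helper names `block_extent` and `count_hits` are hypothetical.

```python
# hg19.chrY "s" lines of the two blocks above: (start, size), zero-based starts.
BLOCKS = [(13282, 34), (13316, 51)]


def block_extent(start, size):
    """Return the first and last position covered by a block (both inclusive)."""
    return start, start + size - 1


def count_hits(query_start, query_end):
    """Count blocks overlapping the inclusive query [query_start, query_end]."""
    hits = 0
    for start, size in BLOCKS:
        first, last = block_extent(start, size)
        if first <= query_end and query_start <= last:
            hits += 1
    return hits


for query in [(13314, 13315), (13315, 13316), (13316, 13317), (13317, 13317)]:
    print(query, count_hits(*query))
```

Under this reading the first block covers 13282 to 13315 and the second starts at 13316, so only a query spanning both 13315 and 13316 returns two blocks.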


I will probably close my other pull request

@peterjc
Member

peterjc commented Apr 3, 2015

Is the full test file available?

@blaiseli
Contributor Author

blaiseli commented Apr 6, 2015

I generated the test file as follows:

wget "http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz100way/maf/chrY.maf.gz"
zcat chrY.maf.gz > chrY.maf
head -111 chrY.maf > test.maf

@adamnovak
Contributor

That sounds like a good test case to have. Can you make this PR pass the Travis tests?

@blaiseli
Contributor Author

blaiseli commented Apr 8, 2015

Is the mafindex rebuilt anew during the Travis tests?

@blaiseli
Contributor Author

blaiseli commented Apr 8, 2015

Reading the logs, I realize that some of my changes of .items() to .iteritems() and range() to xrange() actually revert some changes that were made by polyatail on purpose to improve compatibility with python 3. Since I'm not sure my changes actually improve code efficiency, I will revert to the python 3 compatible version.

@blaiseli
Contributor Author

blaiseli commented Apr 8, 2015

It seems that removing the mafindex files present in Tests/MAF enables the Travis tests to go further. Now it fails as follows:

FAIL: test_correct_retrieval_1 (test_MafIO_index.TestSearchGoodMAF)

----------------------------------------------------------------------

Traceback (most recent call last):

File "/home/travis/build/biopython/biopython/Tests/test_MafIO_index.py", line 283, in test_correct_retrieval_1

self.assertEqual(len(results), 12)

AssertionError: 10 != 12

results is obtained from a call to self.idx.search((3014742, 3018161), (3015028, 3018644)).

I looked in the MAF file and found the following potentially relevant blocks:

block    start  size
    1  3014742    36
    2  3014778    17
    3  3014795    47
    4  3014842   186
    5  3015028    58
    6  3015086  2572
    7  3017658    85
    8  3017743   418
    9  3018161    69
   10  3018230   129
   11  3018359   123
   12  3018482   162
   13  3018644   178

For each (exonstart, exonend) pair (in this case (3014742, 3015028) and (3018161, 3018644)), the blocks are retrieved using the following query:

        result = con.execute("SELECT DISTINCT start, end, offset FROM "
                 "offset_data WHERE bin IN (%s) AND (end BETWEEN %s AND %s "
                 "OR %s BETWEEN start AND end) ORDER BY start, end, "
                 "offset ASC;" \
                 % (possible_bins, exonstart, exonend, exonend))

BETWEEN x AND y is inclusive, so we expect to retrieve blocks 1-5 for the first (exonstart, exonend) pair, and blocks 9-13 for the second pair, which makes 10 elements in results.
Why would the test expect 12 elements?
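
To make the count above easy to check, here is a small sketch that loads the 13 listed blocks into an in-memory SQLite table and runs the same BETWEEN condition. Several things are assumptions made for illustration: the simplified table `blocks` and its column names, the omission of the bin filter, and the `inclusive_end` switch that stores either start + size - 1 or start + size as the block end.

```python
import sqlite3

# (start, size) of the 13 blocks listed above.
BLOCKS = [
    (3014742, 36), (3014778, 17), (3014795, 47), (3014842, 186),
    (3015028, 58), (3015086, 2572), (3017658, 85), (3017743, 418),
    (3018161, 69), (3018230, 129), (3018359, 123), (3018482, 162),
    (3018644, 178),
]

# Same overlap condition as the query above, without the bin filter.
QUERY = (
    "SELECT DISTINCT blk_start, blk_end FROM blocks "
    "WHERE (blk_end BETWEEN ? AND ?) OR (? BETWEEN blk_start AND blk_end) "
    "ORDER BY blk_start, blk_end"
)


def count_retrieved(exonstart, exonend, inclusive_end=True):
    """Count the blocks selected for one (exonstart, exonend) pair."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE blocks (blk_start INTEGER, blk_end INTEGER)")
    shift = 1 if inclusive_end else 0
    con.executemany(
        "INSERT INTO blocks VALUES (?, ?)",
        [(start, start + size - shift) for start, size in BLOCKS],
    )
    rows = con.execute(QUERY, (exonstart, exonend, exonend)).fetchall()
    con.close()
    return len(rows)


print(count_retrieved(3014742, 3015028), count_retrieved(3018161, 3018644))
print(count_retrieved(3018161, 3018644, inclusive_end=False))
```

With inclusive block ends, each pair selects five of the listed blocks. If the stored end is start + size instead, block 8 (whose end then equals 3018161) is also selected for the second pair, so the expected count depends on which convention the index was built with.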

@peterjc
Member

peterjc commented Apr 8, 2015

TravisCI starts from a clean VM, so there should be no pre-existing stale indexes (unless they are explicitly checked into git's version tracking). In general our tests should ideally use temp files for the indexes (to ensure unique) or explicitly delete any pre-existing indexes if you want to test with a specific filename. See os.remove(...) entries in Tests/test_SeqIO_index.py for examples.
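
A minimal sketch of that pattern, using only the standard library: the index lives in a temporary directory, and any pre-existing file with the same name is removed before building. `build_index` is a hypothetical stand-in for whatever creates the index file; it is not a Biopython function.

```python
import os
import sqlite3
import tempfile


def build_index(index_path):
    """Hypothetical stand-in for the code that creates an index file."""
    con = sqlite3.connect(index_path)
    con.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT, value TEXT)")
    con.commit()
    con.close()


def remove_stale_index(index_path):
    """Delete a pre-existing index so the test always rebuilds it."""
    if os.path.isfile(index_path):
        os.remove(index_path)


with tempfile.TemporaryDirectory() as tmp_dir:
    # A unique directory per run avoids clashes with indexes left by other runs.
    index_path = os.path.join(tmp_dir, "test.mafindex")
    remove_stale_index(index_path)
    build_index(index_path)
    index_was_built = os.path.isfile(index_path)

# The directory and the index inside it are gone once the block exits.
print(index_was_built, os.path.exists(index_path))
```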

@blaiseli
Contributor Author

I modified the tests to use the "zero-based and inclusive" coordinate specification.
The code now passes the Travis tests.

I'm still not fully sure that a "zero-based and inclusive" coordinate specification is a good thing: it is neither the "human-readable" way ("1-based and inclusive" coordinates) nor the bedtools / "python slice" way.

I decided to do "zero-based" because MAF format is zero-based. But I decided to make the end coordinate inclusive, because it is not very intuitive for the human user to have to provide a list of "inclusive" start coordinates, but "exclusive" end coordinates when calling MafIndex.search and MafIndex.get_spliced.
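
For readers comparing the three conventions mentioned here, the conversions can be sketched as follows. The function names are made up for this example, and the sample values are the hg19.chrY start and size fields of the first block quoted at the top of this pull request.

```python
def maf_fields_to_half_open(start, size):
    """MAF start and size fields -> zero-based, end-exclusive interval."""
    return start, start + size


def half_open_to_zero_based_inclusive(start, end):
    """Zero-based, end-exclusive (Python slice, BED) -> zero-based, end-inclusive."""
    return start, end - 1


def half_open_to_one_based_inclusive(start, end):
    """Zero-based, end-exclusive -> one-based, end-inclusive (human-readable)."""
    return start + 1, end


half_open = maf_fields_to_half_open(13282, 34)
print(half_open)
print(half_open_to_zero_based_inclusive(*half_open))
print(half_open_to_one_based_inclusive(*half_open))
```

The same 34 positions are written (13282, 13316), (13282, 13315) or (13283, 13316) depending on the convention, which is why the choice has to be stated explicitly in the interface.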

Do you have an opinion on the issue?

@polyatail
Contributor

Thank you for your work on this branch!

IMO since there is no standardized coordinate specification in sequence data, it doesn't really matter. The user of any script and data format must bear the burden of ensuring they are taking into account different base systems. I think this is fine.

I'd like to again voice my opinion that this pull request be merged. I am regularly contacted by users wondering when this will happen.

@peterjc
Member

peterjc commented Sep 30, 2015

Do you have a strong reason not to follow a more Python-slicing like coordinate system?

@adamnovak
Contributor

I would also like this to be merged.

On Wed, Sep 30, 2015 at 2:03 AM, Peter Cock notifications@github.com
wrote:

Do you have a strong reason not to follow a more Python-slicing like
coordinate system?



@polyatail
Contributor

Good point Peter. Maybe blaiseli can comment on why it was necessary to change from 0-based, end exclusive, which is both the UCSC spec and Pythonic?

The test case at the beginning of this pull request makes sense. An error should be thrown when start and end coordinates are the same.

Adam, #350 seems to not pass the build tests. Any idea why?

@adamnovak
Contributor

#350 is probably a victim of over-eager GitHub merging. The Travis tests
fail thusly:


Output : " Checking can write/read as 'nexus' format"

Expected: ' Failed: Identifiers in each MultipleSeqAlignment must be unique'

There's that file of expected test output, and if it doesn't contain
exactly the right output for all the tests in the right order, the tests
fail. When the tests change, this file needs to be updated with what the
tests will produce, and when those changes are merged together, that file
needs to have its lines merged in the right order.

I bet the tests were re-run against a new master commit, the results file
was merged automatically and wrongly, and a mismatch was detected. To
resolve it, the thing probably has to be rebased against master, with the
test output manually merged.

On Wed, Sep 30, 2015 at 2:48 PM, Andrew Sczesnak notifications@github.com
wrote:

Good point Peter. Maybe blaiseli can comment on why it was necessary to
change from 0-based, end exclusive, which is both the UCSC spec and
Pythonic?

The test case at the beginning of this pull request makes sense. An error
should be thrown when start and end coordinates are the same.

Adam, #350 seems to not
pass the build tests. Any idea why?



@blaiseli
Contributor Author

blaiseli commented Oct 1, 2015

As I wrote in an earlier comment:

I decided to do "zero-based" because MAF format is zero-based. But I decided to make the end coordinate inclusive, because it is not very intuitive for the human user to have to provide a list of "inclusive" start coordinates, but "exclusive" end coordinates when calling MafIndex.search and MafIndex.get_spliced.

So these are not at all "strong" reasons not to use classic python slicing coordinates.

When I find some time, I can work on switching to the python way.

@polyatail
Contributor

I would disagree--I think it's intuitive for the human user to have consistency across the BioPython package and within Python.

@blaiseli
Contributor Author

I've been looking back at the code of MafIO.py, and I would like to make a point in favour of using inclusive coordinates, at least internally.

The MafIndex.search method performs overlap tests in SQL using the BETWEEN operator, which is "inclusive". It is easier (at least for me) to reason using inclusive coordinates when trying to see if the code is correct.

Let's imagine the coordinates were end-exclusive. How do we handle a case where a query segment (exonstart, exonend) ends at the position just before an indexed alignment block (start, end)?

We have exonend == start, but the segment does not overlap the alignment block. If we naively use these end-exclusive coordinates in the BETWEEN operator, we erroneously infer an overlap.

A similar problem happens when a query segment begins at the position just after the end of an alignment block. Therefore, we will have to shift some coordinates somewhere for the code to be correct, or use a different test in the SQL query.
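
The abutting case described above can be written out as a short sketch. The two predicates are generic interval tests, not code taken from MafIO.py, and the coordinates are invented for the example.

```python
def overlaps_inclusive(a_start, a_end, b_start, b_end):
    """Overlap test that is correct when both ends are inclusive."""
    return a_start <= b_end and b_start <= a_end


def overlaps_half_open(a_start, a_end, b_start, b_end):
    """Overlap test that is correct when both ends are exclusive."""
    return a_start < b_end and b_start < a_end


# End-exclusive query [100, 110) abuts the end-exclusive block [110, 120).
query = (100, 110)
block = (110, 120)

# Feeding end-exclusive values to the inclusive test reports a false overlap.
print(overlaps_inclusive(*query, *block))
# The strict test, or converting both ends first, gives the right answer.
print(overlaps_half_open(*query, *block))
print(overlaps_inclusive(query[0], query[1] - 1, block[0], block[1] - 1))
```

Either approach works as long as it is applied consistently; the mistake is to mix an inclusive comparison with end-exclusive values.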

I think it is less error-prone and more readable to use inclusive coordinates as "early" as possible. I mean: we should avoid coordinates "conversions" deep in the code. It's easier to ensure the coordinates are in the correct system when reading or writing MAF format.

If my intuition is correct, the code for determining which bins a segment belongs to also uses inclusive start and end coordinates internally (see _region2bin, _ucscbin, and http://genomewiki.ucsc.edu/index.php/Bin_indexing_system).
I didn't dare modify this part of the code, but in doing so, I have probably neglected an important point when trying to make the coordinate system coherent.
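
For reference, here is a sketch of the standard five-level UCSC binning scheme, written from the published description rather than copied from MafIO.py (the name `bin_from_range` is chosen here). In this formulation the range is passed end-exclusive and the last covered base, end - 1, is what gets shifted, which is one place where the inclusive versus exclusive question shows up.

```python
# Offset of the first bin at each level, smallest bins first.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # the smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each level up is 8 times larger


def bin_from_range(start, end):
    """Return the smallest bin containing the zero-based, end-exclusive range."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT  # bin of the last covered base
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range %d-%d does not fit this scheme" % (start, end))


# hg19.chrY block from the first example: start 13282, size 34.
print(bin_from_range(13282, 13282 + 34))
# A range straddling the first 128 kb boundary moves up one level.
print(bin_from_range(131071, 131073))
```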

Another point I would like to make is the following: I don't think that asking the user to provide end-exclusive coordinates will bring much consistency with respect to python syntax in general. The get_spliced and search methods do not use list slice syntax anyway: they ask for lists of start and end positions.
List-like interface comes with the MultipleSeqAlignment that these two methods return.

Maybe other Biopython parts also deal with lists of starts and ends. In this case, I agree that the user interface should be consistent regarding coordinates inclusive- or exclusiveness.

I hope the above comments are not too badly phrased.

@blaiseli
Contributor Author

I just added comments and modified the end coordinates in the _region2bin and _ucscbin.
This results in a new commit that fails to pass tests as follows:

ERROR: runTest (__main__.ComparisonTestCase)

test_SeqIO

----------------------------------------------------------------------

Traceback (most recent call last):

  File "run_tests.py", line 343, in runTest

    line_number))

ValueError: 

Output  : u" Checking can write/read as 'maf' format"

Expected: " Checking can write/read as 'nexus' format"

/home/travis/build/biopython/biopython/Tests/output/test_SeqIO line 1800

I don't know how these tests work, but it looks as if some expected results were hard-coded and no test with the "maf" format was anticipated. A test using this format may have been automatically activated and inserted before the test for "nexus".

Strange that this did not happen before.

@peterjc
Member

peterjc commented Nov 26, 2015

That's a print-and-compare style test, use run_tests.py -g test_SeqIO to regenerate the expected output file output/test_SeqIO which ought to have been updated if necessary in the pull request. See also the chapter on the test framework in the Biopython tutorial.
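
To illustrate what a print-and-compare test does in general, here is a simplified sketch. It is not the logic of run_tests.py; `compare_output` is a hypothetical helper, and the two sample lines are the ones from the failure quoted above.

```python
def compare_output(expected_lines, actual_lines):
    """Raise ValueError at the first line where the two outputs differ."""
    for line_number, (expected, actual) in enumerate(
        zip(expected_lines, actual_lines), start=1
    ):
        if expected != actual:
            raise ValueError(
                "\nOutput  : %r\nExpected: %r\nline %d"
                % (actual, expected, line_number)
            )
    if len(expected_lines) != len(actual_lines):
        raise ValueError(
            "expected %d lines, got %d" % (len(expected_lines), len(actual_lines))
        )


expected = [" Checking can write/read as 'nexus' format"]
actual = [
    " Checking can write/read as 'maf' format",
    " Checking can write/read as 'nexus' format",
]
try:
    compare_output(expected, actual)
except ValueError as err:
    print(err)
```

Because the comparison is positional, a single extra line of output (here a new format being exercised) makes every later line mismatch until the expected file is regenerated.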

@blaiseli
Contributor Author

I regenerated Tests/output/test_SeqIO as you suggested, but this file doesn't seem to be under version control. If I understand correctly, it is normal that this file is not under version control since it is supposed to be generated automatically. But then:

  1. Why wasn't it updated?
  2. How am I supposed to get the updated version into my pull request?

@peterjc
Member

peterjc commented Nov 30, 2015

Strange - it is and should be under version control: https://github.com/biopython/biopython/blob/master/Tests/output/test_SeqIO

You would need to explicitly include the changes to Tests/output/test_SeqIO in your commit, e.g.

git add Tests/output/test_SeqIO
git commit -m "Expected output from test_SeqIO.py has changed"

@blaiseli
Contributor Author

Actually, I was mistaken when I wrote that the file was not under version control. What appears to have happened is that run_tests.py -g test_SeqIO did not induce any changes in the expected output: there was already a line for maf format before the line for nexus format (lines 1800 and 1801 of https://github.com/blaiseli/biopython/blob/alignio-maf/Tests/output/test_SeqIO).
So nothing showed up when I then ran git status. Hence my error.

So I'm still clueless as to how to make the tests pass.

@blaiseli
Contributor Author

In an attempt to force another try of the Travis tests, I tried to rebase on the more recent commits of the main biopython, based on explanations found here https://github.com/edx/edx-platform/wiki/How-to-Rebase-a-Pull-Request, but I'm lost: my last commits do not seem to appear in the log any more, however, I can see that the modifications I had made indeed have been taken into account. Sorry for the mess.

@adamnovak
Contributor

I see your commits. Did you want something after "Expected output from test_SeqIO.py has changed."? Maybe git reflog on your end will let you find the commit hash from before the rebase so you can track down what went wrong, if anything?

@blaiseli
Contributor Author

blaiseli commented Dec 3, 2015

The commit "Expected output from test_SeqIO.py has changed" was done after rebasing.

git reflog allowed me to get back to the commit I thought I had done before rebasing: "0bb1f58 Added comments, adjusted bin-related code.".
The Bio/AlignIO/MafIO.py file in its present state is OK with respect to what it was in that commit. I think it is because I had to manually merge this file during rebasing, and guided myself with a copy I had made from 0bb1f58 to be sure I was merging correctly.

So to summarize: the MafIO.py file is as it should be, but the history of its modifications may be messy due to manual merging during rebasing.

@peterjc
Member

peterjc commented Dec 3, 2015

Don't worry about the history too much: I could probably squash down some of this as part of any merge/rebase to the master.

@adamnovak
Contributor

So what outstanding work is there to do on this? Has a coordinate system
been settled on?

On Thu, Dec 3, 2015 at 4:06 AM, Peter Cock notifications@github.com wrote:

Don't worry about the history too much: I could probably squash down some
of this as part of any merge/rebase to the master.



@blaiseli
Contributor Author

Regarding coordinates, I presented some arguments for using end-inclusive coordinates in earlier comments: #504 (comment) and #504 (comment) (where I give reasons why I don't see it as inconsistent).

The current version of the code uses zero-based end-inclusive coordinates both internally (including bin determination, now) and in its interface with the user. The get_spliced and search methods take lists of segment starts and ends as arguments. I personally would think it somewhat awkward to provide the end coordinates as a list of "one after the last kept position".

There's no list-like interface where slice syntax with end-exclusive coordinates would be required, that's why I don't consider the current implementation inconsistent with respect to the python way.

Feel free to argue against my point of view if you really think the way I decided to treat coordinates is a problem.

@MattDMo

MattDMo commented Jan 10, 2017

So what is the status of this PR? I ran across this on the website, which cites a forked branch that hasn't been updated since 2012. Tests seem to be passing, so what's the hold-up?

@blaiseli
Contributor Author

blaiseli commented Feb 24, 2017

I reverted from the "avoid dots" to the normal method call approach, and also fixed more issues (some of which I had introduced when trying to remove conflicts using the online editor).

There is still at least one issue in the tests: test_old_file_not_found expects an IOError, but at the moment, the MafIO.py code raises a ValueError.

@blaiseli
Contributor Author

I'm reading back the commits before merge and conflict resolutions, and I now understand the reason for the error type inconsistency. I will review more of the earlier commits to check that the present version is as intended by the other contributors.

@peterjc
Member

peterjc commented Feb 24, 2017

Error type change was a8cf8c0 and 06b6cd6

- from commit 49e7da7:

    Remove unwanted white space after docstrings (etc)

    $ pydocstyle Bio/ BioSQL/ Tests/ Scripts/ Doc/ --select D202
    ...
    D202: No blank lines allowed after function docstring

- from commit a8cf8c0:

    Avoid ValueError for file not found

    Aim here was to be user friendly, but a file-system relevant
    exception is probably better for error handling.
@@ -123,7 +121,6 @@ def write_alignment(self, alignment):
Writes every SeqRecord in a MultipleSeqAlignment object to its own
MAF block (beginning with an 'a' line, containing 's' lines)
"""

Member

That was a deliberate change to follow PEP257 docstring style guidelines as now enforced in TravisCI with pydocstyle - see 49e7da7

else:
    assert str(r1.seq) == str(r2.seq), \
        "Seq does not match %s vs %s (%s vs %s)" \
        % (r1.seq, r2.seq, r1.id, r2.id)
Member

This code block is not needed anymore, slightly further up we have:

assert str(r1.seq) == str(r2.seq)

peterjc pushed a commit to peterjc/biopython that referenced this pull request Mar 8, 2017
Based on squashed commit of pull request biopython#504, with edits by
Peter (mostly leaving out a few changes, and to line wrapping).

This should have *NO* functional change.
peterjc pushed a commit that referenced this pull request Mar 8, 2017
Based on squashed commit of pull request #504, with edits by
Peter (mostly leaving out a few changes, and to line wrapping).

This should have *NO* functional change.
@peterjc
Member

peterjc commented Mar 8, 2017

Most of the remaining changes have now been applied via pull request #1086 (including a fix for #1083).

There are still some potential changes to do with indexing boundaries (including the special case of asking for one column), for which @blaiseli is going to make a new pull request as discussed on #1086.

@peterjc peterjc closed this Mar 8, 2017
blaiseli added a commit to blaiseli/biopython that referenced this pull request Mar 9, 2017
The use of inclusive coordinates for the sqlite MAX index
is an attempt to fix issues discussed in the following pull requests:
biopython#504
biopython#1086

A test has been added in `test_MafIO_index.py` to check that the number
of MAF alignment blocks returned when querying for a single position will
be 1 at the boundary between blocks (it could be 2 with the previous MAF
index, which is a bug).

The indices built with end-exclusive coordinates will not be compatible
with the present version.
blaiseli added a commit to blaiseli/biopython that referenced this pull request Mar 17, 2017
blaiseli added a commit to blaiseli/biopython that referenced this pull request Mar 27, 2017
MarkusPiotrowski pushed a commit to MarkusPiotrowski/biopython that referenced this pull request Oct 31, 2017
blaiseli added a commit to blaiseli/biopython that referenced this pull request Apr 9, 2018
peterjc pushed a commit that referenced this pull request Apr 10, 2018
Squashed commit of pull request #1088.

The use of inclusive coordinates for the sqlite MAX index is an
attempt to fix issues discussed in the following pull requests:

#504
#1086

A test has been added in `test_MafIO_index.py` to check that the number
of MAF alignment blocks returned when querying for a single position will
be 1 at the boundary between blocks (it could be 2 with the previous MAF
index, which is a bug).

The indices built with end-exclusive coordinates will not be compatible
with the present version.

* Boundary tests with more than 2 columns.

* Tests that get_spliced gets correct sequences.

* Check only MAF files base names equality.

This enables loading an index from a different working directory than
the one from which it was built.

* Resolving index conflicts.
* Better index incompatibility error message.
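
The base-name check mentioned in the commit message can be pictured with a sketch like the following. This is only an illustration of the idea; `same_maf_file` is a hypothetical helper and the paths are invented.

```python
import os


def same_maf_file(recorded_path, requested_path):
    """Compare only the file names, ignoring the directories they sit in."""
    return os.path.basename(recorded_path) == os.path.basename(requested_path)


# Index built from one directory, loaded from another working directory.
print(same_maf_file("/data/run1/test.maf", "test.maf"))
print(same_maf_file("/data/run1/test.maf", "other.maf"))
```

The trade-off is that two different files sharing a name would pass such a check, so it favours portability of the index over strict identity of the MAF file.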

6 participants