-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Necessary Updates to the Mapper code [ENSCORESW-2168]. #332
Conversation
…ary searches in mappers.
…ntered interval tree, to be used in the mapper instead of binary search.
…harmonise with mapper code.
…mapped coordinates. More uniform function prototype to allow transparent calls from Slice project method.
… using the values returned by invoking the updated mapper interfaces which return the original and mapped coordinates. The coordinates are readjusted consider the slice strand. There are some minor optimisations which avoid going through unnecessary code fragments in certain circumnstances.
…sponding seq region names.
…sponding seq region names.
…rval trees. It loads XS version if it's installed, otherwise fall back to PP implementation.
This still needs to be resolved, with feedback from the Infrastructure team and updates from Alessandro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, needs quite a bit more work. In addition to my comments inline, isn't there some sort of generic tree implementation we could use instead of rolling our own?
@@ -224,7 +224,7 @@ sub map { | |||
throw('Incorrect number of arguments.') if (!( @_ >= 6)); | |||
|
|||
my ( $self, $frm_seq_region_name, $frm_start, $frm_end, $frm_strand, | |||
$frm_cs, $to_slice ) | |||
$frm_cs, $dummy, $to_slice, $include_org_coord ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the argument list in POD accordingly. If possible, fix the things which were out of date before the change as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -277,7 +277,7 @@ sub map { | |||
throw('Incorrect number of arguments.') if(@_ < 6); | |||
|
|||
my ($self, $frm_seq_region_name, $frm_start, | |||
$frm_end, $frm_strand, $frm_cs, $fastmap, $to_slice) = @_; | |||
$frm_end, $frm_strand, $frm_cs, $fastmap, $to_slice, $include_org_coord) = @_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
modules/Bio/EnsEMBL/Mapper.pm
Outdated
@@ -149,6 +149,9 @@ use Bio::EnsEMBL::Mapper::IndelCoordinate; | |||
use Bio::EnsEMBL::Mapper::Gap; | |||
|
|||
use Bio::EnsEMBL::Utils::Exception qw(throw); | |||
use Bio::EnsEMBL::Utils::Interval; | |||
use Bio::EnsEMBL::Utils::Tree::Interval::Immutable; | |||
# use Bio::EnsEMBL::Utils::Tree::Interval::Mutable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove commented-out code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
modules/Bio/EnsEMBL/Mapper.pm
Outdated
|
||
### | ||
# | ||
map { push @{$from_intervals}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan of using map in void context and writing output inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
modules/Bio/EnsEMBL/Mapper.pm
Outdated
# Enable this as some preliminary experiments show this is slightly faster | ||
# on the test suite. | ||
# | ||
# my $itree = Bio::EnsEMBL::Utils::Tree::Interval::Mutable->new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented-out code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
||
Bio::EnsEMBL::Utils::Tree::Interval::Mutable | ||
|
||
=head1 SYNOPSIS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And more...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(still here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
||
=head1 DESCRIPTION | ||
|
||
=head1 METHODS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And more...
modules/t/interval_tree_mutable.t
Outdated
use Test::Exception; | ||
use Test::Warnings qw(warning); | ||
|
||
use Data::Dumper; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module isn't used anywhere in this file, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
||
use strict; | ||
|
||
use Data::Dumper; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module isn't used anywhere in this file, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
||
use strict; | ||
|
||
use Data::Dumper; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module isn't used anywhere in this file, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(still here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
…comments and in-place conditionals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just flagged two previously requested changes that have been missed. Other than those two, fine by me.
|
||
Bio::EnsEMBL::Utils::Tree::Interval::Mutable | ||
|
||
=head1 SYNOPSIS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(still here)
|
||
use strict; | ||
|
||
use Data::Dumper; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(still here)
@avullo can you please merge master into your branch and push it just so we can make sure travis is happy with it all? |
@tgrego just done |
modules/Bio/EnsEMBL/Mapper.pm
Outdated
$start_idx = 0; | ||
$end_idx = $#{$lr}; | ||
foreach my $i (@{$lr}) { | ||
my $start = $i->{$from}{start}<=$i->{$from}{end}?$i->{$from}{start}:$i->{$from}{end}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't the clearest. Could this and the line below get split into an single if. The $start
and $end
can get created as $i->{$from}{start}
and $i->{$from}{end}
respectively then modified if they fail the if condition. This would also reduce the checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Mark, this is addressed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That look much better, thank you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The points against using an off the shelf implementation:
- minimise reliance on external libraries (i.e. Set::IntervalTree)
- Tree::Interval only handles the non-overlapping intervals case
- ability to provide optimize (i.e. XS) code, when possible
- ability to provide different suitable implementations (i.e. mutable/immutable) depending on the use case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than some minor clarity changes and some more docs, this is looking good.
I'd like to reiterate @mkszuba question about the implementation of tree interval code and why we could not use an off the shelf implementation.
$normal_slice->end()); | ||
} | ||
my @coords = $asm_mapper->map($normal_slice->seq_region_name(), | ||
$normal_slice->start(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove all tabs
$normal_slice->end()); | ||
} | ||
my @coords = $asm_mapper->map($normal_slice->seq_region_name(), | ||
$normal_slice->start(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the tabs
if(@coords > 1) { | ||
return @coords; | ||
} elsif ($include_org_coord) { | ||
return @coords unless $coords[0]{mapped}->isa('Bio::EnsEMBL::Mapper::Gap'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the unless
get moved to the elsif
block
} elsif ($include_org_coord) { | ||
return @coords unless $coords[0]{mapped}->isa('Bio::EnsEMBL::Mapper::Gap'); | ||
} else { | ||
return @coords unless $coords[0]->isa('Bio::EnsEMBL::Mapper::Gap'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this not get handled as part of the if
rather than a postfix in the else
section?
return sort_by_begin(uniq($result)); | ||
} | ||
|
||
sub _query_point { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the pod section?
_load($PP); | ||
} | ||
|
||
sub _load { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pod docs?
my $left = $self->left; | ||
|
||
$left->parent($self->parent); | ||
unless (defined $left->parent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this not be swapped so the if (defined $right->parent)
then perform what is in the else
block
my $right = $self->right; | ||
|
||
$right->parent($self->parent); | ||
unless (defined $right->parent) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this not be swapped so the if (defined $right->parent)
then perform what is in the else
block
} | ||
} else { | ||
# insert into right subtree | ||
unless (defined $self->right) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this not be swapped so the if (defined $self->right)
then perform what is in the else
block otherwise$self->right(Bio::EnsEMBL::Utils...
|
||
if ($i->start < $self->key) { | ||
# insert into left subtree | ||
unless (defined $self->left) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this not be swapped so the if (defined $self->left)
then perform what is in the else
block otherwise$self->left(Bio::EnsEMBL::Utils...
Looks like no more changes will be added. |
Pardon I don't have time to address the latest comments. They appear to be mostly stylistic in nature, something which can later be added. Moreover, POD's missing only for the private methods, which was intentional. |
Requirements
Description
This is a rebase onto master of branch experimental/mapper_update, which is meant to address most of the Mapper issues documented under [ENSCORESW-2168].
Use case
The current mapper code does not correctly handle several edge cases and emits an incorrect output. See [ENSCORESW-2168] for more details.
Benefits
A much more robust mapper code. Among several things, it's making the Genebuild Loutre import pipeline work correctly.
Possible Drawbacks
The Mapper::map_coordinates method could be slower under certain circumstances, i.e. when mapping from several non overlapping intervals as it has now to take into account the cost of building and querying an interval tree. Querying is more expensive than current binary search in case the tree is unbalanced. We might use the provided mutable tree implementation since we don't incur in the cost associated with keeping the tree balanced as we're not inserting or removing intervals after construction. This might also have the benefit of exploiting an existing XS implementation.
Testing
Have you added/modified unit tests to test the changes?
Yes
If so, do the tests pass/fail?
Yes.
Have you run the entire test suite and no regression was detected?
Yes. It fails locally but on just one unrelated test (sliceVariation.t) which is due the MySQL version I have (5.7) which is currently incompatible with the Variation API. The very same fail occurs also on master.