Correction to commit "Updates to fix soft_masking discripancy - ENSCORESW" #403
Requirements
Description
Commit 74e499a tried to correct some discrepancies in the soft-masking of non-coding sequence.
However, for exons that are entirely non-coding, $ex->coding_region_start is undefined, which results in warnings such as:
Use of uninitialized value in numeric gt (>) at ...EnsEMBL/Transcript.pm
After the whole exon sequence has been set to lower case at line 837 ($exon_seq = lc($exon_seq)), the if statements
if ($ex->coding_region_start($self) > $ex->start()) {
and
if ($ex->coding_region_end($self) < $ex->end()) {
should never be evaluated, because coding_region_start and coding_region_end are both undefined whenever
if (!defined ($ex->coding_region_start($self))) is true.
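The warning itself comes from Perl, not from Ensembl: under "use warnings", using undef as a number in a comparison emits "Use of uninitialized value ... in numeric gt (>)". The minimal sketch below reproduces that with plain scalars standing in for the exon coordinates (the variable names are illustrative, not taken from Transcript.pm) and shows that a defined() guard suppresses it.

```perl
use strict;
use warnings;

# Stand-in for $ex->coding_region_start($self) on an exon that is entirely UTR:
# in that case the real method returns undef.
my $coding_start = undef;
my $exon_start   = 1000;

# Collect warnings instead of printing them straight to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Unguarded comparison: undef is treated as 0 and Perl warns
# "Use of uninitialized value $coding_start in numeric gt (>)".
my $unguarded = ($coding_start > $exon_start) ? 1 : 0;

# Guarded comparison: defined() short-circuits, so the numeric test never runs.
my $guarded = (defined $coding_start && $coding_start > $exon_start) ? 1 : 0;

print scalar(@warnings), " warning(s)\n";
print $warnings[0];
```

Only the unguarded comparison produces a warning; both forms evaluate to false, which matches the observation that the masked output is the same with or without the guard.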
Use case
This bug produces warnings for exons that are entirely UTR when the optional soft-masking has been requested,
for example for gene DDR2, transcript ID ENST00000367921.
The soft-masked output is correct both before and after my change, but with the updated code we no longer get warnings such as
Use of uninitialized value in numeric gt (>) at .../ensembl/modules/Bio/EnsEMBL/Transcript.pm line 837.
Benefits
With my change, when soft-masking is requested, an exon that is entirely UTR is made all lowercase and the comparisons of the coding start and coding end against the exon start and end are skipped.
Those comparisons are only made if $ex->coding_region_start is defined:
if (!defined ($ex->coding_region_start($self))) {
    $exon_seq = lc($exon_seq);
} else {
    if ($ex->coding_region_start($self) > $ex->start()) {
        ...
    }
}
$seq_string .= $exon_seq;
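To make the control flow concrete, here is a self-contained sketch of the guarded logic. It is a simplification under stated assumptions: MockExon and soft_mask_exon are hypothetical stand-ins (not Ensembl classes or functions), only forward-strand exons are handled, the mock's coding_region_start/coding_region_end ignore the transcript argument the real methods take, and the UTR-lowercasing inside each branch is an illustrative guess at the part elided as "..." above, not the actual Transcript.pm code.

```perl
use strict;
use warnings;

# Hypothetical minimal stand-in for Bio::EnsEMBL::Exon (forward strand only).
# coding_start/coding_end are left undef for an exon that is entirely UTR,
# mirroring what coding_region_start/coding_region_end return in that case.
package MockExon;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub start               { $_[0]{start} }
sub end                 { $_[0]{end} }
sub seq                 { $_[0]{seq} }
sub coding_region_start { $_[0]{coding_start} }
sub coding_region_end   { $_[0]{coding_end} }

package main;

# Hypothetical helper: returns the exon sequence with its UTR bases lowercased.
sub soft_mask_exon {
    my ($ex) = @_;
    my $exon_seq = uc $ex->seq;

    if (!defined $ex->coding_region_start) {
        # Entirely non-coding: lowercase everything, skip the comparisons.
        return lc $exon_seq;
    }
    if ($ex->coding_region_start > $ex->start) {
        # Bases before the coding start are 5' UTR.
        my $len = $ex->coding_region_start - $ex->start;
        substr($exon_seq, 0, $len) = lc substr($exon_seq, 0, $len);
    }
    if ($ex->coding_region_end < $ex->end) {
        # Bases after the coding end are 3' UTR.
        my $len = $ex->end - $ex->coding_region_end;
        substr($exon_seq, -$len) = lc substr($exon_seq, -$len);
    }
    return $exon_seq;
}

# Usage: one partly coding exon and one exon that is entirely UTR.
my $mixed = MockExon->new(start => 100, end => 109, seq => 'ACGTACGTAC',
                          coding_start => 103, coding_end => 107);
my $utr   = MockExon->new(start => 200, end => 209, seq => 'ACGTACGTAC');

print soft_mask_exon($mixed), "\n";   # acgTACGTac
print soft_mask_exon($utr), "\n";     # acgtacgtac
```

Returning early for the all-UTR case is equivalent to the if/else in the PR; either way, the numeric comparisons are only reached when the coding coordinates are defined.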
Possible Drawbacks
None that I can see.
Testing
No. I have not added tests for this. The existing tests check that the boundaries between lowercase and uppercase sequence are in the right place.
For exons that are entirely UTR, both the previous code and the new code make the whole exon sequence lowercase. The only difference is that, after my change, no numeric comparisons are made against undefined values, so no warnings are emitted.
In order to test for this we would need
The code builds fine with the current test suite.