ENSCORESW-3349 changes made loops in parser for better performance. #503
… change in output
It is not very clear to me what these changes actually bring.
If the memory problem could not be reproduced with the original code, how do we know the changes improve things?
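One way to answer this would be to record peak memory around the loop on both branches and compare the numbers. The following is only a minimal sketch, assuming a Linux host where `/proc/self/status` exposes `VmHWM`; the helper name `peak_rss_kb` and the array standing in for held objects are hypothetical and not part of the parser.

```perl
use strict;
use warnings;

# Hypothetical helper: return the peak resident set size (VmHWM) of this
# process in kB, or undef where /proc/self/status is unavailable
# (for example on non-Linux systems).
sub peak_rss_kb {
    open(my $fh, '<', '/proc/self/status') or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmHWM:\s+(\d+)\s+kB/;
    }
    return;
}

my $before = peak_rss_kb();

# Stand-in for objects kept alive across loop iterations (assumption,
# used here only so the measurement has something to observe).
my @held = map { 'x' x 1024 } 1 .. 50_000;

my $after = peak_rss_kb();

if (defined $before && defined $after) {
    printf "peak RSS: %d kB -> %d kB\n", $before, $after;
}
else {
    print "peak RSS not available on this platform\n";
}
```

Running the same measurement against the original and the modified parser on identical input would give a figure to compare, rather than relying on the absence of a failure.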
 $add_dependent_xref_sth->execute($xref_id, $dependent_xref_id);
 }
 }

 # Also store refseq protein as direct xref for ensembl translation, if translation exists
 if (defined $tl && defined $tl_of) {
-if ($tl_of->seq eq $tl->seq) {
+if (md5($tl_of->seq) eq md5($tl->seq)) {
What advantage does using md5 bring?
We still call the seq method on both objects, so I assume the same query is still being run underneath.
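To illustrate the point, here is a small sketch that counts `seq` calls for both forms of the comparison. `MockTranslation` is a hypothetical stand-in, not the real translation class, and the sequence is made up; `md5` is the function exported by the core `Digest::MD5` module.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical stand-in for a translation object: it returns a fixed
# sequence and counts how often seq() is called.
package MockTranslation {
    sub new   { my ($class, $seq) = @_; return bless { seq => $seq, calls => 0 }, $class }
    sub seq   { my ($self) = @_; $self->{calls}++; return $self->{seq} }
    sub calls { my ($self) = @_; return $self->{calls} }
}

my $tl_of = MockTranslation->new('MKTAYIAKQR');
my $tl    = MockTranslation->new('MKTAYIAKQR');

# Original form: direct string comparison, one seq() call per object.
my $same_direct = $tl_of->seq eq $tl->seq;

# Proposed form: compare 16-byte MD5 digests, again one seq() call per object.
my $same_md5 = md5($tl_of->seq) eq md5($tl->seq);

printf "direct: %s, md5: %s\n",
    ($same_direct ? 'equal' : 'different'),
    ($same_md5    ? 'equal' : 'different');
printf "seq() calls: tl_of=%d, tl=%d\n", $tl_of->calls, $tl->calls;
```

In this sketch both forms fetch both full sequences, and the md5 form adds the hashing work on top. A digest would only reduce work if it were computed once and stored in place of the sequence, which is not what the diff does.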
@@ -369,7 +387,7 @@ sub run_script {
 # If a best match was defined for the refseq transcript, store it as direct xref for ensembl transcript
 if ($best_id) {
 my ($acc, $version) = split(/\./, $id);
-$version =~ s/\D//g if $version;
+$version =~ s/\D//g if $version;
Cosmetic changes should be avoided and, if absolutely necessary, submitted as a separate commit so that they do not mask the actual code changes.
Can't confirm if changes are needed.
Requirements
Description
Optimize loops in RefSeqCoordinateParser, which runs for human xrefs. No change in output is expected.
Use case
Potential memory issues caused by holding objects in memory.
Benefits
Less memory required for parsing.
Possible Drawbacks
NA
Testing
There is no unit test for this parser.
Running the parser as a single job several times did not result in any memory failures.
The xref count from these changes is the same as the xref count from the e100 parser.
If so, do the tests pass/fail?
NA