ENSCORESW-3349 changes made loops in parser for better performance. #503

ameya1981 · 2020-08-13T08:09:38Z

Requirements

Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion;
Review the contributing guidelines for this repository; remember in particular:
- do not modify code without testing for regression
- provide simple unit tests to test the changes
- if you change the schema you must patch the test databases as well, see Updating the schema
- the PR must not fail unit testing

Description

Optimize loops in RefSeqCoordinateParser, which runs for human xrefs. No change in output is expected.

Use case

Potential memory issues caused by holding objects in memory

Benefits

Less memory required for parsing

Possible Drawbacks

NA

Testing

There is no unit test for this parser.
Running the parser as a single job did not result in any memory failures over multiple runs.
The xref count with these changes is the same as the xref count from the e100 parser.

If so, do the tests pass/fail?

NA

coveralls · 2020-08-13T08:30:27Z

Coverage remained the same at 78.333% when pulling 41ebd28 on optimize/RefSeqCoordinateParser into b82332c on master.

magaliruffier

It is not very clear to me what these changes actually bring.
If the memory problem could not be reproduced with the original code, how do we know the changes improve things?

magaliruffier · 2020-09-02T08:49:32Z

misc-scripts/xref_mapping/XrefParser/RefSeqCoordinateParser.pm

              $add_dependent_xref_sth->execute($xref_id, $dependent_xref_id);
            }
          }

 # Also store refseq protein as direct xref for ensembl translation, if translation exists
          if (defined $tl && defined $tl_of) {
-            if ($tl_of->seq eq $tl->seq) {
+            if (md5($tl_of->seq) eq md5($tl->seq)) {


What advantage does using md5 bring?
We still call the seq method, so I assume the same query is still being run underneath.
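
To make the question concrete, here is a minimal sketch of the two comparisons side by side. It assumes `md5` comes from the core `Digest::MD5` module (the diff does not show the import), and the two string variables are hypothetical stand-ins for the values returned by `$tl_of->seq` and `$tl->seq`. Both full strings have to exist before either comparison can run, so hashing adds work on top of the fetch rather than avoiding it.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);    # assumption: the md5() in the diff is Digest::MD5's

# Hypothetical stand-ins for $tl_of->seq and $tl->seq.
# In the parser both strings are fetched before any comparison happens.
my $refseq_seq  = 'MKTAYIAKQR';
my $ensembl_seq = 'MKTAYIAKQR';

# Original check: direct string comparison.
my $direct_match = ($refseq_seq eq $ensembl_seq) ? 1 : 0;

# Proposed check: hash both strings in full, then compare the digests.
my $digest_match = (md5($refseq_seq) eq md5($ensembl_seq)) ? 1 : 0;

print "direct=$direct_match digest=$digest_match\n";
```

For these inputs both checks agree. The digest version reads every character of both strings, whereas `eq` can stop at the first difference.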

magaliruffier · 2020-09-02T08:51:39Z

misc-scripts/xref_mapping/XrefParser/RefSeqCoordinateParser.pm

@@ -369,7 +387,7 @@ sub run_script {
 # If a best match was defined for the refseq transcript, store it as direct xref for ensembl transcript
        if ($best_id) {
          my ($acc, $version) = split(/\./, $id);
-	  $version =~ s/\D//g if $version;
+	        $version =~ s/\D//g if $version;


Cosmetic changes should be avoided and, if absolutely necessary, submitted as a separate commit to avoid masking the actual code changes.

magaliruffier · 2020-09-25T10:30:41Z

Can't confirm whether the changes are needed.

ENSCORESW-3349 changes made loops in parser fo better performance. no…

41ebd28

… change in output

ameya1981 changed the title ~~ENSCORESW-3349 changes made loops in parser fo better performance. no…~~ ENSCORESW-3349 changes made loops in parser for better performance. Aug 13, 2020

ameya1981 requested a review from magaliruffier August 17, 2020 15:55

magaliruffier suggested changes Sep 2, 2020

magaliruffier closed this Sep 25, 2020

magaliruffier deleted the optimize/RefSeqCoordinateParser branch January 4, 2021 13:45
