Speed up codelist tests #136

Closed
andylolz wants to merge 4 commits into IATI:master from andylolz:element-to-file-rejig

andylolz commented Jul 21, 2018

Codelist tests are among the slowest dashboard tests, so it’s worth investigating how to speed these up.

This PR makes two changes that together give a significant speedup (benchmarking suggests python calculate_stats.py loop could run about 20% faster):

  1. Codelist tests are currently done at element level. This means lots and lots of slow xpath calls. It’s way more efficient to perform these tests at file level.

  2. At present, codelist values are computed and then recomputed:

    @returns_numberdictdict
    def codelist_values(self):
        out = defaultdict(lambda: defaultdict(int))
        for path in codelist_mappings[self._major_version()]:
            for value in self.element.xpath(path):
                out[path][value] += 1
        return out

    @returns_numberdictdict
    def codelist_values_by_major_version(self):
        out = defaultdict(lambda: defaultdict(int))
        for path in codelist_mappings[self._major_version()]:
            for value in self.element.xpath(path):
                out[path][value] += 1
        return { self._major_version(): out }

    If codelist tests are done at file level, we can cache the results to avoid recomputing.
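
A minimal sketch of the caching idea in point 2, not the code in this PR. It assumes a hypothetical FileCodelistStats class and a hypothetical CODELIST_MAPPINGS table of (element path, attribute) pairs, and it uses the standard library's ElementTree in place of the XPath strings the dashboard evaluates. The counts are computed in one pass over the whole file and then shared by both stats:

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical mapping: major version -> (element path, attribute name) pairs.
# The real codelist_mappings holds XPath strings instead.
CODELIST_MAPPINGS = {
    "2": [(".//sector", "code"), (".//transaction-type", "code")],
}


class FileCodelistStats:
    """File-level stats that count codelist values once and reuse the result."""

    def __init__(self, root, major_version):
        self.root = root
        self.major_version = major_version
        self._counts = None  # filled on first use
        self.passes = 0      # full passes over the file, kept for illustration

    def _codelist_counts(self):
        if self._counts is None:
            self.passes += 1
            counts = defaultdict(lambda: defaultdict(int))
            for path, attr in CODELIST_MAPPINGS[self.major_version]:
                for node in self.root.iterfind(path):
                    value = node.get(attr)
                    if value is not None:
                        counts[f"{path}/@{attr}"][value] += 1
            # Store plain dicts so missing-key lookups do not add entries.
            self._counts = {key: dict(val) for key, val in counts.items()}
        return self._counts

    def codelist_values(self):
        return self._codelist_counts()

    def codelist_values_by_major_version(self):
        return {self.major_version: self._codelist_counts()}


xml = """<iati-activities version="2.03">
  <iati-activity><sector code="111"/><sector code="112"/></iati-activity>
  <iati-activity>
    <sector code="111"/>
    <transaction><transaction-type code="3"/></transaction>
  </iati-activity>
</iati-activities>"""

stats = FileCodelistStats(ET.fromstring(xml), "2")
values = stats.codelist_values()
by_version = stats.codelist_values_by_major_version()
```

Both calls return the same cached counts, and stats.passes stays at 1 however many times either stat is requested. The trade-off is that the counts stay in memory for as long as the stats object does.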

coveralls commented Jul 21, 2018

Coverage increased (+0.03%) to 40.567% when pulling 2263a35 on andylolz:element-to-file-rejig into ecc2398 on IATI:master.

andylolz commented

I’ve brought this PR up to date: it sounds like the tech team is currently working on dashboard efficiencies, so this may be relevant.

I’ve just tested this, and the output is the same, but python calculate_stats.py loop is about 15% faster on my machine. It’s possible this uses more memory, so it might be worth profiling that.
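
One way the memory question could be checked is with the standard library's tracemalloc. This is a minimal sketch: peak_memory is a hypothetical helper and build_cache is a hypothetical stand-in for a real stats run. tracemalloc only traces allocations made through Python's allocator, so memory used inside C libraries such as libxml2 would generally not show up:

```python
import tracemalloc


def peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak traced bytes while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_cache(n):
    # Hypothetical stand-in for a file-level run that keeps its results cached.
    return {i: str(i) for i in range(n)}


cache, peak = peak_memory(build_cache, 10_000)
print(f"peak traced memory: {peak / 1024:.1f} KiB")
```

Running the same helper against the element-level and file-level code paths on an identical input would give comparable peak figures.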

andylolz commented Jan 4, 2021

Doesn’t look like this will be merged; closing.

andylolz closed this Jan 4, 2021
andylolz deleted the element-to-file-rejig branch January 4, 2021 21:58