Optimize get_irods_content() for large studies #1519

Closed
mikkonie opened this issue Nov 1, 2022 · 2 comments
Labels
app: samplesheets Issue in the samplesheets app internal Changes invisible to the user or APIs (e.g. refactoring and optimization)

Comments


mikkonie commented Nov 1, 2022

Looking into the current bottlenecks of rendering large studies, I've noticed that, at least in observed cases, the worst offender is in fact not sheet rendering or loading tables via the Ajax view, but get_irods_content(). It can take the lion's share of the rendering time when not in edit mode.

I will have to look into this in more detail and see what can be optimized. If decent improvements are feasible with a reasonable amount of work, I'll fit this in the v0.12.1 release. If not, I'll postpone.

Possibilities include caching this info with sodarcache and retrieving it from there, similar to the planned feature in #1509.

This will most likely involve optimizing the plugin methods called in study and/or app sub-apps.

Related to #956, which may provide one potential solution for improving this.

@mikkonie mikkonie added internal Changes invisible to the user or APIs (e.g. refactoring and optimization) app: samplesheets Issue in the samplesheets app labels Nov 1, 2022
@mikkonie mikkonie added this to the v0.12.1 milestone Nov 1, 2022
@mikkonie mikkonie self-assigned this Nov 1, 2022

mikkonie commented Nov 4, 2022

Some findings:

  • Building the tables is slow, but is not the main culprit
  • For germline studies, the study sub-app get_shortcut_column() is by far the biggest time waster
    • Redundant queries in get_family_sources() which can probably be optimized
    • Optimally, this should come from sodarcache instead of building it every time the sheets are loaded
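
As an illustration of the caching idea in the last bullet, here is a minimal cache-aside sketch. It does not use the real sodarcache API: `DictCache`, `get_shortcuts()`, the key format and the `build` callback are all hypothetical names standing in for whatever the backend and the study sub-app actually provide.

```python
class DictCache:
    """Hypothetical in-memory stand-in for a cache backend such as sodarcache."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        # Returns None when the key has not been cached yet
        return self._items.get(key)

    def set(self, key, value):
        self._items[key] = value


def get_shortcuts(cache, study_id, build):
    """Return cached shortcut data for a study, building it only on a miss."""
    key = "shortcuts/{}".format(study_id)  # Assumed key format
    data = cache.get(key)
    if data is None:
        # Cache miss: do the expensive work once and store the result
        data = build(study_id)
        cache.set(key, data)
    return data
```

With this pattern, the expensive build step runs only on the first sheet load. A real implementation would also need to update or invalidate the entry whenever the sheets or the iRODS content change.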


mikkonie commented Nov 4, 2022

Done, to be merged into dev. The root cause turned out to be calling get_family_sources() once per row, generating a redundant database query every time. Refactoring this query reduced the execution time in a 5000+ sample study from 115s to 1.2s. D'oh.
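
For readers unfamiliar with this kind of bug, the per-row query pattern and its fix can be sketched roughly as follows. This is not the actual SODAR code: `FakeDB`, its methods and the row format are hypothetical stand-ins that only count simulated queries.

```python
from collections import defaultdict


class FakeDB:
    """Hypothetical stand-in for the database, counting simulated queries."""

    def __init__(self, sources):
        self.sources = sources  # List of (source_name, family_id) tuples
        self.query_count = 0

    def sources_for_family(self, family_id):
        """Simulate one query returning the sources of a single family."""
        self.query_count += 1
        return [name for name, fam in self.sources if fam == family_id]

    def sources_by_family(self):
        """Simulate one query returning all sources grouped by family."""
        self.query_count += 1
        grouped = defaultdict(list)
        for name, fam in self.sources:
            grouped[fam].append(name)
        return dict(grouped)


def build_column_per_row(db, rows):
    """Slow pattern: one query for every table row."""
    return [db.sources_for_family(row["family"]) for row in rows]


def build_column_batched(db, rows):
    """Faster pattern: query once, then look up each row in memory."""
    by_family = db.sources_by_family()
    return [by_family.get(row["family"], []) for row in rows]
```

Both functions return the same column, but the per-row version issues as many queries as there are rows, while the batched version issues a single one regardless of study size.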
