profile the data tests #60

Closed
dsweber2 opened this issue Nov 8, 2023 · 6 comments

@dsweber2

dsweber2 commented Nov 8, 2023

Figure out exactly why epix_slide is taking so long. Hopefully this will be useful for epiprocess updates.

@dshemetov dshemetov self-assigned this Nov 13, 2023
@dshemetov

I ran a profile with profvis:

p <- profvis::profvis({
  devtools::test()
})
htmlwidgets::saveWidget(p, "profvis.html", selfcontained = TRUE)

Notes:

  • The full test suite takes 89s.
  • The largest share, 20s, is spent in group_modify.data.frame inside grouped_epi_archive.
    • Half of that time is spent inside the work function run_workflow_and_format, specifically in the grab_residuals part of apply_frosting (see screenshot). I'm not sure what this does, but it seems like a lot.
    • A quarter of that time is spent in flatline_forecaster.
    • An eighth of that time is spent in forge.data.frame inside predict.epi_workflow.
  • The next-largest share, 10s, is spent in fit.workflow.
    • Most of that is in prep.epi_recipe, with 2 seconds spent in relocate.data.frame.

The issue doesn't seem to be epix_slide itself; it looks more like epipredict overhead.
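For a per-function breakdown like the notes above without reading the flame graph by hand, base R's sampling profiler can produce a table of time by function. This is only a rough sketch: `slow_work` is a hypothetical stand-in for the real test code, and the interval and repetition count are arbitrary choices.

```r
# Hypothetical stand-in for the expensive code under test; swap in the real call.
slow_work <- function(n = 200) {
  df <- data.frame(g = rep(seq_len(n), each = 50), x = rnorm(n * 50))
  pieces <- split(df, df$g)
  # Build one small data frame per group, then bind them together.
  do.call(rbind, lapply(pieces, function(d) {
    data.frame(g = d$g[1], m = mean(d$x))
  }))
}

prof_file <- tempfile(fileext = ".out")

# Start the sampling profiler, run the workload a few times, then stop it.
Rprof(prof_file, interval = 0.01)
for (i in 1:20) res <- slow_work()
Rprof(NULL)

# by.total attributes time to a function and everything it calls,
# which is the view that matches "X seconds spent in f".
prof <- summaryRprof(prof_file)
print(head(prof$by.total, 10))
```

The `by.self` element of the same summary shows time spent in each function's own body, which can help separate genuinely slow functions from ones that merely call slow code.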

[Image: profvis screenshot]

@nmdefries

nmdefries commented Nov 17, 2023

There are several data handling steps at the bottom that take a lot of time in aggregate. Are those happening in epix_slide? That could be an easier place to make improvements than epipredict (or we could take a two-pronged approach).

I'd be interested in looking into the epiprocess side.

@dshemetov

Not sure; from a skim I couldn't tell where those *.data.frame calls were coming from. It's possible they're mostly in epix_slide. We could try profiling epix_slide specifically and see if those show up again. It would probably help to use a big epi_archive to accentuate the cost of the data-shaping operations.
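One rough way to check whether the data-shaping cost grows with input size is to time the same operation on progressively larger inputs. The sketch below uses only base R; `slide_stub` and `make_fake_archive` are hypothetical stand-ins (a plain data frame, not a real epi_archive or a real epix_slide call) and would need to be replaced with the actual objects.

```r
# Hypothetical stand-in for the sliding computation being investigated.
slide_stub <- function(df) {
  do.call(rbind, lapply(split(df, df$version), function(d) {
    data.frame(version = d$version[1], mean_value = mean(d$value))
  }))
}

# Hypothetical generator for a fake "archive" with a given number of versions.
make_fake_archive <- function(n_versions, rows_per_version = 100) {
  data.frame(
    version = rep(seq_len(n_versions), each = rows_per_version),
    value = rnorm(n_versions * rows_per_version)
  )
}

# Time the stub at several sizes to see how the cost scales.
sizes <- c(50, 100, 200, 400)
timings <- vapply(sizes, function(n) {
  df <- make_fake_archive(n)
  system.time(slide_stub(df))[["elapsed"]]
}, numeric(1))

print(data.frame(n_versions = sizes, elapsed_s = timings))
```

If the elapsed time grows much faster than the number of versions, that points at per-group data frame overhead rather than the forecasting work itself.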

If you want to look at that, that'd be appreciated!

@dshemetov

Here is the full profvis file, btw: profvis-2023-11-16.zip

@nmdefries

Speedups to epix_slide in progress here.

@dshemetov

Going to close this as completed for now. We can resurrect it if we run into perf issues.
