Move time to outer loop in result flattening #240

Closed
wants to merge 3 commits into from

Conversation

@ctessum (Contributor) commented Feb 5, 2023

Hello!

Consider this example code, which should be copy-pasteable once the package registration process is completed.

It solves an advection PDE, using wind fields pulled from an external file. However, there are a couple of issues.

The first issue is that, in many use cases, the data being loaded won't fit in memory, so the data loader uses a caching system that assumes an integrator that takes sequential steps in time (i.e., it assumes time is in the outer loop). This works fine until the integrator finishes and starts calculating the observables (here). In that function, time ends up being the inner loop, so the data loader makes a large number of unnecessary reads from disk. The result is that the integration in the example above takes about 30 seconds, and then calculating the observables takes about 30 minutes.
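To make the access pattern concrete, here is a minimal sketch of the kind of caching loader I mean; `WindCache` and `wind_at!` are made-up stand-ins for illustration, not the actual loader in the example:

```julia
# Hypothetical time-sequential caching loader: cheap while time only moves
# forward, but any jump back in time forces a fresh (slow) read.
mutable struct WindCache
    t_lo::Float64           # start of the cached time window
    t_hi::Float64           # end of the cached time window
    slab::Array{Float64,3}  # cached wind data for that window
    reads::Int              # number of (slow) reads, counted for illustration
end

function wind_at!(cache::WindCache, x, y, t)
    if !(cache.t_lo <= t <= cache.t_hi)
        # Cache miss: in the real loader this is a read from disk. With time
        # in the inner loop, t jumps back to the start of the simulation for
        # every grid point, so this branch fires on nearly every call.
        cache.reads += 1
        cache.t_lo, cache.t_hi = t, t + 1.0   # pretend we loaded the window [t, t+1]
        cache.slab = rand(4, 4, 2)            # stand-in for the file read
    end
    return cache.slab[1, 1, 1]                # stand-in for interpolation at (x, y, t)
end
```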

To solve this first issue, the first commit moves time to the outer loop in the observable-calculating function. This improves the situation for the example above, and it also makes things more consistent for other data loaders and interpolators, which will in general assume that data is accessed with time in the outer loop.
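Schematically, the change amounts to swapping the loop order. Continuing the hypothetical `WindCache` sketch above (this is not the actual code in `timedep.jl`):

```julia
cache = WindCache(Inf, -Inf, zeros(1, 1, 1), 0)
ts = 0.0:0.1:10.0
xs, ys = 1:20, 1:20

# Before: time in the inner loop, so the cached window is discarded and
# re-read for every grid point.
for x in xs, y in ys, t in ts
    wind_at!(cache, x, y, t)
end
println("time-inner reads: ", cache.reads)

# After (this PR): time in the outer loop, so each window is read once and
# reused for all grid points.
cache.reads = 0
for t in ts, x in xs, y in ys
    wind_at!(cache, x, y, t)
end
println("time-outer reads: ", cache.reads)
```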

However, a second issue remains: for the system above, `solve` still spends the vast majority of its time calculating observables, but now it's mainly doing compilation here and (especially) type inference here.

The second commit below tries to improve this situation by using broadcasting instead of `map`. It doesn't help with speed at all, but I think it at least makes the code a little clearer.
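For reference, this is the kind of substitution I mean, as a generic illustration rather than the exact diff:

```julia
# Generic illustration of replacing `map` with broadcasting; the result is
# identical, the dotted form just reads a little more directly.
f(u, t) = u^2 + t

us = rand(5)
ts = collect(range(0.0, 1.0; length = 5))

via_map       = map((u, t) -> f(u, t), us, ts)  # explicit map over two collections
via_broadcast = f.(us, ts)                      # equivalent broadcast
via_map == via_broadcast                        # true
```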

Let me know your thoughts!

codecov bot commented Feb 5, 2023

Codecov Report

Merging #240 (b2f5c65) into master (eaddda2) will decrease coverage by 5.83%.
The diff coverage is 91.66%.

@@            Coverage Diff             @@
##           master     #240      +/-   ##
==========================================
- Coverage   78.81%   72.98%   -5.83%     
==========================================
  Files          39       39              
  Lines        1987     1988       +1     
==========================================
- Hits         1566     1451     -115     
- Misses        421      537     +116     
Impacted Files Coverage Δ
src/interface/solution/timedep.jl 82.85% <91.66%> (-2.44%) ⬇️
...tization/schemes/half_offset_centred_difference.jl 42.10% <0.00%> (-57.90%) ⬇️
...schemes/spherical_laplacian/spherical_laplacian.jl 50.00% <0.00%> (-42.31%) ⬇️
src/interface/solution/common.jl 19.69% <0.00%> (-22.73%) ⬇️
src/interface/solution/solution_utils.jl 78.57% <0.00%> (-21.43%) ⬇️
src/discretization/generate_bc_eqs.jl 62.80% <0.00%> (-17.08%) ⬇️
src/MOL_discretization.jl 63.97% <0.00%> (-11.03%) ⬇️
src/system_parsing/bcs/parse_boundaries.jl 81.67% <0.00%> (-6.11%) ⬇️
...schemes/centered_difference/centered_difference.jl 94.44% <0.00%> (-5.56%) ⬇️
src/discretization/discretize_vars.jl 82.22% <0.00%> (-4.45%) ⬇️
... and 4 more


@xtalax (Member) commented Feb 6, 2023

This looks like an acceptable change. How does it affect performance for other cases? Could you run an A/B benchmark for a few cases, just to see how this stacks up?
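For example, something along these lines, run once on master and once on this branch (a sketch using BenchmarkTools; the toy ODE below is only a stand-in for the real PDE cases):

```julia
using BenchmarkTools, OrdinaryDiffEq

# Stand-in problem; substitute the actual discretized PDE cases here.
decay!(du, u, p, t) = (du .= -u)
prob = ODEProblem(decay!, ones(100), (0.0, 1.0))

# Time the integration itself.
@btime solve($prob, Tsit5(); saveat = 0.1)

# Time the post-solve step this PR targets (flattening / observable access),
# represented here by a simple reshaping of the saved states.
sol = solve(prob, Tsit5(); saveat = 0.1)
@btime reduce(hcat, $(sol.u))
```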

@xtalax (Member) commented Feb 10, 2023

That issue you posted is the reason I decided to batch calls to the observed function to get the whole time slice at once, as these functions are optimised for this kind of call.

For now I would prefer to provide this as an option, unless we can show that it's unconditionally faster to do it the way you suggest.

@ctessum (Contributor, Author) commented Feb 10, 2023

Thanks for your response!

For my use case, it is currently too slow to be usable without this PR, but it's also too slow to be usable with this PR, so I'm not too attached to this particular PR getting adopted. I'm still trying to wrap my head around everything that's going on here, but so far it seems the issues might be:

  1. The time step is in the inner loop in the observables calculation, which doesn't work well with data loaders optimized for time being in the outer loop (this is the focus of this PR).
  2. (It seems like) observables aren't calculated in dependency order, so the data loader gets called multiple times for the same variable when calculating the observables.
  3. Observables are calculated eagerly, which wastes time if there are many, or computationally intensive, observables that the user isn't ultimately interested in (e.g. something from a data loader, or variables that only serve as connections between subsystems); see the sketch after this list.
  4. There's a custom function generated for every observed variable at every grid point, with compilation time that adds up when there are a lot of observables.
  5. The observed function isn't statically typed, which requires type inference that adds up when there are a lot of observables and/or grid points.
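To illustrate what I mean by 3, here is a generic lazy/memoized pattern (just the idea, not MTK's actual API):

```julia
# Generic sketch of lazy, memoized observable evaluation: each observable is
# stored as a thunk and only computed if and when the user asks for it.
struct LazyObservables
    thunks::Dict{Symbol,Function}
    cache::Dict{Symbol,Any}
end
LazyObservables(thunks) = LazyObservables(thunks, Dict{Symbol,Any}())

function Base.getindex(obs::LazyObservables, name::Symbol)
    get!(obs.cache, name) do
        obs.thunks[name]()   # computed on first access only, then cached
    end
end

obs = LazyObservables(Dict{Symbol,Function}(
    :cheap     => () -> 1 + 1,
    :expensive => () -> sum(sin, 1:10^8),  # pretend this hits the data loader
))
obs[:cheap]   # only :cheap is ever evaluated; :expensive never runs
```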

Let me know if you think these may be actual issues going on here, or if I'm misunderstanding something. Thanks!

@xtalax (Member) commented Feb 10, 2023

No, that's a pretty good summary of the current issues with observables, though I'm unsure about 5.

1 is unlikely to be solved since MTK and the ODE solvers are mostly concerned with building time series data, so having time in the inner loop is faster in these cases. It may be possible to provide an option to avoid this, though it will be complex to implement.

I'm currently working on the ability to save only subregions of the domain that one is interested in, with requested observables saving only their dependencies. I hope this will lead to better observable handling overall; it handles 2 and 3. Building the observed functions with only their dependencies may also speed up generation and compilation, which should put a dent in 4 with any luck.

With handling for array variables in MTK, which may be coming soon, there may be further improvements, since for PDEs observables often follow patterns and occur in patterned subregions of the domain. We should be able to take advantage of this to generate single functions that calculate a whole region of observables, solving 4 if all goes well.

@ctessum (Contributor, Author) commented Feb 10, 2023

Thanks!

With regard to 1, just so I understand: I thought the way the ODE solvers work is that they solve for all variables at t=1, then again for all variables at t=2, then at t=3, and so on, so the native way to store the results would be time-major, i.e. with time in the outer loop.

For example, when I look at the output (e.g., sol.u, where sol is the result of solve), it is a vector of vectors, where the outer vector has an element for each time step and each inner vector has an element for each state variable. So time is in the outer loop and the variables are in the inner loop.
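For example, with a plain ODE system as a minimal stand-in (not the PDE example):

```julia
using OrdinaryDiffEq

decay!(du, u, p, t) = (du .= -u)
prob = ODEProblem(decay!, [1.0, 2.0, 3.0], (0.0, 1.0))
sol  = solve(prob, Tsit5(); saveat = 0.25)

sol.t        # saved times: 0.0, 0.25, 0.5, 0.75, 1.0
sol.u        # vector of state vectors, one entry per saved time
sol.u[2]     # all state variables at t = sol.t[2]
sol.u[2][3]  # the third state variable at that time
```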

Am I understanding that incorrectly? Thanks for your help!

@ChrisRackauckas (Member) commented

I'm a bit torn on whether this is necessary. The ordering that MTK produces is not necessarily the same as the ordering generated by the stenciling code, since it can re-order at will (and will for some transformations). I think it is still a good idea though, since it gives "the right" ordering by default for a column-major structure. It won't necessarily have a tangible effect unless it is preserved throughout, and that structure preservation is of course a (the) ongoing project.
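As a generic illustration of why that ordering is the column-major-friendly one (plain Julia, not the stenciling code itself):

```julia
# Julia arrays are column-major: the first index varies fastest in memory.
# For a solution matrix laid out as (state, time), putting time in the outer
# loop and the states in the inner loop touches memory contiguously.
U = zeros(1_000, 500)            # rows = states, columns = time steps

function fill_time_outer!(U)
    for j in axes(U, 2)          # time (columns) in the outer loop
        for i in axes(U, 1)      # states (rows) innermost: contiguous access
            U[i, j] = i + j
        end
    end
    return U
end

fill_time_outer!(U)
```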
