Move time to outer loop in result flattening #240

Closed
wants to merge 3 commits into from

Conversation

@ctessum (Contributor) commented Feb 5, 2023

Hello!

Consider this example code, which should be copy-pasteable once the package registration process is completed.

It solves an advection PDE, using wind fields pulled from an external file. However, there are a couple of issues.

The first issue is that, in many use cases, the data being loaded won't fit in memory, so the data loader uses a caching system that assumes an integrator that takes sequential steps in time (i.e., it assumes time is in the outer loop). This works fine until the integrator finishes and starts calculating the observables (here). In that function, time ends up being the inner loop, so the data loader makes a large number of unnecessary reads from disk. The result is that the integration in the example above takes about 30 seconds, and then calculating the observables takes about 30 minutes.
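To make the access pattern concrete, here is a minimal sketch of the kind of caching loader I mean; `WindCache` and `wind_at!` are made-up stand-ins for illustration, not the actual loader in the example:

```julia
# Hypothetical time-sequential caching loader: cheap while time only moves
# forward, but any jump back in time forces a fresh (slow) read.
mutable struct WindCache
    t_lo::Float64           # start of the cached time window
    t_hi::Float64           # end of the cached time window
    slab::Array{Float64,3}  # cached wind data for that window
    reads::Int              # number of (slow) reads, counted for illustration
end

function wind_at!(cache::WindCache, x, y, t)
    if !(cache.t_lo <= t <= cache.t_hi)
        # Cache miss: in the real loader this is a read from disk. With time
        # in the inner loop, t jumps back to the start of the simulation for
        # every grid point, so this branch fires on nearly every call.
        cache.reads += 1
        cache.t_lo, cache.t_hi = t, t + 1.0   # pretend we loaded the window [t, t+1]
        cache.slab = rand(4, 4, 2)            # stand-in for the file read
    end
    return cache.slab[1, 1, 1]                # stand-in for interpolation at (x, y, t)
end
```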

To solve this first issue, the first commit moves time to the outer loop in the observable-calculating function. This improves the situation for the example above, and it also makes things more consistent for other data loaders and interpolators, which will in general assume that data is accessed with time in the outer loop.
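Schematically, the change amounts to swapping the loop order. Continuing the hypothetical `WindCache` sketch above (this is not the actual code in `timedep.jl`):

```julia
cache = WindCache(Inf, -Inf, zeros(1, 1, 1), 0)
ts = 0.0:0.1:10.0
xs, ys = 1:20, 1:20

# Before: time in the inner loop, so the cached window is discarded and
# re-read for every grid point.
for x in xs, y in ys, t in ts
    wind_at!(cache, x, y, t)
end
println("time-inner reads: ", cache.reads)

# After (this PR): time in the outer loop, so each window is read once and
# reused for all grid points.
cache.reads = 0
for t in ts, x in xs, y in ys
    wind_at!(cache, x, y, t)
end
println("time-outer reads: ", cache.reads)
```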

However, a second issue remains: for the system above, `solve` still spends the vast majority of its time calculating observables, but now it's mainly doing compilation here and (especially) type inference here.

The second commit below tries to improve this situation by using broadcasting instead of `map`. It doesn't help with speed at all, but I think it at least makes the code a little clearer.
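For reference, this is the kind of substitution I mean, as a generic illustration rather than the exact diff:

```julia
# Generic illustration of replacing `map` with broadcasting; the result is
# identical, the dotted form just reads a little more directly.
f(u, t) = u^2 + t

us = rand(5)
ts = collect(range(0.0, 1.0; length = 5))

via_map       = map((u, t) -> f(u, t), us, ts)  # explicit map over two collections
via_broadcast = f.(us, ts)                      # equivalent broadcast
via_map == via_broadcast                        # true
```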

Let me know your thoughts!

codecov bot commented Feb 5, 2023

Codecov Report

Merging #240 (b2f5c65) into master (eaddda2) will decrease coverage by 5.83%.
The diff coverage is 91.66%.

@@            Coverage Diff             @@
##           master     #240      +/-   ##
==========================================
- Coverage   78.81%   72.98%   -5.83%     
==========================================
  Files          39       39              
  Lines        1987     1988       +1     
==========================================
- Hits         1566     1451     -115     
- Misses        421      537     +116     
Impacted Files Coverage Δ
src/interface/solution/timedep.jl 82.85% <91.66%> (-2.44%) ⬇️
...tization/schemes/half_offset_centred_difference.jl 42.10% <0.00%> (-57.90%) ⬇️
...schemes/spherical_laplacian/spherical_laplacian.jl 50.00% <0.00%> (-42.31%) ⬇️
src/interface/solution/common.jl 19.69% <0.00%> (-22.73%) ⬇️
src/interface/solution/solution_utils.jl 78.57% <0.00%> (-21.43%) ⬇️
src/discretization/generate_bc_eqs.jl 62.80% <0.00%> (-17.08%) ⬇️
src/MOL_discretization.jl 63.97% <0.00%> (-11.03%) ⬇️
src/system_parsing/bcs/parse_boundaries.jl 81.67% <0.00%> (-6.11%) ⬇️
...schemes/centered_difference/centered_difference.jl 94.44% <0.00%> (-5.56%) ⬇️
src/discretization/discretize_vars.jl 82.22% <0.00%> (-4.45%) ⬇️
... and 4 more


@xtalax (Member) commented Feb 6, 2023

This looks like an acceptable change. How does it affect performance for other cases? Could you run an A/B benchmark for a few cases, just to see how this stacks up?
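For example, something along these lines, run once on master and once on this branch (a sketch using BenchmarkTools; the toy ODE below is only a stand-in for the real PDE cases):

```julia
using BenchmarkTools, OrdinaryDiffEq

# Stand-in problem; substitute the actual discretized PDE cases here.
decay!(du, u, p, t) = (du .= -u)
prob = ODEProblem(decay!, ones(100), (0.0, 1.0))

# Time the integration itself.
@btime solve($prob, Tsit5(); saveat = 0.1)

# Time the post-solve step this PR targets (flattening / observable access),
# represented here by a simple reshaping of the saved states.
sol = solve(prob, Tsit5(); saveat = 0.1)
@btime reduce(hcat, $(sol.u))
```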

@xtalax (Member) commented Feb 10, 2023

That issue you posted is the reason I decided to batch calls to the observed function to get the whole time slice at once, as these functions are optimised for this kind of call.

For now I would prefer to provide this as an option, unless we can show that it's unconditionally faster to do it the way you suggest.

@ctessum (Contributor, Author) commented Feb 10, 2023

Thanks for your response!

For my use case, it is currently too slow to be usable without this PR, but it's also too slow to be usable with this PR, so I'm not too attached to this particular PR getting adopted. I'm still trying to wrap my head around everything that's going on here, but so far it seems the issues might be:

  1. The time step is in the inner loop in the observables calculation, which doesn't work well with data loaders optimized for time being in the outer loop (this is the focus of this PR).
  2. (It seems like) observables aren't calculated in dependency order, so the data loader gets called multiple times for the same variable when calculating the observables.
  3. Observables are calculated eagerly, which wastes time if there are many, or computationally intensive, observables that the user isn't ultimately interested in (e.g. something from a data loader, or variables that only serve as connections between subsystems); see the sketch after this list.
  4. There's a custom function generated for every observed variable at every grid point, with compilation time that adds up when there are a lot of observables.
  5. The observed function isn't statically typed, which requires type inference that adds up when there are a lot of observables and/or grid points.
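To illustrate what I mean by 3, here is a generic lazy/memoized pattern (just the idea, not MTK's actual API):

```julia
# Generic sketch of lazy, memoized observable evaluation: each observable is
# stored as a thunk and only computed if and when the user asks for it.
struct LazyObservables
    thunks::Dict{Symbol,Function}
    cache::Dict{Symbol,Any}
end
LazyObservables(thunks) = LazyObservables(thunks, Dict{Symbol,Any}())

function Base.getindex(obs::LazyObservables, name::Symbol)
    get!(obs.cache, name) do
        obs.thunks[name]()   # computed on first access only, then cached
    end
end

obs = LazyObservables(Dict{Symbol,Function}(
    :cheap     => () -> 1 + 1,
    :expensive => () -> sum(sin, 1:10^8),  # pretend this hits the data loader
))
obs[:cheap]   # only :cheap is ever evaluated; :expensive never runs
```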

Let me know if you think these may be actual issues going on here, or if I'm misunderstanding something. Thanks!

@xtalax (Member) commented Feb 10, 2023

No, that's a pretty good summary of the current issues with observables, though I'm unsure about 5.

1 is unlikely to be solved since MTK and the ODE solvers are mostly concerned with building time series data, so having time in the inner loop is faster in these cases. It may be possible to provide an option to avoid this, though it will be complex to implement.

I'm currently working on the ability to save only subregions of the domain that one is interested in, with requested observables saving only their dependencies. I hope this will lead to better observable handling overall; it handles 2 and 3. Building the observed functions with only their dependencies may also speed up generation and compilation, which should put a dent in 4 with any luck.

With handling for array variables in MTK, which may be coming soon, there may be further improvements, since for PDEs observables often follow patterns and occur in patterned subregions of the domain. We should be able to take advantage of this to generate single functions that calculate a whole region of observables, solving 4 if all goes well.

@ctessum (Contributor, Author) commented Feb 10, 2023

Thanks!

With regard to 1, just so I understand: I thought the way the ODE solvers work is that they solve for all variables at t=1, then again for all variables at t=2, then at t=3, and so on, so the native way to store the results would be time-major, i.e. with time in the outer loop.

For example, when I look at the output (e.g., sol.u, where sol is the result of solve), it is a vector of vectors, where the outer vector has an element for each time step and each inner vector has an element for each state variable. So time is in the outer loop and the variables are in the inner loop.
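For example, with a plain ODE system as a minimal stand-in (not the PDE example):

```julia
using OrdinaryDiffEq

decay!(du, u, p, t) = (du .= -u)
prob = ODEProblem(decay!, [1.0, 2.0, 3.0], (0.0, 1.0))
sol  = solve(prob, Tsit5(); saveat = 0.25)

sol.t        # saved times: 0.0, 0.25, 0.5, 0.75, 1.0
sol.u        # vector of state vectors, one entry per saved time
sol.u[2]     # all state variables at t = sol.t[2]
sol.u[2][3]  # the third state variable at that time
```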

Am I understanding that incorrectly? Thanks for your help!

@ChrisRackauckas (Member) commented

I'm a bit torn on whether this is necessary. The ordering that MTK produces is not necessarily the same as the ordering generated by the stenciling code, since it can re-order at will (and will for some transformations). I think it is still a good idea though, since it gives "the right" ordering by default for a column-major structure. It won't necessarily have a tangible effect unless it is preserved throughout, and that structure preservation is of course a (the) ongoing project.
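As a generic illustration of why that ordering is the column-major-friendly one (plain Julia, not the stenciling code itself):

```julia
# Julia arrays are column-major: the first index varies fastest in memory.
# For a solution matrix laid out as (state, time), putting time in the outer
# loop and the states in the inner loop touches memory contiguously.
U = zeros(1_000, 500)            # rows = states, columns = time steps

function fill_time_outer!(U)
    for j in axes(U, 2)          # time (columns) in the outer loop
        for i in axes(U, 1)      # states (rows) innermost: contiguous access
            U[i, j] = i + j
        end
    end
    return U
end

fill_time_outer!(U)
```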
