
POEM 093: linear solution caching #190

Merged 5 commits into OpenMDAO:master on Apr 18, 2024

Conversation

kanekosh
Contributor

This POEM proposes caching subsystem-level linear solutions to avoid redundant linear solves with the same (or a parallel) right-hand-side vector.
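To make the idea concrete, here is a minimal, self-contained sketch of such a cache (an illustration only, not the OpenMDAO implementation; the class name and defaults are hypothetical). Recent (RHS, solution) pairs are stored, and when a new RHS is parallel to a cached one, the cached solution is returned scaled by the ratio of the two vectors.

import numpy as np

class LinearSolutionCache:
    """Hypothetical cache of recent (rhs, solution) pairs for a linear solver."""

    def __init__(self, max_entries=3, rtol=3e-16, atol=3e-16):
        self.max_entries = max_entries
        self.rtol = rtol
        self.atol = atol
        self.entries = []  # list of (rhs, solution) tuples

    def lookup(self, b):
        """Return a scaled cached solution if b is parallel to a cached RHS, else None."""
        for rhs, sol in self.entries:
            rhs_norm_sq = np.dot(rhs, rhs)
            if rhs_norm_sq == 0.0:
                continue  # never scale off of a zero RHS
            dot = np.dot(b, rhs)
            if np.isclose(abs(dot), np.linalg.norm(b) * np.sqrt(rhs_norm_sq),
                          rtol=self.rtol, atol=self.atol):
                # b ~= scaler * rhs, so for a linear system the solution is scaler * sol.
                return sol * (dot / rhs_norm_sq)
        return None

    def add(self, b, x):
        """Store a new RHS/solution pair, evicting the oldest entry when full."""
        if len(self.entries) >= self.max_entries:
            self.entries.pop(0)
        self.entries.append((b.copy(), x.copy()))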

@naylor-b
Member

@kanekosh In attempting to implement this, I've run into some test failures in cases where complex step is active. Is the example code valid if the b and x arrays are complex? If not, is there some way to modify it to make it valid? My current plan is to disable the caching when complex step is active. Is that an acceptable solution?

@kanekosh
Contributor Author

kanekosh commented Apr 1, 2024

@naylor-b I didn't think of complex numbers when implementing this prototype. I intended this linear solution caching to help accelerate the hierarchical linear solution process in compute_totals, so I assumed the arrays would always be real.

I haven't tested it, but I suspect the following lines do not work under complex step. There should be a workaround, but I don't have a good solution off the top of my head...

# Check whether the RHS vector and a cached RHS vector are parallel.
# NOTE: this parallel vector check may be inaccurate in some cases; the tolerance may need to be tightened.
dot_product = np.dot(b_vec, rhs_cache)
norm1 = np.linalg.norm(b_vec)
norm2 = np.linalg.norm(rhs_cache)
if np.isclose(abs(dot_product), norm1 * norm2, rtol=1e-100, atol=1e-15):
    # The two vectors are parallel, so the cached solution can be reused
    # after scaling it by the ratio of the RHS vectors.
    scaler = dot_product / norm2**2
    x_vec[:] = self._sol_cache_list[i] * scaler
    print('*** use caching in DirectSolver with scaler = %.2E | time = %.2E s ***' % (scaler, time.time() - time0))
    return

Is it correct to say that the linear solver gets complex b and x only when approx_totals is applied to a group that contains a Newton solver? Or are there other instances where the linear solver sees complex arrays?
If so, disabling the caching under complex step sounds valid, because (I believe) the cases/groups that use approx_totals do not overlap with the cases in which the caching is effective.

@naylor-b
Member

naylor-b commented Apr 1, 2024

@kanekosh I think that's correct that the only time the linear solver gets complex b and x is when computing derivatives using complex step at a level at or above a system that has a Newton solver, so it sounds like just disabling the caching when under complex step will be ok.
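A minimal sketch of that guard (the helper name is hypothetical; the actual check in the implementation may look different):

import numpy as np

def caching_allowed(b_vec):
    """Hypothetical helper: only use the linear solution cache for real arrays."""
    # Complex b and x arrays show up when derivatives are computed with complex
    # step, so the cache is simply bypassed in that case.
    return not np.iscomplexobj(b_vec)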

@kanekosh
Contributor Author

kanekosh commented Apr 1, 2024

disabling the caching when under complex step will be ok.

Sounds good to me. Thank you for working on this!

@naylor-b
Member

naylor-b commented Apr 1, 2024

I have a branch here: https://github.com/naylor-b/OpenMDAO/tree/linear_cache that has the caching added to DirectSolver, PetscKrylov, and ScipyKrylov solvers if you want to try it out. BTW I added another option called max_cache_entries to those solvers to allow the user some control over how much memory is being used by the cache. The default max entries is currently 3.
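For reference, using the branch's options might look roughly like this (a hedged sketch: the option names come from this thread rather than a released OpenMDAO version, and the model layout is just an example):

import openmdao.api as om

prob = om.Problem()
coupled = prob.model.add_subsystem('coupled', om.Group())
coupled.linear_solver = om.DirectSolver()
# Options added on the linked branch (later renamed, see below): enable the
# cache and cap how many RHS/solution pairs it keeps.
coupled.linear_solver.options['use_cache'] = True
coupled.linear_solver.options['max_cache_entries'] = 3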

@kanekosh
Contributor Author

kanekosh commented Apr 2, 2024

I tried your branch on the OAS examples I included in this POEM, and it turned out that the caching is not activated in example2.py, where the linear solver gets a b vector parallel to the cached RHS.
It looks like the tolerance for the parallel vector check is too tight here. For example2.py, a tolerance of 3e-16 (set for both atol and rtol) is the smallest value that still activates the caching.
I wrote in the prototype that the parallel vector check may be inaccurate for loose tolerances. I don't remember exactly which case failed back then, but I do remember that a tolerance of 1e-10 or 1e-12 sometimes gave a wrong result. I didn't try 3e-16 at the time. So a tolerance of 3e-16 (slightly larger than machine epsilon) should be a good value here, although I can't guarantee it's robust.

I'm wondering if there's a more robust and inexpensive way to check whether two numpy arrays are parallel. I didn't think much about it when implementing the current approach, which compares the dot product to the product of the norms.
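One possible alternative (a sketch, not what was ultimately implemented): project b onto the cached RHS and test the size of the residual relative to the norm of b, which measures how far from parallel the vectors are in a scale-independent way.

import numpy as np

def is_parallel(b, rhs, rtol=1e-12):
    """Return (parallel, scaler) where parallel is True if b is approximately scaler * rhs."""
    rhs_norm_sq = np.dot(rhs, rhs)
    b_norm = np.linalg.norm(b)
    if rhs_norm_sq == 0.0 or b_norm == 0.0:
        return False, 0.0
    scaler = np.dot(b, rhs) / rhs_norm_sq          # least-squares scaling factor
    residual = np.linalg.norm(b - scaler * rhs)    # distance of b from the span of rhs
    return residual <= rtol * b_norm, scaler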

@kanekosh
Contributor Author

kanekosh commented Apr 2, 2024

Also, this is not directly related, but do you think it would make sense to include the RHS=0 solver skip here along with the caching? The caching indirectly covers that case (in a slightly inefficient way), so it might not be worth adding extra code for an RHS=0 check, though.
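A minimal sketch of that shortcut, using the same hypothetical variable names as the prototype snippet above: if the right-hand side is zero, the solution is zero and the solve can be skipped entirely.

import numpy as np

# Hypothetical zero-RHS shortcut inside the solve routine: the solution of a
# linear system with a zero right-hand side is zero, so no solve is needed.
if np.linalg.norm(b_vec) == 0.0:
    x_vec[:] = 0.0
    return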

@naylor-b
Member

naylor-b commented Apr 3, 2024

Sorry, I must have forgotten to re-run example2 after I tweaked the tolerance.

I have an idea for a parallel array checker that's faster (on average) than the current way. I'll try it out and see if it works and is faster. I'll look into checking for RHS=0 as well. I thought we had already added an RHS=0 check earlier but I can't seem to find it.

@naylor-b
Member

naylor-b commented Apr 5, 2024

@kanekosh I changed the 'use_cache' option to 'rhs_checking' and got rid of the separate 'max_cache_entries' option. You can set the 'rhs_checking' option to True, False, or a dict that allows you to set specific attributes of the LinearRHSChecker object if you don't want the default values. The allowed dict entries are 'use_cache', 'check_zero', 'rtol', 'atol', 'max_cache_entries', and 'collect_stats'. By setting these attributes, caching and zero checking can be turned on and off independently.
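For illustration, configuring the option described above might look like the following (a hedged sketch: 'group' stands for whichever group owns the linear solver, and the dict keys are those listed in this comment):

# Simple on/off:
group.linear_solver.options['rhs_checking'] = True

# Or a dict to set specific attributes of the LinearRHSChecker:
group.linear_solver.options['rhs_checking'] = {
    'use_cache': True,       # cache recent RHS/solution pairs
    'check_zero': True,      # skip the solve entirely when the RHS is zero
    'rtol': 3e-16,
    'atol': 3e-16,
    'max_cache_entries': 3,
    'collect_stats': True,   # record cache-hit / zero-check statistics
}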

My idea for determining if two arrays are parallel did end up being much faster in the best case and 2 to 3x slower in the worst case (worst case was when the arrays were actually parallel). For now I'm just using your original algorithm.

I also added display of cache hits and zero check hits so we can get a feel for how often our rhs checks actually help.

I added a relevance check too that warns if caching has been turned on but there is no dependency between responses that would cause the same adjoint to be computed more than once.

For example1 and example2, I get this warning:

OpenMDAOWarning:'AS_point_1.coupled' <class Group>: 'rhs_checking' is active but no redundant adjoint dependencies were found, so caching is unlikely to be beneficial.

these are the stats for example1:

'AS_point_1.coupled' <class Group>: {'eqhits': 0, 'neghits': 0, 'parhits': 0, 'zerohits': 0, 'misses': 2, 'resets': 1}
'AS_point_0.coupled' <class Group>: {'eqhits': 0, 'neghits': 1, 'parhits': 0, 'zerohits': 0, 'misses': 2, 'resets': 1}

and these are for example2:

'AS_point_1.coupled' <class Group>: {'eqhits': 0, 'neghits': 0, 'parhits': 0, 'zerohits': 0, 'misses': 1, 'resets': 1}
'AS_point_0.coupled' <class Group>: {'eqhits': 0, 'neghits': 0, 'parhits': 1, 'zerohits': 0, 'misses': 2, 'resets': 1}

What do you think about the warning related to the adjoint dependencies? Does it make sense to check that, or do you see problems with doing it that way?

@kanekosh
Contributor Author

kanekosh commented Apr 5, 2024

@naylor-b I think both the warning and stats are helpful.
This is more of a question/idea (not a request), but would it be possible to go the other way with the relevance check, i.e., detect redundant adjoints and automatically enable the caching?
Edit: I just found that you added another warning that does this; I think it's really helpful! Personally, I don't think caching needs to be enabled automatically in such cases, and the current warning is enough.

@naylor-b
Member

naylor-b commented Apr 5, 2024

Another possibility would be to add an entry to the dict called 'auto', and if you set that to True then it would automatically activate the caching if it was needed based on the relevance. So something like:

self.linear_solver.options['rhs_checking'] = {'auto': True, ...}

Also, is it ok that we're using the same reltol and abstol for both the equality check (and negative check) and for the parallel check, or do those need different tolerances?

@kanekosh
Contributor Author

kanekosh commented Apr 5, 2024

auto sounds helpful, although I don't have a strong preference.

Also, is it ok that we're using the same reltol and abstol for both ...

Yes, I think that's ok. I don't think separate tolerances would be necessary.

@robfalck
Contributor

@kanekosh Is the implementation PR good with you? If so I'll consider this POEM accepted.

Added implementation PR
@kanekosh
Contributor Author

The PR looks good to me, thank you!

@robfalck robfalck merged commit 958b975 into OpenMDAO:master Apr 18, 2024
1 check passed