
POEM 093: linear solution caching #190

Merged 5 commits into OpenMDAO:master on Apr 18, 2024

Conversation

kanekosh
Contributor

This POEM proposes caching subsystem-level linear solutions to avoid redundant linear solves with the same (or a parallel) right-hand-side vector.
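To make the idea concrete, here is a minimal, self-contained sketch of such a cache (an illustration only, not the OpenMDAO implementation; the class name and defaults are hypothetical). Recent (RHS, solution) pairs are stored, and when a new RHS is parallel to a cached one, the cached solution is returned scaled by the ratio of the two vectors.

import numpy as np

class LinearSolutionCache:
    """Hypothetical cache of recent (rhs, solution) pairs for a linear solver."""

    def __init__(self, max_entries=3, rtol=3e-16, atol=3e-16):
        self.max_entries = max_entries
        self.rtol = rtol
        self.atol = atol
        self.entries = []  # list of (rhs, solution) tuples

    def lookup(self, b):
        """Return a scaled cached solution if b is parallel to a cached RHS, else None."""
        for rhs, sol in self.entries:
            rhs_norm_sq = np.dot(rhs, rhs)
            if rhs_norm_sq == 0.0:
                continue  # never scale off of a zero RHS
            dot = np.dot(b, rhs)
            if np.isclose(abs(dot), np.linalg.norm(b) * np.sqrt(rhs_norm_sq),
                          rtol=self.rtol, atol=self.atol):
                # b ~= scaler * rhs, so for a linear system the solution is scaler * sol.
                return sol * (dot / rhs_norm_sq)
        return None

    def add(self, b, x):
        """Store a new RHS/solution pair, evicting the oldest entry when full."""
        if len(self.entries) >= self.max_entries:
            self.entries.pop(0)
        self.entries.append((b.copy(), x.copy()))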

@naylor-b
Member

@kanekosh In attempting to implement this, I've run into some test failures in cases where complex step is active. Is the example code valid if the b and x arrays are complex? If not, is there some way to modify it to make it valid? My current plan is to disable the caching when complex step is active. Is that an acceptable solution?

@kanekosh
Contributor Author

kanekosh commented Apr 1, 2024

@naylor-b I didn't think of complex numbers when implementing this prototype. I intended this linear solution caching to help accelerate the hierarchical linear solution process in compute_totals, so I assumed the arrays would always be real.

I haven't tested it, but I suspect the following lines do not work under complex step. There should be a workaround, but I don't have a good solution off the top of my head...

# Check whether the RHS vector and a cached RHS vector are parallel.
# NOTE: this parallel vector check may be inaccurate in some cases; the tolerance may need to be tightened.
dot_product = np.dot(b_vec, rhs_cache)
norm1 = np.linalg.norm(b_vec)
norm2 = np.linalg.norm(rhs_cache)
if np.isclose(abs(dot_product), norm1 * norm2, rtol=1e-100, atol=1e-15):
    # The two vectors are parallel, so the cached solution can be reused
    # after scaling it by the ratio of the RHS vectors.
    scaler = dot_product / norm2**2
    x_vec[:] = self._sol_cache_list[i] * scaler
    print('*** use caching in DirectSolver with scaler = %.2E | time = %.2E s ***' % (scaler, time.time() - time0))
    return

Is it correct to say that the linear solver gets complex b and x only when approx_totals is applied to a group that contains a Newton solver? Or are there other instances where the linear solver sees complex arrays?
If so, disabling the caching under complex step sounds valid, because (I believe) the cases/groups that use approx_totals do not overlap with the cases in which the caching is effective.

@naylor-b
Member

naylor-b commented Apr 1, 2024

@kanekosh I think that's correct that the only time the linear solver gets complex b and x is when computing derivatives using complex step at a level at or above a system that has a Newton solver, so it sounds like just disabling the caching when under complex step will be ok.
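A minimal sketch of that guard (the helper name is hypothetical; the actual check in the implementation may look different):

import numpy as np

def caching_allowed(b_vec):
    """Hypothetical helper: only use the linear solution cache for real arrays."""
    # Complex b and x arrays show up when derivatives are computed with complex
    # step, so the cache is simply bypassed in that case.
    return not np.iscomplexobj(b_vec)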

@kanekosh
Contributor Author

kanekosh commented Apr 1, 2024

disabling the caching when under complex step will be ok.

Sounds good to me. Thank you for working on this!

@naylor-b
Member

naylor-b commented Apr 1, 2024

I have a branch here: https://github.com/naylor-b/OpenMDAO/tree/linear_cache that has the caching added to DirectSolver, PetscKrylov, and ScipyKrylov solvers if you want to try it out. BTW I added another option called max_cache_entries to those solvers to allow the user some control over how much memory is being used by the cache. The default max entries is currently 3.
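For reference, using the branch's options might look roughly like this (a hedged sketch: the option names come from this thread rather than a released OpenMDAO version, and the model layout is just an example):

import openmdao.api as om

prob = om.Problem()
coupled = prob.model.add_subsystem('coupled', om.Group())
coupled.linear_solver = om.DirectSolver()
# Options added on the linked branch (later renamed, see below): enable the
# cache and cap how many RHS/solution pairs it keeps.
coupled.linear_solver.options['use_cache'] = True
coupled.linear_solver.options['max_cache_entries'] = 3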

@kanekosh
Contributor Author

kanekosh commented Apr 2, 2024

I tried your branch on the OAS examples I included in this POEM, and it turned out that the caching is not activated in example2.py, where the linear solver gets a b vector parallel to the cached RHS.
It looks like the tolerance for the parallel vector check is too tight here. For example2.py, a tolerance of 3e-16 (set for both atol and rtol) is the smallest value that still activates the caching.
I wrote in the prototype that the parallel vector check may be inaccurate for loose tolerances. I don't remember exactly which case failed back then, but I do remember that a tolerance of 1e-10 or 1e-12 sometimes gave a wrong result. I didn't try 3e-16 at the time. So a tolerance of 3e-16 (slightly larger than machine epsilon) should be a good value here, although I can't guarantee it's robust.

I'm wondering if there's a more robust and inexpensive way to check whether two numpy arrays are parallel. I didn't think much about it when implementing the current approach, which compares the dot product to the product of the norms.
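One possible alternative (a sketch, not what was ultimately implemented): project b onto the cached RHS and test the size of the residual relative to the norm of b, which measures how far from parallel the vectors are in a scale-independent way.

import numpy as np

def is_parallel(b, rhs, rtol=1e-12):
    """Return (parallel, scaler) where parallel is True if b is approximately scaler * rhs."""
    rhs_norm_sq = np.dot(rhs, rhs)
    b_norm = np.linalg.norm(b)
    if rhs_norm_sq == 0.0 or b_norm == 0.0:
        return False, 0.0
    scaler = np.dot(b, rhs) / rhs_norm_sq          # least-squares scaling factor
    residual = np.linalg.norm(b - scaler * rhs)    # distance of b from the span of rhs
    return residual <= rtol * b_norm, scaler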

@kanekosh
Contributor Author

kanekosh commented Apr 2, 2024

Also, this is not directly related, but do you think it would make sense to include the RHS=0 solver skip here along with the caching? The caching indirectly covers that case (in a slightly inefficient way), so it might not be worth adding extra code for an RHS=0 check, though.
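A minimal sketch of that shortcut, using the same hypothetical variable names as the prototype snippet above: if the right-hand side is zero, the solution is zero and the solve can be skipped entirely.

import numpy as np

# Hypothetical zero-RHS shortcut inside the solve routine: the solution of a
# linear system with a zero right-hand side is zero, so no solve is needed.
if np.linalg.norm(b_vec) == 0.0:
    x_vec[:] = 0.0
    return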

@naylor-b
Member

naylor-b commented Apr 3, 2024

Sorry, I must have forgotten to re-run example2 after I tweaked the tolerance.

I have an idea for a parallel array checker that's faster (on average) than the current way. I'll try it out and see if it works and is faster. I'll look into checking for RHS=0 as well. I thought we had already added an RHS=0 check earlier but I can't seem to find it.

@naylor-b
Member

naylor-b commented Apr 5, 2024

@kanekosh I changed the 'use_cache' option to 'rhs_checking' and got rid of the separate 'max_cache_entries' option. You can set the 'rhs_checking' option to True, False, or a dict that allows you to set specific attributes of the LinearRHSChecker object if you don't want the default values. The allowed dict entries are 'use_cache', 'check_zero', 'rtol', 'atol', 'max_cache_entries', and 'collect_stats'. By setting these attributes, caching and zero checking can be turned on and off independently.
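For illustration, configuring the option described above might look like the following (a hedged sketch: 'group' stands for whichever group owns the linear solver, and the dict keys are those listed in this comment):

# Simple on/off:
group.linear_solver.options['rhs_checking'] = True

# Or a dict to set specific attributes of the LinearRHSChecker:
group.linear_solver.options['rhs_checking'] = {
    'use_cache': True,       # cache recent RHS/solution pairs
    'check_zero': True,      # skip the solve entirely when the RHS is zero
    'rtol': 3e-16,
    'atol': 3e-16,
    'max_cache_entries': 3,
    'collect_stats': True,   # record cache-hit / zero-check statistics
}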

My idea for determining if two arrays are parallel did end up being much faster in the best case and 2 to 3x slower in the worst case (worst case was when the arrays were actually parallel). For now I'm just using your original algorithm.

I also added display of cache hits and zero check hits so we can get a feel for how often our rhs checks actually help.

I added a relevance check too that warns if caching has been turned on but there is no dependency between responses that would cause the same adjoint to be computed more than once.

For example1 and example2, I get this warning:

OpenMDAOWarning:'AS_point_1.coupled' <class Group>: 'rhs_checking' is active but no redundant adjoint dependencies were found, so caching is unlikely to be beneficial.

these are the stats for example1:

'AS_point_1.coupled' <class Group>: {'eqhits': 0, 'neghits': 0, 'parhits': 0, 'zerohits': 0, 'misses': 2, 'resets': 1}
'AS_point_0.coupled' <class Group>: {'eqhits': 0, 'neghits': 1, 'parhits': 0, 'zerohits': 0, 'misses': 2, 'resets': 1}

and these are for example2:

'AS_point_1.coupled' <class Group>: {'eqhits': 0, 'neghits': 0, 'parhits': 0, 'zerohits': 0, 'misses': 1, 'resets': 1}
'AS_point_0.coupled' <class Group>: {'eqhits': 0, 'neghits': 0, 'parhits': 1, 'zerohits': 0, 'misses': 2, 'resets': 1}

What do you think about the warning related to the adjoint dependencies? Does it make sense to check that, or do you see problems with doing it that way?

@kanekosh
Contributor Author

kanekosh commented Apr 5, 2024

@naylor-b I think both the warning and stats are helpful.
This is more of a question/idea (not a request), but would it be possible to go the other way with the relevance check, i.e., detect redundant adjoints and automatically enable the caching?
Edit: I just found that you added another warning that does this; I think it's really helpful! Personally, I don't think caching needs to be enabled automatically in such cases, and the current warning is enough.

@naylor-b
Member

naylor-b commented Apr 5, 2024

Another possibility would be to add an entry to the dict called 'auto', and if you set that to True then it would automatically activate the caching if it was needed based on the relevance. So something like:

self.linear_solver.options['rhs_checking'] = {'auto': True, ...}

Also, is it ok that we're using the same reltol and abstol for both the equality check (and negative check) and for the parallel check, or do those need different tolerances?

@kanekosh
Contributor Author

kanekosh commented Apr 5, 2024

auto sounds helpful, although I don't have a strong preference.

Also, is it ok that we're using the same reltol and abstol for both ...

Yes, I think that's ok. I don't think separate tolerances would be necessary.

@robfalck
Contributor

@kanekosh Is the implementation PR good with you? If so I'll consider this POEM accepted.

Added implementation PR
@kanekosh
Contributor Author

The PR looks good to me, thank you!

@robfalck robfalck merged commit 958b975 into OpenMDAO:master Apr 18, 2024
1 check passed