-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fluids: Add PyTorch external DD SGS evaluation #1581
base: main
Are you sure you want to change the base?
Conversation
30f4204
to
50af404
Compare
ab6b800
to
984a964
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To explain the reasoning for this file:
CMake is the only supported build system for PyTorch/libTorch. I initially tried to write a small CMake script to just get the necessary compiler flags, but this ended up not working because normal pip-installed versions of PyTorch have libraries built with CUDA and CMake requires that CUDA be available for those libraries even though it's completely unnecessary if you're not using the CUDA backends of PyTorch.
To circumvent this, the PyTorch python package gives some useful information to build against the underlying libTorch (include and library directories mostly).
However, it still doesn't give required compiler flags, namely GLIBCXX_USE_CXXX11_ABI
. To get around that, I parse the TorchConfig.cmake
file which has that information embedded in it. Rather than break out my grep-fu, I decided it'd be easiest to do it all in Python since I'm using Python for the include and library paths (and for where TorchConfig.cmake
is located).
The easiest way I could think to transfer all this information to the Makefile
while keeping the number of Python startups (which can take about 5 seconds per startup depending on the system in my experience) to a minimum is to write it to a pkgconfig file from Python that the make file then uses. Yeah, this relies on some POSIX compliant file system behavior, but I figured that's the case for most build environments.
Any who, thanks for attending my TED talk/rant.
26b57d3
to
31dc6d3
Compare
This is good for review now. Tests pass on Noether for CPU and CUDA. I'm having some difficulties getting PyTorch+ROCM to build with Spack, but that's not a huge priority since we don't have hardware to run that on right now. |
- Rename sequential_internal -> *_ceed
To have the log_events accessbile to torch (in C++), I needed to separate out the header file containing the extern PetscLogEvent declarations. While I was at it, I figured it'd be more clear to have a separate log_events.c file as well to have the actual "storage" of the PetscLogEvents and the RegisterLogEvents function itself.
On Sunspot, on-device inference is not working reliably. I'm not sure exactly why at the moment (whether it's a libCEED backend issue or something else).
31dc6d3
to
8d82690
Compare
Add PyTorch as a external DD SGS evaluation. This is a "follow up" to #1361 where we use PyTorch to run the data-driven model instead of a native implementation.
ToDos:
PetscLogEvents
(particularly for inference and the data transfer steps)USE_LIBTORCH
testing)