fluids: Add PyTorch external DD SGS evaluation #1581

jrwrigh · 2024-05-09T22:52:33Z

Add PyTorch as a external DD SGS evaluation. This is a "follow up" to #1361 where we use PyTorch to run the data-driven model instead of a native implementation.

ToDos:

Add in PetscLogEvents (particularly for inference and the data transfer steps)
Add documentation
Cleanup the Makefile additions (remove the automatic USE_LIBTORCH testing)
Add weak symbols so that this can be compiled without pytorch
Add command line switch to enable pytorch vs internal sequential
Move createPyTorchModel into the testing directory (it's only useful for documenting the creation of the testing model (instead of it just being a binary blob).
Rename "sequential_internal" to "sequential_ceed"

jrwrigh · 2024-05-20T22:23:21Z

examples/fluids/pytorch_pkgconfig.py

To explain the reasoning for this file:

CMake is the only supported build system for PyTorch/libTorch. I initially tried to write a small CMake script to just get the necessary compiler flags, but this ended up not working because normal pip-installed versions of PyTorch have libraries built with CUDA and CMake requires that CUDA be available for those libraries even though it's completely unnecessary if you're not using the CUDA backends of PyTorch.

To circumvent this, the PyTorch python package gives some useful information to build against the underlying libTorch (include and library directories mostly).

However, it still doesn't give required compiler flags, namely GLIBCXX_USE_CXXX11_ABI. To get around that, I parse the TorchConfig.cmake file which has that information embedded in it. Rather than break out my grep-fu, I decided it'd be easiest to do it all in Python since I'm using Python for the include and library paths (and for where TorchConfig.cmake is located).

The easiest way I could think to transfer all this information to the Makefile while keeping the number of Python startups (which can take about 5 seconds per startup depending on the system in my experience) to a minimum is to write it to a pkgconfig file from Python that the make file then uses. Yeah, this relies on some POSIX compliant file system behavior, but I figured that's the case for most build environments.

Any who, thanks for attending my TED talk/rant.

jrwrigh · 2024-05-27T03:27:10Z

This is good for review now. Tests pass on Noether for CPU and CUDA. I'm having some difficulties getting PyTorch+ROCM to build with Spack, but that's not a huge priority since we don't have hardware to run that on right now.

- Rename sequential_internal -> *_ceed

To have the log_events accessbile to torch (in C++), I needed to separate out the header file containing the extern PetscLogEvent declarations. While I was at it, I figured it'd be more clear to have a separate log_events.c file as well to have the actual "storage" of the PetscLogEvents and the RegisterLogEvents function itself.

On Sunspot, on-device inference is not working reliably. I'm not sure exactly why at the moment (whether it's a libCEED backend issue or something else).

jrwrigh added examples 0-WIP labels May 9, 2024

jrwrigh self-assigned this May 9, 2024

jrwrigh force-pushed the jrwrigh/pytorch_external_sgs branch 2 times, most recently from 30f4204 to 50af404 Compare May 9, 2024 23:00

jrwrigh force-pushed the jrwrigh/pytorch_external_sgs branch 4 times, most recently from ab6b800 to 984a964 Compare May 20, 2024 14:29

jrwrigh marked this pull request as ready for review May 20, 2024 21:57

jrwrigh added 1-In Review and removed 0-WIP labels May 20, 2024

jrwrigh commented May 20, 2024

View reviewed changes

jrwrigh force-pushed the jrwrigh/pytorch_external_sgs branch 3 times, most recently from 26b57d3 to 31dc6d3 Compare May 26, 2024 19:42

jrwrigh requested review from rickybalin, jeremylt and jedbrown May 28, 2024 15:19

jrwrigh added 6 commits May 28, 2024 13:25

fluids: Add PyTorch external DD SGS evaluation

04c777e

- Rename sequential_internal -> *_ceed

doc(fluids): Add documentation for pytorch SGS

30f5af9

fluids: Use TORCH_DEVICE_CPU for sycl backends

1ac88de

On Sunspot, on-device inference is not working reliably. I'm not sure exactly why at the moment (whether it's a libCEED backend issue or something else).

fluids: Force recompilation if Makefile changes

eefde61

test(fluids): Add SGS torch testing

8d82690

jrwrigh force-pushed the jrwrigh/pytorch_external_sgs branch from 31dc6d3 to 8d82690 Compare May 28, 2024 19:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fluids: Add PyTorch external DD SGS evaluation #1581

fluids: Add PyTorch external DD SGS evaluation #1581

jrwrigh commented May 9, 2024 •

edited

jrwrigh May 20, 2024 •

edited

jrwrigh commented May 27, 2024

fluids: Add PyTorch external DD SGS evaluation #1581

Are you sure you want to change the base?

fluids: Add PyTorch external DD SGS evaluation #1581

Conversation

jrwrigh commented May 9, 2024 • edited

jrwrigh May 20, 2024 • edited

Choose a reason for hiding this comment

jrwrigh commented May 27, 2024

jrwrigh commented May 9, 2024 •

edited

jrwrigh May 20, 2024 •

edited