
Feature/device capable codegen support#292

Closed
J-Simkin wants to merge 28 commits into ModellingWebLab:master from J-Simkin:feature/device-capable-codegen-support

Conversation


@J-Simkin J-Simkin commented Mar 3, 2026

Description

I have modified codegen to emit an extra file for standard BackwardEuler and Cvode models that provides GPU (device) kernels. These kernels will be used by future updates to Chaste in order to solve the models on device. These changes depend on the updates mentioned in Chaste/Chaste#497, and hence should not be merged until Chaste itself has been updated.

CPU codegen changes

  • Modified ChastePrinter to be device agnostic; see the description in Chaste/Chaste#497 (Device capable codegen support) for how this works.
  • Updated the backward_euler_model.hpp and cvode_model.hpp templates to include the constants and declarations required for device solves; both are guarded by a macro so the files still compile as regular C++.
  • Kernel definitions have been included at the bottom of these templates (rather than being written into the hpp itself) and are guarded in the same way as the kernel declarations. (Note this goes against the recommended code structure for Chaste, but I believe factoring these kernels into a new autogenerated file is a good way to separate the GPU implementations.)
  • I have added a guarded include in backward_euler_model.hpp for the file that provides the kernels performing the backward Euler solve. This file is not included in the changes suggested in #497 and is saved for a future Chaste update. Also note this include could probably be moved to the top of the file in the future.

GPU codegen changes

  • Added a new context manager inside ChasteModel, used within a Python with block, so that formatting functions such as _format_rY_lookup can be conditioned on the current context and emit the appropriate C++ code depending on whether we are currently generating for CPU or GPU.
  • Added new variables to the jinja context for use in device kernels, specifically:
    • y_derivative_equations_device
    • rY_lookup_device
  • And within the Cvode model's context:
    • jacobian_equations_device
    • jacobian_entries_device
    • sparse_jacobian_entries
    • sparse_rowptr
    • sparse_colind
    • sparse_nnz
  • Added new templates: backward_euler_kernels.hpp and cvode_model_kernels.hpp
  • Modified the command line script to take a new argument controlling whether kernels are generated.
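The device_mode context manager described in the first bullet might look roughly like this. This is a minimal sketch, not the actual chaste_codegen implementation; the class name, the flag, and the emitted lookup strings are all illustrative assumptions (only _format_rY_lookup is a name taken from the PR text):

```python
from contextlib import contextmanager


class ModelPrinterSketch:
    """Hypothetical stand-in for ChasteModel's formatting logic."""

    def __init__(self):
        self._device = False  # False -> CPU codegen, True -> GPU codegen

    @contextmanager
    def device_mode(self):
        """Temporarily switch formatting functions into device (GPU) mode."""
        previous = self._device
        self._device = True
        try:
            yield self
        finally:
            self._device = previous  # always restore the previous mode

    def _format_rY_lookup(self, index):
        """Emit the state-variable lookup appropriate for the current context."""
        if self._device:
            # Assumed GPU layout: one flat state array strided over all cells
            return f'rY[{index} * n_cells + cell]'
        return f'rY[{index}]'


printer = ModelPrinterSketch()
cpu_lookup = printer._format_rY_lookup(3)        # 'rY[3]'
with printer.device_mode():
    gpu_lookup = printer._format_rY_lookup(3)    # 'rY[3 * n_cells + cell]'
```

The appeal of this pattern is that every existing formatting call site stays unchanged; only the code inside the with block sees device output, and the finally clause guarantees CPU mode is restored even if template rendering raises.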
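For context on the sparse Jacobian variables (sparse_rowptr, sparse_colind, sparse_nnz): these are the standard compressed sparse row (CSR) arrays. A small pure-Python sketch of how a dense Jacobian maps onto them, for illustration only (not the actual codegen logic, which emits the arrays symbolically):

```python
def dense_to_csr(matrix):
    """Convert a dense matrix (list of rows) to CSR arrays.

    Returns (rowptr, colind, values): the non-zeros of row i live at
    positions rowptr[i]..rowptr[i+1] within colind/values, and
    len(values) is the non-zero count (nnz).
    """
    rowptr, colind, values = [0], [], []
    for row in matrix:
        for j, entry in enumerate(row):
            if entry != 0:
                colind.append(j)
                values.append(entry)
        rowptr.append(len(values))
    return rowptr, colind, values


# Toy 3x3 Jacobian with 4 non-zero entries
jac = [[1.0, 0.0, 2.0],
       [0.0, 3.0, 0.0],
       [0.0, 0.0, 4.0]]
rowptr, colind, values = dense_to_csr(jac)
# rowptr == [0, 2, 3, 4], colind == [0, 2, 1, 2], nnz == len(values) == 4
```

Because the sparsity pattern of a model's Jacobian is fixed at codegen time, rowptr and colind can be emitted as compile-time constants, and only the values array needs recomputing at each solver step.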

Motivation and Context

These changes will allow us to run these models on GPU within Chaste.
Relates to Chaste/Chaste#497 and will be used by future changes to Chaste that provide this support. This is a breaking change in that Chaste will not work with this codegen until the previous issue has been resolved.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Documentation checklist

  • I have updated all documentation in the code where necessary.
  • I have checked spelling in all (new) comments and documentation.
  • I have added a note to RELEASE.md if relevant (new feature, breaking change, or notable bug fix).
  • I have updated version & citation.txt & citation.cff version.

I have written and updated the docstrings, which I believe covers the two checked points above.

Testing

  • Testing is done automatically and codecov shows test coverage
  • This cannot be tested automatically

To test my new additions, I have updated the reference models to match the new output, and I have modified test_codegen.py so that the appropriate models generate the kernels as well as the original files. I also modified conftest.py::compare_model_against_reference to check whether the model is generating kernels, and to behave accordingly.

I also updated the txt files within chaste_codegen/data/tests to match the new codegen output and use CHASTE_MATH etc.

Finally I also updated the test_console_script_help.txt and test_console_script_usage.txt files to match my new command line argument.

Hence codegen can be tested automatically.

All tests within Chaste pass (not just the Continuous testpack).

Numerical validation of kernels

Although the kernels are currently unused in Chaste, I have tested their correctness by using them in my local GPU implementation and writing the results to file.

I then compared all state variables calculated in the GPU version against the equivalent CPU implementation at every single time point, and found the maximum absolute errors for both the BE and Cvode implementations, testing BE with both float and double precision on GPU.

I got the following results:
LuoRudy1991BackwardEuler vs LuoRudy1991BackwardEulerGpuFloat:
error < $10^{-2}$ for all variables at all time points

LuoRudy1991BackwardEuler vs LuoRudy1991BackwardEulerGpuDouble:
error < $10^{-13}$ for all variables at all time points

LuoRudy1991Cvode vs LuoRudy1991CvodeGpuDoubleDense:
error generally < $10^{-2}$ but there are large error spikes and abnormalities

LuoRudy1991Cvode vs LuoRudy1991CvodeGpuDoubleSparse:
error generally < $10^{-2}$ but there are large error spikes and abnormalities

The Cvode kernels are mathematically correct and safe to merge, but the solver itself requires some more tweaking to fix the described behaviour.
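The comparison described above amounts to a maximum-absolute-error check over every state variable at every time point. A minimal pure-Python sketch of that check (the function and variable names are illustrative, not taken from the actual test harness):

```python
def max_abs_error(cpu_trace, gpu_trace):
    """Maximum absolute difference between two solver traces.

    Each trace is a list of time points, and each time point is a list of
    state-variable values; the two traces must have matching shapes.
    """
    assert len(cpu_trace) == len(gpu_trace)
    worst = 0.0
    for cpu_state, gpu_state in zip(cpu_trace, gpu_trace):
        assert len(cpu_state) == len(gpu_state)
        for cpu_y, gpu_y in zip(cpu_state, gpu_state):
            worst = max(worst, abs(cpu_y - gpu_y))
    return worst


# Toy example: two time points, two state variables each
cpu = [[0.10, -85.0], [0.12, -84.5]]
gpu = [[0.10, -85.0], [0.12, -84.500001]]
within_tolerance = max_abs_error(cpu, gpu) < 1e-2  # analogous to the bounds reported above
```

Taking the maximum over all variables and time points, rather than an average, is the stricter test: a single transient spike (like those observed in the Cvode runs) cannot be washed out by otherwise-accurate results.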


J-Simkin commented Mar 8, 2026

On reflection, the changes I have suggested here could be implemented in a much cleaner way. Instead of modifying the existing ChastePrinter, we could make a new printer which implements the GPU changes, and then use the new device_mode context I suggested adding to switch between the device printer and the standard CPU printer where necessary.

This means none of the CPU tests would need to change, there would be no risk of breaking something that already works, the autogenerated code should be more readable, and we could avoid macro switching.

I'll have to revisit this when I get the time.

@J-Simkin J-Simkin closed this Mar 8, 2026