Trilinos refactoring and updating #424
Conversation
* main packages updates: Epetra->Tpetra; AztecOO->Belos; Ifpack->Ifpack2; ML->MueLu
* refactor linear algebra backend to use Teuchos and Kokkos:
  - wrap Tpetra vectors and matrices with Teuchos::RCP for safer memory management and automatic reference counting.
  - use Kokkos (via Tpetra::Node/KokkosDeviceWrapperNode) to abstract parallel execution (same code to run efficiently on CPU and GPU theoretically)
* removed all static objects
* change of IC/ICT preconditioners in favor of RILUK(0)/RILUK(1)
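As a rough illustration of the reference-counting and "no static objects" points in this commit, here is a minimal sketch that uses `std::shared_ptr` as a stand-in for `Teuchos::RCP`. It is not the solver's code: `DenseVector` and `LinearSystem` are hypothetical names, and the real implementation holds Tpetra objects rather than a plain `std::vector`.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for a distributed vector. This is NOT a Tpetra type.
struct DenseVector {
    std::vector<double> values;
};

// Hypothetical owner class: it keeps its linear-algebra objects as members
// behind reference-counted handles instead of file-scope static objects.
class LinearSystem {
public:
    explicit LinearSystem(std::size_t n)
        : rhs_(std::make_shared<DenseVector>(
              DenseVector{std::vector<double>(n, 0.0)})) {}

    // Returning the handle by value increments the reference count,
    // so the vector stays alive as long as any caller still holds it.
    std::shared_ptr<DenseVector> rhs() const { return rhs_; }

private:
    std::shared_ptr<DenseVector> rhs_;
};
```

With this layout the vector is released when the last handle goes out of scope, so no explicit cleanup order has to be maintained by hand.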
* removed ubuntu20 and ubuntu22 folders. Created ubuntu/ folder that contains dockerfile based on ubuntu-latest
* modified the solver/dockerfile to use the ubuntu-latest docker container as a base
…ner:
* simvascular/libraries:latest is the image containing the latest trilinos build
* increased the maximum iteration number for linear solver
* add the fluid case iliac artery with ml preconditioner
ktbolt
left a comment
@dcodoni Nice work !
Make sure to update the Trilinos build instructions in the README.
* kokkos::finalize has been moved inside the TrilinosImpl class, fixing the compilation errors whenever trilinos is not used
* the trilinos linear solver residual test is now set properly, fixing a problem in the first non-linear iteration where the linear solver hit the maximum number of iterations while reducing the residual far more than necessary
ktbolt
left a comment
Looks good !
* string length was computed incorrectly, leading to silent buffer overflow.
* this only surfaced as a critical memory crash when linking with the latest Trilinos libraries.
…ics into trilinos-refactoring
- Added --ignore-errors mismatch to avoid geninfo line mismatch errors
- Added --rc geninfo_unexecuted_blocks=1 to silence GCC warnings
@dcodoni What's the plan here ?
@ktbolt macOS is now failing the ustruct tests even though they passed before. As for the coverage problem, I honestly have no idea yet!
@dcodoni MacOS tests failed because of an Ubuntu is failing because of a
@ktbolt the ubuntu test fails during generation of code coverage with error: geninfo: ERROR: mismatched end line. I then tried to ignore this type of error by adding the following flags to the CMakeLists: --ignore-errors mismatch --rc geninfo_unexecuted_blocks=1. This let it get further, but at the end I got another make error, which I haven't investigated yet.
Should we also exclude the
@mrp089 that is actually a good idea, and I think it may solve the problem I have been having
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #424      +/-   ##
==========================================
- Coverage   67.47%   67.35%   -0.13%
==========================================
  Files         170      169       -1
  Lines       32622    32620       -2
  Branches     5740     5718      -22
==========================================
- Hits        22012    21970      -42
- Misses      10472    10513      +41
+ Partials      138      137       -1
@dcodoni I think it is cleaner to exclude the unit tests from the coverage step; the macros used by the unit tests are probably causing problems.
* Built solver sources into a shared object library (solver_objs) so they can be reused by both the main executable and unit tests. This avoids recompiling solver sources twice when ENABLE_UNIT_TEST is enabled.
* Excluded unit tests from code coverage by removing coverage flags from the run_all_unit_tests target (-fno-profile-arcs, -fno-test-coverage). This prevents .gcda files from being generated for test code, ensuring coverage only reflects solver sources.
from the coverage.
…ics into trilinos-refactoring
…e section:
* --ignore-errors mismatch --rc geninfo_unexecuted_blocks=1
* --ignore-errors unused
These flags allow a coverage file to be saved even when errors of the mismatch type are present.
Fix some parameter settings in the Trilinos GMRES solver: Krylov space, max iterations and max restarts.
Modified tests.yml to use the latest libraries container, which is built on the latest Ubuntu and contains the updated Trilinos library.
* add timer for the linear solver in Trilinos
* add the possibility to read the MueLu options for the ML preconditioner from the file mueluOptions.xml; if the file is not present, built-in parameters are used
@ktbolt I found some allocation problems in the Trilinos implementation that show up only at high core counts and for CFD problems. I am investigating them. Let's not merge for now.
@dcodoni Yes I was wondering if something like that might show up.
* the local to global map was created in sorted order, while the graph was allocated in unsorted order
* the graph is now created using the sorted order like the map
@ktbolt I fixed the bug in the LHS matrix graph creation. The problem was in the allocation of the non-zero elements per row during graph creation. The local to global map of the matrix used the sorted ordering, while the graph used information in unsorted ordering. This mismatch caused rows to be allocated with the wrong sizes, which matters because Tpetra, unlike Epetra, does not allow for dynamic allocation.
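To make the ordering mismatch concrete, here is a small standard-library sketch of the idea behind the fix: when the global row IDs are sorted for the map, the per-row non-zero counts have to be permuted the same way before they are used for allocation. This is not the solver's code; `SortedRows` and `sortRowsWithCounts` are hypothetical names, and no Tpetra types are involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type: row IDs in ascending order together with
// the non-zero count that belongs to each of them.
struct SortedRows {
    std::vector<long long> globalIds;    // global row IDs, ascending
    std::vector<std::size_t> nnzPerRow;  // nnzPerRow[i] belongs to globalIds[i]
};

// Sort the global row IDs and apply the same permutation to the counts,
// so that the map ordering and the allocation sizes stay consistent.
SortedRows sortRowsWithCounts(const std::vector<long long>& unsortedIds,
                              const std::vector<std::size_t>& unsortedNnz) {
    assert(unsortedIds.size() == unsortedNnz.size());

    // perm[i] is the position in the unsorted arrays of the i-th smallest ID.
    std::vector<std::size_t> perm(unsortedIds.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::sort(perm.begin(), perm.end(), [&](std::size_t a, std::size_t b) {
        return unsortedIds[a] < unsortedIds[b];
    });

    SortedRows out;
    out.globalIds.reserve(perm.size());
    out.nnzPerRow.reserve(perm.size());
    for (std::size_t p : perm) {
        out.globalIds.push_back(unsortedIds[p]);
        out.nnzPerRow.push_back(unsortedNnz[p]);
    }
    return out;
}
```

If the counts were left in their original order while the IDs were sorted, a row with many non-zeros could be given the small allocation of a different row, which is the kind of mismatch described in the comment above.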
@dcodoni Tricky bug to fix, good job !
* Trilinos updated:
  * main packages updates: Epetra->Tpetra; AztecOO->Belos; Ifpack->Ifpack2; ML->MueLu
  * refactor linear algebra backend to use Teuchos and Kokkos:
    - wrap Tpetra vectors and matrices with Teuchos::RCP for safer memory management and automatic reference counting.
    - use Kokkos (via Tpetra::Node/KokkosDeviceWrapperNode) to abstract parallel execution (same code to run efficiently on CPU and GPU theoretically)
  * removed all static objects
  * change of IC/ICT preconditioners in favor of RILUK(0)/RILUK(1)
* Updating dockerfiles:
  * removed ubuntu20 and ubuntu22 folders. Created ubuntu/ folder that contains dockerfile based on ubuntu-latest
  * modified the solver/dockerfile to use the ubuntu-latest docker container as a base
* Updated dockerfiles and workflow file to use the latest docker container:
  * simvascular/libraries:latest is the image containing the latest trilinos build
* Update the input files of trilinos related test cases for fluid and fsi
  * increased the maximum iteration number for linear solver
  * add the fluid case iliac artery with ml preconditioner
* Modifications and improvements:
  * kokkos::finalize has been moved inside the TrilinosImpl class, fixing the compilation errors whenever trilinos is not used
  * the trilinos linear solver residual test is now set properly, fixing a problem in the first non-linear iteration where the linear solver hit the maximum number of iterations while reducing the residual far more than necessary
* Updating Trilinos packages list in README.md
* Final update on dockerfile
* Fix buffer overflow in genBC_Integ_X (set_bc.cpp):
  * string length was computed incorrectly, leading to silent buffer overflow.
  * this only surfaced as a critical memory crash when linking with the latest Trilinos libraries.
* Changes to CMake to fix coverage and unit test mismatch
* Modifying cmake file to fix the coverage error
* Fix coverage target for GCC (lcov/geninfo):
  - Added --ignore-errors mismatch to avoid geninfo line mismatch errors
  - Added --rc geninfo_unexecuted_blocks=1 to silence GCC warnings
* fix coverage: correct path expansion in 'lcov --remove ' command
* Removing the unit tests from coverage
* Refactor CMake: unify solver build, exclude unit tests from coverage
  * Built solver sources into a shared object library (solver_objs) so they can be reused by both the main executable and unit tests. This avoids recompiling solver sources twice when ENABLE_UNIT_TEST is enabled.
  * Excluded unit tests from code coverage by removing coverage flags from the run_all_unit_tests target (-fno-profile-arcs, -fno-test-coverage). This prevents .gcda files from being generated for test code, ensuring coverage only reflects solver sources.
* Removing the unit tests folder '*/CMakeFiles/run_all_unit_tests.dir/*' from the coverage.
* Remove '*/CMakeFiles/run_all_unit_tests.dir/*' from the coverage list
* Adding the following flags in the CMake file under the enable_coverage section:
  * --ignore-errors mismatch --rc geninfo_unexecuted_blocks=1
  * --ignore-errors unused
  These flags allow a coverage file to be saved even when errors of the mismatch type are present.
  Fix some parameter settings in the Trilinos GMRES solver: Krylov space, max iterations and max restarts.
* Updates in new trilinos implementation:
  * add timer for the linear solver in Trilinos
  * add the possibility to read the MueLu options for the ML preconditioner from the file mueluOptions.xml; if the file is not present, built-in parameters are used
* Using fsils for the mesh equation in FSI test cases
* Fixed allocation problem in the graph creation:
  * the local to global map was created in sorted order, while the graph was allocated in unsorted order
  * the graph is now created using the sorted order like the map
The Trilinos implementation is updated to use the latest packages available in the library (addresses #422).
Current situation
Currently the Trilinos implementation relies on archived, unsupported, or deprecated packages. In particular, it is based on the following packages:
Since Ifpack and ML are deprecated, they are no longer developed, and new optimized preconditioners will not become available in our solver.
Release Notes
Packages updated:
The core package Teuchos is used for memory management, communication utilities, parameter lists, and interoperability across Trilinos packages.
All updated packages are wrapped under a Kokkos Node abstraction, enabling the same code to target different execution spaces. This provides the possibility of running for example on GPUs in the future, while still supporting efficient CPU execution when GPU usage is not required.
The workflow files are updated to use the latest docker image, which contains the updated Trilinos build.
The dockerfiles are also updated for reference.
Documentation
Documentation about Trilinos linear algebra should be added.
Testing
Code of Conduct & Contributing Guidelines