Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cantera 2.6.0: Multiple test failures if built with Lapack support and with Sundials 6.2.0 #1419

Closed
band-a-prend opened this issue Dec 26, 2022 · 12 comments
Labels
compiling tests Issues about tests, running the tests, or test results

Comments

@band-a-prend
Copy link
Contributor

band-a-prend commented Dec 26, 2022

Problem description

When compile Cantera 2.6.0 with Lapack support and Sundials 6.2.0 (system library) and then run scons test the multiple CXX and PYTHON tests failure.

Steps to reproduce

  1. Build Cantera 2.6.0 with command
    scons -j4 build CC=x86_64-pc-linux-gnu-gcc CXX=x86_64-pc-linux-gnu-g++ cc_flags=-march=native -O2 -pipe cxx_flags=-std=c++11 debug=no FORTRAN=x86_64-pc-linux-gnu-gfortran FORTRANFLAGS=-march=native -O2 -pipe optimize_flags=-Wno-inline renamed_shared_libraries=no use_pch=no system_fmt=y system_sundials=y system_eigen=y system_yamlcpp=y env_vars=all extra_inc_dirs=/usr/include/eigen3 blas_lapack_libs=lapack,blas f90_interface=y python_package=full python_cmd=python3.10 prefix=/usr

i.e. scons options:

CC = 'x86_64-pc-linux-gnu-gcc'
CXX = 'x86_64-pc-linux-gnu-g++'
cc_flags = '-march=native -O2 -pipe'
prefix = '/usr'
python_package = 'full'
python_cmd = 'python3.10'
f90_interface = 'y'
FORTRAN = 'x86_64-pc-linux-gnu-gfortran'
FORTRANFLAGS = '-march=native -O2 -pipe'
system_eigen = 'y'
system_fmt = 'y'
system_yamlcpp = 'y'
system_sundials = 'y'
blas_lapack_libs = 'lapack,blas'
env_vars = 'all'
use_pch = False
optimize_flags = '-Wno-inline'
debug = False
extra_inc_dirs = '/usr/include/eigen3'
renamed_shared_libraries = False
  1. Run 'scons test`
  2. See error about numerous test failures (see attached build.log) due to CanteraError thrown by CVodesIntegrator::step.

Behavior

Expected that all tests except skipped will be pass like if build Cantera against Sundials 5.2.0 instead of Sundials 6.2.0 (affected version).

System information

  • Cantera version: 2.6.0 (gentoo package)
  • OS: Gentoo Linux
  • Python/MATLAB/other software versions: Python 3.10.9, Sundials 6.2.0 (isn't affected with Sundials 5.2.0), GCC 11.3.x

Attachments
build.log

Additional context

The same error I saw with Sundials 5.8.0.
Could I provide additional information?

@ischoegl
Copy link
Member

ischoegl commented Dec 27, 2022

Sundials 6.2.0 was not tested when Cantera 2.6 was released (at that point, CI was tested and passed for Sundials 2, 3, 4, 5.8, 6.0); the relevant CI runner (which also linked against LAPACK) is

multiple-sundials:
name: Sundials ${{ matrix.sundials-ver }}
runs-on: ubuntu-latest
timeout-minutes: 60
env:
PYTHON_VERSION: 3.8
defaults:
run:
shell: bash -l {0}
strategy:
matrix:
sundials-ver: [ 2, 3, 4, 5.8, 6.0 ]
fail-fast: false
steps:
- uses: actions/checkout@v2
name: Checkout the repository
with:
submodules: recursive
- uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
python-version: ${{ env.PYTHON_VERSION }}
miniforge-version: latest
activate-environment: test
name: Set up conda
- name: Install conda dependencies
# See https://github.com/conda-forge/boost-cpp-feedstock/issues/41 for why we
# use boost-cpp rather than boost from conda-forge
run: |
conda install -q sundials=${{ matrix.sundials-ver }} scons numpy ruamel.yaml \
cython boost-cpp fmt eigen yaml-cpp h5py pandas libgomp openblas pytest
- name: Build Cantera
run: |
scons build system_fmt=y system_eigen=y system_yamlcpp=y system_sundials=y \
blas_lapack_libs='lapack,blas' -j2 VERBOSE=True debug=n \
optimize_flags='-O3 -ffast-math -fno-finite-math-only'
- name: Test Cantera
run: scons test show_long_tests=yes verbose_tests=yes

While not necessarily related, some deprecation warnings (#1349) were only recently fixed in #1406. I am not sure that there will be a patch for 2.6, so the recommended resolution would be to compile 2.6 with a less recent version of Sundials, while fixing underlying issues for the upcoming 3.0 release.

@band-a-prend
Copy link
Contributor Author

It's seem that issue is related Sundials built with gcc 11.x: in my system if sundials 5.2, 5.8 or 6.2 (presented in repo) are built with lapack+int64 support then it's lapack tests are failed. And there are no test failures if built it with gcc 12.x and cantera tests are passed too. Moreover the sundials 6.5.0 isn't affected this issue for me both gcc 11.x and gcc 12.x (with cantera too).

@ischoegl
Copy link
Member

Interesting. Based on your report there are a large number of failures for CVodesIntegrator::integrate that are dependent on the gcc version. That part of the code is largely unchanged for the upcoming release - could you see whether the errors are the same still?

@speth
Copy link
Member

speth commented Jan 2, 2023

Cantera does not support the use of BLAS/LAPACK libraries that use 64-bit integer types. I'm surprised you're finding it works with any compiler or Sundials version.

One of the problems is I don't think there's a general way to check from the BLAS/LAPACK library alone whether it was built for 32 or 64 bit integer types. I guess for Sundials, there should be a #define constant that we can check, but this doesn't protect against linking to the wrong BLAS/LAPACK libraries.

@band-a-prend
Copy link
Contributor Author

@ischoegl
I will try to recheck and compare test logs on this week. Maybe will recheck on my Linux Mint notebook too as alternative system or at Ubuntu virtual box system that will be faster.

@speth
It's not blas/lapack with int64 but Sundials with enabled of SUNDIALS_INDEX_SIZE 64 bit (set up as default).

@speth
Copy link
Member

speth commented Jan 3, 2023

Ah, I see. I was unaware that Sundials provided flexibility in this regard. Cantera currently assumes that the Sundials index type is just long int, which is a 64 bit integer on 64-bit Linux and macOS systems, but not in some other cases, such as 32-bit Linux or either 32 or 64-bit Windows (see https://en.cppreference.com/w/cpp/language/types).

However, all usage of this type (look for sd_size_t in CVodesIntegrator.cpp) is fairly simple, and I don't think there are any places where this could overflow and cause problems. If you can isolate this in an easier-to-reproduce location (anything that can be spun up in Docker, for example), that would be appreciated.

@band-a-prend
Copy link
Contributor Author

@ischoegl
If seems that Ubuntu-22.04.1 I tried to test has libsundials-dev (library pack) built without lapack support and if system_sundials=y option is used then lapack support will disabled for me.

The additional problem is that system_sundials=n causes failed to build >Sundials-5.4.0 because scons rules doesn't disable xbraid support for Arcode due to braid.h absence.

But as I can see the tests are run with Conda installed packages including Sundials and it has Lapack support enabled (e.g. https://github.com/Cantera/cantera/actions/runs/3853787085/jobs/6567107735). But this test env uses GCC-12.2.0

The re-testing in other env/OS for GCC-11.x takes more time :(

@speth
About using Sundials int: the https://github.com/Cantera/cantera/blob/main/ext/sundials_config.h.in states #define SUNDIALS_INT32_T 1 and #define SUNDIALS_INDEX_TYPE int32_t. Does it mean int32 is forced for system_sundials=n?

I noticed that Sundials 5.4.0 Changelog states:

This change may cause a runtime error in existing user code.
In IDAS and CVODES, the functions for forward integration with checkpointing
(IDASolveF, CVodeF) are now subject to a restriction on the number of time
steps allowed to reach the output time. This is the same restriction applied to
the IDASolve and CVode functions. The default maximum number of steps is
500, but this may be changed using the <IDA|CVode>SetMaxNumSteps function.
This change fixes a bug that could cause an infinite loop in the IDASolveF
and CVodeF and functions.

A minor inconsistency in CVODE(S) and a bug ARKODE when checking the Jacobian
evaluation frequency has been fixed. As a result codes using using a
non-default Jacobian update frequency through a call to
CVodeSetMaxStepsBetweenJac or ARKStepSetMaxStepsBetweenJac will need to
increase the provided value by 1 to achieve the same behavior as before. For
greater clarity the functions CVodeSetMaxStepsBetweenJac,
ARKStepSetMaxStepsBetweenJac, and ARKStepSetMaxStepsBetweenLSet have been
deprecated and replaced with CVodeSetJacEvalFrequency,
ARKStepSetJacEvalFrequency, and ARKStepSetLSetupFrequency respectively.
Additionally, the function CVodeSetLSetupFrequency has been added to CVODE(S)
to set the frequency of calls to the linear solver setup function.

In 6.5.0: Fixed an underflow bug during root finding in ARKODE, CVODE, CVODES, IDA and IDAS.
and in 6.4.0 Fixed a memory leak in CVODE and CVODES where the projection memory would not be deallocated when calling CVodeFree.

@ischoegl
Copy link
Member

If seems that Ubuntu-22.04.1 I tried to test has libsundials-dev (library pack) built without lapack support and if system_sundials=y option is used then lapack support will disabled for me.

This sounds like the expected behavior.

The additional problem is that system_sundials=n causes failed to build >Sundials-5.4.0 because scons rules doesn't disable xbraid support for Arcode due to braid.h absence.

I'm not familiar with Arcode (or additional settings you may have used), but missing includes can be addressed with the SCons extra_inc_dirsoption. Overall, I assume that much of this (xbraid?) can be addressed by setting SCons options manually. In most cases, SCons knows what to do for a 'standard' configuration, but the implemented logic may not be sufficient to correctly augment sophisticated configurations.

But as I can see the tests are run with Conda installed packages including Sundials and it has Lapack support enabled (e.g. https://github.com/Cantera/cantera/actions/runs/3853787085/jobs/6567107735). But this test env uses GCC-12.2.0.

The re-testing in other env/OS for GCC-11.x takes more time :(

For these tests, CI uses the default compiler for ubuntu-latest; as this is 22.04 LTS, this should be GCC-11.2? (see gcc versions)

Based on this, it appears that the problem may not be reproducible for main?

@band-a-prend
Copy link
Contributor Author

Overall, I assume that much of this (xbraid?) can be addressed by setting SCons options manually.

Yes, I think the problem with arcode should be solved when I will pass `CPPDEFINES={'ENABLE_XBRAID': 0}) the way it done for googletest in ext/SConscript.

For these tests, CI uses the default compiler for ubuntu-latest; as this is 22.04 LTS, this should be GCC-11.2?

I assumed the tests for sundials 5.8 used gcc 12 because of conda configuration refers libgcc-ng-12 (raw log):
https://github.com/Cantera/cantera/commit/9989491497749a9528ed3bb3744391a7b6a89258/checks/10478157516/logs

But maybe it's doesn't mean the same version of conda libgcc-ng is used for gcc to build.

Anyway the main problem is insufficient free time for me to dive into problem and create stable reproducer. But I will try to prepare it.

@speth
Copy link
Member

speth commented Jan 21, 2023

The additional problem is that system_sundials=n causes failed to build >Sundials-5.4.0 because scons rules doesn't disable xbraid support for Arcode due to braid.h absence.

There seems to be a bit of confusion here. If you compile with system_sundials=n, then you should be using the vendored copy of Sundials that's in ext/sundials, which is presently Sundials 5.3.0. I think it's to be expected that updates to the Cantera build scripts will be required when we update this vendored version. However, that update isn't something that I see as particularly urgent, since we don't rely on of the new features and I don't think any of the bug fixes affect our usage.

That said, I'm not sure why a flag related to ARKODE would need to be set -- we only compile CVODES, IDAS, and the necessary linear algebra modules from the vendored Sundials.

@ischoegl ischoegl added compiling tests Issues about tests, running the tests, or test results labels Mar 26, 2023
@band-a-prend
Copy link
Contributor Author

Finaly I had no time to quick adopt the scons scripts to build appropriate Sundials configuration under Ubuntu system to reproduce it.
I suppose it could be GCC + Sundials versions related issue and as modern Sundials usuals comes with newer GCC where this issue isn't observed. Also in Gentoo system GCC has additional patches and it may be Gentoo specfic problem.

Could I close this issue if I will not observe it on Cantera 3.0.0b1 for latest system Sundials and GCC?

@speth
Copy link
Member

speth commented Jul 15, 2023

Thanks for the update @band-a-prend. I think if you're not seeing this bug anymore with Cantera 3.0.0b1, whether it's from a change in Cantera or some other software versions installed on Gentoo, we can safely close this. Of course, if it pops up again, you can always create a new issue.

@speth speth closed this as completed Jul 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiling tests Issues about tests, running the tests, or test results
Projects
No open projects
Development

No branches or pull requests

3 participants