Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ABI break in CHOLMOD 5.1.0 without SOVERSION bump #741

Closed
svillemot opened this issue Jan 17, 2024 · 11 comments
Closed

ABI break in CHOLMOD 5.1.0 without SOVERSION bump #741

svillemot opened this issue Jan 17, 2024 · 11 comments
Assignees
Labels
bug Something isn't working

Comments

@svillemot
Copy link
Contributor

In commit 0b8b9a7, the field blas_dump was added to the structure cholmod_common. The size of the structure has thus been modified.

This is an ABI break. As a consequence, the SOVERSION of libcholmod should have been bumped (i.e. libcholmod.so.5 should have been renamed libcholmod.so.6).

This unhandled ABI break is causing crashes in various Debian packages, see http://bugs.debian.org/1061049

As a reminder, the SOVERSION should be increased whenever there is a backward-incompatible change in the application binary interface (ABI), i.e. whenever a program compiled against the old version of the library needs to be recompiled in order to run against the new version of the library. Also note that even though in most SuiteSparse libraries the major version number and the SOVERSION are identical, the two are fundamentally different concepts. For example, one has to increase the SOVERSION when a minor and obsolete exported function is removed, but this is arguably not a change significant enough to warrant a major version bump; conversely, adding many new functions can justify a major version bump, but this should not lead to a SOVERSION bump if nothing is broken in the existing ABI.

@DrTimothyAldenDavis
Copy link
Owner

Oops. My mistake.

What's the least disruptive way to resolve this, in your opinion? I could simply call CHOLMOD v6. It has undergone a lot of changes with the addition of single precision support (but that was what v5 was supposed to indicate).

I could #ifdef out the blas_dump variable, since it's only meant for diagnosing performance issues in the BLAS. It could change to

#ifdef BLAS_DUMP
FILE *blas_dump ; 
#endif

and then BLAS_DUMP would not be used in production, just development. If this is the only ABI-breaking change, then I could call this CHOLMOD 5.1.2.

Which option is least disruptive, do you think?

@dkogan
Copy link

dkogan commented Jan 17, 2024

Hi. If the ABI change could be un-done (with an #ifdef for instance), that would be the least-churn option. I would have several questions/comments in that case:

  • If blas_dump is usually disabled with an #ifdef, then enabling it will break the ABI. Whoever does that will need to be aware of that, and would need to rebuild all dependents. If this is just for testing stuff, that's probably ok. I'd suggest a comment be added next to the #ifdef saying to make sure everything is rebuilt.
  • This change happened in release 7.4, which is already out. And 7.5 is out as well. Those two already break the ABI in respect to 7.3. With the #ifdef you would release a 7.6 (or something like that), which breaks the ABI again, relative to 7.5, to make it compatible with 7.3 again. If any users of 7.4 or 7.5 are out there, then THEY would be complaining about the ABI break. I don't know if there're a lot of them out there.
  • I'd want to check for other potential ABI breaks. Let me run an automated tool, and reply here with the results.

@DrTimothyAldenDavis DrTimothyAldenDavis added the bug Something isn't working label Jan 18, 2024
@DrTimothyAldenDavis
Copy link
Owner

yes, turning it on would break the ABI, but the case is not meant to be used in production, just in development. There is no cmake way to set BLAS_DUMP, for example.

Yes, I'd release a 7.5.2 perhaps, with this bug fix. This could be considered a bug introduced in 7.4.x and fixed in 7.5.2, and thus versions 7.4.x through 7.5.1 would still have the bug. If anyone is using 7.4.x to 7.5.1, and they notice the ABI break, they could just fix that bug by upgrading to 7.5.2.

I should have an ABI-checker myself, I suppose. I try to be careful but sometimes I slip up.

@DrTimothyAldenDavis
Copy link
Owner

turning it on would break the ABI ...

I mean adding the #idef BLAS_DUMP, as the least-churn option.

@dkogan
Copy link

dkogan commented Jan 18, 2024

Hi. I just ran the automated tool to compare libcholmod in 7.3.1 vs 7.5.1.

I downloaded libcholmod5 and libcholmod5-dbgsym from https://snapshot.debian.org. Then I extracted the contents, and compared the ABIs:

V0=7.3.1+dfsg-2
V1=7.5.1+dfsg-1

mkdir $V0 && for f (*$V0*.deb) { dpkg -x $f $V0 }
mkdir $V1 && for f (*$V1*.deb) { dpkg -x $f $V1 }

abi-dumper $V0/usr/lib/*/*.so.? --search-debuginfo=$V0 -lver $V0 -o $V0.dump
abi-dumper $V1/usr/lib/*/*.so.? --search-debuginfo=$V1 -lver $V1 -o $V1.dump

abi-compliance-checker -l cholmod5 -old $V0.dump -new $V1.dump

The results: https://notes.secretsauce.net/cholmod5-7.3.1+dfsg-2_to_7.5.1+dfsg-1-compat_report.html

So it flagged the issue we're talking about (albeit with a "Low" severity, which I disagree with), and found no other problems. So adding the #ifdef and releasing a 7.5.2 is probably good. Sébastien: does that sound good to you?

@svillemot
Copy link
Contributor Author

svillemot commented Jan 18, 2024

What's the least disruptive way to resolve this, in your opinion? I could simply call CHOLMOD v6. It has undergone a lot of changes with the addition of single precision support (but that was what v5 was supposed to indicate).

Note that my last point was precisely to stress that there is no fundamental reason why bumping the SOVERSION to 6 should imply bumping the major version number of CHOLMOD to 6. Many libraries disconnect the two, because they are distinct concepts. To say it otherwise, one possibility would be to have CHOLMOD 5.1.2 ship libcholmod.so.6.

I could #ifdef out the blas_dump variable, since it's only meant for diagnosing performance issues in the BLAS. It could change to

#ifdef BLAS_DUMP
FILE *blas_dump ; 
#endif

and then BLAS_DUMP would not be used in production, just development. If this is the only ABI-breaking change, then I could call this CHOLMOD 5.1.2.

Which option is least disruptive, do you think?

This option is fine with me, as long as you can ensure that there will never be a reason to define BLAS_DUMP in the Debian package (because, as @dkogan has mentioned, this would reintroduce an ABI breakage without SOVERSION bump). Actually, this solution is the one that is the easiest for me to implement as Debian packager.

@DrTimothyAldenDavis
Copy link
Owner

This option is fine with me, as long as you can ensure that there will never be a reason to define BLAS_DUMP in the Debian package (because, as @dkogan has mentioned, this would reintroduce an ABI breakage without SOVERSION bump). Actually, this solution is the one that is the easiest for me to implement as Debian packager.

Yes -- that's exactly what I can do. BLAS_DUMP would be for development only, not production. I didn't use an #ifdef because I thought it would be better to have a fixed size struct, but that's not very important.

@DrTimothyAldenDavis
Copy link
Owner

Thanks for that report. The changes of int mode to int values is upward-compatible with 7.3.1. Some of the functions in 7.3.1 only handled real matrices (not complex), and values=0 was pattern-only, and nonzero was "do the values". If the matrix was complex in 7.3.1, an error code was returned (complex not supported). Now, mode=0 means pattern only (as before), mode=1 means "do the values but not complex conjugate", and mode=2 means "do the values, and use the complex conjugage if complex".

Other changes of mode -> values were just to make this parameter consistent in its name. Usage is unchanged.

The restrict keyword was removed because it causes problems with an older version MS VIsual Studio, if I recall. The behavior of that method is not modified. So this is safe.

The only change is blas_dump, and I can #ifdef that away.

This commit should fix the problem: 53f0dc4

I also will need to revise the top-level ChangeLog for Suitesparse 7.5.2.

@svillemot
Copy link
Contributor Author

Thanks for fixing the issue. I have already uploaded the patch to Debian, without waiting for the new release, because it was somewhat urgent (many packages broken).

@DrTimothyAldenDavis
Copy link
Owner

Does SuiteSparse 7.6.0.beta1 look OK? I'd like to post it soon since this ABI break is an important thing to fix, but it would be best to get your feedback first.

@DrTimothyAldenDavis
Copy link
Owner

Fixed in SuiteSparse 7.6.0. Thanks again for catching this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants