Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Single Precision #487

Merged
merged 8 commits into from
Jan 16, 2024
Merged

Add Support for Single Precision #487

merged 8 commits into from
Jan 16, 2024

Conversation

rayegun
Copy link
Member

@rayegun rayegun commented Jan 10, 2024

Incomplete, posting now for codecov purposes. There are some outstanding issues on how to do promotion and conversion, since the CHOLMOD built-in capabilities are limited.

@ViralBShah
Copy link
Member

Wow that is a huge PR!

Copy link

codecov bot commented Jan 10, 2024

Codecov Report

Attention: 35 lines in your changes are missing coverage. Please review.

Comparison is base (75081bc) 85.17% compared to head (9731b71) 85.00%.

Files Patch % Lines
src/solvers/cholmod.jl 80.87% 35 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #487      +/-   ##
==========================================
- Coverage   85.17%   85.00%   -0.17%     
==========================================
  Files          12       12              
  Lines        8866     8959      +93     
==========================================
+ Hits         7552     7616      +64     
- Misses       1314     1343      +29     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rayegun
Copy link
Member Author

rayegun commented Jan 10, 2024

The vast majority of it is trying to handle promotion at least somewhat sanely. Since mixed precision is not supported (and probably never will be).

@rayegun
Copy link
Member Author

rayegun commented Jan 11, 2024

Curses, I have no idea what's going on with Float32 on 32-bit plats. Are there arcane architectural differences afoot here? Or just a bad wrap somewhere :/

@rayegun
Copy link
Member Author

rayegun commented Jan 11, 2024

This seems to be an artifact of using sprand here perhaps? Does that make sense to you @ViralBShah? I suppose it's possible that the matrix generated here is close to non-posdef, and that minor architectural changes are causing the issue?

If I run

julia> Ftmp = SparseMatrixCSC{Float32, Int32}(sprandn(Tv, 100, 100, 0.1)); Ftmp = Ftmp'Ftmp + I;

julia> cholesky(Ftmp)

There's a 1 in 4 chance it fails on 32-bit Linux. It never fails using Float64. It also never fails on 64 bit platforms, no matter which datatypes or index types are used.

@ViralBShah
Copy link
Member

ViralBShah commented Jan 11, 2024

Just added a heavy diagonal:

SparseMatrixCSC{Float32, Int32}(sprandn(Tv, 100, 100, 0.1)); Ftmp = Ftmp'Ftmp + 100I

Float32 matrices will be more likely to be ill-conditioned. Let's see if my commit helps.

@ViralBShah
Copy link
Member

ViralBShah commented Jan 11, 2024

There's a 20min timeout. Should we reduce the set of tests run?

@ViralBShah
Copy link
Member

Does the failing test on ComplexF32 suggest that there is a bug somewhere in the cholesky for single precision just added in the complex case?

@rayegun
Copy link
Member Author

rayegun commented Jan 11, 2024

I need to trim the tests. I think I just located a bug in the wrapper, I'm running the tests on Ubuntu-32 right now. There was an assumption that only held by accident in the old code. Passing scale factors in sdmult and factorize was not really adhering to the documented API

@ViralBShah
Copy link
Member

The tests are timing out on Windows after 6 hours. I have triggered the x64 test again. I believe it was going through successfully earlier in this PR, but perhaps the hanging is a sign of memory corruption.

@dkarrasch
Copy link
Member

dkarrasch commented Jan 11, 2024

I think Windows tests have been hanging for a while, also in other PRs, on main there have been issues since f890a1e.

@rayegun
Copy link
Member Author

rayegun commented Jan 11, 2024

The tests are currently massive for cholesky. I will work on reducing, we don't really need a complete product of Float32, Float64 x Real, Complex x Int32, Int64 for every test.

I just did that to naively check everything while making the changes

@ViralBShah
Copy link
Member

ViralBShah commented Jan 11, 2024

The hangs since SuiteSparse 7.4 are a bit worrisome, but shouldn't hold this PR.

@rayegun
Copy link
Member Author

rayegun commented Jan 12, 2024

Are those hangs really from SuiteSparse? The times for the SuiteSparse tests don't look that terrible. They're faster than my MacBook. Which definitely doesn't take 5 hours.

@rayegun
Copy link
Member Author

rayegun commented Jan 12, 2024

Oh looks like indeed they happen after SuiteSparse 7.4 update. I wonder why 🤔 because the actual times don't look terrible.

I'll test on my Windows machine.

@ViralBShah
Copy link
Member

With #490 clearly identifying the issue on Windows, I think it is safe to merge this.

@rayegun rayegun merged commit 240ef90 into main Jan 16, 2024
4 of 8 checks passed
@rayegun rayegun deleted the rk/single-support branch January 16, 2024 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants