
3.0.0

@vasdommes released this 07 May 05:23
0dc600b

Optimizations:

Implemented a new distributed matrix multiplication algorithm for calculating the Q matrix.
It employs MPI shared windows, the Chinese Remainder Theorem, and the FLINT and BLAS libraries.

Our benchmarks on large SDPB problems showed an overall program speedup of about 2.5x or more, and much better performance scaling with an increasing number of CPUs and nodes, compared to the 2.7.0 release.
In addition, the new algorithm's improved RAM usage makes it possible to solve even larger problems, where the Q matrix does not fit into the memory of a single node.

See #142, #212.
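The notes describe the new Q-matrix algorithm only in one line. The general idea behind CRT-based exact matrix multiplication can be sketched as follows; this is a minimal pure-Python illustration under stated assumptions, not SDPB's implementation. The helper names (`pick_primes`, `matmul_mod`, `crt_matmul`), the choice of ~20-bit primes, and the plain loop standing in for a BLAS call are all hypothetical. Each residue product is small enough to be computed exactly in double precision (which is what would let it be handed to BLAS), and the big-integer result is reassembled with the Chinese Remainder Theorem.

```python
# Minimal sketch (not SDPB's code): exact big-integer matrix product via
# residues modulo word-sized primes plus Chinese Remainder Theorem reconstruction.
from math import prod


def is_prime(n: int) -> bool:
    """Trial division; adequate for the ~20-bit primes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def pick_primes(bound: int, inner_dim: int, start: int = 1 << 20) -> list[int]:
    """Distinct primes below `start` whose product M exceeds 2 * bound."""
    primes, p = [], start
    while prod(primes) <= 2 * bound:
        p -= 1
        if is_prime(p):
            # A double-precision dot product of residues is exact only if the
            # largest possible sum, inner_dim * (p - 1)^2, stays below 2^53.
            assert inner_dim * (p - 1) ** 2 < 2 ** 53
            primes.append(p)
    return primes


def matmul_mod(a, b, p):
    """(A mod p)(B mod p) mod p, accumulated in floats as a BLAS dgemm would.
    Exact because every partial sum is an integer below 2^53."""
    a_mod = [[float(x % p) for x in row] for row in a]
    b_mod = [[float(x % p) for x in row] for row in b]
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for t in range(k):
                acc += a_mod[i][t] * b_mod[t][j]
            c[i][j] = int(acc) % p
    return c


def crt_matmul(a, b):
    """Exact integer product C = A B reconstructed from its residues."""
    k = len(b)
    max_a = max(abs(x) for row in a for x in row)
    max_b = max(abs(x) for row in b for x in row)
    bound = k * max_a * max_b  # every |C_ij| <= bound
    primes = pick_primes(bound, k)
    residues = [matmul_mod(a, b, p) for p in primes]  # independent per prime
    M = prod(primes)
    # CRT coefficients e_i = (M / p_i) * ((M / p_i)^-1 mod p_i).
    coeffs = [(M // p) * pow(M // p, -1, p) for p in primes]
    n, m = len(a), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            x = sum(r[i][j] * e for r, e in zip(residues, coeffs)) % M
            c[i][j] = x - M if x > M // 2 else x  # map back to signed range
    return c


# Usage: compare with the naive big-integer product.
A = [[3**40, -7], [12345678901234567890, 5]]
B = [[2**61, 1], [-(10**18), 9]]
naive = [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)]
         for i in range(2)]
assert crt_matmul(A, B) == naive
```

Because the per-prime products are independent, they can be distributed across processes, and each one is an ordinary floating-point matrix multiplication.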


New features:

  • New option --maxSharedMemory, which allows reducing memory usage in the new matrix multiplication algorithm. If the limit is not set, it is calculated automatically. See #209, #229.
  • Added a new trace level for --verbosity, see #230.
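#209 describes honoring the shared memory limit by splitting shared windows. One way to see why such splitting need not change the result: assuming Q is a Gram matrix Q = P^T P (a syrk-style product, as the `bigint-syrk-blas` name in #209 suggests), it equals the sum of P_b^T P_b over row blocks P_b of P. The Python sketch below is hypothetical: the function names, the fixed bytes-per-entry, and the rule for picking the number of blocks are illustrative assumptions, not SDPB's actual logic for --maxSharedMemory.

```python
# Hypothetical sketch: splitting P into row blocks to respect a memory limit
# leaves the Gram matrix Q = P^T P unchanged.
import math


def num_row_blocks(rows: int, cols: int, bytes_per_entry: int, max_bytes: int) -> int:
    """Smallest number of row blocks such that one block of P fits in max_bytes."""
    rows_per_block = max_bytes // (cols * bytes_per_entry)
    if rows_per_block == 0:
        raise ValueError("max_bytes is too small to hold even one row of P")
    return math.ceil(rows / rows_per_block)


def gram_by_row_blocks(p: list, n_blocks: int) -> list:
    """Accumulate Q = P^T P as the sum of P_b^T P_b over row blocks P_b."""
    rows, cols = len(p), len(p[0])
    step = math.ceil(rows / n_blocks)
    q = [[0] * cols for _ in range(cols)]
    for start in range(0, rows, step):
        # In a memory-limited setting, only this block would need to be resident.
        block = p[start:start + step]
        for row in block:
            for i in range(cols):
                for j in range(cols):
                    q[i][j] += row[i] * row[j]
    return q


# Usage: a 1000 x 100 matrix of 8-byte entries under an 80 kB limit
# gives 100 rows per block, i.e. 10 blocks.
print(num_row_blocks(1000, 100, 8, 80_000))
```

Smaller limits mean more blocks and more accumulation passes, trading speed for memory, while the final Q is the same.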

Other improvements:

  • Print iteration data and condition numbers to iterations.json in the output folder. See #228, #231, #232, #233.
  • In debug mode, write profiling data for both the timing run and the actual run, see #215.
  • In debug mode, print the maximum memory usage, see #231.
  • Graceful exit on SIGTERM, see #208.
  • Build multiplatform Docker image (AMD64+ARM64) on CircleCI, see #225.

New dependencies:

What's Changed

  • Fix #201 Graceful exit on SIGTERM + cosmetic fixes by @vasdommes in #208
  • Fix #207 bigint-syrk-blas: account for MPI shared memory limits (by splitting shared windows) by @vasdommes in #209
  • Calculate Q matrix using Chinese Remainder Theorem, FLINT and BLAS libraries by @vasdommes in #142
  • Fix #211 Optimize reduce-scatter for Q matrix by @vasdommes in #212
  • Fix #175 Debug mode: write profiling for actual run + misc improvements by @vasdommes in #215
  • Misc improvements: shared window warnings, openblas->cblas in waf configure, update dependencies in Dockerfile, add non-HPD to docs by @vasdommes in #217
  • Fix printing window size messages: input and output window were mixed up by @vasdommes in #218
  • Fix #219 Updating block_timings leads to checkpoint loading errors by @vasdommes in #220
  • Minor memory-related fixes and improvements: pretty-print bytes, by @vasdommes in #221
  • Build multiplatform Docker image (AMD64+ARM64) on CircleCI by @vasdommes in #225
  • Minor fixes and improvements by @vasdommes in #227
  • Print SDPB iterations to out/iterations.json, compute condition numbers for each step by @vasdommes in #228
  • Fix #206 Determine --maxSharedMemory automatically, if not set by user. by @vasdommes in #229
  • Add --verbosity=trace level by @vasdommes in #230
  • Minor improvements: full precision for iterations.json, print max MemUsed, improve test output by @vasdommes in #231
  • Minor fixes: fix compilation for Boost 1.81, fix precision for iterations.json by @vasdommes in #232
  • Compute R-err and print it to iterations.json by @vasdommes in #233
  • Minor fixes: UB in compute_block_grid_mapping(), tests for iterations.json, compilation by @vasdommes in #237
  • Update docs for FLINT + minor configuration changes by @vasdommes in #239
  • Update docs to 3.0.0 by @vasdommes in #240

Full Changelog: 2.7.0...3.0.0