
EPIC sdp2input/pvm2sdp refactoring + parallel reading optimization #176

Merged
merged 16 commits into master from parallel-pmp-read
Jan 24, 2024

Conversation

vasdommes
Collaborator

  • Major refactoring of sdp2input/pvm2sdp and related code, preparing for a unified pmp2sdp executable that handles all formats (.xml, .json, .m). As a byproduct, both sdp2input and pvm2sdp can now read all formats too.
  • Removed the spectrum --format option; the format is now determined automatically.
  • New JSON parser that allows skipping array elements.
  • Fix File reading in sdp2input/pvm2sdp is not parallelized #150
    In the old version, every process read all input files (picking out the matrices it needed). Now input files are distributed among processes by the compute_block_grid_mapping() algorithm (with file size used as the cost), similar to SDPB block distribution.
    Each group of processes gets one or more input files; matrices from these files are distributed among the group's processes in a round-robin way (see the sketch after this list).
  • Added more checks for Polynomial_Matrix_Program and output sdp blocks.
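A minimal sketch of the two-level distribution described above, with hypothetical names (the real logic lives in compute_block_grid_mapping() and the new readers):

```cpp
#include <cstddef>

// Hypothetical illustration of the two-level distribution:
// Level 1: whole files are assigned to process groups (weighted by file
//          size, via compute_block_grid_mapping()).
// Level 2: within a group, matrix k of the group's files goes to rank
//          (k % group_size), i.e. round-robin.
bool owns_matrix(std::size_t matrix_index_in_group,
                 std::size_t rank_in_group, std::size_t group_size)
{
  return matrix_index_in_group % group_size == rank_in_group;
}
```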

TODO: obsolete test files were moved to the outer_limits/toy/ directory; review and either use or delete them!
TODO: also test sdp2functions
TODO: add realistic tests, e.g. using end-to-end data
…ates.

The previous message incorrectly assumed (procsPerNode/procGranularity) core groups per node.
In fact, big blocks can be assigned to big groups containing more than procGranularity cores.
The actual number of core groups per node lies in the range [1, procsPerNode/procGranularity].
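For concreteness (illustrative numbers): with procsPerNode=128 and procGranularity=4, a node may host anything from a single 128-core group down to 32 groups of 4 cores each, so the number of groups per node lies between 1 and 128/4 = 32.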
Introduced PMWP_File_Parse_Result and PMWP_SDP classes to handle parsing data.
(PMWP stands for Positive_Matrix_With_Prefactor, used in JSON/Mathematica input.)
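A rough sketch of the shape such a per-file parse result might have; the members below are illustrative guesses, not the PR's actual definition:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Positive_Matrix_With_Prefactor
{
}; // stub; the real class lives in the SDPB sources

// Illustrative only: a per-file parse result could record how many matrices
// the file contains (for global indexing), plus the matrices assigned to
// this process, keyed by their index within the file.
struct PMWP_File_Parse_Result_Sketch
{
  std::size_t num_matrices = 0;
  std::vector<std::pair<std::size_t, Positive_Matrix_With_Prefactor>>
    parsed_matrices;
};
```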

As a byproduct:
fix #152 spectrum not working in parallel
(fixed by treating local and global block indices more accurately)

TODO:
- Same refactoring for pvm2sdp
- More efficient IO parallelization, see #150
Introduced PVM_File_Parse_Result and PVM_SDP classes to handle parsing data.
(PVM stands for Polynomial_Vector_Matrix, used in XML input.)

Removed duplicated code from pvm2sdp

TODO:
- More efficient IO parallelization, see #150
- Lots of code is duplicated for PVM and PMWP. We should deduplicate it, e.g. by converting from one format to the other when possible.
- Make new executables that can read both formats, i.e. combine pvm2sdp + sdp2input and pvm2functions + sdp2functions, and remove the spectrum --format option.
Allows using Verbosity in boost::program_options without conversion from int_verbosity.
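boost::program_options parses user-defined option types through stream extraction, so this change presumably boils down to an operator>> for Verbosity along these lines (a sketch with illustrative enum values, not the PR's actual code):

```cpp
#include <ios>
#include <istream>
#include <string>

enum class Verbosity
{
  none = 0,
  regular = 1,
  debug = 2
}; // illustrative values

// boost::program_options (via boost::lexical_cast) uses operator>> to parse
// custom option types; setting failbit on bad input makes program_options
// report an invalid option value.
std::istream &operator>>(std::istream &is, Verbosity &verbosity)
{
  std::string token;
  is >> token;
  if(token == "0" || token == "none")
    verbosity = Verbosity::none;
  else if(token == "1" || token == "regular")
    verbosity = Verbosity::regular;
  else if(token == "2" || token == "debug")
    verbosity = Verbosity::debug;
  else
    is.setstate(std::ios_base::failbit);
  return is;
}
```

With this in place, an option can be declared as po::value<Verbosity>(&verbosity) directly.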
TODO: better and unified parser design (states, callbacks, etc.)
TODO: optimize: skip matrices, avoid excessive copying
TODO: remove spectrum --format
TODO: merge pvm2... and sdp2...
TODO: update docs
Input file format is now determined automatically; see the sketch below.
Also:
- added unit_tests for json
- return PMP_File_Parse_Result from read_json(), read_mathematica(), read_xml()
- introduce PMP_File_Parse_Result::read() instead of a constructor

TODO reuse the new parser in sdp2functions etc.
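One plausible way the format could be detected automatically is by file extension; the PR's actual logic may also inspect file contents. A sketch with a hypothetical helper:

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>

enum class Format { json, mathematica, xml };

// Hypothetical helper: map the input file extension to a parser,
// shown only to illustrate extension-based detection.
Format guess_format(const std::filesystem::path &input)
{
  const std::string ext = input.extension().string();
  if(ext == ".json")
    return Format::json;
  if(ext == ".m")
    return Format::mathematica;
  if(ext == ".xml")
    return Format::xml;
  throw std::runtime_error("Cannot determine input format: " + input.string());
}
```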
…b_util/block_mapping

The code will be reused by read_polynomial_matrix_program()
Parallelization algorithm:
- Input files are distributed among (groups of) processes via compute_block_grid_mapping()
- Within each group, matrices from input files are distributed in a round-robin way

TODO: since IO is often a bottleneck, we can reduce it: the root of each group copies the file contents to a shared-memory window, and the other processes read from it (see the sketch below).
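A sketch of that shared-memory idea using standard MPI-3 calls, assuming each group's communicator lies within one shared-memory node (e.g. obtained via MPI_Comm_split_type with MPI_COMM_TYPE_SHARED); this is not code from the PR:

```cpp
#include <mpi.h>

#include <cstddef>
#include <cstdio>

// The group root reads the file once into an MPI shared-memory window;
// the other ranks obtain a pointer to the same physical buffer and parse
// from it, so each file is read from disk only once per group.
char *load_file_shared(MPI_Comm group_comm, const char *path,
                       MPI_Aint file_size, MPI_Win *win)
{
  int rank;
  MPI_Comm_rank(group_comm, &rank);

  // Only the root allocates the buffer; other ranks allocate zero bytes.
  char *base = nullptr;
  MPI_Win_allocate_shared(rank == 0 ? file_size : 0, 1, MPI_INFO_NULL,
                          group_comm, &base, win);
  if(rank == 0)
    {
      std::FILE *f = std::fopen(path, "rb");
      if(f != nullptr)
        {
          std::fread(base, 1, static_cast<std::size_t>(file_size), f);
          std::fclose(f);
        }
    }
  else
    {
      // Get a directly addressable pointer to the root's segment.
      MPI_Aint size;
      int disp_unit;
      MPI_Win_shared_query(*win, 0, &size, &disp_unit, &base);
    }
  MPI_Barrier(group_comm); // everyone waits until the file is loaded
  return base;
}
```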
…nomial_Vector>

This is a simple std::vector<Polynomial> with a Zero() function.

On Expanse HPC, compilation of El::Matrix<std::vector<Polynomial>> failed with
error: 'class std::vector<Polynomial>' has no member named 'Zero'

Zero() is (sometimes) called by El::MemZero() when allocating memory for a matrix.
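For reference, the wrapper presumably looks something like this (a sketch; the actual Zero() semantics in the PR may differ):

```cpp
#include <vector>

struct Polynomial
{
}; // stub; the real class lives in the SDPB sources

// Thin wrapper so that El::Matrix<Polynomial_Vector> compiles:
// El::MemZero() may call Zero() on each element when allocating matrix
// memory, and std::vector<Polynomial> has no such member.
struct Polynomial_Vector : std::vector<Polynomial>
{
  void Zero() { this->clear(); } // illustrative; actual semantics may differ
};
```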
@vasdommes vasdommes added this to the 2.7.0 milestone Jan 23, 2024
@vasdommes vasdommes changed the title from "EPIC sdpinput/pmp2sdp refactoring + Fix #150 File reading in sdp2input/pvm2sdp is not parallelized" to "EPIC sdp2input/pvm2sdp refactoring + Fix #150 File reading in sdp2input/pvm2sdp is not parallelized" Jan 23, 2024
@vasdommes
Collaborator Author

These plots show a dramatic improvement in sdp2input reading time for GNY problems of different sizes (nmax=8, 18, 22). Timing data is from Expanse HPC.

Note that for large problems, writing to the zip archive becomes a bottleneck: it is done by a single process and cannot be parallelized.

[Three plots: sdp2input reading time for GNY problems with nmax = 8, 18, 22]

@vasdommes vasdommes changed the title from "EPIC sdp2input/pvm2sdp refactoring + Fix #150 File reading in sdp2input/pvm2sdp is not parallelized" to "EPIC sdp2input/pvm2sdp refactoring + parallel reading optimization" Jan 24, 2024
@vasdommes vasdommes merged commit e226001 into master Jan 24, 2024
2 checks passed
@vasdommes vasdommes deleted the parallel-pmp-read branch January 24, 2024 05:36
vasdommes referenced this pull request in bharathr98/sdpb Feb 8, 2024
Copied over wscript; @vasdommes had already made a few changes from waf to cmake. Cleaned up and added more TODOs.