
Fix #81 Block_Info::read_block_costs(): use more realistic RAM estimate (#82)

Merged: 1 commit merged into davidsd:master from block-costs-ram-estimates on Jul 11, 2023

Conversation

@vasdommes (Collaborator) commented Jul 5, 2023

Fixes #81.
This improves block distribution among the cores during the timing run only.

We use the same RAM estimate as in #83:
2#(B) + 5#(PSD) + 2#(S) + 2#(Bilinear pairing)
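
For reference, here is a minimal C++ sketch of how such a per-block cost can be computed. The struct and field names are illustrative placeholders, not the actual SDPB code:

```cpp
// Sketch only: weighted element count used as the per-block RAM cost,
//   2#(B) + 5#(PSD) + 2#(S) + 2#(Bilinear pairing),
// where #(M) is the number of elements of matrix M for this block.
#include <cstddef>

struct Block_Size_Info
{
  std::size_t B_elements;                // #(B): P' x N block of the B matrix
  std::size_t psd_elements;              // #(PSD): positive semidefinite blocks
  std::size_t schur_elements;            // #(S): P' x P' Schur complement block
  std::size_t bilinear_pairing_elements; // #(Bilinear pairing)
};

std::size_t block_cost(const Block_Size_Info &block)
{
  return 2 * block.B_elements + 5 * block.psd_elements
         + 2 * block.schur_elements + 2 * block.bilinear_pairing_elements;
}
```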

@davidsd (Owner) commented Jul 5, 2023

Looks good to me! Have you checked that this improves memory distribution for the timing run in some cases?

@vasdommes (Collaborator, Author) commented Jul 8, 2023

I've made several runs (on the master and block-costs-ram-estimates branches) on the Expanse cluster for the GNY model with different parameters; see the plots below.
For each rank, I took the total program size (the first number in /proc/self/statm) at the beginning of run.step.initializeSchurComplementSolver.Q.synchronize_Q during the second timing step (maximum RAM usage occurs roughly at that moment).
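
For the record, a minimal sketch of this measurement; the helper below is hypothetical, not SDPB's actual instrumentation:

```cpp
// Sketch: read the total program size (first field of /proc/self/statm,
// measured in pages) and convert it to bytes. Hypothetical helper.
#include <cstddef>
#include <fstream>
#include <unistd.h>

std::size_t total_program_size_bytes()
{
  std::ifstream statm("/proc/self/statm");
  std::size_t pages = 0;
  statm >> pages; // first field: total program size, in pages
  return pages * static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}
```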

The first case shows better RAM distribution for the new costs.
It is, however, suspicious that the total RAM usage appears to be lower for the old costs; I don't have a good explanation for this.
I've repeated the run twice and got the same result. Note, however, that during the actual run the RAM usage is the same in both cases (~3.4 GB for all cores).

Except for the (mysterious) first case, we see that both approaches yield similar RAM variation. Why?

  • The old block cost is the Schur block size #(S) = P'×P'.
  • In the new block cost expression, the leading term (for our GNY model) is the B matrix block size #(B') = P'×N. Note that N = const for all blocks.
  • Thus, in both cases blocks are essentially sorted by P', and the heaviest blocks always go to different nodes/cores (Worst-Fit-First; see the sketch after this list). The difference should appear due to (1) the other terms in the new cost expression, and (2) a different distribution of the smaller blocks among the cores.
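
For clarity, a minimal sketch of the Worst-Fit-First scheme assumed above (illustrative only, not the actual SDPB allocator):

```cpp
// Sketch: sort blocks by descending cost, then assign each block to the
// currently least-loaded core (Worst-Fit-First).
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

std::vector<std::size_t> worst_fit_first(std::vector<std::size_t> costs,
                                         std::size_t num_cores)
{
  std::sort(costs.begin(), costs.end(), std::greater<>());
  std::vector<std::size_t> load(num_cores, 0);
  std::vector<std::size_t> owner(costs.size());
  for(std::size_t i = 0; i < costs.size(); ++i)
    {
      auto lightest = std::min_element(load.begin(), load.end());
      owner[i] = lightest - load.begin(); // core index for i-th heaviest block
      *lightest += costs[i];
    }
  return owner;
}
```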

To obtain really different results, one has to consider a problem where, e.g., the positive semidefinite block contribution #(PSD) is the leading term, and some blocks have a higher #(PSD) and a lower #(S) at the same time.
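For example (made-up numbers, ignoring the other terms): take block A with #(S) = 100 and #(PSD) = 1000, and block B with #(S) = 400 and #(PSD) = 100. The old cost ranks B above A (400 vs. 100), while the new cost ranks A above B (5·1000 + 2·100 = 5200 vs. 5·100 + 2·400 = 1300), so the two estimates would produce genuinely different block distributions.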

P.S. Even though we do not observe a major difference here, I think we should use the new RAM estimate, because (if we got it right) it should work well for a broader range of problems.

[Four plots: total program size per rank at synchronize_Q during the timing run, comparing old and new block costs for the GNY model with different parameters.]

@vasdommes merged commit 4edf567 into davidsd:master on Jul 11, 2023
@vasdommes deleted the block-costs-ram-estimates branch on July 11, 2023, 22:55
@vasdommes (Collaborator, Author) commented:

BTW, I found a case where the RAM distribution across cores remains highly non-uniform even for the actual step.
However, it is similar for the old and new costs.

[Plot: RAM usage per core for the actual step, old vs. new costs.]

Block Grid Mapping
Node	Num Procs	Cost		Block List
==================================================
0	2		3.94883e+06	{(17,78)}
0	2		3.94883e+06	{(15,78)}
0	2		3.94883e+06	{(13,78)}
0	2		3.94883e+06	{(11,78)}
0	2		3.94883e+06	{(18,78)}
0	2		3.82447e+06	{(9,76)}
0	2		3.57941e+06	{(7,72)}
0	2		3.33921e+06	{(5,68)}
0	2		3.10387e+06	{(3,64)}
0	1		5.7468e+06	{(1,60)}
0	1		3.72182e+06	{(0,58),(225,70)}
0	1		3.45131e+06	{(119,79),(210,79),(198,79),(253,1)}
0	1		3.45131e+06	{(120,79),(209,79),(197,79),(254,1)}
0	1		3.44355e+06	{(121,79),(208,79),(196,79)}
0	1		3.91278e+06	{(122,79),(207,79),(195,78),(59,58)}
0	1		3.45777e+06	{(123,79),(206,79),(233,78),(255,1)}
0	1		3.91246e+06	{(124,79),(236,79),(194,77),(215,59)}
0	1		3.90371e+06	{(125,79),(238,79),(232,77),(176,58)}
0	1		3.92095e+06	{(126,79),(239,79),(193,76),(178,61)}
0	1		3.91217e+06	{(129,79),(240,79),(231,76),(177,60)}
0	1		3.95608e+06	{(130,79),(241,79),(192,75),(183,66)}
0	1		3.94722e+06	{(131,79),(242,79),(230,75),(182,65)}
0	1		3.95595e+06	{(132,79),(243,79),(191,74),(222,67)}
0	1		3.94706e+06	{(133,79),(244,79),(229,74),(221,66)}
0	1		3.95585e+06	{(134,79),(245,79),(228,73),(185,68)}
0	1		3.94695e+06	{(135,79),(246,79),(190,73),(184,67)}
0	1		3.95578e+06	{(136,79),(247,79),(227,72),(186,69)}
0	1		3.94686e+06	{(118,79),(248,79),(189,72),(223,68)}
0	1		3.95574e+06	{(128,79),(249,79),(226,71),(187,70)}
0	1		3.94681e+06	{(127,79),(250,79),(188,71),(224,69)}
0	1		3.92607e+06	{(71,78),(149,70)}
0	1		3.92607e+06	{(73,78),(109,70)}
0	1		3.92607e+06	{(74,78),(32,70)}
0	1		3.92607e+06	{(75,78),(65,70)}
0	1		3.8967e+06	{(76,78),(148,69)}
0	1		3.8967e+06	{(77,78),(31,69)}
0	1		3.8967e+06	{(78,78),(84,69)}
0	1		3.8967e+06	{(170,78),(108,69)}
0	1		3.8674e+06	{(169,78),(64,68)}
0	1		3.8674e+06	{(168,78),(30,68)}
0	1		3.8674e+06	{(167,78),(107,68)}
0	1		3.8674e+06	{(166,78),(147,68)}
0	1		3.83821e+06	{(117,78),(146,67)}
0	1		3.83821e+06	{(89,78),(83,67)}
0	1		3.83821e+06	{(70,78),(106,67)}
0	1		3.83821e+06	{(157,78),(29,67)}
0	1		3.80909e+06	{(158,78),(105,66)}
0	1		3.80909e+06	{(159,78),(63,66)}
0	1		3.80909e+06	{(160,78),(145,66)}
0	1		3.80909e+06	{(161,78),(28,66)}
0	1		3.78007e+06	{(162,78),(104,65)}
0	1		3.78007e+06	{(163,78),(27,65)}
0	1		3.78007e+06	{(90,78),(82,65)}
0	1		3.78007e+06	{(165,78),(144,65)}
0	1		3.75112e+06	{(97,78),(62,64)}

1	2		3.94883e+06	{(16,78)}
1	2		3.94883e+06	{(14,78)}
1	2		3.94883e+06	{(12,78)}
1	2		3.94883e+06	{(10,78)}
1	2		3.94883e+06	{(19,78)}
1	2		3.70133e+06	{(8,74)}
1	2		3.4587e+06	{(6,70)}
1	2		3.22093e+06	{(4,66)}
1	1		5.97606e+06	{(2,62)}
1	1		3.75112e+06	{(96,78),(26,64)}
1	1		3.75112e+06	{(95,78),(143,64)}
1	1		3.75112e+06	{(94,78),(103,64)}
1	1		3.72228e+06	{(93,78),(81,63)}
1	1		3.72228e+06	{(92,78),(25,63)}
1	1		3.72228e+06	{(91,78),(142,63)}
1	1		3.72228e+06	{(164,78),(102,63)}
1	1		3.69351e+06	{(55,78),(24,62)}
1	1		3.69351e+06	{(54,78),(61,62)}
1	1		3.69351e+06	{(53,78),(141,62)}
1	1		3.69351e+06	{(52,78),(101,62)}
1	1		3.66485e+06	{(51,78),(80,61)}
1	1		3.66485e+06	{(50,78),(100,61)}
1	1		3.66485e+06	{(49,78),(23,61)}
1	1		3.66485e+06	{(56,78),(140,61)}
1	1		3.63625e+06	{(47,78),(139,60)}
1	1		3.63625e+06	{(46,78),(99,60)}
1	1		3.63625e+06	{(45,78),(22,60)}
1	1		3.63625e+06	{(44,78),(60,60)}
1	1		3.60776e+06	{(43,78),(138,59)}
1	1		3.60776e+06	{(42,78),(21,59)}
1	1		3.60776e+06	{(41,78),(79,59)}
1	1		3.57934e+06	{(40,78),(98,58)}
1	1		3.57934e+06	{(48,78),(137,58)}
1	1		3.57934e+06	{(69,78),(20,58)}
1	1		3.95329e+06	{(72,78),(237,79),(251,79),(220,65)}
1	1		3.94444e+06	{(57,78),(235,79),(252,79),(219,64)}
1	1		3.94444e+06	{(58,78),(234,79),(205,79),(181,64)}
1	1		3.93561e+06	{(175,78),(214,79),(203,79),(218,63)}
1	1		3.93561e+06	{(174,78),(213,79),(202,79),(180,63)}
1	1		3.92679e+06	{(173,78),(212,79),(201,79),(217,62)}
1	1		3.92679e+06	{(172,78),(204,79),(200,79),(179,62)}
1	1		3.91799e+06	{(171,78),(211,79),(199,79),(216,61)}
1	1		3.92548e+06	{(39,77),(85,71)}
1	1		3.92548e+06	{(156,77),(110,71)}
1	1		3.92548e+06	{(116,77),(33,71)}
1	1		3.92548e+06	{(88,77),(150,71)}
1	1		3.92502e+06	{(155,76),(111,72)}
1	1		3.92502e+06	{(68,76),(34,72)}
1	1		3.92502e+06	{(115,76),(151,72)}
1	1		3.92502e+06	{(38,76),(66,72)}
1	1		3.92477e+06	{(154,75),(152,73)}
1	1		3.92477e+06	{(37,75),(86,73)}
1	1		3.92477e+06	{(114,75),(112,73)}
1	1		3.92477e+06	{(87,75),(35,73)}
1	1		3.92466e+06	{(36,74),(113,74)}
1	1		3.92466e+06	{(153,74),(67,74)}
