
Fix #81 Block_Info::read_block_costs(): use more realistic RAM estimate (#82)

Merged: 1 commit merged into davidsd:master from block-costs-ram-estimates on Jul 11, 2023

Conversation

@vasdommes (Collaborator) commented Jul 5, 2023

Fixes #81.
This improves block distribution among the cores during the timing run only.

We use the same RAM estimate as in #83:
2#(B) + 5#(PSD) + 2#(S) + 2#(Bilinear pairing)
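
For reference, here is a minimal C++ sketch of how such a per-block cost can be computed. The struct and field names are illustrative placeholders, not the actual SDPB code:

```cpp
// Sketch only: weighted element count used as the per-block RAM cost,
//   2#(B) + 5#(PSD) + 2#(S) + 2#(Bilinear pairing),
// where #(M) is the number of elements of matrix M for this block.
#include <cstddef>

struct Block_Size_Info
{
  std::size_t B_elements;                // #(B): P' x N block of the B matrix
  std::size_t psd_elements;              // #(PSD): positive semidefinite blocks
  std::size_t schur_elements;            // #(S): P' x P' Schur complement block
  std::size_t bilinear_pairing_elements; // #(Bilinear pairing)
};

std::size_t block_cost(const Block_Size_Info &block)
{
  return 2 * block.B_elements + 5 * block.psd_elements
         + 2 * block.schur_elements + 2 * block.bilinear_pairing_elements;
}
```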

@davidsd (Owner) commented Jul 5, 2023

Looks good to me! Have you checked that this improves memory distribution for the timing run in some cases?

@vasdommes (Collaborator, Author) commented Jul 8, 2023

I've made several runs (on the master and block-costs-ram-estimates branches) on the Expanse cluster for the GNY model with different parameters; see the plots below.
For each rank, I took the total program size (the first number in /proc/self/statm) at the beginning of run.step.initializeSchurComplementSolver.Q.synchronize_Q during the second timing step (maximum RAM usage occurs roughly at that moment).
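
For the record, a minimal sketch of this measurement; the helper below is hypothetical, not SDPB's actual instrumentation:

```cpp
// Sketch: read the total program size (first field of /proc/self/statm,
// measured in pages) and convert it to bytes. Hypothetical helper.
#include <cstddef>
#include <fstream>
#include <unistd.h>

std::size_t total_program_size_bytes()
{
  std::ifstream statm("/proc/self/statm");
  std::size_t pages = 0;
  statm >> pages; // first field: total program size, in pages
  return pages * static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}
```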

The first case shows better RAM distribution for the new costs.
It is, however, suspicious that the total RAM usage appears to be lower for the old costs; I don't have a good explanation for this.
I've repeated the run twice and got the same result. Note, however, that during the actual run the RAM usage is the same in both cases (~3.4 GB for all cores).

Except for the (mysterious) first case, we see that both approaches yield similar RAM variation. Why?

  • The old block cost is the Schur block size #(S) = P'×P'.
  • In the new block cost expression, the leading term (for our GNY model) is the B matrix block size #(B') = P'×N. Note that N = const for all blocks.
  • Thus, in both cases blocks are essentially sorted by P', and the heaviest blocks always go to different nodes/cores (Worst-Fit-First; see the sketch after this list). The difference should appear due to (1) the other terms in the new cost expression, and (2) a different distribution of the smaller blocks among the cores.
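
For clarity, a minimal sketch of the Worst-Fit-First scheme assumed above (illustrative only, not the actual SDPB allocator):

```cpp
// Sketch: sort blocks by descending cost, then assign each block to the
// currently least-loaded core (Worst-Fit-First).
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

std::vector<std::size_t> worst_fit_first(std::vector<std::size_t> costs,
                                         std::size_t num_cores)
{
  std::sort(costs.begin(), costs.end(), std::greater<>());
  std::vector<std::size_t> load(num_cores, 0);
  std::vector<std::size_t> owner(costs.size());
  for(std::size_t i = 0; i < costs.size(); ++i)
    {
      auto lightest = std::min_element(load.begin(), load.end());
      owner[i] = lightest - load.begin(); // core index for i-th heaviest block
      *lightest += costs[i];
    }
  return owner;
}
```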

To obtain really different results, one has to consider a problem where, e.g., the positive semidefinite block contribution #(PSD) is the leading term, and some blocks have a higher #(PSD) and a lower #(S) at the same time.
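For example (made-up numbers, ignoring the other terms): take block A with #(S) = 100 and #(PSD) = 1000, and block B with #(S) = 400 and #(PSD) = 100. The old cost ranks B above A (400 vs. 100), while the new cost ranks A above B (5·1000 + 2·100 = 5200 vs. 5·100 + 2·400 = 1300), so the two estimates would produce genuinely different block distributions.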

P.S. Even though we do not observe a major difference here, I think we should use the new RAM estimate, because (if we got it right) it should work well for a broader range of problems.

[Four plots: total program size per rank at synchronize_Q during the timing run, comparing old and new block costs for the GNY model with different parameters.]

@vasdommes merged commit 4edf567 into davidsd:master on Jul 11, 2023
@vasdommes deleted the block-costs-ram-estimates branch on July 11, 2023, 22:55
@vasdommes (Collaborator, Author) commented:

BTW, I found a case where the RAM distribution across cores remains highly non-uniform even for the actual step.
However, it is similar for the old and new costs.

[Plot: RAM usage per core for the actual step, old vs. new costs.]

Block Grid Mapping
Node	Num Procs	Cost		Block List
==================================================
0	2		3.94883e+06	{(17,78)}
0	2		3.94883e+06	{(15,78)}
0	2		3.94883e+06	{(13,78)}
0	2		3.94883e+06	{(11,78)}
0	2		3.94883e+06	{(18,78)}
0	2		3.82447e+06	{(9,76)}
0	2		3.57941e+06	{(7,72)}
0	2		3.33921e+06	{(5,68)}
0	2		3.10387e+06	{(3,64)}
0	1		5.7468e+06	{(1,60)}
0	1		3.72182e+06	{(0,58),(225,70)}
0	1		3.45131e+06	{(119,79),(210,79),(198,79),(253,1)}
0	1		3.45131e+06	{(120,79),(209,79),(197,79),(254,1)}
0	1		3.44355e+06	{(121,79),(208,79),(196,79)}
0	1		3.91278e+06	{(122,79),(207,79),(195,78),(59,58)}
0	1		3.45777e+06	{(123,79),(206,79),(233,78),(255,1)}
0	1		3.91246e+06	{(124,79),(236,79),(194,77),(215,59)}
0	1		3.90371e+06	{(125,79),(238,79),(232,77),(176,58)}
0	1		3.92095e+06	{(126,79),(239,79),(193,76),(178,61)}
0	1		3.91217e+06	{(129,79),(240,79),(231,76),(177,60)}
0	1		3.95608e+06	{(130,79),(241,79),(192,75),(183,66)}
0	1		3.94722e+06	{(131,79),(242,79),(230,75),(182,65)}
0	1		3.95595e+06	{(132,79),(243,79),(191,74),(222,67)}
0	1		3.94706e+06	{(133,79),(244,79),(229,74),(221,66)}
0	1		3.95585e+06	{(134,79),(245,79),(228,73),(185,68)}
0	1		3.94695e+06	{(135,79),(246,79),(190,73),(184,67)}
0	1		3.95578e+06	{(136,79),(247,79),(227,72),(186,69)}
0	1		3.94686e+06	{(118,79),(248,79),(189,72),(223,68)}
0	1		3.95574e+06	{(128,79),(249,79),(226,71),(187,70)}
0	1		3.94681e+06	{(127,79),(250,79),(188,71),(224,69)}
0	1		3.92607e+06	{(71,78),(149,70)}
0	1		3.92607e+06	{(73,78),(109,70)}
0	1		3.92607e+06	{(74,78),(32,70)}
0	1		3.92607e+06	{(75,78),(65,70)}
0	1		3.8967e+06	{(76,78),(148,69)}
0	1		3.8967e+06	{(77,78),(31,69)}
0	1		3.8967e+06	{(78,78),(84,69)}
0	1		3.8967e+06	{(170,78),(108,69)}
0	1		3.8674e+06	{(169,78),(64,68)}
0	1		3.8674e+06	{(168,78),(30,68)}
0	1		3.8674e+06	{(167,78),(107,68)}
0	1		3.8674e+06	{(166,78),(147,68)}
0	1		3.83821e+06	{(117,78),(146,67)}
0	1		3.83821e+06	{(89,78),(83,67)}
0	1		3.83821e+06	{(70,78),(106,67)}
0	1		3.83821e+06	{(157,78),(29,67)}
0	1		3.80909e+06	{(158,78),(105,66)}
0	1		3.80909e+06	{(159,78),(63,66)}
0	1		3.80909e+06	{(160,78),(145,66)}
0	1		3.80909e+06	{(161,78),(28,66)}
0	1		3.78007e+06	{(162,78),(104,65)}
0	1		3.78007e+06	{(163,78),(27,65)}
0	1		3.78007e+06	{(90,78),(82,65)}
0	1		3.78007e+06	{(165,78),(144,65)}
0	1		3.75112e+06	{(97,78),(62,64)}

1	2		3.94883e+06	{(16,78)}
1	2		3.94883e+06	{(14,78)}
1	2		3.94883e+06	{(12,78)}
1	2		3.94883e+06	{(10,78)}
1	2		3.94883e+06	{(19,78)}
1	2		3.70133e+06	{(8,74)}
1	2		3.4587e+06	{(6,70)}
1	2		3.22093e+06	{(4,66)}
1	1		5.97606e+06	{(2,62)}
1	1		3.75112e+06	{(96,78),(26,64)}
1	1		3.75112e+06	{(95,78),(143,64)}
1	1		3.75112e+06	{(94,78),(103,64)}
1	1		3.72228e+06	{(93,78),(81,63)}
1	1		3.72228e+06	{(92,78),(25,63)}
1	1		3.72228e+06	{(91,78),(142,63)}
1	1		3.72228e+06	{(164,78),(102,63)}
1	1		3.69351e+06	{(55,78),(24,62)}
1	1		3.69351e+06	{(54,78),(61,62)}
1	1		3.69351e+06	{(53,78),(141,62)}
1	1		3.69351e+06	{(52,78),(101,62)}
1	1		3.66485e+06	{(51,78),(80,61)}
1	1		3.66485e+06	{(50,78),(100,61)}
1	1		3.66485e+06	{(49,78),(23,61)}
1	1		3.66485e+06	{(56,78),(140,61)}
1	1		3.63625e+06	{(47,78),(139,60)}
1	1		3.63625e+06	{(46,78),(99,60)}
1	1		3.63625e+06	{(45,78),(22,60)}
1	1		3.63625e+06	{(44,78),(60,60)}
1	1		3.60776e+06	{(43,78),(138,59)}
1	1		3.60776e+06	{(42,78),(21,59)}
1	1		3.60776e+06	{(41,78),(79,59)}
1	1		3.57934e+06	{(40,78),(98,58)}
1	1		3.57934e+06	{(48,78),(137,58)}
1	1		3.57934e+06	{(69,78),(20,58)}
1	1		3.95329e+06	{(72,78),(237,79),(251,79),(220,65)}
1	1		3.94444e+06	{(57,78),(235,79),(252,79),(219,64)}
1	1		3.94444e+06	{(58,78),(234,79),(205,79),(181,64)}
1	1		3.93561e+06	{(175,78),(214,79),(203,79),(218,63)}
1	1		3.93561e+06	{(174,78),(213,79),(202,79),(180,63)}
1	1		3.92679e+06	{(173,78),(212,79),(201,79),(217,62)}
1	1		3.92679e+06	{(172,78),(204,79),(200,79),(179,62)}
1	1		3.91799e+06	{(171,78),(211,79),(199,79),(216,61)}
1	1		3.92548e+06	{(39,77),(85,71)}
1	1		3.92548e+06	{(156,77),(110,71)}
1	1		3.92548e+06	{(116,77),(33,71)}
1	1		3.92548e+06	{(88,77),(150,71)}
1	1		3.92502e+06	{(155,76),(111,72)}
1	1		3.92502e+06	{(68,76),(34,72)}
1	1		3.92502e+06	{(115,76),(151,72)}
1	1		3.92502e+06	{(38,76),(66,72)}
1	1		3.92477e+06	{(154,75),(152,73)}
1	1		3.92477e+06	{(37,75),(86,73)}
1	1		3.92477e+06	{(114,75),(112,73)}
1	1		3.92477e+06	{(87,75),(35,73)}
1	1		3.92466e+06	{(36,74),(113,74)}
1	1		3.92466e+06	{(153,74),(67,74)}
