# Memory profiling of MPI-parallel runs

## Initialization

To execute the individual solver runs,
we employ the mini batch processor,
which runs the calculations on the local machine.
We also have to initialize the workflow management system and create 
a database.

Note: 
1. This tutorial can be found in the source code repository as `convStudy.ipynb`. 
   One can directly load this into Jupyter to interactively work with the following code examples.
2. **In the following line, the reference to `BoSSSpad.dll` is required**. 
   You must either point `#r "BoSSSpad.dll"` to a path that is appropriate for your computer
   (e.g. `C:\Program Files (x86)\FDY\BoSSS\bin\Release\net5.0\BoSSSpad.dll` if you installed the binary distribution),
   or, if you are working with the source code, you must compile `BoSSSpad` and place it next to this worksheet file
   (from the original location in the repository, you can use the scripts `getbossspad.sh`, resp. `getbossspad.bat`).

In [None]:
#r "BoSSSpad.dll"
//#r "C:\Users\kummer\Documents\BoSSS-kdev2\public\src\L4-application\BoSSSpad\bin\Debug\net6.0\BoSSSpad.dll"
using static BoSSS.Application.BoSSSpad.BoSSSshell;
Init();

In [None]:
using System;
using System.Collections.Generic;
using System.Linq;
using ilPSP;
using ilPSP.Utils;
using BoSSS.Platform;
using BoSSS.Platform.Utils.Geom;
using BoSSS.Foundation;
using BoSSS.Foundation.Grid;
using BoSSS.Foundation.Grid.Classic;
using BoSSS.Foundation.IO;
using BoSSS.Solution;
using BoSSS.Solution.Control;
using BoSSS.Solution.GridImport;
using BoSSS.Solution.Statistic;
using BoSSS.Solution.Utils;
using BoSSS.Solution.Gnuplot;
using BoSSS.Application.BoSSSpad;
using BoSSS.Application.XNSE_Solver;
using BoSSS.Application.GridGen;

In [None]:
BoSSSshell.WorkflowMgm.Init("memprofile");
wmg.SetNameBasedSessionJobControlCorrelation();

In [None]:
GetDefaultQueue()

## Memory instrumentation of grid generation

### Perform runs

In [None]:
int[] Resolutions_3D = new int[] { 256 };
int[] NoOfProcs = new int[] { 16, 32, 64 };
var ggcS = new List<(GridGenControl C, int MPIsize)>();

foreach(int MPIzs in NoOfProcs) {
for(int cnt = 0; cnt < Resolutions_3D.Length; cnt++) {
    int Res = Resolutions_3D[cnt];    
    
    double[] _xNodes = GenericBlas.Linspace(-1, +1, Res + 1);
    double[] _yNodes = GenericBlas.Linspace(-1, +1, Res + 1);
    double[] _zNodes = GenericBlas.Linspace(-1, +1, Res + 1);
    int J = (_xNodes.Length - 1)*(_yNodes.Length - 1)*(_zNodes.Length - 1);
    
    string GridName = wmg.CurrentProject + "-MeshInit_J" + J + "_Sz" + MPIzs;
    
    {
        //int NoOfProcs = (int) Math.Min(182, Math.Max(1, Math.Ceiling(J/200000.0)));
        Console.WriteLine("Must create: " + GridName + " with " + MPIzs + " processors.");
        
        var C = new GridGenControl();
        ggcS.Add((C, MPIzs));
        C.SetDatabase(wmg.DefaultDatabase);
        
        C.GridName = GridName;
        
        // ***********************************************************
        C.MemoryInstrumentationLevel = ilPSP.Tracing.MemoryInstrumentationLevel.GcAndPrivateMemory;
        // ***********************************************************

        C.GridBlocks = new GridGenControl.MeshBlock[] {
            new GridGenControl.Cartesian3D() {
                xNodes = _xNodes,
                yNodes = _yNodes,
                zNodes = _zNodes
            }
        };

        C.BoundaryRegions.Add((
            new BoundingBox(new double[] { -1-1e-8, -2, -2 }, new double[] { -1+1e-8, +2, +2 }), 
            "wall_left"));
        C.BoundaryRegions.Add((
            new BoundingBox(new double[] { +1-1e-8, -2, -2 }, new double[] { +1+1e-8, +2, +2 }), 
            "wall_right"));
        C.BoundaryRegions.Add((
            new BoundingBox(new double[] { -2, -1-1e-8, -2 }, new double[] { +2, -1+1e-8, +2 }), 
            "wall_front"));
        C.BoundaryRegions.Add((
            new BoundingBox(new double[] { -2, +1-1e-8, -2 }, new double[] { +2, +1+1e-8, +2 }), 
            "wall_back"));
        C.BoundaryRegions.Add((
            new BoundingBox(new double[] { -2, -2, -1-1e-8 }, new double[] { +2, +2, -1+1e-8 }), 
            "wall_top"));
        C.BoundaryRegions.Add((
            new BoundingBox(new double[] { -2, -2, +1-1e-8 }, new double[] { +2, +2, +1+1e-8 }), 
            "wall_bottom"));
        
        
        C.SessionName = "GridCreation-" + GridName;
    } 
}
}
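
The thin bounding boxes used for the boundary regions above can be illustrated with a small stand-alone sketch. It assumes that a region name is assigned to every boundary face whose center lies inside the respective box; this is an assumption about how `BoundaryRegions` is evaluated, and the helper `Contains` is hypothetical, not part of the BoSSS API.

```csharp
using System;

// Hypothetical helper (not BoSSS API): is point p inside the axis-aligned box [min, max]?
static bool Contains(double[] min, double[] max, double[] p) {
    for (int d = 0; d < p.Length; d++) {
        if (p[d] < min[d] || p[d] > max[d])
            return false;
    }
    return true;
}

// Same corner points as the box for "wall_left": a slab of thickness 2e-8 around x = -1.
double[] boxMin = { -1 - 1e-8, -2, -2 };
double[] boxMax = { -1 + 1e-8, +2, +2 };

// Center of a boundary face on the plane x = -1 ...
double[] faceOnLeftWall = { -1.0, 0.25, -0.5 };
// ... and of a boundary face on the plane y = -1, away from x = -1.
double[] faceOnFrontWall = { 0.25, -1.0, -0.5 };

bool onLeft = Contains(boxMin, boxMax, faceOnLeftWall);
bool onFront = Contains(boxMin, boxMax, faceOnFrontWall);
Console.WriteLine(onLeft);  // prints True
Console.WriteLine(onFront); // prints False
```

The tolerance of `1e-8` makes the box robust against round-off in the node coordinates, while the generous extent of `-2` to `+2` in the other two directions covers the entire side of the cube.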

In [None]:
ggcS

In [None]:
foreach(var tt in ggcS) {
    Console.WriteLine(" Submitting: " + tt.C.SessionName); 
    var j = tt.C.CreateJob();
    j.RetryCount = 1;
    j.NumberOfMPIProcs = tt.MPIsize;
    j.Activate();
}
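
To get a feeling for the problem size behind these jobs, the following stand-alone sketch computes the total number of cells and the number of cells per MPI rank for the settings used above. It assumes that the cells are distributed evenly over the ranks, which the actual grid partitioning may only approximate.

```csharp
using System;

int res = 256;                            // cells per direction, as in Resolutions_3D
long totalCells = (long)res * res * res;  // corresponds to J in the grid generation cell
int[] mpiSizes = { 16, 32, 64 };          // as in NoOfProcs

Console.WriteLine("Total cells: " + totalCells); // prints Total cells: 16777216
long[] cellsPerRank = new long[mpiSizes.Length];
for (int i = 0; i < mpiSizes.Length; i++) {
    // Assumption: perfectly even distribution of cells over the MPI ranks.
    cellsPerRank[i] = totalCells / mpiSizes[i];
    Console.WriteLine(mpiSizes[i] + " ranks: " + cellsPerRank[i] + " cells per rank");
}
```

With roughly 16.8 million cells in total, each rank holds about 1.05 million, 0.52 million, and 0.26 million cells for 16, 32, and 64 ranks, respectively.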

In [None]:
wmg.BlockUntilAllJobsTerminate(7200); // wait at most two hours
wmg.AllJobs

Asserting success:

In [None]:
var NoSuccess = wmg.AllJobs.Values.Where(job => job.Status != JobStatus.FinishedSuccessful).ToArray();
NUnit.Framework.Assert.Zero(NoSuccess.Count(), "Some Jobs Failed");

### Analysis and plot

In [None]:
var plot = wmg.Sessions.GetMPItotalMemory();

As the following plot shows, the memory scaling is far from perfect at this point:

In [None]:
plot.PlotNow()

Maximum of each trace:

In [None]:
var Maxima = plot.dataGroups.Select(grp => (grp.Name, grp.Values.Max()));
Maxima
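
One way to quantify how far the memory scaling is from perfect is to relate the peak of each trace to the peak of the run with the fewest ranks. The sketch below assumes that each trace holds the memory summed over all MPI ranks, so that perfect scaling would keep the peak constant when ranks are added. The numbers are placeholders chosen for illustration, not measured results, and the names `peaks` and `efficiency` are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Placeholder values for illustration only; these are NOT measured results.
var peaks = new (int MPIsize, double PeakTotalMem)[] {
    (16, 100.0),
    (32, 125.0),
    (64, 200.0)
};

// Reference: the run with the fewest MPI ranks.
double reference = peaks.OrderBy(p => p.MPIsize).First().PeakTotalMem;

// Efficiency 1.0 means the total memory did not grow; smaller values indicate overhead.
var efficiency = peaks
    .Select(p => (p.MPIsize, Eff: reference / p.PeakTotalMem))
    .ToArray();

foreach (var e in efficiency) {
    string eff = e.Eff.ToString("F2", CultureInfo.InvariantCulture);
    Console.WriteLine(e.MPIsize + " ranks: efficiency " + eff);
}
```

In the worksheet, the placeholder array could be filled from `Maxima` instead, pairing each trace name with its MPI size.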

In [None]:
NUnit.Framework.Assert.Less(Maxima.ElementAt(0).Item2, 100000.0);
NUnit.Framework.Assert.Less(Maxima.ElementAt(1).Item2, 100000.0);
NUnit.Framework.Assert.Less(Maxima.ElementAt(2).Item2, 200000.0);

#### Reporting of largest allocators

In [None]:
wmg.Sessions

In [None]:
wmg.Sessions.Single(sess => sess.Name.EndsWith("Sz16")).ReportLargestAllocators().Take(5)

In [None]:
wmg.Sessions.Single(sess => sess.Name.EndsWith("Sz32")).ReportLargestAllocators().Take(5)

In [None]:
wmg.Sessions.Single(sess => sess.Name.EndsWith("Sz64")).ReportLargestAllocators().Take(5)

#### Reporting of differences/imbalance between different runs

In [None]:
wmg.Sessions.ReportLargestAllocatorImbalance().Take(5)
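
To illustrate what an imbalance between runs can mean, the following stand-alone sketch ranks allocators by the spread between their largest and smallest memory footprint across runs. This is only one plausible notion of imbalance and not necessarily what `ReportLargestAllocatorImbalance` computes; the allocator names and numbers are placeholders, not measured results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder data for illustration only (allocator name -> memory per run); NOT measured results.
var allocations = new Dictionary<string, double[]> {
    { "methodA", new[] { 10.0, 11.0, 12.0 } },
    { "methodB", new[] { 5.0, 20.0, 40.0 } },
    { "methodC", new[] { 30.0, 30.0, 30.0 } }
};

// Imbalance here: spread between the largest and the smallest value across runs.
var ranking = allocations
    .Select(kv => (Name: kv.Key, Imbalance: kv.Value.Max() - kv.Value.Min()))
    .OrderByDescending(t => t.Imbalance)
    .ToArray();

foreach (var r in ranking)
    Console.WriteLine(r.Name + ": " + r.Imbalance);
```

An allocator with a large spread, such as `methodB` in this made-up data, is a candidate for code that does not scale with the number of MPI ranks.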