# Weak Scaling of rotating Sphere (Stokes, steady)

We investigate how efficiently resources can be allocated when the problem size per core is fixed. Gustafson's law states that, for an efficient implementation and a fixed runtime, the solvable problem size scales linearly with the available resources (e.g. twice the number of cores can solve a problem of twice the size, measured in degrees of freedom (DOF), in the same time). Note: adjusting the problem size during a run is not possible with our approach, so we use a fixed size per core instead.
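
For reference, Gustafson's law and the efficiency measure commonly used for weak-scaling runs can be written as follows. The symbols $s$ (serial fraction of the runtime), $N$ (number of cores), $N_0$ (smallest core count, used as reference) and $T(N)$ (measured wall-clock time) are notation introduced here for illustration and are not taken from the worksheet:

$$S(N) = s + (1-s)\,N, \qquad E_{weak}(N) = \frac{T(N_0)}{T(N)}$$

With a fixed problem size per core, an ideally scaling implementation keeps $T(N)$ constant, i.e. $E_{weak}(N) = 1$.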

In [None]:
//#r "./../../../../../public/src/L4-application/BoSSSpad/bin/Release/net5.0/BoSSSpad.dll"
#r "BoSSSpad.dll"
using System;
using ilPSP;
using ilPSP.Utils;
using BoSSS.Platform;
using BoSSS.Foundation;
using BoSSS.Foundation.XDG;
using BoSSS.Foundation.Grid;
using BoSSS.Solution;
using BoSSS.Application.XNSE_Solver;
using BoSSS.Application.BoSSSpad;
using BoSSS.Foundation.Grid.Classic;
using BoSSS.Foundation.IO;
using BoSSS.Solution.AdvancedSolvers;
using BoSSS.Solution.Control;
using BoSSS.Solution.XNSECommon;
using BoSSS.Solution.NSECommon;
using BoSSS.Application.XNSE_Solver.LoadBalancing;
using BoSSS.Solution.LevelSetTools;
using BoSSS.Solution.XdgTimestepping;

using static BoSSS.Application.BoSSSpad.BoSSSshell;
Init();

## Init Database, Client and Workflowmanager
Set the names of the database and of the tables to be written out. 
The names are generated from environment variables (build information provided by Jenkins).
Defaults are provided, so there is usually no need to change anything.

In [None]:
// Is used at Jenkins to generate individual names (for output .json)
string dbname = System.Environment.GetEnvironmentVariable("DATABASE_NAME");
string buildname = System.Environment.GetEnvironmentVariable("JOB_NAME");
//defaults
buildname = String.IsNullOrEmpty(buildname)? "Benchmark" : buildname;
//string thedate = $"{System.DateTime.Today.Day}-{System.DateTime.Today.Month}-{System.DateTime.Today.Year}";
dbname = String.IsNullOrEmpty(dbname)? "dbname" : dbname;
string table_name = String.Concat(buildname, "_", dbname);

Auxiliary data type that bundles one job setting (number of cores, polynomial degree, grid resolution). It is used to map grids onto control objects and control objects onto job settings.

In [None]:
struct Parameterz{
    public Parameterz(int _Cores, int _Poly, int _Res){
        Cores = _Cores;
        Poly = _Poly;
        Res = _Res;
    }
    public int Cores;
    public int Poly;
    public int Res;
}

Define the job settings. Each <code>Parameterz</code> instance marks exactly one job setting (injective mapping). The settings lead to approximately the same problem size per core:

| degree | DOF/cell | cores | Res | DOF/core |
|----|:----:|:-------------:|:------:|:------:|
| k2 | 34  | 4,32,256     | 11*m | 11,314 |
| k2 | 34  | 8,64,512     | 14*m | 11,662 |
| k2 | 34  | 16,128,1024  | 17*m | 10,440 |
| k3 | 70  | 4,32,256     | 9*m  | 12,758 |
| k3 | 70  | 8,64,512     | 11*m | 11,646 |
| k3 | 70  | 16,128,1024  | 14*m | 12,005 |
| k4 | 125 | 4,32,256     | 7*m  | 10,719 |
| k4 | 125 | 8,64,512     | 9*m  | 11,391 |
| k4 | 125 | 16,128,1024  | 11*m | 10,398 |

(Res is the number of cells per spatial direction of the domain; m = 1, 2, 4 corresponds to the three core counts listed in each row)
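
As a sketch, the DOF/core column can be reproduced by assuming $Res^3$ cells and the smallest core count of each row (an assumption inferred from the tabulated numbers, not stated in the worksheet). For the first row:

$$\frac{\text{DOF}}{\text{core}} = \frac{\text{DOF/cell} \cdot Res^3}{\text{cores}} = \frac{34 \cdot 11^3}{4} = \frac{45254}{4} \approx 11314$$

Multiplying Res by $m$ and the core count by $m^3$ (4, 32, 256) leaves this ratio unchanged. Note that the grid generated further below is stretched by a factor of 3 in x and therefore contains $3 \cdot Res^3$ cells; with that cell count the DOF per core would be three times larger for the same number of cores.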

A Cartesian grid is used whose cell counts in x/y/z have the ratio 3/1/1. This setting was chosen because of:
- the nonlinear increase in memory per core (more than 16 nodes had to be allocated: 384*16 GB = 6,144 GB!)
- the best match of DOF/core across k2, k3 and k4

Note that 10k DOF is the limit for the Schwarz block size, so there is always 1 Schwarz block at each level per core.

In [None]:
// set parameterz
var Parameterz = new List<Parameterz>();

 // 8 cores
 Parameterz.Add(new Parameterz(8,2,11));
 Parameterz.Add(new Parameterz(8,3,9));
 Parameterz.Add(new Parameterz(8,4,7));
/*
 // 16 cores
 Parameterz.Add(new Parameterz(16,2,18));
 Parameterz.Add(new Parameterz(16,3,14));
 Parameterz.Add(new Parameterz(16,4,11));
 
 // 32 cores
 Parameterz.Add(new Parameterz(32,2,22));
 Parameterz.Add(new Parameterz(32,3,18));
 Parameterz.Add(new Parameterz(32,4,14));


// 64 cores
 Parameterz.Add(new Parameterz(64,2,28));
 Parameterz.Add(new Parameterz(64,3,22));
 Parameterz.Add(new Parameterz(64,4,18));
 */

// problematic, high memory consumption !!!

 // 128 cores
 Parameterz.Add(new Parameterz(128,2,36));
 Parameterz.Add(new Parameterz(128,3,28));
 Parameterz.Add(new Parameterz(128,4,22));

/*
 // 256 cores
 Parameterz.Add(new Parameterz(256,2,44));
 Parameterz.Add(new Parameterz(256,3,36));
 Parameterz.Add(new Parameterz(256,4,28));
*/

 // 512 cores
 //Parameterz.Add(new Parameterz(512,2,56));
 //Parameterz.Add(new Parameterz(512,3,44));
 //Parameterz.Add(new Parameterz(512,4,36));
// memory per core, passed to --mem-per-cpu (Slurm reads a plain number as megabytes)
int MemoryPerCore = 2000;

Define solver parameters

In [None]:
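bool useAMR = false;             // no adaptive mesh refinement
bool useLoadBal = true;          // dynamic load balancing on/off
int NoOfTimeSteps = 1;           // only used if Steady is false
bool Steady = false;             // false: implicit Euler in transient mode (see GenXNSECtrl below)
bool IncludeConvection = false;  // false: Stokes, true: Navier-Stokes
var Gshape = Shape.Sphere;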

List the available execution queues

In [None]:
ExecutionQueues

Client setup and <code>\#SBATCH</code> configuration:
- <code>-N</code> (number of nodes),
- <code>-C</code> (processor architecture),
- <code>--mem-per-cpu</code> (allocated memory per core).
<br> Note: <code>--mem-per-cpu</code> must be set, otherwise the job is not accepted by the Lichtenberg scheduler.

In [None]:
var myBatch = (SlurmClient)GetDefaultQueue();
var AddSbatchCmds = new List<string>();
AddSbatchCmds.AddRange(new string[]{"#SBATCH -C avx512", "#SBATCH --mem-per-cpu="+MemoryPerCore});
myBatch.AdditionalBatchCommands = AddSbatchCmds.ToArray();
myBatch.AdditionalBatchCommands

In [None]:
string WFlowName = table_name;
BoSSS.Application.BoSSSpad.BoSSSshell.WorkflowMgm.Init(WFlowName);
BoSSS.Application.BoSSSpad.BoSSSshell.WorkflowMgm.SetNameBasedSessionJobControlCorrelation();

Set database

In [None]:
var myDB = BoSSS.Application.BoSSSpad.BoSSSshell.WorkflowMgm.DefaultDatabase; myDB

## Generate Grid
- Domain (-2,4)x(-1,1)x(-1,1)
- equidistant Cartesian cells; the cell counts in x/y/z have the ratio 3/1/1, and the resolution is chosen according to <code>Parameterz.Res</code>

In [None]:
static class Utils {
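    // number of polynomial basis functions of total degree <= p in 3D,
    // i.e. DOF per cell for one scalar field: (p+1)(p+2)(p+3)/6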
    public static int Np(int p) {
        return (p*p*p + 6*p*p + 11*p + 6)/6;
    }    
}
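
As a sketch of how the DOF/cell column of the table above can be obtained from `Np`: assuming three velocity components of degree $k$ and a pressure of degree $k-1$ (an assumption that reproduces the tabulated values; the helper `DofPerCell` is hypothetical and not part of BoSSS):

```csharp
using System;

// Number of polynomial basis functions of total degree <= p in 3D
// (same formula as Utils.Np, written in factored form).
int Np(int p) => (p + 1) * (p + 2) * (p + 3) / 6;

// Hypothetical helper: 3 velocity components of degree k
// plus one pressure field of degree k-1 (assumed discretization).
int DofPerCell(int k) => 3 * Np(k) + Np(k - 1);

foreach (int k in new[] { 2, 3, 4 }) {
    Console.WriteLine($"k{k}: {DofPerCell(k)} DOF/cell");
}
```

For k = 2, 3, 4 this yields 34, 70 and 125 DOF per cell.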

In [None]:
double xMax = 4.0, yMax = 1.0, zMax = 1.0;
double xMin = -2.0, yMin = -1.0,zMin = -1.0;

Generate all grids defined by <code>Parameterz.Res</code>. If a matching grid already exists in the database, it is reused instead of being created again.

In [None]:
var Grids = new Dictionary<int, IGridInfo>();
foreach(var P in Parameterz){
    int Res = P.Res;
    if(Grids.ContainsKey(Res))
        continue; // a grid for this resolution is already registered
    // number of cells in x relative to y and z, here 6/2 = 3
    int Stretching = (int)Math.Floor(Math.Abs(xMax-xMin)/Math.Abs(yMax-yMin));
    //int Stretching = 1;
    var _xNodes = GenericBlas.Linspace(xMin, xMax, Stretching*Res + 1);
    var _yNodes = GenericBlas.Linspace(yMin, yMax, Res + 1);
    var _zNodes = GenericBlas.Linspace(zMin, zMax, Res + 1);

    GridCommons grd;
    string gname = "RotBenchmarkGrid";
    
    var tmp = new List<IGridInfo>();
    foreach(var grid in myDB.Grids){
        try{
            bool IsMatch = grid.Name.Equals(gname)&&grid.NumberOfCells==(_xNodes.Length-1)*(_yNodes.Length-1)*(_zNodes.Length-1);
            if(IsMatch) tmp.Add(grid);
        }
        catch(Exception ex) {
            Console.WriteLine(ex.Message);
        }
    }
    //var tmp = myDB.Grids.Where(g=>g.Name.Equals(gname)&&g.NumberOfCells==Res*Res*Res); // this leads to exception in case of broken grids
    if(tmp.Count()>=1){
        Console.WriteLine("Grid found: "+tmp.Pick(0).Name);
        Grids.Add(Res,tmp.Pick(0));
        continue;
    }
    
    grd = Grid3D.Cartesian3DGrid(_xNodes, _yNodes, _zNodes);
    grd.Name = gname;
    //grd.AddPredefinedPartitioning("debug", MakeDebugPart);

    grd.EdgeTagNames.Add(1, "Velocity_inlet");
    grd.EdgeTagNames.Add(2, "Wall");
    grd.EdgeTagNames.Add(3, "Pressure_Outlet");

    // velocity inlet at x = xMin (tag 1), pressure outlet on all other boundaries (tag 3);
    // the "Wall" tag (2) is defined but not assigned
    grd.DefineEdgeTags(delegate (double[] X) {
        double x = X[0];
        if(Math.Abs(x-xMin)<1E-8)
            return 1;
        else
            return 3;
    });
    myDB.SaveGrid(ref grd,false);
    Grids.Add(Res,grd);
} Grids.Keys.ToList()

## Generate Control object

### governing equations
- incompressible steady Stokes:
<br>$\nabla p - \mu_A \Delta \vec{u} = \vec{f} \quad \text{in} \ \ \Omega_F$
<br>$\nabla \cdot \vec{u} = 0 \quad \text{in} \ \ \Omega_F$
- with boundary conditions:
<br>$\vec{u}(\vec{x}) = \vec{u}_{Inlet} \quad \text{on} \ \ \Gamma_{Inlet} = \{ \vec{x} \in \partial \Omega_F \ | \ x=-2 \}$
<br>$\left( p \mathbf{I} - \mu_A \nabla \vec{u} \right) \vec{n}_{\Gamma_{pOut}} = 0 \quad \text{on} \ \ \Gamma_{pOut} = \partial \Omega \backslash \Gamma_{Inlet}$
<br>$\vec{u}(\vec{x}) = \boldsymbol{\omega} \times \vec{r} \quad \text{on} \ \ \mathcal{J} = \partial \Omega_S \cap \partial \Omega_F$

- inlet velocity $u_{Inlet}=\frac{Re \, \mu_A}{\rho_A \, d_{hyd}}$
- angular velocity of the rotating sphere $|\boldsymbol{\omega}|=\frac{Re \, \mu_A}{\rho_A \, d_{hyd} \cdot 1\,\mathrm{m}}$
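
As a worked example, with the values set in the control object of this worksheet ($Re=50$, $\mu_A=1$, $\rho_A=1$, sphere radius $0.3001$, hence $d_{hyd}=2 \cdot 0.3001=0.6002$), the two formulas above evaluate approximately to:

$$u_{Inlet} = \frac{50 \cdot 1}{1 \cdot 0.6002} \approx 83.31, \qquad |\boldsymbol{\omega}| = \frac{50 \cdot 1}{1 \cdot 0.6002} \approx 83.31$$

Units are omitted here, since the worksheet passes all parameters as plain numbers.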

### Notes:
- for simplicity we stick to a steady, linear problem, so we do not have to worry about time stepping or the nonlinear solver at this point
- why a sphere? We do not have to enforce continuity of the level set. Shapes with sharp edges would additionally demand a higher local resolution, and we skip AMR at this point

In [None]:
int SpaceDim = 3; 

In [None]:
Func<IGridInfo, int, XNSE_Control> GenXNSECtrl = delegate(IGridInfo grd, int k){
    XNSE_Control C = new XNSE_Control();
    // basic database options
    // ======================
    C.SetDatabase(myDB);
    C.savetodb = true;
    int J  = grd.NumberOfCells;
    C.SessionName = string.Format("J{0}_k{1}_t{2}", J, k,NoOfTimeSteps);
    if(IncludeConvection){
        C.SessionName += "_NSE";
        C.Tags.Add("NSE");
    } else {
        C.SessionName += "_Stokes";
        C.Tags.Add("Stokes");
    }
    C.Tags.Add(SpaceDim + "D");
    if(Steady)C.Tags.Add("steady");
    else C.Tags.Add("transient");
    C.Tags.Add("reortho_Iter2_sameRes");

    // DG degrees
    // ==========
    C.SetFieldOptions(k, Math.Max(k, 2));
    C.saveperiod = 1;
    //C.TracingNamespaces = "*";

    C.GridGuid = grd.ID;
    C.GridPartType = GridPartType.clusterHilbert;
    C.DynamicLoadbalancing_ClassifierType = ClassifierType.CutCells;
    C.DynamicLoadBalancing_On = useLoadBal;
    C.DynamicLoadBalancing_RedistributeAtStartup = true;
    C.DynamicLoadBalancing_Period = 1;
    C.DynamicLoadBalancing_ImbalanceThreshold = 0.1;

    // Physical Parameters
    // ===================
    const double rhoA = 1;
    const double Re = 50;
    double muA = 1;
    
    double partRad = 0.3001;
    double anglev = Re*muA/rhoA/(2*partRad);
    //double anglev = 0.0;
    double d_hyd = 2*partRad;
    double VelocityIn = Re*muA/rhoA/d_hyd;
    double[] pos = new double[SpaceDim];

    C.PhysicalParameters.IncludeConvection = IncludeConvection;
    C.PhysicalParameters.Material = true;
    C.PhysicalParameters.rho_A = rhoA;
    C.PhysicalParameters.mu_A = muA;

    C.Rigidbody.SetParameters(pos,anglev,partRad,SpaceDim);
    C.Rigidbody.SpecifyShape(Gshape);
    C.Rigidbody.SetRotationAxis("x");

    C.AddInitialValue(VariableNames.LevelSetCGidx(0), new Formula("X => -1"));
    C.UseImmersedBoundary = true;
    
    C.AddInitialValue("Pressure", new Formula(@"X => 0"));
    C.AddBoundaryValue("Pressure_Outlet");
    C.AddBoundaryValue("Velocity_inlet","VelocityX",new Formula($"(X) => {VelocityIn}"));
    //C.AddInitialValue("VelocityX", new Formula($"(X,t) => {VelocityIn}"));

    C.CutCellQuadratureType = BoSSS.Foundation.XDG.XQuadFactoryHelper.MomentFittingVariants.Saye;
    C.UseSchurBlockPrec = true;
    C.AgglomerationThreshold = 0.1;
    C.AdvancedDiscretizationOptions.ViscosityMode = ViscosityMode.FullySymmetric;
    C.Option_LevelSetEvolution2 = LevelSetEvolution.Prescribed;
    C.Option_LevelSetEvolution = LevelSetEvolution.None;
    C.Timestepper_LevelSetHandling = LevelSetHandling.None;
    C.LinearSolver.NoOfMultigridLevels = 4;
    C.LinearSolver.ConvergenceCriterion = 1E-6;
    C.LinearSolver.MaxSolverIterations = 500;
    C.LinearSolver.MaxKrylovDim = 50;
    C.LinearSolver.TargetBlockSize = 1000;
    C.LinearSolver.verbose = true;
    C.LinearSolver.SolverCode = LinearSolverCode.exp_Kcycle_schwarz;
    C.NonLinearSolver.SolverCode = NonLinearSolverCode.Newton;
    C.NonLinearSolver.ConvergenceCriterion = 1E-6;
    C.NonLinearSolver.MaxSolverIterations = 10;
    C.NonLinearSolver.verbose = true;

    C.AdaptiveMeshRefinement = useAMR;
    if (useAMR) {
        C.SetMaximalRefinementLevel(1);
        C.AMR_startUpSweeps = 0;
    }

    // Timestepping
    // ============
    double dt = -1;
    if(Steady){
        C.TimesteppingMode = AppControl._TimesteppingMode.Steady;
        dt = 1000;
        C.NoOfTimesteps = 1;
    } else {
        C.TimesteppingMode = AppControl._TimesteppingMode.Transient;        
        dt = 0.1;        
        C.NoOfTimesteps = NoOfTimeSteps;
    }
    C.TimeSteppingScheme = TimeSteppingScheme.ImplicitEuler;
    C.dtFixed = dt;
    return C;
};

In [None]:
var controls = new Dictionary<Parameterz,XNSE_Control>();
foreach(var P in Parameterz){
    int k = P.Poly;
    Grids.TryGetValue(P.Res,out IGridInfo grd);
    controls.Add(P,GenXNSECtrl(grd,k));
} controls.Values.Select(s=>s.SessionName)

## Submit & Run Jobs at Server

Map the control objects to job configurations, in particular to the number of cores.
<br>Memory consumption was severe, so the number of nodes had to be adjusted accordingly (384 GB per node at Lichtenberg 2). As a result, not all cores of a node were in use (96 cores per node at Lichtenberg 2). <code>NodeRegression</code> is a regression of memory over cores that estimates the number of nodes to allocate for a run. Hopefully this will no longer be necessary in the future ...

In [None]:
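static Action<int,BatchProcessorClient> NodeRegression =  delegate (int cores, BatchProcessorClient thisBatch) {
    // empirical fit: number of nodes needed to cover the memory demand of 'cores' processes
    int NoOfNodes = (int)Math.Ceiling(0.75*Math.Pow(cores,0.44));
    var slurm = (SlurmClient)thisBatch;
    List<string> Cmdtmp = slurm.AdditionalBatchCommands.ToList();
    // drop the node count of a previous job, otherwise the -N lines accumulate across jobs
    Cmdtmp.RemoveAll(s => s.StartsWith("#SBATCH -N "));
    Cmdtmp.Add($"#SBATCH -N {NoOfNodes}");
    slurm.AdditionalBatchCommands = Cmdtmp.ToArray();
};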
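
To see what this regression yields, the following sketch evaluates the same formula for the core counts that appear in this worksheet. `EstimateNodes` is a hypothetical stand-alone helper introduced here for illustration; it only mirrors the expression inside `NodeRegression` and does not touch the batch client:

```csharp
using System;

// Hypothetical helper mirroring the expression used in NodeRegression.
int EstimateNodes(int cores) => (int)Math.Ceiling(0.75 * Math.Pow(cores, 0.44));

foreach (int cores in new[] { 8, 16, 32, 64, 128, 256, 512 }) {
    int nodes = EstimateNodes(cores);
    // average number of cores in use per node (96 are available per node)
    double coresPerNode = (double)cores / nodes;
    Console.WriteLine($"{cores,4} cores -> {nodes,2} nodes ({coresPerNode:F1} cores/node)");
}
```

For 8 cores this gives 2 nodes and for 128 cores 7 nodes, so only a fraction of the 96 cores per node is occupied, in line with the memory limitation described above.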

In [None]:
controls.Select(s=>s.Value.SessionName)

In [None]:
int iSweep=0;
foreach(var ctrl in controls){
    try{
        int cores = ctrl.Key.Cores;
        var ctrlobj = ctrl.Value;
        string sessname = ctrlobj.SessionName;
        // append core count, Reynolds number and viscosity to the session name
        ctrlobj.SessionName = sessname + "_c"+cores+"_Re50_"+"mue_"+ctrlobj.PhysicalParameters.mu_A;
        var aJob   = new Job("rotSphereInlet_"+Gshape+ctrlobj.SessionName,typeof(XNSE));
        aJob.SetControlObject(ctrlobj);
        aJob.NumberOfMPIProcs         = cores;
        aJob.ExecutionTime            = "3:00:00";
        aJob.UseComputeNodesExclusive = true;

        // set the number of nodes for this job before it is submitted
        if(myBatch is SlurmClient) NodeRegression.Invoke(cores,myBatch);

        aJob.Activate(myBatch);
        iSweep++;
    } catch (Exception ex){
        Console.WriteLine(ex.Message);
    }
}

Wait until all jobs have terminated. The job status is checked in 60 s intervals and printed every 15 min; the wait times out after 3 hours.

In [None]:
BoSSS.Application.BoSSSpad.BoSSSshell.WorkflowMgm.BlockUntilAllJobsTerminate(3*3600,60);