# July_2nd Meeting

## Main bottleneck: assembleλmatrix

I rewrote this part
```
B = -(F[e]' \ Fbar[vrng,λrng])
```

to 

```
B = similar(Fbar[vrng,λrng])
for lb in λrng
    B[:,lb-λrng[1]+1] = F[e]' \ Fbar[vrng,lb]
end
```


The loop is more memory-efficient than my previous formulation, and a `for` loop makes it possible to use the `@threads` macro. However, the left- and right-hand sides of the assignment have different data types, so I could not use `@threads` or `.=` to assign the results for a speedup:

```
typeof(Fbar[vrng, lb]) = SparseVector{Float64,Int64}
size(Fbar[vrng, lb]) = (289,)

typeof((F[e])' \ Fbar[vrng, lb]) = SuiteSparse.CHOLMOD.Sparse{Float64}
size((F[e])' \ Fbar[vrng, lb]) = (289, 1)

typeof(B[:, (lb - λrng[1]) + 1]) = SparseVector{Float64,Int64}
size(B[:, (lb - λrng[1]) + 1]) = (289,)
```
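
One possible way around the type mismatch, sketched under assumptions: convert each right-hand-side column to a dense `Vector` before the solve, so the result is a plain `Vector{Float64}` that can be written into a preallocated dense buffer with `.=`. The matrix `A`, the block `Fbar` and the sizes below are made-up stand-ins, not the real `F[e]` and `Fbar[vrng, λrng]`. Whether a dense `B` is acceptable memory-wise, and whether the CHOLMOD solve is safe under `@threads`, is not checked here.

```julia
using LinearAlgebra, SparseArrays

n = 50
# Stand-in SPD matrix (tridiagonal, diagonally dominant); not the real local operator.
A = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(4.0, n), 1 => fill(-1.0, n - 1))
F = cholesky(Symmetric(A))      # sparse Cholesky factor (CHOLMOD)

# Stand-in sparse right-hand-side block with 8 columns.
Fbar = sprandn(n, 8, 0.2)

B = zeros(n, size(Fbar, 2))     # dense buffer for the solutions
for j in 1:size(Fbar, 2)
    rhs = Vector(Fbar[:, j])    # SparseVector -> dense Vector
    B[:, j] .= F \ rhs          # dense solve returns a Vector{Float64}, so .= works
end
```

With dense right-hand sides both sides of the assignment are ordinary arrays, at the cost of giving up the sparse storage of `B`.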



### For a single solve
```
display(@benchmark B = $F[$e]' \ $Fbar[$vrng,$λrng]) 
```

```
lvl = 1
N: 128
BenchmarkTools.Trial:
  memory estimate:  540.31 KiB
  allocs estimate:  57
  --------------
  minimum time:     354.901 μs (0.00% GC)
  median time:      477.801 μs (0.00% GC)
  mean time:        633.214 μs (12.71% GC)
  maximum time:     135.778 ms (57.39% GC)
  --------------
  samples:          7858
  evals/sample:     1
 ```
 
 
### For the assembleλmatrix() function
 ```
 rs2 = @benchmark assembleλmatrix($FToλstarts, $vstarts, $EToF, $FToB, $locfactors, $D, $FbarT)
 display(rs2)
 
 ```
 
 ```
lvl = 1
N: 128
Time for direct solve in forming λ: 0.1549990177154541
Time elapsed (assembleλmatrix) for lvl 1 = 0.44099998474121094
BenchmarkTools.Trial: 
  memory estimate:  268.08 MiB
  allocs estimate:  2234771
  --------------
  minimum time:     275.601 ms (0.00% GC)
  median time:      320.938 ms (0.00% GC)
  mean time:        329.361 ms (6.92% GC)
  maximum time:     383.987 ms (15.51% GC)
  --------------
  samples:          16
  evals/sample:     1
 ```
 
 
 ### The actual size of the matrix that we formed
 
 ```
 println(Base.summarysize(B))    # This is for lvl=1, N:128
 3233704    
 ```
 
The unit here is bytes, so `B` takes about 3.23 MB. Apparently, the function allocates far more memory than the result actually needs.
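
To put the two numbers side by side, here is a small arithmetic sketch that uses only the values reported above (the 268.08 MiB memory estimate and the 3233704 bytes from `Base.summarysize(B)`). The variable names are just for illustration.

```julia
allocated_bytes = 268.08 * 2^20   # @benchmark memory estimate, MiB -> bytes
result_bytes = 3233704            # Base.summarysize(B)

result_mb = result_bytes / 10^6   # decimal megabytes
result_mib = result_bytes / 2^20  # binary mebibytes
ratio = allocated_bytes / result_bytes

println("B: ", round(result_mb, digits = 2), " MB = ", round(result_mib, digits = 2), " MiB")
println("allocated / stored ≈ ", round(ratio, digits = 1))
```

The ratio comes out at roughly 87, i.e. the function allocates on the order of 87 times the size of the matrix it returns.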
 

### The size of input variables

```
lvl = 1
N: 128
Time for direct solve in forming λ: 0.3900001049041748
Base.summarysize(FToλstarts) = 1200
Base.summarysize(vstarts) = 560
Base.summarysize(EToF) = 2088
Base.summarysize(FToB) = 1192
Base.summarysize(locfactors) = 1064
Base.summarysize(D) = 15272
Base.summarysize(FbarT) = 452776
Base.summarysize(B) = 3233704
Time elapsed (assembleλmatrix) for lvl 1 = 0.619999885559082
3233704
BenchmarkTools.Trial:
  memory estimate:  268.08 MiB
  allocs estimate:  2234771
  --------------
  minimum time:     270.548 ms (0.00% GC)
  median time:      320.611 ms (0.00% GC)
  mean time:        344.188 ms (8.69% GC)
  maximum time:     468.399 ms (19.59% GC)
  --------------
  samples:          15
  evals/sample:     1
 ```

## The loss of uniqueness after Cholesky factorization for sparse arrays

In [29]:
using LinearAlgebra, SparseArrays
using BenchmarkTools

In [33]:
n = 10
A = randn(n,n) + n*I
D = randn(n,n) + n*I
B = copy(A)
C = copy(A)

10×10 Array{Float64,2}:
  8.32786    0.693927    2.65727   …  -1.44212    -1.49326   -0.619995
  0.887947  10.2028      0.79298      -0.0876955   0.505224   1.29697
  0.554862   0.519721   11.957         1.43327     0.870661   0.834926
  0.585187   1.03877     0.594165      0.104889   -1.09138    0.399069
  0.635657  -0.700768   -0.148571     -0.130156   -0.959544   1.72176
  0.336809  -0.739029    0.085315  …  -1.1913      1.00665    1.83767
 -0.754212   0.394895   -1.54056       0.501873   -0.16344   -0.492459
  0.416951   0.0557032  -0.17261      10.8881      0.34887    0.952062
  1.40034    0.20756     1.28929       0.505223    9.62325    0.761276
 -0.414654   0.214014    0.523935     -0.211288    0.536887  10.2106

In [34]:
list = [A,B,C,D]

4-element Array{Array{Float64,2},1}:
 [8.327858712821735 0.6939269098856935 … -1.4932625607784997 -0.6199953766451872; 0.8879474155846846 10.202800316621781 … 0.5052237214140406 1.2969745647500217; … ; 1.400340790666633 0.20756001469719654 … 9.623251677348625 0.7612756409374186; -0.41465420605600134 0.21401367214488198 … 0.5368870559426947 10.21057564916159]
 [8.327858712821735 0.6939269098856935 … -1.4932625607784997 -0.6199953766451872; 0.8879474155846846 10.202800316621781 … 0.5052237214140406 1.2969745647500217; … ; 1.400340790666633 0.20756001469719654 … 9.623251677348625 0.7612756409374186; -0.41465420605600134 0.21401367214488198 … 0.5368870559426947 10.21057564916159]
 [8.327858712821735 0.6939269098856935 … -1.4932625607784997 -0.6199953766451872; 0.8879474155846846 10.202800316621781 … 0.5052237214140406 1.2969745647500217; … ; 1.400340790666633 0.20756001469719654 … 9.623251677348625 0.7612756409374186; -0.41465420605600134 0.21401367214488198 … 0.5368870559426947 10.2105756

In [35]:
unique(list) # Only two unique elements, since B and C are copies of A

2-element Array{Array{Float64,2},1}:
 [8.327858712821735 0.6939269098856935 … -1.4932625607784997 -0.6199953766451872; 0.8879474155846846 10.202800316621781 … 0.5052237214140406 1.2969745647500217; … ; 1.400340790666633 0.20756001469719654 … 9.623251677348625 0.7612756409374186; -0.41465420605600134 0.21401367214488198 … 0.5368870559426947 10.21057564916159]
 [10.491159728910194 1.0380741125275268 … -0.08695821768760689 0.22170667193805202; -0.3469501107911587 10.111258442644681 … 1.9439556425883215 -0.3027922155138604; … ; 1.4249695467173005 -1.1850063330394995 … 9.331473857007838 0.2520877096400087; -1.0106293170621834 -0.31590503082776844 … 1.1696234403539112 8.99390595598669]

In [36]:
list_chol = cholesky.(Symmetric.(list))

4-element Array{Cholesky{Float64,Array{Float64,2}},1}:
 Cholesky{Float64,Array{Float64,2}}([2.885802958072802 0.24046233231014222 … -0.5174513237645757 -0.21484328128183525; 0.8879474155846846 3.1851182369516122 … 0.19768536887536792 0.42341796470920273; … ; 1.400340790666633 0.20756001469719654 … 2.953134771645326 0.16509235382980783; -0.41465420605600134 0.21401367214488198 … 0.5368870559426947 3.0392716454321587], 'U', 0)
 Cholesky{Float64,Array{Float64,2}}([2.885802958072802 0.24046233231014222 … -0.5174513237645757 -0.21484328128183525; 0.8879474155846846 3.1851182369516122 … 0.19768536887536792 0.42341796470920273; … ; 1.400340790666633 0.20756001469719654 … 2.953134771645326 0.16509235382980783; -0.41465420605600134 0.21401367214488198 … 0.5368870559426947 3.0392716454321587], 'U', 0)
 Cholesky{Float64,Array{Float64,2}}([2.885802958072802 0.24046233231014222 … -0.5174513237645757 -0.21484328128183525; 0.8879474155846846 3.1851182369516122 … 0.19768536887536792 0.4234179647092027

In [37]:
unique(list_chol) # Only two unique elements after Cholesky factorization for dense arrays

2-element Array{Cholesky{Float64,Array{Float64,2}},1}:
 Cholesky{Float64,Array{Float64,2}}([2.885802958072802 0.24046233231014222 … -0.5174513237645757 -0.21484328128183525; 0.8879474155846846 3.1851182369516122 … 0.19768536887536792 0.42341796470920273; … ; 1.400340790666633 0.20756001469719654 … 2.953134771645326 0.16509235382980783; -0.41465420605600134 0.21401367214488198 … 0.5368870559426947 3.0392716454321587], 'U', 0)
 Cholesky{Float64,Array{Float64,2}}([3.2390059785233793 0.3204915703801113 … -0.026847192707946162 0.06844898509237245; -0.3469501107911587 3.163628232893361 … 0.6171900734852891 -0.10264465807350935; … ; 1.4249695467173005 -1.1850063330394995 … 2.921582677668187 0.14180597546519466; -1.0106293170621834 -0.31590503082776844 … 1.1696234403539112 2.9476160728199514], 'U', 0)

In [38]:
list_sparse = sparse.(list)

4-element Array{SparseMatrixCSC{Float64,Int64},1}:
 
  [1 ,  1]  =  8.32786
  [2 ,  1]  =  0.887947
  [3 ,  1]  =  0.554862
  [4 ,  1]  =  0.585187
  [5 ,  1]  =  0.635657
  [6 ,  1]  =  0.336809
  [7 ,  1]  =  -0.754212
  [8 ,  1]  =  0.416951
  [9 ,  1]  =  1.40034
  [10,  1]  =  -0.414654
  [1 ,  2]  =  0.693927
  [2 ,  2]  =  10.2028
  ⋮
  [8 ,  9]  =  0.34887
  [9 ,  9]  =  9.62325
  [10,  9]  =  0.536887
  [1 , 10]  =  -0.619995
  [2 , 10]  =  1.29697
  [3 , 10]  =  0.834926
  [4 , 10]  =  0.399069
  [5 , 10]  =  1.72176
  [6 , 10]  =  1.83767
  [7 , 10]  =  -0.492459
  [8 , 10]  =  0.952062
  [9 , 10]  =  0.761276
  [10, 10]  =  10.2106
 
  [1 ,  1]  =  8.32786
  [2 ,  1]  =  0.887947
  [3 ,  1]  =  0.554862
  [4 ,  1]  =  0.585187
  [5 ,  1]  =  0.635657
  [6 ,  1]  =  0.336809
  [7 ,  1]  =  -0.754212
  [8 ,  1]  =  0.416951
  [9 ,  1]  =  1.40034
  [10,  1]  =  -0.414654
  [1 ,  2]  =  0.693927
  [2 ,  2]  =  10.2028
  ⋮
  [8 ,  9]  =  0.34887
  [9 ,  9]  =  9.62325
  [10,  9

In [39]:
unique(list_sparse) # Still only two unique elements

2-element Array{SparseMatrixCSC{Float64,Int64},1}:
 
  [1 ,  1]  =  8.32786
  [2 ,  1]  =  0.887947
  [3 ,  1]  =  0.554862
  [4 ,  1]  =  0.585187
  [5 ,  1]  =  0.635657
  [6 ,  1]  =  0.336809
  [7 ,  1]  =  -0.754212
  [8 ,  1]  =  0.416951
  [9 ,  1]  =  1.40034
  [10,  1]  =  -0.414654
  [1 ,  2]  =  0.693927
  [2 ,  2]  =  10.2028
  ⋮
  [8 ,  9]  =  0.34887
  [9 ,  9]  =  9.62325
  [10,  9]  =  0.536887
  [1 , 10]  =  -0.619995
  [2 , 10]  =  1.29697
  [3 , 10]  =  0.834926
  [4 , 10]  =  0.399069
  [5 , 10]  =  1.72176
  [6 , 10]  =  1.83767
  [7 , 10]  =  -0.492459
  [8 , 10]  =  0.952062
  [9 , 10]  =  0.761276
  [10, 10]  =  10.2106
 
  [1 ,  1]  =  10.4912
  [2 ,  1]  =  -0.34695
  [3 ,  1]  =  -0.245731
  [4 ,  1]  =  -1.17309
  [5 ,  1]  =  0.967867
  [6 ,  1]  =  1.30788
  [7 ,  1]  =  1.90704
  [8 ,  1]  =  1.75011
  [9 ,  1]  =  1.42497
  [10,  1]  =  -1.01063
  [1 ,  2]  =  1.03807
  [2 ,  2]  =  10.1113
  ⋮
  [8 ,  9]  =  -0.391103
  [9 ,  9]  =  9.33147
  [10,  9]  

In [40]:
list_sparse_chol = cholesky.(Symmetric.(list_sparse)) # This returns the structure used in the code

4-element Array{SuiteSparse.CHOLMOD.Factor{Float64},1}:
 SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  55
nnz:     55
success: true

 SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  55
nnz:     55
success: true

 SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  55
nnz:     55
success: true

 SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  55
nnz:     55
success: true


In [41]:
unique(list_sparse_chol)   # The default unique function no longer reduces this to 2 elements

4-element Array{SuiteSparse.CHOLMOD.Factor{Float64},1}:
 SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  55
nnz:     55
success: true

 SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  55
nnz:     55
success: true

 SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  55
nnz:     55
success: true

 SuiteSparse.CHOLMOD.Factor{Float64}
type:    LLt
method:  simplicial
maxnnz:  55
nnz:     55
success: true
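
A likely explanation for the four "unique" factors above, stated as an assumption rather than something verified in these notes: `unique` relies on `isequal`/`hash`, and a `CHOLMOD.Factor` only wraps a pointer to data held by the C library, so two factors of equal matrices still compare as different objects. One way to sidestep this is to deduplicate the sparse matrices before factorizing and keep an index map. `factor_unique` is a hypothetical helper, and the tridiagonal matrices are stand-ins for the real local operators.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helper: factor each distinct matrix once and return the
# factors plus, for every input matrix, the index of its factor.
function factor_unique(mats)
    uniq = unique(mats)                       # content-based for SparseMatrixCSC
    idx = indexin(mats, uniq)                 # position of each input in uniq
    facs = [cholesky(Symmetric(M)) for M in uniq]
    return facs, idx
end

n = 10
A = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(4.0, n), 1 => fill(-1.0, n - 1))
D = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(5.0, n), 1 => fill(-1.0, n - 1))

facs, idx = factor_unique([A, copy(A), copy(A), D])
b = ones(n)
x = facs[idx[2]] \ b    # the second matrix reuses the factor of the first
```

Only the distinct matrices are factorized, and `idx` maps every original entry back to its shared factor.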


In [42]:
Base.summarysize(list_chol)

3528

In [43]:
Base.summarysize(list_sparse_chol) # This does not seem to represent the full data; I suspect the locfactors passed into the function are not really the factorized Cholesky results

104
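
A possible explanation for the small number, given as an assumption and not checked against the CHOLMOD wrapper source: `Base.summarysize` only walks Julia-managed objects, while the factor data sits behind a C pointer. A rough way to see how much the factor really holds is to copy `L` into a Julia sparse matrix and measure that. The matrix `A` below is a stand-in.

```julia
using LinearAlgebra, SparseArrays

n = 50
A = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(4.0, n), 1 => fill(-1.0, n - 1))
F = cholesky(Symmetric(A))

wrapper_bytes = Base.summarysize(F)   # only the Julia-side wrapper object
L = sparse(F.L)                       # copy the factor into a SparseMatrixCSC
factor_bytes = Base.summarysize(L)    # size of the extracted factor entries

println("wrapper: ", wrapper_bytes, " bytes, extracted L: ", factor_bytes, " bytes")
```

If the extracted `L` is much larger than the wrapper, the factorization results are stored, just outside of what `summarysize` can see.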

In [44]:
Base.summarysize(list)

3432

In [45]:
Base.summarysize(list_sparse)   # Storing a dense matrix in sparse format apparently takes more memory than the dense array itself

7464

In [46]:
list_sparse_lu = lu.(list_sparse)

4-element Array{SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64},1}:
 SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64}(Ptr{Nothing} @0x000000003dffd390, Ptr{Nothing} @0x000000003e314970, 10, 10, [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9  …  0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [8.327858712821735, 0.8879474155846846, 0.5548616569672883, 0.5851871738750613, 0.6356572810646595, 0.3368086552931352, -0.7542124716551896, 0.4169511296003448, 1.400340790666633, -0.41465420605600134  …  -0.6199953766451872, 1.2969745647500217, 0.8349259603535737, 0.39906853913411416, 1.721758488395891, 1.8376701008531864, -0.492458604085149, 0.9520618739383508, 0.7612756409374186, 10.21057564916159], 0)
 SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64}(Ptr{Nothing} @0x000000003dffd510, Ptr{Nothing} @0x000000003e319d70, 10, 10, [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9  …  0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [8.327858712821735, 0.8879474155846846, 0.5548616569672883, 

In [47]:
unique(list_sparse_lu) # Still 4 unique elements with the default unique function

4-element Array{SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64},1}:
 SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64}(Ptr{Nothing} @0x000000003dffd390, Ptr{Nothing} @0x000000003e314970, 10, 10, [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9  …  0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [8.327858712821735, 0.8879474155846846, 0.5548616569672883, 0.5851871738750613, 0.6356572810646595, 0.3368086552931352, -0.7542124716551896, 0.4169511296003448, 1.400340790666633, -0.41465420605600134  …  -0.6199953766451872, 1.2969745647500217, 0.8349259603535737, 0.39906853913411416, 1.721758488395891, 1.8376701008531864, -0.492458604085149, 0.9520618739383508, 0.7612756409374186, 10.21057564916159], 0)
 SuiteSparse.UMFPACK.UmfpackLU{Float64,Int64}(Ptr{Nothing} @0x000000003dffd510, Ptr{Nothing} @0x000000003e319d70, 10, 10, [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9  …  0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [8.327858712821735, 0.8879474155846846, 0.5548616569672883, 

In [48]:
list_sparse_lu[1] == list_sparse_lu[2]

false

In [49]:
Base.summarysize(list_sparse_lu)  # The size looks correct, because it's larger than the list_sparse itself

7560

In [50]:
@benchmark lu.(list_sparse)

BenchmarkTools.Trial: 
  memory estimate:  142.33 KiB
  allocs estimate:  252
  --------------
  minimum time:     45.300 μs (0.00% GC)
  median time:      49.200 μs (0.00% GC)
  mean time:        61.664 μs (3.42% GC)
  maximum time:     3.754 ms (17.25% GC)
  --------------
  samples:          10000
  evals/sample:     1

In [51]:
@benchmark cholesky.(Symmetric.(list_sparse))

BenchmarkTools.Trial: 
  memory estimate:  39.06 KiB
  allocs estimate:  146
  --------------
  minimum time:     24.700 μs (0.00% GC)
  median time:      25.900 μs (0.00% GC)
  mean time:        44.189 μs (23.93% GC)
  maximum time:     55.405 ms (90.56% GC)
  --------------
  samples:          10000
  evals/sample:     1

In [52]:
b = randn(n)

10-element Array{Float64,1}:
  0.6723234166465615
  0.4122849762906771
 -0.5759678843626039
  0.35141505192262046
 -1.4407677865129205
 -0.21550991387799628
  0.21968639857866154
  0.3963015215727377
 -1.4532203109960133
 -0.021531581408536234

In [54]:
@benchmark list_sparse_lu[1] \ b

BenchmarkTools.Trial: 
  memory estimate:  832 bytes
  allocs estimate:  4
  --------------
  minimum time:     798.067 ns (0.00% GC)
  median time:      808.654 ns (0.00% GC)
  mean time:        837.539 ns (0.78% GC)
  maximum time:     4.965 μs (78.58% GC)
  --------------
  samples:          10000
  evals/sample:     104

In [56]:
@benchmark list_sparse_chol[1] \ b   # But this benchmark suggests that list_sparse_chol does store the results of the Cholesky factorization

BenchmarkTools.Trial: 
  memory estimate:  936 bytes
  allocs estimate:  9
  --------------
  minimum time:     725.954 ns (0.00% GC)
  median time:      783.206 ns (0.00% GC)
  mean time:        1.368 μs (12.46% GC)
  maximum time:     552.090 μs (77.43% GC)
  --------------
  samples:          10000
  evals/sample:     131

In [58]:
@benchmark list_sparse[1] \ b   # significantly slower than the above two results for factorized matrices

BenchmarkTools.Trial: 
  memory estimate:  36.52 KiB
  allocs estimate:  67
  --------------
  minimum time:     12.700 μs (0.00% GC)
  median time:      14.500 μs (0.00% GC)
  mean time:        17.550 μs (3.60% GC)
  maximum time:     3.205 ms (26.50% GC)
  --------------
  samples:          10000
  evals/sample:     1

## Rework the code to accept command-line arguments

Purpose: to avoid editing the Julia file every time I test a different mesh with a different number of levels, I want the code to accept these variables from the command line. <br>
[ArgParse.jl](https://argparsejl.readthedocs.io/en/latest/argparse.html)

```
using ArgParse

function parse_commandline()
    s = ArgParseSettings()

    # Newer ArgParse releases spell this macro `@add_arg_table!`.
    @add_arg_table s begin
        "--block_num", "-b"
            help = "number of blocks on each side"
            arg_type = Int
            default = 4
        "--level_num", "-l"
            help = "number of levels"
            arg_type = Int
            default = 4
        "--sbp_level", "-o"
            help = "SBP operators order"
            arg_type = Int
            default = 6
    end
    return parse_args(s)
end


parsed_args = parse_commandline()

# @show parsed_args

block_num = parsed_args["block_num"]
level_num = parsed_args["level_num"]
SBP_lvl = parsed_args["sbp_level"]


let
    # number of blocks in each side
    # n_block = 8
    n_block = block_num
    
    # SBP interior order
    # SBPp   = 6
    SBPp = SBP_lvl
    
    # num_of_lvls = 4
    num_of_lvls = level_num

```

With these changes, I can write an sbatch file that automatically runs the same Julia code multiple times with different arguments. <br>

```
#!/bin/bash
#SBATCH --account=erickson   ### change this to your actual account for charging
#SBATCH --partition=short      ### queue to submit to
#SBATCH --job-name=block_auto    ### job name
#SBATCH --output=block_auto.out   ### file in which to store job stdout
#SBATCH --error=block_auto.err    ### file in which to store job stderr
#SBATCH --time=1-00:00:00                ### wall-clock time limit (D-HH:MM:SS, here 1 day)
#SBATCH --mem=128000              ### memory limit per node, in MB
#SBATCH --nodes=1               ### number of nodes to use
#SBATCH --ntasks-per-node=1     ### number of tasks to launch per node
#SBATCH --cpus-per-task=1       ### number of cores for each task

module load julia
cd ..
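# Note: 4 Julia threads are requested below, but --cpus-per-task=1 above
# allocates a single core, so the threads will likely share that core.
for block in 2 4 8 16
do
    JULIA_NUM_THREADS=4 julia test_multithreading.jl -b "$block"
done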
```



### Results for 16 x 16 block, with 4 lvls. Maximum N: 2048 (2^11)

```
lvl = 4
N: 2048
Time for direct solve in forming λ: 3054.992022037506
256
256
Time elapsed (assembleλmatrix) for lvl 4 = 3356.643639087677
BenchmarkTools.Trial:
  memory estimate:  310.78 GiB
  allocs estimate:  4122006752
  --------------
  minimum time:     3349.839 s (2.43% GC)
  median time:      3349.839 s (2.43% GC)
  mean time:        3349.839 s (2.43% GC)
  maximum time:     3349.839 s (2.43% GC)
  --------------
  samples:          1
  evals/sample:     1
(lvl, ϵ[lvl]) = (4, 7.878551842159896e-8)
Time elapsed for the whole code is approximately 17985.978904008865
Time elapsed (reading matrices) for lvl 4 = 0.027836203575134277
Time elapsed (linear solve with reading matrices) for lvl 4 = 132.41919736862184
Time elapsed (All three parts) for lvl 4 = 110.44069814682007
[25.998885943287647, 5.432191066145421, 5.806870289716743]
```

I am not sure how reliable the `@benchmark` macro is when it comes to measuring memory: the memory estimate here is 310.78 GiB, but I only requested about 128 GB (`--mem=128000`). <br>


I checked the job status through Slurm with `seff 12371580` (12371580 is the job ID):
```
Job ID: 12371580
Cluster: slurm_cluster
User/Group: yiminc/talapas
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 08:04:19
CPU Efficiency: 99.99% of 08:04:23 core-walltime
Job Wall-clock time: 08:04:23
Memory Utilized: 18.39 GB
Memory Efficiency: 14.71% of 125.00 GB
```


So the memory estimate reported by the `@benchmark` macro is the cumulative amount allocated for all intermediate results. The peak memory actually in use can be much smaller (18.39 GB here) because the garbage collector reclaims memory along the way.
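
The difference between cumulative allocation and live memory can be reproduced on a toy function. This is a minimal sketch: `accumulate_columns` is a hypothetical example, not part of the solver. It uses `@allocated` from Base, which counts the bytes allocated while the expression runs.

```julia
# Sums m random columns; each iteration allocates a temporary that the
# garbage collector can reclaim as soon as the iteration is over.
function accumulate_columns(n, m)
    total = zeros(n)
    for _ in 1:m
        col = rand(n)       # fresh temporary vector
        total .+= col
    end
    return total
end

accumulate_columns(10, 2)   # run once so compilation is not measured
total_allocated = @allocated accumulate_columns(1000, 1000)
live_result = Base.summarysize(accumulate_columns(1000, 1000))

println("cumulative allocation: ", total_allocated, " bytes")
println("size of the result:    ", live_result, " bytes")
```

The cumulative figure grows with the number of iterations, while the data that has to stay in memory at any one time is only the result plus one temporary.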

The other jobs are still running:
```
[yiminc@talapas-ln1 decomposite]$ sacct -u yiminc
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
12366127     4_block_l+    longfat   erickson          1    PENDING      0:0 
12369551     2_block_s+      short   erickson          1    RUNNING      0:0 
12369551.ba+      batch              erickson          1    RUNNING      0:0 
12369551.ex+     extern              erickson          1    RUNNING      0:0 
12369552     4_block_s+      short   erickson          1    RUNNING      0:0 
12369552.ba+      batch              erickson          1    RUNNING      0:0 
12369552.ex+     extern              erickson          1    RUNNING      0:0 
12369553     8_block_s+      short   erickson          1    RUNNING      0:0 
12369553.ba+      batch              erickson          1    RUNNING      0:0 
12369553.ex+     extern              erickson          1    RUNNING      0:0 
12369554     16_block_+      short   erickson          1    RUNNING      0:0 
12369554.ba+      batch              erickson          1    RUNNING      0:0 
12369554.ex+     extern              erickson          1    RUNNING      0:0 
12371580     block_auto      short   erickson          1  COMPLETED      0:0 
12371580.ba+      batch              erickson          1  COMPLETED      0:0 
12371580.ex+     extern              erickson          1  COMPLETED      0:0 

```

One job in the longfat partition (1-14 days, very large memory) is still pending. I also tried the long partition, but I was told that no node is available for my memory requirement of 128000M.