# July_2nd Meeting

## Main bottleneck: assembleλmatrix

I rewrote this part
```
B = -(F[e]' \ Fbar[vrng,λrng])
```

to 

```
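# Allocate B with the same type and shape as the right-hand-side block
B = similar(Fbar[vrng,λrng])
# Solve for one column of the right-hand side at a time
for lb in λrng
    B[:,lb-λrng[1]+1] = F[e]' \ Fbar[vrng,lb]
end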
```


I made this change because the loop is more memory-efficient than my previous formulation, and a `for` loop makes it possible to use the `@threads` macro. However, the left-hand and right-hand sides of the assignment have different data types, so I couldn't use `@threads` or `.=` to assign the results for a speedup:

```
typeof(Fbar[vrng, lb]) = SparseVector{Float64,Int64}
size(Fbar[vrng, lb]) = (289,)

typeof((F[e])' \ Fbar[vrng, lb]) = SuiteSparse.CHOLMOD.Sparse{Float64}
size((F[e])' \ Fbar[vrng, lb]) = (289, 1)

typeof(B[:, (lb - λrng[1]) + 1]) = SparseVector{Float64,Int64}
size(B[:, (lb - λrng[1]) + 1]) = (289,)
```
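One possible workaround for the type mismatch, as a minimal sketch only: convert each right-hand-side column to a dense `Vector` before the solve, so that the solve returns a plain `Vector{Float64}` that can be written into a preallocated dense buffer with `.=`. The names `A`, `F`, `rhs`, and `n` are hypothetical stand-ins for the real `F[e]` and `Fbar[vrng, λrng]`, and the sketch assumes `F[e]` is a CHOLMOD factorization of a symmetric positive definite matrix, which the `CHOLMOD.Sparse` return type suggests but the notes do not confirm.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical stand-ins for F[e] and Fbar[vrng, λrng]; not the real data.
n = 8
A = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(2.0, n), 1 => fill(-1.0, n - 1))
F = cholesky(Symmetric(A))      # CHOLMOD factorization of an SPD matrix
rhs = sprand(n, 3, 0.5)         # sparse right-hand-side block

# Dense result buffer, so every column has the same type as the solve output.
B = zeros(n, size(rhs, 2))
for j in axes(rhs, 2)
    # With a dense right-hand side the solve returns a Vector{Float64},
    # so the in-place broadcast assignment works.
    B[:, j] .= F \ Vector(rhs[:, j])
end
```

The trade-off is that `B` now stores every entry, so this only pays off if the solution columns are not very sparse. Whether concurrent solves against the same factorization are safe under `@threads` is not checked here.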



## For a single solve
```
display(@benchmark B = $F[$e]' \ $Fbar[$vrng,$λrng]) 
```

```
lvl = 1
N: 128
BenchmarkTools.Trial:
  memory estimate:  540.31 KiB
  allocs estimate:  57
  --------------
  minimum time:     354.901 μs (0.00% GC)
  median time:      477.801 μs (0.00% GC)
  mean time:        633.214 μs (12.71% GC)
  maximum time:     135.778 ms (57.39% GC)
  --------------
  samples:          7858
  evals/sample:     1
```


## For the assembleλmatrix() function
```
rs2 = @benchmark assembleλmatrix($FToλstarts, $vstarts, $EToF, $FToB, $locfactors, $D, $FbarT)
display(rs2)
```

```
lvl = 1
N: 128
Time for direct solve in forming λ: 0.1549990177154541
Time elapsed (assembleλmatrix) for lvl 1 = 0.44099998474121094
BenchmarkTools.Trial: 
  memory estimate:  268.08 MiB
  allocs estimate:  2234771
  --------------
  minimum time:     275.601 ms (0.00% GC)
  median time:      320.938 ms (0.00% GC)
  mean time:        329.361 ms (6.92% GC)
  maximum time:     383.987 ms (15.51% GC)
  --------------
  samples:          16
  evals/sample:     1
```


## The actual size of the matrix that we formed

```
println(Base.summarysize(B))    # This is for lvl=1, N:128
3233704
```
 
`Base.summarysize` reports bytes, so `B` occupies about 3.23 MB. Apparently, we allocate far more memory than the result actually needs (compare the memory estimate in the benchmark above).
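To put the two figures side by side, here is a small sketch that only does arithmetic on numbers copied from the output in these notes (lvl = 1, N = 128). The variable names are mine, not from the code base.

```julia
# Numbers copied from the benchmark output in these notes (lvl = 1, N = 128).
bytes_B     = 3_233_704     # Base.summarysize(B), in bytes
alloc_MiB   = 268.08        # BenchmarkTools memory estimate for assembleλmatrix
alloc_count = 2_234_771     # BenchmarkTools allocs estimate

size_MiB        = bytes_B / 2^20                   # size of B in MiB
ratio           = alloc_MiB / size_MiB             # allocated memory relative to B
bytes_per_alloc = alloc_MiB * 2^20 / alloc_count   # average size of one allocation

println(round(size_MiB, digits = 2), " MiB, ratio ≈ ", round(ratio, digits = 1),
        ", ≈ ", round(bytes_per_alloc, digits = 1), " bytes per allocation")
```

So `B` is about 3.08 MiB, the function allocates roughly 87 times that in total, and the average allocation is only around 126 bytes. The memory estimate counts everything allocated during the call, not peak usage. These figures would be consistent with many small temporaries rather than a few large buffers, but I have not profiled this.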
 

## The size of input variables

```
lvl = 1
N: 128
Time for direct solve in forming λ: 0.3900001049041748
Base.summarysize(FToλstarts) = 1200
Base.summarysize(vstarts) = 560
Base.summarysize(EToF) = 2088
Base.summarysize(FToB) = 1192
Base.summarysize(locfactors) = 1064
Base.summarysize(D) = 15272
Base.summarysize(FbarT) = 452776
Base.summarysize(B) = 3233704
Time elapsed (assembleλmatrix) for lvl 1 = 0.619999885559082
3233704
BenchmarkTools.Trial:
  memory estimate:  268.08 MiB
  allocs estimate:  2234771
  --------------
  minimum time:     270.548 ms (0.00% GC)
  median time:      320.611 ms (0.00% GC)
  mean time:        344.188 ms (8.69% GC)
  maximum time:     468.399 ms (19.59% GC)
  --------------
  samples:          15
  evals/sample:     1
```