# Block Quasi-Newton Updates

Out-of-place, symmetric, dense updates for $H$ approximating $A^{-1}$, where $DG \approx A\,DX$.

In practice $A=\nabla^2 f(x)$ and $DG$ is either a difference between gradient $g=\nabla f$ 
values or one-sided samples from an AD derivative of $g$. 

Currently no dimension or type checking! 

## Notes
 1. I am going to experiment with truncating the  pseudo inverse 
more aggressively than usual. I am going to call this 
*throttling*. 

1. We cannot use a Cholesky factorization because we do not 
know that the update is consistent with positive definiteness.

1. I have a new reference for 
the PSB (which I am using for all the others as well):
Radoslav G. Vuchkov, Cosmin G. Petra & Noémi Petra (2020), On the
Derivation of Quasi-Newton Formulas for Optimization in 
Function Spaces, Numerical Functional
Analysis and Optimization, 41:13, 1564-1587, 
DOI: 10.1080/01630563.2020.1785496
Link: https://doi.org/10.1080/01630563.2020.1785496

1. I have found a more recent reference, which contains a block PSB and is focused on issues of symmetry.
@article{Boutet_2020,
	doi = {10.1007/s10589-019-00164-z},
	url = {https://doi.org/10.1007%2Fs10589-019-00164-z},
	year = 2020,
	month = {jan},
	publisher = {Springer Science and Business Media {LLC}},
	volume = {75},
	number = {2},
	pages = {441--466},
	author = {Nicolas Boutet and Rob Haelterman and Joris Degroote},
	title = {Secant update version of quasi-Newton {PSB} with weighted multisecant equations},
	journal = {Computational Optimization and Applications}
}
This article lists lots of stuff, but not an H version of the block PSB. 

1. Our symmetry issues are much smaller!  The AD updates are perfectly symmetric.  We have one secant update that is 
potentially in minor conflict with symmetry.  We can do the AD block update and the single secant update in either order. 
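
The *throttling* idea from the first note can be sketched with the `rtol` keyword of `pinv`, which drops singular values smaller than `rtol` times the largest one. This is only a minimal sketch, assuming a Julia version whose `LinearAlgebra.pinv` accepts the `rtol` keyword; the helper name `throttled_pinv` and the cutoff `1e-6` are illustrative choices, not values taken from these notes.

```julia
using LinearAlgebra

# Hypothetical helper: pseudo-inverse that discards singular values
# smaller than rtol * (largest singular value).
throttled_pinv(M; rtol=1e-6) = pinv(M; rtol=rtol)

# Small example with singular values 1, 1e-3 and 1e-9.
M = [1.0 0.0 0.0; 0.0 1e-3 0.0; 0.0 0.0 1e-9]

P_default   = pinv(M)            # default cutoff keeps all three singular values
P_throttled = throttled_pinv(M)  # cutoff 1e-6 zeroes the 1e-9 direction

println("default   diag = ", diag(P_default))
println("throttled diag = ", diag(P_throttled))
```

With the default tolerance the tiny singular value is inverted into a very large entry, while the throttled version treats that direction as rank deficient.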


## SRMin 

Direct extension of SR1 to multiple vectors.  Remember it is self dual.

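The derivation is simple. The updated matrix must satisfy the block secant equation, so
$$
\begin{aligned}
(H + \Delta H)\, DG &= DX \\
\Delta H\, DG &= DX - H\, DG .
\end{aligned}
$$
We look for a symmetric update whose range lies in that of $DX - H\,DG$, and so
$$
\Delta H = (DX - H DG)\, \Gamma\, (DX - H DG)^\top.
$$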
Defining $T = DX - H\,DG$, which we assume to have full column rank (so that $T^\top T$ is invertible),
and substituting into the equations
gives 
$$
\begin{aligned}
T \Gamma T^\top  DG &= T \\
T^\top T \Gamma T^\top  DG &= T^\top T \\
\Gamma T^\top  DG &= I .
\end{aligned}
$$
All we need to do is compute $\Gamma = (T^\top  DG)^{-1} = ((DX - H DG)^\top DG)^{-1}$ and plug back in.
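
As a sanity check on the claim that this extends SR1, take a single pair, so that $DX = s$ and $DG = y$ are column vectors (the names $s$ and $y$ follow the notation of the references above). Then $T = s - Hy$, $\Gamma$ is the scalar $1/((s - Hy)^\top y)$, and the update collapses to the usual rank-one formula:
$$
\Delta H = \frac{(s - H y)(s - H y)^\top}{(s - H y)^\top y}.
$$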

```julia
using LinearAlgebra
function SRMin_H(H, DX, DG)
    # The optional truncation parameter for the pseudo-inverse pinv(M, tol) should be investigated.
    T = DX - H*DG
    # This should be made more efficient to avoid temporaries and to update H in place.
    H + T*pinv(T'*DG)*T'
end
```

SRMin_H (generic function with 1 method)
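
The comment in the code asks for an in-place version. One possible shape is sketched below, assuming `H` is a dense, mutable `Matrix` and a Julia version that has the five-argument `mul!`. The name `SRMin_H!` and the small test data are hypothetical and only for illustration.

```julia
using LinearAlgebra

# Hypothetical in-place variant: overwrites H with H + T*pinv(T'*DG)*T'.
function SRMin_H!(H, DX, DG)
    T = DX - H*DG               # n×s residual of the secant equations
    W = T*pinv(T'*DG)           # n×s; the only temporaries are n×s and s×s
    mul!(H, W, T', 1.0, 1.0)    # H = 1.0*W*T' + 1.0*H, accumulated into H
    return H
end

# Small deterministic example (data chosen only for illustration).
A  = [4.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 2.0]
H  = Matrix(1.0I, 3, 3)
DX = [1.0 0.0; 0.0 1.0; 1.0 1.0]
DG = A*DX

T0    = DX - H*DG
H_ref = H + T0*pinv(T0'*DG)*T0'   # out-of-place formula, for comparison
SRMin_H!(H, DX, DG)

println("difference from out-of-place update = ", norm(H - H_ref))
println("secant residual = ", norm(H*DG - DX))
```

The n×n product `W*T'` is never stored separately, which is the main saving when `s` is much smaller than `n`.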

# BFGS

Block BFGS:  Broyden Fletcher Goldfarb Shanno

Reference: Vuchkov, Petra & Petra (2020), p. 1565, where DX is $s$ and DG is $y$.

```julia
function BFGS_H(H, DX, DG)
    Gamma = pinv(DX'*DG)   # Symmetric if DG = A*DX
    # `I` is LinearAlgebra's UniformScaling: it acts as an identity of the right size without being stored.
    (I - DX*Gamma*DG')*H*(I - DG*Gamma*DX') + DX*Gamma*DX'
end
```

BFGS_H (generic function with 1 method)
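
Read off the code, with $\Gamma = (DX^\top DG)^{-1}$ (the code uses a pseudo-inverse; an exact inverse is assumed here for the algebra, and $H_+$ is just a name for the returned matrix), the update is
$$
H_+ = (I - DX\,\Gamma\,DG^\top)\, H\, (I - DG\,\Gamma\,DX^\top) + DX\,\Gamma\,DX^\top .
$$
The block secant condition holds because the right-hand factor annihilates $DG$:
$$
(I - DG\,\Gamma\,DX^\top)\, DG = DG - DG\,\Gamma\,(DX^\top DG) = 0,
\qquad\text{so}\qquad
H_+ DG = DX\,\Gamma\,DX^\top DG = DX .
$$
Symmetry of $H_+$ additionally needs $\Gamma = \Gamma^\top$, which is the condition flagged in the code comment.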

# DFP
Block DFP: Davidon Fletcher Powell

Reference: Vuchkov, Petra & Petra (2020), p. 1565, where DX is $s$ and DG is $y$.

```julia
function DFP_H(H, DX, DG)
    HDG = H*DG
    Gamma1 = pinv(DX'*DG)    # Symmetric if DG = A*DX
    Gamma2 = pinv(DG'*HDG)   # Symmetric if H is symmetric
    H - HDG*Gamma2*HDG' + DX*Gamma1*DX'
end
```

DFP_H (generic function with 1 method)
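
The duality used elsewhere in these notes also links the two updates above: applying the DFP formula to `inv(H)` with DX and DG swapped gives an approximation to $A$ whose inverse is the BFGS update of `H`, provided `DX'*DG` is symmetric and the small matrices are invertible. The block below is a numerical check of that statement on small made-up data, not a proof. It repeats the two function definitions only so that it runs on its own.

```julia
using LinearAlgebra

# Copies of the two updates from the cells above, so the block is self-contained.
function BFGS_H(H, DX, DG)
    Gamma = pinv(DX'*DG)
    (I - DX*Gamma*DG')*H*(I - DG*Gamma*DX') + DX*Gamma*DX'
end
function DFP_H(H, DX, DG)
    HDG = H*DG
    H - HDG*pinv(DG'*HDG)*HDG' + DX*pinv(DX'*DG)*DX'
end

# Small symmetric positive definite example (illustrative data only).
A  = [4.0 1.0 0.0 0.0; 1.0 3.0 1.0 0.0; 0.0 1.0 2.0 1.0; 0.0 0.0 1.0 3.0]
H0 = Matrix(1.0I, 4, 4)
DX = [1.0 0.0; 0.0 1.0; 1.0 1.0; 0.0 2.0]
DG = A*DX

H_bfgs = BFGS_H(H0, DX, DG)        # update of the inverse approximation H0
B_dfp  = DFP_H(inv(H0), DG, DX)    # DFP formula with the roles of DX and DG swapped

println("secant residual  = ", norm(H_bfgs*DG - DX))
println("duality residual = ", norm(H_bfgs*B_dfp - I))
```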

# PSB
Block Powell Symmetric Broyden: 

Nicolas Boutet, Rob Haelterman and Joris Degroote (2020), p. 6, where DX is S and DG is Y.
The formula below Formula 4 has typos! 

Back to Robert Schnabel,
Quasi-Newton Methods Using Multiple Secant Equations,
CU-CS-247-83, Eq. 3.6, p. 19.  As before DX is S and DG is Y,
**BUT** H is the approximation to A, **not** $A^{-1}$.
Since all of this is self dual, I get the inverse
formula by flipping DX and DG.

## Notes
1. This is S1 from LOD.
1. It is symmetrized with a correction term.  

```julia
using LinearAlgebra
function PSB_H(H, DX, DG)
    DGpInv = DG*pinv(DG'*DG)      # DG*(DG'*DG)^-1 when DG has full column rank
    DXMinusHDG = DX - H*DG        # residual of the secant equations
    H + DGpInv*DXMinusHDG' + DXMinusHDG*DGpInv' - DGpInv*DXMinusHDG'*DGpInv*DG'
end
```

PSB_H (generic function with 1 method)
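
Written out from the code, with $T = DX - H\,DG$ as in the SRMin section and $P = DG\,(DG^\top DG)^{-1}$ (the names $P$ and $H_+$ are introduced only for this note, and the code uses a pseudo-inverse in place of the inverse), the H version of the block PSB update reads
$$
H_+ = H + P\,T^\top + T\,P^\top - P\,(T^\top DG)\,P^\top .
$$
Since $P^\top DG = I$, the second and last terms cancel when applied to $DG$:
$$
H_+ DG = H\,DG + P\,T^\top DG + T - P\,T^\top DG = H\,DG + T = DX .
$$
The first three terms are symmetric whenever $H$ is, so any asymmetry in $H_+$ comes from the correction term, which is symmetric exactly when $T^\top DG$ is.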

# Testing Symmetric
I am testing on symmetric data first. 

# Symmetric test

```julia
n=56; s=13; eps = 1e-1
A=rand(n,n); A=A+A'; 
H0 = inv(A) + eps*rand(n,n); H0=H0+H0';
DX = rand(n,s); DG = A*DX;   # exact secant data, so DX'*DG is symmetric
HSRM = SRMin_H(H0, DX, DG) 
HBFG =  BFGS_H(H0, DX, DG) 
HDFP =   DFP_H(H0, DX, DG) 
HPSB =   PSB_H(H0, DX, DG) 
println("residuals = ", map(H -> norm(H*DG - DX), [HSRM, HBFG, HDFP, HPSB])/norm(H0))
println("sym mismatch = ", map(H -> norm(H-H'), [HSRM, HBFG, HDFP, HPSB])/norm(H0))
println("|H0-#| = ", map(H -> norm(H-H0), [HSRM, HBFG, HDFP, HPSB])/norm(H0))
```

Messing with the data a smidge! This is the circumstance that can break things.  As you can see,
the output is much less symmetric.

# Symmetry Breaking test

```julia
n=56; s=3; eps = 1e-1; eps2 = 1e-2;
A=rand(n,n); A=A+A'; 
invA=inv(A);
H0 = invA + eps*rand(n,n); H0=H0+H0';
DX = rand(n,s); DG = A*DX + eps2*rand(n,s);   # perturbed, so DG is no longer exactly A*DX
HSRM = SRMin_H(H0, DX, DG) 
HBFG =  BFGS_H(H0, DX, DG) 
HDFP =   DFP_H(H0, DX, DG) 
HPSB =   PSB_H(H0, DX, DG) 
println("residuals = ", map(H -> norm(H*DG - DX), [HSRM, HBFG, HDFP, HPSB])/norm(H0))
println("sym mismatch = ", map(H -> norm(H-H'), [HSRM, HBFG, HDFP, HPSB])/norm(H0))
println("|H0-#| = ", map(H -> norm(H-H0), [HSRM, HBFG, HDFP, HPSB])/norm(H0))
println("|inv(A)-#| rel = ", map(H -> norm(H-invA), [HSRM, HBFG, HDFP, HPSB])/norm(H0-invA))
println("|A-inv(#)| rel = ", map(H -> norm(inv(H)-A), [HSRM, HBFG, HDFP, HPSB])/norm(inv(H0)-A))
```

Wish I understood why they behave like this.  It looks as though PSB is pretty OK.  