# RMT and outliers

This example examines the effects of outliers on SVD performance
for estimating a low-rank matrix from noisy data,
from the perspective of random matrix theory,
using the Julia language.

Add the Julia packages that are needed for this demo.
Change `false` to `true` in the following code block
if you are using any of the following packages for the first time.

In [None]:
if false
    import Pkg
    Pkg.add([
        "InteractiveUtils"
        "LaTeXStrings"
        "LinearAlgebra"
        "MIRTjim"
        "Plots"
        "Random"
        "StatsBase"
    ])
end

Tell Julia to use the following packages for this example.
Run `Pkg.add()` in the preceding code block first, if needed.

In [None]:
using InteractiveUtils: versioninfo
using LaTeXStrings
using LinearAlgebra: Diagonal, norm, rank, svd, svdvals
using MIRTjim: jim, prompt
using Plots.PlotMeasures: px
using Plots: default, gui, plot, plot!, scatter!, savefig
using Random: seed!
using StatsBase: mean
default(markerstrokecolor=:auto, label="", widen=true, markersize = 6,
 labelfontsize = 24, legendfontsize = 18, tickfontsize = 14, linewidth = 3,
)
seed!(0)

The following line is helpful when running this file as a script;
this way it prompts the user to hit a key after each image is displayed.

In [None]:
isinteractive() && prompt(:prompt);

## Image example

Apply an SVD-based low-rank approximation approach
to some data with outliers.

## Latent matrix
Make a matrix that has low rank:

In [None]:
tmp = [
    zeros(1,20);
    0 1 0 0 0 0 1 0 0 0 1 1 1 1 0 1 1 1 1 0;
    0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0;
    0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0;
    0 0 1 1 1 1 0 0 0 0 1 1 0 0 0 0 0 1 1 0;
    zeros(1,20)
]';
rank(tmp)

Turn it into an image:

In [None]:
Xtrue = kron(1 .+ 8*tmp, ones(9,9))
rtrue = rank(Xtrue)
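
The Kronecker product with `ones(9,9)` enlarges the image without raising its rank, because `rank(kron(A, B)) = rank(A) * rank(B)` and an all-ones block has rank 1. Adding the constant 1 adds a rank-1 matrix, so it can change the rank by at most one. A minimal check on a small stand-in pattern (the matrix `Apat` is a hypothetical example, not the letter mask used here):

```julia
using LinearAlgebra: rank

# Small binary pattern with two identical rows, so its rank is 2
Apat = [0 1 0 1; 0 1 0 1; 1 0 1 0]

# Replicate every entry as a 3×3 block, as kron(..., ones(9,9)) does above
Bpat = kron(Apat, ones(3, 3))

rank_small = rank(Apat)          # rank of the pattern
rank_big = rank(Bpat)            # same rank: the all-ones block has rank 1
rank_shift = rank(1 .+ 8 * Apat) # affine rescaling changes the rank by at most 1
```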

Define helpers that plot with a consistent size

In [None]:
jim1 = (X ; kwargs...) -> jim(X; size = (700,300),
 leftmargin = 10px, rightmargin = 10px, kwargs...);

and consistent display range

In [None]:
jimc = (X ; kwargs...) -> jim1(X; clim=(0,9), kwargs...);

and with NRMSE label

In [None]:
nrmse = (Xh) -> round(norm(Xh - Xtrue) / norm(Xtrue) * 100, digits=1)
args = (xaxis = false, yaxis = false, colorbar = :none) # book
args = (;) # web
jime = (X; kwargs...) -> jimc(X; xlabel = "NRMSE = $(nrmse(X)) %",
 args..., kwargs...,
)
bm = s -> "\\mathbf{\\mathit{$s}}"
title = latexstring("\$$(bm(:X))\$ : Latent image")
pt = jimc(Xtrue; title, xlabel = " ", args...)
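
The NRMSE label is the Frobenius-norm error relative to the norm of the latent image, in percent. A small standalone sketch of the same quantity without the rounding (the name `nrmse_percent` and the 2×2 reference matrix are assumptions for illustration):

```julia
using LinearAlgebra: norm

# NRMSE in percent: Frobenius-norm error relative to the norm of the reference
nrmse_percent(Xhat, Xref) = 100 * norm(Xhat - Xref) / norm(Xref)

Xref = [1.0 9.0; 9.0 1.0]
e_same = nrmse_percent(Xref, Xref)        # identical matrices give zero error
e_zero = nrmse_percent(zero(Xref), Xref)  # an all-zero estimate gives 100 (up to rounding)
```

An NRMSE above 100 % therefore means the estimate is worse than simply guessing zero.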

## Helper functions

Bernoulli outliers with magnitude `τ` and probability `p`:

In [None]:
function outliers(dims::Dims, τ::Real = 6, p::Real = 0.05)
    Z = τ * sign.(randn(dims)) .* (rand(dims...) .< p)
    return Z
end;
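
Each entry of the outlier matrix is `+τ` or `-τ` with probability `p/2` each and zero otherwise, so the entries have zero mean and variance `τ² p`. The sketch below checks this empirically with the same construction; the name `sparse_outliers` and the 200×200 test size are assumptions chosen for illustration:

```julia
using Random: seed!

# Same construction as the helper above, repeated here to be self-contained
function sparse_outliers(dims::Dims, τ::Real = 6, p::Real = 0.05)
    return τ * sign.(randn(dims)) .* (rand(dims...) .< p)
end

seed!(0)
Zdemo = sparse_outliers((200, 200))

frac_nonzero = count(!iszero, Zdemo) / length(Zdemo)  # should be close to p = 0.05
magnitudes = unique(abs.(Zdemo))                      # only 0 and τ = 6 occur
sample_var = sum(abs2, Zdemo) / length(Zdemo)         # should be close to τ² p = 1.8
```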

## Noisy data

In [None]:
seed!(0)
(M, N) = size(Xtrue)
Z = outliers((M,N))
Y = Xtrue + Z

title = latexstring("\$$(bm(:Y))\$ : Corrupted image matrix\n(with outliers)")
py = jime(Y ; title)

## Singular values

The first 3 singular values of $Y$
are well above the "noise floor" caused by outliers.

But
$σ₄(X)$
is just barely above the threshold,
and
$σ₅(X)$
is below the threshold,
so we cannot expect a simple SVD approach
to recover them well.

In [None]:
ps1 = plot(
 title = "Singular values",
 xaxis = (L"k", (1, N), [1, 3, 6, N]),
 yaxis = (L"σ_k",),
 leftmargin = 15px, bottommargin = 20px, size = (600,350), widen = true,
)
sv_x = svdvals(Xtrue)
sv_y = svdvals(Y)
scatter!(sv_y, color=:red, label="Y (data)", marker=:dtriangle)
scatter!(sv_x, color=:blue, label="Xtrue", marker=:utriangle)
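
The "noise floor" can be estimated roughly. For an $M × N$ matrix with IID zero-mean entries of variance $σ^2$, a standard asymptotic result from random matrix theory puts the largest singular value near $σ(\sqrt{M} + \sqrt{N})$, and the outliers here are zero mean with variance $τ^2 p$. The sketch below compares that estimate with one simulated outlier matrix of the same size as this image. The helper name `noise_floor` and the variable names are assumptions for illustration, and because the outlier matrix is sparse the asymptotic value is only a rough guide at this size.

```julia
using LinearAlgebra: opnorm
using Random: seed!

# Asymptotic estimate of the largest singular value of an M×N matrix
# with IID zero-mean entries of standard deviation σ
noise_floor(M, N, σ) = σ * (sqrt(M) + sqrt(N))

seed!(0)
Mdemo, Ndemo = 180, 54        # size of the image matrix in this example
τdemo, pdemo = 6, 0.05        # default outlier magnitude and probability
Zsim = τdemo * sign.(randn(Mdemo, Ndemo)) .* (rand(Mdemo, Ndemo) .< pdemo)

σ_out = τdemo * sqrt(pdemo)                   # per-entry standard deviation of the outliers
predicted = noise_floor(Mdemo, Ndemo, σ_out)  # asymptotic estimate of the floor
observed = opnorm(Zsim)                       # largest singular value of this realization
```

Singular values of the latent matrix well above `predicted` should survive in the data; those near or below it are likely to be swamped.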

In [None]:
prompt()

## Low-rank estimate

A simple low-rank estimate of $X$
from the first few SVD components of $Y$
gives only a mediocre result here.
A simple SVD approach recovers the first 3 components well,
but cannot estimate the 4th and 5th components.

In [None]:
r = 5
U,s,V = svd(Y)
Xr = U[:,1:r] * Diagonal(s[1:r]) * V[:,1:r]'
title = latexstring("Rank $r approximation of data \$$(bm(:Y))\$")
pr = jime(Xr ; title)
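
The truncated SVD used above is the best rank-$r$ approximation of the data in the Frobenius norm (Eckart-Young theorem), and its error equals the root sum of squares of the discarded singular values. A small sketch on a hypothetical 4×3 matrix with known singular values (the helper name `lowrank` and the names `Atoy` and `Atoy2` are assumptions):

```julia
using LinearAlgebra: Diagonal, norm, rank, svd

# Rank-r approximation by keeping the r largest singular components
function lowrank(A::AbstractMatrix, r::Int)
    U, s, V = svd(A)
    return U[:, 1:r] * Diagonal(s[1:r]) * V[:, 1:r]'
end

Atoy = [4.0 0 0; 0 3 0; 0 0 1; 0 0 0]   # singular values are 4, 3, 1
Atoy2 = lowrank(Atoy, 2)                # keeps the components with σ = 4 and σ = 3
err_toy = norm(Atoy - Atoy2)            # discarded singular value is 1, so the error is ≈ 1
```

Optimality holds for approximating the data $Y$, not the latent $X$, which is why outliers can still ruin the estimate.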

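Examine singular vector estimates.
Each entry below is the squared inner product
between an estimated singular vector and the corresponding true one
(first row: left singular vectors, second row: right singular vectors),
so values near 1 indicate good recovery.
The first 3 are quite good; the next two are poor.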

In [None]:
sv1 = [
 sum(svd(Xr).U[:,1:r] .* svd(Xtrue).U[:,1:r], dims=1).^2
 sum(svd(Xr).V[:,1:r] .* svd(Xtrue).V[:,1:r], dims=1).^2
]
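
The same comparison can be wrapped in a function that computes each SVD only once. This is a sketch under the assumption that the singular values are distinct, so that each singular vector is unique up to sign; the name `sv_alignment` and the test matrix are hypothetical:

```julia
using LinearAlgebra: svd

# Squared inner products between the first r left (row 1) and right (row 2)
# singular vectors of A and B; 1 means aligned up to sign, 0 means orthogonal
function sv_alignment(A::AbstractMatrix, B::AbstractMatrix, r::Int)
    Fa, Fb = svd(A), svd(B)   # compute each factorization only once
    left = sum(Fa.U[:, 1:r] .* Fb.U[:, 1:r], dims = 1) .^ 2
    right = sum(Fa.V[:, 1:r] .* Fb.V[:, 1:r], dims = 1) .^ 2
    return [left; right]
end

Acmp = [3.0 0; 0 1; 0 0]
align = sv_alignment(Acmp, -Acmp, 2)   # sign flips do not change the squared products
```

Squaring the inner product removes the sign ambiguity of singular vectors, which is why `Acmp` and `-Acmp` count as perfectly aligned.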

## Non-iterative "robust" PCA

Try a simple outlier removal method.
Look at the residual between $\hat{X}$ and $Y$:

In [None]:
residual = Xr - Y

pd = jim1(residual; clim = (-1,1) .* 7, cticks = (-1:1:1) * 8,
 title = latexstring("Residual \$\\hat{$(bm(:X))} - $(bm(:Y))\$"), # matches residual = Xr - Y
)

Identify "bad" pixels with large residual errors:

In [None]:
badpixel = @. abs(residual) > 3
jim1(badpixel)

Replace "bad" pixels with typical image values:

In [None]:
Ymod = copy(Y)
Ymod[badpixel] .= mean(Y[.!badpixel])
jime(Ymod) # already reduces NRMSE by a lot compared to Y itself!
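
The detect-and-replace step can be seen on a tiny example. The sketch below flags entries whose residual magnitude exceeds a threshold and overwrites them with the mean of the unflagged data, as done above; the function name `fill_outliers` and the 3×3 data are assumptions for illustration:

```julia
using Statistics: mean

# Flag entries with large residual, then overwrite them with
# the mean of the unflagged data values
function fill_outliers(Ydata::AbstractMatrix, Xest::AbstractMatrix, thresh::Real)
    bad = abs.(Xest - Ydata) .> thresh
    Yfix = float.(Ydata)               # new array; the input is left untouched
    Yfix[bad] .= mean(Ydata[.!bad])    # assumes at least one entry is unflagged
    return Yfix, bad
end

Ytoy = [1.0 1 1; 1 7 1; 1 1 1]         # one outlier of size 6 in the center
Yfix, bad = fill_outliers(Ytoy, ones(3, 3), 3)
```

Here only the center entry is flagged and it is replaced by 1, the mean of the other eight values.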

Examine singular values of modified $Y$.
The noise floor is lower.

In [None]:
ps2 = plot(
 title = "Singular values",
 xaxis = (L"k", (1, N), [1, 3, 6, N]),
 yaxis = (L"σ_k",),
 leftmargin = 15px, bottommargin = 20px, size = (600,350), widen = true,
)
sv_f = svdvals(Ymod)
scatter!(sv_f, color=:green, label="Y (modified)", marker=:hex)
scatter!(sv_x, color=:blue, label="Xtrue", marker=:utriangle)

In [None]:
prompt()

Applying low-rank matrix approximation to modified $Y$
leads to lower NRMSE.

In [None]:
Um,sm,Vm = svd(Ymod)
Xh = Um[:,1:r] * Diagonal(sm[1:r]) * Vm[:,1:r]'
title = latexstring("Rank $r approximation of modified data \$$(bm(:Y))\$")
ph = jime(Xh ; title)

All of the singular components are better recovered,
including the ones that were near or below the noise threshold.

In [None]:
sv2 = [
 sum(svd(Xh).U[:,1:r] .* svd(Xtrue).U[:,1:r], dims=1).^2
 sum(svd(Xh).V[:,1:r] .* svd(Xtrue).V[:,1:r], dims=1).^2
]

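Summary, from top to bottom:
latent image, outlier magnitudes, corrupted data,
rank-`r` estimate from the data, detected "bad" pixels,
and rank-`r` estimate from the modified data.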

In [None]:
pa = jim(stack((Xtrue, abs.(Z), Y, Xr, 6*badpixel, Xh));
 ncol=1, size=(600, 900), clim=(0,9))

## More outliers

Now examine a case where the outliers are stronger
(magnitude 50 instead of 6)
and more prevalent
(probability 0.1 instead of 0.05).

In [None]:
pout2 = 0.1
Z = outliers((M,N), 50, pout2)
Y = Xtrue + Z

title = latexstring("\$$(bm(:Y))\$ : Corrupted image matrix\n(with $(100*pout2)% outliers)")
py2 = jime(Y ; title)

## Singular values

Now all of the singular values of $X$
are below the "noise floor" caused by outliers.

In [None]:
ps3 = plot(
 title = "Singular values",
 xaxis = (L"k", (1, N), [1, 3, 6, N]),
 yaxis = (L"σ_k",),
 leftmargin = 15px, bottommargin = 20px, size = (600,350), widen = true,
)
sv_x = svdvals(Xtrue)
sv_y = svdvals(Y)
scatter!(sv_y, color=:red, label="Y (data)", marker=:dtriangle)
scatter!(sv_x, color=:blue, label="Xtrue", marker=:utriangle)

In [None]:
prompt()

## Low-rank estimate

A simple low-rank estimate of $X$
from the first few SVD components of $Y$
does not work at all now
for such heavily corrupted data.

In [None]:
r = 5
U,s,V = svd(Y)
Xr = U[:,1:r] * Diagonal(s[1:r]) * V[:,1:r]'
title = latexstring("Rank $r approximation of data \$$(bm(:Y))\$")
pr2 = jime(Xr ; title)

Examine singular vector estimates.
The first one is so-so, the rest are useless.

In [None]:
sv3 = [
 sum(svd(Xr).U[:,1:r] .* svd(Xtrue).U[:,1:r], dims=1).^2
 sum(svd(Xr).V[:,1:r] .* svd(Xtrue).V[:,1:r], dims=1).^2
]

## Non-iterative "robust" PCA

Try a simple outlier removal method.
Look at the residual between $\hat{X}$ and $Y$:

In [None]:
residual = Xr - Y

pd2 = jim1(residual; clim = (-1,1) .* 70, cticks = (-1:1:1) * 8,
 title = latexstring("Residual \$\\hat{$(bm(:X))} - $(bm(:Y))\$"), # matches residual = Xr - Y
)

Identify "bad" pixels with large residual errors.
This is a nonlinear operation:

In [None]:
badpixel = @. abs(residual) > 10
jim1(badpixel)
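
To see why thresholding is nonlinear, note that a linear map would satisfy `f(a + b) == f(a) + f(b)`. A minimal counterexample for the hard-threshold detector (the name `detect` and the vectors are hypothetical):

```julia
# Hard-threshold detector: flags entries with magnitude above `thresh`
detect(x; thresh = 10) = abs.(x) .> thresh

avec = [6.0, 0.0, 12.0]
bvec = [6.0, 0.0, -12.0]

flags_of_sum = detect(avec + bvec)          # avec + bvec = [12, 0, 0], so only entry 1 is flagged
sum_of_flags = detect(avec) .+ detect(bvec) # entry 3 is flagged in both, giving [0, 0, 2]
```

The two results differ, so the detector is not additive, and the overall estimator that uses it is nonlinear in the data.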

Replace "bad" pixels with typical image values:

In [None]:
Ymod = copy(Y)
Ymod[badpixel] .= mean(Y[.!badpixel])
jime(Ymod) # already reduces NRMSE by a lot compared to Y itself!

Examine singular values of modified $Y$.
The noise floor is lower.

In [None]:
ps4 = plot(
 title = "Singular values",
 xaxis = (L"k", (1, N), [1, 3, 6, N]),
 yaxis = (L"σ_k",),
 leftmargin = 15px, bottommargin = 20px, size = (600,350), widen = true,
)
sv_f = svdvals(Ymod)
scatter!(sv_f, color=:green, label="Y (modified)", marker=:hex)
scatter!(sv_x, color=:blue, label="Xtrue", marker=:utriangle)

In [None]:
prompt()

Applying low-rank matrix approximation to modified $Y$
leads to lower NRMSE.

In [None]:
Um,sm,Vm = svd(Ymod)
Xh = Um[:,1:r] * Diagonal(sm[1:r]) * Vm[:,1:r]'
title = latexstring("Rank $r approximation of modified data \$$(bm(:Y))\$")
ph2 = jime(Xh ; title)

Now the first three singular components are better recovered.

In [None]:
sv4 = [
 sum(svd(Xh).U[:,1:r] .* svd(Xtrue).U[:,1:r], dims=1).^2
 sum(svd(Xh).V[:,1:r] .* svd(Xtrue).V[:,1:r], dims=1).^2
]

Let's try iterating to see if we can refine the estimate.
Indeed another pass does refine it,
but at this point it starts to become an ad hoc iterative method.
If we are going to iterate,
then it seems preferable to use a cost function
like the one used in robust PCA,
with a proper optimization algorithm.
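
For reference, one widely used robust PCA formulation (often called principal component pursuit) splits the data into a low-rank part and a sparse part. It is shown here as an illustration of such a cost function, not as the specific method this example has in mind; the regularization parameter $λ$ is notation assumed here:

$$
\min_{X,\,Z} \; \|X\|_* + λ \|Z\|_1
\quad \text{subject to} \quad X + Z = Y,
$$

where $\|X\|_*$ is the nuclear norm (the sum of the singular values of $X$), $\|Z\|_1$ is the sum of the absolute values of the entries of $Z$, and $λ > 0$ trades off low rank against sparsity.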

In [None]:
residual = Xh - Y
badpixel = @. abs(residual) > 10
jim1(badpixel)
Ymod = copy(Y)
Ymod[badpixel] .= mean(Y[.!badpixel])
Um,sm,Vm = svd(Ymod)
Xh3 = Um[:,1:r] * Diagonal(sm[1:r]) * Vm[:,1:r]'
title = latexstring("Rank $r approximation of modified data \$$(bm(:Y))\$")
ph3 = jime(Xh3 ; title)
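
The repeated passes above can be written as a loop. The following is a minimal sketch of that ad hoc refinement on synthetic data, not a robust PCA algorithm; the function names `lowrank` and `refine`, the rank-2 test matrix, and the number of passes are all assumptions for illustration:

```julia
using LinearAlgebra: Diagonal, norm, svd
using Random: seed!
using Statistics: mean

# Rank-r approximation by truncated SVD
function lowrank(A, r)
    U, s, V = svd(A)
    return U[:, 1:r] * Diagonal(s[1:r]) * V[:, 1:r]'
end

# Alternate between flagging large residuals in the original data
# and refitting a rank-r model to the data with flagged entries replaced
function refine(Ydata, r; thresh = 10, niter = 3)
    Xest = lowrank(Ydata, r)
    for _ in 1:niter
        bad = abs.(Xest - Ydata) .> thresh
        Yfix = copy(Ydata)
        Yfix[bad] .= mean(Ydata[.!bad])
        Xest = lowrank(Yfix, r)
    end
    return Xest
end

seed!(0)
u = collect(range(-1, 1, length = 60))
v = collect(range(-1, 1, length = 40))
Xsyn = 5 .+ 2 .* u .* v'                                    # rank-2 latent matrix
Zsyn = 50 * sign.(randn(60, 40)) .* (rand(60, 40) .< 0.1)   # strong, frequent outliers
Ysyn = Xsyn + Zsyn

err0 = norm(lowrank(Ysyn, 2) - Xsyn) / norm(Xsyn)   # plain truncated SVD
err3 = norm(refine(Ysyn, 2) - Xsyn) / norm(Xsyn)    # after three refinement passes
```

The threshold and the number of passes are tuning choices with no cost function behind them, which is the sense in which the method is ad hoc.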

---

*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*