# MA3J8 Approximation Theory and Applications

## 05 - Least Squares Methods

In this notebook we explore in a little more detail how to fit functions by least squares instead of interpolation. This is closer to the Hilbert-space setting, i.e., best approximation by projection.

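First we implement some auxiliary functions: a wrapper that evaluates a polynomial represented by its coefficients in a given basis, a least squares fitting routine based on the QR factorisation, and the monomial basis. The Chebyshev basis is implemented further below.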

In [None]:
using Plots, Polynomials, QuadGK, SoftGlobalScope, LinearAlgebra, LaTeXStrings, Printf
gr()

In [None]:
""" 
A convenience wrapper to store a polynomial in terms 
of its coefficients and its basis
"""
struct Poly
    c::Vector
    evalb
end
(p::Poly)(x) = dot(p.c, p.evalb(x, length(p.c)-1))


""" 
linear lsq fit using the QR factorisation
f : function to be fitted 
X : sample points 
N : polynomial degree 
evalbasis : a function to evaluate the basis at a point x 
"""
function pfit(f, X, N, evalbasis)
    Y = f.(X)
    A = lsqsys(X, N, evalbasis)
    c = qr(A) \ Y
    return Poly(c, evalbasis)
end


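"""
assemble the lsq matrix, A[m, :] = evalbasis(X[m], N), 
i.e., one row per sample point and one column per basis function
"""
function lsqsys(X, N, evalbasis)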
    M = length(X)
    A = zeros(M, N+1)
    for m = 1:M
        A[m, :] = evalbasis(X[m], N)
    end 
    return A 
end


"""
evaluate the monomial basis at a point 
""" 
monobasis(x, N) = [x^n for n = 0:N]
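As a quick sanity check of the QR-based least squares solve used in `pfit`, here is a minimal standalone sketch (the names `X_demo`, `A_demo`, `c_true`, `c_fit` are hypothetical, chosen to avoid clashing with the notebook variables): data sampled from a known cubic should be reproduced, up to round-off, by a degree-3 fit in the monomial basis.

```julia
using LinearAlgebra

# sample points and the monomial LSQ matrix, A[m, n+1] = X[m]^n (same layout as `lsqsys`)
X_demo = range(-1, 1, length=20)
A_demo = [x^n for x in X_demo, n in 0:3]

# data generated by the cubic 1 - 2x + 0.5x^3
c_true = [1.0, -2.0, 0.0, 0.5]
Y_demo = A_demo * c_true

# for a tall matrix, `qr(A) \ Y` returns the least squares solution
c_fit = qr(A_demo) \ Y_demo
println(c_fit)
```

Since the data lie exactly in the span of the basis, the least squares residual is zero and the fit recovers `c_true`.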

### 05-1 Motivation / Fit via QR Factorisation

We begin by recalling a naive result from our polynomial approximation tests, where we experimented using least squares in place of (equi-spaced) polynomial interpolation. But this time, let's use our own code and also push the polynomial degree a bit higher!

In [None]:
f(x) = 1 / (1 + 25 * x^2)
NN = 10:10:100
X = range(-1, 1, length=1000)

P = plot(X, f.(X), lw=2, label = "f")
ylims!(P, (-0.05, 0.2))

err = []
for N in NN
    p = pfit(f, X, N, monobasis)
    push!(err, norm(f.(X) - p.(X), Inf))
    
    # plot a selection of the fits
    if N in [10, 20, 40, 100]; plot!(P, X, p.(X), lw=2, label = "p$N"); end 
end 

plot(P, plot(NN, err, lw=2, m=:o, yaxis = (:log,), label = ""), layout = (1,2))

Around $N = 40$ the fit becomes unstable / inaccurate, even though we use $M = 1000$ data points. Are there simply not enough data points, or does this have to do with the conditioning of the basis?

In [None]:
println(" degree |  cond(A)   |  norm(c)")
println("--------|------------|----------")
for N in NN[1:end-1] 
    p = pfit(f, X, N, monobasis)
    cnd = cond(lsqsys(X, N, monobasis))
    @printf("   %3d  |  %.2e  |  %.2e \n", N, cnd, norm(p.c))
end

Let us attempt the same using the Chebyshev basis, which we know to be much better conditioned on $[-1,1]$. The results now look much more promising!

In [None]:
""" 
evaluate the Chebyshev basis T_0(x), ..., T_N(x) at a point x
using the three-term recurrence T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)
""" 
function chebbasis(x, N)
    @assert N >= 1 
    T = zeros(typeof(x), N+1)
    T[1] = 1
    T[2] = x 
    for n = 2:N 
        T[n+1] = 2*x*T[n] - T[n-1]
    end 
    return T 
end
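The recurrence in `chebbasis` can be checked against the closed form $T_n(x) = \cos(n \arccos x)$, valid for $x \in [-1, 1]$. A standalone sketch; `chebT_sketch` is a hypothetical copy of the recurrence, renamed to avoid clashing with `chebbasis`:

```julia
# copy of the three-term recurrence: T_0 = 1, T_1 = x, T_{n+1} = 2x T_n - T_{n-1}
function chebT_sketch(x, N)
    T = zeros(typeof(x), N + 1)
    T[1] = 1
    N >= 1 && (T[2] = x)
    for n = 2:N
        T[n+1] = 2 * x * T[n] - T[n-1]
    end
    return T
end

# compare with the closed form T_n(x) = cos(n arccos x) on a grid in [-1, 1]
Ncheck = 50
xs_check = range(-1, 1, length=401)
maxdiff = maximum(maximum(abs.(chebT_sketch(x, Ncheck) .- cos.((0:Ncheck) .* acos(x))))
                  for x in xs_check)
println("max deviation from cos(n acos x): ", maxdiff)
```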

In [None]:
f(x) = 1 / (1 + 100 * x^2)
NN = 10:30:300
xp = range(-1, 1, length=1000)
errmono = [] 
errcheb = []
for N in NN
    pm = pfit(f, xp, N, monobasis)
    push!(errmono, norm(f.(xp) - pm.(xp), Inf))

    pc = pfit(f, xp, N, chebbasis)
    push!(errcheb, norm(f.(xp) - pc.(xp), Inf))
end 
plot(NN, errmono, lw=2, m=:o, yaxis = (:log,), label = "monofit")
plot!(NN, errcheb, lw=2, m=:o, label = "chebfit")

In [None]:
println(" degree |  cond(A)   |  norm(c)")
println("--------|------------|----------")
for N in NN[1:end-1] 
    pc = pfit(f, xp, N, chebbasis)
    cnd = cond(lsqsys(xp, N, chebbasis))
    @printf("   %3d  |  %.2e  |  %.2e \n", N, cnd, norm(pc.c))
end

But we have made a very naive mistake ... we are estimating the error on the same points on which we are training! Studying this issue in detail goes far beyond this module, but we can at least test it numerically by estimating the error on a much finer grid.

In [None]:
f(x) = 1 / (1 + 100 * x^2)
NN = 10:30:300
x_train = range(-1, 1, length=1000)
x_test = range(-1, 1, length=4*1000)
errmono1 = [] 
errcheb1 = []
for N in NN
    pm = pfit(f, x_train, N, monobasis)
    push!(errmono1, norm(f.(x_test) - pm.(x_test), Inf))

    pc = pfit(f, x_train, N, chebbasis)
    push!(errcheb1, norm(f.(x_test) - pc.(x_test), Inf))
end 
plot(NN, errmono, lw=2, m=:o, yaxis = (:log,), label = "polyfit-train")
plot!(NN, errcheb, lw=2, m=:o, label = "chebfit-train")
plot!(NN, errmono1, lw=2, m=:o, yaxis = (:log,), label = "polyfit-test")
plot!(NN, errcheb1, lw=2, m=:o, label = "chebfit-test")
ylims!((1e-7, 1.0))

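This can be related to the conditioning of the least squares system (see the tables above!). It is a form of overfitting related to the Runge phenomenon: between the training points, in particular close to the endpoints, the fitted polynomial develops large oscillations.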
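`pfit` solves the least squares problem with `qr(A) \ Y` rather than through the normal equations $A^T A c = A^T Y$. One reason is that $\mathrm{cond}(A^T A) = \mathrm{cond}(A)^2$, so forming the normal equations would square the condition numbers listed in the tables above. A minimal standalone check for a small monomial system (the variable names are hypothetical):

```julia
using LinearAlgebra

# monomial LSQ matrix on 200 equispaced points, degree 10 (same layout as `lsqsys`)
Xs_cond = range(-1, 1, length=200)
A_cond = [x^n for x in Xs_cond, n in 0:10]

kA   = cond(A_cond)              # 2-norm condition number of A
kAtA = cond(A_cond' * A_cond)    # condition number of the normal equations
println("cond(A) = ", kA, ",  cond(A'A) = ", kAtA, ",  cond(A)^2 = ", kA^2)
```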

In [None]:
P1 = plot(x_train, f.(x_train), lw=2, label = "f")
for N in [100, 200, 400]
    pc = pfit(f, x_train, N, chebbasis)
    plot!(x_test, pc.(x_test), lw=1, label = "cheb($N)")
end
ylims!(-0.1, 0.3)

x_plot = range(0.99, 0.999999, length=1_000)
P2 = plot(x_plot, f.(x_plot), lw=2, label = "")
pc = nothing 
for N in [100, 200, 400]
    pc = pfit(f, x_train, N, chebbasis)
    plot!(x_plot, pc.(x_plot), lw=2, label = "")
end
plot!(x_train, pc.(x_train), lw=0, c=4, m=:o, label  ="")
xlims!(P2, (0.99, 1.005))
ylims!(P2, (-0.12, 0.12))

plot(P1, P2)

### 05-2 Fitting with the right distribution 

In the previous section we did something that is actually quite odd. We used a Chebyshev basis to fit polynomials with data points $x_m$ uniformly distributed in $[-1,1]$, whereas Chebyshev polynomials are orthogonal w.r.t. the measure $(1-x^2)^{-1/2} dx$. It therefore seems natural either (1) to incorporate the Chebyshev weights 
$$
w_m = (1 - x_m^2)^{-1/2}
$$
into the fitting process; or (2) to fit on Chebyshev-distributed data points,
$$
x_m = \cos\big(\pi m/(M-1)\big), \qquad m = 0, \dots, M-1.
$$
We next explore the consequences of these modifications.

NOTE: This can be done much more effectively using Gauss-type quadrature rules, but that is not the point of these experiments, which are gearing up towards the next section.
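The orthogonality of the Chebyshev polynomials with respect to $(1-x^2)^{-1/2}dx$ can be checked with a Gauss-Chebyshev rule: with nodes $x_k = \cos((2k-1)\pi/(2n))$, $k = 1, \dots, n$, and equal weights $\pi/n$, the rule is exact for polynomials of degree at most $2n-1$, and $\int_{-1}^1 T_j T_k (1-x^2)^{-1/2} dx$ equals $\pi$ for $j = k = 0$, $\pi/2$ for $j = k \geq 1$ and $0$ otherwise. A minimal sketch (all names are hypothetical):

```julia
using LinearAlgebra

nnodes, Ndeg = 40, 10
# Gauss-Chebyshev nodes; all weights are equal to pi / nnodes
xq_gc = [cos((2k - 1) * pi / (2 * nnodes)) for k in 1:nnodes]
# Tq[k, j+1] = T_j(x_k), using T_j(x) = cos(j * acos(x)) on [-1, 1]
Tq = [cos(j * acos(x)) for x in xq_gc, j in 0:Ndeg]
# discrete Gram matrix; exact here because deg(T_j T_k) <= 2 Ndeg <= 2 nnodes - 1
G_gc = (pi / nnodes) * (Tq' * Tq)
G_exact = Diagonal(vcat(float(pi), fill(pi / 2, Ndeg)))
println("max deviation: ", maximum(abs.(G_gc - G_exact)))
```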

In [None]:

function wchebfit(f, N, M, data=:unif, weights=:unif)
    if data == :unif
        X = range(-1, 1, length=M)
    elseif data == :cheb
        X = cos.(range(0, pi, length=M))
    else
        error("Unknown `data`")
    end
    
    if weights == :unif
        W = ones(length(X))
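    elseif weights == :cheb
        # store √w_m = (1 - x_m^2)^(-1/4) (see the row scaling below);
        # the 1e-10 shift avoids division by zero at x = ±1
        W = (1+1e-10 .- X.^2).^(-0.25) 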
    else 
        error("Unknown `weights`")
    end
        
    # weighted lsq system
    Y = W .* f.(X)
    A = Diagonal(W) * lsqsys(X, N, chebbasis)
    return Poly(qr(A) \ Y, chebbasis)
end 
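`wchebfit` scales the rows of the system by $\sqrt{w_m}$ rather than by $w_m$. This reflects the identity, with $W = \mathrm{diag}(w_m)$,
$$
\sum_m w_m |f(x_m) - p(x_m)|^2 = \big\| W^{1/2}(Y - A c) \big\|_2^2,
$$
so one can multiply each row of $A$ and each entry of $Y$ by $\sqrt{w_m}$ and then solve an ordinary least squares problem. A small standalone check (hypothetical data, with points kept away from $\pm 1$ so that no regularisation is needed) compares this with the weighted normal equations $A^T W A c = A^T W Y$:

```julia
using LinearAlgebra

Xw  = collect(range(-0.95, 0.95, length=60))
wts = (1 .- Xw.^2) .^ (-0.5)          # Chebyshev weights w_m
Aw  = [x^n for x in Xw, n in 0:6]     # any LSQ matrix will do for this check
Yw  = exp.(Xw)

# (i) row scaling by √w_m, then an ordinary QR least squares solve
c_rows = qr(Diagonal(sqrt.(wts)) * Aw) \ (sqrt.(wts) .* Yw)

# (ii) weighted normal equations  A' W A c = A' W Y
c_normal = (Aw' * Diagonal(wts) * Aw) \ (Aw' * (wts .* Yw))

println("difference: ", norm(c_rows - c_normal))
```

Both routes minimise the same functional; the row-scaling route avoids squaring the condition number.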


In [None]:
# easy parameters 
# C, Mfit, NN = 25, 400, 19:20:399
# medium parameters 
C, Mfit, NN = 100, 400, 20:20:380
# harder parameters 
# C, Mfit, NN = 400, 1_000, 20:40:400

f(x) = 1 / (1 + C * x^2)
x_test = cos.(range(0, pi, length=4000))

err_uu, err_uc, err_cu = [], [], []
get_err = (_N, _x, _w) -> norm(f.(x_test) - wchebfit(f, _N, Mfit, _x, _w).(x_test), Inf)

for N in NN
    push!(err_uu, get_err(N, :unif, :unif))
    push!(err_uc, get_err(N, :unif, :cheb))
    push!(err_cu, get_err(N, :cheb, :unif))
end 
plot(; yaxis = (:log, ), xaxis = ("polynomial degree",), title = "Max-Error")        
plot!(NN, err_uu, lw=2, m=:o, label = "unif, unif")
plot!(NN, err_uc, lw=2, m=:o, label = "unif, cheb")
plot!(NN, err_cu, lw=2, m=:o, label = "cheb, unif")
plot!(NN, 0.1*(1+1/sqrt(C)).^(- NN), lw=2, c=:black, ls=:dash, label="predicted")
P_maxe = hline!([1e-16], label = "eps")
plot(P_maxe, legend=:bottomleft)
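The dashed "predicted" line uses the rate $(1 + 1/\sqrt{C})^{-N}$. A sketch of where this comes from, assuming the standard result that Chebyshev coefficients and best approximation errors of a function analytic inside the Bernstein ellipse $E_\rho$ (foci $\pm 1$, semi-axis sum $\rho$) decay like $\rho^{-N}$: $f(x) = 1/(1 + C x^2)$ has poles at $x = \pm i/\sqrt{C}$, and the ellipse through these poles has semi-minor axis $(\rho - \rho^{-1})/2 = 1/\sqrt{C}$, i.e., $\rho = 1/\sqrt{C} + \sqrt{1 + 1/C} \approx 1 + 1/\sqrt{C}$ for large $C$. The prefactor 0.1 in the plot is not derived here.

```julia
using Printf

# exact Bernstein ellipse parameter for the poles ±i/√C
rho_exact(C)  = 1 / sqrt(C) + sqrt(1 + 1 / C)
# first-order approximation used for the dashed "predicted" line
rho_approx(C) = 1 + 1 / sqrt(C)

for Cval in (25, 100, 400)
    @printf("C = %4d   rho = %.6f   1 + 1/sqrt(C) = %.6f\n",
            Cval, rho_exact(Cval), rho_approx(Cval))
end
```

Since $\sqrt{1 + 1/C} < 1 + 1/(2C)$, the approximation underestimates $\rho$ by less than $1/(2C)$, so the predicted slope is slightly pessimistic.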

In [None]:
println("Conditioning of Chebyshev LSQ System")
println(" degree |  cond(A)   |  norm(c)")
println("--------|------------|----------")
for N in 40:40:400
    X = cos.(range(0, pi, length=1_000))
    pc = pfit(f, X, N, chebbasis)
    cnd = cond(lsqsys(X, N, chebbasis))
    @printf("   %3d  |  %.2e  |  %.2e \n", N, cnd, norm(pc.c))
end

### 05-3 Fit at Random Points 

In the most realistic real-world setting we cannot choose the data points; instead we are given (likely random) data points. The first step should then be to construct a basis that is orthogonal with respect to the distribution of the points. Since we mostly work with Chebyshev polynomials here, we simply assume that the points follow the Chebyshev distribution. 

In [None]:
function rndchebfit(f, N, M, data)
    if data == :unif
        X = range(-1, 1, length=M)
    elseif data == :cheb
        X = cos.(range(0, pi, length=M))
    elseif data == :rand
        X = 2*(rand(M) .- 0.5)
    elseif data == :randcheb
        X = cos.(pi * rand(M))
    else
        error("Unknown `data`")
    end
    
    # (unweighted) lsq fit in the Chebyshev basis
    p = pfit(f, X, N, chebbasis)
    
    # errors
    X_test = cos.(range(0, pi, length=5*M))
    return p, norm(f.(X_test) - p.(X_test), Inf)    
end 
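The `:randcheb` option draws `X = cos.(pi * rand(M))`. If $U$ is uniform on $[0,1]$, then $X = \cos(\pi U)$ follows the Chebyshev (arcsine) distribution with density $1/(\pi\sqrt{1-x^2})$ and distribution function $F(x) = 1 - \arccos(x)/\pi$. A quick empirical check (hypothetical names, fixed seed):

```julia
using Random, Statistics

# exact CDF of the Chebyshev (arcsine) distribution on [-1, 1]
Fcheb(x) = 1 - acos(x) / pi

Random.seed!(0)
Msamp = 100_000
Xsamp = cos.(pi * rand(Msamp))

# empirical CDF versus exact CDF at a few points
xq = [-0.9, -0.5, 0.0, 0.5, 0.9]
Femp = [mean(Xsamp .<= x) for x in xq]
maxgap = maximum(abs.(Femp .- Fcheb.(xq)))
println("max |F_emp - F| = ", maxgap)
```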
   

In [None]:
# easy parameters 
C, Mfit, NN = 25, 400, 19:20:399
# medium parameters 
# C, Mfit, NN = 100, 400, 20:20:380
# harder parameters 
# C, Mfit, NN = 400, 981, 20:60:980


f(x) = 1 / (1 + C * x^2)
err_u, err_ru, err_c, err_rc = [], [], [], []

for N in NN
    push!(err_u, rndchebfit(f, N, Mfit, :unif)[2])
    push!(err_ru, rndchebfit(f, N, Mfit, :rand)[2])
    push!(err_c, rndchebfit(f, N, Mfit, :cheb)[2])
    push!(err_rc, rndchebfit(f, N, Mfit, :randcheb)[2])
end 

plot(; yaxis = (:log,), xaxis = ("polynomial degree",), title="max-error (test)")
plot!(NN, err_u, lw=2, m=:o, label = "unif")
plot!(NN, err_ru, lw=2, ls=:dash, m=:o, label = "unif, rnd")
plot!(NN, err_c, lw=2, m=:o, label = "cheb")
plot!(NN, err_rc, lw=2, ls=:dash, m=:o, label = "cheb, rnd")
P_maxe = hline!([1e-16], label = "")

plot(P_maxe)

The analysis of [Cohen, Davenport, Leviatan, 2012] suggests that we should turn the problem around: given a number of data points $M$, we should choose an appropriate degree $N$. In practice it seems best to combine theory with experimentation. 

From this analysis we learn that we need $N$ somewhat smaller than $M$ (the paper suggests a log-factor) to ensure that the LSQ system is stable (moderate condition number) with high probability. The additional error that arises is 
$$
  M^{-r}
$$
where $M$ is the number of data points and $r$ is given by a somewhat complicated relation. For the Chebyshev basis with random points $x_m$ distributed according to the Chebyshev distribution, this relation reads roughly $1 + r \approx c M / (N \log M)$ for some constant $c$. Hence $N \approx C M/\log M$ keeps $r$ at a fixed value: a larger $C$ gives a higher degree and therefore a smaller approximation error, but a smaller $r$ and therefore a larger additional error $M^{-r}$. The best choice of $C$ balances the two; it is difficult to determine analytically, so we determine it experimentally.
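To get a feel for the degrees produced by the rule $N \approx C M/\log M$, capped at $M - 1$ as in `Nfun` below, here is a small sketch (`suggested_degree` is a hypothetical name):

```julia
# degree suggested by N ≈ C M / log(M), capped at M - 1
suggested_degree(M, C) = floor(Int, min(M - 1, C * M / log(M)))

for M in (100, 1000), C in (1, 3, 5, 7)
    println("M = $M, C = $C  =>  N = ", suggested_degree(M, C))
end
```

For small $M$ the larger constants already hit the cap $N = M - 1$, so those fits are in fact interpolants at the random points.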

In [None]:
C, MM = 25, 50:50:1000
f(x) = 1 / (1 + C * x^2)

Nfun(M, Nsugg) = floor(Int, min(M-1, Nsugg))
N1(M) = Nfun(M, M / log(M))
N2(M) = Nfun(M, 3*M/log(M))
N3(M) = Nfun(M, 5*M/log(M))
N4(M) = Nfun(M, 7*M/log(M))

err, err1, err2, err3, err4 = [], [], [], [], []
for M in MM
    push!(err,  rndchebfit(f, M-1, M, :cheb)[2])
    push!(err1, rndchebfit(f, N1(M), M, :randcheb)[2])
    push!(err2, rndchebfit(f, N2(M), M, :randcheb)[2])
    push!(err3, rndchebfit(f, N3(M), M, :randcheb)[2])
    push!(err4, rndchebfit(f, N4(M), M, :randcheb)[2])
end 

plot(; yaxis = (:log,), xaxis = ("polynomial degree",), title="max-error")
plot!(MM.-1, err, lw=2, m=:o, label = "chebyshev grid")
plot!(N1.(MM), err1, lw=2, m=:o, label = "N= M/log(M)")
plot!(N2.(MM), err2, lw=2, m=:o, label = "N=3M/log(M)")
plot!(N3.(MM), err3, lw=2, m=:o, label = "N=5M/log(M)")
plot!(N4.(MM), err4, lw=2, m=:o, label = "N=7M/log(M)")

PN = hline!([1e-16], label = "eps")

plot(; yaxis = (:log,), xaxis = ("# sample points",), title="max-error")
plot!(MM, err, lw=2, m=:o, label = "")
plot!(MM, err1, lw=2, m=:o, label = "")
plot!(MM, err2, lw=2, m=:o, label = "")
plot!(MM, err3, lw=2, m=:o, label = "")
plot!(MM, err4, lw=2, m=:o, label = "")
PM = hline!([1e-16], label = "")

plot(PN, PM)