# Problem 5:

### a.

The model matrix A:

In [1]:
A = [1; 1; 1; 1; 1; 1; 1; 1; 1; 1 ;1; 1; 1; 1; 1; 1;; 5000; 5200; 6000; 6538; 7109; 7556; 8005; 8207; 8210; 8600; 9026; 9197; 9926; 10813; 13800; 14311]

16×2 Matrix{Int64}:
 1   5000
 1   5200
 1   6000
 1   6538
 1   7109
 1   7556
 1   8005
 1   8207
 1   8210
 1   8600
 1   9026
 1   9197
 1   9926
 1  10813
 1  13800
 1  14311

The response vector:

In [2]:
b = [2596.8;3328 ;3181.1 ;3198.4 ;4779.9; 5905.6; 5769.2; 8089.5; 4813.1; 5618.7; 7736; 6788.3; 7840.8; 8882.5; 10489.5; 12506.6]

16-element Vector{Float64}:
  2596.8
  3328.0
  3181.1
  3198.4
  4779.9
  5905.6
  5769.2
  8089.5
  4813.1
  5618.7
  7736.0
  6788.3
  7840.8
  8882.5
 10489.5
 12506.6

### b.

The gradient of the least-squares objective f(x) = 0.5 * ||Ax - b||^2, evaluated at point x (the objective f is defined in the same cell):

In [3]:
using LinearAlgebra

# Gradient of the objective: ∇f(x) = A'A x - A'b
function ∇f(x::Vector)
    return A'*A*x - A'*b
end
# Least-squares objective
f(x) = 0.5 * norm(A*x - b)^2

f (generic function with 1 method)

### c.

The function for the Hessian evaluated at point x:

In [4]:
function Hessian()
    return A' * A
end

Hessian (generic function with 1 method)
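
For reference, the gradient and Hessian coded in parts (b) and (c) follow from expanding the objective f, with $A$ and $b$ as in part (a):

$$
f(x) = \tfrac{1}{2}\lVert Ax - b\rVert^2 = \tfrac{1}{2}\,x^T A^T A\,x - b^T A\,x + \tfrac{1}{2}\,b^T b
$$

$$
\nabla f(x) = A^T A\,x - A^T b, \qquad \nabla^2 f(x) = A^T A
$$

The Hessian does not depend on $x$, which is why `Hessian()` takes no argument.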

### d.

The steepest descent function:

In [5]:
using LinearAlgebra 

function steepest_descent(x; k = 1000)   # k: print every k-th iterate
    i = 1
    # Stop when the gradient norm is small enough or the iteration cap is reached
    while norm(∇f(x)) > 1e-1 && i < 10000
        p = -∇f(x)          # steepest-descent direction
        x = x + 1e-9 * p    # fixed step size, approximated from a backtracking search
        if i % k == 0
            println("iteration ", i, ". x = ", x)
        end
        i += 1
    end
    return x
end

steepest_descent (generic function with 1 method)
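
The step-size comment in the function above mentions backtracking. A minimal sketch of what such a search could look like is given below, using the Armijo sufficient-decrease condition along the negative gradient. The helper name `backtracking_step` and the parameters `α0`, `ρ`, and `c` (with their default values) are illustrative assumptions, not taken from the original notebook. The data from part (a) is re-declared so the block runs on its own.

```julia
using LinearAlgebra

# Data from part (a): a column of ones plus the predictor values.
xs = [5000, 5200, 6000, 6538, 7109, 7556, 8005, 8207,
      8210, 8600, 9026, 9197, 9926, 10813, 13800, 14311]
A = hcat(ones(length(xs)), xs)
b = [2596.8, 3328, 3181.1, 3198.4, 4779.9, 5905.6, 5769.2, 8089.5,
     4813.1, 5618.7, 7736, 6788.3, 7840.8, 8882.5, 10489.5, 12506.6]

f(x) = 0.5 * norm(A * x - b)^2
∇f(x) = A' * (A * x - b)

# Armijo backtracking along the steepest-descent direction -∇f(x).
# α0, ρ, and c are illustrative defaults (assumptions, not from the notebook).
function backtracking_step(x; α0 = 1.0, ρ = 0.5, c = 1e-4)
    g = ∇f(x)
    α = α0
    # Shrink α until f decreases by at least c * α * ‖g‖².
    while f(x - α * g) > f(x) - c * α * dot(g, g)
        α *= ρ
    end
    return α
end

x0 = [-2200.0, 0.5]
α = backtracking_step(x0)
println("accepted step size: ", α)
```

With these defaults, the accepted step at the initial guess should come out on the order of 1e-9, which is consistent with the fixed step used in `steepest_descent`.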

### e.

Newton's Method Function:

In [6]:
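function newton_method(x; k = 1)   # k: print every k-th iterate
    i = 1
    B = inv(Hessian())             # the Hessian is constant, so invert it once
    while norm(∇f(x)) > 1e-1 && i < 10000
        p = - B * ∇f(x)            # Newton direction
        x = x + p                  # full Newton step
        if i % k == 0
            println("iteration ", i, ". x = ", x)
        end
        i += 1
    end
    return x
end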

newton_method (generic function with 1 method)
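
As a variation on the function above (not the notebook's original code), the Newton direction can also be obtained by solving the linear system H p = -∇f(x) with the backslash operator instead of forming the inverse explicitly. The sketch below assumes the same data as part (a), re-declared so it runs standalone; `newton_step` is a hypothetical helper name. It also prints Julia's built-in least-squares solution `A \ b` for comparison.

```julia
using LinearAlgebra

# Data from part (a): a column of ones plus the predictor values.
xs = [5000, 5200, 6000, 6538, 7109, 7556, 8005, 8207,
      8210, 8600, 9026, 9197, 9926, 10813, 13800, 14311]
A = hcat(ones(length(xs)), xs)
b = [2596.8, 3328, 3181.1, 3198.4, 4779.9, 5905.6, 5769.2, 8089.5,
     4813.1, 5618.7, 7736, 6788.3, 7840.8, 8882.5, 10489.5, 12506.6]

∇f(x) = A' * (A * x - b)
H = A' * A                       # constant Hessian of the quadratic objective

# One Newton step: solve H * p = -∇f(x) rather than inverting H.
newton_step(x) = x - H \ ∇f(x)

x0 = [-2200.0, 0.5]
x1 = newton_step(x0)

# Reference: least-squares solution from backslash applied to A itself.
x_ls = A \ b

println("Newton step:   ", x1)
println("Least squares: ", x_ls)
```

Solving the system avoids the explicit inverse, which is generally the preferred habit for larger or worse-conditioned problems; for a 2×2 Hessian the difference is mostly stylistic.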

### f.

Evaluating both methods from the initial guess (-2200, 0.5)^T:

In [7]:
x = [-2200;0.5]

println("Steepest Descent Method: ", steepest_descent(x))
println("Newton's Method: ", newton_method(x))

iteration 1000. x = [-2200.0000489685203, 0.9951113881973228]
iteration 2000. x = [-2200.0001507932734, 0.9951113990677681]
iteration 3000. x = [-2200.0002526180265, 0.9951114099382137]
iteration 4000. x = [-2200.000354442582, 0.995111420808638]
iteration 5000. x = [-2200.0004562668805, 0.9951114316790352]
iteration 6000. x = [-2200.000558091179, 0.9951114425494321]
iteration 7000. x = [-2200.0006599154526, 0.9951114534198262]
iteration 8000. x = [-2200.000761739296, 0.9951114642901747]
iteration 9000. x = [-2200.00086356314, 0.9951114751605232]
Steepest Descent Method: [-2200.0009652851595, 0.9951114860200011]
iteration 1. x = [-2277.069654638195, 1.0033390629260879]
Newton's Method: [-2277.069654638195, 1.0033390629260879]


As the output shows, steepest descent is far slower than Newton's method: it was still running when the loop hit its cap of 10000 iterations, whereas Newton's method stopped after a single iteration. The slope estimate settles near 0.995 within the first 1000 iterations, but the intercept has barely moved from its starting value of -2200. Given enough iterations, or a starting point much closer to the minimizer, steepest_descent does eventually converge to the same solution, only very slowly (this was tested to make sure the implementation wasn't broken).
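
One way to quantify the slow convergence is to look at the eigenvalues of the Hessian A'A. For a fixed step α on a quadratic objective, the error component along an eigenvector with eigenvalue λ is multiplied by 1 - αλ at every iteration. The sketch below computes the condition number and these factors for α = 1e-9, the step used in part (d). The variable names are illustrative, and the predictor data is re-declared so the block runs on its own.

```julia
using LinearAlgebra

# Predictor values from part (a).
xs = [5000, 5200, 6000, 6538, 7109, 7556, 8005, 8207,
      8210, 8600, 9026, 9197, 9926, 10813, 13800, 14311]
A = hcat(ones(length(xs)), xs)

H = Symmetric(A' * A)                 # Hessian of the least-squares objective
λmin, λmax = extrema(eigvals(H))      # smallest and largest eigenvalue
κ = λmax / λmin                       # condition number of the Hessian

α = 1e-9                              # fixed step size from part (d)
fast_factor = 1 - α * λmax            # per-iteration factor along the steep direction
slow_factor = 1 - α * λmin            # per-iteration factor along the flat direction

println("condition number: ", κ)
println("steep-direction factor: ", fast_factor)
println("flat-direction factor:  ", slow_factor)
```

The condition number should come out on the order of 1e9, and the flat-direction factor is extremely close to 1. That matches the printed iterates: the component the step size is tuned for converges quickly, while the remaining error shrinks by only a tiny fraction per iteration.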

### g.

Regression model fitting:

In [8]:
using DataFrames, GLM

data = DataFrame(X = [5000, 5200, 6000, 6538, 7109, 7556, 8005, 8207, 8210, 8600, 9026, 9197, 9926, 10813, 13800, 14311], 
    Y = [2596.8, 3328, 3181.1, 3198.4, 4779.9, 5905.6, 5769.2, 8089.5, 4813.1, 5618.7, 7736, 6788.3, 7840.8, 8882.5, 10489.5, 12506.6])

lm(@formula(Y ~ X), data)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

Y ~ 1 + X

Coefficients:
────────────────────────────────────────────────────────────────────────────────
                   Coef.   Std. Error      t  Pr(>|t|)     Lower 95%   Upper 95%
────────────────────────────────────────────────────────────────────────────────
(Intercept)  -2277.07     765.499      -2.97    0.0100  -3918.9       -635.238
X                1.00334    0.0853205  11.76    <1e-07      0.820345     1.18633
────────────────────────────────────────────────────────────────────────────────

The intercept and X coefficients show that the Newton's method estimates (-2277.07, 1.00334) match the lm() fit to the displayed precision, while the steepest descent estimates (-2200.001, 0.99511) are off, most visibly in the intercept term.

Newton's method is a second-order method (it uses the Hessian as well as the gradient), whereas steepest descent is a first-order method (gradient only). Because f is quadratic, its Hessian A'A is constant and a single Newton step lands on the least-squares minimizer, so Newton's method converges in 1 iteration. Steepest descent, by contrast, had still not converged when it reached the 10000-iteration cap.
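
The one-iteration behavior of Newton's method can be checked by hand. Using the gradient and Hessian from parts (b) and (c), and assuming $A^T A$ is invertible, a single Newton update from any starting point $x_0$ gives:

$$
x_1 = x_0 - (A^T A)^{-1}\left(A^T A\,x_0 - A^T b\right) = (A^T A)^{-1} A^T b
$$

The result does not depend on $x_0$ and is the solution of the normal equations $A^T A\,x = A^T b$, which is the same least-squares estimate that a linear regression fit computes.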