In [1]:
using CSV, DataFrames, RDatasets, DrWatson
using Flux, Optim

# Empirical Risk Minimization Framework 

### Selecting the Model and Dataset 

In [7]:
datapath = datadir("exp_raw");
advertising = CSV.File(joinpath(datapath, "Advertising.csv")) |> DataFrame
advertising = advertising[!,Not(:Column1)]
n,m = size(advertising)
first(advertising,5)
X= [ones(n) Matrix(advertising[:,Not(:Sales)])]
y = advertising.Sales;

We'll have a linear model where $$ \text{Sales} = \beta_0 + \beta_1 \text{TV} + \beta_2 \text{Radio} + \beta_3 \text{Newspaper}$$

In [8]:
# prediction fucntion 
ŷ(X,β) = X * β

ŷ (generic function with 1 method)

In [16]:
# cost function: root mean sqaured error 
rmse_i(β) = (i,) -> sqrt(Flux.Losses.mse(ŷ(X[i],β),y[i]))

rmse_i (generic function with 1 method)

In [9]:
rmse(β) = sqrt(Flux.Losses.mse(ŷ(X,β),y))

rmse (generic function with 1 method)

In [10]:
rmse(β)

71.29025671010615

In [11]:
β = zeros(m)
results = optimize(rmse,β)

 * Status: success

 * Candidate solution
    Final objective value:     1.668570e+00

 * Found with
    Algorithm:     Nelder-Mead

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    225
    f(x) calls:    403


In [12]:
params = Optim.minimizer(results)

4-element Vector{Float64}:
  2.938725446932781
  0.04576475301502275
  0.1885348308546321
 -0.001037439568134145

# Least Sqaures Method

In [None]:
using GLM
ols =lm(@formula(Sales ~ 1 + TV + Radio + Newspaper),advertising)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

Sales ~ 1 + TV + Radio + Newspaper

Coefficients:
────────────────────────────────────────────────────────────────────────────
                   Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────────
(Intercept)   2.93889     0.311908     9.42    <1e-16   2.32376    3.55402
TV            0.0457646   0.0013949   32.81    <1e-80   0.0430137  0.0485156
Radio         0.18853     0.00861123  21.89    <1e-53   0.171547   0.205513
Newspaper    -0.00103749  0.00587101  -0.18    0.8599  -0.012616   0.010541
────────────────────────────────────────────────────────────────────────────

We see that estimates from Empirircal Risk Minimization method and OLS match

# Gradients & Hessians

In [21]:
using Zygote

In [13]:
temp = Matrix{Float64}(undef,n,m);

In [17]:
for i in 1:n
    temp[i,:] =gradient(x->rmse_i(x)(i),params)[1]
end

In [25]:
# this is B matrix
B = temp*temp' ./n

200×200 Matrix{Float64}:
 0.00125     0.00124691  0.00124531  …  0.00124875  0.00124996  0.001249
 0.00124691  0.00125     0.00124983     0.00124959  0.00124617  0.00124942
 0.00124531  0.00124983  0.00125        0.0012489   0.0012444   0.00124864
 0.00124991  0.00124786  0.0012465      0.00124932  0.00124976  0.0012495
 0.00124879  0.00124957  0.00124886     0.00125     0.00124831  0.00124999
 0.00123894  0.00124753  0.00124865  …  0.00124513  0.00123757  0.00124458
 0.00124817  0.00124984  0.00124934     0.00124995  0.00124759  0.00124987
 0.00124892  0.00124948  0.00124873     0.00124999  0.00124847  0.00125
 0.00121354  0.00123159  0.00123493     0.00122574  0.0012111   0.00122454
 0.00124714  0.00125     0.00124977     0.00124967  0.00124642  0.00124952
 0.00124382  0.00124947  0.0012499   …  0.00124813  0.00124279  0.00124779
 0.00124983  0.00124819  0.00124692     0.0012495   0.00124963  0.00124965
 0.00124512  0.0012498   0.00125        0.00124881  0.0012442   0.00124853
 ⋮    

In [35]:
B = temp' * temp /n 

4×4 Matrix{Float64}:
 0.0409334  0.0523974  0.0518317  0.0525829
 0.0523974  0.0700742  0.0692018  0.0703602
 0.0518317  0.0692018  0.0683446  0.0694829
 0.0525829  0.0703602  0.0694829  0.0706478

In [48]:
using CovarianceEstimation
cov(SimpleCovariance(), X)

4×4 Matrix{Float64}:
 0.0     0.0       0.0       0.0
 0.0  7334.1      69.5132  105.39
 0.0    69.5132  219.326   113.924
 0.0   105.39    113.924   471.937

In [40]:
OPG= inv(B) /n
eigvals(OPG)

4-element Vector{Float64}:
 -9.29892995642298e13
  0.03332442227707699
  3.437997727325799
  2.2462862690264528e14

In [64]:
hessian(x->rmse(x)(1),β)

4×4 Matrix{Float64}:
  80.7507   -19.6684  -21.6171   -9.10628
 -19.6684    59.0192  -36.6388  -15.4342
 -21.6171   -36.6388   52.0863  -16.9634
  -9.10628  -15.4342  -16.9634   85.2092

In [24]:
gradient(rmse,params)[1]

4-element Vector{Float64}:
 -2.066524952348131e-5
 -0.00236197067723376
  0.00016014468936864645
 -0.0002808079625291615

In [36]:
using LinearAlgebra
isposdef(B)

false

In [49]:
using StatsBase

In [52]:
standardize(ZScoreTransform, X, dims=1)

200×4 Matrix{Float64}:
 NaN   0.967425    0.979066    1.77449
 NaN  -1.19438     1.0801      0.667903
 NaN  -1.51236     1.52464     1.77908
 NaN   0.0519194   1.21481     1.28319
 NaN   0.393196   -0.839507    1.27859
 NaN  -1.61136     1.7267      2.04081
 NaN  -1.04296     0.642293   -0.323896
 NaN  -0.312652   -0.246787   -0.870303
 NaN  -1.61253    -1.42549    -1.35702
 NaN   0.614501   -1.39181    -0.429504
 NaN  -0.94279    -1.17628    -0.291754
 NaN   0.788051    0.0495729  -1.21927
 NaN  -1.43549     0.797208    1.62297
   ⋮                          
 NaN   1.61853    -0.630708   -1.23304
 NaN  -1.49489    -0.751946   -0.328487
 NaN  -1.25262     1.20134    -1.13662
 NaN  -0.833302   -0.839507   -1.12744
 NaN  -1.51236    -1.29078     0.0480288
 NaN   0.230128    1.26195    -1.23764
 NaN   0.0309536   0.830886   -1.12744
 NaN  -1.26776    -1.31772    -0.769287
 NaN  -0.615491   -1.2369     -1.03101
 NaN   0.348934   -0.940539   -1.10907
 NaN   1.59057     1.26195     1.63674
 