# MULTIVARIABLE LINEAR REGRESSION
---

```julia
VERSION # -> v"1.11.1"
```

In [27]:
cd(@__DIR__)

In [28]:
using Pkg; pkg"activate .."

  Activating project at `~/Work/git-repos/AI-ML-DL/jlai/Codes/Julia/Part-2`


Import libraries

In [29]:
using CSV, DataFrames
using MLJ

Load data from CSV file

In [30]:
df = CSV.read("../../Datasets/50_Startups.csv", DataFrame)
schema(df)

┌─────────────────┬────────────┬──────────┐
│ names           │ scitypes   │ types    │
├─────────────────┼────────────┼──────────┤
│ R&D Spend       │ Continuous │ Float64  │
│ Administration  │ Continuous │ Float64  │
│ Marketing Spend │ Continuous │ Float64  │
│ State           │ Textual    │ String15 │
│ Profit          │ Continuous │ Float64  │
└─────────────────┴────────────┴──────────┘


Design the features

In [31]:
X = df[!, 1:4]
colnames = ["rd", "admin", "spend", "state"]
rename!(X, Symbol.(colnames))
coerce!(X, :state => Multiclass)

| Row | rd | admin | spend | state |
|---|---|---|---|---|
|  | Float64 | Float64 | Float64 | Cat… |
| 1 | 1.65349e5 | 1.36898e5 | 4.71784e5 | New York |
| 2 | 1.62598e5 | 1.51378e5 | 4.43899e5 | California |
| 3 | 1.53442e5 | 1.01146e5 | 4.07935e5 | Florida |
| 4 | 1.44372e5 | 1.18672e5 | 3.832e5 | New York |
| 5 | 1.42107e5 | 91391.8 | 3.66168e5 | Florida |
| 6 | 1.31877e5 | 99814.7 | 3.62861e5 | New York |
| 7 | 1.34615e5 | 1.47199e5 | 1.27717e5 | California |
| 8 | 1.30298e5 | 1.4553e5 | 3.23877e5 | Florida |
| 9 | 1.20543e5 | 148719.0 | 3.11613e5 | New York |
| 10 | 1.23335e5 | 1.08679e5 | 3.04982e5 | California |


Encoding the state column

In [32]:
ce = ContinuousEncoder()
X = machine(ce, X) |> fit! |> MLJ.transform

[ Info: Training machine(ContinuousEncoder(drop_last = false, …), …).


| Row | rd | admin | spend | state__California | state__Florida | state__New York |
|---|---|---|---|---|---|---|
|  | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 1.65349e5 | 1.36898e5 | 4.71784e5 | 0.0 | 0.0 | 1.0 |
| 2 | 1.62598e5 | 1.51378e5 | 4.43899e5 | 1.0 | 0.0 | 0.0 |
| 3 | 1.53442e5 | 1.01146e5 | 4.07935e5 | 0.0 | 1.0 | 0.0 |
| 4 | 1.44372e5 | 1.18672e5 | 3.832e5 | 0.0 | 0.0 | 1.0 |
| 5 | 1.42107e5 | 91391.8 | 3.66168e5 | 0.0 | 1.0 | 0.0 |
| 6 | 1.31877e5 | 99814.7 | 3.62861e5 | 0.0 | 0.0 | 1.0 |
| 7 | 1.34615e5 | 1.47199e5 | 1.27717e5 | 1.0 | 0.0 | 0.0 |
| 8 | 1.30298e5 | 1.4553e5 | 3.23877e5 | 0.0 | 1.0 | 0.0 |
| 9 | 1.20543e5 | 148719.0 | 3.11613e5 | 0.0 | 0.0 | 1.0 |
| 10 | 1.23335e5 | 1.08679e5 | 3.04982e5 | 1.0 | 0.0 | 0.0 |
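
The encoding step above replaces `state` with one 0/1 column per level. A minimal plain-Julia sketch of that idea follows, for illustration only: the helper `onehot` is hypothetical, is not part of MLJ, and is not meant to reflect how `ContinuousEncoder` is implemented.

```julia
# Hypothetical helper: one-hot encode a vector of labels.
# Returns the sorted levels and an n × k matrix of 0.0/1.0 indicators.
function onehot(labels::AbstractVector{<:AbstractString})
    levels = sort(unique(labels))
    M = zeros(Float64, length(labels), length(levels))
    for (i, label) in enumerate(labels)
        j = findfirst(==(label), levels)   # column index of this label
        M[i, j] = 1.0
    end
    return levels, M
end

states = ["New York", "California", "Florida", "New York"]
levels, M = onehot(states)
# levels == ["California", "Florida", "New York"]
# M[1, :] == [0.0, 0.0, 1.0]
```

Each row's indicators sum to 1, so keeping all three columns alongside an intercept makes the design matrix collinear. The training log shows `drop_last = false`; setting `drop_last = true` on the encoder is one way to avoid this.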


Extract target vector

In [33]:
y = df.Profit
first(y, 5)

5-element Vector{Float64}:
 192261.83
 191792.06
 191050.39
 182901.99
 166187.94

Split the data into training and test sets

In [34]:
train, test = partition(eachindex(y), 0.8, shuffle=true, rng=123)
Xtrain, Xtest = X[train, :], X[test, :]
ytrain, ytest = y[train], y[test]

([141585.52, 192261.83, 81005.76, 156991.12, 96778.92, 69758.98, 78239.91, 96712.8, 14681.4, 125370.37  …  134307.35, 182901.99, 129917.04, 71498.49, 77798.83, 191050.39, 99937.59, 108552.04, 42559.73, 132602.65], [166187.94, 35673.41, 105008.31, 107404.34, 126992.93, 118474.03, 105733.54, 124266.9, 146121.95, 96479.51])
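
The split above shuffles the row indices and keeps 80% for training. A rough sketch of the same idea using only the standard library is shown here; `shuffled_split` is a hypothetical helper, and it will not reproduce the exact indices that `partition` returns for `rng=123`.

```julia
using Random

# Hypothetical helper: shuffle 1:n with a seeded RNG, then cut at `fraction`.
function shuffled_split(n::Integer, fraction::Real; seed::Integer=123)
    idx = shuffle(MersenneTwister(seed), collect(1:n))
    ntrain = round(Int, fraction * n)
    return idx[1:ntrain], idx[ntrain+1:end]
end

train_idx, test_idx = shuffled_split(50, 0.8)
# length(train_idx) == 40 and length(test_idx) == 10
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing several models on the same test rows.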

Load & instantiate the linear regression model

In [35]:
LR = @load LinearRegressor pkg=MLJLinearModels
lr_ = LR()

import MLJLinearModels ✔


[ Info: For silent loading, specify `verbosity=0`.


LinearRegressor(
  fit_intercept = true, 
  solver = nothing)

You may want to see [MLJLinearModels.jl](https://github.com/JuliaAI/MLJLinearModels.jl) and the unwrapped model type `MLJLinearModels.LinearRegressor`.

Train & fit

In [36]:
lr = machine(lr_, Xtrain, ytrain) |> fit!

[ Info: Training machine(LinearRegressor(fit_intercept = true, …), …).
┌ Info: Solver: MLJLinearModels.Analytical
│   iterative: Bool false
└   max_inner: Int64 200


trained Machine; caches model-specific representations of data
  model: LinearRegressor(fit_intercept = true, …)
  args: 
    1:	Source @076 ⏎ Table{AbstractVector{Continuous}}
    2:	Source @486 ⏎ AbstractVector{Continuous}


In [37]:
println("Params of fitted model are $(fitted_params(lr))")

Params of fitted model are (coefs = [:rd => 0.8067542137953414, :admin => -0.059568121802828326, :spend => 0.022762283966771147, :state__California => 12083.23505192103, :state__Florida => 14411.229064864907, Symbol("state__New York") => 14895.729389423832], intercept = 41390.193506209755)
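
The fitted parameters above are a coefficient per feature plus an intercept. As a sketch of what a least-squares fit computes, the following uses plain Julia on toy data; `ols` is a hypothetical helper and not the solver MLJLinearModels uses internally.

```julia
using LinearAlgebra

# Ordinary least squares with an intercept, solved via `\`
# (which returns the least-squares solution for a tall matrix).
# Returns (coefs, intercept).
function ols(X::AbstractMatrix, y::AbstractVector)
    A = hcat(X, ones(size(X, 1)))   # append a column of ones for the intercept
    θ = A \ y
    return θ[1:end-1], θ[end]
end

# Toy data generated exactly from y = 2*x1 - 3*x2 + 5 (no noise)
X = [1.0 2.0; 2.0 0.0; 3.0 1.0; 4.0 5.0; 0.0 1.0]
y = 2 .* X[:, 1] .- 3 .* X[:, 2] .+ 5
coefs, intercept = ols(X, y)
# coefs ≈ [2.0, -3.0] and intercept ≈ 5.0

# A prediction is the intercept plus the dot product with the coefficients
yhat = X * coefs .+ intercept
```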


Prediction

In [38]:
yhat_lr = predict(lr, Xtest)

10-element Vector{Float64}:
 173337.91139588977
  53640.98175555933
 114881.65212932485
 100469.38053487468
 115184.61619580188
 115006.97571245683
 111038.68201031166
 129732.24964730153
 136648.57250477604
  93019.86204121861

Results & metrics

In [39]:
println("Error is $(sum((yhat_lr .- ytest).^2) ./ length(ytest))")

Error is 8.307176373525837e7
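
The quantity printed as "Error" is the mean squared error (MSE) on the test set, so it is expressed in squared units of `Profit`. The sketch below writes out the usual metrics explicitly; the function names `mse`, `rmse` and `mae` are defined here for illustration and are not the MLJ measures.

```julia
using Statistics

# Common regression metrics written out explicitly.
mse(yhat, y)  = mean(abs2, yhat .- y)   # mean squared error
rmse(yhat, y) = sqrt(mse(yhat, y))      # root mean squared error
mae(yhat, y)  = mean(abs, yhat .- y)    # mean absolute error

yhat  = [2.0, 4.0, 6.0]
ytrue = [1.0, 4.0, 8.0]
mse(yhat, ytrue)   # (1 + 0 + 4) / 3
mae(yhat, ytrue)   # (1 + 0 + 2) / 3

# RMSE corresponding to the test MSE printed above
sqrt(8.307176373525837e7)   # roughly 9.1e3, in the same units as Profit
```

Taking the square root makes the error directly comparable to the profit values themselves, which are on the order of 1e5.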


Using `MLJ` Built-in Methods for Evaluation

In [40]:
MLJ.evaluate!(lr, measure=[l1, l2, rms])

PerformanceEvaluation object with these fields:
  model, measure, operation,
  measurement, per_fold, per_observation,
  fitted_params_per_fold, report_per_fold,
  train_test_rows, resampling, repeats
Extract:
┌───┬────────────────────────┬───────────┬─────────────┐
│   │ measure                │ operation │ measurement │
├───┼────────────────────────┼───────────┼─────────────┤
│ A │ LPLoss(                │ predict   │ 6930.0      │
│   │   p = 1)               │           │             │
│ B │ LPLoss(                │ predict   │ 9.7e7       │
│   │   p = 2)               │           │             │
│ C │ RootMeanSquaredError() │ predict   │ 9850.0      │
└───┴────────────────────────┴───────────┴─────────────┘
┌───┬───────────────────────────────────────────────────┬─────────┐
│   │ per_fold                                          │ 1.96*SE │
├───┼───────────────────────────────────────────────────┼─────────┤
│ A │ [598

### RIDGE REGRESSOR

Load Ridge Regressor

In [41]:
RIDGE = @load RidgeRegressor pkg=MLJLinearModels
ridge_ = RIDGE()

import MLJLinearModels ✔


[ Info: For silent loading, specify `verbosity=0`.


RidgeRegressor(
  lambda = 1.0, 
  fit_intercept = true, 
  penalize_intercept = false, 
  scale_penalty_with_samples = true, 
  solver = nothing)

Train & fit the model

In [42]:
ridge = machine(ridge_, Xtrain, ytrain) |> fit!

[ Info: Training machine(RidgeRegressor(lambda = 1.0, …), …).
┌ Info: Solver: MLJLinearModels.Analytical
│   iterative: Bool false
└   max_inner: Int64 200


trained Machine; caches model-specific representations of data
  model: RidgeRegressor(lambda = 1.0, …)
  args: 
    1:	Source @558 ⏎ Table{AbstractVector{Continuous}}
    2:	Source @353 ⏎ AbstractVector{Continuous}


Evaluate the model

In [43]:
yhat_ridge = predict(ridge, Xtest)

10-element Vector{Float64}:
 172909.65191869333
  52126.327416694454
 114294.63957730228
 101709.69584437693
 116533.43856843302
 116372.01940383684
 110517.84747695309
 129314.14733884853
 136106.22660861057
  91989.37995523964

In [44]:
println("Error is $(sum((yhat_ridge .- ytest).^2) ./ length(ytest))")

Error is 7.172044878745562e7
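
Ridge regression adds an L2 penalty on the coefficients to the least-squares loss. The sketch below solves it in closed form on toy data. It assumes the objective ‖Xθ + b − y‖²/2 + n·λ·‖θ‖²/2 with an unpenalized intercept b, which is what the `scale_penalty_with_samples = true` and `penalize_intercept = false` settings shown above suggest; check the MLJLinearModels documentation for the exact definition. `ridge_fit` is a hypothetical helper.

```julia
using LinearAlgebra

# Ridge regression with an unpenalized intercept, in closed form.
# Assumed objective: ‖Xθ + b - y‖²/2 + n*λ*‖θ‖²/2
function ridge_fit(X::AbstractMatrix, y::AbstractVector, λ::Real)
    n, p = size(X)
    A = hcat(X, ones(n))                               # last column is the intercept
    P = Diagonal(vcat(fill(float(n * λ), p), 0.0))     # zero penalty on the intercept
    θ = (A' * A + P) \ (A' * y)
    return θ[1:p], θ[end]
end

# Toy data generated exactly from y = 2*x1 - 3*x2 + 5
X = [1.0 2.0; 2.0 0.0; 3.0 1.0; 4.0 5.0; 0.0 1.0]
y = 2 .* X[:, 1] .- 3 .* X[:, 2] .+ 5

c0, b0 = ridge_fit(X, y, 0.0)   # λ = 0 recovers ordinary least squares
c1, b1 = ridge_fit(X, y, 1.0)   # λ > 0 shrinks the coefficients toward zero
# norm(c1) < norm(c0)
```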


### LASSO REGRESSOR

Load Lasso Regressor

In [45]:
LASSO = @load LassoRegressor pkg=MLJLinearModels
lasso_ = LASSO()

import MLJLinearModels ✔


[ Info: For silent loading, specify `verbosity=0`.


LassoRegressor(
  lambda = 1.0, 
  fit_intercept = true, 
  penalize_intercept = false, 
  scale_penalty_with_samples = true, 
  solver = nothing)

Train & fit the model

In [46]:
lasso = machine(lasso_, Xtrain, ytrain) |> fit!

[ Info: Training machine(LassoRegressor(lambda = 1.0, …), …).
┌ Info: Solver: MLJLinearModels.ProxGrad
│   accel: Bool true
│   max_iter: Int64 1000
│   tol: Float64 0.0001
│   max_inner: Int64 100
│   beta: Float64 0.8
└   gram: Bool false
└ @ MLJLinearModels ~/.julia/packages/MLJLinearModels/yYgtO/src/fit/proxgrad.jl:59
└ @ MLJLinearModels ~/.julia/packages/MLJLinearModels/yYgtO/src/fit/proxgrad.jl:73


trained Machine; caches model-specific representations of data
  model: LassoRegressor(lambda = 1.0, …)
  args: 
    1:	Source @989 ⏎ Table{AbstractVector{Continuous}}
    2:	Source @140 ⏎ AbstractVector{Continuous}


Evaluate the model

In [47]:
yhat_lasso = predict(lasso, Xtest)

10-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

In [48]:
println("Error is $(sum((yhat_lasso .- ytest).^2) ./ length(ytest))")

Error is 1.388985250949734e10
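
Every prediction above is `0.0`, so this error is not a meaningful measure of a Lasso fit: it is the MSE of predicting zero for every row. Together with the solver warnings in the training log, this suggests the fit did not make progress from its starting point. A quick check, using the test targets printed in the split output earlier:

```julia
using Statistics

# Test targets copied from the split output shown earlier
ytest = [166187.94, 35673.41, 105008.31, 107404.34, 126992.93,
         118474.03, 105733.54, 124266.9, 146121.95, 96479.51]

# MSE of a model that predicts 0.0 for every row
mse_zero = mean(abs2, zeros(length(ytest)) .- ytest)

# Compare with the error printed for the Lasso model
reported = 1.388985250949734e10
isapprox(mse_zero, reported; rtol=1e-6)   # true for the values above
```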


### ELASTIC NET REGRESSOR

Load Elastic Net Regressor

In [49]:
EN = @load ElasticNetRegressor pkg=MLJLinearModels
en_ = EN(lambda=0.2)

import MLJLinearModels ✔


[ Info: For silent loading, specify `verbosity=0`.


ElasticNetRegressor(
  lambda = 0.2, 
  gamma = 0.0, 
  fit_intercept = true, 
  penalize_intercept = false, 
  scale_penalty_with_samples = true, 
  solver = nothing)

Train & fit the model

In [50]:
en = machine(en_, Xtrain, ytrain) |> fit!

[ Info: Training machine(ElasticNetRegressor(lambda = 0.2, …), …).
┌ Info: Solver: MLJLinearModels.ProxGrad
│   accel: Bool true
│   max_iter: Int64 1000
│   tol: Float64 0.0001
│   max_inner: Int64 100
│   beta: Float64 0.8
└   gram: Bool false
└ @ MLJLinearModels ~/.julia/packages/MLJLinearModels/yYgtO/src/fit/proxgrad.jl:59
└ @ MLJLinearModels ~/.julia/packages/MLJLinearModels/yYgtO/src/fit/proxgrad.jl:73


trained Machine; caches model-specific representations of data
  model: ElasticNetRegressor(lambda = 0.2, …)
  args: 
    1:	Source @149 ⏎ Table{AbstractVector{Continuous}}
    2:	Source @338 ⏎ AbstractVector{Continuous}


Evaluate the model

In [51]:
yhat_en = predict(en, Xtest)

10-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

In [52]:
println("Error is $(sum((yhat_en .- ytest).^2) ./ length(ytest))")

Error is 1.388985250949734e10
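
The features in this dataset sit on very different scales: the spending columns are around 1e5 while the state indicators are 0 or 1. Gradient-based solvers such as `ProxGrad` can struggle in that situation, and one commonly suggested remedy is to standardize the features using training-set statistics before fitting. The sketch below shows the idea in plain Julia. `fit_standardizer` is a hypothetical helper, and whether scaling resolves the warnings in this notebook is an assumption that has not been tested here. MLJ also provides a `Standardizer` model that can be composed with a regressor.

```julia
using Statistics

# Hypothetical helper: learn column means and standard deviations on the
# training matrix and return a function applying that scaling to any matrix.
function fit_standardizer(Xtrain::AbstractMatrix)
    μ = mean(Xtrain, dims=1)
    σ = std(Xtrain, dims=1)
    σ = map(s -> iszero(s) ? one(s) : s, σ)   # guard against constant columns
    return X -> (X .- μ) ./ σ
end

Xtr = [1.0 100.0; 2.0 300.0; 3.0 500.0]
scale = fit_standardizer(Xtr)
Z = scale(Xtr)
# each column of Z has mean ≈ 0 and standard deviation ≈ 1
```

The same `scale` function should be applied to the test matrix, so that no test-set statistics leak into training.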
