# __Early Regression in Julia__

<br>
Tyler J. Brough <br>
Last Update: March 8, 2021 <br>
<br>
<br>

There are multiple ways to perform statistical regression in Julia. In this notebook I will demonstrate two different approaches. 

The first approach uses the `llsq` method from the `MultivariateStats` module. The second uses the `GLM` module.

<br>

__Note:__ linear regression can be specified as an optimization problem; specifically, a minimization problem. 

<br>



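We will formulate least squares in the following way, where $X$ is the matrix of predictors, $y$ is the response vector, $\beta$ is the vector of coefficients, and $\alpha$ is the intercept (bias) term: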

<br>

$$
\arg\min_{(\alpha, \beta)} \left( \frac{1}{2} \lVert y - (X \beta + \alpha) \rVert^{2} \right)
$$
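
<br>

To make the optimization view concrete, here is a minimal sketch that solves the same problem with Julia's built-in backslash operator instead of `llsq`. It writes the model with the intercept added to $X \beta$. The seed, the ground-truth values `beta_true` and `alpha_true`, and the helper `objective` are illustrative assumptions of mine, not part of the notebook's data.

```julia
using LinearAlgebra, Random

Random.seed!(42)                  # arbitrary seed, only for reproducibility

n, p = 1000, 3
X = rand(n, p)                    # simulated predictors
beta_true = [0.2, 0.4, 0.1]       # hypothetical ground-truth coefficients
alpha_true = 0.5                  # hypothetical ground-truth intercept
y = X * beta_true .+ alpha_true .+ 0.1 .* randn(n)

# Append a column of ones so the intercept is estimated with the slopes.
Xa = hcat(X, ones(n))

# For a tall matrix, backslash returns the least-squares solution.
theta = Xa \ y
beta_hat, alpha_hat = theta[1:p], theta[end]

# The objective being minimized: half the squared norm of the residual.
objective(b, a) = 0.5 * sum(abs2, y .- (X * b .+ a))

# At the minimizer the gradient vanishes, so Xa' * residual is numerically zero.
grad_norm = norm(Xa' * (y .- Xa * theta))
```

Because `theta` minimizes the objective, `objective(beta_hat, alpha_hat)` cannot exceed `objective(beta_true, alpha_true)`, even though the true parameters generated the data.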

<br>

## __Using `MultivariateStats`__

The module `MultivariateStats` provides the `llsq` method to solve these types of problems. 

<br>

Let's see it in action.

<br>

In [1]:
using StatsKit.MultivariateStats
using StatsKit.StatsBase

In [2]:
## prepare data

X = rand(1000, 3)  # simulate artificial data for 3 variables
beta = rand(3)     # simulate ground truth for the coefficients (you could also hand code this)

3-element Array{Float64,1}:
 0.1637950114777329
 0.3720251828804908
 0.0983288557173645

In [3]:
# generate the response variable given the model equation

u = 0.1 * randn(1000);
y = X * beta + u;      # suppress output

In [4]:
## solve the least squares problem 

betahat = llsq(X, y; bias=false) # bias=false: fit without an intercept term (I'll explain later)

3-element Array{Float64,1}:
 0.16236693186531756
 0.3946265935693638
 0.09170202436572411

In [5]:
## get the predicted response values
yhat = X * betahat;

In [6]:
## measure the error with rmse

rmse = sqrt(mean(abs2.(y - yhat)))
print("rmse = $rmse")

rmse = 0.09950565454516187
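
The RMSE reported above is close to 0.1, the standard deviation of the noise `u` added in `In [3]`. That is what we expect when the fitted model matches the one that generated the data. The sketch below repeats the check on freshly simulated data using only the standard library. The seed, the coefficient values, and the helper name `rmse_of` are assumptions made for illustration, and backslash stands in for `llsq`.

```julia
using Random, Statistics

Random.seed!(1)                   # arbitrary seed, only for reproducibility

n = 1000
sigma = 0.1                       # noise standard deviation, as in In [3]
X = rand(n, 3)
beta = [0.16, 0.37, 0.10]         # hypothetical coefficients
y = X * beta .+ sigma .* randn(n)

# Least-squares fit without an intercept (backslash in place of llsq).
betahat = X \ y
yhat = X * betahat

# Root-mean-square error between observed and predicted responses.
rmse_of(y, yhat) = sqrt(mean(abs2, y .- yhat))

err = rmse_of(y, yhat)
println("rmse = $err, noise sd = $sigma")
```

With 1000 observations, the RMSE should land within a few percent of `sigma`.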

Let's check the documentation on `llsq`:

In [20]:
?MultivariateStats.llsq

No documentation found.

`MultivariateStats.llsq` is a `Function`.

```
# 1 method for generic function "llsq":
[1] llsq(X::AbstractArray{T,2}, Y::Union{AbstractArray{T,1}, AbstractArray{T,2}}; trans, bias) where T<:Real in MultivariateStats at /Users/tjb/.julia/packages/MultivariateStats/BYMwD/src/lreg.jl:22
```


<br>

See here for more: https://multivariatestatsjl.readthedocs.io/en/stable/lreg.html#llsq

<br>

## __Using `GLM`__