This notebook gives an overview of different ways to estimate a linear model in Julia.

#### Creating data

In [1]:
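x1 = randn(100)
x2 = randn(100)
epsilon = randn(100)
# true model: coefficients 0.4 on x1 and 0.3 on x2, no intercept, error standard deviation 0.4
y = 0.4*x1 + 0.3*x2 + 0.4*epsilon;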

#### Without packages - coefficient estimates only

Without loading any additional packages, we can use the function `linreg` to perform an ordinary least squares estimation. However, `linreg` automatically estimates an intercept as well, which might not always be what you want: the first element of the returned vector is the intercept, followed by the coefficients on `x1` and `x2`. Also, the function does not provide standard errors or t statistics.

In [2]:
betaHat1 = linreg([x1 x2], y)

3-element Array{Float64,1}:
 0.0209388
 0.321078 
 0.287667 
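Julia 1.x no longer provides `linreg`. As a minimal sketch, assuming a current Julia 1.x release and only the standard library, the same intercept-plus-slopes estimate can be obtained with the backslash operator on a design matrix whose first column is all ones. The seed is an arbitrary choice, so the numbers will differ from the notebook's.

```julia
using Random

Random.seed!(1)                      # arbitrary seed; draws differ from the notebook
n = 100
x1, x2, epsilon = randn(n), randn(n), randn(n)
y = 0.4 .* x1 .+ 0.3 .* x2 .+ 0.4 .* epsilon

# prepend a column of ones so that the first coefficient is the intercept,
# mirroring the ordering of linreg's output
X = [ones(n) x1 x2]

# for a tall matrix, `\` returns the least-squares solution
# (computed via a pivoted QR factorization)
betaHat = X \ y
println(betaHat)
```

Dropping the `ones(n)` column from `X` gives the estimate without an intercept.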

#### With package GLM

Alternatively, we can use the functions provided by the package GLM, which offers both the very general framework of generalized linear models (`glm`) and a simplified syntax for linear models (`lm`). In addition, the data does not necessarily need to be stored as a `DataFrame`.

In [3]:
using GLM
using DataFrames

# pass the design matrix and the response vector directly, without a DataFrame
olsFit1 = lm([x1 x2], y)



LinearModel{DensePredQR{Float64}}:

Coefficients:     Estimate Std.Error t value Pr(>|t|)
x1   0.322123 0.0491006 6.56048    <1e-8
x2   0.282875  0.043889 6.44522    <1e-8
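What `lm` adds over `linreg` are the `Std.Error` and `t value` columns. The sketch below shows the textbook OLS formulas behind them, assuming Julia 1.x with only the standard library and simulated data of our own: the error variance is estimated as $\mathrm{RSS}/(n-k)$, the covariance of the estimates as that variance times $(X'X)^{-1}$, and each t value is an estimate divided by its standard error. It mirrors the no-intercept design of `olsFit1`. GLM itself works with a QR factorization (`DensePredQR` in the output), which yields the same quantities up to rounding.

```julia
using LinearAlgebra, Random

Random.seed!(1)                        # arbitrary seed; numbers differ from the notebook
n = 100
x1, x2, epsilon = randn(n), randn(n), randn(n)
y = 0.4 .* x1 .+ 0.3 .* x2 .+ 0.4 .* epsilon

X = [x1 x2]                            # no intercept, as in lm([x1 x2], y)
k = size(X, 2)                         # number of estimated coefficients

betaHat = X \ y                        # OLS estimates
resid = y - X * betaHat                # residuals
sigma2 = sum(abs2, resid) / (n - k)    # error variance estimate, RSS / (n - k)
covBeta = sigma2 * inv(X' * X)         # estimated covariance matrix of betaHat
se = sqrt.(diag(covBeta))              # standard errors
tstat = betaHat ./ se                  # t statistics

for (name, b, s, t) in zip(["x1", "x2"], betaHat, se, tstat)
    println(rpad(name, 4), round(b; digits = 6), "  ", round(s; digits = 6), "  ", round(t; digits = 5))
end
```

For the model with intercept, replace `X` by `[ones(n) x1 x2]`; then `k = 3` and the residual degrees of freedom are `n - 3 = 97`.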



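Since you build the design matrix yourself, you can also include an intercept manually by adding a column of ones. Note that `lm` then labels the coefficients by column position: in the output, `x1` is the intercept, while `x2` and `x3` are the coefficients on the variables `x1` and `x2`.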

In [4]:
olsFit2 = lm([ones(size(x1)) x1 x2], y)

LinearModel{DensePredQR{Float64}}:

Coefficients:      Estimate Std.Error  t value Pr(>|t|)
x1   0.0209388 0.0480059 0.436171   0.6637
x2    0.321078 0.0493628  6.50445    <1e-8
x3    0.287667 0.0454205  6.33342    <1e-8



In addition, there are three ways to estimate the model by specifying an R-style formula. In this case, however, the data needs to be stored as a `DataFrame`. Estimating a linear model with an intercept, we get the following (only the result of the last command, the `glm` fit, is displayed):

In [5]:
# get data as DataFrame
df = DataFrame(y = y, x1 = x1, x2 = x2)

# using function lm
olsFit3 = lm(y ~ x1 + x2, df)

# using function fit
olsFit4 = fit(LinearModel, y ~ x1 + x2, df)

# using function glm
olsFit5 = glm(y ~ x1 + x2, df, Normal(), IdentityLink())

DataFrameRegressionModel{GeneralizedLinearModel,Float64}:

Coefficients:
              Estimate Std.Error  z value Pr(>|z|)
(Intercept)  0.0209388 0.0480059 0.436171   0.6627
x1            0.321078 0.0493628  6.50445   <1e-10
x2            0.287667 0.0454205  6.33342    <1e-9
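The estimates and standard errors from `glm` coincide with those of the `lm` fit with intercept in In [4]; only the reference distribution of the test statistic differs. Reading the two outputs, `lm` compares each ratio with a Student t distribution with $n-k$ degrees of freedom, whereas `glm` with a `Normal()` family compares it with a standard normal distribution (hence "z value"):

$$
t_j = z_j = \frac{\hat\beta_j}{\widehat{\mathrm{se}}(\hat\beta_j)}, \qquad
p_j^{\mathrm{lm}} = 2\left(1 - F_{t_{n-k}}\left(|t_j|\right)\right), \qquad
p_j^{\mathrm{glm}} = 2\left(1 - \Phi\left(|z_j|\right)\right)
$$

Here $n = 100$ and $k = 3$, so $n - k = 97$. For the intercept, $z = 0.436171$ gives $2(1 - \Phi(0.436171)) \approx 0.6627$, matching the `glm` output, while the slightly larger 0.6637 reported by `lm` is consistent with the heavier tails of the $t_{97}$ distribution.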


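To estimate the model without an intercept, add `0 +` to the formula. Again, only the `glm` result is displayed; its estimates coincide with those of `olsFit1`: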

In [6]:
# using function lm
olsFit6 = lm(y ~ 0 + x1 + x2, df)

# using function fit
olsFit7 = fit(LinearModel, y ~ 0 + x1 + x2, df)

# using function glm
olsFit8 = glm(y ~ 0 + x1 + x2, df, Normal(), IdentityLink())

DataFrameRegressionModel{GeneralizedLinearModel,Float64}:

Coefficients:
     Estimate Std.Error z value Pr(>|z|)
x1   0.322123 0.0491006 6.56048   <1e-10
x2   0.282875  0.043889 6.44522    <1e-9
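Conceptually, the formula specifies how the design matrix is assembled from named columns: `y ~ x1 + x2` prepends a column of ones, and `y ~ 0 + x1 + x2` suppresses it. The sketch below illustrates this with a hypothetical helper `build_design` (not part of GLM or DataFrames), assuming Julia 1.x and using a `Dict` of named columns in place of a `DataFrame`. The packages' formula machinery handles more cases than this sketch.

```julia
using Random

# Hypothetical helper (not part of GLM or DataFrames): stack the named columns
# in `terms` into a design matrix, optionally preceded by a column of ones.
function build_design(cols::Dict{Symbol,Vector{Float64}}, terms::Vector{Symbol};
                      intercept::Bool = true)
    n = length(cols[first(terms)])
    blocks = [cols[t] for t in terms]
    intercept && pushfirst!(blocks, ones(n))
    return reduce(hcat, blocks)
end

Random.seed!(1)                      # arbitrary seed; numbers differ from the notebook
n = 100
data = Dict(:x1 => randn(n), :x2 => randn(n))
y = 0.4 .* data[:x1] .+ 0.3 .* data[:x2] .+ 0.4 .* randn(n)

X_int = build_design(data, [:x1, :x2])                        # like y ~ x1 + x2
X_noint = build_design(data, [:x1, :x2]; intercept = false)   # like y ~ 0 + x1 + x2

println("with intercept:    ", X_int \ y)     # (intercept, x1, x2)
println("without intercept: ", X_noint \ y)   # (x1, x2)
```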


## Session info

In [7]:
versioninfo()

Julia Version 0.3.5
Commit a05f87b* (2015-01-08 22:33 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libblas.so.3
  LAPACK: liblapack.so.3
  LIBM: libopenlibm
  LLVM: libLLVM-3.3


In [8]:
Pkg.status()

18 required packages:
 - DataArrays                    0.2.9
 - DataFrames                    0.6.0
 - Dates                         0.3.2
 - Debug                         0.0.4
 - Distributions                 0.6.3
 - EconDatasets                  0.0.2
 - GLM                           0.4.2
 - Gadfly                        0.3.10
 - IJulia                        0.1.16
 - JuMP                          0.7.3
 - MAT                           0.2.9
 - NLopt                         0.2.0
 - Quandl                        0.4.0
 - RDatasets                     0.1.1
 - Taro                          0.1.2
 - TimeData                      0.5.1
 - TimeSeries                    0.4.6
 - Winston                       0.11.7
56 additional packages:
 - ArrayViews                    0.4.8
 - BinDeps                       0.3.7
 - Blosc                         0.1.1
 - Cairo                         0.2.22
 - Calculus                      0.1.5
 - Codecs                        0.1.3
 - Color      