# Logistic Regression

In [12]:
using MLJ
using RDatasets
using DataFrames
using CategoricalArrays
using Gadfly
import StatsBase: countmap, cor, var

In [2]:
sMarket = dataset("ISLR","Smarket")

Row,Year,Lag1,Lag2,Lag3,Lag4,Lag5,Volume,Today,Direction
,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Cat…
1,2001.0,0.381,-0.192,-2.624,-1.055,5.01,1.1913,0.959,Up
2,2001.0,0.959,0.381,-0.192,-2.624,-1.055,1.2965,1.032,Up
3,2001.0,1.032,0.959,0.381,-0.192,-2.624,1.4112,-0.623,Down
4,2001.0,-0.623,1.032,0.959,0.381,-0.192,1.276,0.614,Up
5,2001.0,0.614,-0.623,1.032,0.959,0.381,1.2057,0.213,Up
6,2001.0,0.213,0.614,-0.623,1.032,0.959,1.3491,1.392,Up
7,2001.0,1.392,0.213,0.614,-0.623,1.032,1.445,-0.403,Down
8,2001.0,-0.403,1.392,0.213,0.614,-0.623,1.4078,0.027,Up
9,2001.0,0.027,-0.403,1.392,0.213,0.614,1.164,1.303,Up
10,2001.0,1.303,0.027,-0.403,1.392,0.213,1.2326,0.287,Up


In [7]:
describe(sMarket, :mean, :std, :eltype)

Row,variable,mean,std,eltype
,Symbol,Union…,Union…,DataType
1,Year,2003.02,1.40902,Float64
2,Lag1,0.0038344,1.1363,Float64
3,Lag2,0.0039192,1.13628,Float64
4,Lag3,0.001716,1.1387,Float64
5,Lag4,0.001636,1.13877,Float64
6,Lag5,0.0056096,1.14755,Float64
7,Volume,1.47831,0.360357,Float64
8,Today,0.0031384,1.13633,Float64
9,Direction,,,"CategoricalValue{String, UInt8}"
10,DirectionInt,0.5184,0.499861,Int64


In [9]:
y = sMarket.Direction
X = select(sMarket, Not(:Direction)); # all columns except Direction

To inspect the pairwise correlations between the predictors, convert the data frame to a matrix and apply `cor`.

In [13]:
cm = X |> Matrix |> cor
round.(cm, sigdigits=3)

9×9 Matrix{Float64}:
 1.0      0.0297    0.0306    0.0332   …   0.539    0.0301    0.0746
 0.0297   1.0      -0.0263   -0.0108       0.0409  -0.0262   -0.0398
 0.0306  -0.0263    1.0      -0.0259      -0.0434  -0.0103   -0.0241
 0.0332  -0.0108   -0.0259    1.0         -0.0418  -0.00245   0.00613
 0.0357  -0.00299  -0.0109   -0.0241      -0.0484  -0.0069    0.00422
 0.0298  -0.00567  -0.00356  -0.0188   …  -0.022   -0.0349    0.00542
 0.539    0.0409   -0.0434   -0.0418       1.0      0.0146    0.023
 0.0301  -0.0262   -0.0103   -0.00245      0.0146   1.0       0.731
 0.0746  -0.0398   -0.0241    0.00613      0.023    0.731     1.0
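
In the matrix printed above, the largest off-diagonal entries are 0.731 (columns 8 and 9) and 0.539 (columns 1 and 7). Going by the column order in the `describe` output, these would be Today/DirectionInt and Year/Volume. Scanning a correlation matrix for its strongest pair can be sketched as follows; the toy matrix `A` and the helper `strongest_pair` are hypothetical and use only the `Statistics` standard library.

```julia
using Statistics

# Toy data: rows are observations, columns are variables (made-up values).
A = [1.0 2.0 0.5;
     2.0 4.1 0.1;
     3.0 6.2 0.9;
     4.0 7.9 0.3]

C = cor(A)  # pairwise Pearson correlations between the columns of A

# Return (r, i, j) for the off-diagonal entry with the largest absolute value.
function strongest_pair(C::AbstractMatrix)
    best = (0.0, 0, 0)
    n = size(C, 1)
    for i in 1:n-1, j in i+1:n
        if abs(C[i, j]) > abs(best[1])
            best = (C[i, j], i, j)
        end
    end
    return best
end

r, i, j = strongest_pair(C)
println("strongest pair: columns $i and $j, r = ", round(r, sigdigits=3))
```

Only the upper triangle is scanned because a correlation matrix is symmetric with ones on the diagonal.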

The target needs to be coerced to an ordered factor, so that Up is treated as the positive class and Down as the negative class.

In [14]:
y = coerce(y, OrderedFactor) #Converts the type of y
typeof(y)

CategoricalVector{String, UInt8, String, CategoricalValue{String, UInt8}, Union{}} (alias for CategoricalArray{String, 1, UInt8, String, CategoricalValue{String, UInt8}, Union{}})

In [15]:
classes(y[1])

2-element CategoricalArray{String,1,UInt8}:
 "Down"
 "Up"

In [16]:
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels

[ Info: For silent loading, specify `verbosity=0`.
import MLJLinearModels ✔

MLJLinearModels.LogisticClassifier

In [17]:
X2 = select(X, Not([:Year, :Today]))
classif = machine(LogisticClassifier(), X2, y)

│ supports. Suppress this type check by specifying `scitype_check_level=0`.
│
│ Run `@doc MLJLinearModels.LogisticClassifier` to learn more about your model's requirements.
│
│ Commonly, but non exclusively, supervised models are constructed using the syntax
│ `machine(model, X, y)` or `machine(model, X, y, w)` while most other models are
│ constructed with `machine(model, X)`.  Here `X` are features, `y` a target, and `w`
│ sample or class weights.
│
│ In general, data in `machine(model, data...)` is expected to satisfy
│
│     scitype(data) <: MLJ.fit_data_scitype(model)
│
│ In the present case:
│
│ scitype(data) = Tuple{Table{Union{AbstractVector{Continuous}, AbstractVector{Coun

untrained Machine; caches model-specific representations of data
  model: LogisticClassifier(lambda = 2.220446049250313e-16, …)
  args: 
    1:	Source @023 ⏎ Table{Union{AbstractVector{Continuous}, AbstractVector{Count}}}
    2:	Source @971 ⏎ AbstractVector{OrderedFactor{2}}
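
A logistic classifier models the probability of the positive class by passing a linear score through the logistic (sigmoid) function. The sketch below illustrates that mapping for a single observation. The intercept `β0` and coefficients `β` are hypothetical placeholders, not values fitted by the machine above, and `prob_up` is a name introduced only for this illustration.

```julia
# Logistic (sigmoid) function: maps a real-valued score to a probability in (0, 1).
sigmoid(z) = 1 / (1 + exp(-z))

# Hypothetical intercept and coefficients for two lag features (not fitted values).
β0 = 0.1
β  = [-0.07, -0.04]

# P(Up | x) for one observation x: sigmoid of the linear score β0 + β'x.
prob_up(x) = sigmoid(β0 + sum(β .* x))

x = [0.381, -0.192]  # Lag1 and Lag2 from the first row of the data shown earlier
println(prob_up(x))
```

A score of zero corresponds to a probability of exactly 0.5, which is why 0.5 is the natural default cutoff between the two classes.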


In [18]:
fit!(classif)
ŷ = MLJ.predict(classif, X2)

[ Info: Training machine(LogisticClassifier(lambda = 2.220446049250313e-16, …), …).
[ Info: Solver: MLJLinearModels.LBFGS()


1250-element CategoricalDistributions.UnivariateFiniteVector{OrderedFactor{2}, String, UInt8, Float64}:
 UnivariateFinite{OrderedFactor{2}}(Down=>0.0, Up=>1.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>0.0, Up=>1.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>1.0, Up=>0.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>0.0, Up=>1.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>0.0, Up=>1.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>0.0, Up=>1.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>1.0, Up=>0.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>0.0, Up=>1.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>0.0, Up=>1.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>0.0, Up=>1.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>1.0, Up=>0.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>1.0, Up=>0.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>0.0, Up=>1.0)
 ⋮
 UnivariateFinite{OrderedFactor{2}}(Down=>0.0, Up=>1.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>1.0, Up=>0.0)
 UnivariateFinite{OrderedFactor{2}}(Down=>1.0, Up=>0.
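
For a probabilistic classifier, `predict` returns one distribution per observation, as in the output above. In MLJ, `predict_mode` returns the most likely class instead. The underlying idea, thresholding the probability of Up at 0.5 and comparing with the observed labels, can be sketched in plain Julia; the probabilities and labels below are made up for illustration.

```julia
using Statistics

# Hypothetical predicted probabilities of "Up" and the observed directions.
p_up   = [0.52, 0.48, 0.61, 0.45, 0.50]
actual = ["Up", "Down", "Down", "Down", "Up"]

# Assign "Up" when P(Up) is at least 0.5, otherwise "Down".
predicted = [p >= 0.5 ? "Up" : "Down" for p in p_up]

# Fraction of observations where the predicted label differs from the actual one.
misclassification = mean(predicted .!= actual)
println(misclassification)
```

The choice of `>=` rather than `>` at the 0.5 boundary is arbitrary here; a tie could be resolved either way.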

In [19]:
r3(x) = round(x, sigdigits=3)  # helper: round to three significant digits
cross_entropy(ŷ, y) |> mean |> r3
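
The cross-entropy (log loss) of a single prediction is the negative log of the probability assigned to the class that actually occurred, and the cell above averages it over all observations. A minimal by-hand version, using made-up probabilities and labels and only the `Statistics` standard library, might look like this:

```julia
using Statistics

# Hypothetical predicted probabilities of "Up" and the observed labels.
p_up   = [0.9, 0.6, 0.2, 0.5]
labels = ["Up", "Up", "Down", "Down"]

# Probability the model assigned to the class that was actually observed.
p_true = [l == "Up" ? p : 1 - p for (p, l) in zip(p_up, labels)]

# Per-observation cross-entropy, then its mean rounded to three significant digits.
losses = -log.(p_true)
println(round(mean(losses), sigdigits=3))  # prints 0.383
```

Confident correct predictions contribute losses near zero, while a confident wrong prediction is penalized heavily because the log of a small probability is large in magnitude.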