Logistic regression fails if y is a string of vectors #62

Open
biona001 opened this issue May 11, 2022 · 1 comment

Comments

@biona001

From README:

For logistic models, y is either a string vector or a m x 2 matrix

But the following doesn't work:

using GLMNet
y = ["M", "B", "M", "B"]
X = rand(4, 10)
glmnet(X, y, Binomial())

MethodError: no method matching glmnet(::Matrix{Float64}, ::Vector{String}, ::Binomial{Float64})
Closest candidates are:
  glmnet(::AbstractMatrix{T} where T, ::AbstractVector{T} where T, ::AbstractVector{T} where T) at /home/users/bbchu/.julia/packages/GLMNet/C8WKF/src/CoxNet.jl:151
  glmnet(::AbstractMatrix{T} where T, ::AbstractVector{T} where T, ::AbstractVector{T} where T, ::CoxPH; kw...) at /home/users/bbchu/.julia/packages/GLMNet/C8WKF/src/CoxNet.jl:151
  glmnet(::Matrix{Float64}, ::Vector{Float64}, ::Distribution; kw...) at /home/users/bbchu/.julia/packages/GLMNet/C8WKF/src/GLMNet.jl:485
  ...

Fortunately, if y is a matrix with 2 columns, it does work:

y = [1 0; 0 1; 0 1; 1 0]
X = rand(4, 10)
glmnet(X, y, Binomial())

Logistic GLMNet Solution Path (100 solutions for 10 predictors in 833 passes):
────────────────────────────────
       df    pct_dev           λ
────────────────────────────────
  [1]   0  0.0        0.476672
  [2]   1  0.0582906  0.455006
  [3]   1  0.11166    0.434325
  [4]   1  0.160737   0.414585
  [5]   1  0.206039   0.395741
  [6]   1  0.248      0.377754
  [7]   1  0.286986   0.360585
  ...
@JackDunnNZ
Collaborator

It looks like the method that supports the string-vector input is this one:

function glmnet(X::AbstractMatrix, y; kw...)
    lev = sort(unique(y))
    if length(lev) >= 2
        y = convert(Matrix{Float64}, [i == j for i in y, j in lev])
        if length(lev) == 2
            glmnet(X, y, Binomial(); kw...)
        else
            glmnet(X, y, Multinomial(); kw...)
        end
    else
        error("y has only one level.")
    end
end

So this works:

using GLMNet
y = ["M", "B", "M", "B"]
X = rand(4, 10)
glmnet(X, y)

It doesn't need a distribution because it chooses between Binomial and Multinomial based on the number of unique values in y. This method could probably be extended to accept a distribution and throw an error if the distribution and y are incompatible.
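
As a rough sketch of what that compatibility check might look like, here is a standalone helper that reuses the level-encoding logic from the method above. The name indicator_matrix and the binary keyword are hypothetical, not part of GLMNet, and the sketch deliberately avoids adding a new glmnet method, since I haven't checked how one would interact with the existing signatures.

```julia
# Hypothetical helper, NOT part of GLMNet: one-hot encode a label vector
# the same way the glmnet(X, y) method does, and optionally insist on
# exactly two levels (what a Binomial model would need).
function indicator_matrix(y::AbstractVector; binary::Bool=false)
    lev = sort(unique(y))
    length(lev) >= 2 || error("y has only one level.")
    if binary && length(lev) != 2
        throw(ArgumentError("expected 2 levels for a binomial model, got $(length(lev))"))
    end
    # One column per level, in sorted order; entry is 1.0 where the label matches.
    Y = convert(Matrix{Float64}, [i == j for i in y, j in lev])
    return Y, lev
end

Y, lev = indicator_matrix(["M", "B", "M", "B"]; binary=true)
```

The resulting Y is a 4 x 2 matrix, so it could then be passed along the matrix path that already works, e.g. glmnet(X, Y, Binomial()).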

At the very least, the README should be updated to show the glmnet(X, y) call for string-vector input.
