Multiclass LDA bounds error #187

Closed
pdimens opened this issue Mar 4, 2022 · 3 comments

pdimens commented Mar 4, 2022

Hello, I am having difficulty getting MulticlassLDA to work properly. I have an input matrix, I specify the number of classes, and I have a label for each column, but I keep getting a BoundsError:

julia> typeof(b)    # input matrix
Matrix{Float64} (alias for Array{Float64, 2})

julia> size(b)
(93, 237)

julia> length(catlab)   # labels
237

julia> length(unique(catlab))
17

julia> MultivariateStats.fit(MulticlassLDA, 16, b, catlab)
ERROR: BoundsError: attempt to access 93×16 Matrix{Float64} at index [1, 17]
Stacktrace:
 [1] getindex
   @ ./array.jl:862 [inlined]
 [2] center(X::Matrix{Float64}, label::Vector{Int64}, nc::Int64)
   @ MultivariateStats ~/.julia/packages/MultivariateStats/cFZlL/src/lda.jl:519
 [3] multiclass_lda_stats(nc::Int64, X::Matrix{Float64}, y::Vector{Int64}; covestimator_within::SimpleCovariance, covestimator_between::SimpleCovariance)
   @ MultivariateStats ~/.julia/packages/MultivariateStats/cFZlL/src/lda.jl:200
 [4] #fit#42
   @ ~/.julia/packages/MultivariateStats/cFZlL/src/lda.jl:335 [inlined]
 [5] fit(::Type{MulticlassLDA}, nc::Int64, X::Matrix{Float64}, y::Vector{Int64})
   @ MultivariateStats ~/.julia/packages/MultivariateStats/cFZlL/src/lda.jl:335
 [6] top-level scope
   @ REPL[71]:1

No matter what I set the number of classes nc to, it always gives a BoundsError in which the column index is nc + 1.

Is this somehow a user error on my part, or is it possible there is a bug?

Collaborator

wildart commented Mar 4, 2022

Apparently, when the number of classes is less than the number of distinct labels, the class statistics cannot be calculated, which produces the above error. I do not know yet whether it's a bug or not. It's an old implementation, so I have to look closer.

BTW, why are you trying to specify a reduced number of classes?
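
The failure mode described above can be reproduced in a few lines. The following is a minimal sketch, not the package's actual code: `class_means_sketch` is a hypothetical function that assumes the class means are stored in a d×nc matrix and that each label value is used directly as a column index, which is what the 93×16 matrix and the index [1, 17] in the stack trace suggest.

```julia
# Hypothetical sketch of per-class mean accumulation; NOT the package's code.
# Assumption: labels are integers used directly as column indices.
function class_means_sketch(X::AbstractMatrix{<:Real},
                            labels::AbstractVector{<:Integer},
                            nc::Integer)
    d, n = size(X)
    means = zeros(Float64, d, nc)   # one column per assumed class
    counts = zeros(Int, nc)
    for j in 1:n
        k = labels[j]
        for i in 1:d
            means[i, k] += X[i, j]  # throws BoundsError when k > nc
        end
        counts[k] += 1
    end
    for k in 1:nc
        # Classes with no samples keep an all-zero column.
        counts[k] > 0 && (means[:, k] ./= counts[k])
    end
    return means
end

X = rand(4, 6)
labels = [1, 2, 3, 1, 2, 3]

# nc equal to the number of distinct labels: works.
means = class_means_sketch(X, labels, 3)

# nc smaller than the largest label: label 3 has no column in a 4×2 matrix.
threw = try
    class_means_sketch(X, labels, 2)
    false
catch err
    err isa BoundsError
end

# nc larger than the number of distinct labels: the extra column stays zero.
padded = class_means_sketch(X, labels, 4)
```

Under these assumptions, any nc below the largest label value fails at the first sample whose label exceeds nc, and any nc above it silently adds empty classes.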

Author

pdimens commented Mar 4, 2022

Thanks for getting back to me. Perhaps I misunderstand the theory behind multiclass LDA. If one has 3 classes, as in Palmer's Penguins (one for each species), and each penguin is labelled 1:3 according to its class, then how can you perform LDA on more classes? I might be misunderstanding the concept of a class in this application.

Collaborator

wildart commented Mar 5, 2022

After a code review, I see the following problems:

  • Incorrect estimation of the number of classes from labels
  • Incomplete input data checking

Because of the above, you got an error. In the MC-LDA implementation, the number of distinct labels should have been interpreted as the number of classes. Because there is no such check on the nc parameter, and this parameter is exposed as an input, there is a problem. It doesn't make sense to set the number of classes lower than the number of distinct labels; the data can be filtered of unwanted class samples before performing LDA. Setting the number of classes higher than the number of distinct labels is also bad: it leads to zero means in the within-class statistic, basically inflating it.
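
The missing check could look something like the sketch below. `check_class_labels` is a hypothetical helper written for illustration, not a function from MultivariateStats and not necessarily what the eventual fix does; it assumes integer labels and rejects both a mismatched nc and labels that are not exactly 1 to n.

```julia
# Hypothetical input check; not part of MultivariateStats.
# Returns the number of classes if labels and nc are consistent.
function check_class_labels(labels::AbstractVector{<:Integer}, nc::Integer)
    distinct = sort(unique(labels))
    m = length(distinct)
    if distinct != 1:m
        throw(ArgumentError("labels must be exactly the integers 1:$m, got $distinct"))
    end
    if nc != m
        throw(ArgumentError("nc = $nc does not match the $m distinct labels"))
    end
    return m
end

labels = [1, 2, 3, 1, 2, 3]
nclasses = check_class_labels(labels, 3)   # consistent input

# Too few classes requested: rejected up front instead of a BoundsError later.
rejected_nc = try
    check_class_labels(labels, 2)
    false
catch err
    err isa ArgumentError
end

# Labels with a gap (no class 2): also rejected.
rejected_gap = try
    check_class_labels([1, 3, 3, 1], 2)
    false
catch err
    err isa ArgumentError
end
```

Failing early with an ArgumentError makes the cause visible to the caller, whereas the unchecked version only surfaces as an index error deep inside the statistics code.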

So, if you want to work with 16 classes, as in your example, remove the samples of one class from your data. If your class labels are not numbered from 1 to n, I would suggest relabeling them until the issue with labels is fixed.
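
A minimal sketch of that workaround, using only Base Julia. The helper names `relabel` and `drop_class` are hypothetical, and the small penguin-style label vector is made up for illustration; samples are assumed to be stored as columns, as in the report above.

```julia
# Hypothetical helpers for the workaround; not part of MultivariateStats.

# Map arbitrary labels to consecutive integers 1:n (classes in sorted order).
function relabel(labels::AbstractVector)
    classes = sort(unique(labels))
    lookup = Dict(c => i for (i, c) in enumerate(classes))
    return [lookup[l] for l in labels], classes
end

# Drop every sample (column of X) that belongs to `unwanted`, then relabel.
function drop_class(X::AbstractMatrix, labels::AbstractVector, unwanted)
    keep = [l != unwanted for l in labels]
    newlabels, classes = relabel(labels[keep])
    return X[:, keep], newlabels, classes
end

# Illustrative data: 3 features, 6 samples, 3 classes.
X = rand(3, 6)
labels = ["adelie", "gentoo", "chinstrap", "adelie", "gentoo", "chinstrap"]

Xf, yf, classes = drop_class(X, labels, "chinstrap")
nc = length(classes)   # number of classes now equals the number of distinct labels
```

The filtered matrix, the relabeled vector, and nc could then be passed in the call form used in the report, fit(MulticlassLDA, nc, Xf, yf); that call is not executed here.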

wildart added the bug label and removed the question label on Mar 5, 2022
wildart added a commit to wildart/MultivariateStats.jl that referenced this issue on Aug 4, 2022
wildart closed this as completed in 1c95f82 on Aug 6, 2022