Results seem to be meaningless when smax > rank(X). #495

Closed
brtang63 opened this issue Mar 7, 2023 · 6 comments
Labels: documentation, wontfix

@brtang63
Contributor

brtang63 commented Mar 7, 2023

No warning is issued when the rank of the design matrix is less than or equal to the support size. As the support size grows beyond the rank, the results seem to be meaningless. What about throwing a warning? The following code provides a demo.

library(abess)
data <- generate.data(n = 30, p = 100, support.size = 10)
x0 <- data$x
y0 <- data$y

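# Duplicate the first 10 rows: x has 20 rows, but rank(x) is at most 10
idx <- c(1:10, 1:10)
x <- x0[idx, ]
y <- y0[idx]
# The largest support size (15) is below nrow(x) = 20 but above rank(x)
abess(x, y, support.size = 0:15)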
Call:
abess.default(x = x, y = y, support.size = 0:15)

   support.size          dev         GIC
1             0 9.934785e+05   276.17935
2             1 2.310636e+05   252.06170
3             2 8.155513e+04   236.28617
4             3 9.333066e+03   197.98460
5             4 2.467864e+03   176.43313
6             5 3.883858e+02   144.50369
7             6 5.223017e+02   155.48135
8             7 1.689733e+00    45.86059
9             8 2.741854e+01   106.64631
10            9 3.802369e-24 -1033.05370
11           10 1.188349e-25 -1097.31384
12           11 6.137819e-25 -1059.42301
13           12 1.259870e-25 -1086.03949
14           13 4.764910e-24 -1008.32964
15           14 1.402658e-25 -1073.78679
16           15 3.573510e-25 -1050.03047
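
The collapse of `dev` to roughly 1e-24 from support size 9 onward is what one would expect from a rank-deficient design. The following is a minimal sketch of that reasoning in base R only. The random Gaussian matrix is an assumption standing in for `data$x` and `data$y`, and the remark about the intercept assumes the fitted model includes one.

```r
set.seed(1)

# Stand-in for data$x / data$y from generate.data(n = 30, p = 100, ...).
# Random Gaussian data is an assumption made only to keep this sketch
# independent of abess.
x0 <- matrix(rnorm(30 * 100), nrow = 30, ncol = 100)
y0 <- rnorm(30)

# Same row duplication as in the demo: 20 rows, only 10 distinct ones.
idx <- c(1:10, 1:10)
x <- x0[idx, ]
y <- y0[idx]

# Numerical rank from base R's QR decomposition.
rank_x <- qr(x)$rank
cat("nrow(x) =", nrow(x), " rank(x) =", rank_x, "\n")

# Least squares on any 9 columns plus an intercept: 10 parameters for
# 10 distinct observations, so the fit interpolates the data.
fit <- lm.fit(cbind(1, x[, 1:9]), y)
rss <- sum(fit$residuals^2)
cat("RSS with 9 predictors + intercept:", rss, "\n")
```

Once the fit interpolates, every larger support set also reaches a residual of essentially zero, so differences in `dev` and `GIC` between those rows reflect floating-point noise rather than model quality.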

@brtang63 brtang63 changed the title from "Results seem meaningless when smax > rank(X)." to "Results seem to be meaningless when smax > rank(X)." Mar 7, 2023
@Mamba413 Mamba413 added the enhancement label Mar 8, 2023
@Mamba413
Collaborator

Mamba413 commented Mar 8, 2023

@brtang63, I am confused by this question. In this example, n is 30, i.e., rank(X) = 30; in contrast, the maximum support size is 15, so rank(X) >= support size. So, what warning should be thrown?

@Mamba413
Collaborator

Mamba413 commented Mar 8, 2023

> @brtang63, I am confused by this question. In this example, n is 30, i.e., rank(X) = 30; in contrast, the maximum support size is 15, so rank(X) >= support size. So, what warning should be thrown?

Oh... I see... Let me simplify this code.

@Mamba413
Collaborator

Mamba413 commented Mar 8, 2023

> > @brtang63, I am confused by this question. In this example, n is 30, i.e., rank(X) = 30; in contrast, the maximum support size is 15, so rank(X) >= support size. So, what warning should be thrown?
>
> Oh... I see... Let me simplify this code.

Could you do this, since I cannot reproduce the latest result shown in the screenshot? Also, please reduce the code to a minimal example and paste the results in a code chunk. Many thanks!

@brtang63
Contributor Author

brtang63 commented Mar 8, 2023

Actually, abess checks whether max(support.size) < nobs, but it does not check max(support.size) against rank(X). The following code produces an error. However, when rank(X) < smax < nrow(X), no warning is reported.

library(abess)
data <- generate.data(n = 10, p = 100, support.size = 10)
abess(data$x, data$y, support.size = 0:15)

#> Error in abess.default(data$x, data$y, support.size = 0:15) :  max(support.size) < nobs is not TRUE
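
Until such a check exists in the package, the gap between the `nobs` check and a rank check could be covered on the user side before calling `abess()`. The following is a minimal sketch under stated assumptions: `check_support_size` is a hypothetical helper that is not part of abess, and the random matrix is only a stand-in for a real design matrix.

```r
# Hypothetical helper (NOT part of abess): warn when the largest requested
# support size exceeds the numerical rank of the design matrix.
check_support_size <- function(x, support.size) {
  rank_x <- qr(x)$rank                    # numerical rank via base R QR
  smax <- as.integer(max(support.size))
  if (smax > rank_x) {
    warning(sprintf(
      "max(support.size) = %d exceeds rank(x) = %d; larger models may interpolate the data",
      smax, rank_x
    ), call. = FALSE)
  }
  invisible(rank_x)
}

set.seed(1)
# Stand-in design matrix (an assumption): 20 rows, only 10 distinct ones.
x <- matrix(rnorm(10 * 100), nrow = 10)[c(1:10, 1:10), ]

# Capture the warning text instead of letting it print at top level.
msg <- tryCatch(check_support_size(x, 0:15),
                warning = function(w) conditionMessage(w))
print(msg)
```

The helper returns the rank invisibly, so a caller could also use it to cap `support.size` instead of only warning.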

@Mamba413 Mamba413 self-assigned this Mar 8, 2023
@Mamba413
Collaborator

Mamba413 commented Mar 8, 2023

Thanks for this comment and your clarification!

This is an excellent comment from the viewpoint of numerical analysis, but we will not address this issue for now. The reason is that computing or estimating the rank of a matrix is very time-consuming. To my knowledge, computing the rank requires obtaining all of the eigenvalues of the matrix, which has a time complexity of O(N^3), where N is the matrix size.

So, once a fast rank-determination algorithm is well developed, we will incorporate it into our implementation. Or, if you know of such a fast algorithm for computing the rank, we would be glad to implement it in our project.
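
For reference, the numerical rank of a dense n × p matrix can be read off a QR or singular value decomposition, whose cost scales like O(n p min(n, p)) rather than cubically in the larger dimension. The following is a minimal base R sketch of both routes. The random matrix and the tolerance rule are assumptions for illustration, and whether this cost is acceptable inside abess is not something the sketch settles.

```r
set.seed(42)

# Stand-in design matrix (an assumption): 20 x 100 with 10 distinct rows.
x <- matrix(rnorm(10 * 100), nrow = 10)[c(1:10, 1:10), ]

# Route 1: count singular values above a relative tolerance.
# Only the singular values are needed, so skip the singular vectors.
d <- svd(x, nu = 0, nv = 0)$d
tol <- max(dim(x)) * max(d) * .Machine$double.eps
rank_svd <- sum(d > tol)

# Route 2: rank reported by base R's pivoted QR decomposition.
rank_qr <- qr(x)$rank

cat("rank via SVD:", rank_svd, " rank via QR:", rank_qr, "\n")
```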

@Mamba413 Mamba413 added the wontfix and documentation labels and removed the enhancement label Mar 8, 2023
@brtang63
Contributor Author

brtang63 commented Mar 9, 2023

I get it. Thanks for your clear and helpful explanation!

@brtang63 brtang63 closed this as completed Mar 9, 2023