Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem of computing GIC #267

Closed
yannstory opened this issue Dec 13, 2021 · 9 comments
Closed

Problem of computing GIC #267

yannstory opened this issue Dec 13, 2021 · 9 comments
Labels
good first issue Good for newcomers invalid This doesn't seem right

Comments

@yannstory
Copy link

yannstory commented Dec 13, 2021

I want to compute GIC to select the true model. But I gain different results from the abess packages and manual calculation.

   set.seed(2)
    p = 250
    N = 2500
    X = matrix(rnorm(N * p), ncol = p)
    A = sort(sample(p, 10))
    beta = rep(0, p)
    beta = replace(beta, A, rnorm(10, mean = 6))
    xbeta <- X %*% beta
    Y <- xbeta + rnorm(N)

Compute the estimator by abess packages.


    C = abess(X, Y, family = "gaussian", tune.path="sequence",tune.type = "gic")
    k = C$best.size
    mid=coef(abess(X, Y, family = "gaussian",support.size =k))
    Central =mid[2:(p+1)]
    intercept=mid[1]
    #compute GIC[10]=131.3686
    GIC= N*log(1/(2*N)*t(Y-X%*%Central-intercept)%*%(Y-X%*%Central-intercept))+k*log(p)*(log(log(N)))
    #GIC=-1601.499
     
@Mamba413
Copy link
Collaborator

I want to compute GIC to select the true model. But I gain different results from the abess packages and manual calculation. p = 250 N = 2500 X = matrix(rnorm(N * p), ncol = p) A = sort(sample(p, 10)) beta = rep(0, p) beta = replace(beta, A, rnorm(10, mean = 6)) xbeta <- X %*% beta Y <- xbeta + rnorm(N) C = abess(X, Y, family = "gaussian", tune.path="sequence",tune.type = "gic") k = C$best.size Central = extract(C,support.size =k)$beta[1:p] # compute GIC[10]=129.2499

GIC= N_log(1/(2_N)_(t(Central)_t(X)_X_Central-2_t(Y)_X_Central+t(Y)Y))+k_log(p)(log(log(N))) #GIC=-1602.366

Thanks for your question. Two points:

  1. I fail to run the last command in R because:
    `

GIC= Nlog(1/(2N)(t(Central)t(X)XCentral-2t(Y)XCentral+t(Y)Y))+klog(p)(log(log(N)))
Error: unexpected symbol in "GIC= Nlog(1/(2N"
`

  1. It seems that you have omitted the intercept term extract(C,support.size =k)[["intercept"]]?

@Mamba413
Copy link
Collaborator

Again, I cannot run you code. You may paste your code like this:
image

@Mamba413
Copy link
Collaborator

I want to compute GIC to select the true model. But I gain different results from the abess packages and manual calculation.

set.seed(2) p = 250 N = 2500 X = matrix(rnorm(N * p), ncol = p) A = sort(sample(p, 10)) beta = rep(0, p) beta = replace(beta, A, rnorm(10, mean = 6)) xbeta <- X %*% beta Y <- xbeta + rnorm(N)

Compute the estimator by abess packages. C = abess(X, Y, family = "gaussian", tune.path="sequence",tune.type = "gic") k = C$best.size mid=coef(abess(X, Y, family = "gaussian",support.size =k)) #compute GIC[10]=131.3686 Central =mid[2:(p+1)] intercept=mid[1] GIC= N*log(1/(2*N)*t(Y-X%*%Central-intercept)%*%(Y-X%*%Central-intercept))+k*log(p)*(log(log(N))) #GIC=-1601.499

Hi~The code is readable now. But I am not clear how you get this line in R:
#compute GIC[10]=131.3686
Please share the related code.
Thanks.

@yannstory
Copy link
Author

Thanks for your reply. Please add the following code and then can gain GIC[10]=131.3686
C$tune.value[11]
The value is the GIC when the support size is 10.

@Mamba413
Copy link
Collaborator

C$tune.value[11]

Thanks. The following code gives results the same GIC from the abess packages.

set.seed(2)
p = 250
N = 2500
X = matrix(rnorm(N * p), ncol = p)
A = sort(sample(p, 10))
beta = rep(0, p)
beta = replace(beta, A, rnorm(10, mean = 6))
xbeta <- X %*% beta
Y <- xbeta + rnorm(N)
library(abess)
C = abess(X, Y, family = "gaussian", tune.path="sequence",tune.type = "gic")
k = C$best.size
mid=coef(abess(X, Y, family = "gaussian",support.size =k))
extract(C, support.size = k)[["tune.value"]]
C$tune.value[11]
#compute GIC[10]=131.3686
Central =mid[2:(p+1)]
intercept=mid[1]
GIC= N*log(1/(N)*t(Y-X%*%Central-intercept)%*%(Y-X%*%Central-intercept))+k*log(p)*(log(log(N)))
GIC

In summary, when we compute GIC, we omit a constant term (1/2) in loss function. It seems like a bug because it is not completely match to the GIC our PNAS paper (https://www.pnas.org/content/117/52/33117). We will fix this as fast as we can. But we believe this constant term is not essential, and we still can achieves desirable results in many numerical study. So, you can trust the results given by the abess.

All in all, thanks for your careful code inspection, we are very appreciated.

@yannstory
Copy link
Author

C$tune.value[11]

Thanks. The following code gives results the same GIC from the abess packages.

set.seed(2)
p = 250
N = 2500
X = matrix(rnorm(N * p), ncol = p)
A = sort(sample(p, 10))
beta = rep(0, p)
beta = replace(beta, A, rnorm(10, mean = 6))
xbeta <- X %*% beta
Y <- xbeta + rnorm(N)
library(abess)
C = abess(X, Y, family = "gaussian", tune.path="sequence",tune.type = "gic")
k = C$best.size
mid=coef(abess(X, Y, family = "gaussian",support.size =k))
extract(C, support.size = k)[["tune.value"]]
C$tune.value[11]
#compute GIC[10]=131.3686
Central =mid[2:(p+1)]
intercept=mid[1]
GIC= N*log(1/(N)*t(Y-X%*%Central-intercept)%*%(Y-X%*%Central-intercept))+k*log(p)*(log(log(N)))
GIC

In summary, when we compute GIC, we omit a constant term (1/2) in loss function. It seems like a bug because it is not completely match to the GIC our PNAS paper (https://www.pnas.org/content/117/52/33117). We will fix this as fast as we can. But we believe this constant term is not essential, and we still can achieves desirable results in many numerical study. So, you can trust the results given by the abess.

All in all, thanks for your careful code inspection, we are very appreciated.

Thank you for your patience.

@Mamba413
Copy link
Collaborator

@Jiang-Kangkang , I would like to fix this bug. But I am not very sure that just modify loss_function in abessLm would not cause any problem. I need to modify the other component in abessLm?

@Mamba413
Copy link
Collaborator

Hi, @yannstory

It takes some time for us to fix this. Now, the following code returns the same GIC defined in our PNAS paper (https://www.pnas.org/content/117/52/33117).

set.seed(2)
p = 250
N = 2500
X = matrix(rnorm(N * p), ncol = p)
A = sort(sample(p, 10))
beta = rep(0, p)
beta = replace(beta, A, rnorm(10, mean = 6))
xbeta <- X %*% beta
Y <- xbeta + rnorm(N)
library(abess)
C = abess(X, Y, family = "gaussian", tune.path="sequence",tune.type = "gic")
k = C$best.size
mid=coef(abess(X, Y, family = "gaussian",support.size =k))
extract(C, support.size = k)[["tune.value"]]
C$tune.value[11]
#compute GIC[10]=131.3686
Central =mid[2:(p+1)]
intercept=mid[1]
GIC= N*log(1/(2*N)*t(Y-X%*%Central-intercept)%*%(Y-X%*%Central-intercept))+k*log(p)*(log(log(N)))
GIC

Notice that you need to install the latest package from our github repository:

library(devtools)
install_github(repo = "abess-team/abess", subdir = "R-package")

If you meet any additional question, please feel free to contact us.

@yannstory
Copy link
Author

Thank you very much! It's very perfect!

@Mamba413 Mamba413 added bug Something isn't working invalid This doesn't seem right good first issue Good for newcomers and removed bug Something isn't working labels Jan 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers invalid This doesn't seem right
Projects
None yet
Development

No branches or pull requests

2 participants