Problem of computing GIC #267

yannstory · 2021-12-13T09:37:50Z

I want to compute the GIC to select the true model, but I get different results from the abess package and from a manual calculation.

    set.seed(2)
    p = 250
    N = 2500
    X = matrix(rnorm(N * p), ncol = p)
    A = sort(sample(p, 10))                        # true support: 10 random columns
    beta = rep(0, p)
    beta = replace(beta, A, rnorm(10, mean = 6))   # nonzero coefficients around 6
    xbeta <- X %*% beta
    Y <- xbeta + rnorm(N)

Compute the estimator with the abess package:


    library(abess)
    C = abess(X, Y, family = "gaussian", tune.path = "sequence", tune.type = "gic")
    k = C$best.size
    mid = coef(abess(X, Y, family = "gaussian", support.size = k))
    Central = mid[2:(p + 1)]    # slope coefficients
    intercept = mid[1]
    # abess reports GIC[10] = 131.3686
    GIC = N * log(1 / (2 * N) * t(Y - X %*% Central - intercept) %*% (Y - X %*% Central - intercept)) + k * log(p) * (log(log(N)))
    # manual GIC = -1601.499
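To make the two candidate loss definitions explicit, here is a minimal sketch of the formula used in the manual calculation above, wrapped in a helper. The name `gic_gaussian` and its `half` switch are hypothetical and not part of abess; the function only reproduces the expression N * log(loss) + k * log(p) * log(log(N)) with the loss taken as either RSS / (2N) or RSS / N.

```r
# Hypothetical helper (not part of abess): GIC of a Gaussian linear model,
# following the formula in the manual calculation above.
#   rss  : residual sum of squares of the fitted model
#   n    : sample size; p: number of candidate predictors
#   k    : support size (number of nonzero slope coefficients)
#   half : TRUE uses the loss RSS / (2n), FALSE uses RSS / n
gic_gaussian <- function(rss, n, p, k, half = TRUE) {
  loss <- if (half) rss / (2 * n) else rss / n
  n * log(loss) + k * log(p) * log(log(n))
}

# Usage with the objects from the post above (requires the fitted model):
# r   <- drop(Y - X %*% Central - intercept)
# rss <- sum(r^2)                 # same as t(r) %*% r, but a plain number
# gic_gaussian(rss, N, p, k)
```

Computing the RSS with `sum(r^2)` avoids carrying a 1 x 1 matrix through `log()`, which is what `t(r) %*% r` returns.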


Mamba413 · 2021-12-13T11:14:15Z

> I want to compute the GIC to select the true model, but I get different results from the abess package and from a manual calculation.
>
>     p = 250
>     N = 2500
>     X = matrix(rnorm(N * p), ncol = p)
>     A = sort(sample(p, 10))
>     beta = rep(0, p)
>     beta = replace(beta, A, rnorm(10, mean = 6))
>     xbeta <- X %*% beta
>     Y <- xbeta + rnorm(N)
>     C = abess(X, Y, family = "gaussian", tune.path = "sequence", tune.type = "gic")
>     k = C$best.size
>     Central = extract(C, support.size = k)$beta[1:p]
>     # compute GIC[10]=129.2499
>     GIC = N * log(1 / (2 * N) * (t(Central) %*% t(X) %*% X %*% Central - 2 * t(Y) %*% X %*% Central + t(Y) %*% Y)) + k * log(p) * (log(log(N)))
>     # GIC=-1602.366

Thanks for your question. Two points:

First, I fail to run the last command in R:

    GIC= Nlog(1/(2N)(t(Central)t(X)XCentral-2t(Y)XCentral+t(Y)Y))+klog(p)(log(log(N)))
    Error: unexpected symbol in "GIC= Nlog(1/(2N"

Second, it seems that you have omitted the intercept term, `extract(C, support.size = k)[["intercept"]]`.
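The intercept point can be checked numerically. The expanded quadratic form b'X'Xb - 2Y'Xb + Y'Y is exactly ||Y - Xb||^2, so it equals the RSS only for a model without an intercept; with an intercept b0 the RSS is ||Y - Xb - b0||^2. The sketch below uses small simulated data (not the data from this issue) with a deliberately nonzero intercept.

```r
set.seed(1)
n <- 100
p <- 5
X <- matrix(rnorm(n * p), ncol = p)
b <- c(2, 0, -1, 0, 0.5)
b0 <- 3                                    # deliberately nonzero intercept
Y <- drop(X %*% b) + b0 + rnorm(n)

# Expanded quadratic form, as in the quoted formula (no intercept term)
quad <- drop(t(b) %*% t(X) %*% X %*% b - 2 * t(Y) %*% X %*% b + t(Y) %*% Y)

rss_no_int <- sum((Y - X %*% b)^2)         # residuals ignoring the intercept
rss_int    <- sum((Y - X %*% b - b0)^2)    # residuals including the intercept

all.equal(quad, rss_no_int)   # TRUE: the expansion is just ||Y - Xb||^2
quad - rss_int                # positive gap: ignoring b0 inflates the RSS
```

In the issue the simulated Y has mean close to zero, so the fitted intercept is small and the discrepancy it causes is modest, but it is still a different quantity from the one abess uses.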

Mamba413 · 2021-12-13T16:10:42Z

Again, I cannot run your code. You may paste your code like this:

Mamba413 · 2021-12-13T16:35:17Z


Hi~ The code is readable now, but I am not clear how you obtained this line in R:

    #compute GIC[10]=131.3686

Please share the related code. Thanks.

yannstory · 2021-12-13T16:46:26Z

Thanks for your reply. Add the following line and you will get GIC[10] = 131.3686:

    C$tune.value[11]

This value is the GIC at support size 10; because the path of support sizes starts at 0, the 11th element corresponds to size 10.
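Hard-coding `[11]` relies on the path starting at size 0. A slightly more robust pattern is to look the value up by support size, and the best size is then simply the size with the smallest GIC on the path. In the sketch below the vectors are made-up stand-ins, not abess output; where the path of support sizes is stored in the fitted object is an assumption to check with `names(C)`, and `gic_at_size` is a hypothetical helper.

```r
# Made-up stand-ins for the fitted path, for illustration only (not abess
# output). We assume the path of support sizes is 0, 1, 2, ..., as in the thread.
support_sizes <- 0:15
tune_values <- c(3000, 2500, 2000, 1500, 1200, 900, 700, 500, 350, 200,
                 131, 140, 150, 160, 170, 180)

# Hypothetical helper: look up the GIC for a support size by matching on the
# size itself instead of hard-coding a position such as [11].
gic_at_size <- function(size) tune_values[match(size, support_sizes)]

gic_at_size(10)                                     # 131, i.e. tune_values[11]
best_size <- support_sizes[which.min(tune_values)]
best_size                                           # 10
```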

Mamba413 · 2021-12-13T17:04:13Z

> C$tune.value[11]

Thanks. The following code reproduces the GIC reported by the abess package:

    set.seed(2)
    p = 250
    N = 2500
    X = matrix(rnorm(N * p), ncol = p)
    A = sort(sample(p, 10))
    beta = rep(0, p)
    beta = replace(beta, A, rnorm(10, mean = 6))
    xbeta <- X %*% beta
    Y <- xbeta + rnorm(N)
    library(abess)
    C = abess(X, Y, family = "gaussian", tune.path = "sequence", tune.type = "gic")
    k = C$best.size
    mid = coef(abess(X, Y, family = "gaussian", support.size = k))
    extract(C, support.size = k)[["tune.value"]]   # GIC reported by abess for size k
    C$tune.value[11]                                # GIC[10] = 131.3686
    Central = mid[2:(p + 1)]
    intercept = mid[1]
    # loss = RSS / N (no 1/2 factor), which reproduces the abess value
    GIC = N * log(1 / (N) * t(Y - X %*% Central - intercept) %*% (Y - X %*% Central - intercept)) + k * log(p) * (log(log(N)))
    GIC

In summary, when abess computes the GIC, it omits the constant factor 1/2 in the loss function. This looks like a bug, because it does not exactly match the GIC defined in our PNAS paper (https://www.pnas.org/content/117/52/33117). We will fix this as soon as we can. Still, we believe this constant is not essential: dropping it adds the same amount, N * log(2), to the GIC of every support size, so the selected support size does not change, and we have obtained desirable results in many numerical studies. So you can trust the results given by abess.
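The two numbers in this thread can be reconciled with simple arithmetic: N * log(RSS / N) - N * log(RSS / (2N)) = N * log(2), independent of the support size. The sketch below checks that the gap between the abess value and the manual value matches this shift, then uses made-up RSS values (illustration only) to show that a constant shift cannot change which support size minimizes the GIC.

```r
N <- 2500
p <- 250

# Values reported in this thread for support size 10
reported_by_abess <- 131.3686    # C$tune.value[11], loss = RSS / N
manual_half_loss  <- -1601.499   # manual GIC, loss = RSS / (2N)

# Dropping the 1/2 adds N * log(2) to every GIC value
shift <- N * log(2)
shift                                  # about 1732.868
reported_by_abess - manual_half_loss   # about 1732.868

# Made-up RSS values along a path of support sizes 0..5 (illustration only)
k   <- 0:5
rss <- c(5000, 3000, 2600, 2510, 2505, 2504)
penalty <- k * log(p) * log(log(N))
gic_n  <- N * log(rss / N)       + penalty
gic_2n <- N * log(rss / (2 * N)) + penalty

all.equal(gic_n - gic_2n, rep(shift, length(k)))   # TRUE: a constant shift
which.min(gic_n) == which.min(gic_2n)              # TRUE: same selected size
```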

All in all, thank you for your careful code inspection; we really appreciate it.

yannstory · 2021-12-13T17:09:55Z


Thank you for your patience.

Mamba413 · 2021-12-14T02:55:49Z

@Jiang-Kangkang, I would like to fix this bug, but I am not sure that just modifying loss_function in abessLm would not cause other problems. Do I need to modify other components of abessLm?

Mamba413 · 2021-12-26T12:02:41Z

Hi, @yannstory

It took us some time to fix this. Now the following code returns the same GIC as defined in our PNAS paper (https://www.pnas.org/content/117/52/33117):

    set.seed(2)
    p = 250
    N = 2500
    X = matrix(rnorm(N * p), ncol = p)
    A = sort(sample(p, 10))
    beta = rep(0, p)
    beta = replace(beta, A, rnorm(10, mean = 6))
    xbeta <- X %*% beta
    Y <- xbeta + rnorm(N)
    library(abess)
    C = abess(X, Y, family = "gaussian", tune.path = "sequence", tune.type = "gic")
    k = C$best.size
    mid = coef(abess(X, Y, family = "gaussian", support.size = k))
    extract(C, support.size = k)[["tune.value"]]   # GIC reported by abess for size k
    C$tune.value[11]                                # GIC at support size 10
    Central = mid[2:(p + 1)]
    intercept = mid[1]
    # loss = RSS / (2N), as in the PNAS paper; should now match the values above
    GIC = N * log(1 / (2 * N) * t(Y - X %*% Central - intercept) %*% (Y - X %*% Central - intercept)) + k * log(p) * (log(log(N)))
    GIC

Note that you need to install the latest version of the package from our GitHub repository:

    library(devtools)
    install_github(repo = "abess-team/abess", subdir = "R-package")
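For a cross-check of the corrected definition that does not rely on abess at all, the GIC on a fixed support can be computed from an ordinary least-squares fit with an intercept. This sketch assumes (i) that the abess Gaussian estimate restricted to a given support coincides with least squares on those columns, and (ii) that the selected support is the true support `A`; in practice, take the support from the fitted abess object and compare the result with `C$tune.value[11]` from the updated package.

```r
set.seed(2)
p <- 250
N <- 2500
X <- matrix(rnorm(N * p), ncol = p)
A <- sort(sample(p, 10))
beta <- replace(rep(0, p), A, rnorm(10, mean = 6))
Y <- drop(X %*% beta) + rnorm(N)

# Assumption: the selected support equals the true support A.
support <- A
fit <- lm(Y ~ X[, support])            # least squares with an intercept
rss <- sum(residuals(fit)^2)
k <- length(support)                   # counts slope coefficients only

# GIC with the loss RSS / (2N), matching the corrected definition
gic <- N * log(rss / (2 * N)) + k * log(p) * log(log(N))
gic
```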

If you have any further questions, please feel free to contact us.

yannstory · 2021-12-26T12:08:19Z

Thank you very much! It's perfect!

Mamba413 closed this as completed Dec 26, 2021

Mamba413 added the bug, invalid, and good first issue labels and removed the bug label on Jan 18, 2022.
