Regularization - parameter selection #83

Open
mtva0001 opened this issue Dec 20, 2021 · 6 comments
Labels
question Further information is requested

Comments

@mtva0001

Hi,

Thanks for developing sjSDM and making it possible to use it in R!
I have a small fungal dataset (~500 OTUs and 40 samples, roughly 10 samples from each of 4 sites) plus some environmental and spatial data, and I want to run sjSDM on it. The main aim is to draw some conclusions about community assembly processes, following Leibold et al.'s (2021, OIKOS) paper. However, this is the first time I'm using a JSDM, and I'm not sure how to set the right parameters for tuning. Following your script on GitHub and the example in the R help section, I ended up with this script:

SPV=generateSpatialEV(env[,8:9])

tune = sjSDM_cv(env = as.matrix(Env), Y = as.matrix(Occ),
                learning_rate = 0.001, iter = 100L, CV = nrow(Occ),
                tune_steps = 40,
                lambda_cov = seq(0, 0.1, 0.001),
                lambda_coef = 0.1,
                alpha_cov = seq(0, 1, 0.05),
                alpha_coef = 0.5,
                sampling = 100L,
                biotic = bioticStruct(df = dim(Occ)[2]),
                spatial = linear(SPV),
                lambda_spatial = 2^seq(-10, -0.5, length.out = 20),
                step_size = 5L,
                blocks = 3L,
                family = poisson("log"),
                n_cores = 8)

best = head(tune$short_summary[order(tune$short_summary$logLik),])[1,]

model <- sjSDM(Y = as.matrix(Occ),
               env = linear(as.matrix(Env), lambda = best[["lambda_coef"]], alpha = best[["alpha_coef"]]),
               spatial = linear(SPV, ~0+., lambda = best[["lambda_spatial"]], alpha = best[["alpha_spatial"]]),
               biotic = bioticStruct(lambda = best[["lambda_cov"]], alpha = best[["alpha_cov"]]),
               family = poisson("log"),
               iter = 100, learning_rate = 0.001)

Would you be so kind as to take a quick look at the script above and let me know whether this setup would be appropriate for running the model, or whether I should modify it somehow?

Thanks in advance!!!

@MaximilianPi
Member

Hi,

Yes, this looks fine overall, but here are a few small errors/questions:

Regularization:

  • Is it intended that you tune lambda and alpha only for space and the associations? If so, everything looks good; otherwise, I would also tune lambda_coef and alpha_coef.

Model specifications:

  • You have probably already done so, but scale 'Env' and 'SPV'.
  • You currently run a LOOCV (CV = nrow(Occ)), which is computationally very expensive: 100*40/8 = 500 models per core for your task. If this is unreasonable, you might try a 5- or 10-fold CV first.
  • In your final model you set the intercept in the space term to zero ("linear(SPV, ~0+., ...)"), which is correct, but you should also do this in the tuning.
  • The same applies to the biotic object: in the tuning you set df=dim(Occ)[2], which you should also do in the final model.
  • The learning rate is very low; probably start with a higher lr (e.g. 0.01 or 0.005), because the learning-rate scheduler will automatically decrease a too-high learning rate during training, but there is no recovery from a too-low learning rate.
SPV=generateSpatialEV(env[,8:9])

tune = sjSDM_cv(env = scale(as.matrix(Env)), Y = as.matrix(Occ),
                learning_rate = 0.005, iter = 100L, CV = nrow(Occ),
                tune_steps = 40,
                lambda_cov = seq(0, 0.1, 0.001),
                lambda_coef = seq(0, 0.1, 0.001),
                alpha_cov = seq(0, 1, 0.05),
                alpha_coef = seq(0, 1, 0.05),
                alpha_spatial = seq(0, 1, 0.05),
                sampling = 100L,
                biotic = bioticStruct(df=dim(Occ)[2]),
                spatial = linear(scale(SPV), ~0+.),
                lambda_spatial = 2^seq(-10, -0.5, length.out = 20),
                step_size = 5L,
                blocks = 3L,
                family=poisson("log"),
                n_cores = 8)

best = head(tune$short_summary[order(tune$short_summary$logLik),])[1,]

model <- sjSDM(Y = as.matrix(Occ),
               env = linear(scale(as.matrix(Env)), lambda = best[["lambda_coef"]], alpha = best[["alpha_coef"]]),
               spatial = linear(scale(SPV), ~0+., lambda = best[["lambda_spatial"]], alpha = best[["alpha_spatial"]]),
               biotic = bioticStruct(lambda = best[["lambda_cov"]], alpha = best[["alpha_cov"]], df = dim(Occ)[2]),
               family = poisson("log"),
               iter = 100, learning_rate = 0.005)

@mtva0001
Author

mtva0001 commented Dec 21, 2021

Thank you so much for your quick response and help!

To answer your question (Is it intended that you tune lambda and alpha only for space and the associations?): No, but when I tune alpha, I get this error message: Error: vector memory exhausted (limit reached?). I have 16 GB of memory, but it seems that's not enough. When I don't tune them, it works.

@MaximilianPi
Member

Oh damn, this issue is actually on my to-do list... it is caused by the creation of the hyper-parameter tuning space:

I naively create it with expand.grid(lambda_coef, lambda_cov, lambda_spatial, alpha_coef, alpha_cov, alpha_spatial) and then draw tune_steps random samples (rows) from this space (matrix)... which is of course not memory efficient.

For now, maybe just reduce the number of steps in your seq() calls; then it should be possible to also tune the environment.
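
To put rough numbers on it (just an illustration with the sequences from your call, not the package's actual internal code; the object names below are only for the sketch): the full grid is far larger than 16 GB, whereas drawing each hyper-parameter independently only ever needs tune_steps rows:

# Illustration only, not sjSDM's internal code: size of the full grid
# implied by the tuning sequences in the call above
lambda_cov     <- seq(0, 0.1, 0.001)                 # 101 values
lambda_coef    <- seq(0, 0.1, 0.001)                 # 101 values
lambda_spatial <- 2^seq(-10, -0.5, length.out = 20)  #  20 values
alpha_cov      <- seq(0, 1, 0.05)                    #  21 values
alpha_coef     <- seq(0, 1, 0.05)                    #  21 values
alpha_spatial  <- seq(0, 1, 0.05)                    #  21 values

n_comb <- prod(lengths(list(lambda_cov, lambda_coef, lambda_spatial,
                            alpha_cov, alpha_coef, alpha_spatial)))
n_comb                     # ~1.9e9 combinations
n_comb * 6 * 8 / 1024^3    # ~84 GB if materialised as a numeric matrix -> exhausts 16 GB

# A memory-efficient alternative: sample each hyper-parameter independently
# instead of materialising the full grid first
tune_steps <- 40L
tune_space <- data.frame(
  lambda_cov     = sample(lambda_cov,     tune_steps, replace = TRUE),
  lambda_coef    = sample(lambda_coef,    tune_steps, replace = TRUE),
  lambda_spatial = sample(lambda_spatial, tune_steps, replace = TRUE),
  alpha_cov      = sample(alpha_cov,      tune_steps, replace = TRUE),
  alpha_coef     = sample(alpha_coef,     tune_steps, replace = TRUE),
  alpha_spatial  = sample(alpha_spatial,  tune_steps, replace = TRUE)
)

Sampling with replacement can in principle draw the same combination twice, but the memory footprint stays at tune_steps rows instead of the full grid.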

@MaximilianPi MaximilianPi added the enhancement New feature or request label Dec 21, 2021
@mtva0001
Author

Ah okay! Now it works with "lambda_cov = seq(0, 0.1, 0.05), lambda_coef = seq(0, 0.1, 0.05)". Thanks again!

@MaximilianPi
Member

Great!

@florianhartig florianhartig added question Further information is requested and removed enhancement New feature or request labels Dec 21, 2021
@florianhartig
Member

Hey Max, I took the liberty of creating a new issue, #84, so that we can discuss/close it independently of this question!

@MaximilianPi MaximilianPi mentioned this issue Dec 21, 2021
@MaximilianPi MaximilianPi mentioned this issue Jan 12, 2022