Skip to content

Commit

Permalink
version 0.4-1
Browse files Browse the repository at this point in the history
  • Loading branch information
Walter K Kremers authored and cran-robot committed Jan 9, 2024
1 parent 5c35411 commit 5828439
Show file tree
Hide file tree
Showing 53 changed files with 3,116 additions and 2,336 deletions.
19 changes: 10 additions & 9 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
Title: Nested Cross Validation for the Relaxed Lasso and Other Machine
Learning Models
Package: glmnetr
Version: 0.3-1
Date: 2023-07-28
Version: 0.4-1
Date: 2023-12-30
Depends: R (>= 3.4.0)
Suggests: R.rsp
VignetteBuilder: R.rsp
Imports: glmnet, survival, Matrix, rpart, xgboost, smoof, mlrMBO,
ParamHelpers, torch
Imports: glmnet, survival, Matrix, xgboost, smoof, mlrMBO,
ParamHelpers, randomForestSRC, rpart, torch
ByteCompile: Yes
Authors@R: c(person(c("Walter", "K"), "Kremers",
role=c("aut", "cre"),
Expand All @@ -18,14 +18,15 @@ Author: Walter K Kremers [aut, cre] (<https://orcid.org/0000-0001-5714-3473>),
Nicholas B Larson [ctb]
Maintainer: Walter K Kremers <kremers.walter@mayo.edu>
Description:
Cross validation informed Relaxed LASSO, Artificial Neural Network (ANN), gradient boosting machine ('xgboost'), Recursive Partitioning ('RPART') or step wise regression models are fit. It fits all these model as extensions of linear, logistic and Cox regression models. The package can fit all these models in a single call, and performs nested cross validation allowing the user to evaluate and compare the performances of these different models. The package fits these models using other packages including 'glmnet', 'survival', 'xgboost', 'rpart' and 'torch'. For the relaxed lasso models 'glmnetr' uses 'stat' and 'survival' to obtain stable model fits, and obtain these often more quickly. This too might be achieved using the 'path=TRUE' option in 'glmnet'.
While the package fits nested cross validation for the lasso and other models, it does not fit the general elastic net model. If you are fitting not a relaxed lasso model but an elastic-net model, then the R-packages 'nestedcv' <https://cran.r-project.org/package=nestedcv>, 'glmnetSE' <https://cran.r-project.org/package=glmnetSE> or others may provide greater functionality when performing a nested CV.
As with the 'glmnet' package, this package passes most relevant information to the output object which can be evaluated using plot, summary() and predict() functions. The 'glmnetr' package has some features and functionality that we find useful, but omits some of the functionality of 'glmnet' as well. Use of the 'glmnetr' package has many similarities to the 'glmnet' package and it is recommended that the user of 'glmnetr' first become familiar with the 'glmnet' package <https://cran.r-project.org/package=glmnet>, with the "An Introduction to glmnet" and "The Relaxed Lasso" being especially helpful in this regard.
Cross validation informed Relaxed LASSO, Artificial Neural Network (ANN), gradient boosting machine ('xgboost'), Random Forest ('RandomForestSRC'), Recursive Partitioning ('RPART') or step wise regression models are fit. Nested cross validation to estimate and compare performances between these models is also performed.
For some datasets, for example when the design matrix is not of full rank, 'glmnet' may have very long run times when fitting the relaxed lasso model, from our experience when fitting Cox models on data with many predictors and many patients, making it difficult to get solutions from either glmnet() or cv.glmnet(). This may be remedied with the 'path=TRUE' options when calling cv.glmnet(). This option is not described in the 'glmnet' Reference Manual but is described in the 'glmnet' "The Relaxed Lasso" vignette. In this package, 'glmnetr', we provide a similar workaround and solve for the non penalized relaxed model where gamma=0 for model structures analogue to 'R' functions like glm() or coxph() of the 'survival' package. If you are not fitting relaxed lasso models, or if you are able to get convergence using 'glmnet', then the glmnetr() and cv.glmnetr() functions may not be of much benefit to you. Note, while this package may allow one to fit relaxed lasso models that have difficulties converging using 'glmnet', and provides some different functionality beyond that of cv.glmnet(), it does not afford the some of the versatility of 'glmnet'.
When fitting not a relaxed lasso model but an elastic-net model, then the R-packages 'nestedcv' <https://cran.r-project.org/package=nestedcv>, 'glmnetSE' <https://cran.r-project.org/package=glmnetSE> or others may provide greater functionality when performing a nested CV.
As with the 'glmnet' package, this package passes most relevant output to the output object and tabular and graphical summaries can be generated using the summary and plot functions. Use of the 'glmnetr' has many similarities to the 'glmnet' package and it is recommended that the user of 'glmnetr' first become familiar with the 'glmnet' package <https://cran.r-project.org/package=glmnet>, with the "An Introduction to 'glmnet'" and "The Relaxed Lasso" being especially helpful in this regard.
License: GPL-3
NeedsCompilation: no
Copyright: Mayo Foundation for Medical Education and Research
RoxygenNote: 7.2.3
Encoding: UTF-8
Packaged: 2023-08-10 02:08:50 UTC; kremers
Packaged: 2024-01-08 17:16:59 UTC; kremers
Repository: CRAN
Date/Publication: 2023-08-10 06:50:02 UTC
Date/Publication: 2024-01-09 00:40:12 UTC
82 changes: 43 additions & 39 deletions MD5
Original file line number Diff line number Diff line change
@@ -1,70 +1,74 @@
6d5ce007c9d25892ad605ea459e7d93b *DESCRIPTION
26e1bb651b11630aee056146a957580e *NAMESPACE
0705e434d9f33aa740fb707c3c3368b0 *DESCRIPTION
7752512e691251a3bccd0e207ab5d965 *NAMESPACE
8be455ef5a118f2940aaaa486e7d36a9 *R/RandomForest_231230.R
ad6ff0bd32b7abb263e529e8e07b34a0 *R/aicreg_230218.R
ad2014970433a9178f4bd435d57155a9 *R/ann_tab_cv_230724.R
2916876a5349d2996468f1cd51a5abbe *R/ann_tab_cv_231230.R
729659f3b23c47e90076141f1ff07885 *R/cv.glmnetr_230508.R
934f013fbad6f6348ae5a6e9d1abdc6b *R/cv.stepreg_230218.R
29482baf8fb736187c0ab5d828e09392 *R/globalVariables_230509.R
25f44abd6d7d0901ea230579a4f90e5b *R/nested.glmnetr_230724.R
15ab27c0f9eba24a1fb619afaa27edae *R/plot.cv.glmnetr_230508.R
c97abb495eb9d88743322b512a6effe1 *R/nested.glmnetr_231230.R
4cc69f359aea3f9f88bf51f3c97c437a *R/plot.cv.glmnetr_231226.R
ee241bf54e9bb2c9d45a479ebbfaef84 *R/predict_ann_tab_230714.R
cc76a7a09e49c075950500d56dd0f71d *R/stepreg_230218.R
263bdaf7c4eb37627f89eb35a8a2b225 *R/summary.cv.glmnetr_230508.R
c326045ba5e1d7dcc0c3a3dac26c8ada *R/summary.nested.glmnetr_230722.R
4e142ce055e3074be344f4220e00ea87 *R/summary.cv.glmnetr_230508.R
aa822799b678fa8d3e985a6ef4a7101d *R/summary.nested.glmnetr_231230.R
7ea6974f14ed567fa99c534b644458d8 *R/xgbm_tuned_230606.R
11b89efc96b54a86c2e5927310e0af54 *build/vignette.rds
90f5b3ec2e7dbec57a61a751d5d477ec *inst/doc/Using_ann_tab_cv_230728.pdf
fce89333e96bf7ec320fba73a50576da *inst/doc/Using_ann_tab_cv_230728.pdf.asis
bf7fe55db7c5f5ec41f31e9cd2910baf *inst/doc/Using_glmnetr_230728.pdf
21f21b35db3650c28c96a2a3d81e6229 *inst/doc/Using_glmnetr_230728.pdf.asis
584f11ecf28d460302093674a6da9c65 *inst/doc/Using_stepreg_230728.pdf
11d2a2dbc54b3dc80a7e6389c7d5a9eb *inst/doc/Using_stepreg_230728.pdf.asis
1fc1d4976d1d089cdef93fe225e5ac76 *build/vignette.rds
18a01245b58410d00bb42f45d0a7ce25 *inst/doc/An_Overview_of_glmnetr_231230.pdf
03ceb0a193e7e47b433efe6fbbe10eeb *inst/doc/An_Overview_of_glmnetr_231230.pdf.asis
d002647ffd957d5686a7cccfca7d8f1e *inst/doc/Using_ann_tab_cv_231230.pdf
fce89333e96bf7ec320fba73a50576da *inst/doc/Using_ann_tab_cv_231230.pdf.asis
fd700b47d6bede905c71879902b1c8e5 *inst/doc/Using_stepreg_231230.pdf
11d2a2dbc54b3dc80a7e6389c7d5a9eb *inst/doc/Using_stepreg_231230.pdf.asis
a9f5d151d25f4588549f8d0f395827f5 *man/aicreg.Rd
a818d7ba8e543d332dcfe4be3d146c60 *man/ann_tab_cv.Rd
f7b85a7f4397687d3028a7aa1664e166 *man/ann_tab_cv_best.Rd
431c8e8a94327905fee2c13927c22242 *man/ann_tab_cv.Rd
16df70bb8936ea69dde22108bb8c1fa0 *man/ann_tab_cv_best.Rd
8920b7dad7b3771b3bb41999c622e344 *man/best.preds.Rd
afcbae93cc694a31308b53a5b19270e6 *man/bsint.Rd
7260337c06093bd343efe9eb5462454d *man/calceloss.Rd
379b2a77d292446a0ff19bc961bf044d *man/bsint.Rd
090ed1cdf3bca816a12d40ab69423f3a *man/calceloss.Rd
b2251e244cd539b217e89e0e37bd94ec *man/cox.sat.dev.Rd
b949f1758fb88f707e4ee913b76ca2f0 *man/cv.glmnetr.Rd
c16f161fa1c8a0d0b3fc07f905c4fba6 *man/cv.stepreg.Rd
b157d2d22fa467f67229b13850640941 *man/diff_time.Rd
1e416c07e1fe26b7d834c2f9ccb03f8a *man/diff_time1.Rd
6791d285f4b9612b691b063d52f3447c *man/dtstndrz.Rd
85cfb6769c87d7807dd35f0c7afcff71 *man/factor.foldid.Rd
5483af7c23eea9ef8473de502b9e3e95 *man/dtstndrz.Rd
678838674575273df61d552fe81a089b *man/factor.foldid.Rd
2369ac69cb168b7768dc26985e0b8b9a *man/getlamgam.Rd
75bf020290ee4a7864ecdeee723bf5cd *man/glmnetr.Rd
ed7f749530e0386b6bde2e2bc42f7f99 *man/glmnetr.compcv.Rd
50e718da02493c28899e99ed7a029147 *man/glmnetr.compcv0.Rd
c81d68de05ecbf34879b336e8fc83046 *man/glmnetr.compcv.Rd
1ee5ff0ccfb908ca2450fd81b9ef49d6 *man/glmnetr.compcv0.Rd
b5fb573f46c4a8157fa818e0aea348d8 *man/glmnetr.foldid.Rd
330ad6aefe7aeecccf19f52099324b54 *man/glmnetr.simdata.Rd
06a7161510bf8a859a2d5ca571e77058 *man/glmnetr_devratio.Rd
7912b3bde0d92b502d2383a2330ae614 *man/glmnetrll_1fold.Rd
483c406c189a24fe57bf5ae0764c2c60 *man/nested.glmnetr.Rd
f6d6dd97c6258aa3ad26142b4e198bc2 *man/plot.cv.glmnetr.Rd
d870475b0cbd2ca2263291d6e69fde1b *man/plot.glmnetr.Rd
d53d8e170ce58c67d22ddc5281efb4b5 *man/plot.nested.glmnetr.Rd
22f6641d028a179fde598ce098c17f8a *man/nested.glmnetr.Rd
27b099037fe4b546baf8eb1ae6f27310 *man/plot.cv.glmnetr.Rd
e68ff33ce8d5256e4677d4f307affde2 *man/plot.glmnetr.Rd
f6d2e499399b3eba67e4a23238054b94 *man/plot.nested.glmnetr.Rd
5b42feaf4e6b3b07f14f5f78885b5f95 *man/predict.cv.glmnetr.Rd
225aa35f3daf529ab6f92b6e266f4ea2 *man/predict.cv.stepreg.Rd
10e522d22448b92b3c0e3e4dc09938f1 *man/predict.glmnetr.Rd
b5f8c9f26137558e9944a486c62f2864 *man/predict.nested.glmnetr.Rd
afd36dbe589b334821283a5e653c7ec8 *man/predict.nested.glmnetr.Rd
542956e9b4af26a220e57ed295868630 *man/predict_ann_tab.Rd
ba094e6f7daf465bd90f69bc1d9e55ad *man/prednn_tl.Rd
56f22f5dd2fc238c351eb0a8331a7832 *man/prednn_tl.Rd
72074968b83c105982fac03dc906a144 *man/preds_1.Rd
ab521ddb80e83446efed027acb59eadb *man/print.nested.glmnetr.Rd
910eed30168c303a1aecb2f691b9ee5f *man/print.nested.glmnetr.Rd
a13364db342b33245ca827e3422e31b9 *man/print.rf_tune.Rd
81b4bf043ea24091e0ad76cdeff4759f *man/rf_tune.Rd
a93fc5b9b30430381989b56da754a79f *man/stepreg.Rd
965acc544becf6bca8eae2918e8fe967 *man/summary.cv.glmnetr.Rd
b11215f26e442e7ff025ef0ad97487a6 *man/summary.cv.stepreg.Rd
7e30d302ad79160b64fe884de0f076eb *man/summary.nested.glmnetr.Rd
a849f2c91b1d77351e25f7e309b24819 *man/summary.nested.glmnetr.Rd
b3daa10f0c013eab517f7c68fe3bcccd *man/summary.rf_tune.Rd
6fd3a6565e5562f822b88c765756d0b2 *man/summary.stepreg.Rd
2a624034a1b8c0ec0f9de19b2a5d86d3 *man/wtlast.Rd
06b1ed8902abe7e7ac1c9fdd7f3c48f6 *man/wtmiddle.Rd
5a66f618f77aff93b6680ec862e66aa7 *man/wtzero.Rd
b1938d64a3b8c6e35804db087adf33c3 *man/wtlast.Rd
4d354079990e3f3bbd78a4f3a7441f57 *man/wtmiddle.Rd
05db74cd06d0f0c21abe8c8ab89e1a70 *man/wtzero.Rd
df3a72008a5fccd56bf4171229b9977f *man/xgb.simple.Rd
46f326d84ba28126b6b912db62d4c9ce *man/xgb.tuned.Rd
635c0c9376b516054c666ee1c6cf7373 *vignettes/Using_ann_tab_cv_230728.Rmd
fce89333e96bf7ec320fba73a50576da *vignettes/Using_ann_tab_cv_230728.pdf.asis
993571edd8eb96e4f002c3cda9e62fd3 *vignettes/Using_glmnetr_230728.Rmd
21f21b35db3650c28c96a2a3d81e6229 *vignettes/Using_glmnetr_230728.pdf.asis
1d7a96a7df3ec5ee04bfb4448afeda1b *vignettes/Using_stepreg_230728.Rmd
11d2a2dbc54b3dc80a7e6389c7d5a9eb *vignettes/Using_stepreg_230728.pdf.asis
2b2d6c660beac1aa0184cad180eff34a *vignettes/An_Overview_of_glmnetr_231230.Rmd
03ceb0a193e7e47b433efe6fbbe10eeb *vignettes/An_Overview_of_glmnetr_231230.pdf.asis
7835844e6366d3d192815a9e1f87e4e1 *vignettes/Using_ann_tab_cv_231230.Rmd
fce89333e96bf7ec320fba73a50576da *vignettes/Using_ann_tab_cv_231230.pdf.asis
35fcdb1b133191c24a433403b9b14a31 *vignettes/Using_stepreg_231230.Rmd
11d2a2dbc54b3dc80a7e6389c7d5a9eb *vignettes/Using_stepreg_231230.pdf.asis
4 changes: 4 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,11 @@ S3method(predict,cv.stepreg)
S3method(predict,glmnetr)
S3method(predict,nested.glmnetr)
S3method(print,nested.glmnetr)
S3method(print,rf_tune)
S3method(summary,cv.glmnetr)
S3method(summary,cv.stepreg)
S3method(summary,nested.glmnetr)
S3method(summary,rf_tune)
S3method(summary,stepreg)
export(aicreg)
export(ann_tab_cv)
Expand All @@ -31,6 +33,7 @@ export(glmnetr.simdata)
export(nested.glmnetr)
export(predict_ann_tab)
export(prednn_tl)
export(rf_tune)
export(stepreg)
export(wtlast)
export(wtmiddle)
Expand All @@ -48,6 +51,7 @@ importFrom(graphics,lines)
importFrom(mlrMBO,makeMBOControl)
importFrom(mlrMBO,mbo)
importFrom(mlrMBO,setMBOControlTermination)
importFrom(randomForestSRC,rfsrc)
importFrom(rpart,prune)
importFrom(rpart,rpart)
importFrom(smoof,makeSingleObjectiveFunction)
Expand Down
200 changes: 200 additions & 0 deletions R/RandomForest_231230.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,200 @@
################################################################################

#' Fit a Random Forest model on data provided in matrix and vector formats.
#'
#' @description Fit an Random Forest model using the rfsrc() function of the
#' randomForestSRC package.
#'
#' @param xs predictor input - an n by p matrix, where n (rows) is sample size, and p (columns)
#' the number of predictors. Must be in matrix form for complete data, no NA's, no Inf's, etc.,
#' and not a data frame.
#' @param start an optional vector of start times in case of a Cox model. Class numeric of length same as number of patients (n)
#' @param y_ dependent variable as a vector: time, or stop time for Cox model, Y_ 0 or 1 for binomial (logistic), numeric for gaussian.
#' Must be a vector of length same as number of sample size.
#' @param event event indicator, 1 for event, 0 for census, Cox model only.
#' Must be a numeric vector of length same as sample size.
#' @param family model family, "cox", "binomial" or "gaussian" (default)
#' @param mtryc a vector (numeric) of values to search over for optimization of the
#' Random Forest fit. This if for the mtry input variable of the rfsrc() program
#' specifying the number of terms to consider in each step of teh Random Forest fit.
#' @param ntreec a vector (numeric) of 2 values, the first for the number of forests
#' (ntree from rfsrc()) to use when searhcing for a better bit and the second to use
#' when fitting the final model. More trees should give a better fit but
#' require more computations and storage for the final.
#' model.
#' @param keep 1 to keep the model fits used to select the value for mtry, or 0
#' (default) to not keep these initial model fits.
#' @param track 1 to output a brief summary of the final selected model, 2 to
#' output a brief summary on each model fit in search of a better model or 0
#' (default) to not output this information.
#'
#' @return a Random Forest model fit
#'
#' @importFrom randomForestSRC rfsrc
#'
#' @export
#'
rf_tune = function(xs, start=NULL, y_, event=NULL, family=NULL, mtryc=NULL, ntreec=NULL, keep=0, track=0) {

if (is.null(family) == 1) {
if (!is.null(event) == 1) {
family = "cox"
} else if ( (length(table(y_)) == 2) & (names(table(y_))[1] == 0) & (names(table(y_))[2] == 1) ) { family = "binomial"
} else { family = "gaussian"}
print(family)
}

if (track >= 1) {
time_start = diff_time()
time_split = time_start
}

if (family == "cox") {
df = as.data.frame(cbind(xs, y_=y_, event = event))
} else {
df = as.data.frame(cbind(xs, y_))
names(df)
}
# df = as.data.frame(cbind(xs, y_=y_))
# dim(df)

# print(mtryc)
# print (family)

if (is.null(mtryc)) {
if (family %in% c("cox","binomial")) {
mtryc = round( sqrt(dim(xs)[2]) * c(0.67 , 1, 1.5, 2.25, 3.375, 5.0625) )
mtryc = mtryc [mtryc < dim(xs)[2]]
#mtryc = mtryc [mtryc < 1]
if (length(mtryc) == 0) { mtryc = 1 }
} else {
mtryc = round( dim(xs)[2]*c(0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6) )
# mytryc = round( dim(xs)[2]*c(0.33, 0.5, 0.67) )
}
}

# print(mtryc)

if (is.null(ntreec)) {
ntreec = c(10, 50)
if (family == "gaussian") { ntreec = c(50, 500)
} else { ntreec = c(25, 250) }
}

##------------------------------------------------------------------------------

rfs = list()

##mtryc = mtryc[1:3]
## k_ = 1

# rfsrc does not analyze (start,stop) time survival data
if ((family == "cox") & (!is.null(start)==1)) { skip = 1 } else { skip = 0 }

if (skip == 0) {

for ( k_ in c(1:length(mtryc))) {
if (family == "cox") {
rf = rfsrc( Surv(y_, event) ~ . , data=df, mtry=mtryc[k_], nsplit=8,
ntree=ntreec[1], membership = TRUE, importance=TRUE)
} else {
rf = rfsrc( y_ ~ . , data=df, mtry=mtryc[k_], nsplit=8,
ntree=ntreec[1], membership = TRUE, importance=TRUE)
}

rfs[[paste0("m.", k_)]] <- rf

if (k_ ==1) {
k_best = 1
mtry_best = mtryc[k_]
err.rate = rf$err.rate[length(rf$err.rate)] ## should be OOB error rate
err.rate.best = err.rate
err.ratev = err.rate
rf_best = rf
} else if (rf$err.rate[length(rf$err.rate)] < err.rate.best) {
k_best = k_
mtry_best = mtryc[k_]
err.rate.best = rf$err.rate[length(rf$err.rate)]
rf_best = rf
}

if (k_ > 1) { err.ratev = c(err.ratev, rf$err.rate[length(rf$err.rate)]) }

if (track >= 2) {
print( c(k_, mtryc[k_], mtry_best, rf$err.rate[length(rf$err.rate)] ) )
time_split = diff_time(time_start, time_split)
}
}

if (ntreec[2] > ntreec[1]) {
if (family == "cox") {
rf_tuned = rfsrc( Surv(y_, event) ~ . , data=df, mtry=mtryc[k_best], nsplit=8,
ntree=ntreec[2], membership = TRUE, importance=TRUE)
} else {
rf_tuned = rfsrc( y_ ~ . , data=df, mtry=mtryc[k_best], nsplit=8,
ntree=ntreec[2], membership = TRUE, importance=TRUE)
}
} else {
rf_tuned = rf_best
}

if (track >= 1) {
print( c(mtry_best, rf_tuned$err.rate[length(rf_tuned$err.rate)] ) )
time_split = diff_time(time_start, time_split)
}

if (keep == 1) {
rffit = list(rf_tuned=rf_tuned, rfs=rfs, err.ratev=err.ratev, err.rate=rf_tuned$err.rate[length(rf_tuned$err.rate)],
mtryc=mtryc, ntreec=ntreec )
} else {
rffit = list(rf_tuned=rf_tuned, err.ratev=err.ratev, rf_tuned$err.rate[length(rf_tuned$err.rate)],
mtryc=mtryc, ntreec=ntreec )
}

} else { rffit = list(rffit="NONE") }

class(rffit) <- c("rf_tune")

return(rffit)

}

################################################################################
#'
#' Summarize output from rf_tune() function
#'
#' @param object output from an rf_tune() function
#' @param ... optional pass through parameters to pass to summary.rfsrc()
#'
#' @return summary to console
#'
#' @export
#'
summary.rf_tune = function(object, ...) {
cat(paste0("\n search set for mtry and ntree : \n\n"))
print(object$mtryc)
print(object$ntreec)
cat(paste0("\n selected vale for for mtry from search: ", object$rf_tuned$mtry, "\n\n"))
rf_tuned = object$rf_tuned
summary(rf_tuned)
}

################################################################################
#'
#' Print output from rf_tune() function
#'
#' @param x output from an rf_tune() function
#' @param ... optional pass through parameters to pass to print.rfsrc()
#'
#' @return summary to console
#'
#' @export
#'
#'
print.rf_tune = function(x, ...) {
rf_tuned = x$rf_tuned
print( rf_tuned )
}

################################################################################

0 comments on commit 5828439

Please sign in to comment.