diff --git a/DESCRIPTION b/DESCRIPTION
index fe7f99ad..05651fbc 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: mikropml
 Title: User-Friendly R Package for Supervised Machine Learning Pipelines
-Version: 1.5.0
+Version: 1.5.0.9000
 Date: 2023-01-15
 Authors@R: c(person(given = "Begüm",
diff --git a/NEWS.md b/NEWS.md
index 7960bb77..333116f4 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,7 @@
+# mikropml development version
+
+- Minor documentation improvements (#323, @kelly-sovacool).
+
 # mikropml 1.5.0
 
 - New example showing how to plot feature importances in the `parallel` vignette (#310, @kelly-sovacool).
diff --git a/R/run_ml.R b/R/run_ml.R
index 2f9679a8..ffb23013 100644
--- a/R/run_ml.R
+++ b/R/run_ml.R
@@ -1,13 +1,15 @@
 #' Run the machine learning pipeline
 #'
-#' This function runs machine learning (ML), evaluates the best model,
+#' This function splits the data set into a train & test set,
+#' trains machine learning (ML) models using k-fold cross-validation,
+#' evaluates the best model on the held-out test set,
 #' and optionally calculates feature importance using the framework
 #' outlined in Topçuoğlu _et al._ 2020 (\doi{10.1128/mBio.00434-20}).
-#' Required inputs are a dataframe with an outcome variable and other columns
-#' as features, as well as the ML method.
+#' Required inputs are a data frame (must contain an outcome variable and all
+#' other columns as features) and the ML method.
 #' See `vignette('introduction')` for more details.
 #'
-#' @param dataset Dataframe with an outcome variable and other columns as features.
+#' @param dataset Data frame with an outcome variable and other columns as features.
 #' @param method ML method.
 #' Options: `c("glmnet", "rf", "rpart2", "svmRadial", "xgbTree")`.
 #' - glmnet: linear, logistic, or multiclass regression
@@ -73,13 +75,28 @@
 #'
 #' - `trained_model`: Output of [caret::train()], including the best model.
 #' - `test_data`: Part of the data that was used for testing.
-#' - `performance`: Dataframe of performance metrics. The first column is the cross-validation performance metric, and the last two columns are the ML method used and the seed (if one was set), respectively. All other columns are performance metrics calculated on the test data. This contains only one row, so you can easily combine performance dataframes from multiple calls to `run_ml()` (see `vignette("parallel")`).
-#' - `feature_importance`: If feature importances were calculated, a dataframe where each row is a feature or correlated group. The columns are the performance metric of the permuted data, the difference between the true performance metric and the performance metric of the permuted data (true - permuted), the feature name, the ML method, the performance metric name, and the seed (if provided). For AUC and RMSE, the higher perf_metric_diff is, the more important that feature is for predicting the outcome. For log loss, the lower perf_metric_diff is, the more important that feature is for predicting the outcome.
-#'
+#' - `performance`: Data frame of performance metrics. The first column is the
+#' cross-validation performance metric, and the last two columns are the ML
+#' method used and the seed (if one was set), respectively.
+#' All other columns are performance metrics calculated on the test data.
+#' This contains only one row, so you can easily combine performance
+#' data frames from multiple calls to `run_ml()`
+#' (see `vignette("parallel")`).
+#' - `feature_importance`: If feature importances were calculated, a data frame
+#' where each row is a feature or correlated group. The columns are the
+#' performance metric of the permuted data, the difference between the true
+#' performance metric and the performance metric of the permuted data
+#' (true - permuted), the feature name, the ML method,
+#' the performance metric name, and the seed (if provided).
+#' For AUC and RMSE, the higher perf_metric_diff is, the more important that
+#' feature is for predicting the outcome. For log loss, the lower
+#' perf_metric_diff is, the more important that feature is for
+#' predicting the outcome.
 #'
 #' @section More details:
 #'
-#' For more details, please see [the vignettes](http://www.schlosslab.org/mikropml/articles/).
+#' For more details, please see
+#' [the vignettes](http://www.schlosslab.org/mikropml/articles/).
 #'
 #' @export
 #' @author Begüm Topçuoğlu, \email{topcuoglu.begum@@gmail.com}
diff --git a/docs/dev/CODE_OF_CONDUCT.html b/docs/dev/CODE_OF_CONDUCT.html
index 84ac1f39..4fc94a4d 100644
--- a/docs/dev/CODE_OF_CONDUCT.html
+++ b/docs/dev/CODE_OF_CONDUCT.html
@@ -10,7 +10,7 @@
 mikropml
-1.4.0.9000
+1.5.0.9000