diff --git a/README.md b/README.md index c9d4dda9..ff4e4e60 100644 --- a/README.md +++ b/README.md @@ -54,9 +54,9 @@ mamba install -c conda-forge r-mikropml ### Dependencies -- Imports: caret, dplyr, e1071, glmnet, kernlab, MLmetrics, + - Imports: caret, dplyr, e1071, glmnet, kernlab, MLmetrics, randomForest, rlang, rpart, stats, utils, xgboost -- Suggests: doFuture, foreach, future, future.apply, ggplot2, knitr, + - Suggests: doFuture, foreach, future, future.apply, ggplot2, knitr, progress, progressr, purrr, rmarkdown, testthat, tidyr ## Usage @@ -107,29 +107,35 @@ license](https://creativecommons.org/licenses/by/4.0/). To cite mikropml in publications, use: +> +> >
+> > Topçuoğlu BD, Lapp Z, Sovacool KL, Snitkin E, Wiens J, Schloss PD > (2021). “mikropml: User-Friendly R Package for Supervised Machine > Learning Pipelines.” Journal of Open Source Software, > 6(61), 3073. > doi:10.21105/joss.03073, > https://joss.theoj.org/papers/10.21105/joss.03073. +> >
A BibTeX entry for LaTeX users is: - @Article{, - title = {{mikropml}: User-Friendly R Package for Supervised Machine Learning Pipelines}, - author = {Begüm D. Topçuoğlu and Zena Lapp and Kelly L. Sovacool and Evan Snitkin and Jenna Wiens and Patrick D. Schloss}, - journal = {Journal of Open Source Software}, - year = {2021}, - month = {May}, - volume = {6}, - number = {61}, - pages = {3073}, - doi = {10.21105/joss.03073}, - url = {https://joss.theoj.org/papers/10.21105/joss.03073}, - } +``` + @Article{, + title = {{mikropml}: User-Friendly R Package for Supervised Machine Learning Pipelines}, + author = {Begüm D. Topçuoğlu and Zena Lapp and Kelly L. Sovacool and Evan Snitkin and Jenna Wiens and Patrick D. Schloss}, + journal = {Journal of Open Source Software}, + year = {2021}, + month = {May}, + volume = {6}, + number = {61}, + pages = {3073}, + doi = {10.21105/joss.03073}, + url = {https://joss.theoj.org/papers/10.21105/joss.03073}, +} +``` ## Why the name? @@ -138,4 +144,4 @@ This package was originally implemented as a machine learning pipeline for microbiome-based classification problems (see [Topçuoğlu *et al.* 2020](https://doi.org/10.1128/mBio.00434-20)). We realized that these methods are applicable in many other fields too, but stuck with the name -because we like it! +because we like it\! diff --git a/docs/dev/CODE_OF_CONDUCT.html b/docs/dev/CODE_OF_CONDUCT.html index 388aa4f6..0428e050 100644 --- a/docs/dev/CODE_OF_CONDUCT.html +++ b/docs/dev/CODE_OF_CONDUCT.html @@ -1,5 +1,5 @@ -Site built with pkgdown 2.0.5.
+Site built with pkgdown 2.0.6.
diff --git a/docs/dev/CONTRIBUTING.html b/docs/dev/CONTRIBUTING.html index 76ecfd5f..9e234d66 100644 --- a/docs/dev/CONTRIBUTING.html +++ b/docs/dev/CONTRIBUTING.html @@ -1,5 +1,5 @@ -Site built with pkgdown 2.0.5.
+Site built with pkgdown 2.0.6.
diff --git a/docs/dev/articles/index.html b/docs/dev/articles/index.html index 39f12551..12070d3a 100644 --- a/docs/dev/articles/index.html +++ b/docs/dev/articles/index.html @@ -1,5 +1,5 @@ -run_ml()
results$performance
#> # A tibble: 1 × 17
-#> cv_metric_AUC logLoss AUC prAUC Accuracy Kappa F1 Sensitivity Specificity
-#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 0.622 0.684 0.647 0.606 0.590 0.179 0.6 0.6 0.579
-#> # … with 8 more variables: Pos_Pred_Value <dbl>, Neg_Pred_Value <dbl>,
-#> # Precision <dbl>, Recall <dbl>, Detection_Rate <dbl>,
-#> # Balanced_Accuracy <dbl>, method <chr>, seed <dbl>
+#> cv_metric_AUC logLoss AUC prAUC Accuracy Kappa F1 Sensi…¹ Speci…² Pos_P…³
+#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+#> 1 0.622 0.684 0.647 0.606 0.590 0.179 0.6 0.6 0.579 0.6
+#> # … with 7 more variables: Neg_Pred_Value <dbl>, Precision <dbl>, Recall <dbl>,
+#> # Detection_Rate <dbl>, Balanced_Accuracy <dbl>, method <chr>, seed <dbl>,
+#> # and abbreviated variable names ¹Sensitivity, ²Specificity, ³Pos_Pred_Value
When using logistic regression for binary classification, area under the receiver operating characteristic curve (AUC) is a useful metric to evaluate model performance. Because of that, it’s the default that we use for mikropml
. However, it is crucial to evaluate your model performance using multiple metrics. Below you can find more information about other performance metrics and how to use them in our package.
cv_metric_AUC
is the AUC for the cross-validation folds for the training data. This gives us a sense of how well the model performs on the training data.
Most of the other columns are performance metrics for the test data — the data that wasn’t used to build the model. Here, you can see that the AUC for the test data is not much above 0.5, suggesting that this model does not predict much better than chance, and that the model is overfit because the cross-validation AUC (cv_metric_AUC
, measured during training) is much higher than the testing AUC. This isn’t too surprising since we’re using so few features with this example dataset, so don’t be discouraged. The default option also provides a number of other performance metrics that you might be interested in, including area under the precision-recall curve (prAUC).
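To make the AUC numbers above concrete, here is a minimal base-R sketch of the metric itself, using the rank-sum (Mann–Whitney) identity: the AUC is the probability that a randomly chosen positive observation is scored higher than a randomly chosen negative one. This is illustrative only — mikropml itself computes its performance metrics through caret, and the `roc_auc()` helper and toy data below are hypothetical.

```r
# Illustrative sketch (not mikropml's implementation): AUC via the
# rank-sum identity. labels: 1 = positive class, 0 = negative class.
roc_auc <- function(labels, scores) {
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  r <- rank(scores)  # mid-ranks, so tied scores are handled gracefully
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

labels <- c(0, 0, 1, 1)
scores <- c(0.10, 0.40, 0.35, 0.80)  # one positive is out-ranked by a negative
roc_auc(labels, scores)  # 0.75 -- well above the 0.5 expected by chance
```

A test-set AUC near 0.5, as in the output above, means the model's scores rank held-out positives no better than random guessing would.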
results_pr$performance
#> # A tibble: 1 × 17
-#> cv_metric_prAUC logLoss AUC prAUC Accuracy Kappa F1 Sensitivity
-#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-#> 1 0.577 0.691 0.663 0.605 0.538 0.0539 0.690 1
-#> # … with 9 more variables: Specificity <dbl>, Pos_Pred_Value <dbl>,
-#> # Neg_Pred_Value <dbl>, Precision <dbl>, Recall <dbl>, Detection_Rate <dbl>,
-#> # Balanced_Accuracy <dbl>, method <chr>, seed <dbl>
+#> cv_metric_p…¹ logLoss AUC prAUC Accur…² Kappa F1 Sensi…³ Speci…⁴ Pos_P…⁵
+#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+#> 1 0.577 0.691 0.663 0.605 0.538 0.0539 0.690 1 0.0526 0.526
+#> # … with 7 more variables: Neg_Pred_Value <dbl>, Precision <dbl>, Recall <dbl>,
+#> # Detection_Rate <dbl>, Balanced_Accuracy <dbl>, method <chr>, seed <dbl>,
+#> # and abbreviated variable names ¹cv_metric_prAUC, ²Accuracy, ³Sensitivity,
+#> # ⁴Specificity, ⁵Pos_Pred_Value
results_multi$performance
#> # A tibble: 1 × 17
-#> cv_metric_logLoss logLoss AUC prAUC Accuracy Kappa Mean_F1 Mean_Sensitivity
-#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
-#> 1 1.07 1.11 0.506 0.353 0.382 0.0449 NA 0.360
-#> # … with 9 more variables: Mean_Specificity <dbl>, Mean_Pos_Pred_Value <chr>,
-#> # Mean_Neg_Pred_Value <dbl>, Mean_Precision <chr>, Mean_Recall <dbl>,
-#> # Mean_Detection_Rate <dbl>, Mean_Balanced_Accuracy <dbl>, method <chr>,
-#> # seed <dbl>
Site built with pkgdown 2.0.5.
+Site built with pkgdown 2.0.6.
Site built with pkgdown 2.0.5.
+Site built with pkgdown 2.0.6.
diff --git a/docs/dev/articles/parallel.html b/docs/dev/articles/parallel.html index 7440baf8..fcc33857 100644 --- a/docs/dev/articles/parallel.html +++ b/docs/dev/articles/parallel.html @@ -14,8 +14,8 @@ - - + + @@ -276,7 +276,7 @@Site built with pkgdown 2.0.5.
+Site built with pkgdown 2.0.6.
diff --git a/docs/dev/articles/preprocess.html b/docs/dev/articles/preprocess.html index da7e7a56..b8a91be7 100644 --- a/docs/dev/articles/preprocess.html +++ b/docs/dev/articles/preprocess.html @@ -14,8 +14,8 @@ - - + + @@ -680,7 +680,7 @@