diff --git a/DESCRIPTION b/DESCRIPTION new file mode 100644 index 0000000..f45e08f --- /dev/null +++ b/DESCRIPTION @@ -0,0 +1,35 @@ +Package: iraceplot +Title: Plots for Visualizing the Data Produced by the 'irace' Package +Version: 1.0 +Authors@R: c(person("Manuel", "López-Ibáñez", role = c("aut", "cre"), + email = "manuel.lopez-ibanez@manchester.ac.uk", + comment = c(ORCID = "0000-0001-9974-1295")), + person("Pablo", "Oñate Marín", role = c("aut"), + email = "pablo.onate.m@gmail.com"), + person("Leslie", "Pérez Cáceres", role = c("aut"), + email = "leslie.perez@pucv.cl", + comment = c(ORCID = "0000-0001-5553-6150"))) +Description: Graphical visualization tools for analyzing the data produced by 'irace'. The 'iraceplot' package enables users to analyze the performance and the parameter space data sampled by the configuration during the search process. It provides a set of functions that generate different plots to visualize the configurations sampled during the execution of 'irace' and their performance. The functions just require the log file generated by 'irace' and, in some cases, they can be used with user-provided data. 
+License: MIT + file LICENSE +Encoding: UTF-8 +RoxygenNote: 7.1.2 +Depends: R (>= 3.4) +Imports: cli, dplyr, DT, forcats, ggforce, ggplot2 (>= 3.3.6), + gridExtra, irace (>= 3.5), knitr, matrixStats (>= 0.55), + plotly, rmarkdown, stats, tibble, tidyr, truncnorm, utils, + viridisLite, withr +Suggests: testthat (>= 3.0.0) +VignetteBuilder: knitr +URL: https://auto-optimization.github.io/iraceplot/, + https://github.com/auto-optimization/iraceplot/ +BugReports: https://github.com/auto-optimization/iraceplot/issues +Config/testthat/edition: 3 +NeedsCompilation: no +Packaged: 2022-12-19 12:02:51 UTC; manu +Author: Manuel López-Ibáñez [aut, cre] + (), + Pablo Oñate Marín [aut], + Leslie Pérez Cáceres [aut] () +Maintainer: Manuel López-Ibáñez +Repository: CRAN +Date/Publication: 2022-12-19 12:40:02 UTC diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..93ee65c --- /dev/null +++ b/LICENSE @@ -0,0 +1,2 @@ +YEAR: 2021 +COPYRIGHT HOLDER: Pablo Oñate diff --git a/MD5 b/MD5 new file mode 100644 index 0000000..52c7941 --- /dev/null +++ b/MD5 @@ -0,0 +1,79 @@ +412427b1666c31e8a3b159a3f4748790 *DESCRIPTION +9e77d4d22f8d545e7cea0b9b2b8871c6 *LICENSE +a93ad81425f6e30f5c40f33f54177ef7 *NAMESPACE +2357148da1118d7ed0f15debf97b9627 *NEWS.md +a1f6246bc8d60556b17211ef4e476b68 *R/boxplot_performance.R +e794c8e9aa54631747aa67f843032259 *R/boxplot_test.R +2cda92a27b310c38b895591e157bfee7 *R/boxplot_training.R +45cf5f005dfd9919a40ae7cf2bb64f17 *R/common.R +74feb56337fec146ea5c524e49fe996f *R/configurations_display.R +28b1feec23d0c2e4e4ea5b8fb90d22e4 *R/distance_config.R +93afe6879ca3eb3208423168b8fa2f4d *R/iraceplot-package.R +7ce3fb4babfddd14240e7eb95eb50efb *R/parallel_cat.R +019424532177984315011558e4660a12 *R/parallel_coord.R +334a272f2ab7e4aafb71c14aa783a8b8 *R/plot_experiments_matrix.R +af21a844ceadb2d6ec777902d1877a49 *R/plot_model.R +ba060503adce7c36c439474c21845746 *R/reexport.R +205f9b25a2d90855a48a1d7363915b17 *R/report.R +220f928dfa7a1f9eb79b46b088590d71 
*R/sampling_distance.R +fba7adf718876b42a725ed7bbe44d7d2 *R/sampling_frequency.R +2abb940973a9c2c4efd47152e35c5a7f *R/sampling_frequency_iteration.R +d6c219c1a7375796abea0c90b274da47 *R/sampling_heatmap.R +a2cc5c911865304cd035d201ede31fd0 *R/sampling_pie.R +4a802bc8aa301a2d33364d6766d07212 *R/scatter_performance.R +79361f6e405a2e2dba79f525fb5301b0 *R/summarise_by_instance.R +ae1f13ceb7752ca34eb3bacd54ed04d8 *R/summarise_by_iteration.R +e09016c55c5410a4729158ea5232eb10 *README.md +09d36fee04b8bfdd90c0c18bb8860616 *build/vignette.rds +cb4ba4a7d30045032beb8ba0c040c3d6 *inst/doc/iraceplot_package.R +44f0c0a289d568f46b656092d92af7a7 *inst/doc/iraceplot_package.Rmd +ed1db2bdb532ffb397e0ddf2ccbc614a *inst/doc/iraceplot_package.html +cef997572e1419236fe7e76b490b7064 *inst/exdata/guide-example.Rdata +e0afc1f8387a63512fae725170765046 *inst/template/report_html.Rmd +9a74467b0eb3be9adc31586c300784d2 *man/boxplot_performance.Rd +1e87fd428167c3b4a1393c95509ef88d *man/boxplot_test.Rd +481b819953bd127702ca47b65180f11b *man/boxplot_training.Rd +b1a3e3fce9841c8da35716bf92c07d35 *man/configurations_display.Rd +de65cd97ae56553184131e7cd8f509fd *man/distance_config.Rd +31e3f7ada9da64a62c83768f1b226bfa *man/has_testing_data.Rd +39dea2c833642ad662517b5e8b42080d *man/iraceplot-package.Rd +0b85664a9d5c9ee1ec8284d7ec1ad987 *man/parallel_cat.Rd +b9b098015d2a4e03090213a90fdbe65e *man/parallel_coord.Rd +f12ee46eebd0d0cf4d9940896f4da5fa *man/parallel_coord2.Rd +546e3348dd8cd17cc81744b67315ece9 *man/plot_experiments_matrix.Rd +9c20df2ea3a1bddb2f1e70bb487ce9fa *man/plot_model.Rd +3bf4faec6f7a75a648b35baba764c5e7 *man/reexports.Rd +614c013039fd5eb0f9c99cc2fa16cc76 *man/report.Rd +faba461bd76069e65fc68adfd3139efc *man/sampling_distance.Rd +2ec05f52f98f0db0c6278567bb1e7ced *man/sampling_frequency.Rd +1468cb38b6f395b5aa41bac9879b715d *man/sampling_frequency_iteration.Rd +2b1f5ed1630f8527fe9cd1c3286d6600 *man/sampling_heatmap.Rd +1d14835bdbb7418ffa1d069a381709d8 *man/sampling_heatmap2.Rd 
+083922eea6e225deacbd73753d49f26e *man/sampling_pie.Rd +179af32c1c1ec6b28a3e5bd398ba6a5e *man/scatter_performance.Rd +7df23170107cc84383d93d59397bf449 *man/summarise_by_instance.Rd +af82b1603d40ab63e1f50fa61f5a664d *man/summarise_by_iteration.Rd +7bd5147032b4f2e933f475f5980ca0f1 *tests/testthat.R +65c2ec3221e2e07397c4551cd31c5bba *tests/testthat/bug32.Rdata +d04ad2842ddbc5890b845535a5c0180e *tests/testthat/setup.R +2516955a8acf6f4e095df2ab87bea1f8 *tests/testthat/teardown.R +3803d358ef7c575a682c016d9e189619 *tests/testthat/test-boxplot_test.R +f0e07f6f7a083923df3b99f514a5dfc7 *tests/testthat/test-boxplot_training.R +835475536d002cc8e556df7141915311 *tests/testthat/test-bug-32.R +8f3ab38969d65be4702acf5377433126 *tests/testthat/test-configurations_display.R +0fced15ad7bb36da64eb471b0675e583 *tests/testthat/test-distance_config.R +2a9d1d62dd123b12a948efdb92aa5c49 *tests/testthat/test-parallel_cat.R +700ce56ba793c0c7c196a99313e62675 *tests/testthat/test-parallel_coord.R +43df88638822acb4daef0a5a26dfdcda *tests/testthat/test-plot_experiments_matrix.R +4d9ed69d96cf7933bc55af70cd2c2ee9 *tests/testthat/test-plot_model.R +b5f7d6b4a10e57a98479259c0f661061 *tests/testthat/test-report.R +655b8d6a2bdc49b4660d32ca5a6a2958 *tests/testthat/test-sampling_distance.R +2c213bdc17e7bbf8f5e26b524066ea45 *tests/testthat/test-sampling_frequency.R +2186c009979e0df28ced12b624e42e8e *tests/testthat/test-sampling_frequency_iteration.R +cca5f9881437ae001214ac1249b21ae5 *tests/testthat/test-sampling_heatmap.R +2b060f506e27138a198515813e3df545 *tests/testthat/test-sampling_pie.R +10598d8208714c743c364d81069d87bd *tests/testthat/test-scatter_test.R +80daab69eaf078e2a8545c33eb3adf50 *tests/testthat/test-scatter_training.R +4a181b63192cc7e12c62d8cb6ec8b8e8 *vignettes/example/report_example.Rmd +44f0c0a289d568f46b656092d92af7a7 *vignettes/iraceplot_package.Rmd +a442b318eba918ee7567827becd61091 *vignettes/user_guide/guide.Rmd diff --git a/NAMESPACE b/NAMESPACE new file mode 100644 index 
0000000..c4a56eb --- /dev/null +++ b/NAMESPACE @@ -0,0 +1,96 @@ +# Generated by roxygen2: do not edit by hand + +export(boxplot_performance) +export(boxplot_test) +export(boxplot_training) +export(configurations_display) +export(has_testing_data) +export(parallel_cat) +export(parallel_coord) +export(parallel_coord2) +export(plot_experiments_matrix) +export(plot_model) +export(read_logfile) +export(report) +export(sampling_distance) +export(sampling_frequency) +export(sampling_frequency_iteration) +export(sampling_heatmap) +export(sampling_heatmap2) +export(sampling_pie) +export(scatter_performance) +export(scatter_test) +export(scatter_training) +export(summarise_by_instance) +export(summarise_by_iteration) +import(irace) +import(stats) +import(tibble) +importFrom(DT,renderDataTable) +importFrom(cli,cli_abort) +importFrom(cli,cli_alert_info) +importFrom(cli,cli_alert_warning) +importFrom(cli,cli_inform) +importFrom(cli,cli_warn) +importFrom(dplyr,"%>%") +importFrom(dplyr,arrange) +importFrom(dplyr,count) +importFrom(dplyr,group_by) +importFrom(dplyr,mutate) +importFrom(dplyr,n_distinct) +importFrom(dplyr,select) +importFrom(dplyr,slice) +importFrom(dplyr,summarise) +importFrom(ggplot2,aes) +importFrom(ggplot2,after_stat) +importFrom(ggplot2,element_blank) +importFrom(ggplot2,element_rect) +importFrom(ggplot2,element_text) +importFrom(ggplot2,facet_grid) +importFrom(ggplot2,geom_abline) +importFrom(ggplot2,geom_bar) +importFrom(ggplot2,geom_blank) +importFrom(ggplot2,geom_boxplot) +importFrom(ggplot2,geom_density) +importFrom(ggplot2,geom_histogram) +importFrom(ggplot2,geom_jitter) +importFrom(ggplot2,geom_line) +importFrom(ggplot2,geom_point) +importFrom(ggplot2,geom_tile) +importFrom(ggplot2,geom_violin) +importFrom(ggplot2,ggplot) +importFrom(ggplot2,ggsave) +importFrom(ggplot2,ggtitle) +importFrom(ggplot2,guide_axis) +importFrom(ggplot2,guide_colourbar) +importFrom(ggplot2,guide_legend) +importFrom(ggplot2,guides) +importFrom(ggplot2,labs) 
+importFrom(ggplot2,position_jitter) +importFrom(ggplot2,rel) +importFrom(ggplot2,scale_alpha_manual) +importFrom(ggplot2,scale_color_hue) +importFrom(ggplot2,scale_color_manual) +importFrom(ggplot2,scale_color_viridis_c) +importFrom(ggplot2,scale_color_viridis_d) +importFrom(ggplot2,scale_fill_manual) +importFrom(ggplot2,scale_fill_viridis_c) +importFrom(ggplot2,scale_shape_manual) +importFrom(ggplot2,scale_size_manual) +importFrom(ggplot2,scale_x_continuous) +importFrom(ggplot2,scale_x_discrete) +importFrom(ggplot2,scale_y_continuous) +importFrom(ggplot2,scale_y_discrete) +importFrom(ggplot2,theme) +importFrom(ggplot2,theme_bw) +importFrom(ggplot2,vars) +importFrom(ggplot2,xlab) +importFrom(ggplot2,ylab) +importFrom(grDevices,dev.off) +importFrom(grDevices,nclass.Sturges) +importFrom(grDevices,pdf) +importFrom(grDevices,rainbow) +importFrom(gridExtra,grid.arrange) +importFrom(gridExtra,marrangeGrob) +importFrom(irace,read_logfile) +importFrom(knitr,knit) diff --git a/NEWS.md b/NEWS.md new file mode 100644 index 0000000..358bf8f --- /dev/null +++ b/NEWS.md @@ -0,0 +1,13 @@ +# iraceplot 1.0 + + * Implement all plots that were available in the `irace` package and a few + more. + + * First version in CRAN. + + + + + + + diff --git a/R/boxplot_performance.R b/R/boxplot_performance.R new file mode 100644 index 0000000..5e01cff --- /dev/null +++ b/R/boxplot_performance.R @@ -0,0 +1,234 @@ +#' Box Plot of the performance of a set of configurations +#' +#' Creates a box plot that displays the performance of a set of configurations +#' which can be displayed by iteration. +#' +#' The performance data is obtained from the experiment matrix provided in the +#' experiments argument. The configurations can be selected using the allElites +#' argument, which can also be used to define the iteration in which each +#' elite configuration was evaluated. +#' +#' @param experiments +#' Experiment matrix obtained from irace training or testing data. 
Configurations +#' in columns and instances in rows. As in irace, column names (configuration ids) +#' should be characters. +#' +#' @param allElites +#' List or vector of configuration ids, (default NULL). These configurations +#' should be included in the plot. If the argument is not provided, all configurations +#' in experiments are included. If allElites is a vector, all configurations are +#' assumed without iteration unless argument `type="ibest"` is provided, in which case +#' each configuration is assumed to be from a different iteration. If `allElites` +#' is a list, each element of the list is assumed to be an iteration. +#' +#' @param type +#' String, (default "all") possible values are "all" or "ibest". "all" +#' shows all the selected configurations, showing iterations if the information +#' is provided. "ibest" shows the elite configurations of each iteration. Note +#' that the best configuration is always assumed to be first in the vector of +#' each iteration. +#' +#' @param first_is_best +#' Boolean (default TRUE). Displays in a different color the best configuration, +#' identified as the first one in a vector. If FALSE, all configurations are shown +#' in the same color. +#' +#' @template arg_rpd +#' +#' @template arg_show_points +#' +#' @param best_color +#' String, (default `"#08bfaa"`) color to display best configurations. +#' +#' @param x_lab +#' String, (default `"Configurations"`) label for the x axis. +#' +#' @param boxplot By default, display a violin plot ([ggplot2::geom_violin()]). +#' If `TRUE`, show a classical boxplot. 
+#' +#' @template arg_filename +#' @template arg_interactive +#' +#' @template ret_boxplot +#' +#' @seealso [boxplot_test()] [boxplot_training()] +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' boxplot_performance(iraceResults$experiments, iraceResults$allElites) +#' \donttest{ +#' boxplot_performance(iraceResults$testing$experiments, iraceResults$iterationElites) +#' } +#' @export +boxplot_performance <- function(experiments, allElites= NULL, type = c("all", "ibest"), + first_is_best = TRUE, rpd = TRUE, show_points=TRUE, + best_color = "#08bfaa", x_lab ="Configurations", boxplot = FALSE, + filename = NULL, interactive = base::interactive()) +{ + type <- match.arg(type) + if (!is.matrix(experiments) && !is.data.frame(experiments)) { + cli_abort("'{.field experiments}' must be a matrix or a data frame") + } + inst_ids <- rownames(experiments) + if (is.null(inst_ids)) inst_ids <- as.character(1:nrow(experiments)) + + if (type == "ibest" && !first_is_best) { + cli_alert_info("Note: The setting {.code 'type=ibest'} only supports {.code 'first_is_best=TRUE'}, ignoring this setting.\n") + first_is_best <- TRUE + } + + # Get the order of configurations + if (is.null(allElites)) { + cli_alert_info("Note: {.field allElites} not provided, assuming all configurations in experiments as elites.\n") + allElites <- list() + allElites[[1]] <- get_ranked_ids(experiments) + if (type == "ibest") { + cli_abort("The {.field type} argument provided ({type}) is not supported when no {.field allElites} value is provided") + } + } else if (type == "ibest" && !is.list(allElites)) { + cli_alert_info("Note: Since {.code type=ibest}, assuming vector best configuration by iteration in {.field allElites}.\n") + allElites <- as.list(allElites) + } + + if (rpd) experiments <- calculate_rpd(experiments) + + # Generate iteration and final elite vector + iterationElites <- finalElites <- allElites + if 
(is.list(allElites)) + iterationElites <- unlist(lapply(allElites, function(x) x[1])) + + # Select experiments that will be used + if (type == "all") { + v_allElites <- as.character(unique(unlist(allElites))) + if (!all(v_allElites %in% colnames(experiments))) + stop("Missing elite data in experiments matrix:", paste0(setdiff(v_allElites, colnames(experiments)), collapse=", ")) + } else { + stopifnot(type == "ibest") + if (!all(iterationElites %in% colnames(experiments))) + stop("Missing iteration elites data in experiments matrix") + v_allElites <- as.character(iterationElites) + } + # FIXME: It doesn't make sense to rename experiments to data. Just use either + # of the names. + data <- as.data.frame(experiments[,v_allElites, drop=FALSE]) + names_col <- colnames(data) + # If we only have one row, then don't try to plot boxplots. + plot_points <- (nrow(data) == 1) + + # FIXME: This is too complicated and unclear. + reshape_data <- function(x, elites) { + out <- NULL + for (i in 1:length(elites)) { + d <- x[, as.character(elites[[i]]), drop=FALSE] + d <- reshape(d, + varying = as.vector(colnames(d)), + v.names = "performance", + timevar = "ids", + times = as.vector(colnames(d)), + new.row.names = 1:(nrow(d) * ncol(d)), + direction = "long") + d <- d[!is.na(d[,"performance"]),] + d[,"iteration"] <- i + out <- rbind(out, d) + } + out + } + # Get the experiment data together with the iterations + if (is.list(allElites)) { + data <- reshape_data(data, if (type == "all") allElites else iterationElites) + data$iteration_f <- factor(data$iteration, levels = unique(data$iteration)) + } else { + data <- reshape(data, + varying = as.character(colnames(data)), + v.names = "performance", + timevar = "ids", + times = as.character(colnames(data)), + new.row.names = 1:(nrow(data) * ncol(data)), + direction = "long") + data <- data[!is.na(data[,"performance"]),] + data$iteration <- 0 + } + + # Mark best configurations + if (type == "all") { + data$best_conf <- rep("none", 
nrow(data)) + if (first_is_best) { + if (is.list(allElites)) { + for (i in 1:length(allElites)) { + data$best_conf[data$iteration == i & data$ids == as.character(iterationElites[i])] <- "best" + } + } else { + data$best_conf[data$ids == as.character(iterationElites[1])] <- "best" + } + } + } + data$ids_f <- factor(data$ids, levels = unique(data$ids)) + # FIXME: This should include the instance and seed. + # data$label <- paste0("Iteration: ", data$iteration_f, "\nValue: ", data$performance, "\n") + # Silence CRAN warning. + ids <- performance <- v_allElites <- names_col <- best_conf <- ids_f <- iteration_f <- label <- NULL + # FIXME: Simplify these conditions to avoid repetitions. + if (type == "ibest") { + p <- ggplot(data, aes(x = ids_f, y = performance, colour = iteration_f)) + + labs(subtitle = "Iterations") + + theme(plot.subtitle = element_text(hjust = 0.5)) + } else { + # type="all" + if (first_is_best) { + p <- ggplot(data, aes(x = ids_f, y = performance, colour = best_conf)) + + scale_color_manual(values=c(best_color, "#999999")) + } else if (is.list(allElites)){ + p <- ggplot(data, aes(x = ids_f, y = performance, colour = iteration_f)) + } else { + p <- ggplot(data, aes(x = ids_f, y = performance, colour = ids_f)) + } + + if (is.list(allElites)) { + p <- p + labs(subtitle = "Iterations") + p <- p + theme(plot.subtitle = element_text(hjust = 0.5), + axis.text.x = element_text(size = 6.4, angle = 90)) + } else { + p <- p + theme(plot.subtitle = element_text(hjust = 0.5)) + } + } + + if (plot_points) { + p <- p + geom_point(shape = 16, na.rm = TRUE) + } else { + if (boxplot) + p <- p + geom_boxplot() + else + p <- p + geom_violin(draw_quantiles = c(0.25, 0.5, 0.75)) + if (show_points) + p <- p + geom_jitter(shape = 16, position = position_jitter(0.2), alpha=0.2, na.rm = TRUE) + } + y_lab <- if (rpd) "RPD (%)" else "Cost (raw)" + p <- p + theme(legend.position = "none") + labs(x = x_lab, y = y_lab) + + # each box plot is divided by iteration + if 
(is.list(allElites)) { + p <- p + facet_grid(cols = vars(data$iteration_f), scales = "free") + } + + # If the value in filename is added the pdf file is created + if (!is.null(filename)) ggsave(filename, plot = p) + + if (interactive) { + # FIXME: ggplotly does not work well with geom_violin() + # We need to create the violin plot directly in plotly: https://plotly.com/r/violin/ + p <- plotly::ggplotly(p) + } + p +} + +get_ranked_ids <- function(experiments) +{ + allElites <- c() + naExp <- sort(colSums(!is.na(experiments)), decreasing = TRUE) + for (r in unique(naExp)) { + confId <- names(naExp[naExp == r]) + confMean <- colMeans(experiments[, confId, drop=FALSE], na.rm = TRUE) + allElites <- c(allElites, names(sort(confMean))) + } + allElites +} diff --git a/R/boxplot_test.R b/R/boxplot_test.R new file mode 100644 index 0000000..7a592c0 --- /dev/null +++ b/R/boxplot_test.R @@ -0,0 +1,44 @@ +#' Box Plot Testing Performance +#' +#' Creates a box plot that displays the performance of a set of configurations on the test instances. +#' +#' The performance data is obtained from the test evaluations performed +#' by irace. Note that the testing is not a default feature in irace and should +#' be enabled in the setup (see the irace package user guide for more details). +#' +#' @template arg_irace_results +#' +#' @param type String, (default `"all"`) possible values are `"all"`, "ibest" or "best". "all" shows all the configurations included in the test, "best" shows the elite configurations of the last iteration and "ibest" shows the elite configurations of each iteration (requires that irace includes the iteration elites in the testing). +#' +#' @param ... Other arguments passed to [boxplot_performance()]. 
+#' +#' @template ret_boxplot +#' +#' @seealso [boxplot_training()] [boxplot_performance()] +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", +#' "guide-example.Rdata", mustWork = TRUE)) +#' boxplot_test(iraceResults) +#' @export +boxplot_test <- function(irace_results, type = c("all", "ibest", "best"), ...) +{ + type <- match.arg(type) + if (!has_testing_data(irace_results)) + cli_abort("{.field irace_results} does not contain the testing data") + + if (type=="ibest" && !irace_results$scenario$testIterationElites) { + iraceplot_warn("irace data does not contain iteration elites testing,", + " changing plot type to {.code 'best'}") + type <- "best" + } + + id_configurations <- lapply(irace_results$allElites, utils::head, irace_results$scenario$testNbElites) + if (type == "best") { + id_configurations <- id_configurations[[length(id_configurations)]] + type <- "all" + } + boxplot_performance(experiments = irace_results$testing$experiments, + allElites = id_configurations, + type = type, ...) +} diff --git a/R/boxplot_training.R b/R/boxplot_training.R new file mode 100644 index 0000000..f9532a8 --- /dev/null +++ b/R/boxplot_training.R @@ -0,0 +1,63 @@ +#' Box Plot Training +#' +#' Creates a box plot that displays the performance of a set of configurations +#' on the training instances. Performance data is obtained from the evaluations +#' performed by irace during the execution process. This implies that the +#' number of evaluations can differ between configurations. +#' +#' +#' @template arg_irace_results +#' +#' @param iteration +#' Numeric, iteration number that should be included in the plot (example: `iteration = 5`). +#' When no iteration and no id_configurations are provided, the iteration is assumed to be +#' the last one performed by irace. +#' +#' The performance data is obtained from the evaluations performed by irace +#' during the execution process. 
This implies that the number of evaluations +#' can differ between configurations due to the elimination process applied by +#' irace. This plot, consequently, does not provide a complete comparison of +#' two configurations; for a fair comparison, use the test data plot. +#' +#' @param id_configurations +#' Numeric vector, configuration ids whose performance should be included in the plot. +#' If no ids are provided, the configuration ids are set as the elite configuration ids +#' of the selected iteration (last iteration by default) +#' (example: `id_configurations = c(20,50,100,300,500,600,700)`). +#' +#' @param ... Other arguments passed to [boxplot_performance()]. +#' @template ret_boxplot +#' +#' @seealso [boxplot_test()] [boxplot_performance()] +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' boxplot_training(iraceResults) +#' \donttest{ +#' boxplot_training(iraceResults, iteration = 5) +#' boxplot_training(iraceResults, id_configurations = c(23,28,29)) +#' } +#' @export +boxplot_training <- function(irace_results, iteration = NULL, id_configurations = NULL, ...) +{ + if (length(iteration) > 0 && length(id_configurations) > 0) + stop("cannot use id_configurations and iteration at the same time") + + if (length(iteration) == 0) { + iteration <- length(irace_results$allElites) + } else if (iteration < 1 || length(irace_results$allElites) < iteration) { + # We verify that iteration is within the range of values it can take + stop("iteration number out of range") + } + + # Check configurations + if (length(id_configurations) == 0) { + id_configurations <- irace_results$allElites[[iteration]] + } else if (any(!(as.character(id_configurations) %in% colnames(irace_results$experiments)))) { + stop("provided configurations id not found in experiments") + } + boxplot_performance(experiments = irace_results$experiments, + allElites = id_configurations, + type = "all", ...) 
+} diff --git a/R/common.R b/R/common.R new file mode 100644 index 0000000..7ca4fce --- /dev/null +++ b/R/common.R @@ -0,0 +1,81 @@ +calculate_rpd <- function(x) +{ + min_cols <- matrixStats::rowMins(as.matrix(x), na.rm = TRUE) + 100 * (x - min_cols) / min_cols +} + +orca_pdf <- function(filename, plot) +{ + # The filename value is worked to separate it and assign it to new values. + nameFile <- basename(filename) + nameFile <- maybe_add_file_extension(nameFile, "pdf") + directory <- paste0(dirname(filename), sep = "/") + withr::with_dir(directory, plotly::orca(plot, nameFile)) +} + + +iraceplot_warn <- function(...) + cli_alert_warning(text = paste0("{.strong Warning:} ", ...)) + + +orca_save_plot <- function(plot_list, filename) +{ + if (!is.null(filename)) { + directory <- paste0(dirname(filename), sep = "/") + if (length(plot_list) == 1) { + plotly::orca(plot_list[[1]], irace::path_rel2abs(filename)) + } else { + base_name <- strsplit(basename(filename),split = '[.]')[[1]][1] + ext <- strsplit(basename(filename),split = '[.]')[[1]][2] + for (i in seq_along(plot_list)) { + part <- paste0("-", i) + cfile <- irace::path_rel2abs(paste0(directory, "/", base_name, part,"." , ext)) + plotly::orca(plot_list[[i]], cfile) + } + } + } +} + +maybe_add_file_extension <- function(filename, ext) +{ + if (startsWith(ext, ".")) ext <- substring(ext, 2L) + if (!has_file_extension(filename, ext)) filename <- paste0(filename, ".", ext) + filename +} + +has_file_extension <- function(filename, ext) +{ + if (startsWith(ext, ".")) ext <- substring(ext, 2L) + grepl(paste0('[.]', ext, '$'), filename, ignore.case = TRUE) +} + +#' Check if the results object generated by irace has data about the testing phase. 
+#' +#' @template arg_irace_results +#' +#' @return `logical(1)` +#' @export +has_testing_data <- function(irace_results) +{ + ins <- irace_results$scenario$testInstances + exp <- irace_results$testing$experiments + !(length(ins) == 0 || + (length(ins) == 1 && (is.na(ins) || nchar(ins) == 0)) || + length(exp) == 0 || !(is.matrix(exp) || is.data.frame(exp))) +} + +check_unknown_param_names <- function(x, parameters_names) +{ + x <- unlist(x) + if (any(!(x %in% parameters_names))) + stop("Unknown parameter names: ", paste0(setdiff(x, parameters_names), collapse=", ")) + x +} + +subset_param_names <- function(x, parameters_names, is_fixed) +{ + if (is.null(x)) return(parameters_names[!is_fixed]) + check_unknown_param_names(x, parameters_names) +} + + diff --git a/R/configurations_display.R b/R/configurations_display.R new file mode 100644 index 0000000..d9480ba --- /dev/null +++ b/R/configurations_display.R @@ -0,0 +1,140 @@ +#' The configurations by iteration and instance +#' +#' A graph is created with all the settings and instance of the training data +#' +#' @template arg_irace_results +#' +#' @template arg_rpd +#' +#' @template arg_filename +#' +#' @template arg_interactive +#' +#' @return [ggplot2::ggplot()] object +#' +#' @examples +#' \donttest{ +#' iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", +#' "guide-example.Rdata", mustWork = TRUE)) +#' configurations_display(iraceResults) +#' } +#' @export +configurations_display <- function(irace_results, rpd = TRUE, filename = NULL, interactive = base::interactive()) +{ + # FIXME: This function takes a long time. 
+ # variable assignment + time <- bound <- instance <- configuration <- iteration <- nconfig <- cont_exe <- NULL + nconfig <- 0 + experiments <- as.data.frame(irace_results$experiments) + + if (rpd) experiments <- calculate_rpd(experiments) + + exp_log <- select(as.data.frame(irace_results$experimentLog), -time, -bound) + value <- sample(NA, size = dim(exp_log)[1], replace = TRUE) + execution <- sample(NA, size = dim(exp_log)[1], replace = TRUE) + tabla <- cbind(exp_log, value, execution) + + # the values of each configuration are added to the table + cont_exe <- 0 + for (i in 1:dim(exp_log)[1]) { + for (j in 1:dim(irace_results$experiments)[1]) { + if (!is.na(experiments[[tabla$configuration[i]]][j])) { + if (is.na(tabla$value[i])) { + cont_exe <- cont_exe + 1 + tabla$value[i] <- experiments[[tabla$configuration[i]]][j] + tabla$execution[i] <- cont_exe + } else { + cont_exe <- cont_exe + 1 + add <- tabla[i, ] + add$value <- experiments[[tabla$configuration[i]]][j] + add$execution <- cont_exe + tabla <- rbind(tabla, add) + } + } + } + } + + # new columns are created and added to the table + type <- sample(NA, size = dim(tabla)[1], replace = TRUE) + conf_it <- sample(NA, size = dim(tabla)[1], replace = TRUE) + instance_it <- sample(NA, size = dim(tabla)[1], replace = TRUE) + media_regular <- sample(NA, size = dim(tabla)[1], replace = TRUE) + media_elite <- sample(NA, size = dim(tabla)[1], replace = TRUE) + regular_color <- sample("median iteration", size = dim(tabla)[1], replace = TRUE) + elite_color <- sample("median elites", size = dim(tabla)[1], replace = TRUE) + tabla <- cbind(tabla, type, conf_it, instance_it, media_regular, media_elite, regular_color, elite_color) + tabla <- tabla[order(tabla$execution), ] + + # the data is added to the conf_it, instance_it and type columns + for (j in 1:length(irace_results$allElites)) { + nconfig <- max(tabla$execution[tabla$iteration == j]) + tabla$conf_it[tabla$iteration == j] <- nconfig + tabla$instance_it[tabla$iteration 
== j] <- max(unique(tabla$instance[tabla$iteration == j])) + + if (j == length(irace_results$allElites)) { + tabla$type[tabla$iteration == j & !(tabla$configuration %in% irace_results$allElites[[j]])] <- "regular config." + tabla$type[tabla$iteration == j & (tabla$configuration %in% irace_results$allElites[[j]])] <- "final elite config." + tabla$type[tabla$iteration == j & (tabla$configuration %in% irace_results$allElites[[j]][1])] <- "best found config." + } else { + tabla$type[tabla$iteration == j & !(tabla$configuration %in% irace_results$allElites[[j]])] <- "regular config." + tabla$type[tabla$iteration == j & tabla$configuration %in% irace_results$allElites[[j]]] <- "elite config." + } + } + + # The mean values are calculated in the configurations by iteration + for (k in 1:length(irace_results$allElites)) { + tabla$media_regular[tabla$iteration == k] <- mean(tabla$value[tabla$iteration == k]) + tabla$media_elite[tabla$iteration == k] <- mean(tabla$value[tabla$iteration == k & (tabla$type == "elite config." | tabla$type == "final elite config." 
| tabla$type == "best found config.")]) + } + + # Instance and configuration columns are converted to character + tabla$instance[1] <- as.character(tabla$instance[1]) + tabla$configuration[1] <- as.character(tabla$configuration[1]) + + # the text column is generated + tabla <- tabla %>% + mutate(text = paste0("execution: ", execution, "\n", "instance: ", instance, "\n", "configuration: ", configuration, "\n")) + + # the execution column is passed to factor and added to the table + exe_factor <- factor(tabla$execution) + levels(exe_factor) <- tabla$execution + tabla <- cbind(tabla, exe_factor) + text <- NULL # Silence CRAN warning + + # point plot creation + p <- ggplot(tabla, aes(x = exe_factor, y = value, color = instance, text = text)) + + geom_point(aes(shape = type, size = type, alpha = type)) + + facet_grid(cols = vars(tabla$instance_it), scales = "free_x", space = "free_x") + + scale_shape_manual(values = c(22, 21, 24, 4)) + + scale_color_manual(values = c(rainbow(n_distinct(tabla$instance)), "red", "orange"), breaks = c("median elites", "median iteration")) + + scale_size_manual(values = c(2, 2, 2, 0.5)) + + scale_alpha_manual(values = c(0.8, 0.6, 1, 0.2)) + + scale_x_discrete(breaks = c(1, unique(tabla$conf_it))) + + labs( + x = "Candidate evaluations", + y = "RPD", + subtitle = "Instances evaluated" + ) + + theme( + axis.text.x = element_text(angle = 90), + axis.ticks.x = element_blank(), + plot.subtitle = element_text(hjust = 0.5), + strip.text.x = element_text(size = 8), + legend.position = "right", + legend.title = element_blank() + ) + + geom_point(mapping = aes(y = media_elite, color = elite_color), size = 0.1, ) + + geom_point(mapping = aes(y = media_regular, color = regular_color), size = 0.1) + + if (interactive) + p <- plotly::ggplotly(p, tooltip = "text") + + # If the value in filename is added the pdf file is created + if (!is.null(filename)) { + ggsave(filename, plot = p) + # If you do not add the value of filename, the plot is displayed + } 
else { + p + return(p) + } +} diff --git a/R/distance_config.R b/R/distance_config.R new file mode 100644 index 0000000..52a9a6d --- /dev/null +++ b/R/distance_config.R @@ -0,0 +1,67 @@ +#' Distance between configurations +#' +#' Calculate the distance between a configuration and the others in the irace data. +#' +#' @template arg_irace_results +#' +#' @param id_configuration +#' Numeric, ID of the single configuration to be compared to all the others +#' (example: id_configuration = 806) +#' +#' @param t +#' Numeric (default 0.05), threshold, given as a fraction of the domain size in [0,1], +#' below which two values of a numerical parameter are considered equal. +#' +#' @return Numeric vector with the distance to each of the other configurations +#' +#' @examples +#' NULL +distance_config <- function(irace_results, id_configuration, t = 0.05) { + + if (length(id_configuration) != 1) { + stop("Error: You must enter one configuration id\n") + } else if (FALSE %in% (id_configuration %in% irace_results$allConfigurations[[".ID."]])) { + stop(paste("Error: Configuration", id_configuration[1], "does not exist\n", sep = " ")) + } + + if (t < 0 || t > 1){ + stop("Error: threshold t should be in [0,1]\n") + } + + distance <- .ID. <- .PARENT. <- NULL + + #Get configurations + config <- select(irace_results$allConfigurations[id_configuration, ], -.ID., -.PARENT.) + others <- select(irace_results$allConfigurations[!(irace_results$allConfigurations$.ID. %in% id_configuration), ], -.ID., -.PARENT.) 
+ tipos <- irace_results$parameters$types + + distance <- rep(0, nrow(others)) + + # Categorical parameters + cat_par <- names(tipos[tipos %in% c("c", "o")]) + for (pname in cat_par) { + if(is.na(config[,pname])) { + distance <- distance + as.numeric(!is.na(others[,pname])) + } else { + are_na <- is.na(others[,pname]) + distance <- distance + as.numeric(are_na) + distance[!are_na] <- distance[!are_na] + as.numeric(others[!are_na, pname] != config[,pname]) + } + } + + # Numerical parameters + num_par <- names(tipos[tipos %in% c("i", "r", "i,log", "r,log")]) + # calculate distance + threshold <- lapply(irace_results$parameters$domain[num_par], function(d) return(abs((d[2]-d[1])*t))) + for (pname in num_par) { + if(is.na(config[,pname])) { + distance <- distance + as.numeric(!is.na(others[,pname])) + } else { + are_na <- is.na(others[,pname]) + distance <- distance + as.numeric(are_na) + distance[!are_na] <- distance[!are_na] + as.numeric(abs(others[!are_na,pname] - config[,pname]) > threshold[[pname]]) + } + } + + return(distance) +} diff --git a/R/iraceplot-package.R b/R/iraceplot-package.R new file mode 100644 index 0000000..8220b91 --- /dev/null +++ b/R/iraceplot-package.R @@ -0,0 +1,52 @@ +#' @keywords internal +"_PACKAGE" + +#' The iraceplot package: \packageTitle{iraceplot} +#' @description +#' +#' Graphical Visualization Tools for Analysing the Data Produced by irace. 
+#' +#' boxplot_performance; +#' boxplot_test; +#' boxplot_training; +#' parallel_cat; +#' parallel_coord2; +#' parallel_coord; +#' plot_experiments_matrix; +#' plot_model; +#' report; +#' sampling_distance; +#' sampling_frequency; +#' sampling_frequency_iteration; +#' sampling_heatmap2; +#' sampling_heatmap; +#' sampling_pie; +#' scatter_performance; +#' scatter_test; +#' scatter_training; +#' +#' If you need information about any function you can write: +#' ?name_function +#' +#' If you need more information, go to the following page: +#' https://auto-optimization.github.io/iraceplot/ +#' +#' @name iraceplot-package +#' @docType package +#' +#' @details License: MIT + file LICENSE +#' +#' @author Maintainers: Pablo Oñate Marín and Leslie Pérez Cáceres and Manuel López-Ibañez +#' \email{leslie.perez@pucv.cl} +#' +#' @keywords package plot automatic configuration +#' +#' @import stats +#' @import tibble +#' @import irace +#' @importFrom cli cli_warn cli_inform cli_abort cli_alert_info cli_alert_warning +#' @importFrom dplyr mutate %>% group_by summarise select arrange count n_distinct slice +#' @importFrom ggplot2 aes after_stat element_blank element_rect element_text facet_grid geom_abline geom_bar geom_blank geom_boxplot geom_density geom_histogram geom_jitter geom_line geom_point geom_tile geom_violin ggplot ggsave ggtitle guides guide_axis guide_colourbar guide_legend labs position_jitter rel scale_alpha_manual scale_color_hue scale_color_manual scale_color_viridis_c scale_color_viridis_d scale_fill_manual scale_fill_viridis_c scale_shape_manual scale_size_manual scale_x_continuous scale_x_discrete scale_y_continuous scale_y_discrete theme theme_bw vars xlab ylab +#' @importFrom grDevices rainbow nclass.Sturges dev.off pdf +#' @importFrom gridExtra grid.arrange marrangeGrob +NULL diff --git a/R/parallel_cat.R b/R/parallel_cat.R new file mode 100644 index 0000000..3bd040d --- /dev/null +++ b/R/parallel_cat.R @@ -0,0 +1,213 @@ +#' Parallel Coordinates Category 
+#' +#' Parallel categories plot of selected configurations. Numerical parameters +#' are discretized into at most `n_bins` intervals. By default, configurations +#' of all iterations are shown; to restrict the plot, set the `iterations` +#' argument. Groups of configurations of different iterations are shown in +#' different colors. Specific configurations can be selected by providing +#' their IDs in the `id_configurations` argument. +#' +#' The parameters to be included in the plot can be selected with the +#' `param_names` argument. Additionally, `by_n_param` sets the maximum number +#' of parameters displayed in one plot. A list of plots is returned by this +#' function if several plots are required to display the selected data. +#' +#' +#' @template arg_irace_results +#' +#' @template arg_id_configurations +#' +#' @template arg_param_names +#' +#' @param iterations +#' Numeric vector, iterations from which configurations should be obtained +#' (example: iterations = c(1,4,5)) +#' +#' @param by_n_param +#' Numeric (optional), maximum number of parameters to be displayed. +#' +#' @param n_bins +#' Numeric (default 3), number of intervals to generate for numerical parameters. 
+#' +#' @template arg_filename +#' +#' @return parallel categories plot +#' +#' @seealso [parallel_coord()] [parallel_coord2()] +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' parallel_cat(iraceResults) +#' \donttest{ +#' parallel_cat(iraceResults, by_n_param = 6) +#' parallel_cat(iraceResults, id_configurations = c(20, 50, 100)) +#' parallel_cat(iraceResults, param_names = c("algorithm", "alpha", "rho", "q0", "rasrank")) +#' parallel_cat(iraceResults, iterations = c(1, 4, 6), n_bins=4) +#' } +#' @export +parallel_cat <- function(irace_results, id_configurations = NULL, param_names = NULL, + iterations = NULL, by_n_param = NULL, n_bins=3, filename = NULL) { + + # Variable assignment + iteration <- configuration <- dim <- tickV <- vectorP <- x <- y <- id <- freq <- NULL + id_configurations <- unlist(id_configurations) + param_names <- subset_param_names(param_names, irace_results$parameters$names, irace_results$parameters$isFixed) + # Verify that param_names contains more than one parameter + if (length(param_names) < 2) stop("Data must have at least two parameters") + + # Check by_n_param + if (is.null(by_n_param)) + by_n_param <- length(param_names) + if (!is.numeric(by_n_param)){ + stop("Error: argument by_n_param must be numeric\n") + } else if (by_n_param < 2) { + stop("Error: argument by_n_param must be > 1\n") + } + by_n_param <- min(length(param_names), by_n_param) + + # Check iterations + if (!is.null(iterations)) { + if (any(!(iterations %in% 1:length(irace_results$allElites)))) { + stop("Error: The iterations entered are outside the possible range\n") + } + } else { + iterations <- 1:length(irace_results$allElites) + } + + # Check configurations + if (!is.null(id_configurations)) { + # Verify that the IDs entered are within the possible range + if (any(id_configurations < 1) || any(id_configurations > 
nrow(irace_results$allConfigurations))) { + stop("Error: IDs provided are outside the range of configuration IDs\n") + } + # Verify that more than one ID was entered and no more than the possible total + if (length(id_configurations) <= 1 || length(id_configurations) > nrow(irace_results$allConfigurations)) { + stop("Error: You must provide more than one configuration id\n") + } + iterations <- 1:length(irace_results$allElites) + } else { + id_configurations <- unique(irace_results$experimentLog[irace_results$experimentLog[,"iteration"] %in% iterations, "configuration"]) + } + + if (!is.numeric(n_bins) || n_bins < 1) { + stop("Error: n_bins must be numeric > 0") + } + + # Select data + tabla <- irace_results$allConfigurations[irace_results$allConfigurations[, ".ID."] %in% id_configurations, ] + filtro <- unique(irace_results$experimentLog[, c("iteration", "configuration")]) + filtro <- filtro[filtro[, "configuration"] %in% id_configurations, ] + filtro <- filtro[filtro[, "iteration"] %in% iterations, ] + + # Merge iteration and configuration data + colnames(filtro)[colnames(filtro) == "configuration"] <- ".ID." + tabla <- merge(filtro, tabla, by=".ID.") + + # adding discretization for numerical variables and replace NA values + # FIXME: Add proper ordering for each axis + # FIXME: add number of bins as an argument (maybe a list?) + # FIXME: This is surely wrong! It is not using param_names calculated above! 
+ for (pname in irace_results$parameters$names) { + n_bins_param <- n_bins + if (irace_results$parameters$types[pname] %in% c("i", "r", "i,log", "r,log")) { + not.na <- !is.na(tabla[,pname]) + u_data <- unique(tabla[not.na, pname]) + if (length(u_data) >= n_bins_param) { + snot.na <- sum(not.na) + if(snot.na < nrow(tabla)) { + n_bins_param <- max(n_bins - 1, 1) + if (snot.na < nrow(tabla)/3) + n_bins_param <- 2 + } + val <- tabla[not.na, pname] + bb <- seq(irace_results$parameters$domain[[pname]][1], + irace_results$parameters$domain[[pname]][2], + length.out=(n_bins_param+1)) + if (irace_results$parameters$types[pname] %in% c("i", "i,log")) + bb <- round(bb) + #quartile based ranges + #val <- c(irace_results$parameters$domain[[pname]], tabla[not.na, pname]) + #bb <- unique(c(quantile(val, probs=seq(0,1, by=1/n_bins_param)))) + #bins <- as.character(bins[3:length(bins)],scientific = F) + bins <- cut(val, breaks=bb, include.lowest = TRUE, ordered_result=TRUE) + bins <- as.character(bins) + tabla[not.na, pname] <- bins + } + } + + # replace NA values + rna <- is.na(tabla[,pname]) + if (any(rna)) { + tabla[rna,pname] <- "NA" + } + tabla[, pname] <- factor(tabla[, pname]) + } + + # Column .ID. and .PARENT. 
are removed + tabla <- tabla[, !(startsWith(colnames(tabla), "."))] + tabla$iteration <- factor(tabla$iteration, ordered=TRUE) + + n_parts <- ceiling(length(param_names) / by_n_param) + start_i <- 1 + end_i <- by_n_param + plot_list <- list() + # Create plots + for (i in 1:n_parts) { + # stop if we reach the end + if (end_i > length(param_names)) + break; + + # add the last parameter here, as we cannot plot + # a single parameter in the next plot + if (length(param_names) == (end_i+1)) + end_i <- end_i + 1 + + params <- param_names[start_i:end_i] + ctabla <- tabla[,c(params, "iteration")] + + # Format data + ctabla <- ctabla %>% + group_by(ctabla[1:ncol(ctabla)]) %>% + summarise(freq = dplyr::n())# %>% filter(freq > 1) + ctabla <- ggforce::gather_set_data(ctabla, params) + ctabla <- ctabla[ctabla$x != "iteration", ] + + # Create plot + p <- ggplot(ctabla, aes(x, id = id, split = y, value = freq)) + + ggforce::geom_parallel_sets(aes(fill = iteration), alpha = 0.8, axis.width = 0.2) + + ggforce::geom_parallel_sets_axes(axis.width = 0.3, alpha = 0.4, color = "lightgrey", fill = "lightgrey") + + ggforce::geom_parallel_sets_labels(colour = "black", angle = 90, size = 3) + + theme_bw() + + theme( + axis.text.x = element_text(angle = 90, size = 9), + axis.title.x = element_blank(), + axis.text.y = element_blank(), + axis.ticks.y = element_blank(), + panel.grid.major = element_blank(), + panel.grid.minor = element_blank() + ) + plot_list[[i]] <- p + start_i <- start_i + by_n_param + end_i <- min(end_i + by_n_param, length(param_names)) + } + + + # If filename is provided, the plots are saved to file + if (!is.null(filename)) { + if (length(plot_list) == 1) { + ggsave(filename, plot = plot_list[[1]]) + } else { + directory <- dirname(filename) + base_name <- strsplit(basename(filename),split = '[.]')[[1]][1] + ext <- strsplit(basename(filename),split = '[.]')[[1]][2] + for (i in 1:length(plot_list)) { + part <- paste0("-", i) + ggsave(paste0(directory, "/", base_name, 
part,"." ,ext), plot = plot_list[[i]]) + } + } + } + + if (length(plot_list) == 1) + return(plot_list[[1]]) + return(plot_list) +} diff --git a/R/parallel_coord.R b/R/parallel_coord.R new file mode 100644 index 0000000..0f69b7b --- /dev/null +++ b/R/parallel_coord.R @@ -0,0 +1,460 @@ +#' Parallel Coordinates Plot +#' +#' Parallel coordinates plot of a set of selected configurations. Each line in +#' the plot represents a configuration. By default, the final elite +#' configurations are shown. To visualize configurations of other iterations, +#' pass them in the `iterations` argument. Setting the `only_elite` argument +#' to `FALSE` displays all configurations in the selected iterations. +#' Specific configurations can be selected by providing their IDs in the +#' `id_configurations` argument. Lines are colored by the number of instances +#' evaluated or by iteration, depending on `color_by_instances`. +#' +#' The parameters to be included in the plot can be selected with the `param_names` +#' argument. Additionally, `by_n_param` sets the maximum number of parameters +#' displayed in one plot. A list of plots is returned by this function if several +#' plots are required to display the selected data. +#' +#' The plot can be exported to a file manually, using the functionality that +#' plotly provides in the plot itself. If a filename is provided, the orca +#' server will be used to export the plots and thus, it requires the library +#' to be installed (). +#' +#' +#' @template arg_irace_results +#' @template arg_id_configurations +#' @template arg_param_names +#' +#' @param iterations +#' Numeric vector, iteration numbers that should be included in the plot +#' (example: iterations = c(1,4,5)) +#' +#' @param only_elite +#' Logical (default TRUE), only plot elite configurations (argument ignored when +#' id_configurations is provided) +#' +#' @param by_n_param +#' Numeric (optional), maximum number of parameters to be displayed. 
+#' +#' @param color_by_instances +#' Logical (default TRUE), choose how to color the lines. If TRUE, the color shows +#' the number of instances evaluated by the configuration. If FALSE, it shows +#' the iteration number where the configuration was sampled. +#' +#' @template arg_filename +#' @template orca_required +#' +#' @return parallel coordinates plot +#' @export +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' parallel_coord(iraceResults) +#' \donttest{ +#' parallel_coord(iraceResults, by_n_param = 5) +#' parallel_coord(iraceResults, only_elite = FALSE) +#' parallel_coord(iraceResults, id_configurations = c(20, 30, 40, 50, 100)) +#' parallel_coord(iraceResults, param_names = c("algorithm", "alpha", "rho", "q0", "rasrank")) +#' parallel_coord(iraceResults, iterations = c(1, 4, 6)) +#' } +parallel_coord <- function(irace_results, id_configurations = NULL, param_names = NULL, + iterations = NULL, only_elite = TRUE, by_n_param = NULL, + color_by_instances = TRUE, filename = NULL) +{ + parameters <- irace_results$parameters + ptypes <- parameters$types + pdomain <- parameters$domain + # The function get_dimensions creates a list of settings for each vertical axis + # in the parallel coordinates plot + get_dimensions <- function(data) { + dimensions <- list() + # FIXME: This function can be simplified a lot! + # Create plot dimensions + for (i in 1:ncol(data)) { + pname <- colnames(data)[i] + if (pname == "iteration") { + dimensions[[i]] <- list( + label = pname, + range = c(1L, length(irace_results$allElites)), + values = data[,pname], + visible = FALSE + ) + } else if (pname == ".ID.") { + # FIXME: We should do this preprocessing before here. 
+ data[,pname] <- as.character(data[,pname]) + # FIXME: Encoding as integers can be done by using factor() and levels() + tickT <- data[,pname] + tickT <- tickT[order(suppressWarnings(as.numeric(tickT)), tickT)] + tickV <- seq_along(tickT) + rdata <- rep(NA, nrow(data)) + for (v in tickV) + rdata[data[,pname] == tickT[v]] <- v + + # FIXME: We may have too many IDs. We probably cannot see more than 10-15. + dimensions[[i]] <- list( + range = range(tickV), + label = "ID", + tickvals = tickV, + ticktext = tickT, + values = rdata + ) + + } else if (ptypes[pname] %in% c("c", "o")) { + # FIXME: Encoding as integers can be done by using factor() and levels() and being careful with NAs (replacing them with an "NA" string) + tickT <- as.character(pdomain[[pname]]) + # FIXME: Can you have NA here? Haven't we remove them already in na_data_processing? + if (anyNA(data[,pname])) tickT <- c(tickT, "NA") + tickV <- seq_along(tickT) + # FIXME: We should do this preprocessing before here. + data[,pname] <- as.character(data[,pname]) + rdata <- rep(NA, nrow(data)) + for (v in tickV) { + rdata[data[,pname] == tickT[v]] <- v + } + # FIXME: We may have too many tickmarks to see. We probably cannot see more than 10-15. + dimensions[[i]] <- list( + range = c(1L, max(tickV)), + label = pname, + tickvals = tickV, + ticktext = tickT, + values = rdata + ) + # FIXME: I don't think this will work with log transform + } else if (ptypes[pname] %in% c("i", "i,log", "r", "r,log")) { + # This is detecting that na_data_preprocessing has encoded NA values. + ## FIXME: There must be a better way to do this. 
+ if ((as.numeric(pdomain[[pname]][2]) + 1) %in% data[,pname]) { + minimo <- pdomain[[pname]][1] + maximo <- pdomain[[pname]][2] + 1 + medio <- round(((maximo - 1) / 4), 1) + medio2 <- round(((maximo - 1) / 2), 1) + medio3 <- round(((maximo - 1) * (3 / 4)), 1) + dimensions[[i]] <- list( + range = c(pdomain[[pname]][1], pdomain[[pname]][2] + 1), + tickvals = c(minimo, medio, medio2, medio3, maximo), + ticktext = c(minimo, medio, medio2, medio3, ""), + values = data[,pname], + label = pname + ) + } else { + minimo <- pdomain[[pname]][1] + maximo <- pdomain[[pname]][2] + medio <- round((maximo / 4), 1) + medio2 <- round((maximo / 2), 1) + medio3 <- round(maximo * (3 / 4), 1) + dimensions[[i]] <- list( + range = c(pdomain[[pname]][1], pdomain[[pname]][2]), + tickvals = c(minimo, medio, medio2, medio3, maximo), + ticktext = c(minimo, medio, medio2, medio3, maximo), + values = data[,pname], + label = pname + ) + } + } + } + return(dimensions) + } + + param_names <- subset_param_names(param_names, parameters$names, parameters$isFixed) + # Verify that param_names contains more than one parameter + if (length(param_names) < 2) stop("Data must have at least two parameters") + by_n_param <- check_by_n_param(by_n_param, length(param_names)) + + # Check iterations + if (!is.null(iterations)) { + it <- 1:length(irace_results$allElites) + if (any(!(iterations %in% it))) { + cli_abort("The iterations entered are outside the possible range") + } + } else { + iterations <- length(irace_results$allElites) + if (length(irace_results$allElites[[length(irace_results$allElites)]]) == 1) { + cli_alert_info("Note: The final iteration only has one elite configuration\n") + } + } + + # Check configurations + if (is.null(id_configurations)) { + if (only_elite) { + id_configurations <- irace_results$allElites[iterations] + } else { + id_configurations <- irace_results$experimentLog[irace_results$experimentLog[,"iteration"] %in% iterations, "configuration"] + } + } else { + # FIXME: This 
overrides the above setting of iterations! + iterations <- 1:length(irace_results$allElites) + } + + id_configurations <- unique(as.character(unlist(id_configurations))) + if (length(id_configurations) <= 1) { + stop("You must provide more than one configuration ID") + } + if (any(!(id_configurations %in% irace_results$allConfigurations[, ".ID."]))) { + stop("Unknown configuration IDs: ", paste0(setdiff(id_configurations, irace_results$allConfigurations[, ".ID."]), collapse=", ")) + } + + # Select data + data <- irace_results$allConfigurations[irace_results$allConfigurations[, ".ID."] %in% id_configurations, ,drop=FALSE] + config_iter <- unique(irace_results$experimentLog[, c("iteration", "configuration")]) + config_iter <- config_iter[config_iter[, "configuration"] %in% id_configurations, ,drop=FALSE] + config_iter <- config_iter[config_iter[, "iteration"] %in% iterations, ,drop=FALSE] + + experiments <- irace_results$experiments[,as.character(id_configurations),drop=FALSE] + # FIXME: It says fitness but this is not really fitness. There should be an option to color according to mean fitness value + fitness <- colSums(!is.na(experiments)) + + # Merge iteration and configuration data + colnames(config_iter)[colnames(config_iter) == "configuration"] <- ".ID." + data <- merge(config_iter, data, by=".ID.") + + # Merge fitness measure + data[,"fitness"] <- fitness[as.character(data[,".ID."])] + # FIXME: This is not correct because we are passing data after expanding it with fitness and iterations. We should do any preprocessing before adding columns + data <- na_data_processing(data, parameters) + + # Silence CRAN warnings + iteration <- .ID. <- NULL + + # iteration-based plot focused on sampling (first iteration is selected) + data <- as.data.frame(data %>% group_by(.ID.) 
%>% slice(which.min(iteration))) + + plot_list <- list() + plot_params <- param_names + # Create plots + i <- 1 + while (length(plot_params) > 0) { + start_i <- 1 + end_i <- min(by_n_param, length(plot_params)) + params <- plot_params[start_i:end_i] + plot_params <- plot_params[-(start_i:end_i)] + if (length(plot_params) == 1) { + params <- c(params, plot_params) + plot_params <- c() + } + + # FIXME: If we pass the data to the plot we do not need to pass it to the + # dimensions. It is enough to pass the column name: https://plotly.com/r/parallel-coordinates-plot/ + cdata <- data[,c(".ID.", params, "fitness", "iteration"), drop=FALSE] + dimensions <- get_dimensions(cdata) + color_col <- if (color_by_instances) "Instances" else "Iteration" + + # plot creation + p <- plotly::plot_ly(cdata) %>% + plotly::add_trace(type = "parcoords", + line = list( + color = if (color_by_instances) ~fitness else ~iteration, + colorscale = "Viridis", + colorbar = list(title = list(text=color_col)), + showscale = TRUE, + reversescale = TRUE, + cmin = if (color_by_instances) min(data[,"fitness"]) else 1L, + cmax = if (color_by_instances) max(data[,"fitness"]) else length(irace_results$allElites)), + dimensions = dimensions, + labelangle = -25) + plot_list[[i]] <- p + i <- i + 1 + } + + # Save plot file + orca_save_plot(plot_list, filename) + if (length(plot_list) == 1) + return(plot_list[[1]]) + return(plot_list) +} + + +#' Parallel Coordinates Plot (configurations) +#' +#' Parallel coordinates plot of a set of provided configurations. Each line in +#' the plot represents a configuration. The parameters to be included in the +#' plot can be selected with the param_names argument. Additionally, the +#' maximum number of parameters to be displayed in one plot. A list of plots is +#' returned by this function in several plots are required to display the +#' selected data. 
+#' +#' The plot can be exported to a file manually, using the functionality that +#' plotly provides in the plot itself. If a filename is provided, the orca +#' server will be used to export the plots and thus, it requires the library +#' to be installed (). +#' +#' +#' @param configurations +#' Data frame, configurations in `irace` format +#' (example: `configurations = iraceResults$allConfigurations`) +#' +#' @param parameters +#' List, parameter object in irace format +#' (example: `parameters = iraceResults$parameters`) +#' +#' @template arg_param_names +#' +#' @param by_n_param +#' Numeric (optional), maximum number of parameters to be displayed +#' +#' +#' @template arg_filename +#' @template orca_required +#' +#' @return parallel coordinates plot +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' parallel_coord2(iraceResults$allConfigurations[iraceResults$iterationElites,], +#' iraceResults$parameters) +#' parallel_coord2(iraceResults$allConfigurations[iraceResults$iterationElites,], +#' iraceResults$parameters, +#' param_names = c("algorithm", "alpha", "rho", "q0", "rasrank")) +#' parallel_coord2(iraceResults$allConfigurations[iraceResults$iterationElites,], +#' iraceResults$parameters, by_n_param = 5) +#' @export +#' @md +parallel_coord2 <- function(configurations, parameters, param_names = parameters$names, + by_n_param = NULL, filename = NULL) { + + # The function get_dimensions creates a list of settings for each vertical axis + # in the parallel coordinates plot + get_dimensions <- function(data) { + # Create plot dimensions + for (i in 1:ncol(data)) { + pname <- colnames(data)[i] + if (pname == ".ID.") next # FIXME: Handle this! 
+ if (parameters$types[pname] %in% c("c", "o")) { + if (any(is.na(data[,pname]))) { + tickT <- c(as.character(parameters$domain[[pname]]), "") + tickV <- 1:(length(parameters$domain[[pname]]) + 1) + } else { + tickT <- as.character(parameters$domain[[pname]]) + if (length(tickT) == 1) { + # handle fixed parameters + tickT <- c("", tickT) + tickV <- 1:2 + } else { + tickV <- 1:length(parameters$domain[[pname]]) + } + } + + data[,pname] <- as.character(data[,pname]) + rdata <- rep(NA, nrow(data)) + for (v in 1:length(tickT)){ + rdata[data[,pname] == tickT[v]] <- v + } + + dim[[i]] <- list( + range = c(1, max(tickV)) , + label = pname, + tickvals = tickV, + ticktext = tickT, + values = rdata + ) + # if the column is of type numeric + } else if ((as.numeric(parameters$domain[[pname]][2]) + 1) %in% data[,pname]) { + minimo <- parameters$domain[[pname]][1] + maximo <- parameters$domain[[pname]][2] + 1 + medio <- round(((maximo - 1) / 4), 1) + medio2 <- round(((maximo - 1) / 2), 1) + medio3 <- round(((maximo - 1) * (3 / 4)), 1) + + dim[[i]] <- list( + range = c(parameters$domain[[pname]][1], parameters$domain[[pname]][2] + 1), + tickvals = c(minimo, medio, medio2, medio3, maximo), + ticktext = c(minimo, medio, medio2, medio3, ""), + values = data[,pname], + label = pname + ) + } else { + minimo <- parameters$domain[[pname]][1] + maximo <- parameters$domain[[pname]][2] + medio <- round((maximo / 4), 1) + medio2 <- round((maximo / 2), 1) + medio3 <- round(maximo * (3 / 4), 1) + dim[[i]] <- list( + range = c(parameters$domain[[pname]][1], parameters$domain[[pname]][2]), + tickvals = c(minimo, medio, medio2, medio3, maximo), + ticktext = c(minimo, medio, medio2, medio3, maximo), + values = data[,pname], + label = pname + ) + } + } + return(dim) + } + + param_names <- subset_param_names(param_names, parameters$names, parameters$isFixed) + # Verify that param_names contains more than one parameter + if (length(param_names) < 2) stop("Data must have at least two parameters") + 
by_n_param <- check_by_n_param(by_n_param, length(param_names)) + configurations <- na_data_processing(configurations, parameters) + + # Variable assignment + configuration <- dim <- tickV <- vectorP <- NULL + + plot_list <- list() + plot_params <- param_names + # Create plots + i <- 1 + while(length(plot_params) > 0) { + start_i <- 1 + end_i <- min(by_n_param, length(plot_params)) + params <- plot_params[start_i:end_i] + plot_params <- plot_params[-(start_i:end_i)] + if (length(plot_params) == 1) { + params <- c(params, plot_params) + plot_params <- c() + } + + ctabla <- configurations[,params, drop=FALSE] + dim <- get_dimensions(ctabla) + + # plot creation + p <- ctabla %>% plotly::plot_ly() + p <- p %>% plotly::add_trace( + type = "parcoords", + line = list( + color = "#60D0E1" + ), + dimensions = dim, + labelangle = -25 + ) + p <- p %>% plotly::layout(margin = list(r=40)) + plot_list[[i]] <- p + i <- i + 1 + } + + # Save plot file + orca_save_plot(plot_list, filename) + if (length(plot_list) == 1) + return(plot_list[[1]]) + return(plot_list) +} + +check_by_n_param <- function(by_n_param, length_param_names) +{ + if (is.null(by_n_param)) return(length_param_names) + if (!is.numeric(by_n_param)){ + stop("Argument by_n_param must be numeric") + } else if (by_n_param < 2) { + stop("Argument by_n_param must be > 1") + } + min(length_param_names, by_n_param) +} + + +na_data_processing <- function(data, parameters) +{ + pnames <- colnames(data)[!startsWith(colnames(data), ".")] + # NA data processing + for (pname in pnames) { + # FIXME: This can be done by selecting all columns of each type. + if (parameters$types[pname] %in% c("i", "i,log", "r", "r,log")) { + ina <- is.na(data[,pname]) + if (any(ina)) data[ina,pname] <- parameters$domain[[pname]][2] + 1 + + } else if (parameters$types[pname] %in% c("c", "o")) { + ina <- is.na(data[,pname]) + if (any(ina)) data[ina,pname] <- "" + } + } + # Column .PARENT. 
is removed + data[, c(".ID.", pnames), drop=FALSE] +} diff --git a/R/plot_experiments_matrix.R b/R/plot_experiments_matrix.R new file mode 100644 index 0000000..17994c6 --- /dev/null +++ b/R/plot_experiments_matrix.R @@ -0,0 +1,91 @@ +#' Heat Map Plot +#' +#' Creates a heatmap plot that shows all performance data seen by irace. +#' Configurations are shown on the x-axis in the order in which they were +#' created in the configuration process. Instances are shown on the y-axis in +#' the order in which they were seen during the configuration run. This plot +#' gives a general idea of how the configuration process progressed: the number +#' of evaluations of each configuration shows how long it survived in the +#' iterated racing procedure. +#' +#' @template arg_irace_results +#' +#' @template arg_filename +#' +#' @param metric Cost metric shown in the plot: `"raw"` shows the raw +#' values, `"rpd"` shows relative percentage deviation per instance and +#' `"rank"` shows rank per instance. +#' +#' @param show_conf_ids If `TRUE`, the configuration IDs are shown on the x-axis. Usually there are too many configurations, thus the default is `FALSE`. 
+#' +#' @template arg_interactive +#' +#' @return [ggplot2::ggplot()] object +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' plot_experiments_matrix(iraceResults) +#' @export +plot_experiments_matrix <- function(irace_results, filename = NULL, metric = c("raw", "rpd", "rank"), + show_conf_ids = FALSE, interactive = base::interactive()) +{ + metric <- match.arg(metric) + experiments <- irace_results$experiments + conf_ids <- colnames(experiments) + if (is.null(conf_ids)) conf_ids <- as.character(1:ncol(experiments)) + inst_ids <- rownames(experiments) + if (is.null(inst_ids)) inst_ids <- as.character(1:nrow(experiments)) + + if (metric == "rank") { + experiments[] <- matrixStats::rowRanks(experiments, ties.method = "average") + metric_lab <- "Rank" + transform <- "log10" + } else if (metric == "rpd") { + experiments[] <- calculate_rpd(experiments) + metric_lab <- "RPD" + transform <- "log1p" + } else { # == raw + metric_lab <- "Cost" + transform <- "identity" + } + + conf_id <- inst_id <- cost <- text <- NULL + # The table is created and organized for ease of use + experiments <- tibble::as_tibble(experiments) %>% + rownames_to_column("inst_id") %>% + tidyr::pivot_longer(-c("inst_id"), names_to = "conf_id", values_to = "cost") %>% + # We need to relevel so that they appear in the correct order + mutate(conf_id = forcats::fct_relevel(conf_id, conf_ids)) %>% + mutate(inst_id = forcats::fct_relevel(inst_id, inst_ids)) + + # The text field is added to the table to show it in the interactive plot. 
+ if (interactive) { + experiments <- experiments %>% + mutate(text = paste0("Configuration: ", conf_id, "\nInstance: ", inst_id, "\nValue: ", cost, "\n")) + } else { + experiments <- experiments %>% mutate(text = "") + } + + p <- ggplot(experiments, aes(x = conf_id, y = inst_id, fill = cost, text = text)) + + geom_tile() + # Heatmap style + scale_fill_viridis_c(na.value = "white", direction=-1L, trans = transform, + guide = guide_colourbar(barheight=grid::unit(0.8,"npc"))) + + scale_y_discrete(expand=c(0L, 0L)) + + labs(x = "Configuration IDs", y = "Instance IDs", fill = metric_lab) + + # theme(legend.position = "top") + # This doesn't work because numbers often overlap + theme(axis.ticks = element_blank(), + panel.border = element_rect(colour="gray80", fill = NA, size=0.75), + plot.background = element_blank(), + panel.background = element_blank()) + + if (!show_conf_ids) p <- p + theme(axis.text.x = element_blank()) + + if (interactive) { + p <- plotly::ggplotly(p, tooltip = "text") + if (!is.null(filename)) orca_pdf(filename, p) + } else if (!is.null(filename)) { + ggsave(filename, plot = p) + } + p +} diff --git a/R/plot_model.R b/R/plot_model.R new file mode 100644 index 0000000..05a0482 --- /dev/null +++ b/R/plot_model.R @@ -0,0 +1,293 @@ +# Categorical model generation +# +# @description +# +# Rebuild the probabilities of the sampling models used in irace to generate +# configurations in each iteration. 
+# +# @template arg_irace_results +# +# @param param_name +# String, parameter to be included in the plot (example: param_name = "algorithm")) +# +# @return data frame with columns "iteration", "elite", "parameter", "value", "prob" +# +# @examples +# NULL +# FIXME: Add examples +getCategoricalModel <- function(irace_results, param_name) +{ + if (!(irace_results$parameters$types[[param_name]] %in% c("c"))) { + stop("Error: Parameter is not categorical\n") + } + iterations <- length(irace_results$allElites) + domain <- irace_results$parameters$domain[[param_name]] + n_val <- length(irace_results$parameters$domain[[param_name]]) + prob <- rep((1 / n_val), n_val) + + # Get elite data by iteration + all_elites <- list() + for (i in 1:iterations){ + all_elites[[i]] <- irace_results$allConfigurations[irace_results$allElites[[i]], c(".ID.", ".PARENT.",param_name) ] + } + + total_iterations <- floor(2 + log2(irace_results$parameters$nbVariable)) + + X <- NULL + models <- list() + for (i in 1:(iterations-1)) { + models[[i]] <- list() + total_iterations <- max(total_iterations, i+1) + for (elite in 1:length(irace_results$allElites[[i]])) { + cid <- all_elites[[i]][elite, ".ID."] + parent <- all_elites[[i]][elite, ".PARENT."] + if (i==1) { + cprob <- prob + } else { + if (as.character(cid) %in% names(models[[i-1]])) + cprob <- models[[i-1]][[as.character(cid)]] + else + cprob <- models[[i-1]][[as.character(parent)]] + } + + cprob <- cprob * (1 - (i / total_iterations)) + index <- which (domain == all_elites[[i]][elite, param_name]) + cprob[index] <- (cprob[index] + (i / total_iterations)) + if (irace_results$scenario$elitist) { + cprob <- cprob / sum(cprob) + probmax <- 0.2^(1 / irace_results$parameters$nbVariable) + cprob <- pmin(cprob, probmax) + } + # Normalize probabilities. 
+ cprob <- cprob / sum(cprob) + models[[i]][[as.character(cid)]] <- cprob + for (v in 1:length(domain)) + X <- rbind(X, cbind(i, elite, param_name, domain[v], as.character(cprob[v]))) + } + } + X <- as.data.frame(X, stringsAsFactors=FALSE) + colnames(X) <-c("iteration", "elite", "parameter", "value", "prob") + X[, "prob"] <- as.numeric(X[, "prob"]) + return(X) +} + +# Numerical model generation +# +# @description +# +# Rebuild the sampling distribution parameters of the models used by irace to sample configurations during the configuration process. +# +# @template arg_irace_results +# +# @param param_name +# String, parameter to be included in the plot (example: param_name = "alpha") +# +# @return data frame with columns "iteration", "elite", "parameter", "mean", "sd" +# +# @examples +# NULL +# FIXME: Add examples. +getNumericalModel <- function(irace_results, param_name) +{ + if (!(irace_results$parameters$types[[param_name]] %in% c("i", "r", "i,log", "r,log"))) { + stop("Parameter is not numerical") + } + + iterations <- length(irace_results$allElites) + domain <- irace_results$parameters$domain + n_par <- irace_results$parameters$nbVariable + + + # Get elite data by iteration + all_elites <- list() + for (i in 1:iterations){ + all_elites[[i]] <- irace_results$allConfigurations[irace_results$allElites[[i]], param_name] + } + + # Get initial model standard deviation + s <- (domain[[param_name]][2] - domain[[param_name]][1])/2 + + X <- NULL + for (i in 1:(iterations-1)) { + # Get non-elite configurations executed in an iteration + it_conf <- unique(irace_results$experimentLog[irace_results$experimentLog[,"iteration"] == (i+1), "configuration"]) + new_it_conf <- it_conf[!(it_conf %in% irace_results$allElites[[i]])] + n_conf <- length(new_it_conf) + + # Generate updated standard deviation (numerical params) + s <- s * (1/n_conf)^(1/n_par) + for (elite in 1:length(irace_results$allElites[[i]])){ + par_mean <- all_elites[[i]][elite] + X <- rbind(X, cbind(i, 
elite, param_name, par_mean, s)) + } + } + + X <- as.data.frame(X) + colnames(X) <-c("iteration", "elite", "parameter", "mean", "sd") + X[,"sd"] <- as.numeric(as.character(X[,"sd"])) + X[,"mean"] <- as.numeric(as.character(X[,"mean"])) + rownames(X) <- NULL + return(X) +} + +# Plot a categorical model +# +# @description +# +# The `plotCategoricalModel` function creates a stacked bar plot showing +# the sampling probabilities of the parameter values for the elite +# configurations in the iterations of the configuration process. +# +# @param model_data +# Data frame obtained from the `getCategoricalModel` function +# +# @param domain +# String Vector, domain of the parameter whose model will be plotted +# +# @return bar plot +# FIXME: examples! +plotCategoricalModel <- function(model_data, domain) +{ + value <- prob <- elite <- NULL + model_data$elite <- factor(model_data$elite) + p <- ggplot(model_data, aes(fill=value, y=prob, x=elite, group=value)) + + geom_bar(position="stack", stat="identity") + + ggplot2::scale_fill_viridis_d() + + facet_grid(~ iteration, scales = "free", space = "free") + + + p <- p + ggplot2::xlab("Elite configurations") + ggplot2::ylab("Probability") + + theme(axis.text.x = element_blank(), + axis.ticks.x = element_blank(), + axis.title.x = element_text(vjust = 4), + axis.text.y = element_blank(), + axis.ticks.y = element_blank()) + + return(p) +} + + +# Plot a numerical model +# +# @description +# +# Creates a sampling distributions plot of the numerical parameters for the +# elite configurations of an iteration. +# +# This plot shows the density function of the truncated normal distributions +# associated with each parameter for each elite configuration. 
+# +# @param iteration +# Numeric, iteration that should be considered in the plot +# +# @param model_data +# Data frame obtained from the `getNumericalModel` function +# +# @param domain +# Numeric vector, domain of the parameter whose model will be plotted +# +# @param xlabel_iteration +# Numeric, iteration in which the x axis labels should be included +# +# @return sampling distribution plot +# FIXME: examples! +plotNumericalModel <- function(iteration, model_data, domain, xlabel_iteration) +{ + model_data <- model_data[model_data[,"iteration"] == iteration, ] + model_data[,"elite"] <- as.factor(model_data[,"elite"]) + + x <- seq(from=domain[1], to=domain[2], length.out = 100) + + # create plot + p <- ggplot(as.data.frame(x, ncol=1), aes(x=x)) + el <- unique(as.character(model_data[,"elite"])) + col <- viridisLite::viridis(length(el)) + + for (i in 1:length(el)) { + mm <- model_data[model_data[,"elite"] == el[i], "mean"] + ss <- model_data[model_data[,"elite"] == el[i], "sd"] + if(is.na(mm)) next + p <- p + ggplot2::stat_function(fun = truncnorm::dtruncnorm, + geom = "area", + args=list(mean = mm, + sd = ss, + a = domain[1], + b = domain[2]), + color = col[i], + fill = col[i], + xlim = domain, + size = 0.5, + alpha = 0.3) + } + if (xlabel_iteration==iteration){ + p <- p + ggplot2::ylab(as.character(iteration+1)) + + theme(axis.title.x = element_blank(), + axis.title.y = element_text(vjust = 0), + axis.text.y = element_blank(), + axis.ticks.y = element_blank()) + } else { + p <- p + theme(axis.text.x = element_blank(), + axis.title.x = element_blank(), + axis.ticks.x = element_blank(), + axis.title.y = element_text(vjust = 0), + axis.text.y = element_blank(), + axis.ticks.y = element_blank()) + p <- p + labs(y = as.character(iteration+1)) + } + p <- p + ggplot2::xlim(domain[1], domain[2]) + return(p) +} + +#' Plot the sampling models used by irace +#' +#' @description +#' +#' Display the sampling models from which irace generated parameter values for +#' new 
configurations during the configuration process. +#' +#' For categorical parameters a stacked bar plot is created. This plot shows +#' the sampling probabilities of the parameter values for the elite +#' configurations in the iterations of the configuration process. +#' +#' For numerical parameters a sampling distribution plot is created for +#' the elite configurations of each iteration. +#' This plot shows the density function of the truncated normal distributions +#' associated with each parameter for each elite configuration on each iteration. +#' +#' @template arg_irace_results +#' +#' @param param_name +#' String, parameter to be included in the plot, e.g., `param_name = "algorithm"` +#' +#' @template arg_filename +#' +#' @return sampling model plot +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' plot_model(iraceResults, param_name="algorithm") +#' \donttest{ +#' plot_model(iraceResults, param_name="alpha") +#' } +#' @export +plot_model <- function(irace_results, param_name, filename=NULL) +{ + check_unknown_param_names(param_name, irace_results$parameters$names) + iterations <- length(irace_results$allElites) + + if (irace_results$parameters$types[param_name] %in% c("c", "o")) { + X <- getCategoricalModel(irace_results, param_name) + q <- plotCategoricalModel(model_data=X, domain=irace_results$parameters$domain[[param_name]]) + } else { + X <- getNumericalModel(irace_results, param_name) + p <- lapply((iterations-1):1, plotNumericalModel, model_data=X, + domain=irace_results$parameters$domain[[param_name]], + xlabel_iteration=1) + q <- do.call("grid.arrange", c(p, ncol = 1, left="Iterations")) + } + + if(!is.null(filename)) + ggsave(filename, plot = q) + q +} diff --git a/R/reexport.R b/R/reexport.R new file mode 100644 index 0000000..7f574b3 --- /dev/null +++ b/R/reexport.R @@ -0,0 +1,3 @@ +#' @importFrom irace read_logfile +#' @export 
+irace::read_logfile diff --git a/R/report.R b/R/report.R new file mode 100644 index 0000000..a836e4a --- /dev/null +++ b/R/report.R @@ -0,0 +1,51 @@ +#' Create HTML Report from irace data +#' +#' This function creates an HTML report of the most relevant irace data. This +#' report provides general statistics and plots that show the best +#' configurations and their performance. +#' +#' @template arg_irace_results +#' +#' @param filename (`character(1)`) +#' Filename indicating where to save the report (example: `"~/path-to/filename"`). +#' @param sections (`list()`) List of sections to enable/disable. This is useful for disabling sections that may cause problems, such as out-of-memory errors. `NULL` means automatically enable/disable a section depending on the memory required. +#' +#' @template arg_interactive +#' +# Useless imports to suppress a NOTE generated by "R CMD check" +#' @importFrom DT renderDataTable +#' @importFrom knitr knit +#' +#' @return Filename where the report was created. In interactive sessions, the report is also opened in the default browser. +#' +#' @examples +#' \donttest{ +#' withr::with_tempdir({ +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' report(iraceResults, filename = file.path(getwd(), "report")) +#' }, clean = !base::interactive()) +#' } +#' @export +report <- function(irace_results, filename = "report", + sections = list(experiments_matrix = NULL), + interactive = base::interactive()) +{ + if (missing(irace_results)) stop("irace_results is required") + irace_results <- irace::read_logfile(irace_results) + + # Large experiments matrix crashes pandoc. 
+ if (is.null(sections$experiments_matrix)) { + sections$experiments_matrix <- (prod(dim(irace_results$experiments)) < 128L*1024L) + if (!sections$experiments_matrix) + iraceplot_warn("Race overview disabled because the experiments matrix", + " is very large (set {.code sections$experiments_matrix=TRUE} to enable).") + } + + filename <- irace::path_rel2abs(maybe_add_file_extension(filename, "html")) + cli_alert_info("Creating file '{.file {filename}}'.\n") + rmarkdown::render(input=system.file("template", "report_html.Rmd", package = "iraceplot"), + output_file=filename, clean = FALSE) + if (interactive) utils::browseURL(filename) + filename +} diff --git a/R/sampling_distance.R b/R/sampling_distance.R new file mode 100644 index 0000000..b3a7701 --- /dev/null +++ b/R/sampling_distance.R @@ -0,0 +1,124 @@ +#' Sampling distance Plot +#' +#' @description +#' The `sampling_distance` function creates a plot that displays the mean of the +#' distance between the configurations that were executed in each iteration. +#' +#' For categorical parameters the distance is calculated as the Hamming distance; +#' for numerical parameters an equality interval is defined by a threshold +#' specified by argument `t` and the Hamming distance is calculated using this interval. +#' +#' @template arg_irace_results +#' @param type +#' String, (default "boxplot") Type of plot to be produced, either "line", "boxplot" +#' or "both". The "boxplot" setting shows a boxplot of the mean distance of all +#' configurations, "line" shows the mean distance of the solution population in each +#' iteration, "both" shows both plots. +#' +#' @param t +#' Numeric, (default 0.05) percentage factor that will determine a distance to +#' define equal numerical parameter values. If the numerical parameter values to be +#' compared are v1 and v2 they are considered equal if `|v1-v2| <= |ub-lb|*t`. 
+#' +#' @template arg_filename +#' +#' @return line or box plot +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' sampling_distance(iraceResults) +#' \donttest{ +#' sampling_distance(iraceResults, type = "boxplot", t=0.07) +#' } +#' @export +sampling_distance <- function(irace_results, type = c("boxplot", "line", "both"), t = 0.05, filename = NULL) +{ + type <- match.arg(type) + + # variable assignment + ids <- media <- allconf <- valor <- iterations <- tabla_box <- iteration <- vectorP <- NULL + allconf <- irace_results$allConfigurations + n_param <- length(allconf) - 2 + niterations <- length(irace_results$allElites) + + all.distance <- matrix(0, ncol=nrow(allconf), nrow=nrow(allconf)) + a <- 1:nrow(allconf) + for (confid in allconf$.ID.) { + all.distance[confid,a[-confid]] <- distance_config(irace_results, id_configuration=confid, t=t) + } + + # The value of the distance between each configuration is created + for (i in 1:niterations) { + ids <- unique(irace_results$experimentLog[irace_results$experimentLog[,"iteration"] %in% i, "configuration"]) + iterations <- c(iterations, i) + + dd <- all.distance[ids,ids] + distance <- dd[upper.tri(dd)] + it <- i + datos <- data.frame(it, distance) + tabla_box <- rbind(tabla_box, datos) + tabla_box[,"it"] <- as.character(tabla_box[,"it"]) + media <- c(media, mean(distance)) + } + + tabla_box[,"it"] <- factor(tabla_box[,"it"] , levels=as.character(1:niterations)) + + # A graph of points and lines is created + if (type == "line" | type == "both") { + tabla <- as.data.frame(cbind(iterations, media)) + colnames(tabla) <- c("iterations", "media") + p <- ggplot(tabla, aes(x = iterations, y = media)) + + geom_point() + + geom_line() + + scale_y_continuous( + limits = c(0, n_param), + breaks = seq(0, n_param, 2) + ) + + scale_x_continuous( + limits = c(1, niterations), + breaks = seq(1, niterations, by=1) + ) + + scale_color_viridis_c() + + 
labs(y = "Distance", x = "iteration", color = "IT.") + + theme (legend.position = "none") + vectorP[1] <- list(p) + + # A box plot is created + } + if (type == "boxplot" | type == "both") { + p <- ggplot(tabla_box, aes(x = it, y = distance, group = it, color = it)) + + geom_boxplot(na.rm = TRUE) + + scale_color_viridis_d() + + scale_y_continuous( + limits = c(0, n_param), + breaks = seq(0, n_param, 2) + ) + + labs(x = "iteration", y = "Distance", color = "IT.") + + theme (legend.position = "none") + vectorP[2] <- list(p) + } + + # If the value in filename is added the pdf file is created + if (!is.null(filename)) { + if (type == "both") { + if (!has_file_extension(filename, "pdf")) + stop("Unknown filetype: ", filename) + pdf(filename, width = 12) + do.call("grid.arrange", c(vectorP[1], ncol = 1)) + do.call("grid.arrange", c(vectorP[2], ncol = 1)) + dev.off() + } else { + ggsave(filename, plot = p) + } + + # If you do not add the value of filename, the plot is displayed + } else { + if (type == "both") { + do.call("grid.arrange", c(vectorP, nrow = 2)) + } else { + p + return(p) + } + } +} diff --git a/R/sampling_frequency.R b/R/sampling_frequency.R new file mode 100644 index 0000000..2b72749 --- /dev/null +++ b/R/sampling_frequency.R @@ -0,0 +1,166 @@ +#' Parameter Frequency and Density Plot +#' +#' Frequency or density plot that depicts the sampling performed by irace +#' across the iterations of the configuration process. For categorical +#' parameters a frequency plot is created, while for numerical parameters a +#' histogram and density plots are created. The plots are shown in groups of +#' maximum 9, the parameters included in the plot can be specified by setting +#' the param_names argument. +#' +#' @param configurations +#' Data frame, configurations in `irace` format. Example: `iraceResults$allConfigurations`. +#' +#' @param parameters List, parameters object in irace format. 
If this argument +#' is missing, the first parameter is taken as the `iraceResults` data +#' generated when loading the `.Rdata` file created by `irace` and +#' `configurations=iraceResults$allConfigurations` and `parameters = +#' iraceResults$parameters`. +#' +#' @template arg_param_names +#' +#' @param n +#' Numeric, for scenarios with large parameter sets, it selects a subset +#' of 9 parameters. For example, `n=1` selects the first 9 (1 to 9) parameters, n=2 selects +#' the next 9 (10 to 18) parameters and so on. +#' +#' @template arg_filename +#' +#' @note If there are more than 9 parameters, a pdf file extension is +#' recommended as it allows to create a multi-page document. Otherwise, you +#' can use the `n` argument of the function to generate the plot of a subset +#' of the parameters. +#' +#' @return Frequency and/or density plot +#' +#' @examples +#' # Either use iraceResults +#' iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", +#' "guide-example.Rdata", mustWork = TRUE)) +#' sampling_frequency(iraceResults) +#' \donttest{ +#' sampling_frequency(iraceResults, n = 2) +#' sampling_frequency(iraceResults, param_names = c("alpha")) +#' sampling_frequency(iraceResults, param_names = c("algorithm", "alpha", "rho", "q0", "rasrank")) +#' } +#' # Or explicitly specify the configurations and parameters. 
+#' sampling_frequency(iraceResults$allConfigurations, iraceResults$parameters) +#' \donttest{ +#' sampling_frequency(iraceResults$allConfigurations, iraceResults$parameters, n = 2) +#' sampling_frequency(iraceResults$allConfigurations, iraceResults$parameters, +#' param_names = c("alpha")) +#' sampling_frequency(iraceResults$allConfigurations, iraceResults$parameters, +#' param_names = c("algorithm", "alpha", "rho", "q0", "rasrank")) +#' } +#' @export +sampling_frequency <- function(configurations, parameters, param_names = NULL, n = NULL, filename = NULL) +{ + if (missing(parameters)) { + parameters <- configurations$parameters + configurations <- configurations$allConfigurations + } + param_names <- subset_param_names(param_names, parameters$names, parameters$isFixed) + if (any(!(param_names %in% colnames(configurations)))) { + stop("Unknown parameter name provided") + } + + # This is needed to silence CRAN warnings. + tabla <- Var1 <- Freq <- density <- NULL + # Filter data by parameter names + config <- configurations[,param_names,drop=FALSE] + if (!is.null(n)) { + max_p <- 9L # FIXME: Why 9 ???? + if (n < 1 | n > ceiling(length(param_names) / max_p)) { + stop(paste0("n cannot be less than 1 or greater than ", ceiling(length(param_names) / max_p), + " (", length(param_names)," parameters selected)")) + } + start <- (max_p * n - 8) + end <- min(max_p * n, length(param_names)) + param_names <- param_names[start:end] + config <- configurations[,param_names] + } + + plot.list <- list() + + for (pname in colnames(config)) { + if (parameters$isFixed[pname]) next + ptype <- parameters$types[pname] + if (ptype %in% c("c", "o")) { + tabla <- as.data.frame(table(config[[pname]])) + len_longest <- max(nchar(levels(tabla$Var1))) + angle_x <- 0 # FiXME: Rotation compresses the plot. This may be because + # of marrangeGrob. We should find another way. 
+ p <- ggplot(data = tabla, aes(x = Var1, y = Freq)) + + geom_bar(stat = "identity", fill = "grey", color = "black") + + labs(x = "Values") + + ggtitle(pname) + + theme( + axis.title.y = element_blank(), + axis.title.x = element_text(size = 6), + plot.title = element_text(hjust = 0.5, size = rel(0.8), face = "bold"), + axis.ticks.x = element_blank() + ) + + # FIXME: This doesn't wrap if there are no spaces: https://github.com/r-lib/scales/issues/353 + # TODO: Use paste0(strsplit(string.to.split, "(?<=[[:lower:]])(?=[[:upper:]])", perl = TRUE), collapse="\n") to split camel case and replace with spaces. + # TODO: Replace "-" and "_" with spaces if needed. + # scale_x_discrete(labels = scales::label_wrap(8)) + + scale_y_continuous(n.breaks = 3) + + guides(x = guide_axis(angle = angle_x)) + } else if (ptype %in% c("i", "r", "i,log", "r,log")) { + # histogram and density plot + tabla <- na.omit(config[[pname]]) + nbreaks <- pretty(range(tabla), + n = nclass.Sturges(tabla), + min.n = 1) + # FIXME: There is some repetition between this plot and the above + # one. Avoid redundancy. 
+ p <- ggplot(as.data.frame(tabla), aes(x = tabla)) + + geom_histogram(aes(y = after_stat(density)), + breaks = nbreaks, + color = "black", fill = "gray") + + geom_density(color = "blue", fill = "blue", alpha = 0.2) + + labs(x = "Values") + + ggtitle(pname) + + theme( + axis.title.y = element_blank(), + axis.title.x = element_text(size = 6), + plot.title = element_text(hjust = 0.5, size = rel(0.8), face = "bold"), + axis.ticks.x = element_blank()) + + scale_y_continuous(n.breaks = 3) + } else { + stop("Unknown parameter type (", ptype, ") of parameter '", pname, "'") + } + # the plot is saved in a list + plot.list <- c(plot.list, list(p)) + } + + npar <- length(plot.list) + # Get appropriate number of columns + col <- row <- 3 + if (npar <= 3) { + col <- npar + row <- 1 + } else if (npar <=6){ + col <- 3 + row <- 2 + } + # Generate plots + if (npar > 9) + wp <- do.call("marrangeGrob", list(grobs=plot.list, ncol = col, nrow = row, as.table=FALSE)) + else if (length(plot.list)==1) + wp <- plot.list[[1]] + else + wp <- do.call("grid.arrange", c(plot.list)) + + # If the value in filename is added + if (!is.null(filename)) { + if (npar > 9) { + iraceplot_warn("Multiple plots generated: If a filename with a pdf extension", + " was provided a multi page plot will be generated;", + " otherwise, only the last plot set will be saved.", + " Use the {.field n} argument of this function to plot by parameter set.") + } + # FIXME: we could save in multiple files with a counter in their name, + ggsave(filename, wp) + } + wp +} diff --git a/R/sampling_frequency_iteration.R b/R/sampling_frequency_iteration.R new file mode 100644 index 0000000..9893194 --- /dev/null +++ b/R/sampling_frequency_iteration.R @@ -0,0 +1,128 @@ +#' Frequency and Density plot based on its iteration +#' +#' @description +#' The function will return a frequency plot used +#' for categorical data (its values are string, show a bar plot) or +#' numeric data (show a histogram and density plot) by each iteration 
+#' +#' @template arg_irace_results +#' +#' @param param_name +#' String, name of the parameter to be included (example: param_name = "algorithm") +#' +#' @param numerical_type +#' String, (default "both") Indicates the type of plot to be displayed for numerical +#' parameters. "density" shows a density plot, "frequency" shows a frequency plot and +#' "both" shows both frequency and density. +#' +#' @template arg_filename +#' +#' @return Frequency and/or density plot +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' sampling_frequency_iteration(iraceResults, param_name = "alpha") +#' \donttest{ +#' sampling_frequency_iteration(iraceResults, param_name = "alpha", numerical_type="density") +#' } +#' @export +sampling_frequency_iteration <- function(irace_results, param_name, numerical_type="both", + filename = NULL) +{ + # Variable assignment + memo <- vectorPlot <- configuration <- x <- Freq <- iteration_f <- iteration <- density <- NULL + + if (!(numerical_type %in% c("both", "density", "frequency"))){ + stop("Unknown numerical_type, values must be either both, density or frequency.") + } + + # verify that param_name is not NULL + if (!is.null(param_name)) { + param_name <- check_unknown_param_names(param_name, irace_results$parameters$names) + if (length(param_name) != 1L) stop("You can only provide one parameter") + } + + # table is created with all configurations + tabla <- irace_results$allConfigurations[,c(".ID.", param_name)] + filtro <- unique(irace_results$experimentLog[, c("iteration", "configuration")]) + + # merge iteration and configuration data + colnames(filtro)[colnames(filtro) == "configuration"] <- ".ID." + tabla <- merge(filtro, tabla, by=".ID.") + + # Column .ID. and .PARENT. 
are removed + tabla <- tabla[, !(colnames(tabla) %in% c(".ID.")), drop=FALSE] + + # The first column is renamed + colnames(tabla)[colnames(tabla) %in% c(param_name)] <- "x" + niter <- n_distinct(tabla$iteration) + tabla$iteration <- factor(tabla$iteration) + + # If the parameter is of type character a frequency graph is displayed + if (irace_results$parameters$types[param_name] %in% c("c", "o")) { + tabla <- as.data.frame(table(tabla)) + tabla$iteration_f <- factor(tabla$iteration, levels = rev(unique(tabla$iteration))) + + p <- ggplot(tabla, aes(x = x, y = Freq, fill = x)) + + geom_bar(stat = "identity") + + facet_grid(vars(iteration_f), scales = "free") + + scale_fill_manual( + values = viridisLite::viridis(length(unique(tabla$x))), + guide = guide_legend(title = param_name) + ) + + labs(y = "Frequency", x = param_name) + + scale_y_continuous(n.breaks = 3) + + theme(strip.text.y = element_text(angle = 0), + legend.position = "none") + + # The plot is saved in a list + vectorPlot[1] <- list(p) + } else if (irace_results$parameters$types[param_name] %in% c("i", "r", "i,log", "r,log")) { + tabla <- na.omit(tabla) + tabla$iteration_f <- factor(tabla$iteration, levels = rev(unique(tabla$iteration))) + + nbreaks <- pretty(range(tabla$x), + n = nclass.Sturges(tabla$x), + min.n = 1 + ) + + # density and histogram plot + p <- ggplot(as.data.frame(tabla), aes(x = x, fill = iteration)) + if (numerical_type %in% c("both", "frequency")) + p <- p + geom_histogram(aes(y = after_stat(density)), + breaks = nbreaks, + color = "black", fill = "gray" + ) + if (numerical_type %in% c("both", "density")) { + # Note: We can use also density ridges for a different (nicer) looking density plot + # ggplot(tabla, aes(x, y = iteration, height = stat(density))) + + # ggridges::geom_density_ridges(aes(fill = iteration), na.rm = TRUE, stat = "density") + + p <- p + geom_density(alpha = 0.7) + } + p <- p + scale_fill_manual(values = viridisLite::viridis(n_distinct(tabla$iteration))) + + 
facet_grid(vars(iteration_f), scales = "free") + + labs(x = param_name, y = "Frequency") + + theme( + axis.title.y = element_text(), + axis.title.x = element_text(size = 10), + plot.title = element_text(hjust = 0.5, size = rel(0.8), face = "bold"), + axis.ticks.x = element_blank() + ) + + scale_y_continuous(n.breaks = 3) + + theme(strip.text.y = element_text(angle = 0), + legend.position = "none") + + # The plot is saved in a list + vectorPlot[1] <- list(p) + } + + # If the value in filename is added the pdf file is created + if (!is.null(filename)) { + ggsave(filename, plot = do.call("grid.arrange", c(vectorPlot, ncol = 1))) + # If you do not add the value of filename, the plot is displayed + } else { + p + return(p) + } +} diff --git a/R/sampling_heatmap.R b/R/sampling_heatmap.R new file mode 100644 index 0000000..14bb429 --- /dev/null +++ b/R/sampling_heatmap.R @@ -0,0 +1,298 @@ +# Get the domain intervals to build a heat map plot over +# the parameters values +# +# @description +# +# The `get_domain` function generates a set of size intervals +# based on the domain of a parameter +# +# @param param_name +# String, name of a parameter in the parameters object +# +# @param parameters +# List, Parameter definition object obtained from an irace log file +# +# @param size +# Integer, number of intervals to create in the domain +# +# @return list with domain elements: +# - param_name: parameter name +# - type: parameter type either "n" (numerical) or "c" categorical +# - size: size of the resulting domains +# - param: ticks for the plot +# - names: domain labels for the plot +# - domain: starting points of the intervals +get_domain <- function(param_name, parameters, size) +{ + is_real <- function(param_name, parameters) parameters$types[param_name] %in% c("r", "r,log") + is_integer <- function(param_name, parameters) parameters$types[param_name] %in% c("i", "i,log") + is_cat <- function(param_name, parameters) parameters$types[param_name] %in% c("c", "o") + + 
old_domain <- parameters$domain[[param_name]] + + if (is_cat(param_name, parameters)) { + size <- length(old_domain) + type <- "c" + param <- 1:length(old_domain) + domain <- old_domain + names <- old_domain + } else { + if (is_real(param_name, parameters)) { + # Set default size + if (size <= 0) + size <- 10 + } else if (size <= 0) { + size <- old_domain[2] - old_domain[1] + size <- min(10L, size) + } else if (size > (old_domain[2] - old_domain[1])) { + cli_alert_info(paste0("{.strong Note}: step size for integer parameters should not exceed", + " the size of their domain. Parameter {.field {param_name}} domain size: {old_domain[2] - old_domain[1]}, provided step size: {size}. Setting step size to: {min(old_domain[2] - old_domain[1], 10L)}\n")) + size <- min(old_domain[2] - old_domain[1], 10L) + } + type <- "n" + param <- seq(1,size) + domain <- seq(old_domain[1], old_domain[2], length.out = size+1) + # Generate domain names (seq_len avoids iterating over 1:0 when size is 1) + names <- c() + for (i in seq_len(size-1)) + names <- c(names, paste0("[", domain[i], ",", domain[i+1], ")")) + names <- c(names, paste0("[",domain[size],",", domain[size+1], "]")) + } + list(param_name = param_name, + type = type, + size = size, + param = param, + names = names, + domain = domain) +} + +# Assigns a vector of parameter values to a domain obtained with the +# get_domain function. +# +# @description +# +# The `which_domain` function generates a vector of transformed parameter values. +# These correspond to values of the intervals defined by the domain. 
+# +# @param param_values +# Vector, a vector of parameter values +# +# @param domain +# List, the domain definition obtained from the get_domain function +# +# @return a vector of transformed parameter values +which_domain <- function(param_values, domain) +{ + data <- rep(NA, length(param_values)) + + if (domain$type == "c") { + for (i in 1:domain$size) { + sel <- which(!is.na(param_values) & (param_values == domain$domain[i])) + data[sel] <- i + } + } else if (domain$type == "n") { + # seq_len() yields an empty sequence when there is a single interval + for (i in seq_len(domain$size-1)) { + sel <- which(!is.na(param_values) & (param_values >= domain$domain[i]) & + (param_values < (domain$domain[i+1]))) + data[sel] <- i + } + # last interval should include last value + sel <- which(!is.na(param_values) & (param_values >= domain$domain[domain$size]) & + (param_values <= (domain$domain[domain$size+1]))) + data[sel] <- domain$size + } + + data[is.na(param_values)] <- 0 + return(data) +} + +#' Sampling heat map plot +#' +#' Heatmap that displays the frequency of sampling values of two parameters. +#' +#' @template arg_irace_results +#' @template arg_param_names +#' +#' @param sizes +#' Numeric vector that indicates the number of intervals to be considered for numerical +#' parameters. This argument is positional with respect to param_names. By default, +#' numerical parameters are displayed using 10 intervals. +#' (example: sizes = c(0,10)) +#' +#' @param iterations +#' Numeric vector, iteration numbers that should be included in the plot +#' (example: iterations = c(1,4,5)) +#' +#' @param only_elite +#' logical (default TRUE), only print elite configurations. 
+#' +#' @template arg_filename +#' +#' @return sampling heat map plot +#' @export +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' sampling_heatmap(iraceResults, param_names=c("beta", "alpha")) +#' sampling_heatmap(iraceResults, param_names=c("beta", "alpha"), iterations = c(3,4)) +#' sampling_heatmap(iraceResults, param_names=c("beta", "alpha"), only_elite = FALSE) +sampling_heatmap <- function(irace_results, param_names, sizes = c(0,0), + iterations = NULL, only_elite = TRUE, + filename = NULL) +{ + # Check parameter values + param_names <- check_unknown_param_names(param_names, irace_results$parameters$names) + if (length(param_names) != 2L) stop("'param_names' must specify two parameters") + + # Check iterations + if (!is.null(iterations)) { + it <- 1:length(irace_results$allElites) + if (any(!(iterations %in% it))) { + stop("The iterations entered are outside the possible range") + } + } else { + iterations <- 1:length(irace_results$allElites) + } + + # Check configurations + if (only_elite) + id_configuration <- unlist(unique(irace_results$allElites[iterations])) + else + id_configuration <- unique(irace_results$experimentLog[irace_results$experimentLog[,"iteration"] %in% iterations, "configuration"]) + + # Select data + config <- irace_results$allConfigurations[irace_results$allConfigurations[, ".ID."] %in% id_configuration, ,drop=FALSE] + config <- config[, colnames(config) %in% param_names] + + domain1 <- get_domain(param_names[1], irace_results$parameters, sizes[1]) + domain2 <- get_domain(param_names[2], irace_results$parameters, sizes[2]) + + params <- data.frame(param1 = which_domain(config[, param_names[1]], domain1), + param2 = which_domain(config[, param_names[2]], domain2), + stringsAsFactors = FALSE) + + domain_names1 <- domain1$names + if (any(params$param1 == 0)) { + domain_names1 <- c("NA", domain1$names) + params$param1 <- factor(params$param1, 
levels=c(0,domain1$param)) + } else { + params$param1 <- factor(params$param1, levels=domain1$param) + } + + domain_names2 <- domain2$names + if (any(params$param2 == 0)) { + domain_names2 <- c("NA", domain2$names) + params$param2 <- factor(params$param2, levels=c(0, domain2$param)) + } else { + params$param2 <- factor(params$param2, levels=domain2$param) + } + + param1 <- param2 <- n <- NULL # Silence warnings + df <- params %>% count(param1, param2, .drop = FALSE) + + p <- ggplot(df, aes(x = param1, y = param2, fill=n)) + + geom_tile(color = "white", lwd = 0.5, linetype = 1) + + xlab(param_names[1]) +ylab(param_names[2]) + + scale_x_discrete(labels=domain_names1) + + scale_y_discrete(labels=domain_names2) + + theme_bw() + + theme(axis.text.x = element_text(angle = -90, vjust = 0.5, hjust=0), + panel.border=element_blank(), + legend.title = element_blank()) + + # If the value in filename is added the pdf file is created + if (!is.null(filename)) { + ggsave(filename, plot = p) + # If you do not add the value of filename, the plot is displayed + } else { + p + return(p) + } +} + +#' Sampling heat map plot +#' +#' Heatmap that displays the frequency of sampling values of two parameters. +#' +#' @param configurations +#' Data frame, configurations in `irace` format +#' (example: `configurations = iraceResults$allConfigurations`) +#' +#' @param parameters +#' List, parameter object in irace format +#' (example: `parameters = iraceResults$parameters`) +#' +#' @param param_names +#' String vector of size 2, names of the parameters that should be included in the plot +#' (example: param_names = c("beta","alpha")) +#' +#' @param sizes +#' Numeric vector that indicates the number of intervals to be considered for numerical +#' parameters. This argument is positional with respect to param_names. By default, +#' numerical parameters are displayed using 10 intervals. 
+#' (example sizes = c(0,10)) +#' +#' @template arg_filename +#' +#' @return sampling heat map plot +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' sampling_heatmap2(iraceResults$allConfigurations, iraceResults$parameters, +#' param_names=c("beta", "alpha")) +#' @export +sampling_heatmap2 <- function(configurations, parameters, param_names, + sizes = c(0,0), filename = NULL) +{ + param_names <- check_unknown_param_names(param_names, parameters$names) + if (length(param_names) != 2L) stop("'param_names' must specify two parameters") + + # Select data + config <- configurations[, colnames(configurations) %in% param_names] + + domain1 <- get_domain(param_names[1], parameters, sizes[1]) + domain2 <- get_domain(param_names[2], parameters, sizes[2]) + + params <- data.frame(param1 = which_domain(config[, param_names[1]], domain1), + param2 = which_domain(config[, param_names[2]], domain2), + stringsAsFactors = FALSE) + + domain_names1 <- domain1$names + if (any(params$param1 == 0)) { + domain_names1 <- c("NA", domain1$names) + params$param1 <- factor(params$param1, levels=c(0,domain1$param)) + } else { + params$param1 <- factor(params$param1, levels=domain1$param) + } + + domain_names2 <- domain2$names + if (any(params$param2 == 0)) { + domain_names2 <- c("NA", domain2$names) + params$param2 <- factor(params$param2, levels=c(0, domain2$param)) + } else { + params$param2 <- factor(params$param2, levels=domain2$param) + } + param1 <- param2 <- n <- NULL + df <- params %>% count(param1, param2, .drop = FALSE) + + p <- ggplot(df, aes(x = param1, y = param2, fill=n)) + + geom_tile(color = "white", lwd = 0.5, linetype = 1) + + xlab(param_names[1]) + ylab(param_names[2]) + + scale_x_discrete(labels=domain_names1) + + scale_y_discrete(labels=domain_names2) + + theme_bw() + + theme(axis.text.x = element_text(angle = -90, vjust = 0.5, hjust=0), + panel.border=element_blank(), + legend.title = 
element_blank()) + + # If the value in filename is added the pdf file is created + if (!is.null(filename)) { + ggsave(filename, plot = p) + # If you do not add the value of filename, the plot is displayed + } else { + p + } + return(p) +} diff --git a/R/sampling_pie.R b/R/sampling_pie.R new file mode 100644 index 0000000..7c84c9b --- /dev/null +++ b/R/sampling_pie.R @@ -0,0 +1,131 @@ +#' Sampling pie plot +#' +#' This function creates a pie plot of the values sampled for a set of selected +#' parameters. Numerical parameters are discretized into at most `n_bins` +#' intervals. The size of each slice is proportional to the number of +#' configurations whose parameter value falls within the range, or equals the +#' value, assigned to that slice. Parameters can be selected by providing their +#' names in the `param_names` argument. +#' +#' @template arg_irace_results +#' +#' @param param_names +#' String vector, a set of parameters to be included (example: param_names = c("algorithm","dlb")) +#' +#' @param n_bins +#' Numeric (default 3), number of intervals to generate for numerical parameters. 
+#' +#' @template arg_filename +#' +#' @return Sampling pie plot +#' @export +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' sampling_pie(iraceResults) +#' \donttest{ +#' sampling_pie(iraceResults, param_names = c("algorithm", "dlb", "ants")) +#' } +sampling_pie <- function(irace_results, param_names = NULL, n_bins=3, filename = NULL) +{ + param_names <- subset_param_names(param_names, irace_results$parameters$names, irace_results$parameters$isFixed) + + # variable assignment + parents <- labels <- values <- ids <- depend <- NULL + + if (!is.numeric(n_bins) || n_bins < 1) stop("'n_bins' must be numeric > 0") + + # Logical (default FALSE) that allows to verify if the parameters + # are dependent on others, modifying the visualization of the plot + dependency <- FALSE + + # the table is generated only with categorical parameters + data <- irace_results$allConfigurations[,param_names, drop=FALSE] + + # discretize numerical parameters + for (pname in param_names) { + n_bins_param <- n_bins + if (irace_results$parameters$types[pname] %in% c("i", "r", "i,log", "r,log")) { + not.na <- !is.na(data[,pname]) + if(any(!not.na)) { + n_bins_param <- max(1, n_bins_param - 1) + } + # FIXME: this cut might improved by using the median of the data + val <- data[not.na, pname] + #same size bins + ss <- seq(irace_results$parameters$domain[[pname]][1], irace_results$parameters$domain[[pname]][2], length.out=n_bins_param+1) + if (irace_results$parameters$types[pname] %in% c("i", "i,log")) { + ss <- round(ss) + } + bins <- cut(val, breaks=ss, include.lowest=TRUE, ordered_result=TRUE) + # quatile-based bins + #bins <- cut(val, breaks=c(quantile(val, probs=seq(0,1, by=1/n_bins_param))), + # include.lowest = TRUE, ordered_result=TRUE) + bins <- as.character(bins,scientific = F) + data[not.na, pname] <- bins + } + } + + # checks if there is dependency between the parameters + if (dependency == TRUE) { 
+ for (i in 1:length(data)) { + if (!identical(irace_results$parameters$depends[[colnames(data)[i]]], character(0))) { + depend[colnames(data)[i]] <- list(irace_results$parameters$depends[[colnames(data)[i]]]) + } + } + } + + # the table data is generated + for (j in 1:length(data)) { + tabla <- table(data[j], useNA = "ifany") + + for (k in 1:length(tabla)) { + if (k == 1) { + ids <- c(ids, colnames(data)[j]) + + if (!is.null(depend[[colnames(data)[j]]]) && dependency == TRUE) { + parents <- c(parents, depend[[colnames(data)[j]]]) + } else { + parents <- c(parents, "") + } + labels <- c(labels, colnames(data)[j]) + values <- c(values, sum(tabla)) + } + ids <- c(ids, paste(colnames(data)[j], names(tabla)[k], sep = " - ")) + parents <- c(parents, colnames(data)[j]) + labels <- c(labels, names(tabla)[k]) + values <- c(values, tabla[[k]]) + } + } + + # The data table that will be used for the graph is created + data_f <- data.frame(ids, parents, labels, values, stringsAsFactors = FALSE) + data_f[is.na(data_f)] <- "NA" + + # if there is a dependency, the values of the dependent data are added to its parent + if (!is.null(depend) && dependency == TRUE) { + for (i in 1:length(depend)) { + data_f$values[data_f$ids == depend[[i]]] <- data_f$values[data_f$ids == depend[[i]]] + data_f$values[data_f$ids == names(depend[i])] + } + } + + # Create plot + p <- plotly::plot_ly( + type = "sunburst", + ids = data_f$ids, + labels = data_f$labels, + parents = data_f$parents, + values = data_f$values, + branchvalues = "total" + ) + + # If the value in filename is added the pdf file is created + if (!is.null(filename)) { + orca_pdf(filename, p) + } else { + # If you do not add the value of filename, the plot is displayed + p + return(p) + } +} diff --git a/R/scatter_performance.R b/R/scatter_performance.R new file mode 100644 index 0000000..1be6f53 --- /dev/null +++ b/R/scatter_performance.R @@ -0,0 +1,164 @@ +#' Performance Scatter Plot of Two Configurations +#' +#' Create a scatter 
plot that displays the performance of two configurations on +#' a provided experiment matrix. Each point in the plot represents an instance +#' and the color of the points indicates if one configuration is better than +#' the other. +#' +#' The performance matrix is assumed to be provided in the format of the irace +#' experiment matrix; thus, NA values are allowed. Consequently, the number of +#' evaluations can differ between configurations due to the elimination process +#' applied by irace. This plot shows performance data only for instances +#' in which both configurations are executed. +#' +#' @param experiments +#' Experiment matrix obtained from irace training or testing data. Configurations +#' in columns and instances in rows. As in irace, column names (configuration IDs) +#' should be characters. Row names will be used as instance names. +#' +#' @param x_id,y_id Configuration IDs for x-axis and y-axis, respectively. +#' +#' @template arg_rpd +#' @template arg_filename +#' @template arg_interactive +#' +#' @param instance_names Either a character vector of instance names in the +#' same order as `rownames(experiments)` or a function that takes +#' `rownames(experiments)` as input. 
+#' +#' @return [ggplot2::ggplot()] object +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' best_id <- iraceResults$iterationElites[length(iraceResults$iterationElites)] +#' scatter_performance(iraceResults$experiments, x_id = 1, y_id = best_id) +#' @export +scatter_performance <- function(experiments, x_id, y_id, rpd = TRUE, + filename = NULL, interactive = base::interactive(), + instance_names = NULL) +{ + x_id <- as.character(x_id) + y_id <- as.character(y_id) + # Verify that the entered id are within the possible range + if (!(x_id %in% colnames(experiments))) stop("x_id out of range", x_id) + if (!(y_id %in% colnames(experiments))) stop("y_id out of range", y_id) + orig_instance_names <- rownames(experiments) + if (is.null(orig_instance_names)) + orig_instance_names <- as.character(1:nrow(experiments)) + + if (rpd) { + experiments <- calculate_rpd(experiments) + xlab <- paste0("RPD (%) of configuration ", x_id) + ylab <- paste0("RPD (%) of configuration ", y_id) + } else { + xlab <- paste0("Cost of configuration ", x_id) + ylab <- paste0("Cost of configuration ", y_id) + } + x_data <- experiments[, x_id] + y_data <- experiments[, y_id] + instances <- which(!is.na(x_data) & !is.na(y_data)) + # Select only rows that have data for both configurations. 
+ if (length(instances) == 0) stop("No instance has data for both configurations") + x_data <- x_data[instances] + y_data <- y_data[instances] + best <- rep("equal", length(instances)) + best[x_data < y_data] <- "conf1" + best[x_data > y_data] <- "conf2" + + if (is.null(instance_names)) { + instance_names <- orig_instance_names[instances] + } else if (is.function(instance_names)) { + instance_names <- sapply(orig_instance_names[instances], instance_names) + } else { + # x_data is a vector, so nrow(x_data) would be NULL; compare against the matrix + if (length(instance_names) != nrow(experiments)) + stop("`instance_names` must have the same length as `nrow(experiments)`") + instance_names <- instance_names[instances] + } + # A table is created with only the paired values + data <- data.frame(conf1 = x_data, conf2 = y_data, + instance = instance_names, best = best) + # Variable assignment + conf1 <- conf2 <- instance <- best <- point_text <- NULL + data <- data %>% mutate(point_text = paste0(instance, "\nx: ", conf1, "\ny: ", conf2)) + + # The scatter plot is created and assigned to q + q <- ggplot(data, aes(x = conf1, y = conf2, color = best, text = point_text)) + + geom_abline(intercept = 0, slope = 1, color = "lightgray") + + geom_point(show.legend=FALSE) + + scale_color_manual(values=c(conf1="#6600CC", conf2="#00BFC4", equal="darkgray")) + + theme(legend.position = 'none') + + labs(color = " ", x = xlab, y = ylab) + + if (interactive) { + q <- plotly::ggplotly(p=q, tooltip = "point_text") + return(q) + } + + # If the value in filename is added the pdf file is created + if (!is.null(filename)) { + ggsave(filename, plot = q) + # If you do not add the value of filename, the plot is displayed + } else { + q + } + return(q) +} + +#' @rdname scatter_performance +#' +#' @details +#' [scatter_training()] compares the performance of two configurations on the +#' training instances. The performance data is obtained from the evaluations +#' performed by irace during the execution process. +#' +#' @template arg_irace_results +#' @param ... 
Other arguments passed to [scatter_performance()] +#' +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", +#' "guide-example.Rdata", mustWork = TRUE)) +#' scatter_training(iraceResults, x_id = 806, y_id = 809) +#' \donttest{ +#' scatter_training(iraceResults, x_id = 806, y_id = 809, rpd = FALSE) +#' } +#' @export +scatter_training <- function(irace_results, ...) +{ + experiments <- irace_results$experiments + instances.ids <- irace_results$state$.irace$instancesList[1:nrow(experiments), "instance"] + rownames(experiments) <- irace_results$scenario$instances[instances.ids] + scatter_performance(experiments, ...) +} + +#' @rdname scatter_performance +#' +#' @details +#' [scatter_test()] compares the performance of two configurations on the test +#' instances. The performance data is obtained from the test evaluations +#' performed by irace. Note that testing is not enabled by default in irace +#' and should be enabled in the scenario setup. Moreover, configuration ids +#' provided in `x_id` and `y_id` should belong to elite configuration set +#' evaluated in the test (see the irace package user guide for more details). +#' +#' @template arg_irace_results +#' @param ... Other arguments passed to [scatter_performance()]. +#' +#' +#' @examples +#' iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", +#' "guide-example.Rdata", mustWork = TRUE)) +#' scatter_test(iraceResults, x_id = 92, y_id = 119) +#' \donttest{ +#' scatter_test(iraceResults, x_id = 92, y_id = 119, rpd=FALSE) +#' } +#' @export +scatter_test <- function(irace_results, ...) +{ + if (!has_testing_data(irace_results)) + stop("irace_results does not contain the testing data") + experiments <- irace_results$testing$experiments + rownames(experiments) <- irace_results$scenario$testInstances + scatter_performance(experiments, ...) 
+} diff --git a/R/summarise_by_instance.R b/R/summarise_by_instance.R new file mode 100644 index 0000000..91c6058 --- /dev/null +++ b/R/summarise_by_instance.R @@ -0,0 +1,46 @@ +#' Summarise by instance +#' +#' @template arg_irace_results +#' +#' @return tibble +#' +#' @examples +#' irace_result <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' summarise_by_instance(irace_result) +#' @export +summarise_by_instance <- function(irace_results) +{ + instances <- irace_results$scenario$instances[irace_results$state$.irace$instancesList[1:nrow(irace_results$experiments), "instance"]] + + # FIXME: There must be a faster/easier way to do this. + freq_count <- function(x) { + x <- table(x) + setNames(as.vector(x), names(x)) + } + exp_by_instance <- freq_count(instances[irace_results$experimentLog[,"instance"]]) + seeds_by_instance <- freq_count(instances) + + ID <- value <- instance <- NULL # Silence warnings + + byinstance <- as_tibble(irace_results$experiments) %>% + mutate(instance = instances, .before=1) %>% + tidyr::pivot_longer(!c("instance"), names_to="ID") %>% + group_by(instance) %>% tidyr::drop_na() %>% + summarise(mean = mean(value), sd = sd(value), median = median(value), + min = min(value), max = max(value), + best_id = ID[which.min(value)]) %>% + # Sort by the original order in instancesList + arrange(factor(instance, levels = unique(instances))) %>% + mutate(seeds = seeds_by_instance[as.character(instance)], .after="instance") %>% + mutate(experiments = exp_by_instance[as.character(instance)], .after="instance") + + if (is.character(byinstance$instance)) { + # FIXME: This should be smarter and try harder to detect if it is a path. 
+ basename_inst <- basename(byinstance$instance) + if (length(basename_inst) == length(byinstance$instance)) { + byinstance <- byinstance %>% mutate(instance = basename_inst) + } + } + byinstance +} diff --git a/R/summarise_by_iteration.R b/R/summarise_by_iteration.R new file mode 100644 index 0000000..19b3328 --- /dev/null +++ b/R/summarise_by_iteration.R @@ -0,0 +1,21 @@ +#' Summarise by iteration +#' +#' @template arg_irace_results +#' +#' @return tibble +#' +#' @examples +#' irace_result <- read_logfile(system.file(package="irace", "exdata", +#' "irace-acotsp.Rdata", mustWork = TRUE)) +#' summarise_by_iteration(irace_result) +#' @export +summarise_by_iteration <- function(irace_results) +{ + iteration <- configuration <- instance <- NULL # Silence warnings + as_tibble(irace_results$experimentLog) %>% + group_by(iteration) %>% + summarise(configurations = n_distinct(configuration), + instances = n_distinct(instance), experiments=dplyr::n()) %>% + mutate(elites = sapply(irace_results$allElites, length), + best_id = irace_results$iterationElites) +} diff --git a/README.md b/README.md new file mode 100644 index 0000000..2d0ebeb --- /dev/null +++ b/README.md @@ -0,0 +1,150 @@ + +# The iraceplot package + + +[![CRAN +Status](https://www.r-pkg.org/badges/version-last-release/iraceplot)](https://cran.r-project.org/package=iraceplot) +[![R-CMD-check](https://github.com/auto-optimization/iraceplot/workflows/R-CMD-check/badge.svg)](https://github.com/auto-optimization/iraceplot/actions) +[![Codecov test coverage](https://codecov.io/gh/auto-optimization/iraceplot/branch/master/graph/badge.svg)](https://app.codecov.io/gh/auto-optimization/iraceplot?branch=master) + + +**Maintainers:** Leslie Pérez Cáceres, [Manuel López-Ibáñez](https://lopez-ibanez.eu) + +**Creators:** Pablo Oñate Marín, Leslie Pérez Cáceres, [Manuel López-Ibáñez](https://lopez-ibanez.eu) + +**Contact:** + +--------------------------------------- + +Introduction +============ + +The iraceplot package 
provides a set of functions that create different plots to visualize +the data generated by the irace configurator (https://cran.r-project.org/package=irace). + +This package provides visualizations of: + +- Configurations generated by irace (elite and non-elite) +- Elite configuration performance (training and testing) +- Parameter values and sampling distributions +- Configuration process overview + +Also, the package allows creating a small HTML report summarizing relevant information obtained during the execution of irace. + +The aim of this package is to provide support for the analysis of the best parameter settings found, the assessment of the parameter space explored by irace, and the overall performance of the configuration process. Such analyses might lead to insights about the role of algorithmic components and their interactions, or to improvements in the configuration process itself. + +**Keywords:** automatic configuration, offline tuning, parameter tuning, parameter visualization, irace. + + +Requisites +-------------- + + * R () is required for running irace and to use iraceplot, but + you don't need to know the R language to use either of them. + +User guide +---------- + +A [user guide](https://auto-optimization.github.io/iraceplot/articles/user_guide/guide.html) +comes with the package. The following is a quick-start guide. The user guide gives more detailed +instructions. + + +Installing R +============ + +The official instructions are available at +. We give below +a quick R installation guide that will work in most cases. + +GNU/Linux +--------- + +You should install R from your package manager. On a Debian/Ubuntu system it +will be something like: + + $ sudo apt-get install r-base + +Once R is installed, you can launch R from the Terminal and from the R +prompt install the iraceplot package. See instructions below. + + +OS X +---- + +You can install R directly from a CRAN mirror +(). 
+ +Alternatively, if you use homebrew, you can just brew the R formula +from the science tap (unfortunately it does not come already bottled +so you need to have Xcode installed to compile it): + +```bash +$ brew tap homebrew/science +$ brew install r +``` + +Once R is installed, you can launch R from the Terminal (or from your +Applications), and from the R prompt install the iraceplot package. See +instructions below. + +Windows +------- + +You can install R from a CRAN mirror +(). Once R is installed, you can +launch the R console and install the iraceplot package from it. See instructions +below. + + + +Installing the iraceplot package +============================ + +1. Install within R (automatic download): +To install iraceplot this way, you first need to install the devtools package: + +``` r +install.packages("devtools") +``` +Currently, the iraceplot package can be installed from GitHub: + +``` r +devtools::install_github("auto-optimization/iraceplot") + +``` +2. Manually + [download the package from CRAN](https://cran.r-project.org/package=iraceplot/) + and invoke at the command-line: +```bash +$ R CMD INSTALL +``` + where `` is one of the three versions available: `.tar.gz` + (Unix/BSD/GNU/Linux), `.tgz` (MacOS X), or `.zip` (Windows). 
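The installation steps above can be wrapped in a small guard so that a package is only installed when it is missing. This is a minimal sketch: the helper name `ensure_installed` is hypothetical (it is not part of iraceplot), and it assumes a CRAN mirror is configured and reachable.

```r
# Hypothetical helper (not part of iraceplot): install a package from CRAN
# only if it is not already available, then report whether it can be loaded.
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Assumes a CRAN mirror is configured and the network is reachable.
    install.packages(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Usage (commented out to avoid installing anything unintentionally):
# ensure_installed("iraceplot")
```

The function returns `TRUE` when the package namespace can be loaded after the optional installation step, which makes it easy to use in setup scripts.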
+ + +How To Use +=========================== + +Load the package in the R console: + +``` r +library(iraceplot) +``` + +Load the log file generated by irace (`.Rdata`) for example, replace the path to your `irace.Rdata` file in the following line: + +``` r +iraceResults <- read_logfile("~/path/example/irace.Rdata") +``` + +For example you can plot the training performance with: +```r +boxplot_training(iraceResults) +``` + +Check the [documentation](https://auto-optimization.github.io/iraceplot/reference/index.html) and the [User Guide](https://auto-optimization.github.io/iraceplot/articles/user_guide/guide.html) to find the plot most suited to your needs or generate a general-purpose report with: + +``` r +report(iraceResults, "path/to/my_report") +``` diff --git a/build/vignette.rds b/build/vignette.rds new file mode 100644 index 0000000..421e65f Binary files /dev/null and b/build/vignette.rds differ diff --git a/inst/doc/iraceplot_package.R b/inst/doc/iraceplot_package.R new file mode 100644 index 0000000..702e888 --- /dev/null +++ b/inst/doc/iraceplot_package.R @@ -0,0 +1,78 @@ +## ----echo=FALSE, prompt=FALSE, message=FALSE, warning=FALSE------------------- + +library(iraceplot, quietly = TRUE) +library(plotly) +iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", "guide-example.Rdata", mustWork = TRUE)) + +## ----eval=FALSE--------------------------------------------------------------- +# library(iraceplot) + +## ----eval=FALSE--------------------------------------------------------------- +# iraceResults <- irace::read_logfile("~/path/to/irace.Rdata") + +## ----fig.align="center", fig.width=7, eval=FALSE------------------------------ +# boxplot_test(iraceResults, type="best") + +## ----fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE---- +# parallel_coord(iraceResults) + +## ----fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE---- +# all_elite <- 
iraceResults$allConfigurations[unlist(iraceResults$allElites),] +# parallel_coord2(all_elite, iraceResults$parameters) + +## ----fig.align="center", fig.width= 7, fig.height=6, message=FALSE, prompt=FALSE, eval=FALSE---- +# parallel_cat(irace_results = iraceResults, +# param_names=c("algorithm", "localsearch", "dlb", "nnls")) + +## ----fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE---- +# sampling_pie(irace_results = iraceResults, param_names=c("algorithm", "localsearch", "alpha", "beta", "rho")) + +## ----fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE---- +# sampling_frequency(iraceResults, param_names = c("beta")) + +## ----fig.align="center", fig.width= 7, fig.height=7, message=FALSE, prompt=FALSE, eval=FALSE---- +# sampling_frequency(iraceResults$allConfigurations, iraceResults$parameters, param_names = c("alpha")) + +## ----fig.align="center", fig.width= 7, fig.height=6, eval=FALSE--------------- +# sampling_frequency_iteration(iraceResults, param_name = "beta") + +## ----fig.align="center", fig.width= 7, fig.height=6, eval=FALSE--------------- +# sampling_heatmap(iraceResults, param_names = c("beta","alpha")) + +## ----fig.align="center", fig.width= 7, fig.height=6, eval=FALSE--------------- +# sampling_heatmap2(iraceResults$allConfigurations, iraceResults$parameters, +# param_names = c("localsearch","nnls"), sizes=c(0,5)) + +## ----fig.align="center", fig.width= 7, eval=FALSE----------------------------- +# sampling_distance(iraceResults, t=0.05) + +## ----fig.align="center", fig.width=7, eval=FALSE------------------------------ +# boxplot_test(iraceResults, type="best") + +## ----fig.align="center", fig.width=7, eval=FALSE------------------------------ +# scatter_test(iraceResults, x_id = 808, y_id = 809, interactive=TRUE) + +## ----fig.align="center", fig.width=7, eval=FALSE------------------------------ +# boxplot_training(iraceResults) + +## ----fig.align="center", fig.width=7, 
eval=FALSE------------------------------ +# scatter_training(iraceResults, x_id = 808, y_id = 809, interactive=TRUE) + +## ----fig.align="center", fig.width=7, eval=FALSE------------------------------ +# boxplot_performance(iraceResults$experiments, allElites=list(c(803,808), c(809,800)), first_is_best = TRUE) + +## ----fig.align="center", fig.width=7, eval=FALSE------------------------------ +# scatter_performance(iraceResults$experiments, x_id = 83, y_id = 809, interactive=TRUE) + +## ----fig.align="center", fig.width=7, eval=FALSE------------------------------ +# plot_experiments_matrix(iraceResults, interactive = TRUE) + +## ----fig.align="center", fig.width=7, eval=FALSE------------------------------ +# plot_model(iraceResults, param_name="algorithm") + +## ----fig.align="center", fig.width=7, fig.height=6, message=FALSE, prompt=FALSE, results='hide', eval=FALSE---- +# plot_model(iraceResults, param_name="alpha") + +## ----fig.align="center", eval=FALSE------------------------------------------- +# report(iraceResults, filename="report") + diff --git a/inst/doc/iraceplot_package.Rmd b/inst/doc/iraceplot_package.Rmd new file mode 100644 index 0000000..bc7db7c --- /dev/null +++ b/inst/doc/iraceplot_package.Rmd @@ -0,0 +1,303 @@ +--- +title: The iraceplot package +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{The iraceplot package} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r echo=FALSE, prompt=FALSE, message=FALSE, warning=FALSE} + +library(iraceplot, quietly = TRUE) +library(plotly) +iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", "guide-example.Rdata", mustWork = TRUE)) +``` + + +## Introduction +The iraceplot package provides a set of functions to create plots to visualize +the configuration data generated by the configuration process implemented in +the irace package. 
+ +At the end of its execution, the configuration process performed by irace reports one +or more configurations that are the best performing configurations found. This package +provides a set of functions that allow you to further assess the performance of these +configurations and to obtain insights about the details of the configuration +process. + +For more details about these functions, please check the [user guide](https://auto-optimization.github.io/iraceplot/) +of the package and the documentation of the functions implemented in the package. + + +# Installation +## Install iraceplot + +You can install the package directly from CRAN +``` +install.packages("iraceplot") +``` + +or you can install the latest development version from GitHub: + +``` +install.packages("devtools") +devtools::install_github("auto-optimization/iraceplot/") +``` + +## How to use + +This is a basic example that shows how to use iraceplot: + +```{r eval=FALSE} +library(iraceplot) +``` + +To use the functions, you need the log file generated by irace (logFile +option in the irace package), commonly saved in the directory in which irace +was executed with the name `irace.Rdata`. To load the irace log data, replace the +path of the file with yours: + +```{r eval=FALSE} +iraceResults <- irace::read_logfile("~/path/to/irace.Rdata") +``` + +This file contains the `iraceResults` variable, which holds the irace log. For +more details about this variable, please go to the documentation of the irace package. +Now you can use the functions of the package, for example: + +```{r fig.align="center", fig.width=7, eval=FALSE} +boxplot_test(iraceResults, type="best") +``` + +## Executing irace + +To use the methods provided by this package you must have an irace data object; +this object is saved as an Rdata file (irace.Rdata by default) after each irace +execution. 
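Since every plot depends on that log file, it can be useful to check that the file exists before trying to read it. The following is a minimal sketch, not part of the package: the helper name `load_irace_log` is hypothetical, and the default file name only mirrors the `irace.Rdata` default mentioned above.

```r
# Hypothetical helper (not part of irace or iraceplot): check that the log
# file exists before attempting to read it.
load_irace_log <- function(path = "irace.Rdata") {
  if (!file.exists(path)) {
    stop("irace log file not found: ", path)
  }
  # irace::read_logfile() is the loader used elsewhere in this vignette;
  # it requires the irace package to be installed.
  irace::read_logfile(path)
}

# Usage, assuming irace was run in the current directory:
# iraceResults <- load_irace_log("irace.Rdata")
```

Failing early with an explicit message makes a wrong working directory easier to spot than an error raised later by a plotting function.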
+ +During the configuration procedure irace evaluates several candidate configurations +(parameter settings) on different training instances, creating an algorithm performance +data set we call the *training data set*. This is, thus, the data +that irace had access to when configuring the algorithm. + +You can also enable the test evaluation option in irace, in which a set of elite +configurations will be evaluated on a set of test instances after the execution of +irace is finished. Note that this option is not enabled by default and you +must provide the test instances in order to enable it. The performance obtained +in this evaluation is called the *test data set*. This evaluation helps assess +the results of the configuration in a more "real" setup. For example, we can assess +whether the configuration process resulted in overtuning or whether a type of instance +was underrepresented in the training set. We note that irace can perform +the test evaluations on the final elite configurations and on the elite configurations +of each iteration. For information about the irace setup we refer you to the irace +package user guide. + +Note: Before executing irace, consider setting the test evaluation option of irace. + +Once irace is executed, you can load the irace log in the R console as previously +shown. + +# Function overview + +## Visualizing configurations + +The iraceplot package provides several functions that display information about +configurations. To visualize individual configurations, the `parallel_coord` function +shows each configuration as a line. + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE} +parallel_coord(iraceResults) +``` + +The `parallel_coord2` function generates a similar parallel coordinates plot when +provided with an arbitrary set of configurations without the irace execution context. 
+For example, to display all elite configurations: + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE} +all_elite <- iraceResults$allConfigurations[unlist(iraceResults$allElites),] +parallel_coord2(all_elite, iraceResults$parameters) +``` + +A similar display can be obtained using the `parallel_cat` function. For example, to +visualize the configurations of a selected set of parameters: + +```{r fig.align="center", fig.width= 7, fig.height=6, message=FALSE, prompt=FALSE, eval=FALSE} +parallel_cat(irace_results = iraceResults, + param_names=c("algorithm", "localsearch", "dlb", "nnls")) +``` + +The `sampling_pie` function creates a plot that displays the values of all configurations +sampled during the configuration process. The size of each parameter +value in the plot depends on the number of configurations that have that +value. + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE} +sampling_pie(irace_results = iraceResults, param_names=c("algorithm", "localsearch", "alpha", "beta", "rho")) +``` + +Note that for some of the previous plots, the domains of numerical parameters are discretized +to be shown in the plot. Check the documentation of the functions and the [user guide](https://auto-optimization.github.io/iraceplot/) +to adjust this setting. + +## Visualizing sampled values and frequencies + +The package provides several functions to visualize values sampled during the +configuration procedure and their distributions. These plots can help identify +the areas in the parameter space where irace detected high performance. 
+ +A general overview of the sampled parameter values can be obtained with the `sampling_frequency` +function, which generates frequency and density plots for the sampled values: + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE} + sampling_frequency(iraceResults, param_names = c("beta")) +``` + + +If you would like to visualize the distribution of a particular set of configurations, +you can directly pass a set of configurations +and a parameters object in the irace format to the `sampling_frequency` function: +```{r fig.align="center", fig.width= 7, fig.height=7, message=FALSE, prompt=FALSE, eval=FALSE} + sampling_frequency(iraceResults$allConfigurations, iraceResults$parameters, param_names = c("alpha")) +``` + +A detailed plot showing the sampling by iteration can be obtained with the +`sampling_frequency_iteration` function. This plot shows the convergence of the +configuration process reflected in the sampled parameter values. + +```{r fig.align="center", fig.width= 7, fig.height=6, eval=FALSE} +sampling_frequency_iteration(iraceResults, param_name = "beta") +``` + +To visualize the joint sampling frequency of two parameters you can use the `sampling_heatmap` +function. + +```{r fig.align="center", fig.width= 7, fig.height=6, eval=FALSE} +sampling_heatmap(iraceResults, param_names = c("beta","alpha")) +``` + +The configurations can be provided directly to the `sampling_heatmap2` function. In both +functions, the `sizes` argument can be used to adjust the number of intervals to be +displayed: + +```{r fig.align="center", fig.width= 7, fig.height=6, eval=FALSE} +sampling_heatmap2(iraceResults$allConfigurations, iraceResults$parameters, + param_names = c("localsearch","nnls"), sizes=c(0,5)) +``` + +For more details on these functions, check their documentation and the [user guide](https://auto-optimization.github.io/iraceplot/). 
+ +## Visualizing sampling distance + +You may want a general overview of the distance between the configurations sampled +across the configuration process. This can allow you to assess the convergence of the +configuration process. Use the `sampling_distance` function to display the mean distance +of the configurations across the iterations of the configuration process: + +```{r fig.align="center", fig.width= 7, eval=FALSE} +sampling_distance(iraceResults, t=0.05) +``` + +The distance for numerical parameters can be adjusted with a threshold (`t=0.05`). +Check the documentation of the function and the [user guide](https://auto-optimization.github.io/iraceplot/) +for details. + + +## Visualizing test performance + +The test performance of the best final configurations can be visualized using the `boxplot_test` +function. + +```{r fig.align="center", fig.width=7, eval=FALSE} +boxplot_test(iraceResults, type="best") +``` + +Note that the irace execution log must include test data (testing is not enabled by default). Check the +irace package [user guide](https://CRAN.R-project.org/package=irace/vignettes/irace-package.pdf) +for details on how to use the test feature in irace. + +To investigate the difference in the performance of two configurations, the `scatter_test` function displays +the performance of both configurations paired by instance (each point represents an instance): + +```{r fig.align="center", fig.width=7, eval=FALSE} +scatter_test(iraceResults, x_id = 808, y_id = 809, interactive=TRUE) +``` + + +## Visualizing training performance + +Visualizing training performance might help to obtain insights about the reasoning +that irace followed when searching the parameter space, and thus it can be used +to understand why irace considers certain configurations as high or low performing. + +To visualize the performance of the final elites observed by irace, the `boxplot_training` +function plots the experiments performed on these configurations. 
Note that this data corresponds +to the performance observed during the configuration process; thus, the number of instances on +which the configurations were evaluated might vary between elite configurations. + +```{r fig.align="center", fig.width=7, eval=FALSE} +boxplot_training(iraceResults) +``` + +To observe the difference in the performance of two configurations you can also generate +a scatter plot using the `scatter_training` function: + +```{r fig.align="center", fig.width=7, eval=FALSE} +scatter_training(iraceResults, x_id = 808, y_id = 809, interactive=TRUE) +``` + +## Visualizing performance (general purpose) +To plot the performance of a selected set of configurations in an experiment matrix, +you can use the `boxplot_performance` function. The configurations can be selected +in a vector or a list (allElites): + +```{r fig.align="center", fig.width=7, eval=FALSE} +boxplot_performance(iraceResults$experiments, allElites=list(c(803,808), c(809,800)), first_is_best = TRUE) +``` + +In the same way, you can use the `scatter_performance` function to plot the difference +between configurations: + +```{r fig.align="center", fig.width=7, eval=FALSE} +scatter_performance(iraceResults$experiments, x_id = 83, y_id = 809, interactive=TRUE) +``` + +Note that these functions can be adjusted to display the configurations differently (e.g., whether or not to include instances). +Check the package [user guide](https://auto-optimization.github.io/iraceplot/) and the +documentation of each function for details. + +## Visualizing the configuration process + +In some cases, it might be interesting to have a general visualization of the configuration process. +This can be obtained with the `plot_experiments_matrix` function: + +```{r fig.align="center", fig.width=7, eval=FALSE} +plot_experiments_matrix(iraceResults, interactive = TRUE) +``` + +The sampling distributions used by irace during the configuration process can be displayed using the +`plot_model` function. 
For categorical parameters, this function displays the sampling probabilities associated with each parameter value by iteration (x axis top) in each elite configuration model (bars): + +```{r fig.align="center", fig.width=7, eval=FALSE} +plot_model(iraceResults, param_name="algorithm") +``` + +For numerical parameters, this function shows the sampling distributions associated with each parameter. +These plots display the density function of the truncated normal distribution associated with the +models of each elite configuration in each instance: + + +```{r fig.align="center", fig.width=7, fig.height=6, message=FALSE, prompt=FALSE, results='hide', eval=FALSE} +plot_model(iraceResults, param_name="alpha") +``` + +# Report + +The `report` function generates an HTML report with a summary of the configuration +process executed by irace. The function creates the HTML file at the path +given in the `filename` argument, appending the `".html"` extension to it. + + +```{r fig.align="center", eval=FALSE} +report(iraceResults, filename="report") +``` diff --git a/inst/doc/iraceplot_package.html b/inst/doc/iraceplot_package.html new file mode 100644 index 0000000..830a9f3 --- /dev/null +++ b/inst/doc/iraceplot_package.html @@ -0,0 +1,450 @@ + + + + + + + + + + + + + + +The iraceplot package + + + + + + + + + + + + + + + + + + + + + + + + + +

The iraceplot package

+ + + +
+

Introduction

+

The iraceplot package provides a set of functions to create plots to visualize the configuration data generated by the configuration process implemented in the irace package.

+

The configuration process performed by irace reports, at the end of the execution, one or more configurations that are the best performing configurations found. This package provides a set of functions that allow you to further assess the performance of these configurations and to obtain insights about the details of the configuration process.

+

For more details about these functions, please check the user guide of the package and the documentation of the functions implemented in the package.

+
+
+

Installation

+
+

Install iraceplot

+

You can install the package directly from CRAN:

+
install.packages("iraceplot")
+

or you can install the latest development version from GitHub:

+
install.packages("devtools")
+devtools::install_github("auto-optimization/iraceplot/")
+
+
+

How to use

+

This is a basic example that shows how to use iraceplot:

+ +

To use the functions, you need the log file generated by irace (logFile option in the irace package), commonly saved in the directory in which irace was executed with the name irace.Rdata. To load the irace log data, replace the path of the file with yours:

+ +

This file contains the iraceResults variable, which holds the irace log. For more details about this variable, please see the documentation of the irace package. Now you can use the functions of the package, for example:

+ +
+
+

Executing irace

+

To use the methods provided by this package you must have an irace data object. This object is saved as an Rdata file (irace.Rdata by default) after each irace execution.

+

During the configuration procedure irace evaluates several candidate configurations (parameter settings) on different training instances, creating an algorithm performance data set we call the training data set. This is, thus, the data that irace had access to when configuring the algorithm.

+

You can also enable the test evaluation option in irace, in which a set of elite configurations will be evaluated on a set of test instances after the execution of irace is finished. Note that this option is not enabled by default and you must provide the test instances in order to enable it. The performance obtained in this evaluation is called the test data set. This evaluation helps assess the results of the configuration in a more “real” setup. For example, we can assess whether the configuration process resulted in overtuning or whether a type of instance was underrepresented in the training set. We note that irace can perform the test evaluations on the final elite configurations and on the elite configurations of each iteration. For information about the irace setup we refer you to the irace package user guide.

+

Note: Before executing irace, consider setting the test evaluation option of irace.

+

Once irace is executed, you can load the irace log in the R console as previously shown.

+
+
+
+

Function overview

+
+

Visualizing configurations

+

The iraceplot package provides several functions that display information about configurations. To visualize individual configurations, the parallel_coord function shows each configuration as a line.

+ +

The parallel_coord2 function generates a similar parallel coordinates plot when provided with an arbitrary set of configurations without the irace execution context. For example, to display all elite configurations:
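
As a rough illustration of how such a set of elite configurations can be assembled, the sketch below uses small made-up stand-ins for `iraceResults$allConfigurations` and `iraceResults$allElites`; the data values and the variable names `all_configurations` and `all_elites` are hypothetical. It also removes repeated IDs with `unique()`, which is an addition of this sketch and not necessarily what the vignette's own snippet does.

```r
# Toy stand-ins for iraceResults$allConfigurations and iraceResults$allElites
# (hypothetical values, for illustration only).
all_configurations <- data.frame(
  .ID. = 1:5,
  algorithm = c("as", "mmas", "acs", "mmas", "acs"),
  alpha = c(1.0, 2.5, 0.8, 1.7, 3.1)
)
# One vector of elite configuration IDs per iteration.
all_elites <- list(c(2, 3), c(3, 5), c(5, 3))

# Elite IDs repeat across iterations, so keep each one only once.
elite_ids <- unique(unlist(all_elites))

# Select the rows whose ID belongs to the elite set.
all_elite <- all_configurations[all_configurations$.ID. %in% elite_ids, ]
print(all_elite)
```

With a real log, the resulting data frame would be passed to `parallel_coord2` together with `iraceResults$parameters`.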

+ +

A similar display can be obtained using the parallel_cat function. For example, to visualize the configurations of a selected set of parameters:

+ +

The sampling_pie function creates a plot that displays the values of all configurations sampled during the configuration process. The size of each parameter value in the plot depends on the number of configurations that have that value.
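
The counts behind such a plot can be sketched in base R. The example below uses a made-up `configurations` data frame (hypothetical values) and only shows how the share of each value of a categorical parameter could be tallied; it is not the code used by `sampling_pie`.

```r
# Hypothetical sampled configurations with two categorical parameters.
configurations <- data.frame(
  algorithm = c("acs", "mmas", "acs", "as", "acs", "mmas"),
  localsearch = c("1", "3", "3", "3", "2", "3")
)

# Number of configurations having each value of "algorithm".
counts <- table(configurations$algorithm)

# Share of each value; a larger share corresponds to a larger slice.
shares <- prop.table(counts)
print(round(shares, 2))
```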

+ +

Note that for some of the previous plots, the domains of numerical parameters are discretized to be shown in the plot. Check the documentation of the functions and the user guide to adjust this setting.
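
To make the idea of discretization concrete, here is a minimal base R sketch that splits a numerical domain into equal-width intervals with `cut()`. The sampled values, the domain `[0, 5]` and the number of intervals are assumptions chosen for the example; the package may bin values differently.

```r
# Hypothetical sampled values of a numerical parameter with domain [0, 5].
alpha <- c(0.12, 0.95, 1.40, 2.75, 3.30, 4.99)

# Split the assumed domain into 5 equal-width intervals.
breaks <- seq(0, 5, length.out = 6)
alpha_bins <- cut(alpha, breaks = breaks, include.lowest = TRUE)

# Each sampled value is now represented by the interval it falls into.
print(table(alpha_bins))
```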

+
+
+

Visualizing sampled values and frequencies

+

The package provides several functions to visualize values sampled during the configuration procedure and their distributions. These plots can help identify the areas in the parameter space where irace detected high performance.

+

A general overview of the sampled parameter values can be obtained with the sampling_frequency function, which generates frequency and density plots for the sampled values:

+ +

If you would like to visualize the distribution of a particular set of configurations, you can directly pass a set of configurations and a parameters object in the irace format to the sampling_frequency function:

+ +

A detailed plot showing the sampling by iteration can be obtained with the sampling_frequency_iteration function. This plot shows the convergence of the configuration process reflected in the sampled parameter values.

+ +

To visualize the joint sampling frequency of two parameters you can use the sampling_heatmap function.

+ +

The configurations can be provided directly to the sampling_heatmap2 function. In both functions, the sizes argument can be used to adjust the number of intervals to be displayed:

+ +

For more details on these functions, check their documentation and the user guide.

+
+
+

Visualizing sampling distance

+

You may want a general overview of the distance between the configurations sampled across the configuration process. This can allow you to assess the convergence of the configuration process. Use the sampling_distance function to display the mean distance of the configurations across the iterations of the configuration process:

+ +

The distance for numerical parameters can be adjusted with a threshold (t=0.05). Check the documentation of the function and the user guide for details.

+
+
+

Visualizing test performance

+

The test performance of the best final configurations can be visualized using the boxplot_test function.

+ +

Note that the irace execution log must include test data (testing is not enabled by default). Check the irace package user guide for details on how to use the test feature in irace.
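
Before calling the test plots, it can be useful to check whether a loaded log actually carries test results. The report template shipped with the package guards its test sections with `has_testing_data()`; the base R sketch below mimics that kind of guard on a made-up log object. The helper name `log_has_test_data` and the toy `irace_log` list are hypothetical, and the check only assumes that test results live in the `testing$experiments` element of the log.

```r
# Toy stand-in for a loaded irace log without test results
# (hypothetical values, for illustration only).
irace_log <- list(
  experiments = matrix(c(10, 12, 11, 13), nrow = 2),
  testing = NULL
)

# Hypothetical helper: TRUE only if the log holds a test experiment matrix.
log_has_test_data <- function(log) {
  !is.null(log$testing) && !is.null(log$testing$experiments)
}

if (log_has_test_data(irace_log)) {
  message("Test data found: the test plots can be generated.")
} else {
  message("No test data in this log: enable testing in irace and run it again.")
}
```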

+

To investigate the difference in the performance of two configurations, the scatter_test function displays the performance of both configurations paired by instance (each point represents an instance):
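
The pairing by instance can be sketched with a small made-up experiment matrix (instances in rows, configurations in columns). The values and instance names below are hypothetical; the sketch only shows which numbers end up on each axis of such a scatter plot, not how `scatter_test` is implemented.

```r
# Hypothetical test results: rows are instances, columns are configuration IDs.
test_experiments <- matrix(
  c(105, 98, 110,
    101, 99, 104),
  nrow = 3,
  dimnames = list(c("inst1", "inst2", "inst3"), c("808", "809"))
)

# One row per instance: x is configuration 808, y is configuration 809.
paired <- data.frame(
  instance = rownames(test_experiments),
  x = test_experiments[, "808"],
  y = test_experiments[, "809"]
)

# Negative values: configuration 809 obtained a lower value on that instance.
paired$diff <- paired$y - paired$x
print(paired)
```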

+ +
+
+

Visualizing training performance

+

Visualizing training performance might help to obtain insights about the reasoning that irace followed when searching the parameter space, and thus it can be used to understand why irace considers certain configurations as high or low performing.

+

To visualize the performance of the final elites observed by irace, the boxplot_training function plots the experiments performed on these configurations. Note that this data corresponds to the performance observed during the configuration process; thus, the number of instances on which the configurations were evaluated might vary between elite configurations.
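
To see how much the number of evaluated instances differs between configurations, one can count the non-missing entries per column of the experiment matrix. The sketch below uses a made-up matrix with instances in rows and configurations in columns, where `NA` marks an instance on which a configuration was not evaluated; all values are hypothetical.

```r
# Hypothetical training results: rows are instances, columns are configuration IDs.
experiments <- matrix(
  c(10, 12, NA, NA,
    11, 13, 12, 14,
     9, 12, 11, NA),
  nrow = 4,
  dimnames = list(paste0("inst", 1:4), c("803", "808", "809"))
)

# Number of instances on which each configuration was actually evaluated.
n_evaluated <- colSums(!is.na(experiments))
print(n_evaluated)
```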

+ +

To observe the difference in the performance of two configurations you can also generate a scatter plot using the scatter_training function:

+ +
+
+

Visualizing performance (general purpose)

+

To plot the performance of a selected set of configurations in an experiment matrix, you can use the boxplot_performance function. The configurations can be selected in a vector or a list (allElites):

+ +

In the same way, you can use the scatter_performance function to plot the difference between configurations:

+ +

Note that these functions can be adjusted to display the configurations differently (e.g., whether or not to include instances). Check the package user guide and the documentation of each function for details.

+
+
+

Visualizing the configuration process

+

In some cases, it might be interesting to have a general visualization of the configuration process. This can be obtained with the plot_experiments_matrix function:

+ +

The sampling distributions used by irace during the configuration process can be displayed using the plot_model function. For categorical parameters, this function displays the sampling probabilities associated with each parameter value by iteration (x axis top) in each elite configuration model (bars):

+ +

For numerical parameters, this function shows the sampling distributions associated with each parameter. These plots display the density function of the truncated normal distribution associated with the models of each elite configuration in each instance:

+ +
+
+
+

Report

+

The report function generates an HTML report with a summary of the configuration process executed by irace. The function creates the HTML file at the path given in the filename argument, appending the ".html" extension to it.

+ +
+ + + + + + + + + + + diff --git a/inst/exdata/guide-example.Rdata b/inst/exdata/guide-example.Rdata new file mode 100644 index 0000000..ea23e70 Binary files /dev/null and b/inst/exdata/guide-example.Rdata differ diff --git a/inst/template/report_html.Rmd b/inst/template/report_html.Rmd new file mode 100644 index 0000000..35d20fc --- /dev/null +++ b/inst/template/report_html.Rmd @@ -0,0 +1,251 @@ +--- +title: "Report generated by the iraceplot package" +output: + html_document: + toc: true + highlight: pygments + toc_float: true +--- +```{css, echo=FALSE} +pre { + white-space:pre !important; + overflow-x: auto; + max-width: 100%; +} +pre code { + white-space:pre !important; + overflow-x: auto; +} +.overflow { + white-space:pre !important; + overflow-x: auto; + max-width: 100%; +} +``` + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +```{r setup, include= FALSE} +library(iraceplot) +library(tidyr, quietly = TRUE) +library(dplyr, quietly = TRUE) +sections <- get0("sections", ifnotfound=list(experiments_matrix=TRUE)) +if (!exists("interactive_plots")) interactive_plots <- base::interactive() +# FIXME: Move show_table to the package so people can use it to create their +# own reports. +show_table <- function(x, row.names = FALSE, search = FALSE, lengthMenu = NULL, + searchHighlight = TRUE, dom='lrtip', colorbar = NULL, style = list(), ...) 
{ + + if (!interactive_plots) return(knitr::kable(x, row.names = row.names)) + + if (search) dom <- sub("r", "fr", dom) + if (is.null(lengthMenu)) { + pageLength <- NULL + } else { + pageLength <- lengthMenu[1] + if (nrow(x) <= pageLength) dom <- gsub("[lip]", "", dom) # autoHideNavigation + } + columnDefs <- list(list(className = 'dt-right', targets = "_all")) + tab <- DT::datatable(x, rownames = row.names, + class = 'compact row-border hover', + autoHideNavigation = TRUE, + options = list( + searchHighlight = search, + scrollX = TRUE, + pageLength=pageLength, + lengthMenu = lengthMenu, + dom = dom, + columnDefs = columnDefs)) %>% DT::formatStyle(colnames(x), fontSize='85%') + for (col in colorbar) { + if(!(col %in% colnames(x))) + stop("Column ", col, " not found in table: ", paste0(colnames(x), collapse=", ")) + if (all(is.na(x[, col]))) next # Don't add colorbar if all are NA. + tab <- DT::formatStyle(tab, columns = col, + background = DT::styleColorBar(range(x[, col], na.rm=TRUE), 'lightblue'), + backgroundSize = '98% 88%', + backgroundRepeat = 'no-repeat', + backgroundPosition = 'center') + } + for (s in style) { + tab <- do.call(DT::formatStyle, c(table=tab, s)) + } + tab +} + +niterations <- length(irace_results$allElites) +ninstances <- nrow(irace_results$experiments) +byiterations <- summarise_by_iteration(irace_results) +byinstance <- summarise_by_instance(irace_results) +best_elites <- as.character(irace_results$allElites[[length(irace_results$allElites)]]) +``` + +# Scenario +
+Click to show +```{r, echo=TRUE,eval=FALSE,code=capture.output(irace::printScenario(irace_results$scenario))} +``` +
+ + +# Parameters +
+Click to show +
+```{r printParameters, echo=FALSE, results='asis'}
+irace::printParameters(irace_results$parameters)
+```
+
+
+ + +# General information + +- irace version: `r {irace_results$irace.version}` +- Iterations: `r {niterations}` +- Configurations: `r {nrow(irace_results$allConfigurations)}` +- Instances: `r {ninstances}` +- Experiments: `r {nrow(irace_results$experimentLog)}` +- Elite configurations: `r {length(irace_results$allElites[[niterations]])}` +- Soft restarts: `r {if(any(irace_results$softRestart)) paste0(sum(irace_results$softRestart), " / ", length(irace_results$softRestart)) else 0L}` +- Rejected configurations: `r {length(irace_results$state$rejectedIDs)}` +- Running time (seconds): `r {if(is.null(irace_results$state$elapsed)) "Unknown" else paste0(round(irace_results$state$elapsed["user"],0), " (CPU-user); ", round(irace_results$state$elapsed["system"],0), " (CPU-sys); ", round(irace_results$state$elapsed["wallclock"],0), " (Wall-clock)")}` +- Termination reason: `r {if(is.null(irace_results$state$completed)) "Missing" else irace_results$state$completed}` + +## By iteration + +```{r table-by-iterations,fig.align="center", echo=FALSE} +show_table(byiterations, lengthMenu = c(20, 50, 100), + colorbar = c("configurations", "instances", "experiments", "elites")) +``` + +## By instance + +```{r table-by-instance,fig.align="center", echo=FALSE} +show_table(byinstance, lengthMenu = c(20, 50, 100), search = TRUE, + colorbar = c("experiments", "mean", "sd", "median", "min", "max")) +#columnDefs = list(list(targets = -1, searchable = FALSE)) +``` + + + +# Elite configurations + +The final best configurations found by irace are: + +```{r table-elite,fig.align="center", echo=FALSE} +show_table(irace_results$allConfigurations[best_elites, , drop=FALSE], + lengthMenu = c(10,20)) +``` + +## Parallel coordinates visualization (only elites) + +```{r parcoords-elites,fig.align="center", out.width="100%", echo=FALSE} +parallel_coord(irace_results) +``` + +# Sampling model + +The frequency of the parameter values sampled by irace: + +```{r fig-sampling-model,fig.align="center", 
out.width="100%", echo=FALSE} +sampling_frequency(irace_results) +``` + + +# Testing performance + +- Number of elites tested: `r {irace_results$scenario$testNbElites}` +- Iteration elites tested: `r {irace_results$scenario$testIterationElites}` + +## Performance of the elite configurations on the test instances + +```{r,fig.align="center", echo=FALSE} +if (has_testing_data(irace_results)) { +# FIXME: Move this table generation to a function inside the package. +tested_elites <- colnames(irace_results$testing$experiments) +best_results <- as_tibble(irace_results$testing$experiments[, tested_elites, drop=FALSE]) +best_results <- best_results %>% pivot_longer(all_of(tested_elites), names_to="ID") %>% group_by(ID) %>% summarise(n_instances = sum(!is.na(value)), mean = mean(value), sd = sd(value), median = median(value), min = min(value), max = max(value)) +best_results <- best_results[match(tested_elites, best_results$ID), , drop=FALSE] +show_table(best_results, lengthMenu = c(5, 10), + colorbar = c("mean", "sd", "median", "min", "max")) +} else { + cat("No test instances given.\n") +} +``` + +## Final elite configurations on the test instances + +```{r boxplot-test-elites-rpd,fig.align="center", out.width="100%", echo=FALSE} +if (has_testing_data(irace_results)) { + boxplot_test(irace_results, type = "best", interactive = interactive_plots) +} else { + cat("No test instances given.\n") +} +``` +```{r boxplot-test-elites-raw,fig.align="center", out.width="100%", echo=FALSE} +if (has_testing_data(irace_results)) + boxplot_test(irace_results, type = "best", rpd = FALSE, interactive = interactive_plots) +``` + +## Iteration elite configurations on the test instances + +```{r boxplot-test-iteration-elites-rpd,fig.align="center", out.width="100%", echo=FALSE} +if (irace_results$scenario$testIterationElites && has_testing_data(irace_results)) { + boxplot_test(irace_results, type = "all", show_points=FALSE, interactive = interactive_plots) +} else{ + cat("Iteration elites 
were not tested.\n") +} +``` + +# Training performance + +## Performance of the final elite configurations on the training instances + +```{r table-train-elites,fig.align="center", echo=FALSE} +best_results <- as_tibble(irace_results$experiments[, best_elites, drop=FALSE]) +best_results <- best_results %>% pivot_longer(all_of(best_elites), names_to="ID") %>% group_by(ID) %>% summarise(n_instances = sum(!is.na(value)), mean = mean(value), sd = sd(value), median = median(value), min = min(value), max = max(value)) +best_results <- best_results[match(best_elites, best_results$ID), , drop=FALSE] +show_table(best_results, lengthMenu = c(10, 20, 50, 100), + colorbar = c("mean", "sd", "median", "min", "max")) +``` + +## Final elite configurations on the training instances + + +```{r boxplot-train-elite-rpd,fig.align="center", out.width="100%", echo=FALSE} +boxplot_training(irace_results, interactive = interactive_plots) +``` +```{r boxplot-train-elite-raw,fig.align="center", out.width="100%", echo=FALSE} +boxplot_training(irace_results, rpd = FALSE, interactive = interactive_plots) +``` + +```{asis, echo=irace_results$scenario$testIterationElites} +## Iteration elite configurations on the training instances +``` +```{r boxplot-train-iteration-elites-rpd,fig.align="center", out.width="100%", echo=FALSE} +# FIXME: This should boxplot_training(irace_results, type="ibest") +boxplot_performance(experiments = irace_results$experiments, + allElites = lapply(irace_results$allElites, utils::head, irace_results$scenario$testNbElites), + type = "ibest", interactive = interactive_plots) +``` +```{r boxplot-train-iteration-elites-raw,fig.align="center", out.width="100%", echo=FALSE} +boxplot_performance(experiments = irace_results$experiments, + allElites = lapply(irace_results$allElites, utils::head, irace_results$scenario$testNbElites), + type = "ibest", rpd = FALSE, interactive = interactive_plots) +``` + +# Races overview + +```{r experiments-matrix,fig.align="center", 
out.width="100%", echo=FALSE, eval=isTRUE(sections$experiments_matrix)} +plot_experiments_matrix(irace_results, interactive = interactive_plots) +``` + +```{asis, echo=!isTRUE(sections$experiments_matrix)} +Disabled because `sections$experiments_matrix` is FALSE. +``` + diff --git a/man/boxplot_performance.Rd b/man/boxplot_performance.Rd new file mode 100644 index 0000000..3ee4508 --- /dev/null +++ b/man/boxplot_performance.Rd @@ -0,0 +1,83 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/boxplot_performance.R +\name{boxplot_performance} +\alias{boxplot_performance} +\title{Box Plot of the performance of a set of configurations} +\usage{ +boxplot_performance( + experiments, + allElites = NULL, + type = c("all", "ibest"), + first_is_best = TRUE, + rpd = TRUE, + show_points = TRUE, + best_color = "#08bfaa", + x_lab = "Configurations", + boxplot = FALSE, + filename = NULL, + interactive = base::interactive() +) +} +\arguments{ +\item{experiments}{Experiment matrix obtained from irace training or testing data. Configurations +in columns and instances in rows. As in irace, column names (configurations ids) +should be characters.} + +\item{allElites}{List or vector of configuration ids, (default NULL). These configurations +should be included in the plot. If the argument is not provided all configurations +in experiments are included. If allElites is a vector all configurations are +assumed without iteration unless argument \code{type="ibest"} is provided, in which case +each configuration is assumed to be from a different iteration. If \code{allElites} +is a list, each element of the list is assumed as an iteration.} + +\item{type}{String, (default "all") possible values are "all" or "ibest". "all" +shows all the selected configurations showing iterations if the information +is provided. 
"ibest" shows the elite configurations of each iteration, note +that the best configuration is always assumed to be first in the vector of +each iteration.} + +\item{first_is_best}{Boolean (default TRUE) Enables the display in a different color the best configuration +identified as the first one in a vector. If FALSE, all configurations are shown +in the same color.} + +\item{rpd}{(\code{logical(1)}) TRUE to plot performance as the relative +percentage deviation to best results per instance, FALSE to plot raw +performance.} + +\item{show_points}{Logical, (default TRUE) TRUE to plot performance points together with the box plot.} + +\item{best_color}{String, (default \code{"#08bfaa"}) color to display best configurations.} + +\item{x_lab}{String, (default \code{"Configurations"}) label for the x axis.} + +\item{boxplot}{By default, display a violin plot (\code{\link[ggplot2:geom_violin]{ggplot2::geom_violin()}}). +If \code{TRUE}, show a classical boxplot.} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} + +\item{interactive}{(\code{logical(1)}) TRUE if the report may use interactive features (using \code{\link[plotly:ggplotly]{plotly::ggplotly()}}, \code{\link[plotly:plot_ly]{plotly::plot_ly()}} and \code{\link[DT:dataTableOutput]{DT::renderDataTable()}}) or FALSE if such features must be disabled. Defaults to the value returned by \code{interactive()},} +} +\value{ +\code{\link[ggplot2:ggplot]{ggplot2::ggplot()}} boxplot object +} +\description{ +Creates a box plot that displays the performance of a set of configurations +which can be displayed by iteration. +} +\details{ +The performance data is obtained from the experiment matrix provided in the +experiments argument. The configurations can be selected using the allElites +argument and this argument can be also used to define the iteration of each +elite configuration was evaluated. 
+} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +boxplot_performance(iraceResults$experiments, iraceResults$allElites) +\donttest{ +boxplot_performance(iraceResults$testing$experiments, iraceResults$iterationElites) +} +} +\seealso{ +\code{\link[=boxplot_test]{boxplot_test()}} \code{\link[=boxplot_training]{boxplot_training()}} +} diff --git a/man/boxplot_test.Rd b/man/boxplot_test.Rd new file mode 100644 index 0000000..20329e9 --- /dev/null +++ b/man/boxplot_test.Rd @@ -0,0 +1,34 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/boxplot_test.R +\name{boxplot_test} +\alias{boxplot_test} +\title{Box Plot Testing Performance} +\usage{ +boxplot_test(irace_results, type = c("all", "ibest", "best"), ...) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{type}{String, (default \code{"all"}) possible values are \code{"all"}, "ibest" or "best". "all" shows all the configurations included in the test, "best" shows the elite configurations of the last iteration and "ibest" shows the elite configurations of each iteration (requires that irace includes the iteration elites in the testing).} + +\item{...}{Other arguments passed to \code{\link[=boxplot_performance]{boxplot_performance()}}.} +} +\value{ +\code{\link[ggplot2:ggplot]{ggplot2::ggplot()}} boxplot object +} +\description{ +Creates a box plot that displays the performance of a set of configurations on the test instances. +} +\details{ +The performance data is obtained from the test evaluations performed +by irace. Note that the testing is not a default feature in irace and should +be enabled in the setup (see the irace package user guide for more details). 
+} +\examples{ +iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", + "guide-example.Rdata", mustWork = TRUE)) +boxplot_test(iraceResults) +} +\seealso{ +\code{\link[=boxplot_training]{boxplot_training()}} \code{\link[=boxplot_performance]{boxplot_performance()}} +} diff --git a/man/boxplot_training.Rd b/man/boxplot_training.Rd new file mode 100644 index 0000000..2b7914e --- /dev/null +++ b/man/boxplot_training.Rd @@ -0,0 +1,54 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/boxplot_training.R +\name{boxplot_training} +\alias{boxplot_training} +\title{Box Plot Training} +\usage{ +boxplot_training( + irace_results, + iteration = NULL, + id_configurations = NULL, + ... +) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{iteration}{Numeric, iteration number that should be included in the plot (example: \code{iteration = 5}). +When neither iteration nor id_configurations is provided, the iteration is assumed to be +the last one performed by irace. + +The performance data is obtained from the evaluations performed by irace +during the execution process. This implies that the number of evaluations +can differ between configurations due to the elimination process applied by +irace. Consequently, this plot does not provide a complete comparison of +two configurations; for a fair comparison, use the test data plot.} + +\item{id_configurations}{Numeric vector, configuration ids whose performance should be included in the plot.
+If no ids are provided, the configurations ids are set as the elite configuration ids +of the selected iteration (last iteration by default) +(example: \code{id_configurations = c(20,50,100,300,500,600,700)}).} + +\item{...}{Other arguments passed to \code{\link[=boxplot_performance]{boxplot_performance()}}.} +} +\value{ +\code{\link[ggplot2:ggplot]{ggplot2::ggplot()}} boxplot object +} +\description{ +Creates a box plot that displays the performance of a set of configurations +on the training instances. Performance data is obtained from the evaluations +performed by irace during the execution process. This implies that the +number of evaluations can differ between configurations. +} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +boxplot_training(iraceResults) +\donttest{ +boxplot_training(iraceResults, iteration = 5) +boxplot_training(iraceResults, id_configurations = c(23,28,29)) +} +} +\seealso{ +\code{\link[=boxplot_test]{boxplot_test()}} \code{\link[=boxplot_performance]{boxplot_performance()}} +} diff --git a/man/configurations_display.Rd b/man/configurations_display.Rd new file mode 100644 index 0000000..738102a --- /dev/null +++ b/man/configurations_display.Rd @@ -0,0 +1,37 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/configurations_display.R +\name{configurations_display} +\alias{configurations_display} +\title{The configurations by iteration and instance} +\usage{ +configurations_display( + irace_results, + rpd = TRUE, + filename = NULL, + interactive = base::interactive() +) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{rpd}{(\code{logical(1)}) TRUE to plot performance as the relative +percentage deviation to best results per instance, FALSE to plot raw +performance.} + +\item{filename}{(\code{character(1)}) File name to save the 
plot, for example \code{"~/path/example/filename.png"}.} + +\item{interactive}{(\code{logical(1)}) TRUE if the report may use interactive features (using \code{\link[plotly:ggplotly]{plotly::ggplotly()}}, \code{\link[plotly:plot_ly]{plotly::plot_ly()}} and \code{\link[DT:dataTableOutput]{DT::renderDataTable()}}) or FALSE if such features must be disabled. Defaults to the value returned by \code{interactive()},} +} +\value{ +\code{\link[ggplot2:ggplot]{ggplot2::ggplot()}} object +} +\description{ +A graph is created with all the settings and instance of the training data +} +\examples{ +\donttest{ +iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", + "guide-example.Rdata", mustWork = TRUE)) +configurations_display(iraceResults) +} +} diff --git a/man/distance_config.Rd b/man/distance_config.Rd new file mode 100644 index 0000000..51ad326 --- /dev/null +++ b/man/distance_config.Rd @@ -0,0 +1,26 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/distance_config.R +\name{distance_config} +\alias{distance_config} +\title{Distance between configurations} +\usage{ +distance_config(irace_results, id_configuration, t = 0.05) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{id_configuration}{Numeric, configuration id which should be compared to others +(example: id_configuration = c(806,809))} + +\item{t}{Numeric, (default 0.05) threshold that defines the distance (percentage of the domain size) +to consider a parameter value equal to other.} +} +\value{ +numeric +} +\description{ +Calculate the difference between a configuration and the others in the irace data. 
+} +\examples{ +NULL +} diff --git a/man/has_testing_data.Rd b/man/has_testing_data.Rd new file mode 100644 index 0000000..88a0631 --- /dev/null +++ b/man/has_testing_data.Rd @@ -0,0 +1,17 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/common.R +\name{has_testing_data} +\alias{has_testing_data} +\title{Check if the results object generated by irace has data about the testing phase.} +\usage{ +has_testing_data(irace_results) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} +} +\value{ +\code{logical(1)} +} +\description{ +Check if the results object generated by irace has data about the testing phase. +} diff --git a/man/iraceplot-package.Rd b/man/iraceplot-package.Rd new file mode 100644 index 0000000..42a2c6a --- /dev/null +++ b/man/iraceplot-package.Rd @@ -0,0 +1,67 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/iraceplot-package.R +\docType{package} +\name{iraceplot-package} +\alias{iraceplot} +\alias{iraceplot-package} +\title{iraceplot: Plots for Visualizing the Data Produced by the 'irace' Package} +\description{ +Graphical visualization tools for analyzing the data produced by 'irace'. The 'iraceplot' package enables users to analyze the performance and the parameter space data sampled by the configuration during the search process. It provides a set of functions that generate different plots to visualize the configurations sampled during the execution of 'irace' and their performance. The functions just require the log file generated by 'irace' and, in some cases, they can be used with user-provided data. + +Graphical Visualization Tools for Analysing the Data Produced by irace. 
+ +boxplot_performance; +boxplot_test; +boxplot_training; +parallel_cat; +parallel_coord2; +parallel_coord; +plot_experiments_matrix; +plot_model; +report; +sampling_distance; +sampling_frequency; +sampling_frequency_iteration; +sampling_heatmap2; +sampling_heatmap; +sampling_pie; +scatter_performance; +scatter_test; +scatter_training; + +If you need information about any function you can write: +?name_function + +If you need more information, go to the following page: +https://auto-optimization.github.io/iraceplot/ +} +\details{ +License: MIT + file LICENSE +} +\seealso{ +Useful links: +\itemize{ + \item \url{https://auto-optimization.github.io/iraceplot/} + \item \url{https://github.com/auto-optimization/iraceplot/} + \item Report bugs at \url{https://github.com/auto-optimization/iraceplot/issues} +} + +} +\author{ +\strong{Maintainer}: Manuel López-Ibáñez \email{manuel.lopez-ibanez@manchester.ac.uk} (\href{https://orcid.org/0000-0001-9974-1295}{ORCID}) + +Authors: +\itemize{ + \item Pablo Oñate Marín \email{pablo.onate.m@gmail.com} + \item Leslie Pérez Cáceres \email{leslie.perez@pucv.cl} (\href{https://orcid.org/0000-0001-5553-6150}{ORCID}) +} + + +Maintainers: Pablo Oñate Marín and Leslie Pérez Cáceres and Manuel López-Ibañez +\email{leslie.perez@pucv.cl} +} +\keyword{automatic} +\keyword{configuration} +\keyword{internal} +\keyword{package} +\keyword{plot} diff --git a/man/parallel_cat.Rd b/man/parallel_cat.Rd new file mode 100644 index 0000000..0377eaf --- /dev/null +++ b/man/parallel_cat.Rd @@ -0,0 +1,65 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/parallel_cat.R +\name{parallel_cat} +\alias{parallel_cat} +\title{Parallel Coordinates Category} +\usage{ +parallel_cat( + irace_results, + id_configurations = NULL, + param_names = NULL, + iterations = NULL, + by_n_param = NULL, + n_bins = 3, + filename = NULL +) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by 
\code{irace} (or the filename of that file).} + +\item{id_configurations}{Configuration ids to be included in the +plot. Example: \code{c(20,50,100,300,500,600,700)}} + +\item{param_names}{(\code{character()}) Parameters to be included in the plot. Example: +\code{c("algorithm","alpha","rho","q0","rasrank")}.} + +\item{iterations}{Numeric vector, iterations from which configurations should be obtained +(example: iterations = c(1,4,5))} + +\item{by_n_param}{Numeric (optional), maximum number of parameters to be displayed.} + +\item{n_bins}{Numeric (default 3), number of intervals to generate for numerical parameters.} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} +} +\value{ +parallel categories plot +} +\description{ +Parallel categories plot of selected configurations. Numerical parameters +are discretized to at most \code{n_bins} intervals. To visualize configurations +of other iterations, these must be provided by setting the iterations argument; +groups of configurations of different iterations are shown in different +colors. Specific configurations can be selected by providing their ids in the +\code{id_configurations} argument. +} +\details{ +The parameters to be included in the plot can be selected with the +param_names argument. Additionally, the maximum number of parameters to be +displayed in one plot can be set with by_n_param. A list of plots is returned by this function if +several plots are required to display the selected data.
+} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +parallel_cat(iraceResults) +\donttest{ +parallel_cat(iraceResults, by_n_param = 6) +parallel_cat(iraceResults, id_configurations = c(20, 50, 100)) +parallel_cat(iraceResults, param_names = c("algorithm", "alpha", "rho", "q0", "rasrank")) +parallel_cat(iraceResults, iterations = c(1, 4, 6), n_bins=4) +} +} +\seealso{ +\code{\link[=parallel_coord]{parallel_coord()}} \code{\link[=parallel_coord2]{parallel_coord2()}} +} diff --git a/man/parallel_coord.Rd b/man/parallel_coord.Rd new file mode 100644 index 0000000..6689a2c --- /dev/null +++ b/man/parallel_coord.Rd @@ -0,0 +1,76 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/parallel_coord.R +\name{parallel_coord} +\alias{parallel_coord} +\title{Parallel Coordinates Plot} +\usage{ +parallel_coord( + irace_results, + id_configurations = NULL, + param_names = NULL, + iterations = NULL, + only_elite = TRUE, + by_n_param = NULL, + color_by_instances = TRUE, + filename = NULL +) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{id_configurations}{Configuration ids to be included in the +plot. Example: \code{c(20,50,100,300,500,600,700)}} + +\item{param_names}{(\code{character()}) Parameters to be included in the plot. Example: +\code{c("algorithm","alpha","rho","q0","rasrank")}.} + +\item{iterations}{Numeric vector, iteration number that should be included in the plot +(example: iterations = c(1,4,5))} + +\item{only_elite}{logical (default TRUE), only print elite configurations (argument ignored when +id_configurations is provided)} + +\item{by_n_param}{Numeric (optional), maximum number of parameters to be displayed.} + +\item{color_by_instances}{Logical (default TRUE), choose how to color the lines. 
TRUE colors each line by the number +of instances on which the configuration was evaluated. FALSE colors each line by +the iteration number where the configuration was sampled.} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} +} +\value{ +parallel coordinates plot +} +\description{ +Parallel coordinates plot of a set of selected configurations. Each line in +the plot represents a configuration. By default, the final elite +configurations are shown. To visualize configurations of other iterations, +these must be provided by setting the iterations argument; configurations of +different iterations are shown in different colors. Setting the only_elite +argument to FALSE displays all configurations in the selected +iterations; specific configurations can be selected by providing their ids in +the id_configurations argument. +} +\details{ +The parameters to be included in the plot can be selected with the param_names +argument. Additionally, the maximum number of parameters to be displayed in one +plot can be set with by_n_param. A list of plots is returned by this function if several plots are required +to display the selected data. + +To export the plot to a file, it is possible to do so manually using the +functionality provided by plotly in the plot. If a filename is provided, +the orca server is used to export the plots, which requires orca +to be installed (\url{https://github.com/plotly/orca}).
+} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +parallel_coord(iraceResults) +\donttest{ +parallel_coord(iraceResults, by_n_param = 5) +parallel_coord(iraceResults, only_elite = FALSE) +parallel_coord(iraceResults, id_configurations = c(20, 30, 40, 50, 100)) +parallel_coord(iraceResults, param_names = c("algorithm", "alpha", "rho", "q0", "rasrank")) +parallel_coord(iraceResults, iterations = c(1, 4, 6)) +} +} diff --git a/man/parallel_coord2.Rd b/man/parallel_coord2.Rd new file mode 100644 index 0000000..7bafd57 --- /dev/null +++ b/man/parallel_coord2.Rd @@ -0,0 +1,56 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/parallel_coord.R +\name{parallel_coord2} +\alias{parallel_coord2} +\title{Parallel Coordinates Plot (configurations)} +\usage{ +parallel_coord2( + configurations, + parameters, + param_names = parameters$names, + by_n_param = NULL, + filename = NULL +) +} +\arguments{ +\item{configurations}{Data frame, configurations in \code{irace} format +(example: \code{configurations = iraceResults$allConfigurations})} + +\item{parameters}{List, parameter object in irace format +(example: \code{parameters = iraceResults$parameters})} + +\item{param_names}{(\code{character()}) Parameters to be included in the plot. Example: +\code{c("algorithm","alpha","rho","q0","rasrank")}.} + +\item{by_n_param}{Numeric (optional), maximum number of parameters to be displayed} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} +} +\value{ +parallel coordinates plot +} +\description{ +Parallel coordinates plot of a set of provided configurations. Each line in +the plot represents a configuration. The parameters to be included in the +plot can be selected with the param_names argument. Additionally, the +maximum number of parameters to be displayed in one plot. 
A list of plots is +returned by this function if several plots are required to display the +selected data. +} +\details{ +To export the plot to a file, it is possible to do so manually using the +functionality provided by plotly in the plot. If a filename is provided, +the orca server is used to export the plots, which requires orca +to be installed (\url{https://github.com/plotly/orca}). +} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +parallel_coord2(iraceResults$allConfigurations[iraceResults$iterationElites,], + iraceResults$parameters) +parallel_coord2(iraceResults$allConfigurations[iraceResults$iterationElites,], + iraceResults$parameters, + param_names = c("algorithm", "alpha", "rho", "q0", "rasrank")) +parallel_coord2(iraceResults$allConfigurations[iraceResults$iterationElites,], + iraceResults$parameters, by_n_param = 5) +} diff --git a/man/plot_experiments_matrix.Rd b/man/plot_experiments_matrix.Rd new file mode 100644 index 0000000..3a8ff3f --- /dev/null +++ b/man/plot_experiments_matrix.Rd @@ -0,0 +1,44 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/plot_experiments_matrix.R +\name{plot_experiments_matrix} +\alias{plot_experiments_matrix} +\title{Heat Map Plot} +\usage{ +plot_experiments_matrix( + irace_results, + filename = NULL, + metric = c("raw", "rpd", "rank"), + show_conf_ids = FALSE, + interactive = base::interactive() +) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} + +\item{metric}{Cost metric shown in the plot: \code{"raw"} shows the raw +values, \code{"rpd"} shows relative percentage deviation per instance and +\code{"rank"} shows rank per instance.} + +\item{show_conf_ids}{If \code{TRUE}, it
shows the configuration IDs on the x-axis. Usually there are too many configurations, thus the default is \code{FALSE}.} + +\item{interactive}{(\code{logical(1)}) TRUE if the report may use interactive features (using \code{\link[plotly:ggplotly]{plotly::ggplotly()}}, \code{\link[plotly:plot_ly]{plotly::plot_ly()}} and \code{\link[DT:dataTableOutput]{DT::renderDataTable()}}) or FALSE if such features must be disabled. Defaults to the value returned by \code{interactive()}.} +} +\value{ +\code{\link[ggplot2:ggplot]{ggplot2::ggplot()}} object +} +\description{ +Creates a heatmap plot that shows all performance data seen by irace. +Configurations are shown on the x-axis in the order in which they are +created in the configuration process. Instances are shown on the y-axis in +the order in which they were seen during the configuration run. This plot +gives a general idea of the configuration process progression; the number of +evaluations of each configuration shows how long it survived in the +iterated racing procedure.
+} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +plot_experiments_matrix(iraceResults) +} diff --git a/man/plot_model.Rd b/man/plot_model.Rd new file mode 100644 index 0000000..8ec8ebb --- /dev/null +++ b/man/plot_model.Rd @@ -0,0 +1,39 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/plot_model.R +\name{plot_model} +\alias{plot_model} +\title{Plot the sampling models used by irace} +\usage{ +plot_model(irace_results, param_name, filename = NULL) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{param_name}{String, parameter to be included in the plot, e.g., \code{param_name = "algorithm"}} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} +} +\value{ +sampling model plot +} +\description{ +Display the sampling models from which irace generated parameter values for +new configurations during the configuration process. + +For categorical parameters a stacked bar plot is created. This plot shows +the sampling probabilities of the parameter values for the elite +configurations in the iterations of the configuration process. + +For numerical parameters a plot of the sampling distributions is created +for the elite configurations of each iteration. +This plot shows the density function of the truncated normal distributions +associated with each parameter for each elite configuration in each iteration.
+} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +plot_model(iraceResults, param_name="algorithm") +\donttest{ +plot_model(iraceResults, param_name="alpha") +} +} diff --git a/man/reexports.Rd b/man/reexports.Rd new file mode 100644 index 0000000..81f2ecc --- /dev/null +++ b/man/reexports.Rd @@ -0,0 +1,16 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/reexport.R +\docType{import} +\name{reexports} +\alias{reexports} +\alias{read_logfile} +\title{Objects exported from other packages} +\keyword{internal} +\description{ +These objects are imported from other packages. Follow the links +below to see their documentation. + +\describe{ + \item{irace}{\code{\link[irace]{read_logfile}}} +}} + diff --git a/man/report.Rd b/man/report.Rd new file mode 100644 index 0000000..d4b1218 --- /dev/null +++ b/man/report.Rd @@ -0,0 +1,40 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/report.R +\name{report} +\alias{report} +\title{Create HTML Report from irace data} +\usage{ +report( + irace_results, + filename = "report", + sections = list(experiments_matrix = NULL), + interactive = base::interactive() +) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{filename}{(\code{character(1)}) +Filename indicating where to save the report (example: \code{"~/path-to/filename"}).} + +\item{sections}{(\code{list()}) List of sections to enable/disable. This is useful for disabling sections that may cause problems, such as out-of-memory errors. 
\code{NA} means automatically enable/disable a section depending on the memory required.} + +\item{interactive}{(\code{logical(1)}) TRUE if the report may use interactive features (using \code{\link[plotly:ggplotly]{plotly::ggplotly()}}, \code{\link[plotly:plot_ly]{plotly::plot_ly()}} and \code{\link[DT:dataTableOutput]{DT::renderDataTable()}}) or FALSE if such features must be disabled. Defaults to the value returned by \code{interactive()}.} +} +\value{ +filename where the report was created or it opens the report in the default browser (interactive). +} +\description{ +This function creates an HTML report of the most relevant irace data. This +report provides general statistics and plots that show the best +configurations and their performance. Example: \url{https://auto-optimization.github.io/iraceplot/articles/example/report_example.html} +} +\examples{ +\donttest{ + withr::with_tempdir({ + iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) + report(iraceResults, filename = file.path(getwd(), "report")) + }, clean = !base::interactive()) +} +} diff --git a/man/sampling_distance.Rd b/man/sampling_distance.Rd new file mode 100644 index 0000000..84ec5e3 --- /dev/null +++ b/man/sampling_distance.Rd @@ -0,0 +1,46 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/sampling_distance.R +\name{sampling_distance} +\alias{sampling_distance} +\title{Sampling distance Plot} +\usage{ +sampling_distance( + irace_results, + type = c("boxplot", "line", "both"), + t = 0.05, + filename = NULL +) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{type}{String, (default "boxplot") Type of plot to be produced, either "line", "boxplot" +or "both".
The "boxplot" setting shows a boxplot of the mean distance of all +configurations, "line" shows the mean distance of the solution population in each +iteration, "both" shows both plots.} + +\item{t}{Numeric, (default 0.05) percentage factor that will determine a distance to +define equal numerical parameter values. If the numerical parameter values to be +compared are v1 and v2 they are considered equal if \verb{|v1-v2| <= |ub-lb|*t}.} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} +} +\value{ +line or box plot +} +\description{ +The \code{sampling_distance} function creates a plot that displays the mean of the +distance between the configurations that were executed in each iteration. + +For categorical parameters the distance is calculated as the Hamming distance; +for numerical parameters an equality interval is defined by a threshold +specified by argument t and the Hamming distance is calculated using this interval. +} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +sampling_distance(iraceResults) +\donttest{ +sampling_distance(iraceResults, type = "boxplot", t=0.07) +} +} diff --git a/man/sampling_frequency.Rd b/man/sampling_frequency.Rd new file mode 100644 index 0000000..aafc2d6 --- /dev/null +++ b/man/sampling_frequency.Rd @@ -0,0 +1,68 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/sampling_frequency.R +\name{sampling_frequency} +\alias{sampling_frequency} +\title{Parameter Frequency and Density Plot} +\usage{ +sampling_frequency( + configurations, + parameters, + param_names = NULL, + n = NULL, + filename = NULL +) +} +\arguments{ +\item{configurations}{Data frame, configurations in \code{irace} format. Example: \code{iraceResults$allConfigurations}.} + +\item{parameters}{List, parameters object in irace format.
If this argument +is missing, the first parameter is taken as the \code{iraceResults} data +generated when loading the \code{.Rdata} file created by \code{irace} and +\code{configurations=iraceResults$allConfigurations} and \code{parameters = iraceResults$parameters}.} + +\item{param_names}{(\code{character()}) Parameters to be included in the plot. Example: +\code{c("algorithm","alpha","rho","q0","rasrank")}.} + +\item{n}{Numeric, for scenarios with large parameter sets, it selects a subset +of 9 parameters. For example, \code{n=1} selects the first 9 (1 to 9) parameters, n=2 selects +the next 9 (10 to 18) parameters and so on.} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} +} +\value{ +Frequency and/or density plot +} +\description{ +Frequency or density plot that depicts the sampling performed by irace +across the iterations of the configuration process. For categorical +parameters a frequency plot is created, while for numerical parameters a +histogram and density plots are created. The plots are shown in groups of +maximum 9, the parameters included in the plot can be specified by setting +the param_names argument. +} +\note{ +If there are more than 9 parameters, a pdf file extension is +recommended as it allows to create a multi-page document. Otherwise, you +can use the \code{n} argument of the function to generate the plot of a subset +of the parameters. +} +\examples{ +# Either use iraceResults +iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", + "guide-example.Rdata", mustWork = TRUE)) +sampling_frequency(iraceResults) +\donttest{ +sampling_frequency(iraceResults, n = 2) +sampling_frequency(iraceResults, param_names = c("alpha")) +sampling_frequency(iraceResults, param_names = c("algorithm", "alpha", "rho", "q0", "rasrank")) +} +# Or explicitly specify the configurations and parameters. 
+sampling_frequency(iraceResults$allConfigurations, iraceResults$parameters) +\donttest{ +sampling_frequency(iraceResults$allConfigurations, iraceResults$parameters, n = 2) +sampling_frequency(iraceResults$allConfigurations, iraceResults$parameters, + param_names = c("alpha")) +sampling_frequency(iraceResults$allConfigurations, iraceResults$parameters, + param_names = c("algorithm", "alpha", "rho", "q0", "rasrank")) +} +} diff --git a/man/sampling_frequency_iteration.Rd b/man/sampling_frequency_iteration.Rd new file mode 100644 index 0000000..fe6fa54 --- /dev/null +++ b/man/sampling_frequency_iteration.Rd @@ -0,0 +1,40 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/sampling_frequency_iteration.R +\name{sampling_frequency_iteration} +\alias{sampling_frequency_iteration} +\title{Frequency and Density plot based on its iteration} +\usage{ +sampling_frequency_iteration( + irace_results, + param_name, + numerical_type = "both", + filename = NULL +) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{param_name}{String, name of the parameter to be included (example: param_name = "algorithm")} + +\item{numerical_type}{String, (default "both") Indicates the type of plot to be displayed for numerical +parameters. 
"density" shows a density plot, "frequency" shows a frequency plot and +"both" show both frequency and density.} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} +} +\value{ +Frequency and/or density plot +} +\description{ +The function will return a frequency plot used +for categorical data (its values are string, show a bar plot) or +numeric data (show a histogram and density plot) by each iteration +} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +sampling_frequency_iteration(iraceResults, param_name = "alpha") +\donttest{ +sampling_frequency_iteration(iraceResults, param_name = "alpha", numerical_type="density") +} +} diff --git a/man/sampling_heatmap.Rd b/man/sampling_heatmap.Rd new file mode 100644 index 0000000..6f82820 --- /dev/null +++ b/man/sampling_heatmap.Rd @@ -0,0 +1,46 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/sampling_heatmap.R +\name{sampling_heatmap} +\alias{sampling_heatmap} +\title{Sampling heat map plot} +\usage{ +sampling_heatmap( + irace_results, + param_names, + sizes = c(0, 0), + iterations = NULL, + only_elite = TRUE, + filename = NULL +) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{param_names}{(\code{character()}) Parameters to be included in the plot. Example: +\code{c("algorithm","alpha","rho","q0","rasrank")}.} + +\item{sizes}{Numeric vector that indicated the number of intervals to be considered for numerical +parameters. This argument is positional with respect to param_names. By default, +numerical parameters are displayed using 10 intervals. 
+(example sizes = c(0,10))} + +\item{iterations}{Numeric vector, iteration number that should be included in the plot +(example: iterations = c(1,4,5))} + +\item{only_elite}{logical (default TRUE), only print elite configurations.} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} +} +\value{ +sampling heat map plot +} +\description{ +Heatmap that displays the frequency of sampling values of two parameters. +} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +sampling_heatmap(iraceResults, param_names=c("beta", "alpha")) +sampling_heatmap(iraceResults, param_names=c("beta", "alpha"), iterations = c(3,4)) +sampling_heatmap(iraceResults, param_names=c("beta", "alpha"), only_elite = FALSE) +} diff --git a/man/sampling_heatmap2.Rd b/man/sampling_heatmap2.Rd new file mode 100644 index 0000000..5bbbad0 --- /dev/null +++ b/man/sampling_heatmap2.Rd @@ -0,0 +1,43 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/sampling_heatmap.R +\name{sampling_heatmap2} +\alias{sampling_heatmap2} +\title{Sampling heat map plot} +\usage{ +sampling_heatmap2( + configurations, + parameters, + param_names, + sizes = c(0, 0), + filename = NULL +) +} +\arguments{ +\item{configurations}{Data frame, configurations in \code{irace} format +(example: \code{configurations = iraceResults$allConfigurations})} + +\item{parameters}{List, parameter object in irace format +(example: \code{parameters = iraceResults$parameters})} + +\item{param_names}{String vector of size 2, names of the parameters that should be included in the plot +(example: param_names = c("beta","alpha"))} + +\item{sizes}{Numeric vector that indicated the number of intervals to be considered for numerical +parameters. This argument is positional with respect to param_names. By default, +numerical parameters are displayed using 10 intervals. 
+(example sizes = c(0,10))} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} +} +\value{ +sampling heat map plot +} +\description{ +Heatmap that displays the frequency of sampling values of two parameters. +} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +sampling_heatmap2(iraceResults$allConfigurations, iraceResults$parameters, + param_names=c("beta", "alpha")) +} diff --git a/man/sampling_pie.Rd b/man/sampling_pie.Rd new file mode 100644 index 0000000..b9e9f4d --- /dev/null +++ b/man/sampling_pie.Rd @@ -0,0 +1,36 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/sampling_pie.R +\name{sampling_pie} +\alias{sampling_pie} +\title{Sampling pie plot} +\usage{ +sampling_pie(irace_results, param_names = NULL, n_bins = 3, filename = NULL) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{param_names}{String vector, A set of parameters to be included (example: param_names = c("algorithm","dlb"))} + +\item{n_bins}{Numeric (default 3), number of intervals to generate for numerical parameters.} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} +} +\value{ +Sampling pie plot +} +\description{ +This function creates a pie plot of the values sampled of a set of selected +parameters. Numerical parameters are discretized to maximum \code{n_bins} +intervals. The size of the slices are proportional to the number of +configurations that have assigned a parameter value within the rank or the +value assigned to that slice. Parameters can be selected by providing their +names in the \code{param_names} argument. 
+} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +sampling_pie(iraceResults) +\donttest{ +sampling_pie(iraceResults, param_names = c("algorithm", "dlb", "ants")) +} +} diff --git a/man/scatter_performance.Rd b/man/scatter_performance.Rd new file mode 100644 index 0000000..06fe238 --- /dev/null +++ b/man/scatter_performance.Rd @@ -0,0 +1,90 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/scatter_performance.R +\name{scatter_performance} +\alias{scatter_performance} +\alias{scatter_training} +\alias{scatter_test} +\title{Performance Scatter Plot of Two Configurations} +\usage{ +scatter_performance( + experiments, + x_id, + y_id, + rpd = TRUE, + filename = NULL, + interactive = base::interactive(), + instance_names = NULL +) + +scatter_training(irace_results, ...) + +scatter_test(irace_results, ...) +} +\arguments{ +\item{experiments}{Experiment matrix obtained from irace training or testing data. Configurations +in columns and instances in rows. As in irace, column names (configurations ids) +should be characters. Row names will be used as instance names.} + +\item{x_id, y_id}{Configuration IDs for x-axis and y-axis, respectively.} + +\item{rpd}{(\code{logical(1)}) TRUE to plot performance as the relative +percentage deviation to best results per instance, FALSE to plot raw +performance.} + +\item{filename}{(\code{character(1)}) File name to save the plot, for example \code{"~/path/example/filename.png"}.} + +\item{interactive}{(\code{logical(1)}) TRUE if the report may use interactive features (using \code{\link[plotly:ggplotly]{plotly::ggplotly()}}, \code{\link[plotly:plot_ly]{plotly::plot_ly()}} and \code{\link[DT:dataTableOutput]{DT::renderDataTable()}}) or FALSE if such features must be disabled. 
Defaults to the value returned by \code{interactive()}.} + +\item{instance_names}{Either a character vector of instance names in the +same order as \code{rownames(experiments)} or a function that takes +\code{rownames(experiments)} as input.} + +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} + +\item{...}{Other arguments passed to \code{\link[=scatter_performance]{scatter_performance()}}.} +} +\value{ +\code{\link[ggplot2:ggplot]{ggplot2::ggplot()}} object +} +\description{ +Create a scatter plot that displays the performance of two configurations on +a provided experiment matrix. Each point in the plot represents an instance +and the color of the points indicates if one configuration is better than +the other. +} +\details{ +The performance matrix is assumed to be provided in the format of the irace +experiment matrix; thus, NA values are allowed. Consequently, the number of +evaluations can differ between configurations due to the elimination process +applied by irace. This plot shows performance data only for instances +in which both configurations are executed. + +\code{\link[=scatter_training]{scatter_training()}} compares the performance of two configurations on the +training instances. The performance data is obtained from the evaluations +performed by irace during the execution process. + +\code{\link[=scatter_test]{scatter_test()}} compares the performance of two configurations on the test +instances. The performance data is obtained from the test evaluations +performed by irace. Note that testing is not enabled by default in irace +and should be enabled in the scenario setup. Moreover, configuration ids +provided in \code{x_id} and \code{y_id} should belong to the elite configuration set +evaluated in the test (see the irace package user guide for more details). 
+} +\examples{ +iraceResults <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +best_id <- iraceResults$iterationElites[length(iraceResults$iterationElites)] +scatter_performance(iraceResults$experiments, x_id = 1, y_id = best_id) +iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", + "guide-example.Rdata", mustWork = TRUE)) +scatter_training(iraceResults, x_id = 806, y_id = 809) +\donttest{ +scatter_training(iraceResults, x_id = 806, y_id = 809, rpd = FALSE) +} +iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", + "guide-example.Rdata", mustWork = TRUE)) +scatter_test(iraceResults, x_id = 92, y_id = 119) +\donttest{ +scatter_test(iraceResults, x_id = 92, y_id = 119, rpd=FALSE) +} +} diff --git a/man/summarise_by_instance.Rd b/man/summarise_by_instance.Rd new file mode 100644 index 0000000..512ecc2 --- /dev/null +++ b/man/summarise_by_instance.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/summarise_by_instance.R +\name{summarise_by_instance} +\alias{summarise_by_instance} +\title{Summarise by instance} +\usage{ +summarise_by_instance(irace_results) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} +} +\value{ +tibble +} +\description{ +Summarise by instance +} +\examples{ +irace_result <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +summarise_by_instance(irace_result) +} diff --git a/man/summarise_by_iteration.Rd b/man/summarise_by_iteration.Rd new file mode 100644 index 0000000..eaba49f --- /dev/null +++ b/man/summarise_by_iteration.Rd @@ -0,0 +1,22 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/summarise_by_iteration.R +\name{summarise_by_iteration} +\alias{summarise_by_iteration} +\title{Summarise by iteration} +\usage{ 
+summarise_by_iteration(irace_results) +} +\arguments{ +\item{irace_results}{The data generated when loading the \code{.Rdata} file created by \code{irace} (or the filename of that file).} +} +\value{ +tibble +} +\description{ +Summarise by iteration +} +\examples{ +irace_result <- read_logfile(system.file(package="irace", "exdata", + "irace-acotsp.Rdata", mustWork = TRUE)) +summarise_by_iteration(irace_result) +} diff --git a/tests/testthat.R b/tests/testthat.R new file mode 100644 index 0000000..9849801 --- /dev/null +++ b/tests/testthat.R @@ -0,0 +1,4 @@ +library(testthat) +library(iraceplot) + +test_check("iraceplot") diff --git a/tests/testthat/bug32.Rdata b/tests/testthat/bug32.Rdata new file mode 100644 index 0000000..46ebc1a Binary files /dev/null and b/tests/testthat/bug32.Rdata differ diff --git a/tests/testthat/setup.R b/tests/testthat/setup.R new file mode 100644 index 0000000..cdb9738 --- /dev/null +++ b/tests/testthat/setup.R @@ -0,0 +1,6 @@ +testthat_old_opts <- options( + warnPartialMatchArgs = TRUE, + warnPartialMatchAttr = TRUE, + warnPartialMatchDollar = TRUE +) +testthat_old_opts <- lapply(testthat_old_opts, function(x) if (is.null(x)) FALSE else x) diff --git a/tests/testthat/teardown.R b/tests/testthat/teardown.R new file mode 100644 index 0000000..5b5b208 --- /dev/null +++ b/tests/testthat/teardown.R @@ -0,0 +1 @@ +options(testthat_old_opts) diff --git a/tests/testthat/test-boxplot_test.R b/tests/testthat/test-boxplot_test.R new file mode 100644 index 0000000..a1bc653 --- /dev/null +++ b/tests/testthat/test-boxplot_test.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(boxplot_test(rpd = NA)) +}) diff --git a/tests/testthat/test-boxplot_training.R b/tests/testthat/test-boxplot_training.R new file mode 100644 index 0000000..a1c6fee --- /dev/null +++ b/tests/testthat/test-boxplot_training.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(boxplot_training(iraceResults, rpd = NA)) +}) diff --git 
a/tests/testthat/test-bug-32.R b/tests/testthat/test-bug-32.R new file mode 100644 index 0000000..24557a0 --- /dev/null +++ b/tests/testthat/test-bug-32.R @@ -0,0 +1,16 @@ + +test_that("bug32", { + ### To regenerate the data + ## library(irace) + ## parameters <- irace:::readParameters(text='p "" r (0,1)') + ## target.runner <- function(experiment, scenario) + ## list(cost = experiment[['configuration']]['p'], call = toString(experiment)) + + ## scenario <- list(targetRunner = target.runner, + ## instances=1:5, + ## maxExperiments = 250, logFile = "bug32.Rdata") + ## scenario <- checkScenario (scenario) + ## confs <- irace(scenario = scenario, parameters = parameters) + summarise_by_instance(read_logfile("bug32.Rdata")) +}) + diff --git a/tests/testthat/test-configurations_display.R b/tests/testthat/test-configurations_display.R new file mode 100644 index 0000000..1205d51 --- /dev/null +++ b/tests/testthat/test-configurations_display.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(configurations_display(iraceResults, rpd = NA)) +}) diff --git a/tests/testthat/test-distance_config.R b/tests/testthat/test-distance_config.R new file mode 100644 index 0000000..6cbb83f --- /dev/null +++ b/tests/testthat/test-distance_config.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(distance_config(iraceResults, idConfigurations = 1, 3, 4)) +}) diff --git a/tests/testthat/test-parallel_cat.R b/tests/testthat/test-parallel_cat.R new file mode 100644 index 0000000..7cfac77 --- /dev/null +++ b/tests/testthat/test-parallel_cat.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(parallel_cat(iraceResults, iterations = c(1,))) +}) diff --git a/tests/testthat/test-parallel_coord.R b/tests/testthat/test-parallel_coord.R new file mode 100644 index 0000000..4a2a890 --- /dev/null +++ b/tests/testthat/test-parallel_coord.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(parallel_coord(iraceResults, 
pdfAllParameters = NULL)) +}) diff --git a/tests/testthat/test-plot_experiments_matrix.R b/tests/testthat/test-plot_experiments_matrix.R new file mode 100644 index 0000000..0dc9c5e --- /dev/null +++ b/tests/testthat/test-plot_experiments_matrix.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(plot_experiments_matrix(iraceResults, fileName = "/")) +}) diff --git a/tests/testthat/test-plot_model.R b/tests/testthat/test-plot_model.R new file mode 100644 index 0000000..45995d0 --- /dev/null +++ b/tests/testthat/test-plot_model.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(plot_model(iraceResults, param_name="test")) +}) diff --git a/tests/testthat/test-report.R b/tests/testthat/test-report.R new file mode 100644 index 0000000..88a937c --- /dev/null +++ b/tests/testthat/test-report.R @@ -0,0 +1,3 @@ +test_that("needs file", { + expect_error(report(iraceFile = NULL)) +}) diff --git a/tests/testthat/test-sampling_distance.R b/tests/testthat/test-sampling_distance.R new file mode 100644 index 0000000..b5db155 --- /dev/null +++ b/tests/testthat/test-sampling_distance.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(sampling_distance()) +}) diff --git a/tests/testthat/test-sampling_frequency.R b/tests/testthat/test-sampling_frequency.R new file mode 100644 index 0000000..b3867c0 --- /dev/null +++ b/tests/testthat/test-sampling_frequency.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(sampling_frequency(param_names = "dlb")) +}) diff --git a/tests/testthat/test-sampling_frequency_iteration.R b/tests/testthat/test-sampling_frequency_iteration.R new file mode 100644 index 0000000..8ce0e15 --- /dev/null +++ b/tests/testthat/test-sampling_frequency_iteration.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(sampling_frequency_iteration(iraceResults, parameter = j)) +}) diff --git a/tests/testthat/test-sampling_heatmap.R b/tests/testthat/test-sampling_heatmap.R new file mode 
100644 index 0000000..707a988 --- /dev/null +++ b/tests/testthat/test-sampling_heatmap.R @@ -0,0 +1,5 @@ +test_that("multiplication works", { + expect_error(sampling_heatmap(iraceResults, param_names = "dlb")) + expect_error(sampling_heatmap(iraceResults, param_names = c("ants","other"))) + expect_error(sampling_heatmap(iraceResults)) +}) diff --git a/tests/testthat/test-sampling_pie.R b/tests/testthat/test-sampling_pie.R new file mode 100644 index 0000000..7116356 --- /dev/null +++ b/tests/testthat/test-sampling_pie.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(sampling_pie(iraceResults, dile)) +}) diff --git a/tests/testthat/test-scatter_test.R b/tests/testthat/test-scatter_test.R new file mode 100644 index 0000000..0bd64db --- /dev/null +++ b/tests/testthat/test-scatter_test.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(scatter_test(iraceResults, lidConfigurations = c(299, 222, 000), )) +}) diff --git a/tests/testthat/test-scatter_training.R b/tests/testthat/test-scatter_training.R new file mode 100644 index 0000000..253163a --- /dev/null +++ b/tests/testthat/test-scatter_training.R @@ -0,0 +1,3 @@ +test_that("multiplication works", { + expect_error(scatter_training(iraceResults, lidConfigurations = c(299, 222, 000), )) +}) diff --git a/vignettes/example/report_example.Rmd b/vignettes/example/report_example.Rmd new file mode 100644 index 0000000..70ae28f --- /dev/null +++ b/vignettes/example/report_example.Rmd @@ -0,0 +1,14 @@ +--- +title: "Example report provided by iraceplot package" +output: + html_document: + toc: true +--- + +```{r, echo=FALSE, results='asis'} +irace_results <- irace::read_logfile(system.file(package="iraceplot", "exdata", "guide-example.Rdata", mustWork = TRUE)) +interactive_plots <- TRUE +res <- knitr::knit_child(system.file("template", "report_html.Rmd", package="iraceplot"), quiet = TRUE) +cat(res, sep = '\n') +``` + diff --git a/vignettes/iraceplot_package.Rmd 
b/vignettes/iraceplot_package.Rmd new file mode 100644 index 0000000..bc7db7c --- /dev/null +++ b/vignettes/iraceplot_package.Rmd @@ -0,0 +1,303 @@ +--- +title: The iraceplot package +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{The iraceplot package} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r echo=FALSE, prompt=FALSE, message=FALSE, warning=FALSE} + +library(iraceplot, quietly = TRUE) +library(plotly) +iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", "guide-example.Rdata", mustWork = TRUE)) +``` + + +## Introduction +The iraceplot package provides a set of functions to create plots to visualize +the configuration data generated by the configuration process implemented in +the irace package. + +The configuration process performed by irace will show at the end of the execution one +or more configurations that are the best performing configurations found. This package +provides a set of functions that allow you to further assess the performance of these +configurations and provides support to obtain insights about the details of the configuration +process. + +For more details about these functions, please check the [user guide](https://auto-optimization.github.io/iraceplot/) +of the package and the documentation of the functions implemented in the package. + + +# Installation +## Install iraceplot + +You can install the package directly from CRAN: +``` +install.packages("iraceplot") +``` + +or you can install the latest development version from GitHub: + +``` +install.packages("devtools") +devtools::install_github("auto-optimization/iraceplot/") +``` + +## How to use + +This is a basic example that shows how to use iraceplot: + +```{r eval=FALSE} +library(iraceplot) +``` + +To use the functions, it is required to have the log file generated by irace (logFile +option in the irace package), commonly saved in the directory in which irace +was executed with the name `irace.Rdata`. 
To load the irace log data, replace the +path of the file with yours: + +```{r eval=FALSE} +iraceResults <- irace::read_logfile("~/path/to/irace.Rdata") +``` + +This file contains the `iraceResults` variable which contains the irace log. For +more details about this variable, please go to the documentation of the irace package. +Now you can use the functions of the package, for example: + +```{r fig.align="center", fig.width=7, eval=FALSE} +boxplot_test(iraceResults, type="best") +``` + +## Executing irace + +To use the methods provided by this package you must have an irace data object. +This object is saved as an Rdata file (irace.Rdata by default) after each irace +execution. + +During the configuration procedure irace evaluates several candidate configurations +(parameter settings) on different training instances, creating an algorithm performance +data set we call the *training data set*. This information is, thus, the data +that irace had access to when configuring the algorithm. + +You can also enable the test evaluation option in irace, in which a set of elite +configurations will be evaluated on a set of test instances after the execution of +irace is finished. Note that this option is not enabled by default and you +must provide the test instances in order to enable it. The performance obtained +in this evaluation is called the *test data set*. This evaluation helps assess +the results of the configuration in a more "real" setup. For example, we can assess +if the configuration process resulted in overtuning or if a type of instance +was underrepresented in the training set. We note that irace allows performing +the test evaluations on the final elite configurations and on the elite configurations +of each iteration. For information about the irace setup we refer you to the irace +package user guide. + +Note: Before executing irace, consider setting the test evaluation option of irace. 
+ +Once irace is executed, you can load the irace log in the R console as previously +shown. + +# Function overview + +## Visualizing configurations + +The iraceplot package provides several functions that display information about +configurations. For visualizing individual configurations the `parallel_coord` function +shows each configuration as a line. + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE} +parallel_coord(iraceResults) +``` + +The `parallel_coord2` function generates a similar parallel coordinates plot when +provided with an arbitrary set of configurations without the irace execution context. +For example, to display all elite configurations: + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE} +all_elite <- iraceResults$allConfigurations[unlist(iraceResults$allElites),] +parallel_coord2(all_elite, iraceResults$parameters) +``` + +A similar display can be obtained using the `parallel_cat` function. For example, to +visualize the configurations of a selected set of parameters: + +```{r fig.align="center", fig.width= 7, fig.height=6, message=FALSE, prompt=FALSE, eval=FALSE} +parallel_cat(irace_results = iraceResults, + param_names=c("algorithm", "localsearch", "dlb", "nnls")) +``` + +The `sampling_pie` function creates a plot that displays the values of all configurations +sampled during the configuration process. The size of each parameter +value in the plot is dependent on the number of configurations having that value in the +configurations. + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE} +sampling_pie(irace_results = iraceResults, param_names=c("algorithm", "localsearch", "alpha", "beta", "rho")) +``` + +Note that for some of the previous plots, numerical parameter domains are discretized +to be shown in the plot. Check the documentation of the functions and the [user guide](https://auto-optimization.github.io/iraceplot/) +to adjust this setting. 
+ +## Visualizing sampled values and frequencies + +The package provides several functions to visualize values sampled during the +configuration procedure and their distributions. These plots can help identify +the areas in the parameter space where irace detected a high performance. + +A general overview of the sampled parameter values can be obtained with the `sampling_frequency` +function which generates frequency and density plots for the sampled values: + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE, eval=FALSE} + sampling_frequency(iraceResults, param_names = c("beta")) +``` + + +If you would like to visualize the distribution of a particular set of configurations, +you can directly pass a set of configurations +and a parameters object in the irace format to the `sampling_frequency` function: +```{r fig.align="center", fig.width= 7, fig.height=7, message=FALSE, prompt=FALSE, eval=FALSE} + sampling_frequency(iraceResults$allConfigurations, iraceResults$parameters, param_names = c("alpha")) +``` + +A detailed plot showing the sampling by iteration can be obtained with the +`sampling_frequency_iteration` function. This plot shows the convergence of the +configuration process reflected in the sampled parameter values. + +```{r fig.align="center", fig.width= 7, fig.height=6, eval=FALSE} +sampling_frequency_iteration(iraceResults, param_name = "beta") +``` + +To visualize the joint sampling frequency of two parameters you can use the `sampling_heatmap` +function. + +```{r fig.align="center", fig.width= 7, fig.height=6, eval=FALSE} +sampling_heatmap(iraceResults, param_names = c("beta","alpha")) +``` + +The configurations can be provided directly to the `sampling_heatmap2` function. 
In both +functions, the `sizes` argument can be used to adjust the number of intervals to be +displayed: + +```{r fig.align="center", fig.width= 7, fig.height=6, eval=FALSE} +sampling_heatmap2(iraceResults$allConfigurations, iraceResults$parameters, + param_names = c("localsearch","nnls"), sizes=c(0,5)) +``` + +For more details of these functions, check the documentation of the functions and the [user guide](https://auto-optimization.github.io/iraceplot/). + +## Visualizing sampling distance + +You may like to have a general overview of the distance of the configurations sampled +across the configuration process. This can allow you to assess the convergence of the +configuration process. Use the `sampling_distance` function to display the mean distance +of the configurations across the iterations of the configuration process: + +```{r fig.align="center", fig.width= 7, eval=FALSE} +sampling_distance(iraceResults, t=0.05) +``` + +Numerical parameter distance can be adjusted with a threshold (`t=0.05`); +check the documentation of the function and the [user guide](https://auto-optimization.github.io/iraceplot/) +for details. + + +## Visualizing test performance + +The test performance of the best final configurations can be visualized using the `boxplot_test` +function. + +```{r fig.align="center", fig.width=7, eval=FALSE} +boxplot_test(iraceResults, type="best") +``` + +Note that the irace execution log must include test data (testing is not enabled by default); check the +irace package [user guide](https://CRAN.R-project.org/package=irace/vignettes/irace-package.pdf) +for details on how to use the test feature in irace. 
+ +To investigate the difference in the performance of two configurations the `scatter_test` function displays +the performance of both configurations paired by instance (each point represents an instance): + +```{r fig.align="center", fig.width=7, eval=FALSE} +scatter_test(iraceResults, x_id = 808, y_id = 809, interactive=TRUE) +``` + + +## Visualizing training performance + +Visualizing training performance might help to obtain insights about the reasoning +that irace followed when searching the parameter space, and thus it can be used +to understand why irace considers certain configurations as high or low performing. + +To visualize the performance of the final elites observed by irace, the `boxplot_training` +function plots the experiments performed on these configurations. Note that this data corresponds +to the performance generated during the configuration process; thus, the number of instances on +which the configurations were evaluated might vary between elite configurations. + +```{r fig.align="center", fig.width=7, eval=FALSE} +boxplot_training(iraceResults) +``` + +To observe the difference in the performance of two configurations you can also generate +a scatter plot using the `scatter_training` function: + +```{r fig.align="center", fig.width=7, eval=FALSE} +scatter_training(iraceResults, x_id = 808, y_id = 809, interactive=TRUE) +``` + +## Visualizing performance (general purpose) +To plot the performance of a selected set of configurations in an experiment matrix, +you can use the `boxplot_performance` function. 
The configurations can be selected +in a vector or a list (allElites): + +```{r fig.align="center", fig.width=7, eval=FALSE} +boxplot_performance(iraceResults$experiments, allElites=list(c(803,808), c(809,800)), first_is_best = TRUE) +``` + +In the same way, you can use the `scatter_performance` function to plot the difference +between configurations: + +```{r fig.align="center", fig.width=7, eval=FALSE} +scatter_performance(iraceResults$experiments, x_id = 83, y_id = 809, interactive=TRUE) +``` + +Note that these functions can be adjusted to display the configurations differently (e.g., to include instances or not). +Check the package [user guide](https://auto-optimization.github.io/iraceplot/) and the +documentation of each function for details. + +## Visualizing the configuration process + +In some cases, it might be interesting to have a general visualization of the configuration process. +This can be obtained with the `plot_experiments_matrix` function: + +```{r fig.align="center", fig.width=7, eval=FALSE} +plot_experiments_matrix(iraceResults, interactive = TRUE) +``` + +The sampling distributions used by irace during the configuration process can be displayed using the +`plot_model` function. For categorical parameters, this function displays the sampling probabilities associated to each parameter value by iteration (x axis top) in each elite configuration model (bars): + +```{r fig.align="center", fig.width=7, eval=FALSE} +plot_model(iraceResults, param_name="algorithm") +``` + +For numerical parameters, this function shows the sampling distributions associated to each parameter. 
+These plots display the density function of the truncated normal distribution associated to the +models of each elite configuration in each instance: + + +```{r fig.align="center", fig.width=7, fig.height=6, message=FALSE, prompt=FALSE, results='hide', eval=FALSE} +plot_model(iraceResults, param_name="alpha") +``` + +# Report + +The `report` function generates an HTML report with a summary of the configuration +process executed by irace. The function will create an HTML file in the path +provided in the `filename` argument, appending the `".html"` extension to it. + + +```{r fig.align="center", eval=FALSE} +report(iraceResults, filename="report") +``` diff --git a/vignettes/user_guide/guide.Rmd b/vignettes/user_guide/guide.Rmd new file mode 100644 index 0000000..1c516c4 --- /dev/null +++ b/vignettes/user_guide/guide.Rmd @@ -0,0 +1,483 @@ +--- +title: "The iraceplot package: user guide" +output: + html_document: + toc: true +--- + +```{r echo=FALSE, prompt=FALSE, message=FALSE, warning=FALSE} + +library(iraceplot, quietly = TRUE) +library(plotly, quietly = TRUE) +iraceResults <- read_logfile(system.file(package="iraceplot", "exdata", "guide-example.Rdata", mustWork = TRUE)) +``` + +## Introduction + + +The iraceplot package provides a set of functions to create plots to visualize +the configuration data generated by the configuration process implemented in +the irace package. + +The configuration process performed by irace will show at the end of the execution one +or more configurations that are the best performing configurations found. This package +provides a set of functions that allow you to further assess the performance of these +configurations and provides support to obtain insights about the details of the configuration +process. 
+ + +# Installation +## Install iraceplot from CRAN +``` +install.packages("iraceplot") +``` + +## Install iraceplot from GitHub + +To install iraceplot from GitHub, you first need to install the devtools package: +``` +install.packages("devtools") +devtools::install_github("auto-optimization/iraceplot/") +``` + +## How to use + +This is a basic example that shows how to use iraceplot: + +```{r eval=FALSE} +library(iraceplot) +``` + +To use the functions, it is required to have the log file generated by irace (logFile +option in the irace package). This file is commonly saved in the directory +in which irace was executed with the name `irace.Rdata`. To load the irace log data, +replace the path of the file with yours: + +```{r eval=FALSE} +iraceResults <- irace::read_logfile("~/path/to/irace.Rdata") +``` + +This file contains the `iraceResults` variable which contains the irace log. For +more details about this variable, please go to the [user guide](https://cran.r-project.org/web/packages/irace/vignettes/irace-package.pdf) of the irace package. + +## Executing irace + +To use the methods provided by this package you must have an irace data object. +This object is saved as an Rdata file (irace.Rdata by default) after each irace +execution. + +During the configuration procedure irace evaluates several candidate configurations +(parameter settings) on different training instances, creating an algorithm performance +data set we call the *training data set*. This information is, thus, the data +that irace had access to when configuring the algorithm. + +You can also enable the test evaluation option in irace, in which a set of elite +configurations will be evaluated on a set of test instances after the execution of +irace is finished. Note that this option is not enabled by default and you +must provide the test instances in order to enable it. The performance obtained +in this evaluation is called the *test data set*. 
This evaluation helps assess +the results of the configuration in a more "real" setup. For example, we can assess +if the configuration process resulted in overtuning or if a type of instance +was underrepresented in the training set. We note that irace allows performing +the test evaluations on the final elite configurations and on the elite configurations +of each iteration. For information about the irace setup we refer you to the irace +package user guide. + +Note: Before executing irace, consider setting the test evaluation option of irace. + +Once irace is executed, you can load the irace log in the R console as previously +shown. + +# Visualizing irace configuration data +In the following, we provide examples of how the functions implemented in this package +can be used to visualize the information generated by irace. + + +## Configurations + +Once irace is executed, the first thing you might want to do is to visualize what the +best configurations look like. You can do this with the `parallel_coord` function: + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE} +parallel_coord(iraceResults) +``` + +By default, the plot shows the final elite configurations (last iteration); each +line in the plot represents one configuration. The plot produced by `parallel_coord` +can help you analyze both the distribution of the parameter values of a set of configurations +and the common associations between these values. By default, the plot colors the +lines using the number of instances evaluated by the configuration. You can use +the `color_by_instances` argument to choose between coloring the lines by the +number of instances executed or the last iteration the configuration was executed on. + +To visualize the configurations considered as elites in each iteration, use the +`iterations` option. You can select one or more iterations. 
For example, to select +all iterations executed by irace: + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE} +parallel_coord(iraceResults, iterations=1:iraceResults$state$nbIterations, + color_by_instances = FALSE) +``` + +You can also visualize all configurations sampled in one or more iterations by disabling the +`only_elite` option. For example, to visualize configurations sampled in iterations 1 to 9: + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE} +parallel_coord(iraceResults, iterations=1:9, only_elite=FALSE) +``` + +If you are looking for something more flexible and you would like to provide +your own set of configurations, you can use the `parallel_coord2` function. +This function generates a parallel coordinates plot (similar to the ones generated +by `parallel_coord`) when provided with an arbitrary set of configurations and a +parameter space object. The configurations must be provided in the format +in which irace handles configurations: a data frame with one parameter in each column. +For information about these formats, please check the irace package [user guide](https://cran.r-project.org/web/packages/irace/vignettes/irace-package.pdf). +As an example, these lines display all elite configurations: + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE} +all_elite <- iraceResults$allConfigurations[unlist(iraceResults$allElites),] +parameters <- iraceResults$parameters +parallel_coord2(all_elite, parameters) +``` + +The `parallel_cat` function displays all configurations sampled in a set +of iterations. The plot groups parameter values in intervals and thus, it can be +useful to visualize tendencies of association between parameter values. As +in the previous functions, you can use the `iterations` argument to select the +iterations from which configurations should be selected. 
For example, to visualize +configurations sampled in iterations 3, 4 and 5: + +```{r fig.align="center", fig.width= 7, fig.height=7, message=FALSE, prompt=FALSE} +parallel_cat(irace_results = iraceResults, iterations = c(3,4,5) ) +``` + +You can also adjust the number of value intervals that are displayed for each parameter +using the `n_bins` parameter, which by default is set to 3. Also, it is possible to +select a subset of configurations using the `id_configurations` argument, providing +a vector of configuration ids. + +In both the `parallel_coord` and `parallel_cat` functions, parameters can be selected +using the `param_names` argument. For example, to select parameters in `parallel_cat`: + +```{r fig.align="center", fig.width= 7, fig.height=6, message=FALSE, prompt=FALSE} +parallel_cat(irace_results = iraceResults, + param_names=c("algorithm", "localsearch", "dlb", "nnls")) +``` + +This setting is useful to visualize the association between parameters that, for example, +seem to interact. For configuration scenarios that define a large number of parameters, +it is impossible to visualize all parameters in one of these plots. If you do not know which +parameters to select, all parameters can be split across different plots using the `by_n_param` +argument, which specifies the maximum number of parameters to include in a plot. The functions +will generate as many plots as needed to cover all parameters included. + +## Sampled values and frequencies + +The `sampling_pie` function creates a plot that displays the values of all configurations +sampled during the configuration process. This plot can be useful to display the tendencies +in the sampling in a simple format. As in the previous `parallel_cat` plot, numerical +parameter domains are discretized to be shown in the plot. The size of each parameter +value in the plot depends on the number of configurations having that value in the +configurations. 
+ +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE} +sampling_pie(irace_results = iraceResults) +``` + +As in previous plots, you can select the parameters to display with the `param_names` +argument and the number of intervals of each parameter with the `n_bins` argument. The +generated plot is interactive: you can click on each parameter to display it +independently. + +In some cases it might be interesting to have a look at the values sampled during the +configuration procedure as a distribution. Such a plot shows the areas in the parameter +space where irace detected high performance. A general overview of the distribution of +sampled parameter values can be obtained with the `sampling_frequency` function, which +generates frequency and density plots for the sampled values: + +```{r fig.align="center", fig.width= 7, fig.height=7, message=FALSE, prompt=FALSE} + sampling_frequency(iraceResults) +``` + +If you would like to visualize the distribution of a particular set of configurations, +you can directly pass a set of configurations +and a parameters object in the irace format to the `sampling_frequency` function: +```{r fig.align="center", fig.width= 7, fig.height=7, message=FALSE, prompt=FALSE, eval=FALSE} + sampling_frequency(iraceResults$allConfigurations, iraceResults$parameters) +``` + +The previous functions display the parameter frequency plots in groups of 9 plots; you can +adjust this setting using the `n` argument. You can select the parameters to be displayed +with the `param_names` argument. You can use these plots to get a general idea of the area +of the parameter space in which the sampling performed by irace was focused. For example: + +```{r fig.align="center", fig.width= 7, message=FALSE, prompt=FALSE} + sampling_frequency(iraceResults, param_names = c("beta")) +``` + +If you would like to see more details, a plot showing the sampling by iteration can be +obtained with the `sampling_frequency_iteration` function. 
This plot shows the convergence +of the configuration process as reflected in the parameter values sampled in each iteration: + +```{r fig.align="center", fig.width= 7, fig.height=6} +sampling_frequency_iteration(iraceResults, param_name = "beta") +``` + +In some cases, you may want to assess how the sampling frequencies of two parameters are +related. You can visualize the joint sampling frequency of two parameters using the +`sampling_heatmap` function. By default, this plot uses only the values of the elite configurations +(out of all configurations); to show all sampled configurations, use +the `only_elite` argument. You must select two parameters using the `param_names` argument, +for example: + + +```{r fig.align="center", fig.width= 7, fig.height=6} +sampling_heatmap(iraceResults, param_names = c("beta","alpha")) +``` + +You can also select the iterations from which configurations will be selected using +the `iterations` argument. The size of the intervals considered for numerical parameters +in the heatmap can also be adjusted; see the following example for details. + +If you would like to display a set of configurations directly provided by you, +use the `sampling_heatmap2` function. In both `sampling_heatmap` and `sampling_heatmap2`, +you can adjust the number of intervals to be displayed for numerical parameters +using the `sizes` argument. For example, we set to 5 the number of intervals displayed +for the second parameter, which is numerical, using `sizes=c(0,5)`. The 0 value in +this argument indicates that the default interval size should be used. In this case, we set +0 for the first parameter, but note that it is not possible to adjust the number of intervals +for categorical or ordered parameter types. 
+ +```{r fig.align="center", fig.width= 7, fig.height=6} +sampling_heatmap2(iraceResults$allConfigurations, iraceResults$parameters, + param_names = c("localsearch","nnls"), sizes=c(0,5)) +``` + +## Sampling distance + +You may want to have a general overview of the distance between the configurations sampled +across the configuration process. This can allow you to assess the convergence of the +configuration process. The mean distance between the sampled configurations can be +visualized using the `sampling_distance` function. This function compares the parameter +values of all configurations and aggregates these comparisons to calculate the overall +distance: + +```{r fig.align="center", fig.width= 7} +sampling_distance(iraceResults, t=0.05) +``` + +Note that for categorical and ordered parameters the comparison is straightforward, but +this is not the case for numerical parameters. The argument `t` defines a percentage +used to define a domain interval to assess equality of numerical parameters. For example, +if the domain of a parameter is `[0,10]` and `t=0.1`, then when comparing any value to `v=2` +we define an interval `s = t * (upper_bound - lower_bound) = 0.1 * (10 - 0) = 1`. Then all +values in the interval `[v-s, v+s] = [1,3]` will be considered equal to `v=2`. + +# Visualizing Performance + +## Test performance (elite configurations) + +When executing irace, you can enable the testing feature that will evaluate elite configurations +on a set of test instances. For more details about how you can use this feature, check the irace package +[user guide](https://cran.r-project.org/web/packages/irace/vignettes/irace-package.pdf). + +The test performance of the evaluated configurations can be visualised using the `boxplot_test` +function. Note that the irace execution log file must include test data (testing is not enabled by default). 
+ +```{r fig.align="center", fig.width=7} +boxplot_test(iraceResults, type="best") +``` + +This plot shows all final elite configurations evaluated on the test instance set; we can compare +the performance of these configurations to select the one that has the best test performance. Note that +the best configuration (identified by irace) is displayed in a different color. By default, the plot +displays the relative percentage deviation of the performance of the configurations; to disable this +and display the raw performance, use the `rpd=FALSE` argument. + +If the irace log file includes the evaluation on the test set of the iteration elite configurations, +it is possible to plot the test performance of elite configurations across the iterations using the +argument `type="all"`. The best elite configuration of each iteration is displayed in a different color: + +```{r fig.align="center", fig.width=7} +boxplot_test(iraceResults, type="all", show_points=FALSE) +``` + +This plot allows you to assess the progress of the configuration process regarding the test set performance, +which is useful when dealing with heterogeneous instance sets. In these cases, good configurations +across the full set can be challenging to find and it is possible that the algorithm could be misled if +instance sets are prone to introduce bias due to instance ordering. + +Note that in this example the elite configuration with id 808 seems to have a slightly better performance +than the configuration identified as the best (id: 809) by irace. It is important to note that the elite +configurations are not statistically different from each other and thus, it is very possible that such a situation is observed +when evaluating test performance, especially when configuring heterogeneous, difficult to balance, instance +sets. + +If you would like further details about the difference in the performance of two configurations, +you can use the `scatter_test` function. 
This function displays the performance of both configurations +paired by instance (each point represents an instance): + +```{r fig.align="center", fig.width=7} +scatter_test(iraceResults, x_id = 808, y_id = 809, interactive=TRUE, instance_names = basename) +``` + +If the plot is created using the argument `interactive=TRUE`, you can visualize the instance name when +placing the cursor over each performance point. This plot can help to identify subsets of instances +in which one configuration clearly outperforms the other. To further understand the difference between these two +configurations, the training data might be explored to verify if such an effect holds for the training set. + + +## Training performance (all configurations) + +During the execution of irace, considerable performance data is obtained in order to +assess the performance of the candidate configurations. This data can be useful to +understand the performance of the configured algorithm and to get insights about how +to improve the configuration process. + +The following functions create plots of the training data in the irace log. Note that this +data is obtained during the search for good configurations. Due to the racing procedure, +some configurations are evaluated more than others (the best configurations are evaluated more +than poorly performing configurations). See the irace package documentation for details. + +Visualizing training performance might help to obtain insights about the reasoning +that irace followed when searching the parameter space, and thus it can be used +to understand why irace considers certain configurations as high or low performing. + +To visualize the performance of the final elites as observed by irace, the `boxplot_training` +function plots the experiments performed on these configurations. 
Note that this data corresponds +to the performance generated during the configuration process; thus, the number of instances on +which the configurations were evaluated might vary between elite configurations. + + +```{r fig.align="center", fig.width=7} +boxplot_training(iraceResults) +``` + +You can select to display the elite configurations of a different iteration by using the +`iteration` argument. In case you would like to visualize non-elite configurations, you can +directly provide the configuration ids using the `id_configurations` argument. This can be very useful to assess the performance of +initial configurations provided for the configuration process, for example, providing +insights about why irace did not select them as final elite configurations: + +```{r fig.align="center", fig.width=7} +iteration_elites = iraceResults$iterationElites +boxplot_training(iraceResults, id_configurations=c(1, iteration_elites)) +``` + +To visualize the difference in the performance of two configurations you can also generate +a scatter plot using the `scatter_training` function: + +```{r fig.align="center", fig.width=7} +scatter_training(iraceResults, x_id = 808, y_id = 809, interactive=TRUE) +``` + +If the plot is created using the argument `interactive=TRUE`, you can visualize the instance name when +placing the cursor over each performance point. + +For both functions `boxplot_training` and `scatter_training`, you can display either the +relative percentage deviation (`rpd=TRUE`) or the raw performance (`rpd=FALSE`). + +You can also plot performance data of configurations that was not necessarily obtained +when executing irace. Check the General purpose performance section for details. + + +## General purpose performance + +You can use the following functions to plot the performance of a selected set +of configurations in an experiment matrix provided directly by you. 
These functions +can also be useful when you would like to compare a configuration that +was not generated in the configuration process. 
Note that such a comparison should be +carefully considered, as execution conditions might differ from the ones when irace was +executed. Keep in mind that irace provides seeds to execute instances when configuring +stochastic algorithms and also might define an execution bound when adaptive capping is +active. Check the irace package [user guide](https://cran.r-project.org/web/packages/irace/vignettes/irace-package.pdf) +for details about this. + +To plot the performance of a selected set of configurations in an experiment matrix, +you can use the `boxplot_performance` function. The configurations can be selected +in a vector (`allElites`): + +```{r fig.align="center", fig.width=7} +boxplot_performance(iraceResults$experiments, allElites=c(800, 803,808,809)) +``` + +The experiment matrix should be provided in the irace format: columns should +have configuration ids as names (character) and row names can be instance +names (optional). + +If the configurations are provided in a list, then the different elements of this list +will be considered as iterations. Note that this matches the `iraceResults$allElites` +variable in the irace log. For example, to plot 4 configurations as if they were assigned +to 4 different iterations: + +```{r fig.align="center", fig.width=7} +boxplot_performance(iraceResults$experiments, allElites=as.list(c(800,803,808,809))) +``` + +Each element of a list can have more than one id (vector). You can place at the start of +each vector the configuration id you want to be identified as the best one and use +`first_is_best = TRUE` to have it displayed in a different color. Adjust the color using +the `best_color` argument. 
+ +```{r fig.align="center", fig.width=7} + +boxplot_performance(iraceResults$experiments, allElites=list(c(803,808), c(809,800)), first_is_best = TRUE) +``` + +If you want to further compare the performance of two configurations, you can use the +`scatter_performance` function to plot the difference between configurations: + +```{r fig.align="center", fig.width=7} +scatter_performance(iraceResults$experiments, x_id = 803, y_id = 809, interactive=TRUE, instance_names = basename) +``` + +If the plot is created using the argument `interactive=TRUE` and the provided +matrix has row names, you can visualize the instance name when placing the cursor +over each performance point; otherwise, an instance ID is displayed. In this +example, we further transform instance names using the function `basename()`. + +# Visualizing the configuration process + +In some cases, it might be interesting to have a general visualization of the +progress of the configuration process. This can be generated with the `plot_experiments_matrix` +function: + +```{r fig.align="center", fig.width=7} +plot_experiments_matrix(iraceResults, interactive = TRUE) +``` + +This plot shows configurations on the x axis and instances on the y axis. Each point +in the plot displays in color the ranking of the configuration on each instance. If +`interactive = TRUE`, you can place your cursor on each point to visualize the configuration +id, instance id, and the rank of the configuration. + +The sampling distributions used by irace during the configuration process can be displayed using the +`plot_model` function. For categorical parameters, this function displays the sampling probabilities +associated with each parameter value by iteration (x axis top) in each elite configuration model (bars): + +```{r fig.align="center", fig.width=7} +plot_model(iraceResults, param_name="algorithm") +``` + +For numerical parameters, this function shows the sampling distributions associated with each parameter. 
+These plots display the density function of the truncated normal distribution associated with the +models of each elite configuration in each instance: + + +```{r fig.align="center", fig.width=7, fig.height=6, message=FALSE, prompt=FALSE, results='hide'} +plot_model(iraceResults, param_name="alpha") +``` + +# Report + +If you want a quick and portable overview of the configuration process, you can use +the `report` function, which generates an HTML report with a summary of the configuration +process executed by irace. The function creates an HTML file at the path +provided in the `filename` argument, appending the `".html"` extension to it. + + +```{r fig.align="center", eval=FALSE} +report(iraceResults, filename="report") +``` +