Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.4.2 #78

Merged
merged 4 commits into from
Dec 9, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 21 additions & 3 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,21 +3,39 @@ sudo: false
cache: packages
dist: trusty

r:
- release
- devel

os:
- linux
- osx

matrix:
exclude:
- os: osx
r: devel

env:
global:
- R_CHECK_ARGS="--no-build-vignettes --no-manual --timings --run-donttest"

notifications:
email: false

#before_install:
# - sudo apt-get install --yes udunits-bin libproj-dev libgeos-dev libgdal-dev libgdal1-dev libudunits2-dev
before_install:
- sudo apt-get install --yes udunits-bin libproj-dev libgeos-dev libgdal-dev libgdal1-dev libudunits2-dev

- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew install llvm &&
export PATH="/usr/local/opt/llvm/bin:$PATH" &&
export LDFLAGS="-L/usr/local/opt/llvm/lib" &&
export CFLAGS="-I/usr/local/opt/llvm/include"; fi

r_packages:
- ggplot2
- covr

r_github_packages: modelOriented/DALEX
#r_github_packages: modelOriented/DALEX

after_success:
- Rscript -e 'library(covr); codecov()'
Expand Down
5 changes: 2 additions & 3 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,13 +1,12 @@
Package: ingredients
Title: Effects and Importances of Model Ingredients
Version: 0.4.1
Version: 0.4.2
Authors@R: c(person("Przemyslaw", "Biecek", email = "przemyslaw.biecek@gmail.com",
role = c("aut", "cre"),
comment = c(ORCID = "0000-0001-8423-1823")),
person("Hubert", "Baniecki", role = "aut",
comment = c(ORCID = "0000-0001-6661-5364")),
person("Adam", "Izdebski", role = "aut"),
person("Katarzyna", "Pekala", role = "aut"))
person("Adam", "Izdebski", role = "aut"))
Description: Collection of tools for assessment of feature importance and feature effects.
Key functions are:
feature_importance() for assessment of global level feature importance,
Expand Down
12 changes: 11 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,13 @@
ingredients 0.4.2
---------------------------------------------------------------
* change `variable_type` and `variables` arguments usage in the
`aggregate_profiles`, `plot.ceteris_paribus` and `plotD3.ceteris_paribus`
* remove `variable_type` argument from `plotD3.aggregated_profiles`
(now the same as in `plot.aggregated_profiles`)
* Kasia Pekala is removed from the contributors as `aspect_importance` is moved to `DALEXtra`
([See v0.3.12 changelog](https://modeloriented.github.io/ingredients/news/index.html#ingredients-0-3-12))
* added Travis-CI for OSX

ingredients 0.4.1
---------------------------------------------------------------
* fixed rounding problem in the describe function ([#76](https://github.com/ModelOriented/ingredients/issues/76))
Expand Down Expand Up @@ -31,7 +41,7 @@ cluster_profiles(), plot() and others, as requested in #15

ingredients 0.3.8
----------------------------------------------------------------
* Natural language description generated with `describe()` function for `ceteris_paribus()`, `feature_importance()` and `aggregated_profiles()` explanations.
* Natural language description generated with `describe()` function for `ceteris_paribus()`, `feature_importance()` and `aggregate_profiles()` explanations.


ingredients 0.3.7
Expand Down
28 changes: 25 additions & 3 deletions R/aggregate_profiles.R
Original file line number Diff line number Diff line change
Expand Up @@ -104,16 +104,38 @@ aggregate_profiles <- function(x, ...,
if (length(all_variables_intersect) == 0) stop(paste0("parameter variables do not overlap with ", paste(all_variables, collapse = ", ")))
all_variables <- all_variables_intersect
}

# only numerical or only factors?
is_numeric <- sapply(all_profiles[, all_variables, drop = FALSE], is.numeric)

if (variable_type == "numerical") {
vnames <- names(which(is_numeric))
if (length(vnames) == 0) stop("There are no numerical variables")
all_profiles$`_x_` <- 0

# there are no numerical variables
if (length(vnames) == 0) {
# change to categorical
variable_type <- "categorical"
all_profiles$`_x_` <- ""
# send message
message("'variable_type' changed to 'categorical' due to lack of numerical variables.")
# take all
vnames <- all_variables
} else if (!is.null(variables) && length(vnames) != length(variables)) {
message("Non-numerical variables (from the 'variables' argument) are rejected.")
}
} else {
vnames <- names(which(!is_numeric))
if (length(vnames) == 0) stop("There are no non-numerical variables")
all_profiles$`_x_` <- ""

# there are variables selected
if (!is.null(variables)) {
# take all
vnames <- all_variables
} else if (length(vnames) == 0) {
# there were no variables selected and there are no categorical variables
stop("There are no non-numerical variables.")
}
}

# select only suitable variables
Expand All @@ -132,7 +154,7 @@ aggregate_profiles <- function(x, ...,
if (variable_type == "categorical") {
all_profiles$`_x_` <- as.character(apply(all_profiles, 1, function(x) x[x["_vname_"]]))
}

if (!is.null(groups) && ! groups %in% colnames(all_profiles)) {
stop("groups parameter is not a name of any column")
}
Expand Down
45 changes: 9 additions & 36 deletions R/plotD3_aggregated_profiles.R
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,6 @@
#' @param color a character. Set line/bar color
#' @param size a numeric. Set width of lines
#' @param alpha a numeric between \code{0} and \code{1}. Opacity of lines
#' @param variable_type a character. If "numerical" then only numerical variables will be plotted.
#' If "categorical" then only categorical variables will be plotted.
#' @param facet_ncol number of columns for the \code{\link[ggplot2]{facet_wrap}}
#' @param scale_plot a logical. If \code{TRUE}, the height of plot scales with window size. By default it's \code{FALSE}
#' @param variables if not \code{NULL} then only \code{variables} will be presented
Expand Down Expand Up @@ -49,25 +47,22 @@
#' pdp_rf_a <- aggregate_profiles(cp_rf, type = "accumulated", variable_type = "numerical")
#' pdp_rf_a$`_label_` <- "RF_accumulated"
#'
#' plotD3(pdp_rf_p, pdp_rf_c, pdp_rf_a, variable_type = "numerical", scale_plot = TRUE)
#' plotD3(pdp_rf_p, pdp_rf_c, pdp_rf_a, scale_plot = TRUE)
#'
#' pdp <- aggregate_profiles(cp_rf, type = "partial", variable_type = "categorical")
#' pdp$`_label_` <- "RF_partial"
#'
#' plotD3(pdp, variables = c("gender","class"), variable_type = "categorical", label_margin = 70)
#' plotD3(pdp, variables = c("gender","class"), label_margin = 70)
#'
#' @export
#' @rdname plotD3_aggregated_profiles
plotD3.aggregated_profiles_explainer <- function(x, ..., size = 2, alpha = 1,
color = "#46bac2",
variable_type = "numerical",
facet_ncol = 2, scale_plot = FALSE,
variables = NULL,
chart_title = "Aggregated Profiles",
label_margin = 60) {

check_variable_type(variable_type)

# if there is more explainers, they should be merged into a single data frame
dfl <- c(list(x), list(...))
aggregated_profiles <- do.call(rbind, dfl)
Expand All @@ -77,35 +72,13 @@ plotD3.aggregated_profiles_explainer <- function(x, ..., size = 2, alpha = 1,
if (!is.null(variables)) {
all_variables <- intersect(all_variables, variables)
if (length(all_variables) == 0) stop(paste0("variables do not overlap with ", paste(all_variables, collapse = ", ")))
}

hl <- split(aggregated_profiles, f = as.character(aggregated_profiles$`_vname_`), drop = FALSE)[all_variables]

# only numerical or only factor?
is_numeric <- unlist(lapply(hl, function(x){
is.numeric(x$`_x_`)
}))

if (variable_type == "numerical") {
vnames <- names(which(is_numeric))

if (length(vnames) == 0) {
# but `variables` are selected, then change to factor
if (length(variables) > 0) {
variable_type <- "categorical"
vnames <- variables
} else {
stop("There are no numerical variables")
}
}
} else {
vnames <- names(which(!is_numeric))
# there are no numerical features
if (length(vnames) == 0) stop("There are no non-numerical variables")
aggregated_profiles <- aggregated_profiles[aggregated_profiles$`_vname_` %in% all_variables, ]
}

is_x_numeric <- is.numeric(aggregated_profiles$`_x_`)

# prepare profiles data
aggregated_profiles <- aggregated_profiles[aggregated_profiles$`_vname_` %in% vnames, ]
aggregated_profiles$`_vname_` <- droplevels(aggregated_profiles$`_vname_`)
rownames(aggregated_profiles) <- NULL

Expand All @@ -118,7 +91,7 @@ plotD3.aggregated_profiles_explainer <- function(x, ..., size = 2, alpha = 1,
min_max_list <- ymean <- label_names <- NULL

# line plot or bar plot?
if (variable_type == "numerical") {
if (is_x_numeric) {
aggregated_profiles_list <- lapply(aggregated_profiles_list, function(x){
ret <- x[, c('_x_', "_yhat_", "_vname_", "_label_")]
colnames(ret) <- c("xhat", "yhat", "vname", "label")
Expand Down Expand Up @@ -150,12 +123,12 @@ plotD3.aggregated_profiles_explainer <- function(x, ..., size = 2, alpha = 1,
ymean <- round(attr(x, "mean_prediction"),3)
}

options <- list(variableNames = as.list(vnames),
n = length(vnames), c = length(list(...)) + 1,
options <- list(variableNames = as.list(all_variables),
n = length(all_variables), c = length(list(...)) + 1,
yMax = ymax + ymargin, yMin = ymin - ymargin,
yMean = ymean, labelNames = label_names,
size = size, alpha = alpha, color = color,
onlyNumerical = variable_type == "numerical",
onlyNumerical = is_x_numeric,
facetNcol = facet_ncol, scalePlot = scale_plot,
chartTitle = chart_title, labelMargin = label_margin)

Expand Down
30 changes: 21 additions & 9 deletions R/plotD3_ceteris_paribus.R
Original file line number Diff line number Diff line change
Expand Up @@ -85,20 +85,32 @@ plotD3.ceteris_paribus_explainer <- function(x, ..., size = 2, alpha = 1,

if (variable_type == "numerical") {
vnames <- names(which(is_numeric))
all_profiles$`_x_` <- 0

# there are no numerical variables
if (length(vnames) == 0) {
# but `variables` are selected, then change to factor
if (length(variables) > 0) {
variable_type <- "categorical"
vnames <- variables
} else {
stop("There are no numerical variables")
}
# change to categorical
variable_type <- "categorical"
all_profiles$`_x_` <- ""
# send message
message("'variable_type' changed to 'categorical' due to lack of numerical variables.")
# take all
vnames <- all_variables
} else if (!is.null(variables) && length(vnames) != length(variables)) {
message("Non-numerical variables (from the 'variables' argument) are rejected.")
}
} else {
vnames <- names(which(!is_numeric))
# there are no numerical features
if (length(vnames) == 0) stop("There are no non-numerical variables")
all_profiles$`_x_` <- ""

# there are variables selected
if (!is.null(variables)) {
# take all
vnames <- all_variables
} else if (length(vnames) == 0) {
# there were no variables selected and there are no categorical variables
stop("There are no non-numerical variables.")
}
}

# prepare clean observations data for tooltips
Expand Down
33 changes: 21 additions & 12 deletions R/plot_ceteris_paribus.R
Original file line number Diff line number Diff line change
Expand Up @@ -102,29 +102,38 @@ plot.ceteris_paribus_explainer <- function(x, ...,
}
# is color a variable or literal?
is_color_a_variable <- color %in% c(all_variables, "_label_", "_vname_", "_ids_")

# only numerical or only factors?
is_numeric <- sapply(all_profiles[, all_variables, drop = FALSE], is.numeric)

if (variable_type == "numerical") {
vnames <- names(which(is_numeric))
all_profiles$`_x_` <- 0

# there are no numerical variables
if (length(vnames) == 0) {
# but `variables` are selected, then change to factor
if (length(variables) > 0) {
variable_type <- "categorical"
vnames <- variables
all_profiles$`_x_` <- ""
} else {
stop("There are no numerical variables")
}
# change to categorical
variable_type <- "categorical"
all_profiles$`_x_` <- ""
# send message
message("'variable_type' changed to 'categorical' due to lack of numerical variables.")
# take all
vnames <- all_variables
} else if (!is.null(variables) && length(vnames) != length(variables)) {
message("Non-numerical variables (from the 'variables' argument) are rejected.")
}

} else {
vnames <- names(which(!is_numeric))
# there are no numerical features
if (length(vnames) == 0) stop("There are no non-numerical variables")

all_profiles$`_x_` <- ""

# there are variables selected
if (!is.null(variables)) {
# take all
vnames <- all_variables
} else if (length(vnames) == 0) {
# there were no variables selected and there are no categorical variables
stop("There are no non-numerical variables.")
}
}

# how to plot profiles
Expand Down
6 changes: 3 additions & 3 deletions docs/404.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 3 additions & 3 deletions docs/articles/index.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading