Firstly, thank you for writing an awesome package. I have been trying to visualize bar plots and histograms to see the distribution of the variables. My dataset has around 80 variables, and some of them have more than 100 levels. I need to lump the insignificant levels into "other" by setting a threshold, let's say 0.05. I also want to show the frequency of each bar in the bar plot, so I tweaked the source code a little to add the frequencies; I include it here. It works in a single plot but not in the create_report function, where the numbers become unreadable. Could you help me create the report I want?
# Code for plot_bar
function (data, with = NULL, maxcat = 50, order_bar = TRUE,
    binary_as_factor = TRUE, title = NULL, ggtheme = theme_gray(),
    theme_config = list(), nrow = 3L, ncol = 3L, parallel = FALSE)
{
    .doTrace({
    }, "on entry")
    {
        frequency <- measure <- variable <- value <- NULL
        if (!is.data.table(data))
            data <- data.table(data)
        split_data <- split_columns(data, binary_as_factor = binary_as_factor)
        if (split_data$num_discrete == 0)
            stop("No discrete features found!")
        discrete <- split_data$discrete
        ind <- .ignoreCat(discrete, maxcat = maxcat)
        if (length(ind)) {
            message(length(ind), " columns ignored with more than ",
                maxcat, " categories.\n", paste0(names(ind),
                    ": ", ind, " categories\n"))
            drop_columns(discrete, names(ind))
            if (length(discrete) == 0)
                stop("Note: All discrete features ignored! Nothing to plot!")
        }
        feature_names <- names(discrete)
        if (is.null(with)) {
            dt <- discrete[, list(frequency = .N), by = feature_names]
        }
        else {
            if (is.factor(data[[with]])) {
                measure_var <- suppressWarnings(as.numeric(levels(data[[with]]))[data[[with]]])
            }
            else if (is.character(data[[with]])) {
                measure_var <- as.numeric(data[[with]])
            }
            else {
                measure_var <- data[[with]]
            }
            if (all(is.na(measure_var)))
                stop("Failed to convert `", with, "` to continuous!")
            if (with %in% names(discrete))
                drop_columns(discrete, with)
            tmp_dt <- data.table(discrete, measure = measure_var)
            dt <- tmp_dt[, list(frequency = sum(measure, na.rm = TRUE)),
                by = feature_names]
        }
        dt2 <- suppressWarnings(melt.data.table(dt, measure.vars = feature_names))
        layout <- .getPageLayout(nrow, ncol, ncol(discrete))
        plot_list <- .lapply(parallel = parallel, X = layout,
            FUN = function(x) {
                if (order_bar) {
                    base_plot <- ggplot(dt2[variable %in% feature_names[x]],
                        aes(x = reorder(value, frequency), y = frequency))
                }
                else {
                    base_plot <- ggplot(dt2[variable %in% feature_names[x]],
                        aes(x = value, y = frequency))
                }
                base_plot + geom_bar(stat = "identity") + geom_text(stat = "identity",
                    position = "identity", aes(label = frequency,
                        color = "red", angle = 90, fontface = "bold",
                        vjust = -0.5)) + coord_flip() + xlab("") +
                    ylab(ifelse(is.null(with), "Frequency", toTitleCase(with)))
            })
        class(plot_list) <- c("multiple", class(plot_list))
        plotDataExplorer(plot_obj = plot_list, page_layout = layout,
            title = title, ggtheme = ggtheme, theme_config = theme_config,
            facet_wrap_args = list(facet = ~variable, nrow = nrow,
                ncol = ncol, scales = "free"))
    }
}
By the way, is there any way to use ColorBrewer palettes to make the plots more appealing? In addition, create_report creates bar plots and histograms after detecting the type of each variable. When I use str(), I can see the variable classes, and binary variables with 0 and 1 are shown as numeric, but create_report treats them as discrete. How can I determine and set the numerical and categorical variables automatically, without explicitly stating each variable?
Thank you,
Mehmet
To group levels, you can use group_category(). Do that first then send it to create_report().
Is this the line you added? aes(label = frequency, color = "red", angle = 90, fontface = "bold", vjust = -0.5)). Here is the code for bar charts in the report. If you overwrite plot_bar, it should just work. Could you make sure your function is named plot_bar? You could also try temporarily assigning it globally, i.e., with <<-.
I fixed some bugs in the latest develop version. Could you update and see if the problem still exists? You might have to manually set the value for binary_as_factor.
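For the first suggestion, a minimal sketch of lumping rare levels in every discrete column before building the report might look like the following. The toy data and the 0.05 threshold are assumptions for illustration. Note that, as far as the package documentation describes it, `threshold` refers to the cumulative share of the least frequent categories rather than a per-level cutoff. A data.table is used here so that `update = TRUE` modifies the copy in place.

```r
library(data.table)
library(DataExplorer)

# Toy data (assumed for illustration): three dominant levels plus many rare ones.
set.seed(1)
dt <- data.table(
  city = sample(c("A", "B", "C", letters), size = 500, replace = TRUE,
                prob = c(0.4, 0.3, 0.2, rep(0.1 / 26, 26))),
  score = rnorm(500)
)

# Work on a copy, because update = TRUE changes a data.table by reference.
grouped <- copy(dt)

# Pick every character or factor column.
is_discrete <- vapply(grouped, function(x) is.character(x) || is.factor(x), logical(1))

# Lump the least frequent categories of each discrete column into "OTHER".
for (col in names(grouped)[is_discrete]) {
  group_category(grouped, feature = col, threshold = 0.05, update = TRUE)
}

# Compare the number of distinct levels before and after grouping.
uniqueN(dt$city)
uniqueN(grouped$city)
```

The grouped data can then be passed on, e.g. `create_report(grouped)`.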
I added the code quoted in the second bullet to the source code, but I am confused about where or how to overwrite plot_bar. Should I replace the code chunk I wrote with what you have suggested?
The split_columns function works fine, except that if a continuous variable contains NULL, the function treats it as a discrete variable.
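To see how each column would be classified before generating a report, split_columns() can be called directly. This is a small sketch on made-up data; it assumes a package version in which split_columns() accepts binary_as_factor, as in the plot_bar source quoted earlier in this thread.

```r
library(DataExplorer)

# Made-up data: a 0/1 flag, a continuous amount, and a text column.
df <- data.frame(
  flag = c(0, 1, 1, 0, 1),
  amount = c(1.5, 2.5, 3.1, 4.8, 0.7),
  group = c("x", "y", "x", "y", "z")
)

# Treat binary columns as discrete.
as_discrete <- split_columns(df, binary_as_factor = TRUE)

# Treat binary numeric columns as continuous.
as_numeric <- split_columns(df, binary_as_factor = FALSE)

# Inspect which columns ended up in each group.
names(as_discrete$discrete)
names(as_numeric$discrete)
names(as_numeric$continuous)
```

Running the same check on a column that contains missing values may help narrow down the misclassification described above.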
If you just run the new function, it should replace the old. You can also try to set it as global to verify, e.g., plot_bar <<- function(...) {...}.
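Independently of how the function is overridden, the label layer itself may be part of the readability problem. In ggplot2, constants placed inside aes() are treated as data, so color = "red" there creates a legend entry instead of setting the text colour. Below is a sketch of the same layer with the constants moved outside aes(), on made-up counts. The text size and offsets are guesses that would need tuning for the small panels in the report.

```r
library(ggplot2)

# Made-up counts standing in for one discrete feature.
counts <- data.frame(
  level = c("a", "b", "c", "d"),
  frequency = c(12L, 30L, 7L, 21L)
)

p <- ggplot(counts, aes(x = reorder(level, frequency), y = frequency)) +
  geom_bar(stat = "identity") +
  # Only the label is mapped from data; colour, face, and size are fixed settings.
  geom_text(aes(label = frequency), colour = "red", fontface = "bold",
            size = 3, hjust = -0.2) +
  # Leave room past the longest bar so its label is not clipped.
  expand_limits(y = max(counts$frequency) * 1.1) +
  coord_flip() + xlab("") + ylab("Frequency")
```

Calling `print(p)` draws the chart; because the bars are flipped, hjust moves each label just past the end of its bar.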
I will look into the split_columns issue.
FYI, I am traveling at the moment, so might be slow to respond.