Permalink
Switch branches/tags
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
786 lines (675 sloc) 28 KB
---
title: "ggcoefstats"
author: "Indrajeet Patil"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
fig_width: 6
fig.align: 'center'
fig.asp: 0.618
dpi: 100
toc: true
warning: FALSE
message: FALSE
vignette: >
%\VignetteIndexEntry{ggcoefstats}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
The function `ggstatsplot::ggcoefstats` generates **dot-and-whisker plots** of
regression models saved in tidy data frames (produced with the `broom` package).
By default, the plot displays `95%` confidence intervals for the regression
coefficients. The function currently supports only those classes of object that
are supported by the `broom` package. For an exhaustive list, see-
<https://broom.tidyverse.org/articles/available-methods.html>
In this vignette, we will see examples of how to use this function. We will try
to cover as many classes of objects as possible. Unfortunately, there is no
single dataset that will be helpful for carrying out all types of regression
analyses and, therefore, we will use various datasets to explore data-specific
hypotheses using regression models.
**Note before**: The following demo uses the pipe operator (`%>%`), so in case
you are not familiar with this operator, here is a good explanation:
<http://r4ds.had.co.nz/pipes.html>
## General structure of the plots
Although the statistical models displayed in the plot may differ based on the
class of models being investigated, there are few aspects of the plot that will
be invariant across models:
- The dot-whisker plot contains a dot representing the **estimate** and their
**confidence intervals** (`95%` is the default). The estimate can either be
effect sizes (for tests that depend on the `F` statistic) or regression
coefficients (for tests with `t` and `z` statistic), etc. The function will, by
default, display a helpful `x`-axis label that should clear up what estimates are
being displayed. The confidence intervals can sometimes be asymmetric if
bootstrapping was used.
- The caption will always contain diagnostic information, if available, about
models that can be useful for model selection: The smaller the Akaike's
Information Criterion (**AIC**) and the Bayesian Information Criterion (**BIC**)
values, the "better" the model is. Additionally, the higher the
**log-likelihood** value the "better" is the model fit.
- The output of this function will be a `ggplot2` object and, thus, it can be
further modified (e.g., change themes, etc.) with `ggplot2` functions.
In the following examples, we will try out a number of regression models and,
additionally, we will also see how we can change different aspects of the plot
itself.
## omnibus ANOVA (`aov`)
For this analysis, let's use the `movies_long` dataset, which provides
information about IMDB ratings, budget, length, MPAA ratings (e.g., R-rated,
NC-17, PG-13, etc.), and genre for a number of movies. Let's say our hypothesis
is that the IMDB ratings for a movie are predicted by a multiplicative effect of
the genre and the MPAA rating it got. Let's carry out an omnibus
ANOVA to see if this is the case.
```{r aov1, warning = FALSE, message = FALSE, fig.height = 6, fig.width = 8}
# loading needed libraries
library(ggstatsplot)
library(ggplot2)
# for reproducibility
set.seed(123)
# looking at the data
dplyr::glimpse(x = ggstatsplot::movies_long)
# to speed up the calculation, let's use only 10% of the data
movies_10 <- dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.1)
# plot
ggstatsplot::ggcoefstats(
x = stats::aov(
formula = rating ~ mpaa * genre,
data = movies_10
),
effsize = "omega", # changing the effect size estimate being displayed
point.color = "red", # changing the point color
point.size = 4, # changing the point size
point.shape = 15, # changing the point shape
package = "dutchmasters", # package from which color paletter is to be taken
palette = "milkmaid", # color palette for labels
title = "omnibus ANOVA" # title for the plot
) +
# further modification with the ggplot2 commands
# note the order in which the labels are entered
ggplot2::scale_y_discrete(labels = c("MPAA", "Genre", "Interaction term")) +
ggplot2::labs(x = "effect size estimate (partial omega-squared)",
y = NULL)
```
As this plot shows, there is no interaction effect between these two factors.
Note that we can also use this function for model selection. You can try out
different models with the code below and see how the AIC, BIC, and
log-likelihood values change. Looking at the model diagnostics, you should be
able to see that the model with only `genre` as the predictor of ratings seems
to perform almost equally well as more complicated additive and multiplicative
models. Although there is certainly some improvement with additive and
multiplicative models, it is by no means convincing enough for us to abandon a
simpler model.
```{r aov2, warning = FALSE, message = FALSE, fig.height = 10, fig.width = 10, eval = FALSE}
library(ggstatsplot)
# to speed up the calculation, let's use only 10% of the data
movies_10 <- dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.1)
# for reproducibility
set.seed(123)
# plot
ggstatsplot::combine_plots(
# model 1
ggstatsplot::ggcoefstats(
x = stats::aov(
formula = rating ~ mpaa,
data = movies_10
),
stats.label.color = "black",
title = "1. Only MPAA ratings"
),
ggstatsplot::ggcoefstats(
x = stats::aov(
formula = rating ~ genre,
data = movies_10
),
stats.label.color = "black",
title = "2. Only genre"
),
ggstatsplot::ggcoefstats(
x = stats::aov(
formula = rating ~ mpaa + genre,
data = movies_10
),
stats.label.color = "black",
title = "3. Additive effect of MPAA and genre"
),
ggstatsplot::ggcoefstats(
x = stats::aov(
formula = rating ~ mpaa * genre,
data = movies_10
),
stats.label.color = "black",
title = "4. Multiplicative effect of MPAA and genre"
),
title.text = "Model selection using ggcoefstats",
labels = c("(a)", "(b)", "(c)", "(d)")
)
```
## omnibus ANOVA (`anova`)
You can also use `car` package to run an omnibus ANOVA:
```{r anova1, warning = FALSE, message = FALSE, fig.height = 6, fig.width = 8}
library(car)
# creating a model
mod <- stats::lm(
formula = conformity ~ fcategory * partner.status,
data = Moore,
contrasts = list(fcategory = contr.sum, partner.status = contr.sum)
)
# plotting estimates
ggstatsplot::ggcoefstats(x = car::Anova(mod, type = "III"))
```
## linear model (`lm`)
Now that we have figured out that the movie `genre` best explains a fair deal of
variation in how good people rate the movie to be on IMDB. Let's run a linear
regression model to see how different types of genres compare with each other-
```{r lm, warning = FALSE, message = FALSE, fig.height = 8, fig.width = 8}
# let's check all the levels for the genre variable
levels(ggstatsplot::movies_long$genre)
# to speed up the calculation, let's use only 10% of the data
movies_10 <- dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.1)
# plot
ggstatsplot::ggcoefstats(
x = stats::lm(
formula = rating ~ genre,
data = movies_10
),
conf.level = 0.99, # changing the confidence levels for confidence intervals
sort = "ascending", # sorting the terms of the model based on estimate values
label.direction = "both", # direction in which to adjust position of labels (both x and y)
ggtheme = ggplot2::theme_gray(), # changing the default theme
stats.label.color = c("#CC79A7", "darkgreen", "#0072B2", "black", "red"),
title = "Movie ratings by their genre",
subtitle = "Source: www.imdb.com"
) +
# further modification with the ggplot2 commands
# note the order in which the labels are entered
ggplot2::scale_y_discrete(labels = c("Comedy", "Romance", "Documentary", "Animation", "Drama")) +
ggplot2::labs(y = "genre (comparison level: Action)") +
ggplot2::theme(axis.title.y = ggplot2::element_text(size = 14, face = "bold"))
```
As can be seen from the regression coefficients, compared to the action movies,
only romantic movies, animated movies, and dramas fare better with the
audiences.
## linear mixed-effects model (`lmer`)
Now let's say we want to see how movie's budget relates to how good the movie is
rated to be on IMDB (e.g., more money, better ratings?). But we have reasons to
believe that the relationship between these two variables might be different for
different genres (e.g., budget might be a good predictor of how good the movie
is rated to be for animations or actions movies as more money can help with
better visual effects and animations, but this may not be true for dramas, so we
don't want to use `stats::lm`. In this case, therefore, we will be running a
linear mixed-effects model (using `lme4::lmer` and *p*-values generated using
the `sjstats::p_values` function) with a random slope for the genre variable.
```{r lmer1, warning = FALSE, message = FALSE, fig.height = 14, fig.width = 7}
library(lme4)
library(ggstatsplot)
# to speed up the calculation, let's use only 10% of the data
movies_10 <- dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.1)
# combining the two different plots
ggstatsplot::combine_plots(
# model 1: simple linear model
ggstatsplot::ggcoefstats(
x = stats::lm(
formula = scale(rating) ~ scale(budget),
data = movies_10
),
title = "linear model",
stats.label.color = "black",
exclude.intercept = FALSE # show the intercept
) +
ggplot2::labs(x = parse(text = "'standardized regression coefficient' ~italic(beta)")),
# model 2: linear mixed-effects model
ggstatsplot::ggcoefstats(
x = lme4::lmer(
formula = scale(rating) ~ scale(budget) + (budget | genre),
data = movies_10,
control = lme4::lmerControl(calc.derivs = FALSE)
),
p.kr = TRUE, # use Kenward-Roger approximation to compute p-values (fast)
title = "linear mixed-effects model",
stats.label.color = "black",
exclude.intercept = FALSE # show the intercept
) +
ggplot2::labs(x = parse(text = "'standardized regression coefficient' ~italic(beta)"),
y = "fixed effects"),
labels = c("(a)", "(b)"),
nrow = 2,
ncol = 1,
title.text = "Relationship between movie budget and its IMDB rating"
)
```
As can be seen from these plots, although there seems to be a really small
correlation between budget and rating in a linear model, this effect is not
significant once we take into account the hierarchical structure of the data.
Note that for mixed-effects models, only the *fixed* effects are shown because
there are no confidence intervals for *random* effects terms. In case, you would
like to see these terms, you can enter the same object you entered as `x`
argument to `ggcoefstats` in `broom::tidy`:
```{r lmer2, warning = FALSE, message = FALSE}
library(ggstatsplot)
# to speed up the calculation, let's use only 10% of the data
movies_10 <- dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.1)
# tidy output
broom::tidy(
x = lme4::lmer(
formula = scale(rating) ~ scale(budget) + (budget | genre),
data = movies_10,
control = lme4::lmerControl(calc.derivs = FALSE)
),
conf.int = TRUE,
conf.level = 0.95
)
```
## non-linear least-squares model (`nls`)
So far we have been assuming a linear relationship between movie budget and
rating. But what if we want to also explore the possibility of a non-linear
relationship? In that case, we can run a non-linear least squares regression.
Note that you need to choose some non-linear function, which will be based on
prior exploratory data analysis (`y ~ k/x + c` implemented here, but you can try
out other non-linear functions, e.g. `Y ~ k * exp(-b*c)`).
```{r nls, warning = FALSE, message = FALSE, fig.height = 6, fig.width = 6}
library(ggstatsplot)
# to speed up the calculation, let's use only 10% of the data
movies_10 <- dplyr::sample_frac(tbl = ggstatsplot::movies_long, size = 0.1)
# plot
ggstatsplot::ggcoefstats(
x = stats::nls(
formula = rating ~ k / budget + c, # try toying around with the form of non-linear function
data = movies_10,
start = list(k = 1, c = 0)
),
title = "Non-linear relationship between budget and rating",
subtitle = "Source: IMDB"
)
```
This analysis shows that there is indeed a possible non-linear association
between rating and budget (non-linear regression term `k` is significant), at
least with the particular non-linear function we used.
## generalized linear model (`glm`)
In all the analyses carried out thus far, the outcome variable (`y` in `y ~ x`)
has been continuous. In case the outcome variable is nominal/categorical/factor,
we can use the **generalized** form of linear model that works even if the
response is a numeric vector or a factor vector, etc.
To explore this model, we will use the Titanic dataset, which tabulates
information on the fate of passengers on the fatal maiden voyage of the ocean
liner *Titanic*, summarized according to economic status (class), sex, age, and
survival. Let's say we want to know what was the strongest predictor of whether
someone survived the Titanic disaster-
```{r glm1, warning = FALSE, message = FALSE, fig.height = 6, fig.width = 6}
library(ggstatsplot)
# having a look at the Titanic dataset
df <- as.data.frame(x = Titanic)
str(df)
# plot
ggstatsplot::ggcoefstats(
x = stats::glm(
formula = Survived ~ Sex + Age,
data = df,
weights = df$Freq, # vector containing weights (no. of observations per row)
family = stats::binomial(link = "logit") # choosing the family
),
exponentiate = TRUE,
ggtheme = ggthemes::theme_economist_white(),
title = "general linear model (glm)",
vline.color = "red",
vline.linetype = "solid",
label.segment.color = "red",
stats.label.size = 3.5,
stats.label.color = c("orangered",
"dodgerblue")
)
```
As can be seen from the regression coefficients, all entered predictors were
significant predictors of the outcome. More specifically, being a female was
associated with higher likelihood of survival (compared to male). On other hand,
being an adult was associated with decreased likelihood of survival (compared to
child).
**Note**: Few things to keep in mind for `glm` models,
- The exact statistic will depend on the family used. Below we will see
a host of different function calls to `glm` with a variety of different
families.
- Some families will have a `t` statistic associated with them,
while others a `z` statistic. The function will figure this out for you.
```{r glm2, warning = FALSE, message = FALSE, fig.height = 13, fig.width = 10}
# creating dataframes to use for regression analyses
library(dplyr)
# dataframe #1
(
df.counts <-
base::data.frame(
treatment = gl(n = 3, k = 3, length = 9),
outcome = gl(n = 3, k = 1, length = 9),
counts = c(18, 17, 15, 20, 10, 20, 25, 13, 12)
) %>%
tibble::as_data_frame(x = .)
)
# dataframe #2
(df.clotting <- data.frame(
u = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18),
lot2 = c(69, 35, 26, 21, 18, 16, 13, 12, 12)
) %>%
tibble::as_data_frame(x = .))
# dataframe #3
x1 <- stats::rnorm(50)
y1 <- stats::rpois(n = 50, lambda = exp(1 + x1))
(df.3 <- data.frame(x = x1, y = y1) %>%
tibble::as_data_frame(x = .))
# dataframe #4
x2 <- stats::rnorm(50)
y2 <- rbinom(n = 50,
size = 1,
prob = stats::plogis(x2))
(df.4 <- data.frame(x = x2, y = y2) %>%
tibble::as_data_frame(x = .))
# combining all plots in a single plot
ggstatsplot::combine_plots(
# Family: Poisson
ggstatsplot::ggcoefstats(
x = stats::glm(
formula = counts ~ outcome + treatment,
data = df.counts,
family = stats::poisson(link = "log")
),
title = "Family: Poisson",
stats.label.color = "black"
),
# Family: Gamma
ggstatsplot::ggcoefstats(
x = stats::glm(
formula = lot1 ~ log(u),
data = df.clotting,
family = stats::Gamma(link = "inverse")
),
title = "Family: Gamma",
stats.label.color = "black"
),
# Family: Quasi
ggstatsplot::ggcoefstats(
x = stats::glm(
formula = y ~ x,
family = quasi(variance = "mu", link = "log"),
data = df.3
),
title = "Family: Quasi",
stats.label.color = "black"
),
# Family: Quasibinomial
ggstatsplot::ggcoefstats(
x = stats::glm(
formula = y ~ x,
family = stats::quasibinomial(link = "logit"),
data = df.4
),
title = "Family: Quasibinomial",
stats.label.color = "black"
),
# Family: Quasipoisson
ggstatsplot::ggcoefstats(
x = stats::glm(
formula = y ~ x,
family = stats::quasipoisson(link = "log"),
data = df.4
),
title = "Family: Quasipoisson",
stats.label.color = "black"
),
# Family: Gaussian
ggstatsplot::ggcoefstats(
x = stats::glm(
formula = Sepal.Length ~ Species,
family = stats::gaussian(link = "identity"),
data = iris
),
title = "Family: Gaussian",
stats.label.color = "black"
),
labels = c("(a)", "(b)", "(c)", "(d)", "(e)", "(f)"),
ncol = 2,
title.text = "Exploring models with different `glm` families",
title.color = "blue"
)
```
## generalized linear mixed-effects model (`glmer`)
In the previous example, we saw how being a female and being a child was
predictive of surviving the Titanic disaster. But in that analysis, we didn't
take into account one important factor: the passenger class in which people were
traveling. Naively, we have reasons to believe that the effects of sex and age
might be dependent on the class (maybe rescuing passengers in the first class
were given priority?). To take into account this hierarchical structure of the
data, we can run generalized linear mixed effects model with a random slope for
class.
```{r glmer, warning = FALSE, message = FALSE, fig.height = 6, fig.width = 6}
# plot
ggstatsplot::ggcoefstats(
x = lme4::glmer(
formula = Survived ~ Sex + Age + (Sex + Age | Class),
# select 20% of the sample to reduce the time of execution
data = dplyr::sample_frac(tbl = ggstatsplot::Titanic_full, size = 0.2),
family = stats::binomial(link = "logit"),
control = lme4::glmerControl(
optimizer = "Nelder_Mead",
calc.derivs = FALSE,
boundary.tol = 1e-7
)
),
exponentiate = TRUE,
stats.label.color = "black"
)
```
As we had expected, once we take into account the differential relationship that
might exist between survival and predictors across different passenger classes,
only the sex factor remain a significant predictor. In other words, being a
female was the strongest predictor of whether someone survived the tragedy that
befell the Titanic.
## cumulative link models (`clm`)
So far we have dealt either with continuous or nominal/factor responses (or
output variables), but sometimes we will encounter **ordinal** data (e.g.,
Likert scale measurement in behavioral sciences). In these cases, ordinal
regression models are more suitable. To study these models, we will use
`intent_morality` dataset included in the `ggstatsplot` package. This dataset
contains moral judgments ("how wrong was this behavior?", "how much punishment
does the agent deserve?"; on a Likert scale of 1-7) by participants about
third-party actors who harmed someone. There are four different conditions
formed out of belief (neutral, negative) and outcome (neutral, negative) for
four different vignettes, each featuring a different type of harm. The question
we are interested in is what explains variation in participants' rating:
information about intentionality, consequences, or their interaction?
```{r clm, warning = FALSE, message = FALSE, fig.height = 6, fig.width = 6}
# for reproducibility
set.seed(123)
# to speed up calculations, we will use just 10% of the dataset
ggstatsplot::ggcoefstats(
x = ordinal::clm(
formula = as.factor(rating) ~ belief * outcome,
link = "logit",
data = dplyr::sample_frac(tbl = ggstatsplot::intent_morality, size = 0.10),
control = ordinal::clm.control(maxIter = 50,
convergence = "silent"),
),
stats.label.color = "black",
title = "Cumulative Link Model (clm)",
subtitle = "(Using `ordinal` package)",
caption.summary = FALSE # suppress model diagnostics
) +
ggplot2::labs(x = "logit regression coefficient",
y = NULL)
```
As can be seen from this plot, both factors (intentionality and consequences)
were significant, and so was their interaction.
## cumulative link mixed models (`clmm`)
In the previous analysis, we carried out a single ordinal regression models to
see the effects intent and outcome information on moral judgments. But what if
we also want to account for item level differences (since different items had
different types of harm)? For this, we can use ordinal mixed-effects regression
model (with random effects for type of harm) to see how intent and outcome
contribute towards variation in moral judgment ratings-
```{r clmm1, warning = FALSE, message = FALSE, fig.height = 6, fig.width = 6}
# for reproducibility
set.seed(123)
# to speed up calculations, we will use just 10% of the dataset
ggstatsplot::ggcoefstats(
x = ordinal::clmm(
formula = as.factor(rating) ~ belief * outcome + (belief + outcome |
harm),
data = dplyr::sample_frac(tbl = ggstatsplot::intent_morality, size = 0.10),
control = ordinal::clmm.control(
method = "nlminb",
maxIter = 50,
gradTol = 1e-4,
innerCtrl = "noWarn"
)
),
title = "Cumulative Link Mixed Model (clmm)",
subtitle = "(Using `ordinal` package)"
) +
ggplot2::labs(x = "coefficient from ordinal mixed-effects regression",
y = "fixed effects")
```
Mixed effects regression didn't reveal any interaction effect. That is, most of
the variance was accounted for by the information about whether there was
harmful intent and whether there was harm, at least this is the effect we found
with these four types of (minor) harms.
Note that, by default, `beta` parameters are shown for `clm` and `clmm` models,
but you can also plot either just `alpha` or `both` using `ggcoefstats`.
```{r clmm2, warning = FALSE, message = FALSE, fig.height = 6, fig.width = 6}
# for reproducibility
set.seed(123)
# to speed up calculations, we will use just 10% of the dataset
ggstatsplot::ggcoefstats(
x = ordinal::clmm(
formula = as.factor(rating) ~ belief * outcome + (belief + outcome |
harm),
link = "logit",
data = dplyr::sample_frac(tbl = ggstatsplot::intent_morality, size = 0.10),
control = ordinal::clmm.control(
maxIter = 50,
gradTol = 1e-4,
innerCtrl = "noWarn"
)
),
coefficient.type = "alpha"
) +
ggplot2::labs(x = "logit regression coefficients",
y = "threshold parameters")
```
## repeated measures ANOVA (`aovlist`)
Let's now consider an example of a repeated measures design where we want to run
omnibus ANOVA with a specific error structure. To carry out this analysis, we
will first have to convert the iris dataset from wide to long format such that
there is one column corresponding to `attribute` (which part of the calyx of a
flower is being measured: `sepal` or `petal`?) and one column corresponding to
`measure` used (`length` or `width`?). Note that this is within-subjects design
since the same flower has both measures for both attributes. The question we are
interested in is how much of the variance in measurements is explained by both
of these factors and their interaction.
```{r aovlist1, warning = FALSE, message = FALSE, fig.height = 6, fig.width = 8}
# for reproducibility
set.seed(123)
# having a look at iris before converting to long format
dplyr::glimpse(iris)
# converting the iris dataset to long format
iris_long <- datasets::iris %>%
dplyr::mutate(.data = ., id = dplyr::row_number(x = Species)) %>%
tidyr::gather(
data = .,
key = "condition",
value = "value",
Sepal.Length:Petal.Width,
convert = TRUE,
factor_key = TRUE
) %>%
tidyr::separate(
col = "condition",
into = c("attribute", "measure"),
sep = "\\.",
convert = TRUE
) %>%
tibble::as_data_frame(x = .)
# looking at the long format data
dplyr::glimpse(x = iris_long)
# let's use 20% of the sample to speed up the analysis
iris_long_20 <- dplyr::sample_frac(tbl = iris_long, size = 0.20)
# specifying the model (note the error structure)
ggstatsplot::ggcoefstats(
x = stats::aov(formula = value ~ attribute * measure + Error(id / (attribute * measure)),
data = iris_long_20),
effsize = "eta",
partial = FALSE,
nboot = 50,
ggtheme = ggthemes::theme_fivethirtyeight(),
ggstatsplot.layer = FALSE,
stats.label.color = c("#0072B2", "#D55E00", "darkgreen"),
title = "Variation in measurements for Iris species",
subtitle = "Source: Iris data set (by Fisher or Anderson)"
) +
ggplot2::labs(caption = "Results from 2 by 2 RM ANOVA") +
ggplot2::theme(plot.subtitle = element_text(size = 11, face = "plain"))
```
As revealed by this analysis, all effects of this model are significant. But
most of the variance is explained by the `attribute`, with the next important
explanatory factor being the `measure` used. A very little amount of variation
in measurement is accounted for by the interaction between these two factors.
## robust regression (`lmRob`, `glmRob`)
The robust regression models, as implemented in robust package are also
supported. But since no 95% CI are available in this case, only the dots for
estimates will be shown.
```{r robust, warning = FALSE, message = FALSE, fig.height = 12, fig.width = 8}
ggstatsplot::combine_plots(
# plot 1: glmRob
ggstatsplot::ggcoefstats(
x = robust::glmRob(
formula = Survived ~ Sex,
data = dplyr::sample_frac(tbl = ggstatsplot::Titanic_full, size = 0.20),
family = stats::binomial(link = "logit")
),
title = "generalized robust linear model",
package = "dichromat",
palette = "BrowntoBlue.10",
ggtheme = ggthemes::theme_fivethirtyeight(),
ggstatsplot.layer = FALSE
),
# plot 2: lmRob
ggstatsplot::ggcoefstats(
x = robust::lmRob(formula = Sepal.Length ~ Sepal.Width * Species,
data = iris),
title = "robust linear model",
package = "awtools",
palette = "a_palette",
ggtheme = ggthemes::theme_tufte(),
ggstatsplot.layer = FALSE
),
# arguments relevant for `combine_plots` function
labels = c("(a)", "(b)"),
title.text = "Robust variants of lm and glm",
nrow = 2,
ncol = 1
)
```
## robust regression using an M estimator (`rlm`)
We have already seen an example of `MASS::rlm()` in the `README` document:
<https://cran.r-project.org/package=ggstatsplot/readme/README.html>
## quantile regression (`rq`)
We have already seen an example of quantile regression (`quantreg::rq()`) in the
`README` document:
<https://cran.r-project.org/package=ggstatsplot/readme/README.html>
## And much more...
This vignette was supposed to give just a taste for only *some* of the
regression models supported by `ggcoefstats`. The full list of supported models
will keep expanding as additional tidiers are added to the `broom` package:
<https://broom.tidyverse.org/articles/available-methods.html>
Note that not **all** models supported by `broom` will be supported by
`ggcoefstats`. In particular, classes of objects for which there is no estimates
present (e.g., `kmeans`) are not supported.
# Suggestions
If you find any bugs or have any suggestions/remarks, please file an issue on
GitHub: <https://github.com/IndrajeetPatil/ggstatsplot/issues>
# Session Information
Summarizing session information for reproducibility.
```{r session_info}
options(width = 200)
devtools::session_info()
```