Skip to content

Commit

Permalink
Updates
Browse files Browse the repository at this point in the history
  • Loading branch information
ChristianGoueguel committed Apr 19, 2024
1 parent 355f4a3 commit 531f2b6
Show file tree
Hide file tree
Showing 13 changed files with 251 additions and 96 deletions.
5 changes: 3 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: ConfidenceEllipse
Type: Package
Title: Computation of 2D and 3D Confidence Ellipse Coordinates
Title: Computation of 2D and 3D Elliptical Joint Confidence Regions
Version: 1.0.0
Authors@R:
person(given = "Christian L.",
Expand All @@ -9,12 +9,13 @@ Authors@R:
email = "christian.goueguel@gmail.com",
comment = c(ORCID = "https://orcid.org/0000-0003-0521-3446"))
Maintainer: Christian L. Goueguel <christian.goueguel@gmail.com>
Description: Functions designed to calculate confidence ellipses and ellipsoids at a specified confidence level.
Description: Computing elliptical joint confidence regions at a specified confidence level. It provides the flexibility to estimate either classical or robust confidence regions, which can be visualized in 2D or 3D plots. The classical approach assumes normality and uses the mean and covariance matrix to define the confidence regions. Alternatively, the robustified version employs estimators like minimum covariance determinant (MCD) and M-estimator, making them less sensitive to outliers and departures from normality. Furthermore, the functions allow users to group the dataset based on categorical variables and estimate separate confidence regions for each group. This capability is particularly useful for exploring potential differences or similarities across subgroups within a dataset. Varmuza and Filzmoser (2009, ISBN:978-1-4200-5947-2). Johnson and Wichern (2007, ISBN:0-13-187715-1). Raymaekers and Rousseeuw (2019) <DOI:10.1080/00401706.2019.1677270>.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.1
Imports:
cellWise,
dplyr,
forcats,
ggplot2,
Expand Down
2 changes: 2 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@
export("%>%")
export(confidence_ellipse)
export(confidence_ellipsoid)
importFrom(cellWise,estLocScale)
importFrom(cellWise,wrap)
importFrom(forcats,as_factor)
importFrom(ggplot2,aes)
importFrom(ggplot2,geom_line)
Expand Down
2 changes: 2 additions & 0 deletions R/ConfidenceEllipse-package.R
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@
"_PACKAGE"

## usethis namespace: start
#' @importFrom cellWise estLocScale
#' @importFrom cellWise wrap
#' @importFrom forcats as_factor
#' @importFrom ggplot2 aes
#' @importFrom ggplot2 geom_line
Expand Down
30 changes: 19 additions & 11 deletions R/confidence_ellipse.R
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
#' @title A Function that Computes the Confidence Ellipse Coordinates for Bivariate Normal Data (with Optional Grouping)
#' @title Confidence Ellipse Coordinates
#' @author Christian L. Goueguel
#' @description This function computes a confidence ellipse (assuming bivariate normality) at a specified confidence level.
#' @param .data The data frame or tibble.
#' @param x The unquoted column name for the x-axis variable.
#' @param y The unquoted column name for the y-axis variable.
#' @param .group_by The unquoted column name for the grouping variable (optional). Note that this grouping variable must be a factor.
#' @param conf_level The confidence level for the ellipse (0.95 by default).
#' @return A data frame of the coordinates points of the ellipse.
#' @description Compute the coordinate points of confidence ellipses at a specified confidence level.
#' @param .data data frame or tibble.
#' @param x column name for the x-axis variable.
#' @param y column name for the y-axis variable.
#' @param .group_by column name for the grouping variable (`NULL` by default). Note that this grouping variable must be a factor.
#' @param conf_level confidence level for the ellipse (0.95 by default).
#' @param robust optional (`FALSE` by default). When set to `TRUE`, it indicates that robust estimation method is employed to calculate the coordinates of the ellipse. The location is the 1-step M-estimator with the biweight psi function. The scale is the Minimum Covariance Determinant (MCD) estimator. Raymaekers and Rousseeuw (2019).
#' @return Data frame of the coordinates points.
#' @export confidence_ellipse
#' @examples
#' # Data
Expand All @@ -20,7 +21,7 @@
#' .group_by = glassType
#' )
#'
confidence_ellipse <- function(.data, x, y, .group_by = NULL, conf_level = 0.95) {
confidence_ellipse <- function(.data, x, y, .group_by = NULL, conf_level = 0.95, robust = FALSE) {
if (missing(.data)) {
stop("Missing 'data' argument.")
}
Expand All @@ -34,8 +35,15 @@ confidence_ellipse <- function(.data, x, y, .group_by = NULL, conf_level = 0.95)
stop("'conf_level' must be between 0 and 1.")
}
transform_data <- function(.x, conf_level) {
mean_vec <- colMeans(.x)
cov_matrix <- stats::cov(.x)
if (robust == FALSE) {
mean_vec <- colMeans(.x)
cov_matrix <- stats::cov(.x)
} else {
locScale <- cellWise::estLocScale(.x, type = "wrap", center = TRUE, nLocScale = 50e3)
X_wrap <- cellWise::wrap(.x, locScale[["loc"]], locScale[["scale"]], imputeNA = FALSE, checkPars = list(coreOnly = TRUE)) %>% purrr::pluck("Xw")
mean_vec <- colMeans(X_wrap)
cov_matrix <- stats::cov(X_wrap)
}
if (any(is.na(cov_matrix))) {
stop("Covariance matrix contains NA values.")
}
Expand Down
31 changes: 20 additions & 11 deletions R/confidence_ellipsoid.R
Original file line number Diff line number Diff line change
@@ -1,12 +1,14 @@
#' @title A Function that Computes the 3D Confidence Ellipsoid Coordinates for Trivariate Normal Data (with Optional Grouping)
#' @title Confidence Ellipsoid Coordinates
#' @author Christian L. Goueguel
#' @param .data The data frame or tibble.
#' @param x The unquoted column name for the x-axis variable.
#' @param y The unquoted column name for the y-axis variable.
#' @param z The unquoted column name for the z-axis variable.
#' @param .group_by The unquoted column name for the grouping variable (optional). Note that this grouping variable must be a factor.
#' @param conf_level The confidence level for the ellipse (0.95 by default).
#' @return A data frame of the coordinates points of the ellipse.
#' @description Compute the coordinate points of confidence ellipsoids at a specified confidence level.
#' @param .data data frame or tibble.
#' @param x column name for the x-axis variable.
#' @param y column name for the y-axis variable.
#' @param z column name for the z-axis variable.
#' @param .group_by column name for the grouping variable (`NULL` by default). Note that this grouping variable must be a factor.
#' @param conf_level confidence level for the ellipsoid (0.95 by default).
#' @param robust optional (`FALSE` by default). When set to `TRUE`, it indicates that robust estimation method is employed to calculate the coordinates of the ellipsoid. The location is the 1-step M-estimator with the biweight psi function. The scale is the Minimum Covariance Determinant (MCD) estimator. Raymaekers and Rousseeuw (2019).
#' @return Data frame of the coordinate points.
#' @export confidence_ellipsoid
#' @examples
#' # Data
Expand All @@ -21,7 +23,7 @@
#' .group_by = glassType
#' )
#'
confidence_ellipsoid <- function(.data, x, y, z, .group_by = NULL, conf_level = 0.95) {
confidence_ellipsoid <- function(.data, x, y, z, .group_by = NULL, conf_level = 0.95, robust = FALSE) {
if (missing(.data)) {
stop("Missing 'data' argument.")
}
Expand All @@ -35,8 +37,15 @@ confidence_ellipsoid <- function(.data, x, y, z, .group_by = NULL, conf_level =
stop("'conf_level' must be between 0 and 1.")
}
transform_data <- function(.x, conf_level) {
mean_vec <- colMeans(.x)
cov_matrix <- stats::cov(.x)
if (robust == FALSE) {
mean_vec <- colMeans(.x)
cov_matrix <- stats::cov(.x)
} else {
locScale <- cellWise::estLocScale(.x, type = "wrap", center = TRUE, nLocScale = 50e3)
X_wrap <- cellWise::wrap(.x, locScale[["loc"]], locScale[["scale"]], imputeNA = FALSE, checkPars = list(coreOnly = TRUE)) %>% purrr::pluck("Xw")
mean_vec <- colMeans(X_wrap)
cov_matrix <- stats::cov(X_wrap)
}
if (any(is.na(cov_matrix))) {
stop("Covariance matrix contains NA values.")
}
Expand Down
100 changes: 77 additions & 23 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,8 @@ knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
out.width = "100%",
fig.retina = 2
)
```

Expand Down Expand Up @@ -45,6 +46,7 @@ You can install the development version of `ConfidenceEllipse` like so:
```{r message=FALSE, warning=FALSE}
library(magrittr)
library(tidyselect)
library(patchwork)
library(dplyr)
library(ggplot2)
library(purrr)
Expand All @@ -71,36 +73,93 @@ data("glass", package = "ConfidenceEllipse")
glass %>% glimpse()
```

### Confidence Ellipse
### Confidence Region
#### Ellipse

First, the `confidence_ellipse` function is used to compute coordinate points of the confidence ellipse and then the ellipse is plotted on a two-dimensional plot `x` and `y` of the data. Points that lie outside the ellipse are considered to be outliers, while points that lie within the ellipse are considered to be part of the underlying distribution with the specified confidence level `conf_level`.

```{r message=FALSE, warning=FALSE}
ellipse_99 <- confidence_ellipse(glass, x = SiO2, y = Na2O, conf_level = 0.99)
ellipse_95 <- confidence_ellipse(glass, x = SiO2, y = Na2O, conf_level = 0.95)
ellipse_90 <- confidence_ellipse(glass, x = SiO2, y = Na2O, conf_level = 0.90)
rob_ellipse_95 <- confidence_ellipse(glass, x = SiO2, y = Na2O, conf_level = 0.95, robust = TRUE)
```

```{r message=FALSE, warning=FALSE}
ellipse_99 %>% glimpse()
ellipse_95 %>% glimpse()
```

```{r}
ggplot() +
geom_path(data = ellipse_99, aes(x = x, y = y), color = "red", linewidth = 1L) +
cutoff <- qchisq(0.95, df = 2)
MDsquared <- glass %>%
select(SiO2, Na2O) %>%
as.matrix() %>%
mahalanobis(colMeans(.), cov(.), inverted = FALSE)
```

```{r}
plot1 <-
ggplot() +
geom_path(data = ellipse_95, aes(x = x, y = y), color = "blue", linewidth = 1L) +
geom_path(data = ellipse_90, aes(x = x, y = y), color = "green", linewidth = 1L) +
geom_point(data = glass, aes(x = SiO2, y = Na2O), color = "black", size = 3L) +
scale_color_brewer(palette = "Set1", direction = 1) +
geom_point(
data = glass %>%
mutate(md = MDsquared) %>%
filter(md <= cutoff),
aes(x = SiO2, y = Na2O),
shape = 21L, color = "black", fill = "lightblue", size = 3L) +
geom_point(
data = glass %>%
mutate(md = MDsquared) %>%
filter(md > cutoff),
aes(x = SiO2, y = Na2O),
shape = 21L, color = "black", fill = "gold", size = 3L) +
labs(
x = "SiO2 (wt.%)",
y = "Na2O (wt.%)",
title = "Classical confidence ellipse\nat 95% confidence level") +
theme_bw() +
theme(legend.position = "none")
```

```{r}
x_mcd <- glass %>%
select(SiO2, Na2O) %>%
as.matrix() %>%
robustbase::covMcd()
rob_MDsquared <- glass %>%
select(SiO2, Na2O) %>%
as.matrix() %>%
mahalanobis(x_mcd$center, x_mcd$cov)
```

```{r}
plot2 <-
ggplot() +
geom_path(data = rob_ellipse_95, aes(x = x, y = y), color = "blue", linewidth = 1L) +
geom_point(
data = glass %>%
mutate(md = rob_MDsquared) %>%
filter(md <= cutoff),
aes(x = SiO2, y = Na2O),
shape = 21L, color = "black", fill = "lightblue", size = 3L) +
geom_point(
data = glass %>%
mutate(md = rob_MDsquared) %>%
filter(md > cutoff),
aes(x = SiO2, y = Na2O),
shape = 21L, color = "black", fill = "gold", size = 3L) +
labs(
x = "SiO2 (wt.%)",
y = "Na2O (wt.%)",
title = "Confidence ellipses at 90%, 95%, and 99% levels") +
title = "Robust confidence ellipse\nat 95% confidence level") +
theme_bw() +
theme(legend.position = "none")
```

#### Grouping
```{r}
plot1 | plot2
```

##### Grouping

For grouping bivariate data, the `.group_by` argument can be used if the data contains an unique grouping variable (`.group_by = NULL` by default). When a grouping variable is provided, the function will compute the ellipses separately for each level of the factor, allowing you to explore potential differences or patterns within subgroups of the data.

Expand All @@ -117,30 +176,25 @@ rpca_scores <- glass %>%
```

```{r message=FALSE, warning=FALSE}
ellipse_pca <- rpca_scores %>%
confidence_ellipse(x = PC1, y = PC2, .group_by = glassType)
ellipse_pca <- rpca_scores %>% confidence_ellipse(x = PC1, y = PC2, .group_by = glassType)
```

```{r}
ggplot() +
geom_point(data = rpca_scores, aes(x = PC1, y = PC2, colour = glassType, shape = glassType), size = 3L) +
geom_path(data = ellipse_pca, aes(x = x, y = y, colour = glassType), linewidth = 1L) +
geom_point(data = rpca_scores, aes(x = PC1, y = PC2, color = glassType, shape = glassType), size = 3L) +
geom_path(data = ellipse_pca, aes(x = x, y = y, color = glassType), linewidth = 1L) +
scale_color_brewer(palette = "Set1", direction = 1) +
labs(
x = "PC1",
y = "PC2",
title = "Principal component analysis") +
labs(x = "PC1", y = "PC2", title = "Principal component analysis") +
theme_bw() +
theme(legend.position = "none")
```

### Confidence Ellipsoid
#### Ellipsoid

The `confidence_ellipsoid` function accepts an additional variable `z` and computes the ellipsoid for trivariate data.

```{r message=FALSE, warning=FALSE}
ellipsoid_grp <- glass %>%
confidence_ellipsoid(SiO2, Na2O, Fe2O3, glassType)
ellipsoid_grp <- glass %>% confidence_ellipsoid(SiO2, Na2O, Fe2O3, glassType)
```

```{r}
Expand Down
Loading

0 comments on commit 531f2b6

Please sign in to comment.