- Features
- Installation
- Quick Start: Offline vs Online Usage
- Notes
- Usage Examples
- Performance Comparison: Offline vs Online
- C++ Integration
- References
- Authors and Contributors
- License
focus is a high-performance R package for online changepoint detection in univariate and multivariate data streams. The package provides efficient C++ implementations of the focus and md-focus algorithms with R interfaces for real-time monitoring and offline analysis.
- Multiple distributions: supports a range of models for Gaussian change-in-mean, Poisson change-in-rate, Gamma/Exponential change-in-scale, Bernoulli change-in-probability, as well as Non-parametric detectors. New models and cost functions are easy to implement!
- Univariate and multivariate: detect changes in univariate or multivariate sequences
- Known or unknown pre-change parameters: flexible modeling of both the generalised likelihood-ratio test (unknown pre-change) and the Page–CUSUM test (known pre-change)
- One-sided and two-sided detection: detects increases or decreases in the parameters, or both
- C++ backend: Language agnostic backend optimized for speed and scalability
You can install the development version of focus from source with:
# If you have devtools installed:
devtools::install_github("gtromano/unified_focus", subdir = "focus")Or, if you have the package source directory:
install.packages("path/to/focus", repos = NULL, type = "source")The package provides two modes of operation with identical statistical results but different performance characteristics:
All cycles and updates are handled internally in C++ for maximum efficiency. This approach is ideal for:
- Benchmarking and performance testing
- Batch processing of complete datasets
- Computing full statistic trajectories
Key behaviors:
- By default, stops immediately when threshold is exceeded
- Use
threshold = Infto compute statistics for all observations (useful for visualization)
# Generate data with a changepoint
set.seed(123)
Y <- c(rnorm(500, mean = 0), rnorm(500, mean = 2))
# Run offline detection (all processing in C++)
res <- focus_offline(Y, threshold = 20, type = "univariate", family = "gaussian")
# Plot results
plot(res$stat, type = "l", main = "FOCuS Detection Statistic (Offline)",
xlab = "Time", ylab = "Statistic", lwd = 2)
abline(h = res$threshold, col = "red", lty = 2, lwd = 2)
if (!is.null(res$detection_time)) {
abline(v = res$detection_time, col = "blue", lty = 2, lwd = 2)
legend("topleft",
legend = c("Threshold", "Detection time"),
col = c("red", "blue"), lty = 2, lwd = 2)
}An online implementation is also available—allowing you to update the detector sequentially from R. This will be inherently slower, since each update involves a call from R into C++. The online interface is useful for:
- Real-time or streaming scenarios
- Integration with other R workflows
- Custom stopping rules (adaptive thresholds)
- Implementation of your own costs function
# Create detector
det <- detector_create(type = "univariate")
# Update sequentially
stat_trace <- numeric(length(Y))
threshold <- 20
for (i in seq_along(Y)) {
detector_update(det, Y[i])
result <- get_statistics(det, family = "gaussian")
# note that the online interface supports modern R piping
# result <- det |> detector_update(Y[i]) |> get_statistics(family = "gaussian")
stat_trace[i] <- result$stat
if (result$stat > threshold) {
cat("Detection at time", i, "with changepoint estimate τ =", result$changepoint, "\n")
stat_trace <- stat_trace[1:i] # Truncate
break
}
}Detection at time 510 with changepoint estimate τ = 500
# Plot results
plot(stat_trace, type = "l", main = "FOCuS Detection Statistic (Online)",
xlab = "Time", ylab = "Statistic", lwd = 2)
abline(h = threshold, col = "red", lty = 2, lwd = 2)Both approaches produce the same statistical results. See the performance comparison section below for runtime benchmarks. ## Available Functions
focus_offline(Y, threshold, type, family, ...)
Run the FOCuS detector in batch/offline mode with all cycles handled in C++ for maximum efficiency. Stops at detection by default; usethreshold = Infto compute statistics for all observations.Y: Observation data (vector or matrix)threshold: Detection threshold(s). Can be:- Scalar: Single threshold applied to all statistics
- Vector: (In case of multiple values returned per statistics, see Notes below)
type: One of"univariate","univariate_one_sided","multivariate", or"npfocus"family: Distribution family -"gaussian","poisson","bernoulli","gamma", or"npfocus"theta0: (Optional) Baseline parameter vector for cost computationshape: (Optional) Shape parameter forfamily = "gamma"(required for gamma)dim_indexes: (Optional) List of dimension index vectors for multivariate projectionsquantiles: (Optional) Quantile vector fortype = "npfocus"pruning_mult,pruning_offset: Pruning parameters (default: 2, 1)side: Pruning side -"right"or"left"(default:"right")- Returns: List with
stat(matrix where each column is a statistic),changepoint,detection_time,detected_changepoint,candidates,threshold,n,type,family, andshape(if gamma)
detector_create(type, ...)
Create a new online detector object. Returns an Info object pointer.type: One of"univariate","univariate_one_sided","multivariate", or"npfocus"dim_indexes: (Optional) List of dimension index vectors for multivariate projectionsquantiles: (Optional) Quantile vector fortype = "npfocus"pruning_mult,pruning_offset: Pruning parameters (default: 2, 1)side: Pruning side -"right"or"left"(default:"right")
detector_update(detector, y)
Update the detector with a new observation vectory.detector: Info object pointer fromdetector_create()y: Observation vector (length must match detector dimension)
get_statistics(detector, family, theta0 = NULL, shape = NULL)
Compute changepoint statistics for the current state.detector: Info object pointerfamily: Distribution family -"gaussian","poisson","bernoulli","gamma", or"npfocus"theta0: (Optional) Baseline parameter vector for cost computationshape: (Optional) Shape parameter forfamily = "gamma"(required for gamma)- Returns: List with
stopping_time,changepoint, andstatstatcan be a scalar (for single statistic families) or vector (for multi-statistic families like npfocus)
detector_info_n(detector)- Get number of observations processeddetector_info_sn(detector)- Get cumulative sum state (vector for multivariate)detector_pieces_len(detector)- Get number of candidate changepointsdetector_candidates(detector)- Get all candidate changepoints as a data frame- Returns: Data frame with columns for candidate positions and their sufficient statistics
- Multiple Statistics: Some detectors (e.g.,
family = "npfocus") return multiple statistics. Infocus_offline(), thestatreturn value is a matrix where each row corresponds to a time point and each column corresponds to a statistic.- For single-statistic families, the matrix has one column (R treats this as a vector in many contexts)
- For multi-statistic families, use vectorized thresholds or a single threshold (with warning)
- Gamma Family: When using
family = "gamma", you must provide a positiveshapeparameter. The gamma cost function assumes this shape is known.
When the pre-change mean is unknown, FOCuS estimates it from the data:
# Generate data: 1000 obs at mean 0, then 500 obs at mean -1
set.seed(45)
Y <- c(rnorm(1000, mean = 0), rnorm(500, mean = -1))
# Offline detection (stops at detection)
res <- focus_offline(Y, threshold = 20, type = "univariate", family = "gaussian")
cat("Detection time:", res$detection_time, "\n")Detection time: 1035
cat("Estimated changepoint:", res$detected_changepoint, "\n")Estimated changepoint: 991
# Plot
plot(res$stat, type = "l", main = "FOCuS: Unknown Pre-change Mean",
xlab = "Time", ylab = "Statistic", lwd = 2)
abline(h = res$threshold, col = "red", lty = 2, lwd = 2)
abline(v = res$detection_time, col = "blue", lty = 2, lwd = 2)
abline(v = 1000, col = "green", lty = 3, lwd = 2)
legend("topleft",
legend = c("Statistic", "Threshold", "Detection", "True changepoint"),
col = c("black", "red", "blue", "green"),
lty = c(1, 2, 2, 3), lwd = 2)When you know the pre-change mean, provide it via theta0 (this will
give more power to detect changes more quickly!):
set.seed(45)
theta0 <- 0
Y <- c(rnorm(1000, mean = theta0), rnorm(500, mean = theta0 - 1))
# With known pre-change parameter
res_known <- focus_offline(Y, threshold = Inf, type = "univariate",
family = "gaussian", theta0 = theta0)
# Compare with unknown case
res_unknown <- focus_offline(Y, threshold = Inf, type = "univariate",
family = "gaussian")
# Plot comparison
par(mfrow = c(1, 2))
plot(res_known$stat, type = "l", main = "Known θ₀ = 0",
xlab = "Time", ylab = "Statistic", lwd = 2, ylim = range(c(res_known$stat, res_unknown$stat)))
abline(v = 1000, col = "green", lty = 3, lwd = 2)
plot(res_unknown$stat, type = "l", main = "Unknown θ₀",
xlab = "Time", ylab = "Statistic", lwd = 2, ylim = range(c(res_known$stat, res_unknown$stat)))
abline(v = 1000, col = "green", lty = 3, lwd = 2)par(mfrow = c(1, 1))Detect only increases (right-sided) or decreases (left-sided):
set.seed(789)
Y_increase <- c(rnorm(800, mean = 0), rnorm(400, mean = 1.5))
# Right-sided: detect only increases
res_right <- focus_offline(Y_increase, threshold = 30,
type = "univariate_one_sided",
family = "gaussian", side = "right")
cat("Right-sided detection at:", res_right$detection_time, "\n")Right-sided detection at: 816
# Left-sided: detect only decreases (won't detect in this data)
res_left <- focus_offline(Y_increase, threshold = 30,
type = "univariate_one_sided",
family = "gaussian", side = "left")
cat("Left-sided detection:",
ifelse(is.null(res_left$detection_time), "None", res_left$detection_time), "\n")Left-sided detection: None
# Plot comparison
par(mfrow = c(1, 2))
plot(res_right$stat, type = "l", main = "Right-sided (stat increases)",
xlab = "Time", ylab = "Statistic", lwd = 2)
abline(h = 30, col = "red", lty = 2)
if (!is.null(res_right$detection_time)) {
abline(v = res_right$detection_time, col = "blue", lty = 2)
}
plot(res_left$stat, type = "l", main = "Left-sided (no detection)",
xlab = "Time", ylab = "Statistic", lwd = 2)
abline(h = 30, col = "red", lty = 2)par(mfrow = c(1, 1))For vector-valued observations:
set.seed(42)
n <- 1500
p <- 3
# Create data: changepoint at t=1000
Y_multi <- rbind(
matrix(rnorm(1000 * p, mean = 0), ncol = p),
matrix(rnorm(500 * p, mean = 1.2), ncol = p)
)
# Run detection
res_multi <- focus_offline(Y_multi, threshold = 30,
type = "multivariate", family = "gaussian")
cat("Detection time:", res_multi$detection_time, "\n")Detection time: 1012
cat("Estimated changepoint:", res_multi$detected_changepoint, "\n")Estimated changepoint: 1000
cat("True changepoint: 1000\n")True changepoint: 1000
# Plot
plot(res_multi$stat, type = "l", main = "Multivariate FOCuS (p=3)",
xlab = "Time", ylab = "Statistic", lwd = 2)
abline(h = res_multi$threshold, col = "red", lty = 2, lwd = 2)
abline(v = res_multi$detection_time, col = "blue", lty = 2, lwd = 2)
abline(v = 1000, col = "green", lty = 3, lwd = 2)
legend("topleft",
legend = c("Threshold", "Detection", "True changepoint"),
col = c("red", "blue", "green"), lty = c(2, 2, 3), lwd = 2)Note: When you have more than 5 dimensions, computing the convex hull can become quite slow and you may not prune many points. A practical approach is to use a low-dimensional approximation of the hull, which often gives very similar results at a fraction of the computation time.
Here’s an example comparing full vs low-dimensional hull computation:
set.seed(42)
n <- 1000
p <- 6
# Create data: changepoint at t=1000
Y_multi <- rbind(
matrix(rnorm(5000 * p, mean = -1, 1), ncol = p),
matrix(rnorm(500 * p, mean = 1.2), ncol = p)
)
# Full multivariate detection
system.time(
res_multi <- focus_offline(Y_multi, threshold = Inf,
type = "multivariate", family = "gaussian")
) user system elapsed
9.022 0.128 9.151
# Low-dimensional projection approximation
dim_indexes <- generate_projection_indexes(6, 2)
system.time(
res_multi_approx <- focus_offline(Y_multi, threshold = Inf,
type = "multivariate", family = "gaussian",
dim_indexes = dim_indexes)
) user system elapsed
0.135 0.000 0.134
# Verify similarity
all.equal(res_multi$stat, res_multi_approx$stat)[1] "Mean relative difference: 0.003782673"
The anomaly_intensity parameter focuses on detecting anomalies
(epidemic changepoints) and more transient changes, while ignoring
longer, less intense changes. This is particularly useful when we have
an expected background rate, and we seek any deviations from it.
When anomaly_intensity is set to a positive value, candidates are
retained only if they show sufficient “signal intensity” — i.e., the
magnitude of the change relative to the segment length is at least
anomaly_intensity. This filtering occurs during candidate pruning and
helps reduce the number of spurious changepoints.
The following example runs the detector sequentially, and flags an alarm whilist we’re in the anomalous period.
# Create synthetic data with brief anomalies
set.seed(999)
n <- 1000
Y_anom <- c(
rnorm(n/2, mean = 0),
rnorm(10, mean = -3), # Brief, weak anomaly
rnorm(n/2, mean = 0),
rnorm(10, mean = 5), # Stronger brief anomaly
rnorm(n/2, mean = 0)
)
# Online (sequential) detection
det <- detector_create(type = "univariate", anomaly_intensity = 2)
threshold <- 20
in_anom <- FALSE
starts <- integer(0)
ends <- integer(0)
stat_trace <- numeric(length(Y_anom))
for (i in seq_along(Y_anom)) {
detector_update(det, Y_anom[i])
res <- get_statistics(det, family = "gaussian", theta0 = 0)
stat <- res$stat
stat_trace[i] <- stat
# robustly handle NULL/NA
if (is.null(stat) || length(stat) == 0) stat <- 0
if (!in_anom && stat > threshold) {
in_anom <- TRUE
starts <- c(starts, res$changepoint)
cat("anomaly starting at", i, "\n")
}
if (in_anom && stat <= threshold) {
in_anom <- FALSE
ends <- c(ends, res$changepoint)
cat("anomaly ending at", i, "\n")
}
}anomaly starting at 503
anomaly ending at 513
anomaly starting at 1011
anomaly ending at 1037
# If we were still in an anomaly at the end, close it
if (in_anom) {
ends <- c(ends, length(Y_anom))
cat("anomaly ending at", length(Y_anom), "(end of series)", "\n")
}
# Plot the data and mark starts/ends
par(mfrow = c(2, 1))
plot(Y_anom, type = "l", main = "Data with Detected Anomalies",
xlab = "Time", ylab = "Value", lwd = 1.2)
if (length(starts) > 0) abline(v = starts, col = "green", lty = 2)
if (length(ends) > 0) abline(v = ends, col = "red", lty = 2)
legend("topright", legend = c("an. start", "an. end"), col = c("green", "red"), lty = 2)
plot(stat_trace, type = "l", main = "Statistics Trace",
xlab = "Time", ylab = "Statistic", lwd = 2)par(mfrow = c(2, 1))- Default behavior (
anomaly_intensity = NULL): All candidates are retained based on standard pruning rules. - When set to a positive value: Candidates are filtered based on the
infinity norm of the signal-to-length ratio. A candidate survives only
if at least one dimension has a signal magnitude ≥
anomaly_intensity.
This section collects examples for the exponential-family models
implemented in focus: Poisson, Bernoulli, and Gamma. All
three examples show how to run an offline detection using
focus::focus_offline() – but the online, sequential iplementation
would work as well (see below for an example).
Univariate and multivariate Bernoulli examples. Here each observation is a 0/1 draw and the success probability changes at the halfway point.
# univariate bernoulli example
set.seed(123)
n <- 2000
Y_bern <- c(rbinom(n/2, size = 1, prob = 0.2), rbinom(n/2, size = 1, prob = 0.5))
system.time({
res_bern <- focus::focus_offline(Y_bern, threshold = Inf, type = "univariate", family = "bernoulli")
}) user system elapsed
0.003 0.000 0.004
plot(res_bern$stat, main = "Bernoulli (univariate): change in success probability")# multivariate bernoulli example (two binary streams)
Y_bern_multi <- cbind(
c(rbinom(n/2, size = 1, prob = 0.2), rbinom(n/2, size = 1, prob = 0.5)),
c(rbinom(n/2, size = 1, prob = 0.3), rbinom(n/2, size = 1, prob = 0.6))
)
system.time({
res_bern_multi <- focus::focus_offline(Y_bern_multi, threshold = Inf, type = "multivariate", family = "bernoulli")
}) user system elapsed
0.02 0.00 0.02
plot(res_bern_multi$stat, main = "Bernoulli (multivariate): two streams")A simple Poisson change-in-rate example: counts switch from a low to a higher rate halfway through the sequence.
# Poisson change in rate example
set.seed(101)
n <- 2000
Y_pois <- c(rpois(n/2, lambda = 2), rpois(n/2, lambda = 6))
system.time({
res_pois <- focus::focus_offline(Y_pois, threshold = Inf, type = "univariate", family = "poisson")
}) user system elapsed
0.002 0.000 0.003
plot(res_pois$stat, main = "Poisson: change in rate (lambda)")The Gamma example requires specifying the shape parameter (positive
scalar). In the focus API the shape argument is passed to the cost
function; theta0 is used as the null value for the relevant Gamma
parameter (scale/rate depending on your parameterisation in the
implementation). Make sure shape > 0. Note: the shape parameter
is specific to the Gamma family. If you pass shape when using
another family it will be ignored (hopefully).
# gamma change in scale example
set.seed(124)
n <- 2000
# shape = 2; first half scale=2, second half scale=0.5
Y_gamma <- c(rgamma(n/2, shape = 2, scale = 2), rgamma(n/2, shape = 2, scale = 0.5))
system.time({
# note: shape = 2 must be provided for family='gamma'
res_gamma <- focus::focus_offline(Y_gamma, threshold = Inf, type = "univariate", family = "gamma", shape = 2, theta0 = 2)
}) user system elapsed
0.003 0.000 0.003
plot(res_gamma$stat, main = "Gamma: change in scale (shape = 2)")A key feature of the focus package is that the detector type (how candidates are managed) is completely independent of the statistical model (how costs are computed). This means you can create a detector with one data type and compute statistics using different distributional assumptions.
Here’s an example using Poisson-generated data but comparing Gaussian and Poisson statistics:
set.seed(2024)
# Generate Poisson count data with a rate change
Y_counts <- c(rpois(500, lambda = 10), rpois(500, lambda = 15))
# Create a univariate detector (same detector for both)
det <- detector_create(type = "univariate")
# Update with all data
for (i in seq_along(Y_counts)) {
detector_update(det, Y_counts[i])
}
# Compare Gaussian vs Poisson statistics on the SAME detector state
result_gaussian <- get_statistics(det, family = "gaussian")
result_poisson <- get_statistics(det, family = "poisson", theta0 = 10)
cat("Using Gaussian statistic:\n")Using Gaussian statistic:
cat(" Changepoint:", result_gaussian$changepoint, "\n") Changepoint: 500
cat(" Statistic:", round(result_gaussian$stat, 2), "\n\n") Statistic: 6426.22
cat("Using Poisson statistic (more appropriate for count data):\n")Using Poisson statistic (more appropriate for count data):
cat(" Changepoint:", result_poisson$changepoint, "\n") Changepoint: 500
cat(" Statistic:", round(result_poisson$stat, 2), "\n") Statistic: 521.28
# Compute full trajectories for comparison
det2 <- detector_create(type = "univariate")
stat_gaussian <- numeric(length(Y_counts))
stat_poisson <- numeric(length(Y_counts))
for (i in seq_along(Y_counts)) {
detector_update(det2, Y_counts[i])
stat_gaussian[i] <- get_statistics(det2, family = "gaussian")$stat
stat_poisson[i] <- get_statistics(det2, family = "poisson", theta0 = 10)$stat
}
# Plot comparison
par(mfrow = c(1, 2))
plot(stat_gaussian, type = "l", main = "Gaussian Statistic on Poisson Data",
xlab = "Time", ylab = "Statistic", lwd = 2, col = "blue")
abline(v = 500, col = "green", lty = 3, lwd = 2)
plot(stat_poisson, type = "l", main = "Poisson Statistic on Poisson Data",
xlab = "Time", ylab = "Statistic", lwd = 2, col = "red")
abline(v = 500, col = "green", lty = 3, lwd = 2)par(mfrow = c(1, 1))This flexibility allows you to:
- Test different statistical models on the same data
- Use the same detector infrastructure across different distributions
- Implement your own functions! The optimal change candidates are always accessible via:
head(detector_candidates(det2)) tau st side
1 0 0 right
2 437 4263 right
3 459 4481 right
4 461 4502 right
5 500 4916 right
6 521 5210 right
Additionally, we find the NPFOCUS implementation for non-parametric changepoint detection. This requires specifying quantiles to be used in the cost function. Generally this quantiles can be estimated from the data or specified based on domain knowledge.
Note that NPFOCuS returns two statistics: one based on summing over quantiles, and one based on taking the maximum over quantiles. In the offline version both statistics are returned in a matrix with two columns.
set.seed(123)
Y <- c(rnorm(1000), rcauchy(200))
quants <- qnorm(seq(0.01, .99, length.out = 5))
res <- focus_offline(
Y = Y,
threshold = c(80, 25), # detection threshold (one for sum, one for max)
type = "npfocus", # creates a NonparametricInfo
family = "npfocus", # uses compute_costs_npfocus
quantiles = quants, # REQUIRED for type == "npfocus"
)
par(mfrow = c(3, 1))
plot(Y[1:nrow(res$stat)])
plot(res$stat[, 1], type = "l", main = "NPFOCuS Detection Statistic (sum)",
xlab = "Time", ylab = "Statistic", lwd = 2)
plot(res$stat[, 2], type = "l", main = "NPFOCuS Detection Statistic (max)",
xlab = "Time", ylab = "Statistic", lwd = 2)par(mfrow = c(1, 1))In the online setting, the quantile vector must be provided when creating the detector. The statistic can be retrieved as usual, however it will be a vector of length 2 (sum and max statistics).
set.seed(123)
det <- detector_create("npfocus", quantiles = quants)
det <- det |>
detector_update(Y[1]) |>
detector_update(Y[2]) |>
detector_update(Y[3])
det |> get_statistics(family="npfocus")$stopping_time
[1] 3
$changepoint
NULL
$stat
[1] 3.819085 1.909543
The library also supports changepoint detection in AutoRegressive (AR) processes. This is useful for detecting changes in the mean of AR(p) processes while accounting for the temporal dependencies. The AR coefficients can be specified (or estimated from data). As in the iid cases, the pre-change mean can be provided if known, however this has to be done when creating the detector (as the parameter is necessary for pruning logic).
set.seed(123)
# Generate AR(2) process with changepoint
n_pre <- 500
n_post <- 500
ar_coefs <- c(0.7, -0.3) # AR(2) coefficients
Y_pre <- arima.sim(n = n_pre, model = list(ar = ar_coefs), sd = 1)
Y_post <- 2 + arima.sim(n = n_post, model = list(ar = ar_coefs), sd = 1)
Y <- c(Y_pre, Y_post)
# Estimate AR parameters from pre-change data (in practice, use historical data)
ar_model <- ar(arima.sim(n = 300, model = list(ar = ar_coefs), sd = 1), order.max = 2, method = "mle")
rho_est <- ar_model$ar # Estimated AR coefficients
# Run offline ARP detection
res <- focus_offline(
Y = Y,
threshold = 20,
type = "arp",
rho = rho_est
)
# Plot results
par(mfrow = c(2, 1))
plot(Y, type = "l", main = "AR(2) Process with Mean Shift",
xlab = "Time", ylab = "Value")
abline(v = n_pre, col = "red", lty = 2, lwd = 2) # True changepoint
plot(res$stat[, 1], type = "l", main = "ARP Detection Statistic",
xlab = "Time", ylab = "Statistic", lwd = 2)
abline(h = res$threshold, col = "blue", lty = 2)
if (!is.null(res$detection_time)) {
abline(v = res$detection_time, col = "red", lty = 2, lwd = 2)
}par(mfrow = c(1, 1))
# Show detection results
cat("Detection time:", res$detection_time, "\n")Detection time: 510
cat("Estimated changepoint:", res$detected_changepoint, "\n")Estimated changepoint: 500
cat("True changepoint:", n_pre, "\n")True changepoint: 500
And in the online setting:
set.seed(123)
# Create online ARP detector
det <- detector_create("arp", rho = rho_est)
stat_trace <- numeric(length(Y))
for (i in seq_along(Y)) {
detector_update(det, Y[i])
result <- get_statistics(det, family = "arp")
stat_trace[i] <- result$stat
if (result$stat > 20) {
cat("Detection at time", i, "with changepoint estimate τ =", result$changepoint, "\n")
stat_trace <- stat_trace[1:i] # trucate the trace
break
}
}Detection at time 511 with changepoint estimate τ = 500
# Plot results
plot(stat_trace, type = "l", main = "Online ARP Detection Statistic",
xlab = "Time", ylab = "Statistic", lwd = 2)Here’s a direct runtime comparison between offline and online modes on the same data:
# Generate larger dataset for benchmarking
set.seed(999)
n <- 100000
Y_bench <- c(rnorm(n/2, mean = 0), rnorm(n/2, mean = 1))
# Benchmark offline mode
cat("Offline mode (C++ loop):\n")Offline mode (C++ loop):
time_offline <- system.time({
res_offline <- focus_offline(Y_bench, threshold = Inf,
type = "univariate", family = "gaussian")
})
print(time_offline) user system elapsed
0.149 0.001 0.150
# Benchmark online mode
cat("\nOnline mode (R loop with C++ calls):\n")Online mode (R loop with C++ calls):
time_online <- system.time({
det <- detector_create(type = "univariate")
stat_online <- numeric(n)
for (i in seq_along(Y_bench)) {
detector_update(det, Y_bench[i])
result <- get_statistics(det, family = "gaussian")
stat_online[i] <- result$stat
}
})
print(time_online) user system elapsed
0.359 0.000 0.359
# Verify both produce identical results
cat("\nResults identical:", all.equal(as.vector(res_offline$stat), stat_online), "\n")Results identical: TRUE
# Speedup factor
speedup <- time_online["elapsed"] / time_offline["elapsed"]
cat("Offline mode is", round(speedup, 1), "x faster\n")Offline mode is 2.4 x faster
If you wish to use the library entirely in C++ (for maximum speed or integration into other C++ projects), you can do so by following the patterns in the source code. The key classes are:
Infoand derived classes (UnivariateInfo,MultivariateInfo)- Cost functions (
compute_costs_gaussian,compute_costs_poisson) ChangepointResultstructure
Example C++ usage:
#include "Info.h"
#include "Costs.h"
// Create detector
auto info = std::make_shared<UnivariateInfo>();
// Update with data
for (const auto& y : data) {
info->update({y});
auto result = compute_costs_gaussian(*info, {theta0});
if (result.stat.value() > threshold) {
// Detection!
break;
}
}Pishchagina, Liudmila, Gaetano Romano, Paul Fearnhead, Vincent Runge, and Guillem Rigaill. 2025. “Online Multivariate Changepoint Detection: Leveraging Links with Computational Geometry.” Journal of the Royal Statistical Society Series B: Statistical Methodology: qkaf046. https://doi.org/10.1093/jrsssb/qkaf046
Romano, Gaetano, Idris A. Eckley, and Paul Fearnhead. 2024. “A Log-Linear Nonparametric Online Changepoint Detection Algorithm Based on Functional Pruning.” IEEE Transactions on Signal Processing 72: 594–606. https://doi.org/10.1109/tsp.2023.3343550.
Romano, Gaetano, Idris A Eckley, Paul Fearnhead, and Guillem Rigaill. 2023. “Fast Online Changepoint Detection via Functional Pruning CUSUM Statistics.” Journal of Machine Learning Research 24 (81): 1–36. https://www.jmlr.org/papers/v24/21-1230.html.
Ward, Kes, Gaetano Romano, Idris Eckley, and Paul Fearnhead. 2024. “A Constant-Per-Iteration Likelihood Ratio Test for Online Changepoint Detection for Exponential Family Models.” Statistics and Computing 34 (3): 1–11.
-
Gaetano Romano: email (Author) (Maintainer) (Creator) (Translator)
-
Kes Ward: email (Author)
-
Yuntang Fan: email (Author)
-
Guillem Rigaill: email (Author)
-
Vincent Runge: email (Author)
-
Paul Fearnhead: email (Author)
-
Idris A. Eckley: email (Author)
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/gpl-3.0.txt.
The package bundles code from code from Qhull (http://www.qhull.org/), from C.B. Barber and The Geometry Center. Qhull is free software and may be obtained via http from www.qhull.org. For details, see inst/COPYRIGHTS/qhull














