
Convert tests from testthat to testit #625

Open: yihui wants to merge 33 commits into main from convert-tests-to-testit


@yihui
Collaborator

@yihui yihui commented May 6, 2026

Summary

  • Replace all testthat tests with testit equivalents (60 R test files + 2 markdown snapshot files)
  • Snapshot tests (as_gt, as_rtf) converted to testit's markdown-based .md format with output embedded inline
  • DESCRIPTION updated: testthat (>= 3.0.0) replaced with testit
  • Key conversions: test_that() -> assert(), expect_equal() -> all.equal(), expect_identical() -> %==%, expect_error() -> has_error(), expect_true() -> (expr)

Motivation

> # testthat dependencies
> setdiff(unlist(tools::package_dependencies('testthat', recursive = TRUE)), xfun::base_pkgs())
 [1] "brio"      "callr"     "cli"       "desc"      "evaluate"  "jsonlite"  "lifecycle"
 [8] "magrittr"  "pkgload"   "praise"    "processx"  "ps"        "R6"        "rlang"    
[15] "waldo"     "withr"     "fs"        "glue"      "pkgbuild"  "rprojroot" "diffobj"  
[22] "crayon"   

> # testit dependencies (none)
> setdiff(unlist(tools::package_dependencies('testit', recursive = TRUE)), xfun::base_pkgs())
character(0)

Switching from testthat to testit

Part 1: Migration Guide

Test file structure

| testthat | testit |
|---|---|
| tests/testthat.R | tests/*.R (any name) |
| tests/testthat/test-*.R | tests/testit/test-*.R |
| tests/testthat/helper-*.R | tests/testit/helper*.R |
| tests/testthat/_snaps/*.md | tests/testit/test-*.md |

R runs all .R scripts in tests/ during R CMD check. The filename does not matter — tests/testthat.R is merely a convention that testthat's tooling creates. testit likewise does not require any specific filename, and you can have multiple runner scripts. For example:

# tests/test-all.R
library(testit)
test_pkg("pkgname")

You can also split tests into multiple runner scripts, each calling test_pkg() with a different directory:

# tests/test-core.R
library(testit)
test_pkg("pkgname", dir = "core")

# tests/test-slow.R — only run when not on CRAN
library(testit)
if (identical(Sys.getenv("NOT_CRAN"), "true")) {
  test_pkg("pkgname", dir = "slow")
}

This provides a natural way to conditionally skip entire groups of tests (the testit equivalent of testthat's skip_on_cran()) — simply guard the test_pkg() call with a condition.

Core pattern

testthat:

test_that("description", {
  expect_true(condition)
  expect_equal(a, b)
})

testit:

assert("description", {
  (condition)
  (a == b)
})

In testit, any expression wrapped in () inside assert() is checked — if it evaluates to TRUE or a vector of TRUE values, it passes; anything else is a failure. The expression can be any R code: (x > 0), (is.data.frame(df)), (nrow(x) == 10), etc. For approximate numeric comparison, you can use (all.equal(a, b)) — it returns TRUE on success or a descriptive string on failure, both of which testit handles correctly.
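The pass/fail rule described above can be sketched in a few lines of base R. The helper name `is_pass()` is hypothetical and is not part of testit; this sketch also treats an empty logical vector as a failure, which may differ from testit's actual behavior.

```r
# Hypothetical helper (not part of testit): mimics the rule described above,
# where a value passes only if it is TRUE or a vector of all-TRUE values.
is_pass <- function(x) {
  is.logical(x) && length(x) > 0 && !anyNA(x) && all(x)
}

is_pass(TRUE)                     # TRUE
is_pass(c(TRUE, TRUE, TRUE))      # TRUE
is_pass(c(TRUE, NA))              # FALSE: NA is not TRUE
is_pass(all.equal(1, 1 + 1e-10))  # TRUE: all.equal() returned TRUE
is_pass(all.equal(1, 1.1))        # FALSE: all.equal() returned a string
```

This is why (all.equal(a, b)) works without isTRUE(): a descriptive string is simply "not TRUE".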

Assertion mappings

| testthat | testit |
|---|---|
| expect_true(x) | (x) |
| expect_false(x) | (!x) |
| expect_equal(a, b) | (all.equal(a, b)) |
| expect_equal(a, b, tolerance = t) | (all.equal(a, b, tolerance = t)) |
| expect_identical(a, b) | (identical(a, b)) |
| expect_null(x) | (is.null(x)) |
| expect_length(x, n) | (length(x) == n) |
| expect_s3_class(x, "cls") | (inherits(x, "cls")) |
| expect_gt(a, b) | (a > b) |
| expect_gte(a, b) | (a >= b) |
| expect_lt(a, b) | (a < b) |
| expect_lte(a, b) | (a <= b) |
| expect_named(x, nms) | (identical(names(x), nms)) |
| expect_match(x, pat) | (grepl(pat, x)) |
| expect_no_match(x, pat) | (!grepl(pat, x)) |
| expect_error(expr) | (has_error(expr)) |
| expect_error(expr, "msg") | (has_error(expr, "msg")) |
| expect_warning(expr) | (has_warning(expr)) |
| expect_warning(expr, "msg") | (has_warning(expr, "msg")) |
| expect_message(expr) | (has_message(expr)) |
| expect_no_error(expr) | (!has_error(expr)) |
| expect_no_warning(expr) | (!has_warning(expr)) |
| expect_no_message(expr) | (!has_message(expr)) |
| expect_type(x, "t") | (typeof(x) == "t") |
| expect_vector(x, ptype, size) | (vctrs::vec_is(x, ptype, size)) |
| expect_setequal(a, b) | (setequal(a, b)) |
| expect_in(x, table) | (x %in% table) |
| expect_contains(x, expected) | (expected %in% x) |
| expect_mapequal(a, b) | (identical(a[order(names(a))], b[order(names(b))])) |
| expect_s4_class(x, "cls") | (is(x, "cls")) |
| expect_output(expr, pat) | (grepl(pat, paste(capture.output(expr), collapse = "\n"))) |
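Because each mapped assertion is an ordinary R expression, the translations can be exercised with base R alone. The sketch below uses stopifnot() in place of assert() so it runs without testit, and has_error_sketch() is a hypothetical stand-in for testit's has_error(), not its actual implementation.

```r
# Hypothetical stand-in for testit::has_error(): TRUE if evaluating `expr`
# signals an error. Lazy evaluation means `expr` is first evaluated inside try().
has_error_sketch <- function(expr) {
  inherits(try(expr, silent = TRUE), "try-error")
}

x  <- c(a = 1, b = 2)
df <- data.frame(id = 1:3)

stopifnot(
  (inherits(df, "data.frame")),        # expect_s3_class(df, "data.frame")
  (length(x) == 2),                    # expect_length(x, 2)
  (identical(names(x), c("a", "b"))),  # expect_named(x, c("a", "b"))
  (grepl("^b", "beta")),               # expect_match("beta", "^b")
  (all.equal(sum(x), 3)),              # expect_equal(sum(x), 3)
  (has_error_sketch(log("a"))),        # expect_error(log("a"))
  (!has_error_sketch(log(1)))          # expect_no_error(log(1))
)
```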

The %==% operator

testit provides %==% as an alias of identical(). The advantage over calling identical() directly is that when the assertion fails inside assert(), it prints str() for both the LHS and RHS, so you can immediately spot the differences:

assert("example", {
  (1:3 %==% 1:3)           # TRUE
  (c("a", "b") %==% c("a", "b"))  # TRUE
})

If (x %==% y) fails, you'll see output like:

x (LHS) ==>
 int [1:3] 1 2 3
----------
 int [1:3] 1 2 4
<== (RHS) y
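To see why the str() output helps, consider comparing nrow() (which returns an integer) with the double 1: both print as 1 but are not identical. The operator below is a hypothetical sketch that combines identical() with a str() dump of both sides on failure; in testit it is assert() that prints the diagnostic, and its exact format may differ.

```r
# Hypothetical operator (not testit's %==%): identical() plus a str() dump
# of both sides when the comparison fails.
`%same%` <- function(x, y) {
  res <- identical(x, y)
  if (!res) {
    cat("LHS ==>\n"); str(x)
    cat("----------\n"); str(y)
    cat("<== RHS\n")
  }
  res
}

df <- data.frame(a = 1)
nrow(df) %same% 1   # FALSE; the dump shows " int 1" vs " num 1"
nrow(df) %same% 1L  # TRUE
```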

Tolerance handling

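testthat edition 2's expect_equal(a, b, tolerance = t) has subtle semantics: it passes if either the relative comparison via all.equal() passes OR an element-wise absolute check abs(a - b) < tolerance passes. If your package relies on this dual behavior, you can define a helper that approximates it. Note that with scale = 1, all.equal() checks the mean absolute difference rather than each element, so the helper below is somewhat looser than testthat's element-wise check:

all_equal <- function(target, current, tolerance = 1e-5, ...) {
  # relative comparison (all.equal()'s default behavior)
  rel <- all.equal(target, current, tolerance = tolerance, ...)
  if (isTRUE(rel)) return(TRUE)
  # absolute comparison: scale = 1 compares the mean absolute difference
  abs_res <- all.equal(target, current, tolerance = tolerance, scale = 1, ...)
  if (isTRUE(abs_res)) return(TRUE)
  abs_res
}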

For most testit tests, plain all.equal() with an appropriate tolerance is sufficient.
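An element-wise version of the dual check can be sketched as follows. all_equal_ed2() is a hypothetical name, and the sketch only covers plain numeric vectors without NA values; attribute checks are left to all.equal(). The second example shows how a mean-based absolute check (scale = 1) can pass where an element-wise check fails.

```r
# Hypothetical helper: passes if all.equal() passes (relative check), OR if
# every element differs by less than `tolerance` in absolute terms.
all_equal_ed2 <- function(target, current,
                          tolerance = sqrt(.Machine$double.eps), ...) {
  rel <- all.equal(target, current, tolerance = tolerance, ...)
  if (isTRUE(rel)) return(TRUE)
  if (is.numeric(target) && is.numeric(current) &&
      length(target) == length(current) &&
      !anyNA(target) && !anyNA(current) &&
      all(abs(target - current) < tolerance)) {
    return(TRUE)
  }
  rel  # keep all.equal()'s descriptive message on failure
}

# Small values: the relative difference (0.5%) exceeds 1e-5, but the
# absolute difference (5e-6) does not, so the dual check passes.
all.equal(0.001, 0.001005, tolerance = 1e-5)      # "Mean relative difference" string
all_equal_ed2(0.001, 0.001005, tolerance = 1e-5)  # TRUE

# Mean vs element-wise: the differences are about 1.5e-5 and 1e-6 (mean 8e-6).
a <- c(0.001, 0.002)
b <- c(0.001015, 0.002001)
all.equal(a, b, tolerance = 1e-5, scale = 1)  # TRUE: mean absolute difference < 1e-5
all_equal_ed2(a, b, tolerance = 1e-5)         # not TRUE: one element differs by 1.5e-5
```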

Snapshot tests

testthat stores snapshots in tests/testthat/_snaps/name.md (one file per test file, named after the test file without its test- prefix). testit uses a simpler approach: a markdown file tests/testit/test-name.md alongside the .R file.

Format:

## `function_name()` description

```r
code_to_run()
```

```
expected output here
```

testit runs the R code block and compares its output to the following text block. If they differ, the test fails and shows a diff.

To initialize a snapshot test, you can omit the output block and only include the R source code. When you run the tests (execute the .R scripts under tests/, instead of running R CMD check), testit will automatically fill in the output — no need to copy and paste results manually.
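The comparison step can be sketched in base R: run the code block, capture what it prints, and compare the captured lines with the expected block. This is a hypothetical illustration of the idea, not testit's implementation; parse_snapshot() and check_snapshot() are made-up names, and the parser assumes exactly one R block followed by one output block.

```r
# Hypothetical parser: split a snapshot (a character vector of lines) into
# the R code block and the expected output block that follows it.
parse_snapshot <- function(lines) {
  fences <- grep("^```", lines)
  stopifnot(length(fences) >= 4)
  inside <- function(i, j) if (j - i > 1) lines[(i + 1):(j - 1)] else character(0)
  list(code = inside(fences[1], fences[2]),
       expected = inside(fences[3], fences[4]))
}

# Hypothetical checker: evaluate the code, auto-printing visible values as
# the console would, and compare the captured lines with the expected ones.
check_snapshot <- function(lines) {
  snap <- parse_snapshot(lines)
  actual <- capture.output(
    source(exprs = parse(text = snap$code), local = new.env(),
           print.eval = TRUE, echo = FALSE)
  )
  if (identical(actual, snap$expected)) return(TRUE)
  message("Expected:\n", paste(snap$expected, collapse = "\n"),
          "\nActual:\n", paste(actual, collapse = "\n"))
  FALSE
}

snapshot <- c(
  "## `rev()` reverses a vector",
  "",
  "```r",
  "rev(1:3)",
  "```",
  "",
  "```",
  "[1] 3 2 1",
  "```"
)
check_snapshot(snapshot)  # TRUE
```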

Conditional test execution

testthat:

skip_on_cran()
skip_if_not_installed("pkg")

testit offers three levels of conditional execution:

Skip an entire test directory — guard the test_pkg() call in your runner script:

# tests/test-extended.R
library(testit)
if (identical(Sys.getenv("NOT_CRAN"), "true")) {
  test_pkg("pkgname", dir = "extended")
}

Skip a single assertion — wrap assert() in a condition:

if (requireNamespace("pkg", quietly = TRUE)) assert("uses pkg", {
  ...
})

Skip the rest of a test file — use an early return():

if (!requireNamespace("pkg", quietly = TRUE)) return()

Since testit files are sourced top-to-bottom, return() skips the rest of the file.

Setup and teardown

testthat's setup() and teardown() functions are superseded. The current recommended approach uses setup.R with withr::defer(..., teardown_env()):

# tests/testthat/setup.R
db <- connect_db()
withr::defer(disconnect_db(db), teardown_env())

testit — just use normal R patterns:

old <- options(warn = -1)
on.exit(options(old), add = TRUE)

Or place shared setup in helper.R (sourced before test files).

For file cleanup, test_pkg() automatically removes any newly generated files under the test directory after testing completes (controlled by options(testit.cleanup = TRUE), which is the default). This means your tests/ directory stays clean without manual teardown. Have you ever been annoyed by the stray Rplots.pdf in your test folder? You won't suffer from this problem with testit.
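The cleanup idea can be approximated in base R: list the files before the tests run, then delete anything new afterwards. with_cleanup() is a hypothetical helper for illustration, not testit's code; it removes new files but leaves any newly created empty directories in place.

```r
# Hypothetical helper: evaluate `code`, then delete files newly created under `dir`.
with_cleanup <- function(dir, code) {
  before <- list.files(dir, recursive = TRUE, all.files = TRUE)
  on.exit({
    after <- list.files(dir, recursive = TRUE, all.files = TRUE)
    unlink(file.path(dir, setdiff(after, before)))
  }, add = TRUE)
  invisible(code)  # forces evaluation of the test code
}

test_dir <- tempfile("testit-demo-")
dir.create(test_dir)
writeLines("keep me", file.path(test_dir, "existing.txt"))

with_cleanup(test_dir, {
  pdf(file.path(test_dir, "Rplots.pdf"))  # e.g. a plot written during a test
  plot(1:10)
  dev.off()
})

list.files(test_dir)  # "existing.txt"
```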

DESCRIPTION changes

- Suggests: testthat (>= 3.0.0)
+ Suggests: testit

Remove Config/testthat/edition: * if present.


Part 2: Why testit over testthat

Advantages

1. Radical simplicity

testit is ~500 lines of pure R code in total (including comments and blank lines). testthat is ~15,000 lines of R plus some C. testit has five core functions (assert(), test_pkg(), has_error(), has_warning(), has_message()) and one operator (%==%). There is no hidden machinery — you can read the entire source in a few minutes and understand exactly what happens when you run your tests.

2. Tests are just R

Every assertion in testit is a plain R expression. (x > 0) means exactly what it says. There is no DSL to learn, no expect_* vocabulary to memorize, and no argument-order confusion between expect_equal(object, expected) and all.equal(target, current). If you know R, you know testit.

3. No hidden tolerance semantics

testthat has gone through multiple editions with changing comparison behavior (edition 2 uses all.equal(), edition 3 uses waldo::compare()). The tolerance semantics differ between editions in subtle ways (relative vs. absolute, element-wise vs. mean). With testit, you call all.equal() directly with explicit arguments — what you write is what you get. No surprises when upgrading the test framework.

4. Fast installation and CI

testthat pulls in a dependency tree of more than 20 packages: rlang, waldo, cli, withr, lifecycle, praise, brio, desc, pkgload, pkgbuild, ps, processx, callr, R6, evaluate, magrittr, glue, jsonlite, fs, rprojroot, diffobj, crayon. testit has zero non-base dependencies. This means faster CI installs, fewer breakage vectors, and no transitive dependency conflicts.

5. Transparent failure messages

When a testit assertion fails, it prints the expression verbatim and its result. When (all.equal(a, b)) fails, you see "Mean relative difference: 0.05" — the actual return value of all.equal(). No formatter stands between you and the diagnostic.

6. Stable across R versions

testit uses only base R features that have been stable for decades. It will not break when R changes something, because it uses almost nothing beyond tryCatch(), withCallingHandlers(), and eval().

7. Snapshot tests are plain markdown

testit's .md snapshots are human-readable documents: a heading (optional), an R code block, and an output block. They can be reviewed in any markdown viewer, diffed with standard tools, and understood without any framework knowledge. testthat's snapshot infrastructure requires expect_snapshot(), snapshot_review(), snapshot_accept(), and produces files that only make sense within testthat's workflow.

Features testthat has that testit doesn't — and why

Mocking (local_mock(), with_mock())

Mocking is not a testing framework concern. If you need to substitute function behavior, use mockr, inject dependencies through function arguments, or use trace()/untrace(). A test framework asserting conditions and a system for intercepting function calls are orthogonal responsibilities.

Reporters (progress bars, JUnit XML, etc.)

A test either passes or fails. testit prints failures. If you need CI integration, the exit code (0 = pass, non-zero = fail) is the universal interface. Elaborate progress bars are nice during interactive development but irrelevant for correctness.

skip_on_cran(), skip_on_os(), skip_if_not_installed()

An if (...) assert() does the same thing with zero framework overhead. The skip_* family is syntactic sugar — useful sugar, but not worth the extra dependencies.

withr integration (local_options(), local_envvar(), etc.)

withr is a fine standalone package. You can use it with testit just as easily. But base R's on.exit() has done this job since R 1.0. old <- options(x = y); on.exit(options(old)) is one line, has no dependencies, and is immediately understandable.
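For example, a test helper can change an option and restore it on exit using only base R. with_digits() is a hypothetical name used for illustration.

```r
# Hypothetical helper: format a number under a temporary `digits` option,
# restoring the previous options when the function exits.
with_digits <- function(x, digits) {
  old <- options(digits = digits)
  on.exit(options(old), add = TRUE)
  format(x)
}

getOption("digits")  # 7 in a default session
with_digits(pi, 3)   # "3.14"
getOption("digits")  # unchanged
```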

expect_snapshot_file() for binary/file output

testit's .md snapshot mechanism handles text output. For file-based comparisons, read the file into a string and compare it in the .md block (as demonstrated with the RTF tests in this conversion). For binary files, (identical(readBin(f1, "raw", file.size(f1)), readBin(f2, "raw", file.size(f2)))) is explicit and obvious.

expect_no_error(), expect_no_warning(), expect_no_condition()

If code errors, the test already fails — the error propagates and assert() catches it. "Expect no error" is the default state. You only need has_error() when you want to assert that something does error.

Auto-generated test skeletons, use_test(), etc.

These are IDE/usethis conveniences, not framework features. A test file is a plain R script. Create it however you create R scripts.

Summary

testit embodies a philosophy: a test framework should assert conditions and get out of the way. Everything else — mocking, parallelism, reporting, environment management — belongs in separate, purpose-built tools or in base R itself. The result is a testing system that is trivial to understand, impossible to misconfigure, and stable indefinitely.

yihui and others added 13 commits May 6, 2026 13:42
Replace the testthat test infrastructure with testit:
- test_that() -> assert()
- expect_equal() -> isTRUE(all.equal())
- expect_identical() -> %==%
- expect_true/false() -> (expr) / (!(expr))
- expect_error/warning() -> has_error/has_warning()
- Snapshot tests (as_gt, as_rtf) converted to testit's .md format
- Test runner: tests/test-all.R using test_pkg()
- DESCRIPTION: testthat dependency replaced with testit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upstream added as.data.frame() wrapping in test comparisons;
resolved by applying the same change in testit syntax.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use the new `message` parameter in has_error() to verify
specific error messages, matching the original testthat tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Since testit's assert() checks whether the value is TRUE,
all.equal() already returns TRUE on success, making isTRUE()
redundant. Replace (isTRUE(all.equal(a, b))) with (all.equal(a, b)).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ion 2

testthat edition 2's expect_equal() passes if EITHER the relative OR
absolute difference is within tolerance. Add an all_equal() helper
that replicates this behavior, and use it across all test files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yihui yihui requested review from LittleBeannie and jdblischak May 6, 2026 22:31
@yihui
Collaborator Author

yihui commented May 7, 2026

@jdblischak @LittleBeannie This PR is ready for review.

Comment thread tests/testit/helper.R Outdated
Comment thread tests/testit/test-developer-ahr.R Outdated
For tests comparing exact values (integers, rounded data frames, NULL,
lists), use testit's %==% operator instead of all_equal(). This gives
better failure diagnostics via str() output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator

@jdblischak jdblischak left a comment


I confirmed that the test coverage reported locally by covr::package_coverage() before and after is identical (93.12%) 🎉

A few ergonomic questions:

  • Why does {testit} need to be loaded in order to run test_pkg()? I couldn't find this in the docs.
packageVersion("testit")
## [1] ‘0.18.5’
testit::test_pkg()
## Error from sys.source2(r, envir = env, top.env = ns)
## Error in assert("unstratified population, compared with old version",  :
##   could not find function "assert"
  • Is it possible to suppress the many error messages printed to the R console? I assume these are from tests that use has_error()
library("testit")
test_pkg()
# Lots of error messages printed to the console, eg
## Error: missing value where TRUE/FALSE needed
## Error: missing value where TRUE/FALSE needed
## Error: `times` (c(1, 2, 1)) must be positive and strictly increasing!
## Error: `survival` (c("0.5", "NA")) must be positive!
## Error: `survival` must be of same length as `times`
## Error: `survival` (c(0.5, -0.1)) must be positive!
## Error: `survival` must be non-increasing
## Error: `survival` must be non-increasing

# But they all passed
.Last.value
## NULL
  • I observed that {testit} stops on the first error. Is there any plan to enable {testit} to collect test errors to spot patterns, or is this out of scope?

  • For quick feedback, I often use the argument filter of devtools::test(), eg devtools::test(filter = "npe"). Is it possible to do something similar with {testit}?

Comment thread .github/workflows/R-CMD-check.yaml Outdated
Comment thread tests/test-all.R
yihui and others added 4 commits May 7, 2026 16:08
- Tighten some all.equal() tests to use %==% for exact equality
- Extract lengthy LHS/RHS expressions of %==% and all.equal() into
  named variables (res, expected) for clarity
- Raise () assertions from inside loops to top-level using vapply()
- Break excessively long lines (>120 chars) into multiple lines
- Add parentheses for operator precedence with %==% (e.g., 3L * 5L)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yihui yihui force-pushed the convert-tests-to-testit branch from 53a54f1 to 4d9628c May 8, 2026 05:38
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yihui yihui force-pushed the convert-tests-to-testit branch 5 times, most recently from 8f9ffd3 to 98e7a7c May 8, 2026 06:52
@yihui yihui force-pushed the convert-tests-to-testit branch 2 times, most recently from a856dc8 to 49baf85 May 8, 2026 15:30
yihui and others added 2 commits May 8, 2026 11:38
…ackages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yihui yihui force-pushed the convert-tests-to-testit branch from 49baf85 to 29c5d05 May 8, 2026 15:40
yihui and others added 3 commits May 8, 2026 11:52
… comparisons

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove explicit tolerance where default (~1.5e-8) suffices; use scale = 1
for probability comparisons; set minimal explicit tolerance only where
cross-implementation differences require it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yihui
Collaborator Author

yihui commented May 9, 2026

@jdblischak All ergonomic issues have been addressed in testit. Many thanks for all your great suggestions! I think they improved the usability of this package tenfold.

yihui and others added 2 commits May 8, 2026 23:45
The test "s2pwe fails to identify infinity value" used `times2` (c(1, NA)
from a previous test block) instead of `times3` (c(1, Inf)). It appeared
to pass because `s2pwe(times = c(1, NA), ...)` does error, but the intent
was to test Inf handling. With the correct variable, s2pwe(times = c(1,
Inf), ...) returns a valid result (Inf is numeric and positive), so the
test is invalid — there is no Inf validation in s2pwe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Faithfully translate expect_error(expr, "message") from the original
testthat tests to has_error(expr, "message") in testit, preserving the
original message strings where the function still produces them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator

@jdblischak jdblischak left a comment


Given the recent updates to {testit} (thanks @yihui!), I am supportive of this migration of the testing framework.

I observed a slight reduction in the code coverage locally via covr::package_coverage() (93.12% to 92.91%), but according to Codecov the coverage is unchanged at 92.90%.

I would like to delay merging until 1) we switch to tagged versions of yihui/actions, and 2) the updated version of {testit} is uploaded to CRAN.

@yihui yihui force-pushed the convert-tests-to-testit branch from 218e0be to 9022c8a May 11, 2026 22:01
@yihui yihui force-pushed the convert-tests-to-testit branch from 9022c8a to eb40106 May 11, 2026 22:05
@yihui
Collaborator Author

yihui commented May 11, 2026

@jdblischak Actions have been tagged. The CRAN release of testit can be made at any time if you don't have further suggestions or requests.

Collaborator Author

@yihui yihui left a comment


This PR is ready for review. I'll release testit v1.0 to CRAN later today and switch to that version accordingly once it lands on CRAN.

To help reviewers understand the changes better, I need to clarify one extra thing I did in this PR, which I should have saved for another PR but I was not sure if it'd be worth torturing you one more time :) The extra thing was that I tightened the equality tests (i.e. expect_equal()) as much as possible for greater rigor. The logic is the following:

  1. Switch to identical() (or equivalently, %==% from testit) whenever possible. If the two objects are strictly identical to each other, we use %==% instead of testing for approximate equality.
  2. When we test things like nrow() or ncol() that are integers, we also try identical testing by changing the target to an integer, e.g., expect_equal(nrow(z1$`_footnotes`), 1) is changed to (nrow(z1$`_footnotes`) %==% 1L) (note the change from double 1 to integer 1L); this is because 1 is not identical to 1L although they are "equal".
  3. Then for the rest of equality tests, try all.equal() with default tolerance (i.e. sqrt(.Machine$double.eps)). If that works, we just use the default all.equal().
  4. If the default tolerance is too tight, we raise it to nearly the smallest tolerance that makes the test pass, e.g., if the actual difference is 0.00085, we use 0.001. In our original tests, explicitly provided tolerances were often too high, e.g., we used 0.01 where 0.003 would have worked.
  5. When comparing probabilities, we usually use the argument scale = 1 for testing the absolute difference between two probabilities. This gives us a crystal clear idea about how much the two probabilities differ.
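Steps 3 to 5 can be illustrated with base all.equal(). Setting tolerance = 0 makes it report the observed difference, which helps pick the smallest tolerance that passes (step 4), and scale = 1 switches to an absolute comparison (step 5). The probability values below are made up for illustration.

```r
# Made-up probabilities from an old and a new implementation.
p_old <- c(0.8512, 0.4321)
p_new <- c(0.8519, 0.4315)

# tolerance = 0 never passes unless the values are equal, so the returned
# string reports the observed difference.
all.equal(p_old, p_new, tolerance = 0)             # mean relative difference
all.equal(p_old, p_new, tolerance = 0, scale = 1)  # mean absolute difference, about 6.5e-4

# The default tolerance (about 1.5e-8) is too tight here...
isTRUE(all.equal(p_old, p_new, scale = 1))                     # FALSE
# ...so raise it to a small value just above the observed difference.
isTRUE(all.equal(p_old, p_new, tolerance = 0.001, scale = 1))  # TRUE
```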

BTW, if test coverage is a concern, I think we can easily address it in the next PR.

I'll also add a skill to give AI models instructions on how to write tests with testit in the future.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator

@jdblischak jdblischak left a comment


The improvements to the tolerances are very welcome! These have long been a source of frustration for me.

Let's merge once the latest {testit} is available from CRAN.

Comment thread .claude/skills/write-tests.md
Comment thread .claude/skills/write-tests.md Outdated
Comment thread .github/workflows/R-CMD-check.yaml Outdated
Comment thread DESCRIPTION Outdated
Collaborator

@jdblischak jdblischak left a comment


🚀

@jdblischak
Collaborator

Just noticed this NOTE. Could you please add .claude to .Rbuildignore?

* checking for hidden files and directories ... NOTE
Found the following hidden files and directories:
  .claude
These were most likely included in error. See section ‘Package
structure’ in the ‘Writing R Extensions’ manual.

@yihui
Collaborator Author

yihui commented May 13, 2026

Sure. Done.

@jdblischak
Collaborator

Reminder to please squash and merge this PR

@yihui
Collaborator Author

yihui commented May 13, 2026

I'm not sure who the admins of this repo are, but they can go to https://github.com/Merck/gsDesign2/settings and disable "merge commits" and "rebase merging", which is what I do for all my repositories, since I rarely need the full commit history of a PR and I always squash and merge.


3 participants