Update R pre-commit package version and fix broken tests by jeancochrane · Pull Request #64 · ccao-data/ptaxsim

jeancochrane · 2026-01-05T22:44:04Z

Background

This PR implements two fixes for problems with our automated checks that are preventing us from merging PRs:

The R pre-commit package version that we use is out of date, which causes it to try to install R dependencies that are no longer compatible with R 4.5.x (most notably digest). This PR updates our R pre-commit package version to get it working with R 4.5.x.
Two snapshot tests for the lookup_agency() function are failing for reasons that I can't quite figure out. This PR switches to a more modern version of the snapshot test that resolves the error and will provide better output if it ever fails again in the future. (See the section below for more details.)

Failing snapshot tests

We use two snapshot tests in order to test that the lookup_agency() function returns exactly the same output for two medium-sized queries. Those tests are currently implemented using the testthat assertion function expect_known_hash():

ptaxsim/tests/testthat/test-lookup.R

Lines 229 to 236 in eb1e3e8

    
    expect_known_hash(
      lookup_agency(2014:2019, "12064"),
      "cf6dcb93bf"
    )
    expect_known_hash(
      lookup_agency(sum_df$year, sum_df$tax_code),
      "30ede4ede0"
    )

For some reason, these two lookup_agency() calls return a different hash on CI than they do locally (see here for example CI logs). The hash matches the expected value in the test when I run it on my local machine, but the hash is different when the test runs on the test-coverage GitHub workflow.

I spent a few hours trying to figure out the source of the discrepancy but I couldn't quite get it. During my investigation, I wrote a script to confirm that the local and CI dataframes have exactly the same contents but different object hashes. In order to run this script, you'll need to manually download the following CI artifacts and save them to the corresponding filename in your ptaxsim/ directory (I didn't bother scripting this download because it requires a GitHub auth token):

agency-2014-2019-ci.zip: https://github.com/ccao-data/ptaxsim/actions/runs/20830893132/artifacts/5068218666
agency-summary-ci.zip: https://github.com/ccao-data/ptaxsim/actions/runs/20830893132/artifacts/5068218740

Script code:

library(ptaxsim)

Sys.setenv(PTAXSIM_DB_PATH = "ptaxsim.db")
ptaxsim_db_conn <- DBI::dbConnect(
  RSQLite::SQLite(),
  Sys.getenv("PTAXSIM_DB_PATH")
)
assign("ptaxsim_db_conn", ptaxsim_db_conn, envir = .GlobalEnv)

# Download these .zip files from CI and save them to the current working directory
agency_2014_to_2019_ci_zip_url <- "https://github.com/ccao-data/ptaxsim/actions/runs/20830893132/artifacts/5068218666"
agency_summary_ci_zip_url <- "https://github.com/ccao-data/ptaxsim/actions/runs/20830893132/artifacts/5068218740"

# Function to extract the RDS file from the CI .zip files whose paths are listed above
extract_rds_from_zip <- function(zip_path, extract_dir) {
  unzip(zip_path, exdir = extract_dir)
  rds_files <- list.files(extract_dir, pattern = "\\.rds$", full.names = TRUE, recursive = TRUE)
  if (length(rds_files) == 0) stop("No RDS files found in: ", zip_path)
  rds_files[1]
}

# Extract and read the CI RDS files
dir.create("agency-2014-2019-ci", showWarnings = FALSE)
agency_2014_to_2019_ci_rds_path <- extract_rds_from_zip(
  file.path("agency-2014-2019-ci.zip"),
  file.path("agency-2014-2019-ci")
)
agency_2014_to_2019_ci_df <- readRDS(agency_2014_to_2019_ci_rds_path)

dir.create("agency-summary-ci", showWarnings = FALSE)
agency_summary_ci_rds_path <- extract_rds_from_zip(
  "agency-summary-ci.zip",
  "agency-summary-ci"
)
agency_summary_ci_df <- readRDS(agency_summary_ci_rds_path)

# Load the local data frames
agency_2014_to_2019_local_df <- lookup_agency(2014:2019, "12064")
agency_summary_local_df <- lookup_agency(
  sample_tax_bills_summary$year,
  sample_tax_bills_summary$tax_code
)

# Compare column names
if (!identical(names(agency_2014_to_2019_ci_df), names(agency_2014_to_2019_local_df))) {
  cat("CI columns:    ", paste(names(agency_2014_to_2019_ci_df), collapse = ", "), "\n")
  cat("Local columns: ", paste(names(agency_2014_to_2019_local_df), collapse = ", "), "\n")
  stop("agency_2014_to_2019: Column names do not match (see above for info)")
}

# Compare column types
ci_types <- sapply(agency_2014_to_2019_ci_df, class)
local_types <- sapply(agency_2014_to_2019_local_df, class)
if (!identical(ci_types, local_types)) {
  cat("agency_2014_to_2019: Column types do not match\n")
  cat("CI types:    ", paste(ci_types, collapse = ", "), "\n")
  cat("Local types: ", paste(local_types, collapse = ", "), "\n")
  stop("agency_2014_to_2019: Column types do not match")
}

# Compare values
if (!isTRUE(all.equal(agency_2014_to_2019_ci_df, agency_2014_to_2019_local_df))) {
  cat("agency_2014_to_2019: Column values are not identical\n")
  diff_rows <- which(as.matrix(agency_2014_to_2019_ci_df) != as.matrix(agency_2014_to_2019_local_df), arr.ind = TRUE)
  cat("First few differences (row, col):\n")
  print(head(diff_rows))
  stop("agency_2014_to_2019: Column values are not identical")
}

# Repeat checks for agency_summary
if (!identical(names(agency_summary_ci_df), names(agency_summary_local_df))) {
  cat("agency_summary: Column names do not match\n")
  cat("CI columns:    ", paste(names(agency_summary_ci_df), collapse = ", "), "\n")
  cat("Local columns: ", paste(names(agency_summary_local_df), collapse = ", "), "\n")
  stop("agency_summary: Column names do not match")
}

ci_types <- sapply(agency_summary_ci_df, class)
local_types <- sapply(agency_summary_local_df, class)
if (!identical(ci_types, local_types)) {
  cat("agency_summary: Column types do not match\n")
  cat("CI types:    ", paste(ci_types, collapse = ", "), "\n")
  cat("Local types: ", paste(local_types, collapse = ", "), "\n")
  stop("agency_summary: Column types do not match")
}

if (!isTRUE(all.equal(agency_summary_ci_df, agency_summary_local_df))) {
  cat("agency_summary: Column values are not identical\n")
  diff_indices <- which(as.matrix(agency_summary_ci_df) != as.matrix(agency_summary_local_df), arr.ind = TRUE)
  cat("First few differences (row, col):\n")
  print(head(diff_indices))
  stop("agency_summary: Column values are not identical")
}

# Print hashes, as a final check to demonstrate that the objects are different
# even though their contents are identical
cat("agency_2014_to_2019 local hash: ", digest::digest(agency_2014_to_2019_local_df), "\n")
cat("agency_2014_to_2019 CI hash:    ", digest::digest(agency_2014_to_2019_ci_df), "\n")
cat("agency_summary local hash:      ", digest::digest(agency_summary_local_df), "\n")
cat("agency_summary CI hash:         ", digest::digest(agency_summary_ci_df), "\n")
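
One caveat about the value comparison in the script above: all.equal() compares numeric columns within a tolerance (roughly 1.5e-8 by default), so it can report equality for objects whose serialized bytes, and therefore hashes, still differ. The base R sketch below illustrates that general behavior with toy data. It is not a diagnosis of the lookup_agency() discrepancy, and the data frames `a` and `b` are made up for illustration.

```r
# Toy data frames (hypothetical, not from ptaxsim): the first value differs
# only by floating-point rounding error
a <- data.frame(x = c(0.1 + 0.2, 1))
b <- data.frame(x = c(0.3, 1))

# all.equal() applies a numeric tolerance, so it treats these as equal
tolerant_equal <- isTRUE(all.equal(a, b))

# identical() is exact, so it catches the difference
exactly_equal <- identical(a, b)

# Object hashes are derived from the serialized bytes, which also differ
same_bytes <- identical(serialize(a, NULL), serialize(b, NULL))

cat("all.equal:          ", tolerant_equal, "\n")
cat("identical:          ", exactly_equal, "\n")
cat("same serialization: ", same_bytes, "\n")
```

If this kind of difference is a suspect, swapping all.equal() for identical() in the script would be a stricter check, since identical() also compares attributes.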

We shouldn't really be using expect_known_hash() for these tests anymore, because it is deprecated in the 3rd edition of testthat. Instead, testthat now recommends expect_snapshot_output() and expect_snapshot_value() for snapshot tests. Besides being recommended, these functions provide verbose error output that shows exactly which rows mismatch when a snapshotted dataframe changes. This is nice for our lookup_agency() tests -- it's very difficult to debug a test failure based on the object hash changing (as my script above demonstrates), but since the new expect_snapshot_*() tests compare against stored output rather than object hashes, they will be able to show us exactly why the output differs from the snapshot if the test fails in the future.
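
For anyone unfamiliar with the snapshot interface, here is a minimal sketch of what such a test can look like. It assumes testthat is installed and uses a made-up data frame, `toy_agency_df`, as a stand-in for real lookup_agency() output; the test name is also hypothetical.

```r
library(testthat)

# Hypothetical stand-in for the data frame returned by lookup_agency()
toy_agency_df <- data.frame(
  year = c(2018L, 2019L),
  tax_code = c("12064", "12064"),
  stringsAsFactors = FALSE
)

test_that("toy lookup output matches the stored snapshot", {
  local_edition(3) # Snapshot expectations require the 3rd edition

  # The first run records the value in a snapshot file; later runs
  # compare the current value against that file
  expect_snapshot_value(toy_agency_df, style = "json2")
})
```

This is meant to live in a file under tests/testthat/ and run through the test runner, which stores snapshots under tests/testthat/_snaps/. When the code is run interactively, testthat cannot compare against a stored snapshot, so the expectation is not actually checked.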

This snapshotting stuff may be unfamiliar so I'm happy to talk it through in person if it would be helpful!

jeancochrane · 2026-01-08T23:23:34Z

-
-

I couldn't find any documentation of this change, but it seems that the latest version of styler bundled with pre-commit is now enforcing a max of two newlines between code blocks. (See here for failing pre-commit logs.) It's possible we could choose to update our styler config to override this setting, but I personally agree that two newlines should be the maximum amount of space between code blocks, so I decided to just implement it across the files that are currently using a max of four newlines.

jeancochrane · 2026-01-08T23:29:56Z

@@ -0,0 +1,70 @@
+# lookup values/data are correct


This is an example of a snapshot file -- expect_snapshot_value() generates it automatically the first time it runs, and then on subsequent runs it compares the output of the lookup_agency() function to this file.

jeancochrane · 2026-01-08T23:30:57Z

-    lookup_agency(2014:2019, "12064"),
-    "cf6dcb93bf"
+
+  local_edition(3) # Enable snapshot testing


This is required because the new snapshot tests are part of the 3rd edition of testthat, which is opt-in. I'm choosing to opt in for only this one test, since I suspect that opting in for the whole package would require migrating other tests to meet the new standard, and I don't want to bother with that right now.

jeancochrane · 2026-01-08T23:36:52Z

+    return_linter = NULL,
+    commented_code_linter = NULL,
+    pipe_consistency_linter = pipe_consistency_linter(c("auto"))


These defaults have all changed in the latest version of lintr. It might be worth conforming to the new lintr defaults at some point, but I don't want to deal with it right now, so I'm just reverting to the previous defaults.
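
For context, the three lines in the diff are arguments to lintr's linters_with_defaults(). The sketch below shows the equivalent call in plain R, assuming a lintr version recent enough to ship these linters; the variable name `linters` is only for illustration and the repo's actual config file may contain other settings.

```r
library(lintr)

# Start from lintr's default linters, then override three of them
linters <- linters_with_defaults(
  return_linter = NULL,         # NULL removes a linter from the defaults
  commented_code_linter = NULL, # likewise removed
  # "auto" accepts either pipe as long as a file uses one consistently
  pipe_consistency_linter = pipe_consistency_linter("auto")
)

# Inspect which linters remain active
print(names(linters))
```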

jeancochrane · 2026-01-08T23:37:29Z

    hooks:
    -   id: check-added-large-files
-        args: ['--maxkb=200']
+        args: ['--maxkb=500']


One of the snapshot test files is slightly larger than 200 KB. We probably shouldn't make a habit of committing large files to the repo, but I think one medium-sized snapshot file is fine, so I'm bumping this limit to 500 KB to allow my PR to pass this check.

kyrasturgill

Everything makes sense to me! Thanks for the thorough explanation of these updates.

jeancochrane added 14 commits January 5, 2026 16:30

Update pre-commit version

34602cf

Add pipe_consistency_linter to lintr ignores

a3e291d

Update pipe_consistency_linter to match current practice for the repo

1247c26

Fix quadruple whitespace that is no longer allowed by style-files

d4c30ae

Tweak failing tests to print output on failure

50ec63f

Change print statements to cat CSV during test debugging

5ed5e56

Temporarily set up tmate session in test-coverage workflow for debugging

c9bfa62

Upload lookup_agency output to GitHub workflow artifacts for further debugging

6c9ceb8

Instead of uploading lookup_agency output, print debug info

128edb9

Revert "Instead of uploading lookup_agency output, print debug info"

d77e95d

This reverts commit 128edb9.

Write agency summary lookup to CSV for debugging

4e46a3a

Save RDS instead of CSV for debugging

64f97ba

Remove debugging from test-coverage.yaml

27605c8

Change snapshot tests to use expect_snapshot_value

1aed6fe

jeancochrane changed the title ~~Update R pre-commit package version~~ Update R pre-commit package version and fix broken tests Jan 8, 2026

jeancochrane commented Jan 8, 2026

Add comment for json2 serialization to expect_snapshot_value test

3aff22b

jeancochrane marked this pull request as ready for review January 8, 2026 23:48

jeancochrane requested a review from kyrasturgill as a code owner January 8, 2026 23:48

kyrasturgill approved these changes Jan 13, 2026

jeancochrane changed the base branch from master to 2024-data-update January 14, 2026 16:08

jeancochrane merged commit afae33d into 2024-data-update Jan 14, 2026
9 checks passed

jeancochrane deleted the jeancochrane/fix-pre-commit branch January 14, 2026 16:17

jeancochrane mentioned this pull request Jan 28, 2026

Fix broken expect_known_hash test by switching to new snapshot interface ccao-data/ccao#48

Merged

jeancochrane mentioned this pull request Apr 15, 2026

Update sample tax bills for 2024 #79

Merged

jeancochrane merged 15 commits into 2024-data-update from jeancochrane/fix-pre-commit
