Unexpected behavior when using custom read function #8

Closed
FedericoComoglio opened this issue Oct 11, 2023 · 3 comments

Comments

@FedericoComoglio

Hi @DavZim,

in the following example from the README

dataverifyr/README.md

Lines 157 to 176 in f0e718f

One helpful use case is to use this functionality to assert that your
data has the right values in a custom read function like so:
``` r
read_custom <- function(file, rules) {
  data <- read.csv(file) # or however you read in your data
  # if the check_data detects a fail: the read_custom function will stop
  check_data(data, rules, xname = file,
             fail_on_error = TRUE, fail_on_warn = TRUE)
  # ...
  data
}
# nothing happens when the data matches the rules
data <- read_custom("correct_data.csv", rules)
# an error is thrown when warnings or errors are found
data <- read_custom("wrong_data.csv", rules)
#> Error in check_data(data, rules, fail_on_error = TRUE, fail_on_warn = TRUE) :
#> In dataset 'wrong_data.csv' found 1 warnings and 1 errors
```

one would expect check_data to abort parsing when a data point/observation/row fails verification. However, this doesn't seem to be the implemented behavior, which relies on warnings and/or errors being thrown instead. To illustrate my point, see the following reprex.

renv::use("DavZim/dataverifyr")
renv::use("dplyr@1.1.3")
renv::use("magrittr@2.0.3")

library(dataverifyr)
library(magrittr)

rules <- ruleset(
  rule(mpg > 10 & mpg < 30), # mpg goes up to 34
  rule(cyl %in% c(4, 8)), # missing 6 cyl
  rule(vs %in% c(0, 1), allow_na = TRUE)
)

# stop parsing data when verification fails
read_custom <- function(file, rules) {
  data <- readr::read_csv(file) # or however you read in your data
  # expected: if the check_data detects a fail: the read_custom function will stop
  check_data(data, rules, fail_on_error = TRUE, fail_on_warn = TRUE)
  data
}

data_pass <- mtcars %>%
  dplyr::filter(mpg > 10 & mpg < 30, cyl %in% c(4, 8))

data_fail <- mtcars %>%
  dplyr::filter(cyl == 6)

readr::write_csv(data_pass, "data_pass.csv")
readr::write_csv(data_fail, "data_fail.csv")

# nothing happens when the data matches the rules
read_custom("data_pass.csv", rules)

# this one should fail, but doesn't
read_custom("data_fail.csv", rules)

# no warning/error returned (expected)
data <- readr::read_csv("data_fail.csv")
check_data(data, rules)

Let me know if this makes sense and/or if I'm missing something. Thank you!

Finally, also note that the README call

check_data(data, rules, xname = file,

is expected to fail (xname is no longer an argument).

@DavZim
Owner

DavZim commented Oct 13, 2023

Thanks for pointing this out. There was indeed a logical error: fail_on_error would only stop when an error was thrown while evaluating one of the rules, not when rows failed a rule. I have renamed the arguments to stop_on_error etc. and added stop_on_fail to make the functionality more obvious.
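
To illustrate the distinction described above, here is a minimal base R sketch. It does not use dataverifyr internals: `check_rule` is a hypothetical helper written only for this illustration, and it simply separates "rows failed the rule" from "evaluating the rule threw an error".

``` r
# Hypothetical helper (not part of dataverifyr): evaluate one rule expression
# against a data frame and report failing rows and any evaluation error.
check_rule <- function(data, expr) {
  tryCatch({
    ok <- eval(expr, envir = data, enclos = parent.frame())
    list(fail = sum(!ok, na.rm = TRUE), error = NA_character_)
  }, error = function(e) {
    list(fail = NA_integer_, error = conditionMessage(e))
  })
}

# Rows with cyl == 6 fail this rule, but evaluating it throws no error
res_fail <- check_rule(mtcars, quote(cyl %in% c(4, 8)))

# Here the evaluation itself errors because the column does not exist
res_err <- check_rule(mtcars, quote(not_a_column > 1))

# A check that only looks for evaluation errors misses the failing rows ...
would_stop_on_error <- !is.na(res_fail$error)
# ... whereas a check on the number of failed rows catches them
would_stop_on_fail <- res_fail$fail > 0
```

With the renamed arguments, the call inside read_custom would presumably become something like check_data(data, rules, stop_on_fail = TRUE); the exact signature should be checked against the package documentation.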

@DavZim DavZim closed this as completed in 283eaf5 Oct 13, 2023
@FedericoComoglio
Author

Thank you very much, @DavZim. I tested the latest devel version and I confirm it now works as expected. I find the new argument names more informative and clearly scoped. Thanks!

@DavZim
Owner

DavZim commented Oct 16, 2023

That's good to hear. I'll wait a couple of days, then I'll push the updates to CRAN.
If you have any other feedback, let me know!
