Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The max_atomic arg might generate failing code #29

Closed
andreranza opened this issue Aug 31, 2022 · 6 comments
Closed

The max_atomic arg might generate failing code #29

andreranza opened this issue Aug 31, 2022 · 6 comments

Comments

@andreranza
Copy link
Member

library(tidyverse)
library(readxl)
library(rstudioapi)

sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur/Monterey 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] rstudioapi_0.14 readxl_1.4.1    forcats_0.5.2   stringr_1.4.1  
#>  [5] dplyr_1.0.9     purrr_0.3.4     readr_2.1.2     tidyr_1.2.0    
#>  [9] tibble_3.1.8    ggplot2_3.3.6   tidyverse_1.3.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] lubridate_1.8.0     assertthat_0.2.1    digest_0.6.29      
#>  [4] utf8_1.2.2          R6_2.5.1            cellranger_1.1.0   
#>  [7] backports_1.4.1     reprex_2.0.2        evaluate_0.16      
#> [10] httr_1.4.4          highr_0.9           pillar_1.8.1       
#> [13] rlang_1.0.4         googlesheets4_1.0.1 R.utils_2.12.0     
#> [16] R.oo_1.25.0         rmarkdown_2.16      styler_1.7.0       
#> [19] googledrive_2.0.0   munsell_0.5.0       broom_1.0.0        
#> [22] compiler_4.2.0      modelr_0.1.9        xfun_0.32          
#> [25] pkgconfig_2.0.3     htmltools_0.5.3     tidyselect_1.1.2   
#> [28] fansi_1.0.3         crayon_1.5.1        tzdb_0.3.0         
#> [31] dbplyr_2.2.1        withr_2.5.0         R.methodsS3_1.8.2  
#> [34] grid_4.2.0          jsonlite_1.8.0      gtable_0.3.0       
#> [37] lifecycle_1.0.1     DBI_1.1.3           magrittr_2.0.3     
#> [40] scales_1.2.1        cli_3.3.0           stringi_1.7.8      
#> [43] fs_1.5.2            xml2_1.3.3          ellipsis_0.3.2     
#> [46] generics_0.1.3      vctrs_0.4.1         tools_4.2.0        
#> [49] R.cache_0.16.0      glue_1.6.2          hms_1.1.2          
#> [52] fastmap_1.1.0       yaml_2.3.5          colorspace_2.0-3   
#> [55] gargle_1.2.0        rvest_1.0.3         knitr_1.40         
#> [58] haven_2.5.1

devtools::install_github("cynkra/constructive")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'constructive' from a github remote, the SHA1 (753b2da6) has not changed since last install.
#>   Use `force = TRUE` to force installation

sanl_path <- file.path(Sys.getenv("FBEA_DATADIR"), "SANL/SANL_2020.xlsx")

raw <- suppressMessages(suppressWarnings(read_xlsx(sanl_path))) 

one_dttm_of_sanl <- slice(raw, 1) %>% 
  select(11) %>% 
  rename_with(.fn = ~ "a_dttm")

ptype_sanl <- constructive::construct(one_dttm_of_sanl, max_atomic = 0)

sendToConsole(ptype_sanl)
#> Error in if (nzchar(tz)) attr(res, "tzone") <- tz : 
#>  argument is of length zero

Created on 2022-08-31 with reprex v2.0.2

@moodymudskipper
Copy link
Collaborator

Minimal example and comments inline :

time <- as.POSIXct("2022-01-01 00:00:00", "UTC")
constructive::construct(time, max_atomic = 0)
#> as.POSIXct(character(0), tz = character(0))

# Fails because we need a TZ
as.POSIXct(character(0), tz = character(0))
#> Error in if (nzchar(tz)) attr(res, "tzone") <- tz: argument is of length zero

# For a prototype we might expect
as.POSIXct(character(0), tz = "UTC")
#> POSIXct of length 0
identical(as.POSIXct(character(0), tz = "UTC"), vctrs::vec_ptype(time))
#> [1] TRUE

# But that shows the definitions of a prototype and of max_atomic = 0 might be different
# Is this the right max_atomic = 0 version ?
as.POSIXct(character(0)) |>
  structure(tzone = character(0))
#> POSIXct of length 0

# or can we say that we can't guarante code to run when max_atomic is set (because we can't)
# and not go through a rabbit hole of exceptions ? 

# Also what about objects with a class and no implemented constructor ? 
x <- 1
class(x) <- "foo"
constructive::construct(x, max_atomic = 0)
#> numeric(0) |>
#>   structure(class = character(0))

In the general case we can't make an exception for class and attributes, because they can be complex (classes themselves can have attributes), so the latter probably has to stay this way. But's important to know that while the standard use enforces equivalence of original and constructed objects, max_atomic trims it in a way that is dependent on the implemented constructors, and thus the output might change as the package gets more constructors.

A corollary is that prototypes are defined by arbitrary rules, or at least rules
that I haven't formalized yet (Since the rule "everything" length 0 is not accurate), so if we want to construct a prototype, use vctrs::vec_ptype() first (it made and knows the rules), then construct in a standard way.


TLDR :

  • We can't say max_atomic = 0 means prototype
  • The correct way to handle a POSIXct with max_atomic = 0 might be as.POSIXct(character(0)) |> structure(tzone = character(0))
  • OR the current way might be right, but must be better documented.
  • If we need to construct a prototype we should use constructive::construct(vctrs::vec_ptype(data)) without max_atomic arg (@andreranza that's the part for you)

@moodymudskipper
Copy link
Collaborator

If we don't change the code construction, which I currently tend to believe we shouldn't, it might still be good to try by default to run the code, even if check is FALSE, and warn the user when we output failing code.

@andreranza
Copy link
Member Author

Thanks!

Just to recall, from dput_small1() you got as.POSIXct(character(0)) |> structure(tzone = character(0)) when dealing with POSIXct, so at that time you might had this representation in mind (?).

Just for curiosity, is there a theoretical reason for which you would tend to avoid as.POSIXct(character(0)) |> structure(tzone = character(0))?

I see the point that {vctrs} seems to have a different opinion on actually "what is a prototype for a POSIXct". What I see is that for {vctrs} a valid time zone should be provided. And, there might actually be a valid "theoretical" reason that underpins this "choice" that I am not aware of, or ignore (btw if there is a theoretical reason there this not even a "choice", but just a fact that follows, right?).

So, I cannot see, due to my illiteracy on this topic, what is the degree of "subjectivity" involved in the definition of a "valid prototype".

As far as I am concerned, as.POSIXct(character(0)) |> structure(tzone = character(0)) would be a valid prototype. I don't see the tzone necessary. But this is just my "illiterate" opinion 😅 driven but the following mere "practical" reason:

as.POSIXct(character(0)) |> structure(tzone = character(0)) runs on the console, whereas as.POSIXct(character(0), tz = character(0)) not.

@moodymudskipper
Copy link
Collaborator

Thanks!

Just to recall, from dput_small1() you got as.POSIXct(character(0)) |> structure(tzone = character(0)) when dealing with POSIXct, so at that time you might had this representation in mind (?).

I think it was just using calling dput on zero length elements

Just for curiosity, is there a theoretical reason for which you would tend to avoid as.POSIXct(character(0)) |> structure(tzone = character(0))?

Because I think max_atomic's role might be just to show less, not to change the way it's shown. The core feature is not to modify the input but to represent it. I think it's something I oversaw when starting to talk about building prototypes with this package and it's probably a mistake.
In the code max_atomic is only leveraged when attempting to construct atomic types, the rest of the skeleton is built from the real input.

Also if we did this for POSIXct we'd need to consider exceptions for all constructors to try to build something that works from zero length vectors, and for some it might be impossible anyway.

I see the point that {vctrs} seems to have a different opinion on actually "what is a prototype for a POSIXct". What I see is that for {vctrs} a valid time zone should be provided. And, there might actually be a valid "theoretical" reason that underpins this "choice" that I am not aware of, or ignore (btw if there is a theoretical reason there this not even a "choice", but just a fact that follows, right?).
So, I cannot see, due to my illiteracy on this topic, what is the degree of "subjectivity" involved in the definition of a "valid prototype".

I think by default vctrs::vec_type() reduces the vec_size to zero (approx same as length() except for data frames where it's now()) but keeps attributes intact, in this case it's a simple rule, but vec_ptype is a generic so any more sophisticated behaviours can be implemented.

So construct(vec_ptype()) is the proper idiom if we want a prototype to use with vec_assert() or vec_cast()

Maybe the two latter functions in the end just check names, class and attributes though.

As far as I am concerned, as.POSIXct(character(0)) |> structure(tzone = character(0)) would be a valid prototype. I don't see the tzone necessary. But this is just my "illiterate" opinion 😅 driven but the following mere "practical" reason:

as.POSIXct(character(0)) |> structure(tzone = character(0)) runs on the console, whereas as.POSIXct(character(0), tz = character(0)) not.

I don't say as.POSIXct(character(0), tz = character(0)) is a prototype, it's not.
It's the short representation of a POSIXct vector with a non NULL timezone however. And short representations don't have to run they need to have the same call structure as the complete representation and be shorter.

We will dissociate completely the concept of max_atomic = 0 with the concept of prototype.

@moodymudskipper moodymudskipper changed the title Error in if (nzchar(tz)) attr(res, "tzone") <- tz : argument is of length zero The max_atomic arg might generate failing code Sep 1, 2022
@moodymudskipper
Copy link
Collaborator

Let's deal with this in #19

@github-actions
Copy link
Contributor

github-actions bot commented Oct 4, 2023

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

2 participants