by groups which are actually in the data #84

Closed
om9391 opened this issue Jul 20, 2022 · 2 comments
Labels
enhancement New feature or request

om9391 commented Jul 20, 2022

Description

Unwanted records could be filtered out during post-processing, but is there an option to create rows only for those by-group combinations that are actually present in the dataset?

Steps to Reproduce (Bug Report Only)

library(Tplyr)
library(tibble)
library(dplyr)

adsl <-
  tribble(~ USUBJID, ~ SAFFL, ~TRT01AN, ~TRT01A,
          "101-001", "Y",     1,        "TRT A",
          "101-002", "Y",     2,        "TRT B",
          "101-003", "Y",     1,        "TRT A",
          "101-004", "Y",     2,        "TRT B")

adpe <- tribble(~USUBJID,  ~AVISITN, ~AVISIT,     ~PARAMN, ~PARAM,   ~TRT01AN, ~TRT01A,    ~AVALC,   ~SAFFL,
                "101-001",  -10,      "Screening",  1,      "Head",    1,     "TRT A",  "Normal", "Y",
                "101-001",  -10,      "Screening",  2,      "Lungs",   1,     "TRT A",  "Normal", "Y",
                "101-001",  -1,       "Day -1",     2,      "Lungs",   1,     "TRT A",  "Normal", "Y",
                "101-001",   5,       "Day 5",      2,      "Lungs",   1,     "TRT A",  "Normal", "Y",
                
                "101-002",  -10,      "Screening",  1,      "Head",    2,     "TRT B",  "Normal", "Y",
                "101-002",  -10,      "Screening",  2,      "Lungs",   2,     "TRT B",  "Normal", "Y",
                "101-002",  -1,       "Day -1",     2,      "Lungs",   2,     "TRT B",  "Normal", "Y",
                "101-002",   5,       "Day 5",      2,      "Lungs",   2,     "TRT B",  "Normal", "Y",
                
                "101-003",  -10,      "Screening",  1,      "Head",    1,     "TRT A",  "Normal", "Y",
                "101-003",  -10,      "Screening",  2,      "Lungs",   1,     "TRT A",  "Abnormal", "Y",
                "101-003",  -1,       "Day -1",     2,      "Lungs",   1,     "TRT A",  "Normal", "Y",
                "101-003",   5,       "Day 5",      2,      "Lungs",   1,     "TRT A",  "Abnormal", "Y",
)

adpe$AVALC <- factor(adpe$AVALC, levels = c("Normal", "Abnormal"))

t <- tplyr_table(adpe, TRT01A, where = SAFFL == 'Y') %>%
  set_pop_data(adsl) %>%
  set_pop_treat_var(TRT01A) %>%
  set_pop_where(SAFFL =="Y") %>%
  add_layer(
    group_count(AVALC, by = vars(PARAM, AVISIT)) %>%
     set_distinct_by(USUBJID) %>%
     set_denoms_by(TRT01A) 
  ) 

dat <- build(t) %>%
  arrange(ord_layer_1, ord_layer_2, ord_layer_3)

Expected behavior:

For PARAM = 'Head', records should be created only for 'Screening', not for 'Day -1' or 'Day 5'.

Actual behavior:

The output contains the full Cartesian product of visits and parameters, including combinations that never occur in the data.
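
The post-processing filter mentioned in the description could look roughly like the sketch below. It assumes the built table carries PARAM in row_label1, AVISIT in row_label2 and AVALC in row_label3; `built` and `observed` are hand-written stand-ins used for illustration, not real build() output.

```r
library(dplyr)
library(tibble)

# PARAM/AVISIT pairs that actually occur in the analysis data (stand-in)
observed <- tribble(
  ~PARAM,  ~AVISIT,
  "Head",  "Screening",
  "Lungs", "Screening",
  "Lungs", "Day -1",
  "Lungs", "Day 5"
)

# Stand-in for the built table: every PARAM x AVISIT x AVALC combination.
# The row_label* column names are an assumption about the output layout.
built <- expand.grid(
  row_label1 = c("Head", "Lungs"),
  row_label2 = c("Screening", "Day -1", "Day 5"),
  row_label3 = c("Normal", "Abnormal"),
  stringsAsFactors = FALSE
) %>%
  as_tibble()

# Keep only the rows whose PARAM/AVISIT pair exists in the data
filtered <- semi_join(
  built, observed,
  by = c("row_label1" = "PARAM", "row_label2" = "AVISIT")
)
```

With real data, `observed` would presumably come from something like `distinct(adpe, PARAM, AVISIT)`.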

mstackhouse added the enhancement label Oct 10, 2022

mstackhouse (Contributor) commented

Thanks, @om9391 - I can think of a couple of different workarounds for this (for example, you could pre-process by concatenating the visit and PARAMCD into a single variable, then post-process to separate them again), but that's clearly not ideal.
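
A minimal sketch of that concatenation workaround follows. It uses PARAM rather than PARAMCD because that is the variable in the reprex; the PARAM_VISIT column, the "|" separator and the reduced dataset are assumptions made for illustration, and the sketch relies on the by variable landing in row_label1 of the built output.

```r
library(Tplyr)
library(dplyr)
library(tidyr)
library(tibble)

# Reduced version of the reprex data
adpe <- tribble(
  ~USUBJID,  ~AVISIT,     ~PARAM,  ~TRT01A, ~AVALC,
  "101-001", "Screening", "Head",  "TRT A", "Normal",
  "101-001", "Screening", "Lungs", "TRT A", "Normal",
  "101-001", "Day -1",    "Lungs", "TRT A", "Normal",
  "101-002", "Screening", "Head",  "TRT B", "Normal",
  "101-002", "Screening", "Lungs", "TRT B", "Abnormal",
  "101-002", "Day -1",    "Lungs", "TRT B", "Normal"
)

# Pre-process: a single combined by variable, so only observed pairs exist
adpe2 <- adpe %>%
  mutate(PARAM_VISIT = paste(PARAM, AVISIT, sep = "|"))

tab <- tplyr_table(adpe2, TRT01A) %>%
  add_layer(
    group_count(AVALC, by = PARAM_VISIT)
  )

# Post-process: split the combined row label back into two columns
out <- build(tab) %>%
  separate(row_label1, into = c("PARAM", "AVISIT"), sep = "\\|")
```

Row order would likely need separate handling, since the combined label no longer sorts by visit number.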

You're right that right now it's basically a Cartesian join. @elimillera uses tidyr::complete() with the by variables to populate everything. As far as implementation goes, I can also see use cases where the row labels are more than just data driven. For example, you might want to provide a mock-up of the appropriate visits as dictated by the protocol. So this changes the framework for how row labels are generally created.

The other thought here is that the target variable might need to be treated differently from the by variables. For example, in this case you'd want Normal/Abnormal taken from the factor levels and applied to every result, but you'd want the visits restricted by PARAMCD. So I'm thinking that the by variables should be treated independently of the target variable, but I want to make sure that's intuitive.
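
The distinction can be illustrated outside of Tplyr with complete() and nesting(), both exported by tidyr. This is only a sketch of the idea on a hand-made `results` table, not a description of how Tplyr is or will be implemented.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hand-made example data; AVALC is a factor so both levels are known
results <- tribble(
  ~PARAM,  ~AVISIT,     ~AVALC,
  "Head",  "Screening", "Normal",
  "Lungs", "Screening", "Normal",
  "Lungs", "Screening", "Abnormal",
  "Lungs", "Day -1",    "Normal",
  "Lungs", "Day 5",     "Abnormal"
) %>%
  mutate(AVALC = factor(AVALC, levels = c("Normal", "Abnormal")))

counts <- count(results, PARAM, AVISIT, AVALC)

# Every variable expanded independently: the full Cartesian product
cartesian <- complete(counts, PARAM, AVISIT, AVALC, fill = list(n = 0L))

# by variables limited to observed pairs via nesting();
# the target variable is still expanded over all of its factor levels
data_driven <- complete(
  counts, nesting(PARAM, AVISIT), AVALC,
  fill = list(n = 0L)
)
```

Here `cartesian` includes rows for Head at Day -1 and Day 5, while `data_driven` keeps Head at Screening only and still reports a zero count for Abnormal.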

@om9391 do you have suggestions of how you'd like the syntax itself to feel?

mstackhouse (Contributor) commented

Closed via #174
