Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: avoid NULL if no fks after disentangle #1012

Merged
merged 8 commits into from
Jun 3, 2022

Conversation

TSchiefer
Copy link
Member

@TSchiefer TSchiefer commented Jun 2, 2022

NULL appears in def$fks, need to avoid, otherwise something like this fails (and in general no valid dm is produced):

dm_disentangle(dm_nycflights13(cycle = TRUE), flights) %>% 
  dm_select_tbl(-`airports-1`)

@TSchiefer TSchiefer requested a review from krlmlr June 2, 2022 10:24
@TSchiefer
Copy link
Member Author

actually validate_dm() doesn't even detect the NULL in def$fks. The class of the column is correct (<list<tibble[,4]>>). Probably needs a dedicated check in validate_dm().

Copy link
Collaborator

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Could this be considered a bug in vctrs?

@krlmlr
Copy link
Collaborator

krlmlr commented Jun 2, 2022

Should we fix validate_dm() here too?

@TSchiefer
Copy link
Member Author

Good point about the vctrs requirements that are not actually met. I'll try to make a reprex for a potential issue there, let's see

@krlmlr
Copy link
Collaborator

krlmlr commented Jun 2, 2022

Scratch that, I'm now pretty sure it's the correct vctrs behavior.

@TSchiefer
Copy link
Member Author

hm, ok, I'd have this here, but it is true that vctrs couldn't necessarily avoid that:

suppressPackageStartupMessages({
  library(vctrs)
  library(tibble)
  library(dplyr)
})

a <- tibble(col_a = 1:3)
b <- tibble(col_a = 2:4, col_b = as_list_of(as.list(1:3)))
join_res <- left_join(a, b, by = "col_a")
join_res
#> # A tibble: 3 × 2
#>   col_a       col_b
#>   <int> <list<int>>
#> 1     1            
#> 2     2         [1]
#> 3     3         [1]
join_res$col_b
#> <list_of<integer>[3]>
#> [[1]]
#> NULL
#> 
#> [[2]]
#> [1] 1
#> 
#> [[3]]
#> [1] 2

Created on 2022-06-02 by the reprex package (v2.0.1.9000)

Ok, I'll leave it at that

@krlmlr
Copy link
Collaborator

krlmlr commented Jun 2, 2022

dm_financial() is not available now. Can you please fix the test so that it is skipped if the DB is not available, here or in a separate PR?

@TSchiefer
Copy link
Member Author

TSchiefer commented Jun 2, 2022

Do you think vctrs could give a warning/message that a join might result in NULL entries in columns which are otherwise strictly controlled regarding the class? (guess that would require a change in dplyr rather than in vctrs)
Actively checking the class might be too expensive(?)

@krlmlr krlmlr changed the title avoid NULL if no fks after disentangle FIX: avoid NULL if no fks after disentangle Jun 3, 2022
@krlmlr krlmlr merged commit 05de221 into main Jun 3, 2022
@krlmlr krlmlr deleted the b-NULL-in-def-fks-from-disentanle branch June 3, 2022 02:07
@krlmlr
Copy link
Collaborator

krlmlr commented Jun 3, 2022

Thanks!

@krlmlr
Copy link
Collaborator

krlmlr commented Jun 3, 2022

Regarding list(NULL), I think this is the canonical NA value for all cases of list vectors, including list_of:

library(vctrs)

as_list_of(list(1L, NULL))
#> <list_of<integer>[2]>
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> NULL
as_list_of(list(1L, NA))
#> <list_of<integer>[2]>
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] NA
as_list_of(list(1L, "2"))
#> Error in `list_of()`:
#> ! Can't combine `..1` <integer> and `..2` <character>.

Created on 2022-06-03 by the reprex package (v2.0.1)

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants