Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Natural_Join on columns with different names. #12

Closed
NompuK opened this issue Jun 26, 2019 · 4 comments
Closed

Feature Request: Natural_Join on columns with different names. #12

NompuK opened this issue Jun 26, 2019 · 4 comments

Comments

@NompuK
Copy link

NompuK commented Jun 26, 2019

When I try to do this:

dbopts <- rq_connection_tests(con)
db <- rquery_db_info(connection = con,
                     is_dbi = TRUE,
                     connection_options = dbopts)

table1_td <- db_td(db, "table1 ", qualifiers = c(schema = "schema1"))
table2_td <- db_td(db, "table2 ", qualifiers = c(schema = "schema2"))

join_sql  <- table1_td %.>%
                  natural_join(. , table2_td , by = c("id1" = "id2")) 

I get this error:

Error in natural_join.relop(., table2_td , by = c(id1 = "id2")) :
rquery::natural_join.relop all tables must have all join keys, the following keys are not in some tables: id2

From the error message I take the insight that it is not possible to join two tables on columns with different names. Trying with theta_join() didn't work either, when using the = predicate (using any other it worked). My workaround was so far to rename the columns in one table for them to match the column names in the other table.

If I'm wrong, I'd be happy to be shown a solution. If that's indeed the case then this is probably a feature request.

@JohnMount
Copy link
Member

JohnMount commented Jun 26, 2019

You are correct: for natural join the keys must have the same name in both tables (also true for theta-join). I'll think on this as a feature request, I have some ideas how to fit it in naturally.

@JohnMount
Copy link
Member

JohnMount commented Jun 26, 2019

Thanks for your help and patience with this project.

I am liking a solution where the software inserts an explicit rename() node for the right table if needed. Here is an example of the new (rquery 1.3.6) capability.

library("rquery")
library("wrapr")

raw_connection <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
RSQLite::initExtension(raw_connection)
db <- rquery_db_info(
  connection = raw_connection,
  is_dbi = TRUE,
  connection_options = rq_connection_tests(raw_connection))

t1 <- rq_copy_to(db, "t1", data.frame(a = c(1, 2), b = c(3, 4)))
t2 <- rq_copy_to(db, "t2", data.frame(c = c(2, 1), d = c(6, 5)))

ops <- natural_join(t1, t2, by = c("a" = "c"))
cat(format(ops))
#> table(`t1`; 
#>   a,
#>   b) %.>%
#>  natural_join(.,
#>   table(`t2`; 
#>     c,
#>     d) %.>%
#>    rename(.,
#>     c('a' = 'c')),
#>   j= INNER, by= a)

execute(db, ops)
#>   a b d
#> 1 1 3 5
#> 2 2 4 6

DBI::dbDisconnect(raw_connection)

@NompuK
Copy link
Author

NompuK commented Jun 27, 2019

Awesome. I think that is a good solution. It's intuitive and people know it from dplyr.

@JohnMount
Copy link
Member

Great, the feature is committed and will go to CRAN in the next release (probably a month).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants