Error when unequal number of rows #71

Closed
maciejmotyka opened this issue Dec 25, 2019 · 1 comment · Fixed by #78

@maciejmotyka

I'm looking for a way to find mismatches between two data frames that may have an unequal number of rows, something along the lines of what you would get from running anti_join(df1, df2) followed by anti_join(df2, df1). I hoped that dataCompareR would do this, but it fails with an error:

library(tibble)
df2 <- tibble(col1 = c("cat", "dog", "mouse", "fly"))
df1 <- tibble(col1 = c("cat", "dog", "rat"))
dataCompareR::rCompare(df1, df2)
Running rCompare...
Coercing input data to data.frame
Error in (nrow(df_a_subset) + 1):nrow(df_a) : argument of length 0

What do you think about adding this functionality to dataCompareR?
Or maybe I'm missing some other obvious way to do this kind of comparison?
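
For reference, the two-way anti_join approach mentioned above might look roughly like the sketch below. It assumes dplyr and tibble are installed. The names only_in_df1, only_in_df2, mismatches and the "source" label column are illustrative and not part of dataCompareR.

```r
library(dplyr)
library(tibble)

df1 <- tibble(col1 = c("cat", "dog", "rat"))
df2 <- tibble(col1 = c("cat", "dog", "mouse", "fly"))

# Rows present in df1 with no match in df2
only_in_df1 <- anti_join(df1, df2, by = "col1")

# Rows present in df2 with no match in df1
only_in_df2 <- anti_join(df2, df1, by = "col1")

# Stack both directions, labelling each row with the side it came from
# (the labels "df1_only" / "df2_only" are hypothetical names)
mismatches <- bind_rows(
  df1_only = only_in_df1,
  df2_only = only_in_df2,
  .id = "source"
)

print(mismatches)
```

Unlike rCompare, this only reports which rows lack a counterpart on the other side; it does not give a per-variable breakdown of the differences.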

@robne1982
Collaborator

It looks like this is actually a bug with single column dataframes. What you're trying to do is exactly what the package is intended to be used for.

If you add a second column, the comparison runs as expected:

library(tibble)
df1 <- tibble(col1 = c("cat", "dog", "rat"), col2 = c(1, 2, 3))
df2 <- tibble(col1 = c("cat", "dog", "mouse", "fly"), col2 = c(1, 2, 3, 4))

dataCompareR::rCompare(df1, df2)

Running rCompare...
Coercing input data to data.frame
All columns were compared, 1 row(s) were dropped from comparison
There are  1 mismatched variables:
First and last 5 observations for the  1 mismatched variables
  rowNo valueA valueB variable  typeA  typeB diffAB
1     3    rat  mouse     COL1 factor factor   

I'm no longer directly involved in this project, but I'll take a look, as it should be an easy fix.

In the meantime, you can work around this issue for single column dataframes by adding a second dummy column to both dataframes.
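
A minimal sketch of that workaround, assuming a constant placeholder column is enough to avoid the single-column code path. The column name dummy is hypothetical, and the call to rCompare is guarded so the snippet still runs where dataCompareR is not installed.

```r
library(tibble)

df1 <- tibble(col1 = c("cat", "dog", "rat"))
df2 <- tibble(col1 = c("cat", "dog", "mouse", "fly"))

# Add the same constant placeholder column to both data frames.
# "dummy" is a hypothetical name; being constant, it should never
# show up as a mismatch itself.
df1$dummy <- 0
df2$dummy <- 0

# Run the comparison only if dataCompareR is available
if (requireNamespace("dataCompareR", quietly = TRUE)) {
  result <- dataCompareR::rCompare(df1, df2)
  print(result)
}
```

The placeholder column can be ignored when reading the comparison output, since its values are identical on both sides.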
