[BUGZILLA #17897] all.equal(x,y) wrongly reports TRUE for factors containing different sorts of NAs #7068

Closed
github-actions bot opened this issue Aug 27, 2020 · 8 comments

Comments

@github-actions

x <- structure(3:1, .Label = c("a", "b", NA), class = "factor")
y <- structure(c(NA, 2L, 1L), .Label = c("a", "b", NA), class = "factor")
all.equal(x, y)
# [1] TRUE

But is.na() gives different results for x and y.
all.equal(is.na(x), is.na(y))
#[1] "1 element mismatch"
is.na(x)
#[1] FALSE FALSE FALSE
is.na(y)
#[1] TRUE FALSE FALSE
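The difference seems to come from where the NA lives. As a minimal sketch using only base R (the names `codes_x` and `codes_y` are illustrative): in x the integer code 3 points at a level whose label is NA, while in y the integer code itself is NA.

```r
x <- structure(3:1, .Label = c("a", "b", NA), class = "factor")
y <- structure(c(NA, 2L, 1L), .Label = c("a", "b", NA), class = "factor")

# as.integer() on a factor returns the underlying integer codes.
codes_x <- as.integer(x)  # 3 2 1
codes_y <- as.integer(y)  # NA 2 1

# Both factors carry the same levels, including an NA level.
levels(x)
levels(y)

# is.na() on a factor reflects the codes, not the level labels.
is.na(codes_x)  # FALSE FALSE FALSE
is.na(codes_y)  # TRUE FALSE FALSE
```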

I am not sure if ordinary R functions, aside from structure, would ever produce 'y', but the vroom package does and its tests use all.equal() to conclude inappropriately that x and y are equivalent. See tidyverse/vroom#262 for details.
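For test suites that need to tell the two cases apart regardless of how all.equal() behaves, one possible workaround is to compare the NA pattern explicitly. This is only a sketch: `factors_equivalent` is a hypothetical helper, not part of base R or vroom.

```r
# Hypothetical helper: two factors count as equivalent only if all.equal()
# passes AND the NA codes sit in the same positions.
factors_equivalent <- function(a, b) {
  stopifnot(is.factor(a), is.factor(b))
  isTRUE(all.equal(a, b)) && identical(is.na(a), is.na(b))
}

x <- structure(3:1, .Label = c("a", "b", NA), class = "factor")
y <- structure(c(NA, 2L, 1L), .Label = c("a", "b", NA), class = "factor")

factors_equivalent(x, x)  # TRUE
factors_equivalent(x, y)  # FALSE, because is.na(x) and is.na(y) differ
```

The second check does not depend on what all.equal() returns for x and y, so the helper gives the same answer whether or not all.equal() itself detects the mismatch.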


METADATA

  • Bug author - Bill Dunlap
  • Creation time - 2020-08-26 20:02:35 UTC
  • Bugzilla link
  • Status - UNCONFIRMED
  • Alias - None
  • Component - Accuracy
  • Version - R 4.0.x
  • Hardware - All All
  • Importance - P5 normal
  • Assignee - R-core
  • URL -
@github-actions

NA


METADATA

  • Comment author - Benjamin Tyner
  • Timestamp - 2020-09-12 20:47:41 UTC

@github-actions

'y' can be produced by either of the following calls:
factor(c("c", "b", "a"), levels = c("a", "b", NA), exclude = NULL)
factor(c("c", "b", "a"), levels = c("a", "b", "c", NA), exclude = "c")
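A quick way to check this claim is to compare the codes and levels of each result against the structure() version of y from the original report. The names `target`, `y1`, `y2`, and `same_as_target` below are illustrative only.

```r
target <- structure(c(NA, 2L, 1L), .Label = c("a", "b", NA), class = "factor")

y1 <- factor(c("c", "b", "a"), levels = c("a", "b", NA), exclude = NULL)
y2 <- factor(c("c", "b", "a"), levels = c("a", "b", "c", NA), exclude = "c")

# Compare integer codes and levels separately, so the check does not
# rely on all.equal() at all.
same_as_target <- function(f) {
  identical(as.integer(f), as.integer(target)) &&
    identical(levels(f), levels(target))
}

same_as_target(y1)
same_as_target(y2)
```

In both calls "c" is not among the retained levels, so match() yields an NA code for it, while the NA entry in `levels` is kept as a level.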


METADATA

  • Comment author - Suharto Anggono
  • Timestamp - 2020-09-12 22:08:11 UTC

@github-actions

NA


METADATA

  • Comment author - elin.waring
  • Timestamp - 2020-09-14 19:03:21 UTC

@github-actions

It looks like the current behavior may be intentional, based on the log of revision 56668:
------------------------------------------------------------------------
r56668 | ripley | 2011-08-08 13:36:08 -0400 (Mon, 08 Aug 2011) | 1 line

all.equal.factor predates character NAs
------------------------------------------------------------------------


METADATA

  • Comment author - Benjamin Tyner
  • Timestamp - 2020-09-15 00:13:54 UTC

@github-actions

NA


METADATA

  • Comment author - elin.waring
  • Timestamp - 2020-09-15 01:02:54 UTC

@github-actions

Since the 2 factors act differently, e.g. in modelling functions, I think all.equal should report that they are not equivalent.

x1 <- structure(rep(3:1,3:1), .Label = c("a", "b", NA), class = "factor")
x2 <- structure(rep(c(NA, 2L, 1L),3:1), .Label = c("a", "b", NA), class = "factor")
all.equal(x1,x2)

[1] TRUE

y <- 11:16
summary(lm(y~x1))

Call:
lm(formula = y ~ x1)

Residuals:
         1          2          3          4          5          6
-1.000e+00 -2.498e-15  1.000e+00 -5.000e-01  5.000e-01 -4.441e-16

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  16.0000     0.9129  17.527 0.000405 ***
x1b          -1.5000     1.1180  -1.342 0.272228
x1NA         -4.0000     1.0541  -3.795 0.032119 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9129 on 3 degrees of freedom
Multiple R-squared:  0.8571,  Adjusted R-squared:  0.7619
F-statistic: 9 on 2 and 3 DF,  p-value: 0.05399

summary(lm(y~x2))

Call:
lm(formula = y ~ x2)

Residuals:
   4    5    6
-0.5  0.5  0.0

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  16.0000     0.7071  22.627   0.0281 *
x2b          -1.5000     0.8660  -1.732   0.3333
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7071 on 1 degrees of freedom
  (3 observations deleted due to missingness)
Multiple R-squared:  0.75,  Adjusted R-squared:  0.5
F-statistic: 3 on 1 and 1 DF,  p-value: 0.3333
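The "3 observations deleted due to missingness" line can be traced to the model frame. The sketch below passes `na.action = na.omit` explicitly, on the assumption that this matches the session default used above; `mf1` and `mf2` are illustrative names.

```r
x1 <- structure(rep(3:1, 3:1), .Label = c("a", "b", NA), class = "factor")
x2 <- structure(rep(c(NA, 2L, 1L), 3:1), .Label = c("a", "b", NA), class = "factor")
y <- 11:16

# model.frame() applies the NA action before lm() fits anything.
mf1 <- model.frame(y ~ x1, na.action = na.omit)
mf2 <- model.frame(y ~ x2, na.action = na.omit)

nrow(mf1)  # all six rows kept: the codes of x1 are never NA
nrow(mf2)  # only three rows kept: the first three codes of x2 are NA

# Count the rows that na.omit() treats as missing in each factor.
sum(is.na(x1))
sum(is.na(x2))
```

So x1 keeps NA as an ordinary level that gets its own coefficient (x1NA), whereas the rows of x2 with NA codes are dropped before fitting.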


METADATA

  • Comment author - Bill Dunlap
  • Timestamp - 2020-09-15 01:21:13 UTC

@github-actions

NA


METADATA

  • Comment author - elin.waring
  • Timestamp - 2020-09-15 12:29:40 UTC

@github-actions

NA


METADATA

  • Comment author - Martin Maechler
  • Timestamp - 2020-09-16 22:11:44 UTC
