Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

foverlaps error message could be more informative #3007

Closed
msummersgill opened this issue Aug 21, 2018 · 2 comments
Closed

foverlaps error message could be more informative #3007

msummersgill opened this issue Aug 21, 2018 · 2 comments
Labels
enhancement non-equi joins rolling, overlapping, non-equi joins

Comments

@msummersgill
Copy link

msummersgill commented Aug 21, 2018

I came across a valid error when using foverlaps(), and I was able to figure out the solution pretty quickly by googling the error message and finding this Stack Overflow question: Foverlaps error: Error in if (any(x[[xintervals[2L]]] - x[[xintervals[1L]]] < 0L)) stop

It seems like an error message along the line of Error in foverlaps: NA values in x table interval column; all rows with NA values in the range columns must be removed for foverlaps to work might be helpful to users and get them pointed in the right direction without a trip to Stack Overflow.

Skimming foverlaps.R it looks like this might require an extra step in foverlaps() to check for NA values before executing -- if that is the case then I'm not sure the added overhead in the function call would be worth it or not, but if the message could be changed without slowing down the function call this could be an improvement.

Thoughts on feasibility/value?

library(data.table)

DT_x <- data.table(x1 = c(1,7,11,20),
                   x2 = c(2,8,NA,22),
                   xval = c("x_a","x_b","x_c","x_d"),
                   key = c("x1","x2"))

DT_y <- data.table(y1 = c(1,10),
                   y2 = c(9,50),
                   yval = c("y_a","y_b"),
                   key = c("y1","y2"))

foverlaps(DT_x,DT_y)

# Error in if (any(x[[xintervals[2L]]] - x[[xintervals[1L]]] < 0L)) stop("All entries in column ",  : 
#                                                                          missing value where TRUE/FALSE needed

sessionInfo() output:

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.4

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3    yaml_2.1.16   
@msummersgill
Copy link
Author

msummersgill commented Aug 21, 2018

Adding the following input checks seems to do the job, so I guess it might just be a matter of whether the overhead performance hit is acceptable. File diff here between fork branch msummersgill/data.table/foverlaps-NA-eror and Rdatatable/data.table/master reflects where they might fit into the rest of the input checks.

  if ( anyNA(x[[xintervals[1L]]]) )
    stop("NA values in data.table 'x' start column: '", xintervals[1L],"'. All rows with NA values in the range columns must be removed for foverlaps() to work")
  if ( anyNA(x[[xintervals[2L]]]) )
    stop("NA values in data.table 'x' end column: '", xintervals[2L],"'. All rows with NA values in the range columns must be removed for foverlaps() to work")
  if ( anyNA(y[[yintervals[1L]]]) )
    stop("NA values in data.table 'y' start column: '", yintervals[1L],"'. All rows with NA values in the range columns must be removed for foverlaps() to work")
  if ( anyNA(y[[yintervals[2L]]]) )
    stop("NA values in data.table 'y' end column: '", yintervals[2L],"'. All rows with NA values in the range columns must be removed for foverlaps() to work") 

With the added checks, the same example above runs as follows:

foverlaps(DT_x,DT_y)

# Error in foverlaps(DT_x, DT_y) : 
#   NA values in data.table 'x' end column: 'x2'. All rows with NA values in the range columns must be removed for foverlaps() to work

@jangorecki
Copy link
Member

maybe check could be added on C level after overlapping join has finished, then it should not add overhead.

arunsrinivasan added a commit that referenced this issue Feb 18, 2019
* type = ‘equal’ support for foverlaps

* Add debugging info with LLDB for future-me.

* Rename len1 to count and len2 to type_count.

* Better error messages when interval cols have NAs, #3007

Closes #3007.
arunsrinivasan added a commit that referenced this issue Feb 18, 2019
* type = ‘equal’ support for foverlaps

* Add debugging info with LLDB for future-me.

* Rename len1 to count and len2 to type_count.

* Better error messages when interval cols have NAs, #3007

Closes #3007.

* Better error/warnings for POSIXct intervals, #1143
@jangorecki jangorecki added the non-equi joins rolling, overlapping, non-equi joins label Apr 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement non-equi joins rolling, overlapping, non-equi joins
Projects
None yet
Development

No branches or pull requests

3 participants