Skip to content

Add query condition support#261

Merged
eddelbuettel merged 13 commits intomasterfrom
de/ch8049/query_condition
Jun 21, 2021
Merged

Add query condition support#261
eddelbuettel merged 13 commits intomasterfrom
de/ch8049/query_condition

Conversation

@eddelbuettel
Copy link
Copy Markdown
Contributor

This PR adds a first layer of support for query conditions by wrapping the available functionality in the TileDB Embedded library. Corresponding documentation and tests have been added, additional feature may be added in subsequent PRs.

@shortcut-integration
Copy link
Copy Markdown

This pull request has been linked to Clubhouse Story #8049: Query Condition support.

@eddelbuettel
Copy link
Copy Markdown
Contributor Author

eddelbuettel commented Jun 19, 2021

Added the usual #if preprocessor checks to shield QueryCondition code from build in TileDB Embedded older than < 2.3.0, and similar added an assertion that the object is not created on such builds:

> library(tiledb)
TileDB R 0.9.3.3 with TileDB Embedded 2.2.9. See https://tiledb.com for more information.
> qc <- tiledb_query_condition()
Error in tiledb_query_condition() : needs TileDB 2.3.0 or newer
> 

(Here the added .3 is just a local modifier I keep while working on branches.)

With 2.3.0 or 2.4.0 this of course works as it all did prior to the last commit.

@ihnorton
Copy link
Copy Markdown
Member

If I modify line 105 in the test as follows

...
tiledb_query_set_buffer(qry, "a", cola)
tiledb_query_set_buffer(qry, "b", colb)
qc <- tiledb_query_condition_init("b", 121.5000, "FLOAT64", "EQ") # <- change 115.5 to 121.5
qry <- tiledb_query_set_condition(qry, qc)
tiledb_query_submit(qry)
...

then I get:

> ndf
  rows a b
1    0 0 0

Is that expected for an empty result set?

@eddelbuettel
Copy link
Copy Markdown
Contributor Author

Could be a side-effect of returning a list and turning the list into a data.frame. I have work out where the empty becomes a one-row element.

The tests were written when we did not yet have an interface for the tiledb_array objects and the data.frame return. Using the same array but the newer code:

> arr <- tiledb_array(uri, as.data.frame=TRUE)    # full array, as data.frame
> arr[]
   __tiledb_rows  a     b
1              1  1 101.5
2              2  2 102.5
3              3  3 103.5
4              4  4 104.5
5              5  5 105.5
6              6  6 106.5
7              7  7 107.5
8              8  8 108.5
9              9  9 109.5
10            10 10 110.5
11            11 11 111.5
12            12 12 112.5
13            13 13 113.5
14            14 14 114.5
15            15 15 115.5
16            16 16 116.5
17            17 17 117.5
18            18 18 118.5
19            19 19 119.5
20            20 20 120.5
> 
> 
> qc <- tiledb_query_condition_init("b", 115.5, "FLOAT64", "EQ")  
> query_condition(arr) <- qc   # impose the original equality condition
> arr[]
  __tiledb_rows  a     b
1            15 15 115.5
> qc <- tiledb_query_condition_init("b", 121.5, "FLOAT64", "EQ")
> query_condition(arr) <- qc    # impose the modified equality condition
> arr[]
[1] __tiledb_rows a             b            
<0 rows> (or 0-length row.names)
> 

@eddelbuettel
Copy link
Copy Markdown
Contributor Author

eddelbuettel commented Jun 21, 2021

You just ran into a standard R "misbehaviour" and why e.g. the colon operator is discouraged in for loops as it can go negative. You ran

n <- tiledb_query_result_buffer_elements(qry, "a")
ndf <- data.frame(rows=rows,a=cola,b=colb)[1:n,]

and that construct is simply not safe for n==0 and we don't check it in the short snippet you modified ad-hoc.

@eddelbuettel
Copy link
Copy Markdown
Contributor Author

The seq_len(n) alternative to 1:n protects the idiom too:

> data.frame(rows=rows,a=cola,b=colb)[1:n,]
  rows a b
1    0 0 0
> data.frame(rows=rows,a=cola,b=colb)[seq_len(n),]
[1] rows a    b   
<0 rows> (or 0-length row.names)
> 

The test could have been written in this second form and would then return a proper zero-row data.frame. Good observation!

Copy link
Copy Markdown
Member

@aaronwolen aaronwolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one small typo in the new docs otherwise LGTM!

Comment thread R/QueryCondition.R Outdated
#' Initialize a 'tiledb_query_condition' object
#'
#' Initializes (and possibly allocates) a query condition object using a triplet of
#' attribute name, comparison value, and operator. Six types of condition are supported,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing s after condition

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix, thanks!

@ihnorton
Copy link
Copy Markdown
Member

You just ran into a standard R "misbehaviour" and why e.g. the colon operator is discouraged in for loops as it can go negative.

Fair enough, thanks for the explanation.

@eddelbuettel eddelbuettel merged commit a7fc520 into master Jun 21, 2021
@eddelbuettel eddelbuettel deleted the de/ch8049/query_condition branch June 21, 2021 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants