Describe a schema in an R object #344

Merged
eddelbuettel merged 11 commits into master from de/sc-13273/schema_object
Jan 13, 2022
@eddelbuettel
Contributor

This PR gathers schema information we can use both to print schema creation commands and to summarize array objects directly for more fine-grained formatting. It is rather unfortunate that this only comes together now, as @johnkerl could most likely have saved some time had I put this together earlier.

I will leave this as a draft for now. It currently returns a list with two data frames: one for array (high-level) descriptives and one for all 'data' columns. As dimensions and attributes are in fact a little distinct, it may be beneficial to return one each for dimensions and attributes.

Current output format showing the two data frames directly on two sample arrays:

edd@rob:~/git/tiledb-r(de/sc-13273/schema_object)$ r -ltiledb -e'arr <- tiledb_array("/tmp/tiledb/quickstart_dense"); df <- tiledb_schema_object(arr); print(df)'
$array
                           uri  type cell_order tile_order capacity allow_dupes coord_filters        coord_options offset_filters     offset_filters.1
1 /tmp/tiledb/quickstart_dense dense  COL_MAJOR  COL_MAJOR    10000       FALSE          ZSTD COMPRESSION_LEVEL=-1           ZSTD COMPRESSION_LEVEL=-1

$data
     column names datatypes nullable varnum domains extend nfilters filters filtopts fillvalue
1 dimension  rows     INT32    FALSE      1   [1,4]      4        0                           
2 dimension  cols     INT32    FALSE      1   [1,4]      4        0                           
3 attribute     a     INT32    FALSE      1                       0                         NA

edd@rob:~/git/tiledb-r(de/sc-13273/schema_object)$ r -ltiledb -e'arr <- tiledb_array("/tmp/tiledb/penguins"); df <- tiledb_schema_object(arr); print(df)'
$array
                   uri   type cell_order tile_order capacity allow_dupes coord_filters        coord_options offset_filters     offset_filters.1
1 /tmp/tiledb/penguins sparse  COL_MAJOR  COL_MAJOR    10000        TRUE          ZSTD COMPRESSION_LEVEL=-1           ZSTD COMPRESSION_LEVEL=-1

$data
     column             names datatypes nullable varnum     domains extend nfilters filters             filtopts fillvalue
1 dimension           species     ASCII    FALSE     NA (null,null)   null        0                                       
2 dimension            island     ASCII    FALSE     NA (null,null)   null        0                                       
3 attribute    bill_length_mm   FLOAT64     TRUE      1                           1    ZSTD COMPRESSION_LEVEL=-1          
4 attribute     bill_depth_mm   FLOAT64     TRUE      1                           1    ZSTD COMPRESSION_LEVEL=-1          
5 attribute flipper_length_mm     INT32     TRUE      1                           1    ZSTD COMPRESSION_LEVEL=-1          
6 attribute       body_mass_g     INT32     TRUE      1                           1    ZSTD COMPRESSION_LEVEL=-1          
7 attribute               sex     ASCII     TRUE     NA                           1    ZSTD COMPRESSION_LEVEL=-1          
8 attribute              year     INT32    FALSE      1                           1    ZSTD COMPRESSION_LEVEL=-1        NA

edd@rob:~/git/tiledb-r(de/sc-13273/schema_object)$
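
The idea of returning one data frame each for dimensions and attributes could look roughly like the sketch below. It is only an illustration in base R: the `data` frame is a hand-built stand-in for the `$data` element printed above for quickstart_dense (a subset of its columns), and `split_schema_data()` is a hypothetical helper, not a function of the package.

```r
# Hand-built stand-in for the `$data` element shown above (quickstart_dense).
# In a real session it would come from tiledb_schema_object(arr)$data.
data <- data.frame(
  column    = c("dimension", "dimension", "attribute"),
  names     = c("rows", "cols", "a"),
  datatypes = c("INT32", "INT32", "INT32"),
  nullable  = c(FALSE, FALSE, FALSE),
  varnum    = c(1, 1, 1),
  domains   = c("[1,4]", "[1,4]", ""),
  extend    = c("4", "4", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one data frame per kind of column.
split_schema_data <- function(df) {
  dims  <- df[df$column == "dimension", , drop = FALSE]
  attrs <- df[df$column == "attribute", , drop = FALSE]
  # Domain and tile extent only apply to dimensions, so drop them for attributes.
  attrs <- attrs[, setdiff(names(attrs), c("domains", "extend")), drop = FALSE]
  rownames(dims)  <- NULL
  rownames(attrs) <- NULL
  list(dimensions = dims, attributes = attrs)
}

parts <- split_schema_data(data)
print(parts$dimensions)
print(parts$attributes)
```

Splitting this way lets each data frame carry only the columns that make sense for it, at the cost of one more list element for callers to handle.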

@shortcut-integration

This pull request has been linked to Shortcut Story #13273: Wrap ArraySchema as high-level object.

@eddelbuettel eddelbuettel force-pushed the de/sc-13273/schema_object branch from 9f9f353 to a4bf68e Compare January 10, 2022 20:20
@eddelbuettel eddelbuettel force-pushed the de/sc-13273/schema_object branch 4 times, most recently from 847a7a8 to cf154e1 Compare January 11, 2022 16:06
@eddelbuettel eddelbuettel marked this pull request as ready for review January 13, 2022 16:39
@eddelbuettel
Contributor Author

Some more work here to make it more like the Python sibling, which describes an array schema in 'code'. A new function describe() was added to do just that. We may want to embed it in show() instead, via an option. That is easy to adjust.

This is now mostly feature complete. I haven't done full round-trips yet to see if all ASCII representations of enums map fully (there may be some cases of, say, "ASCII" != "TILEDB_ASCII").
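
A round-trip check for such enum names might compare them after normalization rather than literally. The following is a minimal base R sketch under the assumption that the only difference is an optional `TILEDB_` prefix and letter case; `normalize_enum()` and `same_enum()` are hypothetical helpers, not part of the package.

```r
# Hypothetical helper: strip an optional "TILEDB_" prefix and upper-case the name.
normalize_enum <- function(x) sub("^TILEDB_", "", toupper(x))

# Hypothetical helper: TRUE when two enum spellings refer to the same name.
same_enum <- function(a, b) normalize_enum(a) == normalize_enum(b)

same_enum("ASCII", "TILEDB_ASCII")   # TRUE
same_enum("INT32", "FLOAT64")        # FALSE
```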

For example for the Penguins array (with NAs) we now get this:

> library(tiledb)
TileDB R 0.10.2 with TileDB Embedded 2.7.0. See https://tiledb.com for more information.
> arr <- tiledb_array("/tmp/tiledb/penguins/")
> describe(arr)
dims <- c(tiledb_dim(name="species", domain=c(NULL,NULL), tile=NULL, type="ASCII"),
          tiledb_dim(name="island", domain=c(NULL,NULL), tile=NULL, type="ASCII")))
dom <- tiledb_domain(dims=dims)
attrs <- c(tiledb_attr(name="bill_length_mm", type="FLOAT64", ncells=1, nullable=TRUE, filter_list=c(tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))))),
           tiledb_attr(name="bill_depth_mm", type="FLOAT64", ncells=1, nullable=TRUE, filter_list=c(tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))))),
           tiledb_attr(name="flipper_length_mm", type="INT32", ncells=1, nullable=TRUE, filter_list=c(tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))))),
           tiledb_attr(name="body_mass_g", type="INT32", ncells=1, nullable=TRUE, filter_list=c(tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))))),
           tiledb_attr(name="sex", type="ASCII", ncells=NA, nullable=TRUE, filter_list=c(tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))))),
           tiledb_attr(name="year", type="INT32", ncells=1, nullable=FALSE, filter_list=c(tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))))))
sch <- tiledb_array_schema(domain=dom, attrs=attrs, cell_order="COL_MAJOR", tile_order="COL_MAJOR", sparse=TRUE, capacity=10000, allow_dupes=TRUE, 
                           coord_filters=filter_list=c(tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))))), 
                           offset_filters=filter_list=c(tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))))))
> 

after which sch can be used with a suitable uri in tiledb_array_create(uri, sch).

Contributor

@johnkerl johnkerl left a comment

awesome @eddelbuettel !!

one little fine-tune opportunity, i ran this for a few arrays then vim'ed the result & i am seeing some parenthesis imbalances --
[Screenshot, 2022-01-13 4:02 PM]
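
One way to catch imbalances like these mechanically is to hand the generated text to R's own parser, which rejects unbalanced parentheses without evaluating anything. The sketch below uses only base R. `parses_ok()` is a hypothetical helper, and the two strings are shortened illustrations rather than actual describe() output. In practice the text could presumably be captured with `capture.output(describe(arr))`, assuming describe() prints to the console.

```r
# Hypothetical helper: TRUE if `code` is syntactically valid R, FALSE otherwise.
# parse() only checks syntax, so the tiledb functions need not be loaded.
parses_ok <- function(code) {
  tryCatch({
    parse(text = code, keep.source = FALSE)
    TRUE
  }, error = function(e) FALSE)
}

# Shortened illustrations: the second string has one closing parenthesis too many.
balanced   <- 'dims <- c(tiledb_dim(name="rows", domain=c(1L,4L), tile=4L, type="INT32"))'
unbalanced <- 'dims <- c(tiledb_dim(name="rows", domain=c(1L,4L), tile=4L, type="INT32")))'

parses_ok(balanced)     # TRUE
parses_ok(unbalanced)   # FALSE
```

A check like this only confirms that the output is parseable. It does not show that the recreated schema matches the original.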

@eddelbuettel eddelbuettel force-pushed the de/sc-13273/schema_object branch from 367b0ad to 1a9379c Compare January 13, 2022 21:33
Contributor

@johnkerl johnkerl left a comment

🚢
🎉

@eddelbuettel
Contributor Author

We all may need to gab a little in a little while (as per my chat with @ihnorton), as we a) probably want to restore the object dump from core as an optional feature and b) need to work out whether we want these verbose code pretty printers or the "look like core but ain't" printer you added as default, as two may seem like one too many. Not urgent, but one of those things where a little chat prior to unannounced PRs can work wonders.

Also note that the code in the PR so far 'only looks pretty' but hasn't been to the dance yet. I haven't done any round-trip tests yet. I am sure there may be a many-legged creature hiding in a corner or two.

@eddelbuettel eddelbuettel merged commit 8c87c71 into master Jan 13, 2022
@eddelbuettel eddelbuettel deleted the de/sc-13273/schema_object branch January 13, 2022 21:55
@eddelbuettel eddelbuettel mentioned this pull request Jan 24, 2022