
Update show() methods #355

Merged
eddelbuettel merged 9 commits into master from de/sc-13744/show_update
Jan 24, 2022


@eddelbuettel
Contributor

This PR updates the newer show() methods. They now work better in isolation as well as when composed, are more compact and a little closer to the Python equivalents.

@shortcut-integration

@johnkerl
Contributor

Awesome! Let me give this a spin in Jupyter just as a road-test. :)

@johnkerl
Contributor

OK, test-drove in Jupyter!

My thinking (perhaps overwrought) was to make sure if someone prints the entire schema, it comes out OK -- and likewise if they just print the domain, etc.

Full code:

library("tiledb")
uri <- "tiledb://TileDB-Inc/gtex-analysis-rnaseqc-gene-tpm"
arr <- tiledb_array(uri, query_type="READ", as.data.frame=TRUE); sch <- schema(arr); sch
fl <- filter_list(sch); fl
dom <- domain(sch); dom
ndim <- tiledb_ndim(dom); ndim
dims <- dimensions(dom)
dims[[1]]
dims[[2]]
filter_list(dims[[1]])
filter_list(dims[[2]])
nattr <- length(attrs(sch)); nattr
attri <- attrs(sch, 1); attri
fla <- filter_list(attri); fla
fla[0]

Some notes:

  • Cell 34 in the screenshot -- there is a trailing comma (really a nit unless we want people to be able to copy-paste this as-is)
  • Also in cell there is one too many closing parentheses
  • Cells 14 and 15 do we want to support parenthesis-balanced output at this detail level? If so we should fix this; otherwise a nit ...

Also I would add that I love that the output from show now will actually be, not just a particularly formatted output, but runnable code -- this is a huge advance! :)

Screen Shot 2022-01-21 at 6 28 49 PM

Contributor

@johnkerl left a comment


Review comments here (sorry if it's confusing to narrate through a screenshot)
#355 (comment)

@eddelbuettel
Contributor Author

> My thinking (perhaps overwrought) was to make sure if someone prints the entire schema, it comes out OK -- and likewise if they just print the domain, etc.

I am not sure what you are trying to say here. We do have show methods for schema, domain, dimension, filter_list and filter, and if you go to the linked SC (Shortcut) ticket, you can see the output listed there explicitly.

> Cell 34 in the screenshot -- there is a trailing comma (really a nit unless we want people to be able to copy-paste this as-is)

Ahh. Likely an older TileDB Core instance, so there is no validity filter. Will try to fix.

> Also in cell there is one too many closing parentheses

Sorry, can you be more specific? Where? Can you maybe show it more directly than "somewhere in those X lines in a screenshot"? Also, I ran the code generated from a schema and that worked. (Of course, no running example proves anything about other possible bugs...)

> Cells 14 and 15 do we want to support parenthesis-balanced output at this detail level? If so we should fix this; otherwise a nit ...

These calls seem "wrong". Maybe I need to add type checkers:

> flt1 <- tiledb_filter("DOUBLE_DELTA")
> flt2 <- tiledb_filter("CHECKSUM_SHA256")
> flt3 <- tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",4)
> show(flt1)
tiledb_filter("DOUBLE_DELTA") 
> show(flt2)
tiledb_filter("CHECKSUM_SHA256") 
> show(flt3)
tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",4) 
> fltlst <- tiledb_filter_list(c(flt1, flt2, flt3))
> show(fltlst)
tiledb_filter_list(c(tiledb_filter("DOUBLE_DELTA")), tiledb_filter("CHECKSUM_SHA256")), tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",4))) 
> dims <- tiledb_dim(name="rows", domain=c(1L,40L), tile=40L, type="INT32")
> show(dims)
tiledb_dim(name="rows", domain=c(1L,40L), tile=40L, type="INT32") 
> 

(As an aside, debugging in the notebook may be more cumbersome. The code works the same inside and outside of it.)

The PR also makes no claim to add show() methods for every conceivable TileDB object; that will come in another PR. The focus here is to improve on the state of the world as of a week ago and get something releasable.

eddelbuettel requested a review from johnkerl on January 22, 2022 00:24
@eddelbuettel
Contributor Author

@johnkerl The trailing comma in the case of fewer than three filter lists is taken care of (and tested against 2.5.3 too). I found no extra paren, so no change there.
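For illustration only, the general technique behind a trailing-comma fix like this can be sketched in shell: emit the separator before every entry except the first, rather than after each one, and skip entries that are absent (such as a validity filter list on an older TileDB Core). The `join_args` helper is hypothetical and is not how the tiledb R package implements its show() methods.

```shell
#!/bin/sh
# join_args: hypothetical helper (not part of the tiledb R package).
# Joins its non-empty arguments with ", " so that no trailing comma is
# left behind when some entries (e.g. a validity filter list) are absent.
join_args() {
    local out="" sep="" arg
    for arg in "$@"; do
        [ -n "$arg" ] || continue   # skip absent (empty) entries
        out="${out}${sep}${arg}"
        sep=", "                    # separator precedes every later entry
    done
    printf '%s\n' "$out"
}
```

For example, `join_args 'coords_filter_list=A' 'offsets_filter_list=B' ''` prints `coords_filter_list=A, offsets_filter_list=B` with no dangling comma for the missing third entry.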

@johnkerl
Contributor

johnkerl commented Jan 22, 2022

Sorry for the confusion @eddelbuettel !

My apologies; it looks like the DIM1 FL and DIM2 FL are completely blank (I missed those!) on the 'before' from the second gist

Re unbalanced parens, here's a screenshot of the top of the third gist in vim -- there appears to be simply an extra final ) -- ?

Screen Shot 2022-01-21 at 7 50 35 PM
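A quick way to catch this class of bug without narrating through screenshots is to count parentheses in the generated code. The `check_parens` function below is a hypothetical shell sketch, not part of the tiledb package: it tracks nesting depth outside double-quoted strings and reports the text as unbalanced if the depth ever drops below zero or does not return to zero at the end. It assumes strings contain no escaped quotes; within R, `parse(text = ...)` would be the more thorough syntax check.

```shell
#!/bin/sh
# check_parens: hypothetical helper (not part of the tiledb R package).
# Reads R code on stdin and reports whether parentheses outside
# double-quoted strings are balanced. Escaped quotes are not handled.
check_parens() {
    awk '
        BEGIN { depth = 0; inq = 0; bad = 0 }
        {
            for (i = 1; i <= length($0); i++) {
                ch = substr($0, i, 1)
                if (ch == "\"") { inq = !inq; continue }  # enter/leave a string
                if (inq) continue                          # ignore parens inside strings
                if (ch == "(") depth++
                else if (ch == ")") { depth--; if (depth < 0) bad = 1 }
            }
        }
        END {
            if (bad || depth != 0) { print "unbalanced (final depth " depth ")"; exit 1 }
            print "balanced"
        }
    '
}
```

Piping the `show(fltlst)` line from the earlier transcript through it reports `unbalanced (final depth -2)`, and the function exits with status 1, so it can also gate a script before the generated code is run.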

@eddelbuettel
Contributor Author

Ok, that I can reproduce. Minimal example:

> uri  <- "tiledb://TileDB-Inc/gtex-analysis-rnaseqc-gene-tpm"
> arr  <- tiledb_array(uri)
> sch <- schema(arr)
> sch
tiledb_array_schema(
    domain=tiledb_domain(c(tiledb_dim(name="gene_id", domain=c(NULL,NULL), tile=NULL, type="ASCII"), tiledb_dim(name="sample", domain=c(NULL,NULL), tile=NULL, type="ASCII"))),
    attrs=c(tiledb_attr(name="tpm", type="FLOAT64", ncells=1, nullable=FALSE, filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",1))))),
    cell_order="ROW_MAJOR", tile_order="ROW_MAJOR", capacity=18000, sparse=TRUE, allows_dups=FALSE, 
    coords_filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))),
    offsets_filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("POSITIVE_DELTA"),"POSITIVE_DELTA_MAX_WINDOW",1024)), tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",1))),
    validity_filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("RLE"),"COMPRESSION_LEVEL",-1)))
)
> 
>
> fl <- filter_list(sch)
> fl
$coords
tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))) 

$offsets
tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("POSITIVE_DELTA"),"POSITIVE_DELTA_MAX_WINDOW",1024)), tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",1))) 

$validity
tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("RLE"),"COMPRESSION_LEVEL",-1))) 

> 

I'll work that out tomorrow to correct the filter_list output for the offsets. Something goes wrong with particular mixes of filters with and without options.
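One way to rule out this class of bug is to render each filter as a self-contained, balanced expression first, and only then join the pieces and wrap them in `tiledb_filter_list(c(...))`, so filters with and without options compose the same way. The shell sketch below illustrates that idea; `emit_filter`, `emit_filter_list`, and the `NAME:OPTION:VALUE` spec format are hypothetical and do not reflect the package's actual R code.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the tiledb R package) that print the
# R code for a filter list in the same style as the show() output above.

# emit_filter NAME [OPTION VALUE]: print one filter; the expression is
# balanced on its own whether or not an option is set.
emit_filter() {
    if [ $# -eq 3 ]; then
        printf 'tiledb_filter_set_option(tiledb_filter("%s"),"%s",%s)' "$1" "$2" "$3"
    else
        printf 'tiledb_filter("%s")' "$1"
    fi
}

# emit_filter_list SPEC...: each SPEC is NAME or NAME:OPTION:VALUE.
# Prints nothing for an empty list.
emit_filter_list() {
    [ $# -gt 0 ] || return 0
    local out="" sep="" spec rest elem
    for spec in "$@"; do
        case $spec in
            *:*:*)  # filter with an option
                rest=${spec#*:}
                elem=$(emit_filter "${spec%%:*}" "${rest%%:*}" "${rest#*:}") ;;
            *)      # plain filter without options
                elem=$(emit_filter "$spec") ;;
        esac
        out="${out}${sep}${elem}"   # separator only between elements
        sep=", "
    done
    printf 'tiledb_filter_list(c(%s))\n' "$out"
}
```

With these helpers, `emit_filter_list POSITIVE_DELTA:POSITIVE_DELTA_MAX_WINDOW:1024 ZSTD:COMPRESSION_LEVEL:1` prints a balanced offsets filter list, because no element ever closes a parenthesis it did not open.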

@eddelbuettel
Contributor Author

eddelbuettel commented Jan 22, 2022

Looks like it is sorted out:

> uri  <- "tiledb://TileDB-Inc/gtex-analysis-rnaseqc-gene-tpm"
> arr <- tiledb_array(uri); sch <- schema(arr); fl <- filter_list(sch); fl
$coords
tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))) 

$offsets
tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("POSITIVE_DELTA"),"POSITIVE_DELTA_MAX_WINDOW",1024), tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",1))) 

$validity
tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("RLE"),"COMPRESSION_LEVEL",-1))) 

> sch
tiledb_array_schema(
    domain=tiledb_domain(c(tiledb_dim(name="gene_id", domain=c(NULL,NULL), tile=NULL, type="ASCII"), tiledb_dim(name="sample", domain=c(NULL,NULL), tile=NULL, type="ASCII"))),
    attrs=c(tiledb_attr(name="tpm", type="FLOAT64", ncells=1, nullable=FALSE, filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",1))))),
    cell_order="ROW_MAJOR", tile_order="ROW_MAJOR", capacity=18000, sparse=TRUE, allows_dups=FALSE,
    coords_filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))),
    offsets_filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("POSITIVE_DELTA"),"POSITIVE_DELTA_MAX_WINDOW",1024), tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",1))),
    validity_filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("RLE"),"COMPRESSION_LEVEL",-1)))
)
> 

@eddelbuettel
Contributor Author

eddelbuettel commented Jan 22, 2022

If you have some TileDB arrays on disk, you can use this ad-hoc script to extract the schema as executable R code and re-create the array from it.

#!/bin/bash
# Print an array's schema as R code via show(), wrap it in tiledb_array_create(),
# and run the result to re-create the array under a temporary URI.

if [ $# -lt 1 ]; then
    echo "Usage: $0 uri"
    exit 1
fi

tf=$(mktemp)
echo "library(tiledb)" > "${tf}"
echo -n "tiledb_array_create(uri=tempfile()," >> "${tf}"
# use littler:  r -ltiledb -e"show(schema(tiledb_array(\"$1\")))" >> "${tf}"
# or Rscript (used here):
Rscript -e "suppressMessages(library(tiledb)); show(schema(tiledb_array(\"$1\")))" >> "${tf}"
echo ")" >> "${tf}"
echo "cat(\"Done!\\n\")" >> "${tf}"

cat "${tf}"       # show the generated R code
Rscript "${tf}"   # execute it, re-creating the array
rm -v "${tf}"

@eddelbuettel
Contributor Author

@johnkerl The show() method for FilterList is now smarter about length zero: for an empty filter list, your script now just shows nothing.

> library(tiledb)
TileDB R 0.10.2 with TileDB Embedded 2.7.0. See https://tiledb.com for more information.
> uri  <- "tiledb://TileDB-Inc/gtex-analysis-rnaseqc-gene-tpm"
> arr  <- tiledb_array(uri, query_type="READ", as.data.frame=TRUE)
> sch <- schema(arr)
> cat("SCHEMA\n")
SCHEMA
> show(sch)
tiledb_array_schema(
    domain=tiledb_domain(c(tiledb_dim(name="gene_id", domain=c(NULL,NULL), tile=NULL, type="ASCII"), tiledb_dim(name="sample", domain=c(NULL,NULL), tile=NULL, type="ASCII"))),
    attrs=c(tiledb_attr(name="tpm", type="FLOAT64", ncells=1, nullable=FALSE, filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",1))))),
    cell_order="ROW_MAJOR", tile_order="ROW_MAJOR", capacity=18000, sparse=TRUE, allows_dups=FALSE,
    coords_filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))),
    offsets_filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("POSITIVE_DELTA"),"POSITIVE_DELTA_MAX_WINDOW",1024), tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",1))),
    validity_filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("RLE"),"COMPRESSION_LEVEL",-1)))
)
> fl   <- filter_list(sch)
> cat("\nFILTER_LIST\n")

FILTER_LIST
> show(fl)
$coords
tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",-1))) 

$offsets
tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("POSITIVE_DELTA"),"POSITIVE_DELTA_MAX_WINDOW",1024), tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",1))) 

$validity
tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("RLE"),"COMPRESSION_LEVEL",-1))) 

> dom  <- domain(sch)
> cat("\nDOMAIN\n")

DOMAIN
> show(dom)
tiledb_domain(c(tiledb_dim(name="gene_id", domain=c(NULL,NULL), tile=NULL, type="ASCII"), tiledb_dim(name="sample", domain=c(NULL,NULL), tile=NULL, type="ASCII"))) 
> ndim <- tiledb_ndim(dom)
> cat("\nNDIM\n")

NDIM
> show(ndim)
[1] 2
> dims <- dimensions(dom)
> cat("\nDIM1\n")

DIM1
> show(dims[[1]])
tiledb_dim(name="gene_id", domain=c(NULL,NULL), tile=NULL, type="ASCII") 
> cat("\nDIM2\n")

DIM2
> show(dims[[2]])
tiledb_dim(name="sample", domain=c(NULL,NULL), tile=NULL, type="ASCII") 
> cat("\nDIM1 FL\n")

DIM1 FL
> show(filter_list(dims[[1]]))
 
> cat("\nDIM2 FL\n")

DIM2 FL
> show(filter_list(dims[[2]]))
 
> nattr <- length(attrs(sch))
> cat("\nNATTR\n")

NATTR
> show(nattr)
[1] 1
> attr1 <- attrs(sch, 1)
> cat("\nATTR1\n")

ATTR1
> show(attr1)
tiledb_attr(name="tpm", type="FLOAT64", ncells=1, nullable=FALSE, filter_list=tiledb_filter_list(c(tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",1)))) 
> fla <- filter_list(attr1)
> cat("\nATTR1 FL\n")

ATTR1 FL
> show(fla[0])
tiledb_filter_set_option(tiledb_filter("ZSTD"),"COMPRESSION_LEVEL",1) 
> 

Contributor

@johnkerl left a comment


Awesome @eddelbuettel !! :D

#355 (comment) is really the bomb -- built-in syntax checking + user empowerment FTW! :D

@eddelbuettel
Contributor Author

"It takes a village" 😂 -- started drafting the NEWS for 0.11.0 and we about 'large number' of PRs on this topic but I like the place we got, even if it some iterations!

eddelbuettel merged commit 435481c into master on Jan 24, 2022
eddelbuettel deleted the de/sc-13744/show_update branch on January 24, 2022 15:24
eddelbuettel mentioned this pull request on Jan 24, 2022


2 participants