Use CZI default parameters for X/obs/var #50

Merged 12 commits on May 13, 2022

@aaronwolen aaronwolen commented May 10, 2022

Updates several TileDB parameters for the storage of obs, var, and X arrays (as discussed in #48). Also includes a few minor internal changes, described below.

Changes to X arrays:

  • increased capacity from 1e4 to 1e5
  • offset filters include: double delta, bit width, and zstd
  • use rle on first string dimension
  • use zstd on second string dimension
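
To give some intuition for why these filter choices can shrink the coordinate files so sharply, here is a toy sketch in plain Python. It is not TileDB's filter code; it assumes sparse cells are written sorted by the first dimension, and the labels (`GeneA`, `cell1`, ...) and the fixed 5-byte label width are made up for illustration.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def double_delta(values):
    """Return the second-order differences of an integer sequence."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


# Hypothetical sparse cells as (first_dim, second_dim) pairs,
# sorted by the first dimension (an assumption about write order).
cells = sorted(
    (first, second)
    for first in ("GeneA", "GeneB", "GeneC")
    for second in ("cell1", "cell2", "cell3", "cell4")
)

# The first-dimension coordinates repeat in long runs, so RLE collapses them.
first_dim = [first for first, _ in cells]
runs = rle_encode(first_dim)
print(len(first_dim), "coordinates ->", len(runs), "runs")

# Byte offsets of equal-width string labels grow by a constant step,
# so their second-order differences are all zero and compress well.
offsets = [i * 5 for i in range(len(first_dim))]
dd = double_delta(offsets)
print("double-delta values:", set(dd))
```

The second dimension changes on almost every cell, so run-length encoding would gain little there, which fits the choice of a general-purpose compressor such as zstd for it.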

Changes to annotation dataframes:

  • obs capacity is 256
  • var capacity is 2048
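
As a rough illustration of what the capacity values above control: assuming capacity is the maximum number of cells stored per data tile in a sparse array, a tenfold increase means roughly a tenth as many tiles for the same data. The cell count below is hypothetical, and the helper `data_tiles` is only a sketch.

```python
import math


def data_tiles(n_cells, capacity):
    """Number of data tiles needed to hold n_cells at a given capacity."""
    return math.ceil(n_cells / capacity)


n_cells = 2_500_000  # hypothetical number of non-empty cells in an X array

tiles_old = data_tiles(n_cells, 10_000)   # previous X capacity (1e4)
tiles_new = data_tiles(n_cells, 100_000)  # new X capacity (1e5)
print(tiles_old, "tiles ->", tiles_new, "tiles")
```

Fewer tiles should mean less per-tile bookkeeping, at the cost of reading more cells per tile when slicing.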

Benchmarks

I compared ingestion times and on-disk sizes with (tiledbsc 0.1.0.9009) and without (tiledbsc 0.1.0.9007) the above parameter changes for several Seurat datasets provided by the SeuratData package. In the tables below, a positive change is a reduction relative to 0.1.0.9007.

Here are the sizes of the original .rds files:

dataset size_rds
cbmc 35.30 MB
ifnb 41.91 MB
ssHippo 98.05 MB
pbmcsca 126.23 MB
hcabm40k 187.54 MB
panc8 286.05 MB
pbmc3k 298.15 MB
thp1.eccite 321.10 MB

Ingestion times:

dataset tiledbsc0.1.0.9007 tiledbsc0.1.0.9009 change
cbmc 16.28 10.57 35.09%
ifnb 22.90 18.47 19.36%
ssHippo 49.01 38.84 20.75%
pbmcsca 79.25 68.95 13.01%
hcabm40k 171.96 136.09 20.86%
panc8 128.92 117.16 9.12%
pbmc3k 3.83 4.48 -16.92%
thp1.eccite 184.08 179.19 2.66%

On-disk size:

dataset tiledbsc0.1.0.9007 tiledbsc0.1.0.9009 change
cbmc 67.45 MB 40.88 MB 39.39%
ifnb 78.76 MB 47.06 MB 40.25%
ssHippo 171.60 MB 109.54 MB 36.16%
pbmcsca 276.76 MB 190.66 MB 31.11%
hcabm40k 466.52 MB 324.25 MB 30.50%
panc8 360.41 MB 214.82 MB 40.39%
pbmc3k 17.58 MB 9.57 MB 45.55%
thp1.eccite 656.71 MB 406.83 MB 38.05%
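
The change column can be reproduced as a relative reduction, (before - after) / before. The short sketch below recomputes it for two rows of the on-disk size table; because it starts from the rounded values shown here, results for other rows may differ from the table in the last digit.

```python
def percent_reduction(before, after):
    """Relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100


# (size with 0.1.0.9007, size with 0.1.0.9009) in MB, copied from the table
sizes_mb = {
    "cbmc": (67.45, 40.88),
    "ifnb": (78.76, 47.06),
}

reductions = {
    name: percent_reduction(before, after)
    for name, (before, after) in sizes_mb.items()
}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}%")
```

A negative result, as in the pbmc3k row of the ingestion-time table, means the value increased with the new parameters.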

File-specific on-disk sizes:

path tiledbsc0.1.0.9007 tiledbsc0.1.0.9009 change
panc8.SeuratData/X/counts/__fragments/.../d0.tdb 53.62M 4.05K 99.993%
thp1.eccite.SeuratData/X/counts/__fragments/.../d0.tdb 72.02M 5.44K 99.993%
pbmcsca.SeuratData/X/counts/__fragments/.../d0.tdb 28.56M 2.16K 99.993%
ssHippo.SeuratData/X/counts/__fragments/.../d0.tdb 23.16M 1.75K 99.993%
hcabm40k.SeuratData/X/counts/__fragments/.../d0.tdb 41.75M 3.16K 99.993%
ifnb.SeuratData/X/counts/__fragments/.../d0.tdb 10.13M 784 99.993%
cbmc.SeuratData/X/counts/__fragments/.../d0.tdb 8.55M 664 99.993%
pbmc3k.SeuratData/X/counts/__fragments/.../d0.tdb 2.36M 184 99.993%
hcabm40k.SeuratData/X/counts/__fragments/.../d1.tdb 50.23M 680.19K 98.678%
thp1.eccite.SeuratData/X/counts/__fragments/.../d1.tdb 78.83M 1.15M 98.547%
cbmc.SeuratData/X/counts/__fragments/.../d1.tdb 9.17M 139.08K 98.519%
ssHippo.SeuratData/X/counts/__fragments/.../d1.tdb 24.57M 377.14K 98.501%
pbmc3k.SeuratData/X/counts/__fragments/.../d1.tdb 2.51M 38.53K 98.500%
ifnb.SeuratData/X/counts/__fragments/.../d1.tdb 10.89M 568.48K 94.902%
pbmcsca.SeuratData/X/counts/__fragments/.../d1.tdb 33.07M 1.83M 94.453%
thp1.eccite.SeuratData/X/counts/__fragments/.../__fragment_metadata.tdb 261.35K 27.93K 89.314%
hcabm40k.SeuratData/X/counts/__fragments/.../__fragment_metadata.tdb 174.83K 18.9K 89.192%
panc8.SeuratData/X/counts/__fragments/.../__fragment_metadata.tdb 236.84K 27.46K 88.404%
pbmcsca.SeuratData/X/counts/__fragments/.../__fragment_metadata.tdb 126.72K 15.4K 87.845%
ssHippo.SeuratData/X/counts/__fragments/.../__fragment_metadata.tdb 89.77K 12.68K 85.878%
ifnb.SeuratData/X/counts/__fragments/.../__fragment_metadata.tdb 40.79K 7.91K 80.596%
panc8.SeuratData/X/counts/__fragments/.../d1.tdb 57.63M 11.65M 79.786%
cbmc.SeuratData/X/counts/__fragments/.../__fragment_metadata.tdb 33.96K 7.24K 78.672%
thp1.eccite.SeuratData/X/counts/__fragments/.../d0_var.tdb 643.37K 210.61K 67.265%
hcabm40k.SeuratData/X/counts/__fragments/.../d0_var.tdb 473.61K 180.78K 61.830%
pbmc3k.SeuratData/X/counts/__fragments/.../__fragment_metadata.tdb 12.17K 4.89K 59.840%
panc8.SeuratData/X/counts/__fragments/.../d0_var.tdb 636.49K 303.86K 52.261%
ssHippo.SeuratData/X/counts/__fragments/.../a0.tdb 15.87M 8.18M 48.456%
ifnb.SeuratData/X/counts/__fragments/.../d0_var.tdb 244.48K 129.79K 46.909%
thp1.eccite.SeuratData/X/counts/__fragments/.../a0.tdb 76.83M 42.13M 45.161%
ifnb.SeuratData/X/counts/__fragments/.../a0.tdb 8.28M 4.59M 44.625%
cbmc.SeuratData/X/counts/__fragments/.../a0.tdb 6.38M 3.61M 43.411%
ssHippo.SeuratData/X/counts/__fragments/.../d0_var.tdb 388.06K 226.55K 41.621%
hcabm40k.SeuratData/X/counts/__fragments/.../a0.tdb 37.09M 21.72M 41.439%
pbmcsca.SeuratData/X/counts/__fragments/.../d0_var.tdb 545.63K 322.65K 40.866%
pbmc3k.SeuratData/X/counts/__fragments/.../a0.tdb 1.62M 1.01M 37.456%
cbmc.SeuratData/X/counts/__fragments/.../d0_var.tdb 310.26K 201.78K 34.965%
pbmcsca.SeuratData/X/counts/__fragments/.../a0.tdb 21.7M 14.84M 31.601%
pbmc3k.SeuratData/X/counts/__fragments/.../d1_var.tdb 10.01M 7.84M 21.676%
panc8.SeuratData/X/counts/__fragments/.../d1_var.tdb 154.99M 127.8M 17.547%
panc8.SeuratData/X/counts/__fragments/.../a0.tdb 76.18M 64.66M 15.120%
pbmc3k.SeuratData/X/counts/__fragments/.../d0_var.tdb 142.87K 123.41K 13.622%
thp1.eccite.SeuratData/X/counts/__fragments/.../d1_var.tdb 396.34M 343.09M 13.436%
ifnb.SeuratData/X/counts/__fragments/.../d1_var.tdb 45.26M 39.33M 13.108%
cbmc.SeuratData/X/counts/__fragments/.../d1_var.tdb 39.62M 34.77M 12.255%
hcabm40k.SeuratData/X/counts/__fragments/.../d1_var.tdb 314.58M 286.02M 9.079%
pbmcsca.SeuratData/X/counts/__fragments/.../d1_var.tdb 178.56M 163.42M 8.478%
ssHippo.SeuratData/X/counts/__fragments/.../d1_var.tdb 98.91M 95.02M 3.939%

Other changes

  • Fixes log_array_ingestion()
  • Adds private methods for logging array creation/ingestion

@shortcut-integration

This pull request has been linked to Shortcut Story #17557: Adopt czi default layout options.

@aaronwolen aaronwolen force-pushed the aaronwolen/sc-17557/adopt-czi-default-layout-options branch 3 times, most recently from 1d76435 to c86d455 Compare May 10, 2022 18:53
@aaronwolen aaronwolen force-pushed the aaronwolen/sc-17557/adopt-czi-default-layout-options branch from c86d455 to 92ed0e6 Compare May 13, 2022 14:28
@aaronwolen aaronwolen marked this pull request as ready for review May 13, 2022 18:47

@johnkerl johnkerl left a comment

🚀

inst/bench/run-benchmarks.R (review thread: outdated, resolved)
@aaronwolen

CC @dnadave @kaitlin-procogia @augustine-procogia

Unfortunately I can't add you as reviewers b/c you're outside of our organization. But I'll be sure to CC you from here on out and please feel free to leave comments and give a 👍 / 👎 .

@eddelbuettel eddelbuettel left a comment

Looks good. We probably have a need for more of the type of systematic benchmarking you did here -- there are a number of parameters that surely influence performance.

@eddelbuettel eddelbuettel left a comment

🥇

@eddelbuettel eddelbuettel left a comment

🥇

@eddelbuettel

(My DNS is sometimes wonky between my desktop/server and the access point, and the browser gets a hiccup. Hence the accidental double approval.)

@aaronwolen aaronwolen merged commit d577e50 into main May 13, 2022
@gsakkis gsakkis deleted the aaronwolen/sc-17557/adopt-czi-default-layout-options branch June 5, 2022 10:02