Support for enumerated types #562

Merged
merged 27 commits into from Sep 6, 2023

eddelbuettel
Member

@eddelbuettel eddelbuettel commented Jun 23, 2023

[WIP] This (work-in-progress) branch supports enumerated types (as provided by the merged-into-dev PR 4051). It is now round-trip complete for the standard case of a data.frame in and out; support for Arrow return is next.

CI is now on but 'ineffective' for the new code, as there is no pre-made artifact to utilise; the new code is tested locally in development. See below for a full run.

#866

edd@rob:~/git/tiledb-r(de/sc-30201/enumerated_types)$ rcc.r 
── R CMD build ────────────────────────────────────────────────────────────────────────────────────────
✔  checking for file ‘.../DESCRIPTION’
─  preparing ‘tiledb’:
✔  checking DESCRIPTION meta-information
─  cleaning src
─  running ‘cleanup’
─  installing the package to build vignettes
✔  creating vignettes (9.8s)
─  cleaning src
─  running ‘cleanup’
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘tiledb_0.19.1.8.tar.gz’
   
── R CMD check ────────────────────────────────────────────────────────────────────────────────────────
─  using log directory ‘/tmp/file16d6e479022ada/tiledb.Rcheck’
─  using R version 4.3.0 (2023-04-21)
─  using platform: x86_64-pc-linux-gnu (64-bit)
─  R was compiled by
       gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
       GNU Fortran (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
─  running under: Ubuntu 22.10
─  using session charset: UTF-8
✔  checking for file ‘tiledb/DESCRIPTION’
─  checking extension type ... Package
─  this is package ‘tiledb’ version ‘0.19.1.8’
─  package encoding: UTF-8
✔  checking package namespace information
✔  checking package dependencies (1.7s)
✔  checking if this is a source package
✔  checking if there is a namespace
✔  checking for executable files
✔  checking for hidden files and directories
✔  checking for portable file names
✔  checking for sufficient/correct file permissions
✔  checking whether package ‘tiledb’ can be installed (7.9s)
─  used C compiler: ‘gcc-12 (Ubuntu 12.2.0-3ubuntu1) 12.2.0’
─  used C++ compiler: ‘g++-12 (Ubuntu 12.2.0-3ubuntu1) 12.2.0’
✔  checking C++ specification
     Not all R platforms support C++17
✔  checking installed package size
✔  checking package directory
✔  checking ‘build’ directory
✔  checking DESCRIPTION meta-information
✔  checking top-level files
✔  checking for left-over files
✔  checking index information
✔  checking package subdirectories
✔  checking R files for non-ASCII characters
✔  checking R files for syntax errors
✔  checking whether the package can be loaded (520ms)
✔  checking whether the package can be loaded with stated dependencies (633ms)
✔  checking whether the package can be unloaded cleanly (615ms)
✔  checking whether the namespace can be loaded with stated dependencies (524ms)
✔  checking whether the namespace can be unloaded cleanly (677ms)
✔  checking loading without being on the library search path (722ms)
✔  checking startup messages can be suppressed (1.8s)
✔  checking dependencies in R code (1.8s)
✔  checking S3 generic/method consistency (774ms)
✔  checking replacement functions (499ms)
✔  checking foreign function calls (772ms)
✔  checking R code for possible problems (5.6s)
✔  checking Rd files (557ms)
✔  checking Rd metadata
✔  checking Rd cross-references
✔  checking for missing documentation entries (558ms)
✔  checking for code/documentation mismatches (2s)
✔  checking Rd \usage sections (1.3s)
✔  checking Rd contents (353ms)
✔  checking for unstated dependencies in examples
✔  checking line endings in shell scripts
✔  checking line endings in C/C++/Fortran sources/headers
✔  checking line endings in Makefiles
✔  checking compilation flags in Makevars
✔  checking for GNU extensions in Makefiles
✔  checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS)
✔  checking use of PKG_*FLAGS in Makefiles
✔  checking compilation flags used
✔  checking compiled code
✔  checking installed files from ‘inst/doc’
✔  checking files in ‘vignettes’
✔  checking examples (1.2s)
✔  checking for unstated dependencies in ‘tests’
─  checking tests
    [31s/39s] OK
   * checking for unstated dependencies in vignettes ... OK
   * checking package vignettes in ‘inst/doc’ ... OK
   * checking running R code from vignettes ...
     ‘data-ingestion-from-sql.md’ using ‘UTF-8’... OK
     ‘documentation.md’ using ‘UTF-8’... OK
     ‘installation-options.md’ using ‘UTF-8’... OK
     ‘introduction.md’ using ‘UTF-8’... OK
     ‘tiledb-mariadb-examples.md’ using ‘UTF-8’... OK
    NONE
   * checking re-building of vignette outputs ... OK
   * checking PDF version of manual ... OK
   * DONE
   
   Status: OK
   
edd@rob:~/git/tiledb-r(de/sc-30201/enumerated_types)$ 

This PR has now been rebased on the central branch with its dependency on the first RC release of TileDB Core 2.17.0.

@shortcut-integration

This pull request has been linked to Shortcut Story #30201: [R] Support for enumerated types.

@eddelbuettel eddelbuettel marked this pull request as draft June 23, 2023 21:49
@eddelbuettel eddelbuettel requested a review from davisp June 23, 2023 21:50
@johnkerl
Contributor

[sc-30316]

@shortcut-integration

This pull request has been linked to Shortcut Story #30316: Enumerated data types AKA categoricals AKA factors.

@eddelbuettel eddelbuettel changed the title from "[WIP] [No Merge] [Needs Not-Yet-Merged Branch in Core] Support for enumerated types" to "[WIP] [No Merge Yet] Support for enumerated types" Jul 21, 2023
@eddelbuettel eddelbuettel force-pushed the de/sc-30201/enumerated_types branch 3 times, most recently from 02fe7fa to ffee01e July 31, 2023 22:44
@eddelbuettel eddelbuettel marked this pull request as ready for review July 31, 2023 22:45
@eddelbuettel
Member Author

eddelbuettel commented Jul 31, 2023

I have taken the 'draft' status off as this is now fairly featureful, but we still have to wait for the TileDB Embedded 2.17.0 release to have enumeration support in the core library so that it can be used here. For now the tests are all skipped in CI as we are only up to 2.16.1, which does not include enumeration support.
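
The version-based skip described above can be sketched in base R. This is only an illustration: the hard-coded `core_version`, the `required` threshold variable, and the `run_enum_tests()` helper are hypothetical names, and a real test file would query the installed core library instead of using a literal.

```r
# Hypothetical values: in a real test file the first would come from the
# installed core library rather than being hard-coded.
core_version <- package_version("2.16.1")
required     <- package_version("2.17.0")

# TRUE only when the core library is new enough for the enumeration tests.
has_enum_support <- core_version >= required

# Hypothetical helper: report a skip instead of failing the whole run.
run_enum_tests <- function(supported) {
  if (!supported) {
    return("skipped: core library lacks enumeration support")
  }
  "ran"
}

status <- run_enum_tests(has_enum_support)
```

Version objects compare component by component as numbers, so 2.16.1 sorts below 2.17.0 without any string handling.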

A quick demo with enumeration support including query conditions on enum and non-enum columns:

> library(tiledb)
TileDB R 0.20.1.4 with TileDB Embedded 2.17.0 on Ubuntu 23.04.
See https://tiledb.com for more information about TileDB.
> uri <- "mem://penguins"
> fromDataFrame(palmerpenguins::penguins, uri)
> arr <- tiledb_array(uri, extended=FALSE, return_as="data.table")
> query_condition(arr) <- parse_query_condition(year == 2009 && sex == male && species == Gentoo, ta = arr)
> arr[]
    species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g    sex  year
     <fctr> <fctr>          <num>         <num>             <int>       <int> <fctr> <int>
 1:  Gentoo Biscoe           52.5          15.6               221        5450   male  2009
 2:  Gentoo Biscoe           50.0          15.9               224        5350   male  2009
 3:  Gentoo Biscoe           50.8          17.3               228        5600   male  2009
 4:  Gentoo Biscoe           51.3          14.2               218        5300   male  2009
 5:  Gentoo Biscoe           52.1          17.0               230        5550   male  2009
 6:  Gentoo Biscoe           52.2          17.1               228        5400   male  2009
 7:  Gentoo Biscoe           49.5          16.1               224        5650   male  2009
 8:  Gentoo Biscoe           50.8          15.7               226        5200   male  2009
 9:  Gentoo Biscoe           49.4          15.8               216        4925   male  2009
10:  Gentoo Biscoe           51.1          16.5               225        5250   male  2009
11:  Gentoo Biscoe           55.9          17.0               228        5600   male  2009
12:  Gentoo Biscoe           49.1          15.0               228        5500   male  2009
13:  Gentoo Biscoe           46.8          16.1               215        5500   male  2009
14:  Gentoo Biscoe           53.4          15.8               219        5500   male  2009
15:  Gentoo Biscoe           48.1          15.1               209        5500   male  2009
16:  Gentoo Biscoe           49.8          15.9               229        5950   male  2009
17:  Gentoo Biscoe           51.5          16.3               230        5500   male  2009
18:  Gentoo Biscoe           55.1          16.0               230        5850   male  2009
19:  Gentoo Biscoe           48.8          16.2               222        6000   male  2009
20:  Gentoo Biscoe           50.4          15.7               222        5750   male  2009
21:  Gentoo Biscoe           49.9          16.1               213        5400   male  2009
    species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g    sex  year
> 
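
The `<fctr>` columns in the output above are R factors. As a minimal base-R sketch (no tiledb calls, made-up sample values), a factor pairs integer codes with a set of labels, which is the shape an enumerated column takes on the R side. The comparison on codes at the end is only an illustration of the idea and is not a claim about how TileDB evaluates query conditions internally.

```r
# Hypothetical sample values; not taken from the penguins data set.
species <- factor(c("Gentoo", "Adelie", "Gentoo", "Chinstrap"))

# A factor is an integer vector of codes plus a 'levels' attribute.
codes  <- as.integer(species)   # 1-based positions into the level set
labels <- levels(species)       # the distinct values, sorted by default

# Rebuilding the factor from codes and labels is the round trip.
rebuilt <- factor(labels[codes], levels = labels)

# Filtering on a label can be expressed as a comparison on its code.
gentoo_code <- match("Gentoo", labels)
is_gentoo   <- codes == gentoo_code
```

Because the labels are stored once and each row holds only a small integer, a condition such as `species == Gentoo` can in principle be resolved to a single code before any rows are compared.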

@eddelbuettel eddelbuettel changed the title from "[WIP] [No Merge Yet] Support for enumerated types" to "Support for enumerated types" Sep 6, 2023
Contributor

@johnkerl johnkerl left a comment

@eddelbuettel do we need to bump to 0.20.3.(n+1) in DESCRIPTION?

@eddelbuettel
Member Author

eddelbuettel commented Sep 6, 2023

Six of one ... but may as well. It is marked as 'bigger than 0.20.3.1' which was this morning's status quo. So .2 works for me (signifying 2.17.0-rc0). Can make it .3 if that makes you happier but a >= .2 should already do.

Will update NEWS.md as well.

@eddelbuettel eddelbuettel merged commit 6e27096 into master Sep 6, 2023
1 check passed
@eddelbuettel eddelbuettel deleted the de/sc-30201/enumerated_types branch September 6, 2023 21:09
@eddelbuettel eddelbuettel mentioned this pull request Sep 14, 2023
2 participants