Support strings_as_factors option #367

Merged
eddelbuettel merged 2 commits into master from de/sc-14685/strings_as_factors on Feb 16, 2022

@eddelbuettel
Contributor

This PR adds an optional logical toggle strings_as_factors, along with getters and setters, to permit conversion of character columns to factors.

The default value is the value of the (existing base R) option stringsAsFactors, as returned by getOption("stringsAsFactors"). A quick demonstration on the 'penguins' dataset (with an added index column) shows how the three character columns are converted, which makes, for example, summary() more informative by reporting counts:

edd@rob:~/git/tiledb-r(de/sc-14685/strings_as_factors)$ r -ltiledb -e'arr <- tiledb_array("/tmp/tiledb/penguinsNew", strings_as_factors=FALSE); res <- arr[]; print(summary(res))'
 __tiledb_rows     species             island          bill_length_mm bill_depth_mm  flipper_length_mm  body_mass_g       sex                 year     
 Min.   :  1.0   Length:344         Length:344         Min.   :32.1   Min.   :13.1   Min.   :172       Min.   :2700   Length:344         Min.   :2007  
 1st Qu.: 86.8   Class :character   Class :character   1st Qu.:39.2   1st Qu.:15.6   1st Qu.:190       1st Qu.:3550   Class :character   1st Qu.:2007  
 Median :172.5   Mode  :character   Mode  :character   Median :44.5   Median :17.3   Median :197       Median :4050   Mode  :character   Median :2008  
 Mean   :172.5                                         Mean   :43.9   Mean   :17.2   Mean   :201       Mean   :4202                      Mean   :2008  
 3rd Qu.:258.2                                         3rd Qu.:48.5   3rd Qu.:18.7   3rd Qu.:213       3rd Qu.:4750                      3rd Qu.:2009  
 Max.   :344.0                                         Max.   :59.6   Max.   :21.5   Max.   :231       Max.   :6300                      Max.   :2009  
                                                       NA's   :2      NA's   :2      NA's   :2         NA's   :2                                       
edd@rob:~/git/tiledb-r(master)$ r -ltiledb -e'arr <- tiledb_array("/tmp/tiledb/penguinsNew", strings_as_factors=TRUE); res <- arr[]; print(summary(res))'
 __tiledb_rows        species          island    bill_length_mm bill_depth_mm  flipper_length_mm  body_mass_g       sex           year     
 Min.   :  1.0   Adelie   :152   Biscoe   :168   Min.   :32.1   Min.   :13.1   Min.   :172       Min.   :2700   female:165   Min.   :2007  
 1st Qu.: 86.8   Chinstrap: 68   Dream    :124   1st Qu.:39.2   1st Qu.:15.6   1st Qu.:190       1st Qu.:3550   male  :168   1st Qu.:2007  
 Median :172.5   Gentoo   :124   Torgersen: 52   Median :44.5   Median :17.3   Median :197       Median :4050   NA's  : 11   Median :2008  
 Mean   :172.5                                   Mean   :43.9   Mean   :17.2   Mean   :201       Mean   :4202                Mean   :2008  
 3rd Qu.:258.2                                   3rd Qu.:48.5   3rd Qu.:18.7   3rd Qu.:213       3rd Qu.:4750                3rd Qu.:2009  
 Max.   :344.0                                   Max.   :59.6   Max.   :21.5   Max.   :231       Max.   :6300                Max.   :2009  
                                                 NA's   :2      NA's   :2      NA's   :2         NA's   :2                                 
edd@rob:~/git/tiledb-r(master)$ 
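
For readers without the array at hand, the effect seen in the two summaries above can be reproduced in base R alone. The following is only a minimal sketch: the helper chars_to_factors() and the small data frame are hypothetical stand-ins for illustration and are not part of the tiledb package or of this PR.

```r
# Hypothetical helper (not the package's implementation): convert every
# character column of a data.frame to a factor, leave other columns alone.
chars_to_factors <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], factor)
  }
  df
}

# Small made-up data set standing in for a query result
res <- data.frame(species = c("Adelie", "Gentoo", "Adelie", "Chinstrap"),
                  body_mass_g = c(3750, 5000, 3800, 3700),
                  stringsAsFactors = FALSE)

print(summary(res$species))    # character: only Length / Class / Mode

res_f <- chars_to_factors(res)
print(summary(res_f$species))  # factor: one count per level
```

With character columns, summary() can only report length, class and mode; once they are factors it tabulates each level, which is the difference visible in the species, island and sex columns above.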

@shortcut-integration

This pull request has been linked to Shortcut Story #14685: Support strings_as_factors for character columns.

Member

@aaronwolen aaronwolen left a comment

Very nice addition!

Comment thread R/TileDBArray.R
timestamp_end = as.POSIXct(double(), origin="1970-01-01"),
return_as = get_return_as_preference(),
query_statistics = FALSE,
strings_as_factors = getOption("stringsAsFactors", FALSE),
Member

Smart to use the global option here!

Contributor Author

@eddelbuettel eddelbuettel Feb 16, 2022

Thanks 😀 It's a not-uncommon pattern I picked up 'somewhere' along the way; I do recall at some point a few years back Duncan Temple-Lang helped with a snippet like this. The fallback in getOption() is also golden.
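
The pattern discussed here, a function argument whose default is read from a global option with a fallback, can be sketched in a few lines of base R. The function read_demo() and the option name demo.strings_as_factors are hypothetical and chosen only for illustration; the PR itself reads the base R option stringsAsFactors.

```r
# Hypothetical function: the default of 'strings_as_factors' is looked up
# from a global option; the second argument of getOption() is the fallback
# used when the option is not set at all.
read_demo <- function(x,
                      strings_as_factors = getOption("demo.strings_as_factors", FALSE)) {
  if (strings_as_factors) factor(x) else x
}

v <- c("a", "b", "a")

r1 <- read_demo(v)                              # option unset: fallback FALSE applies

old <- options(demo.strings_as_factors = TRUE)  # set the (hypothetical) option
r2 <- read_demo(v)                              # default now follows the option
r3 <- read_demo(v, strings_as_factors = FALSE)  # an explicit argument still wins
options(old)                                    # restore the previous state
```

Because R evaluates default arguments lazily at call time, the option is consulted on every call rather than once when the function is defined, so users can change the behaviour globally without touching individual calls.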

@eddelbuettel eddelbuettel merged commit 39cd8e2 into master Feb 16, 2022
@eddelbuettel eddelbuettel deleted the de/sc-14685/strings_as_factors branch February 16, 2022 14:07
@eddelbuettel eddelbuettel mentioned this pull request Mar 25, 2022