[R] Update docs to clarify that stringsAsFactors isn't relevant for parquet/feather #24055
Comments
Neal Richardson / @nealrichardson: |
Keith Hughitt / @khughitt:
The down-side is still that the rest of the R ecosystem
I'm not sure what the best solution is here. In principle, I agree that the current behavior is the most sensible, so perhaps it is just a matter of educating the community to be aware of these differences when working with filetypes that are able to properly encode factor variables.
Perhaps just including a note in the |
Francois Saint-Jacques / @fsaintjacques: |
Neal Richardson / @nealrichardson: Leaving aside the merits of the
> options(stringsAsFactors=FALSE)
> tf <- tempfile()
> saveRDS(iris, tf)
> str(readRDS(tf))
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> iris$Species <- as.character(iris$Species)
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : chr "setosa" "setosa" "setosa" "setosa" ...
> saveRDS(iris, tf)
> str(readRDS(tf))
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : chr "setosa" "setosa" "setosa" "setosa" ... |
Keith Hughitt / @khughitt:
I agree that this is the expected and desired behavior. I can close both this and the related "read_feather()" issue I reported.
Do you think it's worth including a note in the docs for the methods to caution users who aren't familiar with parquet/feather's handling of column types?
It's true that most users should already have some experience with this with
Your call though. Either way, I appreciate you taking the time to respond and clarify the important differences between the methods. |
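For readers skimming this thread, the distinction discussed above can be sketched in a few lines of R. This is only an illustrative sketch, not text from the arrow docs: the toy data frame `df` and the temp-file names are made up here, and the parquet step assumes the arrow package is installed (it is skipped otherwise).

```r
# Toy data: one factor column and one character column (hypothetical example).
df <- data.frame(
  id = 1:3,
  species = factor(c("setosa", "versicolor", "setosa")),
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# CSV is plain text and stores no column types, so the reader must choose
# how to represent strings. Here stringsAsFactors changes the result.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
csv_chr <- read.csv(csv_path, stringsAsFactors = FALSE)
csv_fct <- read.csv(csv_path, stringsAsFactors = TRUE)
class(csv_chr$species)  # "character": the factor was lost on the way to text
class(csv_fct$label)    # "factor": the character column was converted

# RDS stores the column types, so what was written is what comes back,
# regardless of any stringsAsFactors setting.
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
rds_back <- readRDS(rds_path)
class(rds_back$species)  # "factor"
class(rds_back$label)    # "character"

# Parquet also encodes column types. Assumption: the arrow package is
# installed; per the discussion above, the columns are expected to come
# back with the types they were written with.
if (requireNamespace("arrow", quietly = TRUE)) {
  pq_path <- tempfile(fileext = ".parquet")
  arrow::write_parquet(df, pq_path)
  pq_back <- as.data.frame(arrow::read_parquet(pq_path))
  str(pq_back)
}
```

If character columns are wanted after reading a type-preserving file, converting explicitly (for example with `as.character()` on the column, as in the `iris` transcript above) is probably clearer than relying on a global option.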
Neal Richardson / @nealrichardson: |
Neal Richardson / @nealrichardson: |
Same issue as reported for feather::read_feather (#24054).
For the R arrow package, the "read_parquet()" function currently does not respect "options(stringsAsFactors = FALSE)", leading to unexpected/inconsistent behavior.
Example:
Versions:
R 3.6.2
arrow_0.15.1.9000
Environment: Linux 64-bit 5.4.15
Reporter: Keith Hughitt / @khughitt
Assignee: Neal Richardson / @nealrichardson
Related issues:
Note: This issue was originally created as ARROW-7825. Please see the migration documentation for further details.