New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
read/write dm as csv/zip(csv)/xlsx #485
Conversation
Nice! To the questions:
We should also integrate with {shard}. |
Excuse my ignorance, but is what you mean this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now have compound keys, need to adapt.
I'd prefer using existing methods for enumerating primary and foreign keys.
csv_files <- list.files(csv_directory) | ||
# compress the file ("-j" junks the path to the file) | ||
|
||
zip( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect {zip} might work better here: https://cran.r-project.org/web/packages/zip/index.html.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe if it works it's more platform agnostic, but in the first test (which works with utils::zip()
), I get the error:
Error in zip_internal(zipfile, files, recurse, compression_level, append = FALSE, :
zip error: `Cannot add file `/var/folders/x3/ndmkxk1j2wn0mx2pw9v9httm0000gn/T//RtmpmKOSNA/dm_zip_1149248ed84b/___coltypes_file_dm.csv` to archive `___test_path/dm.zip`` in file `zip.c:348`
Not sure if it's worth the effort to try finding the source of this problem.
|
||
if (file.exists(xlsx_file_path)) { | ||
if (overwrite) { | ||
message(glue::glue("Overwriting file {tick(xlsx_file_path)}.")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can we mute this message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to be able to mute this? i.e., how often will users recreate/overwrite files? (honestly not sure)
We could add a quiet
argument
"POSIXct" | ||
) | ||
|
||
convert_all_times_to_utc <- function(table_list, col_class_table) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this within the scope of the function? Will it affect the roundtrip, or only the snapshot tests, if we omit UTC conversion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several problems:
- unfortunately it's not possible to read timezones via
readr::read_csv()
. Therefore, it seems much safer to convert to UTC and to inform when writing the csv files. - for xlsx: actually
writexl::write_xlsx()
does the conversion to UTC quietly itself anyway. In this case we would not need to perform the conversion, but just inform the user.
I think it's not harmful to leave it as is, but I am open to suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- unfortunately it's not possible to read timezones via readr::read_csv(). Therefore, it seems much safer to convert to UTC and to inform when writing the csv file
I was too quick to claim that:
https://readr.tidyverse.org/articles/locales.html
It is actually possible to steer the timezone with the argument locale
in readr::read_csv()
. Not sure if making use of this possibility improves the transparency of our functions (mainly if it doesn't work with xlsx)
c("Converting the datetime values for the following column(s) to timezone `UTC`:\n", | ||
glue::glue("{paste0(to_convert$table, '$', to_convert$column, collapse = '\n')}")) | ||
) | ||
table_list <- reduce2(to_convert$table, to_convert$column, function(tables, table, column) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mutate(across())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe, but my thought was that it's good to inform users which columns are converted. And since then we know those columns, we can make those changes as well explicitly.
) | ||
|
||
convert_all_times_to_utc <- function(table_list, col_class_table) { | ||
if (any(col_class_table$class %in% c("POSIXlt", "POSIXct"))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this won't pick up subclasses of "POSIXlt"
or "POSIXct"
? Not sure if this is relevant though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as it is implemented, just the following types are supported: character
, Date
, integer
, logical
, numeric
, POSIXct
, POSIXlt
.
If if turns out that there are useful further classes that should be supported, I would suggest these should be added in future PRs.
We also want |
Co-authored-by: Kirill Müller <krlmlr@users.noreply.github.com>
Compound works now |
for the functions from {readr}, {readxl}, {writexl}? good idea. Even though there is some implementation of it in main, shall we wait for the merge? |
This looks good, I'd like to play with it before merging. Is this blocking another project? If we had to choose between csv, zip and xlsx, what would be the preference? |
Sure, take your time. It's not blocking anything - at least not for me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it locally. I think we need to adapt it to the case where a foreign key is linked to a non-primary key, also for the case of compound keys.
Let's wait for #517, it will be easier to serialize the output of dm_meta()
This is mostly new code, could also be moved elsewhere. |
Closing for now, added a reference to the issue. |
dm_write_csv(dm, csv_directory)
: writedm
as collection ofcsv
-filesdm_read_csv(csv_directory)
: readdm
from directory created usingdm_write_csv()
dm_write_zip(dm, zip_file_path = "dm.zip", overwrite = FALSE)
: same as csv, but zipped.dm_read_zip(zip_file_path)
dm_write_xlsx(dm, xlsx_file_path = "dm.xlsx", overwrite = FALSE)
dm_read_xlsx(xlsx_file_path)
I am prepared for a longer wait until the conclusion of this PR, since there might be a few things to discuss.
For example:
dm
to a file/directory?closes #276