read/write dm as csv/zip(csv)/xlsx #485

TSchiefer · 2021-03-03T14:17:32Z

dm_write_csv(dm, csv_directory): write dm as collection of csv-files
dm_read_csv(csv_directory): read dm from directory created using dm_write_csv()
dm_write_zip(dm, zip_file_path = "dm.zip", overwrite = FALSE): same as csv, but zipped.
dm_read_zip(zip_file_path)
dm_write_xlsx(dm, xlsx_file_path = "dm.xlsx", overwrite = FALSE)
dm_read_xlsx(xlsx_file_path)
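
For readers unfamiliar with the idea, the CSV round trip can be sketched in base R as follows. This is only an illustration under stated assumptions: `write_tables_csv()` and `read_tables_csv()` are hypothetical names, they handle a plain named list of data frames, and they do not store keys or column classes the way the functions proposed above are meant to.

```r
# Hypothetical helpers, NOT the functions from this PR: base R only,
# tables only (no primary/foreign keys, no column-class metadata).
write_tables_csv <- function(tables, csv_directory) {
  dir.create(csv_directory, recursive = TRUE, showWarnings = FALSE)
  for (name in names(tables)) {
    utils::write.csv(
      tables[[name]],
      file.path(csv_directory, paste0(name, ".csv")),
      row.names = FALSE
    )
  }
  # Return the input invisibly so the call can be used in a pipeline
  invisible(tables)
}

read_tables_csv <- function(csv_directory) {
  files <- list.files(csv_directory, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, utils::read.csv, stringsAsFactors = FALSE)
  # Table names are recovered from the file names
  names(tables) <- tools::file_path_sans_ext(basename(files))
  tables
}

# Made-up example data
tables <- list(
  parent = data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE),
  child = data.frame(id = 1:4, parent_id = c(1L, 1L, 2L, 3L))
)
csv_directory <- file.path(tempdir(), "tables_csv_sketch")
write_tables_csv(tables, csv_directory)
restored <- read_tables_csv(csv_directory)
```

Everything beyond the table contents (keys, column classes, time zones) is lost in this naive version, which is what the discussion points below are about.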

I am prepared for a longer wait until the conclusion of this PR, since there might be a few things to discuss.

For example:

Should we already prepare for compound keys?
Will it work for all (important) column classes? What will be the default if it doesn't work?
Could there be some way to support writing a remote dm to a file/directory?

closes #276

krlmlr · 2021-03-03T15:44:08Z

Nice! To the questions:

Yes, absolutely.
Failure, perhaps with an extensibility option later.
Looks very niche -- if it fits into RAM, we can collect(); if not, what is the purpose of saving as CSV?

We should also integrate with {shard}.

TSchiefer · 2021-03-04T07:42:37Z

We should also integrate with {shard}.

Excuse my ignorance, but is what you mean this?
If yes, maybe in another PR?

krlmlr

We now have compound keys, need to adapt.

I'd prefer using existing methods for enumerating primary and foreign keys.

krlmlr · 2021-07-05T03:05:50Z

R/read-write.R

+  csv_files <- list.files(csv_directory)
+  # compress the file ("-j" junks the path to the file)
+
+  zip(


I suspect {zip} might work better here: https://cran.r-project.org/web/packages/zip/index.html.

Maybe. If it works, it would be more platform-agnostic, but the first test (which passes with utils::zip()) fails with this error:

Error in zip_internal(zipfile, files, recurse, compression_level, append = FALSE, : zip error: `Cannot add file `/var/folders/x3/ndmkxk1j2wn0mx2pw9v9httm0000gn/T//RtmpmKOSNA/dm_zip_1149248ed84b/___coltypes_file_dm.csv` to archive `___test_path/dm.zip`` in file `zip.c:348`

Not sure if it's worth the effort to try finding the source of this problem.

krlmlr · 2021-07-05T03:06:25Z

R/read-write.R

+
+  if (file.exists(xlsx_file_path)) {
+    if (overwrite) {
+      message(glue::glue("Overwriting file {tick(xlsx_file_path)}."))


How can we mute this message?

Is it necessary to be able to mute this? That is, how often will users recreate or overwrite files? (Honestly not sure.)
We could add a quiet argument.
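
A quiet argument could look roughly like the sketch below. The helper name `check_overwrite()` and its signature are assumptions made for illustration, not code from this PR. Note also that, because the notice is emitted with message(), callers can already silence it with suppressMessages() without any new argument.

```r
# Hypothetical helper illustrating a `quiet` argument; the name and
# signature are assumptions, not code from this PR.
check_overwrite <- function(path, overwrite = FALSE, quiet = FALSE) {
  if (file.exists(path)) {
    if (!overwrite) {
      stop("File `", path, "` already exists; set `overwrite = TRUE` to replace it.")
    }
    if (!quiet) {
      message("Overwriting file `", path, "`.")
    }
  }
  invisible(path)
}

path <- tempfile(fileext = ".xlsx")
file.create(path)

# Silent because of the new argument
check_overwrite(path, overwrite = TRUE, quiet = TRUE)

# Also silent, using only base R's existing mechanism
suppressMessages(check_overwrite(path, overwrite = TRUE))
```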

krlmlr · 2021-07-05T03:08:17Z

R/read-write.R

+  "POSIXct"
+)
+
+convert_all_times_to_utc <- function(table_list, col_class_table) {


Is this within the scope of the function? Will it affect the roundtrip, or only the snapshot tests, if we omit UTC conversion?

Several problems:

Unfortunately, it's not possible to read timezones via readr::read_csv(). Therefore, it seems much safer to convert to UTC and to inform the user when writing the csv files.

For xlsx: writexl::write_xlsx() quietly converts to UTC itself anyway. In this case we would not need to perform the conversion, only inform the user.

I think it's not harmful to leave it as is, but I am open to suggestions.

Unfortunately, it's not possible to read timezones via readr::read_csv(). Therefore, it seems much safer to convert to UTC and to inform the user when writing the csv files.

I was too quick to claim that:
https://readr.tidyverse.org/articles/locales.html
It is actually possible to control the timezone with the locale argument of readr::read_csv(). I am not sure whether using this option improves the transparency of our functions (mainly because it doesn't work with xlsx).
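
To make the trade-off concrete, here is a small base R sketch of what a UTC conversion does to a datetime column and what is needed to undo it. The timestamp and zone are made-up example values, and the sketch deliberately avoids {readr} and {writexl}, so it says nothing about how those packages behave.

```r
# Base R sketch; the zone and timestamp are made-up example values.
x <- as.POSIXct("2021-03-03 15:17:32", tz = "Europe/Zurich")

# Re-label the same instant as UTC (what a UTC conversion before writing amounts to)
x_utc <- x
attr(x_utc, "tzone") <- "UTC"

# The instant is unchanged; only the printed wall-clock time differs
stopifnot(x == x_utc)
written <- format(x_utc, "%Y-%m-%d %H:%M:%S")

# Reading the text back as UTC and re-attaching the original zone restores
# the original display, provided the zone name was recorded somewhere
x_back <- as.POSIXct(written, tz = "UTC")
attr(x_back, "tzone") <- "Europe/Zurich"
```

The round trip preserves the instant either way; what gets lost without extra metadata is only the zone used for display.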

krlmlr · 2021-07-05T03:08:25Z

R/read-write.R

+      c("Converting the datetime values for the following column(s) to timezone `UTC`:\n",
+        glue::glue("{paste0(to_convert$table, '$', to_convert$column, collapse = '\n')}"))
+    )
+    table_list <- reduce2(to_convert$table, to_convert$column, function(tables, table, column) {


mutate(across()) ?

Maybe, but my thought was that it's good to inform users which columns are converted. And since we then know those columns, we can also make the changes explicitly.

krlmlr · 2021-07-05T03:09:56Z

R/read-write.R

+)
+
+convert_all_times_to_utc <- function(table_list, col_class_table) {
+  if (any(col_class_table$class %in% c("POSIXlt", "POSIXct"))) {


It looks like this won't pick up subclasses of "POSIXlt" or "POSIXct"? Not sure if this is relevant though.

As implemented, only the following types are supported: character, Date, integer, logical, numeric, POSIXct, POSIXlt.
If it turns out that there are further useful classes that should be supported, I would suggest adding them in future PRs.
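
The subclass concern can be illustrated in base R. The sketch assumes that the class table stores only the first element of class() for each column, which is a guess about the implementation; `my_time` is a made-up class used purely for demonstration.

```r
# Made-up subclass for illustration; "my_time" is not a real class.
x <- structure(Sys.time(), class = c("my_time", "POSIXct", "POSIXt"))

# Matching on the first class string misses the subclass ...
first_class <- class(x)[[1]]
matched_by_name <- first_class %in% c("POSIXlt", "POSIXct")

# ... while inherits() checks the whole class vector
matched_by_inherits <- inherits(x, "POSIXt")
```

Here `matched_by_name` is FALSE and `matched_by_inherits` is TRUE, so a check based on inherits() would also cover subclasses if that ever becomes relevant.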

krlmlr · 2021-07-05T03:24:41Z

We also want check_suggested() from #572.

Co-authored-by: Kirill Müller <krlmlr@users.noreply.github.com>

TSchiefer · 2021-07-20T11:08:50Z

Compound works now

TSchiefer · 2021-07-20T11:17:40Z

We also want check_suggested() from #572.

For the functions from {readr}, {readxl}, {writexl}? Good idea. Even though there is some implementation of it in main, shall we wait for the merge?

krlmlr · 2021-07-25T18:56:10Z

This looks good, I'd like to play with it before merging. Is this blocking another project? If we had to choose between csv, zip and xlsx, what would be the preference?

TSchiefer · 2021-07-26T07:56:00Z

This looks good, I'd like to play with it before merging. Is this blocking another project? If we had to choose between csv, zip and xlsx, what would be the preference?

Sure, take your time. It's not blocking anything - at least not for me.
xlsx is nice, since it's just one file (as opposed to csv) and one can still easily get an overview of the contents (as opposed to zip). The obvious disadvantage is the requirement of MS Excel (unless you're not interested in looking at the file in Excel and just want to restore it later with dm_read_xlsx()).

krlmlr

I tested it locally. I think we need to adapt it to the case where a foreign key is linked to a non-primary key, also for the case of compound keys.

Let's wait for #517, it will be easier to serialize the output of dm_meta()

krlmlr · 2023-08-15T04:08:01Z

This is mostly new code, could also be moved elsewhere.

krlmlr · 2023-08-20T15:26:49Z

Closing for now, added a reference to the issue.

TSchiefer added 19 commits March 2, 2021 15:55

dm_write_zip() and dm_read_zip()

d118434

functions 'dm_write_csv()' and 'dm_read_csv()'

8929c18

tweaks

4190a51

errors and fixmes

0a4a35a

abort with correct error but also remove directory in case of error

473db9f

rename write.R -> read-write.R

c5b8334

can also chose an empty directory for csv-files

32b7499

tweak

42a5b94

tests for csv- and zip-read/write

89cc58a

tweaks, edge cases, errors

ca7ce34

move functions to be exported

a359c51

move prepare tables

6860108

dm_read_xlsx(), tweaks, tests and errors

53aa929

new suggestions

ac8a1e8

oops

8144ef0

document

eb498fa

add global variables

a82c702

update _pkgdown.yml

239a807

don't run examples

eda3b39

TSchiefer requested a review from krlmlr March 3, 2021 14:17

krlmlr reviewed Mar 3, 2021

krlmlr reviewed Mar 3, 2021

TSchiefer added 6 commits March 4, 2021 09:05

review comments: write_*() returns dm invisibly

33f7df7

simpler

61a15c8

review comments: rename prepare_tables -> prepare_tbls_for_def_and_class

7f5d732

review comment: removing default zip- and xlsx-paths

0a4bb58

distinguish between supported for xlsx and csv

67f7b9d

tweak tests

94642c5

krlmlr added 2 commits July 5, 2021 04:58

Style

d292f9d

Avoid partial argument matching

a665917

krlmlr reviewed Jul 5, 2021

TSchiefer and others added 12 commits July 19, 2021 12:43

abort() -> stop() (review suggestion)

c85b7c0

Co-authored-by: Kirill Müller <krlmlr@users.noreply.github.com>

Merge branch 'main' into f-276-dm-to-csv-or-xlsx-locally

88847f5

no hyphen for file types (review suggestion)

58cbe3f

add missing parentheses (review suggestion)

622dbdd

new error for remote dm-con

de4a607

do not use column 'segment'

d3481d8

test error dispatch on DB

78dd5ef

error in case a table is missing

25e7a76

adapt for compound key support

c6832c1

avoid deprecation warnings

c1c821a

maybe avoids note

7237804

avoid R CMD Check problems

9918b0c

TSchiefer requested a review from krlmlr July 20, 2021 11:08

krlmlr requested a review from moodymudskipper September 10, 2021 03:22

Merge branch 'main' into f-276-dm-to-csv-or-xlsx-locally

65e46fd

krlmlr reviewed Oct 12, 2021

TSchiefer mentioned this pull request Jul 5, 2022

Store dm as xlsx or collection of csv files (zip) #276

Open

krlmlr mentioned this pull request Jul 11, 2022

Explore dm/Airtable links #1224

Open

krlmlr marked this pull request as draft August 15, 2023 04:07

krlmlr closed this Aug 20, 2023
