Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding #207

Merged
merged 16 commits into from
May 29, 2022
Merged

Encoding #207

merged 16 commits into from
May 29, 2022

Conversation

JanMarvin
Copy link
Owner

@JanMarvin JanMarvin commented May 26, 2022

Encoding ... my old friend. Been wrangling with it for some time now, it's still a draft and I still don't know what exactly is going wrong inside :)

It's working. Only negative side effect I see right now: saving larger files has become slower. Previously we simply threw everything bit by bit into a sheet.xml. Now we construct everything in Rcpp as a single character, return it to R, pass this file back to Rcpp and pugi. Here we load it, apply the encoding and save it. Therefore in my benchmark saving is now on par with openxlsx. Guess we can bring that down again, if we construct it in a XML pointer and pass this to the function that saves the file. Something to figure out in a follow up PR.

session

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] openxlsx2_0.2.0.9000

loaded via a namespace (and not attached):
[1] compiler_4.1.2 magrittr_2.0.3 R6_2.5.1       tools_4.1.2    Rcpp_1.0.8.3   stringi_1.7.6  zip_2.2.0

test

library(openxlsx2)
# to test

wb <- wb_workbook()$
  add_worksheet("höhö")$
  add_data("höhö", "bää", colNames = FALSE)

c1 <- create_comment(text = "this is ä cömment", author = "öpenxlsx2 äüthör")
write_comment(wb, 1, col = "B", row = 10, comment = c1)

# file is saved correctly and everything looks fine
wb$save("test0.xlsx")

wb$sheet_names
names(wb)
wb_to_df(wb)

# file is imported incorrectly
wb1 <- wb_load("test0.xlsx")

names(wb1)
wb1$sheet_names
wb_to_df(wb1)
# wb1$comments

wb1$save("test1.xlsx")

unlink("test0", recursive = TRUE)
unlink("test1", recursive = TRUE)
unzip("test0.xlsx", exdir = "test0")
unzip("test1.xlsx", exdir = "test1")

…vironment. This needs further clean ups and additional fixes for writing.

Currently this breaks reading and writing on utf8 too. Needs some additional option when to set which pugi::encoding.
@JanMarvin JanMarvin marked this pull request as draft May 26, 2022 16:42
@JanMarvin
Copy link
Owner Author

Maybe I have a working version, after 10+h of starring at the computer and old R and old Windows problems (a fix for the past)

A strange thing I've encountered, somehow the comment got duplicated. Either it is written incorrectly or something else is broken.

library(openxlsx2)
# to test
wb <- wb_workbook()$
  add_worksheet("höhö")$
  add_data("höhö", "bää", colNames = FALSE)

c1 <- create_comment(text = "this is ä cömment", author = "öpenxlsx2 äüthör")
write_comment(wb, 1, col = "B", row = 10, comment = c1)

names(wb)
wb$sheet_names
wb_to_df(wb)

wb$save("test0.xlsx")

# new xlsx file
wb <- wb_load("test0.xlsx")

names(wb1)
wb$sheet_names
wb_to_df(wb)

wb$save("test1.xlsx")

unlink("test0", recursive = TRUE)
unlink("test1", recursive = TRUE)
unzip("test0.xlsx", exdir = "test0")
unzip("test1.xlsx", exdir = "test1")

@codecov-commenter
Copy link

codecov-commenter commented May 26, 2022

Codecov Report

Merging #207 (bed4d3d) into main (2a457f4) will increase coverage by 0.14%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #207      +/-   ##
==========================================
+ Coverage   83.24%   83.39%   +0.14%     
==========================================
  Files          36       36              
  Lines        7414     7425      +11     
==========================================
+ Hits         6172     6192      +20     
+ Misses       1242     1233       -9     
Impacted Files Coverage Δ
R/class-hyperlink.R 79.06% <ø> (-0.48%) ⬇️
R/helperFunctions.R 77.77% <ø> (-0.74%) ⬇️
R/class-workbook.R 77.35% <100.00%> (-0.03%) ⬇️
R/class-worksheet.R 77.27% <100.00%> (ø)
R/pugixml.R 100.00% <100.00%> (ø)
R/wb_load.R 95.76% <100.00%> (ø)
src/helper_functions.cpp 98.07% <100.00%> (ø)
src/load_workbook.cpp 97.22% <100.00%> (+0.01%) ⬆️
src/openxlsx2.h 100.00% <100.00%> (+100.00%) ⬆️
src/pugi.cpp 100.00% <100.00%> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2a457f4...bed4d3d. Read the comment docs.

@JanMarvin JanMarvin marked this pull request as ready for review May 26, 2022 19:30
@JanMarvin
Copy link
Owner Author

this only needs minor cleanups, otherwise it should be good to go. confirmed at my work PC with some antique xlsm file. unfortunately it will not make the broken Rstudio go away 😄

@JanMarvin JanMarvin added enhancement 😀 New feature or request Windows 🪟 Windows related issues encoding 🔠 Encoding while reading or writing is disturbed labels May 27, 2022
JanMarvin and others added 2 commits May 27, 2022 20:31
* another stringi function
* switch back to print(" "); unintended development mixup
* remove commented code
@JanMarvin JanMarvin merged commit 587b96e into main May 29, 2022
@JanMarvin JanMarvin deleted the encoding branch May 29, 2022 09:29
@JanMarvin JanMarvin changed the title [WIP] Encoding Encoding May 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
encoding 🔠 Encoding while reading or writing is disturbed enhancement 😀 New feature or request Windows 🪟 Windows related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants