One note on CRAN about package size on Solaris #3058

Closed
mattdowle opened this issue Sep 21, 2018 · 7 comments

@mattdowle
Member

One note on CRAN.

I guess the binary is slightly larger on Solaris for some reason and has just tipped the package over the 5MB limit, to 5.1MB. Locally, datatable.so is 430KB for me. On Solaris, libs/ looks to be more than twice that (1.1MB), assuming datatable.so is the only file libs/ contains. Or it's possible that libs/ on Solaris contains another file.
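To check whether libs/ holds more than datatable.so, a quick shell sketch like the one below could list every file under an installed package's libs/ directory with its size in bytes, largest first. The function name `libs_report` is hypothetical, and it assumes a POSIX shell plus `find` and `wc`.

```shell
#!/bin/sh
# Hypothetical helper: list every regular file under <pkgdir>/libs with its
# size in bytes (tab-separated), largest first. find recurses, so
# subdirectories such as libs/i386 or libs/x64 are included too.
libs_report() {
  find "$1/libs" -type f | while IFS= read -r f; do
    # wc -c < file prints only the byte count; BSD wc pads it with spaces,
    # which the unquoted expansion strips
    printf '%s\t%s\n' $(wc -c < "$f") "$f"
  done | sort -rn
}
```

For example, `libs_report "$(Rscript -e 'cat(find.package("data.table"))')"` would report on the locally installed data.table; running the same on a Solaris machine would show whether a second file is present.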

Therefore, focussing on tests:

$ cd ~/GitHub/data.table/inst/tests
$ du -h * | sort -h
... snip ...
108K    issue_2157_sampling_reached_eof_early.txt
140K    allchar.csv
180K    winallquoted.csv
284K    grr.csv
664K    tests.Rraw

We can't reduce tests.Rraw (12,257 lines of tests) but it may be possible to reduce the next 4 largest data files, potentially saving up to 108+140+180+284 = 712K.
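The 712K estimate could also be computed from the tests directory rather than by hand. This is a minimal sketch assuming a POSIX shell; `largest_data_bytes` is a hypothetical helper name. It counts apparent size in bytes via `wc -c`, so its total will differ a little from the block-rounded `du -h` figures above.

```shell
#!/bin/sh
# Hypothetical helper: sum the sizes (bytes) of the N largest regular files in
# a directory (default N=4), skipping tests.Rraw, which can't be reduced.
largest_data_bytes() {
  dir=$1
  n=${2:-4}
  for f in "$dir"/*; do
    [ -f "$f" ] || continue                     # skip subdirectories and an unmatched glob
    case $f in */tests.Rraw) continue ;; esac   # tests.Rraw is off the table
    printf '%s\n' $(wc -c < "$f")               # byte count only, padding stripped
  done | sort -n | tail -n "$n" | awk '{ s += $1 } END { print s + 0 }'
}
```

Usage: `largest_data_bytes ~/GitHub/data.table/inst/tests` prints the combined size in bytes of the 4 largest data files; divide by 1024 to compare with the KB figures.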

@mattdowle mattdowle added this to the 1.12.0 milestone Sep 21, 2018
@mattdowle
Member Author

mattdowle commented Sep 21, 2018

CRAN has replied:

I think you should not worry about this part.

I take this to mean: don't spend any time on it now (not even for the next release), and we'll look at it again in future if and when the size exceeds 5MB on Windows or Linux, not just Solaris.

@mattdowle
Member Author

For completeness, I briefly looked at the 4 largest test data files above. They are non-trivial to reduce in size: they need to be large enough to trigger fread sampling, and they are related to jump points / handovers. The risk in reducing their size is that we would no longer be testing the desired edge cases.

@MichaelChirico
Member

MichaelChirico commented Sep 27, 2018 via email

@mattdowle
Member Author

Combining, you mean like stacking? They're all different formats.
Internet connection maybe ... there's something about data-only packages being allowed to be bigger?

@MichaelChirico
Member

more like merging?

idea being to end up with one large file that covers all the edge-case ground intended by the four files we have now...

taking a glance, I guess the logic of allchar.csv could be merged into another file? and possibly the same with the issue_2157 file?

will be a time sink but maybe worth considering if we eventually are asked to shrink...

@mattdowle
Member Author

mattdowle commented Sep 27, 2018

OK, I see. That's possible in theory (0.01% chance), I guess. But it's a time sink, as you say, and a risk.

@mattdowle
Member Author

mattdowle commented Sep 29, 2018

Finally realized the obvious solution: compress those 4 files. That would save 608KB (712K down to 104K) and give us plenty of breathing space. Then fread would just need to accept .gz files directly.

$ ls -lrth
total 712K
-rw-r--r-- 1 mdowle mdowle 138K Sep 28 17:13 allchar.csv
-rw-r--r-- 1 mdowle mdowle 180K Sep 28 17:13 winallquoted.csv
-rw-r--r-- 1 mdowle mdowle 283K Sep 28 17:13 grr.csv
-rw-r--r-- 1 mdowle mdowle 105K Sep 28 17:13 issue_2157_sampling_reached_eof_early.txt
$ ls -lrth
total 104K
-rw-r--r-- 1 mdowle mdowle 39K Sep 28 17:13 allchar.csv.gz
-rw-r--r-- 1 mdowle mdowle 16K Sep 28 17:13 winallquoted.csv.gz
-rw-r--r-- 1 mdowle mdowle 18K Sep 28 17:13 grr.csv.gz
-rw-r--r-- 1 mdowle mdowle 27K Sep 28 17:13 issue_2157_sampling_reached_eof_early.txt.gz
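The compression step itself could be sketched in shell as below. This is an assumption about how one might do it, not necessarily the commands used here: `gz_saving` is a hypothetical name, `gzip -9 -c` is used (rather than `gzip -k`) so the originals are kept even with older gzip versions, and the saving is reported in bytes rather than the rounded `ls -h` sizes.

```shell
#!/bin/sh
# Hypothetical helper: gzip each file argument to <file>.gz at maximum
# compression, leaving the original in place, then print the total number of
# bytes saved across all the files.
gz_saving() {
  before=0
  after=0
  for f in "$@"; do
    gzip -9 -c "$f" > "$f.gz" || return 1
    before=$((before + $(wc -c < "$f")))
    after=$((after + $(wc -c < "$f.gz")))
  done
  echo $((before - after))
}
```

Usage: `gz_saving allchar.csv winallquoted.csv grr.csv issue_2157_sampling_reached_eof_early.txt`. The uncompressed originals would then be removed from inst/tests once the tests read the .gz versions.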

@mattdowle mattdowle reopened this Sep 29, 2018
@mattdowle mattdowle modified the milestones: 1.12.0, 1.11.8 Sep 29, 2018
@mattdowle mattdowle mentioned this issue Sep 29, 2018