Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of zlib fwrite dependency #3872

Closed
MichaelChirico opened this issue Sep 14, 2019 · 21 comments · Fixed by #3951
Closed

Handling of zlib fwrite dependency #3872

MichaelChirico opened this issue Sep 14, 2019 · 21 comments · Fixed by #3951
Labels
Milestone

Comments

@MichaelChirico
Copy link
Member

#include <zlib.h> might fail on some machines. In this case it would be helpful to fail more helpfully.

Currently on rocker/r-ver:3.5.3 (e.g.), we get the following installation failure for dev data.table:

install.packages('data.table', type = 'source', repos='http://Rdatatable.github.io/data.table')
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'http://Rdatatable.github.io/data.table/src/contrib/data.table_1.12.3.tar.gz'
Content type 'application/gzip' length 4960958 bytes (4.7 MB)
==================================================
downloaded 4.7 MB

* installing *source* package ‘data.table’ ...
** libs
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c assign.c -o assign.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c between.c -o between.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c bmerge.c -o bmerge.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c chmatch.c -o chmatch.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c cj.c -o cj.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c coalesce.c -o coalesce.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c dogroups.c -o dogroups.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c fastmean.c -o fastmean.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c fcast.c -o fcast.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c fifelse.c -o fifelse.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c fmelt.c -o fmelt.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c forder.c -o forder.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c frank.c -o frank.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c fread.c -o fread.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c freadR.c -o freadR.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c froll.c -o froll.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c frollR.c -o frollR.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c frolladaptive.c -o frolladaptive.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c fsort.c -o fsort.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG   -I/usr/local/include  -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c fwrite.c -o fwrite.o
fwrite.c:22:48: fatal error: zlib.h: No such file or directory
 #include "zlib.h"      // for writing gzip file
                                                ^
compilation terminated.
/usr/local/lib/R/etc/Makeconf:163: recipe for target 'fwrite.o' failed
make: *** [fwrite.o] Error 1
ERROR: compilation failed for package ‘data.table’
* removing ‘/usr/local/lib/R/site-library/data.table’
* restoring previous ‘/usr/local/lib/R/site-library/data.table’

The downloaded source packages are in
	‘/tmp/Rtmp7Po4yU/downloaded_packages’
Warning message:
In install.packages("data.table", type = "source", repos = "http://Rdatatable.github.io/data.table") :
  installation of package ‘data.table’ had non-zero exit status

(this doesn't affect the CRAN version since there's no fwrite+gz support there yet)

By contrast, here's how units package fails:

# install.packages('Rcpp') first on clean image
install.packages('units')
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://mran.microsoft.com/snapshot/2019-04-26/src/contrib/units_0.6-2.tar.gz'
Content type 'application/octet-stream' length 917316 bytes (895 KB)
==================================================
downloaded 895 KB

* installing *source* package ‘units’ ...
** package ‘units’ successfully unpacked and MD5 sums checked
configure: units: 0.6-2
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking how to run the C++ preprocessor... g++ -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for stdbool.h that conforms to C99... no
checking for _Bool... no
checking for error_at_line... yes
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for XML_ParserCreate in -lexpat... no
checking udunits2.h usability... no
checking udunits2.h presence... no
checking for udunits2.h... no
checking udunits2/udunits2.h usability... no
checking udunits2/udunits2.h presence... no
checking for udunits2/udunits2.h... no
checking for ut_read_xml in -ludunits2... no
configure: error: in `/tmp/Rtmp78wlOS/R.INSTALL3f75541cfe/units':
configure: error: 
--------------------------------------------------------------------------------
  Configuration failed because libudunits2.so was not found. Try installing:
    * deb: libudunits2-dev (Debian, Ubuntu, ...)
    * rpm: udunits2-devel (Fedora, EPEL, ...)
    * brew: udunits (OSX)
  If udunits2 is already installed in a non-standard location, use:
    --configure-args='--with-udunits2-lib=/usr/local/lib'
  if the library was not found, and/or:
    --configure-args='--with-udunits2-include=/usr/include/udunits2'
  if the header was not found, replacing paths with appropriate values.
  You can alternatively set UDUNITS2_INCLUDE and UDUNITS2_LIBS manually.
--------------------------------------------------------------------------------

See `config.log' for more details
ERROR: configuration failed for package ‘units’
* removing ‘/usr/local/lib/R/site-library/units’

The downloaded source packages are in
	‘/tmp/RtmpJB7VSi/downloaded_packages’
Warning message:
In install.packages("units") :
  installation of package ‘units’ had non-zero exit status

Quite useful & actionable!

I'm not sure how ubiquitous zlib is in the wild...

@MichaelChirico
Copy link
Member Author

It seems to do so we should add a configure script which can detect the missing zlib header:

@jangorecki
Copy link
Member

Could we compile without zlib support if it is not present? Then fwrite would error when attempting to use gz. This reduce OS software dependency, this is very important to stay lightweight.

@st-pasha
Copy link
Contributor

We had a similar problem in (py) datatable, and ended up just shipping the zlib.h and zconf.h with the source distribution (h2oai/datatable#1957).

@MichaelChirico MichaelChirico added this to the 1.12.4 milestone Sep 15, 2019
@jangorecki
Copy link
Member

jangorecki commented Sep 15, 2019

@st-pasha way of shipping zlib together with data.table is another option. Definitely better than current error. Yet IMO it is better to not ship it together, but escape support of gz fwrite when zlib was not present during compilation. I can imagine that not every user uses fwrite, and among them not many uses gz, ultimately zlib might be needed by a tiny fraction of data.table users.

@MichaelChirico
Copy link
Member Author

MichaelChirico commented Sep 16, 2019

Agree w Jan it's best if we can essentially make zlib a "Suggests" (optional), I just don't know how to do that at the C level. Seems the way we handle openmp could be a template but I couldn't quite make it work. Exploratory stuff here in case anyone sees how to take it over the line:

https://github.com/MichaelChirico/data.table/tree/zlib_suggests

@philippechataignon
Copy link
Contributor

philippechataignon commented Sep 16, 2019

zlib is required for R : that's why I think that there wasn't a problem for gzip support. But you're right: the headers (zlib1g-dev on Debian/Ubuntu) are not always present and are needed for compiling. On Debian zlib1g-dev is a r-base-dev dependency: that's why it's often present. In the initial example, I had a look at the dockerfile and zlib1g-dev is in BUILDDEPS : why zlib header is not found a compile time ?

I don't think that including zlib headers is a good idea because distribution binaries from zlib and included headers in data.table may be incompatibles. In this case, data.table must include headers and source of the zlib library.

I think configure is the best idea to detect if zlib is present and compile without zlib if not. OK for suggest. A warning can be displayed at compile time like package units do (I think curl, git2r and xml2 act in similar way).

Another advantage of configure is that it could be possible to add another compression method like lz4 and xz (which need external libs too). I've tested lz4 and on my machine : lz4 fwrite is fastest than csv fwrite. xz is interesting to have better ratio compressed (never implemented).

I don't know how to write a configure script. If  this solution is retained, I can try to see how to modify fwrite.c to use a ZLIB_PRESENT macro to have a fwrite version with or without gzip support.

philippechataignon pushed a commit that referenced this issue Sep 16, 2019
ZLIB_PRESENT must be defined if zlib (lib and headers) is present on the
system. If not, ZLIB_PRESENT is undefined and data.table is compiled
without zlib support.

Normally ZLIB_PRESENT (or an equivalent) will be defined (or not) by a configure
script.

See issue #3872.
philippechataignon pushed a commit to philippechataignon/data.table that referenced this issue Sep 16, 2019
HAVE_LIBZ must be defined if zlib (lib and headers) is present on the
system. If not,  HAVE_LIBZ is undefined and data.table is compiled
without zlib support.

Normally  HAVE_LIBZ (or an equivalent) will be defined (or not) by a configure
script (see ax_check_zlib).

See issue Rdatatable#3872.
@philippechataignon
Copy link
Contributor

This branch is an attempt to write a configure script to detect zlib. Feel free to test. Not a PR because it introduces changes in current data.table build process.

@jangorecki
Copy link
Member

@philippechataignon any chance for a unit test? It has to be conditionally escaped of course.
Also it seems more appropriate to error than attempt to write non-gzip file.

@jangorecki jangorecki added the dev label Sep 17, 2019
@jangorecki jangorecki changed the title Build could offer instructions for installing zlib Handling of zlib fwrite dependency Sep 18, 2019
@philippechataignon
Copy link
Contributor

I can't go further on the branch fwrite_zlib_present et for me it's OK but perhaps too late/too dangerous for 1.12.4. I've replaced warning by stop as suggests. I've manually tested fwrite on data.table compiled with/without HAVE_ZLIB flag forced in fwrite.c and it's OK but I don't know how to write a unit test to verify the message error when there's no zlib headers on the machine.

@MichaelChirico
Copy link
Member Author

MichaelChirico commented Sep 19, 2019

@jangorecki I think this is something Gitlab builds would have to do... as I see it zlib is basically a suggested dependency. Already we nocov suggested R package dependencies... must be all the harder to do coverage tests for Suggests on C lib

@philippechataignon currently on rocker/r-ver:3.5.3 I see:

------------------------- ANTICONF ERROR ---------------------------
Configuration failed because zlib was not found. Try installing:
 * deb: zlib1g-dev (Debian, Ubuntu, etc)
 * rpm: zlib-devel (Fedora, CentOS, RHEL)
If zlib is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a zlib.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘data.table’
* removing ‘/usr/local/lib/R/site-library/data.table’
Error in i.p(...) : 
  (converted from warning) installation of package ‘/tmp/RtmpZjnvPS/file124bbd766/data.table_1.12.3.tar.gz’ had non-zero exit status

Is it possible for this to be a warning & allow installation to continue?

I see (converted from warning) there but getOption('warn') is 0 on the session I'm installing from so that must be set from the installation script?

@philippechataignon
Copy link
Contributor

There was a exit 1 in configure script when R_CONFIG_ERROR is set. I removed it. Please can you test docker build again ? Thanks.

@mattdowle
Copy link
Member

mattdowle commented Sep 19, 2019

Looks great. Agree it's a bit risky/late for 1.12.4 so postponing to next release as suggested.

@mattdowle mattdowle modified the milestones: 1.12.4, 1.13.0 Sep 19, 2019
@jangorecki
Copy link
Member

jangorecki commented Sep 20, 2019

By postponding to 1.13.0 we can also get insight if there are many affected users. Assuming that zlib should be installed as r-base-dev dependency then we should consider this situation as abnormal. IMO it might not be worth to introduce configure script just to cover that abnormal situation, unless of course it affects many users.
@MichaelChirico maybe worth to report this issue in rocker/r-ver:3.5.3?

@MichaelChirico
Copy link
Member Author

has zlib always been a dependency for R? I'm not sure how to check

@philippechataignon
Copy link
Contributor

Here for R 3.3.0 (may 2016) : The previously included versions of zlib, bzip2, xz and PCRE have been removed, so suitable external (usually system) versions are required.

In R 3.2.0 (april 2015): configure options --with-system-zlib, --with-system-bzlib and --with-system-pcre are now the default. For the time being there is fallback to the versions included in the R sources if no system versions are found or (unlikely) if they are too old. Linux users should check that the -devel or -dev versions of packages zlib, bzip2/libbz2 and pcre as well as xz-devel/liblzma-dev (or similar names) are installed

Commit 2001-04-06 69db71eb0c has log message "require and build zlib and libzml (not built yet)" : zlib is a dependency for R since the beginning but R source includes zlib source if there is not system zlib.

@MichaelChirico
Copy link
Member Author

OK so noting based on the clarification from r-ver:

rocker-org/rocker-versioned#161

In order for users to be missing zlib when installing data.table, their machine has to have specifically had zlib support removed after R installation. Should be quite rare, I think the main exception being people using lean Dockerized environments such as r-ver -- and I think we're more comfortable devolving responsibility for installing to such users (especially with the note in the NEWS and we should add to Installation wiki as well).

@st-pasha
Copy link
Contributor

For the sake of clarity, when people speak of "zlib" library, they could mean 3 different things:

  • zlib so/dll files, allowing zlib to be dynamically linked during the runtime. I believe that this is the version of zlib that is most commonly present on a user's system, and it is also the one that R binary depends on.
  • zlib a/obj file, allowing zlib to be statically linked. This would only be needed if datatable wanted to statically link with that library.
  • zlib h files, allowing compiling of the code that uses zlib. These files may or may not be present on user's system, and I don't know whether R gives any promises that they will ship these header files together with R's installation. However I can see that on my Mac's R 3.5.2.2 these headers are not included.

@MichaelChirico
Copy link
Member Author

I don't know whether R gives any promises that they will ship these header files together with R's installation.

I think this is covered by Philippe's first note from R NEWS:

https://github.com/wch/r-source/blob/1a74f78a3c9e751f465b6eb27f7c1a8bf5ff2ada/doc/NEWS.Rd#L4056-L4059

@philippechataignon
Copy link
Contributor

NEWS says that zlib headers are required to compile R but when you install R from a binary package (very usual) there no need to have zlib headers present.

@mattdowle mattdowle modified the milestones: 1.12.7, 1.12.5 Oct 9, 2019
@vikram-rawat
Copy link

R-base-dev brings some tools with it so to ensure you can compile basic programmes.

And it also pulls in zlib1g-dev as on Debian/Ubuntu systems

So just install r-base-dev along with r-base-core should be enough

The 'zlib1g-dev' missing is an indication of r-base-dev not installed, and the first step (on Debian and Ubuntu) before building packages from source should be to haev r-base-dev.

Please mention this step in news section on wiki page

@mattdowle
Copy link
Member

mattdowle commented Oct 13, 2019

@vikram-rawat data.table has a configure now (v1.12.5+) which points the user to install zlib1g-dev directly. See https://github.com/Rdatatable/data.table/blob/master/configure.

  pkg-config did not detect zlib is installed. Try installing:
  * deb: zlib1g-dev (Debian, Ubuntu, ...)
  * rpm: zlib-devel (Fedora, EPEL, ...)
  * brew: zlib (OSX)

For others finding this in future, only those installing from source need this. Windows and MacOS users who install the CRAN binary do not compile the package so they don't need this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants