flexlink produces an invalid dll when building lablgtk-2.18.3 on mingw64 #6

Open
eternalNight opened this issue Apr 4, 2015 · 14 comments


@eternalNight

Hi all,

I have recently been building lablgtk (a GTK2 wrapper for OCaml) using the mingw64 toolchain provided by msys2. The package uses ocamlmklib (and thus flexlink) to create a DLL called dlllablgtk2.dll. Here are the versions of the tools in my environment:

flexdll    0.34 (from http://alain.frisch.fr/flexdll.html; built from source)
ocaml    4.02.1 (built from source)

Flexlink generates the library without error, but the library is considered invalid by LoadLibraryEx:

Error: Error on dynamically loaded library: .\dlllablgtk2.dll: %1 is not a valid win32 application

The following toy program gives the same result.

$ cat testdll.c
#include <flexdll.h>
#include <stdio.h>
#include <windows.h>

int main(int argc, char *argv[]) {
    void *handle;
    printf("Try open: %s\n", argv[1]);
    handle = flexdll_dlopen(argv[1], FLEXDLL_RTLD_GLOBAL);
    printf("Handle: %p\n", handle);
    if (handle == NULL) {
        /* GetLastError() returns a DWORD (unsigned long), so print it with %lu. */
        printf("Error code: %lu\n", (unsigned long) GetLastError());
        printf("Error message: %s\n", flexdll_dlerror());
    }
    return 0;
}

$ flexlink -chain mingw64 -exe -o testdll testdll.c
$ testdll.exe dlllablgtk2.dll
Try open: dlllablgtk2.dll
Handle: 0000000000000000
Error code: 193
Error message: %1 is not a valid win32 application

The library is created using 24 object files in addition to some system libraries. The command is:

flexlink -v -v -chain mingw64 -LD:/msys64/mingw64/x86_64-w64-mingw32/lib \
-o dlllablgtk2.dll -lpthread -LD:/msys64/mingw64/lib -lgtk-win32-2.0 \
-limm32 -lshell32 -lole32 -lpangocairo-1.0 -lpangoft2-1.0 -lpangowin32-1.0 -lgdi32 \
-lpango-1.0 -lm -latk-1.0 -lcairo -lpixman-1 -lfontconfig -lexpat -lfreetype -lexpat -lfreetype \
-lbz2 -lharfbuzz -lgdk_pixbuf-2.0 -lpng16 -lgio-2.0 -lz -lgmodule-2.0 -lgobject-2.0 -lffi \
-lglib-2.0 -lws2_32 -lole32 -lwinmm -lshlwapi -lintl \
ml_gobject.o ml_gpointer.o ml_gtk.o ml_gtkaction.o ml_gtkbin.o ml_gtkbroken.o ml_gtkbutton.o \
ml_gtkassistant.o ml_gtkedit.o ml_gtkfile.o ml_gtklist.o ml_gtkmenu.o ml_gtkmisc.o ml_gtkpack.o \
ml_gtkrange.o ml_gtkstock.o ml_gtktext.o ml_gtktree.o ml_gdkpixbuf.o ml_gdk.o ml_glib.o \
ml_pango.o ml_gvaluecaml.o wrappers.o

When I remove some of the objects (e.g. ml_gtktree.o), the generated library becomes valid.

$ testdll.exe dlllablgtk2.dll        # ml_gtktree.o removed from the command
Try open: dlllablgtk2.dll
Handle: 0000000000000000
Error code: 1114
Error message: Cannot resolve caml_failwith
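For reference, the two error codes printed by the test program correspond to standard Win32 constants. The helper below is only an illustrative sketch: the name `win32_error_name` is made up for this report and is not part of flexdll or the Windows API. The numeric values are the ones defined in `winerror.h`.

```c
#include <stddef.h>

/* Hypothetical helper: maps the two error codes seen in this report to the
   names winerror.h gives them. Returns NULL for any other code. */
static const char *win32_error_name(unsigned long code) {
    switch (code) {
    case 193UL:
        return "ERROR_BAD_EXE_FORMAT";  /* the loader rejected the image itself */
    case 1114UL:
        return "ERROR_DLL_INIT_FAILED"; /* the image loaded, but its init routine failed */
    default:
        return NULL;
    }
}
```

Read this way, code 1114 together with "Cannot resolve caml_failwith" suggests that the trimmed DLL is a well-formed image and fails only because testdll.exe does not provide the OCaml runtime symbols, whereas code 193 means the loader refused the full DLL outright.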

The issue does not seem to be caused by any single object: the library built without ml_gtktext.o (but with ml_gtktree.o) is also valid.

The binaries from https://github.com/shadinger/flexdll-win64 (version 0.26) do not suffer from this issue.

Here is the verbose log during linking.

** Use cygpath: true
** Search path:
D:/msys64/mingw64/lib
D:/msys64/mingw64/x86_64-w64-mingw32/lib
D:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.9.2
/mingw/lib
/mingw64/x86_64-w64-mingw32/lib
** Default libraries:
dllcrt2.o
-lmingw32
-lgcc
-lmoldname
-lmingwex
-lmsvcrt
-luser32
-lkernel32
-ladvapi32
-lshell32
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\dllcrt2.o
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libmingw32.a
** open: D:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.9.2\libgcc.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libmoldname.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libmingwex.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libmsvcrt.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libuser32.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libkernel32.a
** open: D:/msys64/mingw64/x86_64-w64-mingw32/lib\libadvapi32.a
+ x86_64-w64-mingw32-gcc -mconsole -shared -Wl,-eFlexDLLiniter  -L. -I"D:/msys64/mingw64/lib" -I"D:/msys64/mingw64/x86_64-w64-mingw32/lib" -L"D:/msys64/mingw64/lib" -L"D:/msys64/mingw64/x86_64-w64-mingw32/lib" -o "test.dll" "D:\msys64\tmp\dyndll3ef3ef.o" "D:\msys64\mingw64\bin\flexdll_initer_mingw64.o" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libpthread.dll.a" "D:/msys64/mingw64/lib\libgtk-win32-2.0.dll.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libimm32.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libshell32.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libole32.a" "D:/msys64/mingw64/lib\libpangocairo-1.0.dll.a" "D:/msys64/mingw64/lib\libpangoft2-1.0.dll.a" "D:/msys64/mingw64/lib\libpangowin32-1.0.dll.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libgdi32.a" "D:/msys64/mingw64/lib\libpango-1.0.dll.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libm.a" "D:/msys64/mingw64/lib\libatk-1.0.dll.a" "D:/msys64/mingw64/lib\libcairo.dll.a" "D:/msys64/mingw64/lib\libpixman-1.dll.a" "D:/msys64/mingw64/lib\libfontconfig.dll.a" "D:/msys64/mingw64/lib\libexpat.dll.a" "D:/msys64/mingw64/lib\libfreetype.dll.a" "D:/msys64/mingw64/lib\libbz2.dll.a" "D:/msys64/mingw64/lib\libharfbuzz.dll.a" "D:/msys64/mingw64/lib\libgdk_pixbuf-2.0.dll.a" "D:/msys64/mingw64/lib\libpng16.dll.a" "D:/msys64/mingw64/lib\libgio-2.0.dll.a" "D:/msys64/mingw64/lib\libz.dll.a" "D:/msys64/mingw64/lib\libgmodule-2.0.dll.a" "D:/msys64/mingw64/lib\libgobject-2.0.dll.a" "D:/msys64/mingw64/lib\libffi.dll.a" "D:/msys64/mingw64/lib\libglib-2.0.dll.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libws2_32.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libwinmm.a" "D:/msys64/mingw64/x86_64-w64-mingw32/lib\libshlwapi.a" "D:/msys64/mingw64/lib\libintl.dll.a" "D:\msys64\tmp\dyndll00be4c.o" "D:\msys64\tmp\dyndlle902c0.o" "D:\msys64\tmp\dyndll54d32d.o" "D:\msys64\tmp\dyndll2e0163.o" "ml_gtkbin.o" "D:\msys64\tmp\dyndll7ac0f6.o" "D:\msys64\tmp\dyndll3f46a1.o" "D:\msys64\tmp\dyndll6e7d00.o" "D:\msys64\tmp\dyndll709dae.o" 
"D:\msys64\tmp\dyndll4b5dee.o" "D:\msys64\tmp\dyndll027612.o" "D:\msys64\tmp\dyndll478b19.o" "D:\msys64\tmp\dyndll0fdffc.o" "D:\msys64\tmp\dyndll533488.o" "D:\msys64\tmp\dyndllc5412c.o" "D:\msys64\tmp\dyndllb81a8b.o" "D:\msys64\tmp\dyndll5f1731.o" "D:\msys64\tmp\dyndll4bc469.o" "D:\msys64\tmp\dyndlleed2db.o" "D:\msys64\tmp\dyndlla2929b.o" "D:\msys64\tmp\dyndll56c73c.o" "D:\msys64\tmp\dyndllde988f.o" "ml_gvaluecaml.o" "D:\msys64\tmp\dyndll049034.o" "D:\msys64\tmp\flexlink250fe6.def"
(call with bash: D:\msys64\tmp\longcmd233aa5)
@alainfrisch
Collaborator

Is it easy for you to test with OCaml trunk? The win64 backend has been changed to avoid problems when the DLL is loaded too far away in memory from the main process, and this might fix such issues.

@eternalNight
Author

I have tried the latest ocaml and camlp4 from the github mirror. The problem remains.

@eternalNight
Author

The issue seems to be related to the cygwin64 COMDAT hacks introduced in commit 37e6b5a. The library works if those snippets are commented out.

@alainfrisch
Collaborator

Perhaps the current hacks for Cygwin64 should indeed be restricted to Cygwin64. Can you check which parts of the commit you refer to must be disabled (there are two fragments related to COMDAT sections -- do we need to disable both)?

@eternalNight
Author

Disabling the following fragment in add_reloc_table works in my case:

    if sec.sec_opts &&& 0x1000l <> 0l && has_prefix ".rdata$.refptr." sec.sec_name then
      begin
        (* under Cygwin64, gcc introduces mergable (link once) COMDAT sections to store
           indirection pointers to external darta symbols.  Since we don't deal with such section
           properly, we turn them into regular data section, thus loosing sharing (but we don't care). *)
        sec.sec_opts <- 0xc0500040l;
        sec.sec_name <- Printf.sprintf ".flexrefptrsection%i" (Oo.id (object end));
      end;

This should be the first fragment mentioning COMDAT in the patch.
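To make the two magic numbers in that fragment easier to read, here is a small decoding sketch. The macro names are my own shorthand for the `IMAGE_SCN_*` section characteristics in the PE/COFF specification (they are prefixed `SCN_` to avoid clashing with `windows.h`), and the helper functions are hypothetical, not flexdll code.

```c
#include <stdint.h>

/* Shorthand for the IMAGE_SCN_* section characteristics of the PE/COFF spec. */
#define SCN_CNT_INITIALIZED_DATA 0x00000040u
#define SCN_LNK_COMDAT           0x00001000u
#define SCN_ALIGN_16BYTES        0x00500000u
#define SCN_MEM_READ             0x40000000u
#define SCN_MEM_WRITE            0x80000000u

/* Mirrors the OCaml test `sec.sec_opts &&& 0x1000l <> 0l`. */
static int is_comdat(uint32_t characteristics) {
    return (characteristics & SCN_LNK_COMDAT) != 0;
}

/* The value the fragment assigns to sec_opts: ordinary initialized data,
   readable and writable, 16-byte aligned, with the COMDAT bit cleared. */
static uint32_t regular_data_flags(void) {
    return SCN_CNT_INITIALIZED_DATA | SCN_ALIGN_16BYTES
         | SCN_MEM_READ | SCN_MEM_WRITE;
}
```

In other words, 0x1000 selects COMDAT sections and 0xc0500040 rewrites them as plain read/write data sections.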

@alainfrisch
Collaborator

As reported by Andreas Hauptmann on the caml-list:

It either won't solve the issue or it will introduce new ones (I
don't remember details, but I've tried it).
As a temporary workaround, you can try to strip your invalid dll files
(e.g. 'x86_64-w64-mingw32-strip --strip-unneeded dlllablgtk2.dll') or
switch to an older version of the gcc-toolchain (4.8 or 4.7).

@yselkowitz

I'm having what appears to be a related issue trying to build lablgtk 2.18.5 with flexdll 0.35 and ocaml 4.02.3 on cygwin64:

ocamlmktop -I +lablGL -thread -o lablgtktop unix.cma threads.cma lablgl.cma
-I . lablgtk.cma lablgtkgl.cma lablglade.cma lablgnomecanvas.cma lablgnomeui.cma lablrsvg.cma lablgtkspell.cma lablgtksourceview2.cma gtkThread.cmo
File "none", line 1:
Error: Error on dynamically loaded library: ./dlllablgtk2.so: Exec format error

@yselkowitz

FWIW, stripping does help on Cygwin; I was able to get a successful and functional build by adding -ldopt -Wl,-s to the ocamlmklib -o lablgtk command.

@MSoegtropIMC
Contributor

Would it be possible to eventually fix this? This issue has been hanging around for more than 2 years now. I just tried the source and binary releases of version 0.35 as well as the current git master. This is a major source of unreliability in the Windows builds of INRIA Coq. I currently use an explicit call to strip, which itself fails mysteriously when completely unrelated things in the build script change (such as the file to which messages are redirected). Even procmon couldn't help me understand why. I will now try the method suggested above instead of the explicit call to strip.

But I would really appreciate a fix for this problem. If there is anything I can do to help, please let me know. E.g. I can send a script which sets up a fresh cygwin and reproduces the error with a single call to a batch file.

Best regards,

Michael

@alainfrisch
Collaborator

I'm afraid I don't understand the problem well enough to fix it, and I don't have the time and courage to investigate. If you could create a simple reproduction case that doesn't involve a bunch of external libraries, this would definitely make the problem easier to investigate. But the conclusion could also be that there is no easy fix.

I think my recommendation would be to avoid using flexlink with code not generated by OCaml compilers. For your use case, is it an option to link all native libraries statically in the main program?

@MSoegtropIMC
Contributor

Dear Alain,

You are right, maybe the best option is to patch the lablgtk build scripts so that they create just a static library and use that. I think there is no need to link the lablgtk library dynamically at all, since the GUI tool always needs it and Coq has only one GUI tool, so the DLL would not be shared anyway.

Also it is an interesting hint that the issues might come from the C code in lablgtk.

I will let you know how it goes along this path.

Best regards,

Michael

@yselkowitz

FYI this is the patch I used to work around this:

https://github.com/cygwinports/ocaml-lablgtk2/blob/master/2.18.5-flexlink.patch

@dra27
Member

dra27 commented May 6, 2018

I ran into this too, but doing my usual fumbling in the dark, I noticed that one of the fixes above involved reducing the number of .o files, which in turn reduces the number of sections. That got me thinking that the renaming in add_reloc_table removes the .rdata$ prefix, which is what instructs the linker to merge the sections. So I tried this:

diff --git a/reloc.ml b/reloc.ml
index 358f6b9..823021d 100644
--- a/reloc.ml
+++ b/reloc.ml
@@ -434,7 +434,7 @@ let add_reloc_table obj obj_name p =
            indirection pointers to external darta symbols.  Since we don't deal with such section
            properly, we turn them into regular data section, thus loosing sharing (but we don't care). *)
         sec.sec_opts <- 0xc0500040l;
-        sec.sec_name <- Printf.sprintf ".flexrefptrsection%i" (Oo.id (object end));
+        sec.sec_name <- Printf.sprintf ".flex$.flexrefptrsection%i" (Oo.id (object end));
       end;

     let min = ref Int32.max_int and max = ref Int32.min_int in

which appears to be enough to build a working lablgtk2 without having to strip the DLLs (one slight thing which concerned me with the stripping is that the resulting DLLs also crash Microsoft's objdump, though that may be objdump's fault, and it was on the Windows 7 SDK).
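As a rough sketch of the naming rule I have in mind: the PE/COFF specification says that, for object files, the linker discards the `$` and everything after it when deciding which image section receives an object section. The function below only illustrates that rule; `output_section_name` is a hypothetical helper and not code from flexdll or any linker.

```c
#include <stddef.h>
#include <string.h>

/* Copies the part of an object-file section name that precedes the first '$'
   (the whole name if there is no '$') into out, NUL-terminated.
   Returns the copied length, or (size_t)-1 if out is too small. */
static size_t output_section_name(const char *name, char *out, size_t out_size) {
    const char *dollar = strchr(name, '$');
    size_t len = dollar ? (size_t)(dollar - name) : strlen(name);
    if (len + 1 > out_size)
        return (size_t)-1;
    memcpy(out, name, len);
    out[len] = '\0';
    return len;
}
```

Under this rule `.rdata$.refptr.foo` lands in `.rdata`, each `.flexrefptrsectionN` stays a separate section of its own, and every `.flex$.flexrefptrsectionN` is folded into a single `.flex` section.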

I don't really understand @alainfrisch's comment about not dealing with the section properly - what prompted you to write that comment originally? Is this all related to #52, and should we therefore be deleting these sections for symbols which flexdll is going to relocate and simply leaving them alone for any other symbols, which presumably the linker is going to deal with? It appears on a vague inspection that the linker will eliminate these in "normal" linking, so I'm guessing they just get folded into the normal relocation process?

Again, fumbling around trying to diagnose the original problem, I can't find any reference to a problem with having too many sections in the PE header (there's a reference to a limit of 96 for Windows XP, but it has been increased to 65536 since Vista, so that doesn't seem a likely candidate). Perhaps these sections appear before one of the others and some offset becomes too big or larger than expected. Either way, stripping removes those sections, and since merging them also seems to fix the problem, it would appear that the number of them is the underlying issue.
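If anyone wants to compare section counts between a working and a broken DLL, something along these lines should do. It is a minimal sketch that assumes the whole file has already been read into memory; `pe_section_count` and `read_le` are hypothetical names, and the offsets (`e_lfanew` at 0x3c, `NumberOfSections` six bytes after the `PE\0\0` signature) are taken from the PE/COFF specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads an nbytes-wide little-endian unsigned integer (nbytes <= 4). */
static uint32_t read_le(const unsigned char *p, size_t nbytes) {
    uint32_t v = 0;
    for (size_t i = nbytes; i > 0; i--)
        v = (v << 8) | p[i - 1];
    return v;
}

/* Returns NumberOfSections from the COFF header of the PE image held in buf,
   or -1 if buf does not look like a PE image. */
static long pe_section_count(const unsigned char *buf, size_t len) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;
    uint32_t pe_off = read_le(buf + 0x3c, 4);   /* e_lfanew */
    if (pe_off > len || len - pe_off < 8)       /* signature, Machine, NumberOfSections */
        return -1;
    const unsigned char *pe = buf + pe_off;
    if (pe[0] != 'P' || pe[1] != 'E' || pe[2] != 0 || pe[3] != 0)
        return -1;
    return (long)read_le(pe + 6, 2);            /* NumberOfSections */
}
```

A difference in the counts would not prove the hypothesis, but it would at least show whether stripping and the `.flex$` renaming reduce the number of sections as expected.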

But it still raises the question (for me at least) of what precisely they're for and what should really be done with them...

@alainfrisch
Collaborator

> I don't really understand @alainfrisch's comment about not dealing with the section properly - what prompted you to write that comment originally?

I don't remember exactly, but keeping the COMDAT section resulted in some problems with cygwin64. (Perhaps because the section could be merged by the linker, and this breaks some assumptions made by the flexdll runtime.) I don't think this was related to #52.

bryphe added a commit to bryphe/flexdll that referenced this issue Nov 14, 2018
* Set up CI with Azure Pipelines

* Switch to azure pipelines

* bump

* Use shorter path