Crash in grdimage/GDAL #3568

Closed
PaulWessel opened this issue Jun 30, 2020 · 107 comments

@PaulWessel
Member

This now fails (same as #3566, I assume). I will see if I can find out more in debug mode:

gmt grdimage -JM6.5i -R0/360/-45/45 -Bag @earth_day_01d > t.ps

ERROR: Caught signal number 11 (Segmentation fault) at
0   libproj.15.dylib                    0x000000010a7078fe _ZL13pj_obj_createP9projCtx_tRKN7dropbox6oxygen2nnINSt3__110shared_ptrIN5osgeo4proj6common16IdentifiedObjectEEEEE + 560
1   ???                                 0x0000000000000020 0x0 + 32
Stack backtrace:
0   libgmt.6.dylib                      0x00000001070ccfc2 sig_handler + 498
1   libsystem_platform.dylib            0x00007fff6e5045fd _sigtramp + 29
2   ???                                 0x0000000000000000 0x0 + 0
3   libproj.15.dylib                    0x000000010a715069 proj_create_ellipsoidal_2D_cs + 161
4   libgdal.27.dylib                    0x0000000107992e6a _ZN19OGRSpatialReference9SetGeogCSEPKcS1_S1_ddS1_dS1_d + 322
5   libgdal.27.dylib                    0x0000000107b7a795 GTIFGetOGISDefnAsOSR + 3507
6   libgdal.27.dylib                    0x0000000107b32a14 _ZN12GTiffDataset17LookForProjectionEv + 206
7   libgdal.27.dylib                    0x0000000107b4bbc5 _ZNK12GTiffDataset13GetSpatialRefEv + 35
8   libgdal.27.dylib                    0x0000000107e71940 GDALGetProjectionRef + 20
9   libgmt.6.dylib                      0x0000000107135f8b populate_metadata + 347
10  libgmt.6.dylib                      0x000000010713875a gmt_gdalread + 3082
11  libgmt.6.dylib                      0x0000000107140201 gmt_gdal_read_grd_info + 337
12  libgmt.6.dylib                      0x000000010715f7dc gmtlib_read_grd_info + 204
13  libgmt.6.dylib                      0x0000000107461715 GMT_grdimage + 3349
14  libgmt.6.dylib                      0x00000001071179b1 GMT_Call_Module + 1073
15  gmt                                 0x00000001070bddec main + 1228
16  libdyld.dylib                       0x00007fff6e307cc9 start + 1
17  ???                                 0x0000000000000006 0x0 + 6
pwessel@macnut:~/GMTdev/gmt-dev/dbuild/test/grdimage/marbles-> 
@joa-quim
Member

It works for me, but it prints that annoyingly false message:

gmt grdimage -JM6.5i -R0/360/-45/45 -Bag @earth_day_01d > t.ps
grdimage [WARNING]: The image memory layout (TRP ) is of a wrong type. It should be BRPa.

The problem must be that what we think is T is in fact B (or the other way around).
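To illustrate the suspected mix-up: in the layout codes of the warning above (TRP vs BRPa), the first letter is, as far as I understand GMT's convention, the row order (T = top row first, B = bottom row first), and P means pixel-interleaved bands. If a buffer labelled T is really B, the rows are simply mirrored. A minimal sketch of converting between the two for an 8-bit, pixel-interleaved buffer follows; `flip_rows` is a hypothetical helper, not GMT's actual code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reverse the row order of an 8-bit image buffer in place,
 * turning a top-row-first (T) layout into bottom-row-first (B) or back.
 * Assumes pixel-interleaved data, so each row holds width * nbands bytes.
 * Returns 0 on success, -1 if the temporary row buffer cannot be allocated. */
int flip_rows(unsigned char *img, size_t width, size_t height, size_t nbands)
{
    size_t row_bytes = width * nbands;
    if (row_bytes == 0 || height < 2) return 0;   /* nothing to flip */

    unsigned char *tmp = malloc(row_bytes);
    if (tmp == NULL) return -1;

    /* Swap row `top` with row `bot`, moving inwards until they meet. */
    for (size_t top = 0, bot = height - 1; top < bot; top++, bot--) {
        memcpy(tmp, img + top * row_bytes, row_bytes);
        memcpy(img + top * row_bytes, img + bot * row_bytes, row_bytes);
        memcpy(img + bot * row_bytes, tmp, row_bytes);
    }
    free(tmp);
    return 0;
}
```

Applying the flip twice restores the original buffer, so a wrong T/B label shows up as an upside-down image rather than as corrupted pixels.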

@PaulWessel
Member Author

It dies in populate_metadata now:

GDAL 3.1.1, released 2020/06/22

@joa-quim
Member

I built my GDAL yesterday from master, so not older than that.

@PaulWessel
Member Author

Hm, yet it is always suspicious when we get such a crash just after an update...

@PaulWessel
Member Author

These are the same crashes @seisman mentioned in the macOS CI, so it cannot just be my install.

@PaulWessel
Member Author

Downgrading avoids the crashes:

GDAL 3.1.0, released 2020/05/03

So I guess I will use that for building the bundle.

@seisman
Member

seisman commented Jun 30, 2020

We still need to fix it. Homebrew is providing GDAL 3.1.1.

@PaulWessel
Member Author

But is it ours to fix? As far as I can tell it dies inside GDAL, and I don't think we have made any changes to that section of gdal_read very recently. And it should also crash with GDAL 3.1.0 if we had a bug, no? Perhaps, like netCDF, they have decided to stop checking for something that could be NULL in the past but now must be set?

@seisman
Member

seisman commented Jun 30, 2020

If we can't fix it, then the only thing we can do is report the bug to GDAL and hope they can make a quick bugfix release.

@PaulWessel
Member Author

I get the
grdimage [WARNING]: The image memory layout (TRP ) is of a wrong type. It should be BRPa.

too, but the plot is correct.

@joa-quim
Member

joa-quim commented Jul 1, 2020

This is a tough one to report. It doesn't occur all the time (for example, readwrite_withgdal is not having issues), and it works well on Windows.

@PaulWessel
Member Author

Since you build from source, can you step into that call, see where GDALGetProjectionRef goes, and possibly check whether any of the members we pass in might be NULL? I am not sure what we can do about this, since it seems GDAL changed something between 3.1.0 and 3.1.1 that affects us. @seisman, is there a way to see what their changes are, as a way to look for clues?

@seisman
Member

seisman commented Jul 1, 2020

Here are the changes between v3.1.0 and v3.1.1: OSGeo/gdal@v3.1.0...v3.1.1

@PaulWessel
Member Author

Hopefully @joa-quim can see what may be going on there.

@joa-quim
Member

joa-quim commented Jul 1, 2020

Impossible to see anything. But the crash occurs immediately after reading the file:

hDataset = gdal_open (GMT, gdal_filename);

so there is no time for us to mess up hDataset. Are the failing tests all using the same image format? Does it still crash with another image format?

I don't see any other option but to build GDAL and find the commit where it breaks.

@PaulWessel
Member Author

Yes. Maybe you could ask whether something changed, since we are just passing that pointer back in, no?

@PaulWessel
Member Author

I mean, you've had direct contact with that head guy before and he seems pretty responsive.

@PaulWessel
Member Author

You also need to go to bed to be ready for zoom tomorrow morning (your time).

@joa-quim
Member

joa-quim commented Jul 1, 2020

We need a more specific question than just "what changed", especially since it works on Windows. For example, I see a TIFF change in this commit.

Does it crash with all GeoTIFFs? Is your GDAL using the internal TIFF library or linking to an external one?
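To check whether the failing inputs are all (Geo)TIFFs without going through GDAL at all, one can look at the first four bytes of each file: classic TIFF starts with `II` (little-endian) or `MM` (big-endian) followed by the version number 42, and BigTIFF uses 43 instead. A small sketch; `tiff_kind` and `tiff_kind_of_file` are hypothetical helpers, and they only recognize the TIFF container, not the GeoTIFF tags stored further inside the file.

```c
#include <stdio.h>

/* Hypothetical helper: classify a 4-byte file header.
 * Returns 42 for classic TIFF, 43 for BigTIFF, 0 for anything else. */
int tiff_kind(const unsigned char h[4])
{
    unsigned v;
    if (h[0] == 'I' && h[1] == 'I')         /* little-endian byte order */
        v = h[2] | (h[3] << 8);
    else if (h[0] == 'M' && h[1] == 'M')    /* big-endian byte order */
        v = (h[2] << 8) | h[3];
    else
        return 0;
    return (v == 42 || v == 43) ? (int)v : 0;
}

/* Read the first 4 bytes of a file and classify them.
 * Returns -1 if the file cannot be opened, 0 if it is too short or not TIFF. */
int tiff_kind_of_file(const char *path)
{
    unsigned char h[4];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    size_t n = fread(h, 1, sizeof h, fp);
    fclose(fp);
    return (n == sizeof h) ? tiff_kind(h) : 0;
}
```

Running this over the inputs of the failing and passing tests would show quickly whether the crash tracks the TIFF container or something else.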

@joa-quim
Member

joa-quim commented Jul 1, 2020

Another tiff commit

@PaulWessel
Member Author

PaulWessel commented Jul 1, 2020

I know, but you could just tell him that this sequence:

	hDataset = gdal_open (GMT, gdal_filename);
	GDALGetProjectionRef(hDataset);

Used to work fine, and still does with 3.1.0 on macOS, but with 3.1.1 it SEGVs. Give him the traceback above and tell him these are GeoTIFFs. Perhaps he will know where to look or have suggestions for how we can better debug this case.

@PaulWessel
Member Author

A more readable list of the changes (see the list under GeoTIFF): https://fossies.org/linux/gdal/NEWS

@joa-quim
Member

joa-quim commented Jul 1, 2020

I think the crash is coming from PROJ

@PaulWessel
Member Author

Yes, I guess proj_create_ellipsoidal_2D_cs is touching a NULL somewhere.

@PaulWessel
Member Author

I am not sure where we are on this issue. Here are some thoughts; they are founded on the belief (?) that this surely cannot be a bug in GMT since it runs fine with 3.1.0 but suddenly crashes violently with 3.1.1.

  1. We release 6.1.0. @joa-quim builds the Win version anyway and there are no problems there. I will build the bundle using GDAL 3.1.0. And apparently the problem does not affect Linux, right, @seisman? So I don't see a problem except for those who build from source on macOS, who will need to be told to roll back to 3.1.0 for now.

  2. We build GDAL on macOS from source so that we can run grdimage in debug mode and discover why it crashes. As @joa-quim pointed out, it looks like it happens in a call to a PROJ function from GDAL. So I am concerned about spending hours building GDAL only to find out I would also need to build PROJ to learn why.

  3. I wonder if we can write a tiny three-line program that simply does

hDataset = gdal_open (GMT,"earth_day_01d_p.tif");
GDALGetProjectionRef(hDataset);


and see if it crashes on macOS. I think that is a simple test. I can do that after my 10 am Zoom meeting.

@joa-quim
Member

joa-quim commented Jul 1, 2020

I spent the whole morning/afternoon trying to install GDAL in my CentOS 7 VM, only to manage to screw up everything. A real nightmare: an old gcc that can't be updated, headers not found, and I ended up upgrading to CentOS 8, which is where everything vaporized (perhaps those VMs can't be upgraded).
Next I tried my old Mac, and it spent some 4 or 5 hours building LLVM. In the end I finally managed to build GMT, but with more trouble: it now takes ~1 min just to run gmt. Possibly a curl problem, but in the end ... the same crash.

It would be nice to confirm that the problem is restricted to linux. I have my doubts.

To build GDAL you need PROJ. If the tiny program works, that is ideal, because without something more solid to work on, Even will not do anything.

Finally, *nix is a horror. We should REALLY recommend that the course students use Windows if possible.

@PaulWessel
Member Author

Tried this:

#include <gdal.h>

int main () {
	GDALDatasetH h = GDALOpen ("/Users/pwessel/.gmt/server/earth/earth_day/earth_day_01d_p.tif", GA_ReadOnly);
	GDALGetProjectionRef(h);
}

gcc shitter.c -I/opt/local/include -L/opt/local/lib -lgdal -o shitter
pwessel@macnut:~-> shitter
ERROR 4: No driver registered.
ERROR 10: Pointer 'hDS' is NULL in 'GDALGetProjectionRef'.

No crash or SEGV, though. Are there things that need to be set up before this call, since I am getting these errors?

@joa-quim
Member

joa-quim commented Jul 1, 2020

Ah, yes. You need to at least call

	GDALAllRegister();

If more is needed, see gdalread.

@PaulWessel
Member Author

#include <gdal.h>

int main () {
	GDALDatasetH h;
	GDALAllRegister();
	h = GDALOpen ("/Users/pwessel/.gmt/server/earth/earth_day/earth_day_01d_p.tif", GA_ReadOnly);
	GDALGetProjectionRef(h);
	return 0;
}

No crash. Hm, the easiest explanation is that some memory is overwritten by us and a delayed reaction happens inside that call. Any other suggestions for how to debug this?
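One generic way to test the hypothesis that we overwrite memory and GDAL pays for it later is to surround suspect allocations with guard bytes and verify them just before the GDAL call. The helpers below (`guarded_alloc`, `guard_intact`, `guarded_free`) are hypothetical and not part of GMT or GDAL; compiling with AddressSanitizer (`-fsanitize=address` in clang or gcc) does the same job far more thoroughly.

```c
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 16      /* bytes of padding on each side (arbitrary choice) */
#define GUARD_BYTE 0xAB    /* fill pattern that a stray write is unlikely to reproduce */

/* Hypothetical helper: allocate n usable bytes with a guard zone on each side.
 * Returns a pointer to the usable region, or NULL on failure. */
unsigned char *guarded_alloc(size_t n)
{
    unsigned char *base = malloc(n + 2 * GUARD_SIZE);
    if (base == NULL) return NULL;
    memset(base, GUARD_BYTE, GUARD_SIZE);                   /* front guard */
    memset(base + GUARD_SIZE + n, GUARD_BYTE, GUARD_SIZE);  /* back guard */
    return base + GUARD_SIZE;
}

/* Return 1 if both guard zones around a guarded_alloc() block are untouched. */
int guard_intact(const unsigned char *p, size_t n)
{
    const unsigned char *base = p - GUARD_SIZE;
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (base[i] != GUARD_BYTE || p[n + i] != GUARD_BYTE) return 0;
    return 1;
}

/* Release a block obtained from guarded_alloc(). */
void guarded_free(unsigned char *p)
{
    if (p != NULL) free(p - GUARD_SIZE);
}
```

If a buffer filled before the crashing call fails `guard_intact()`, the overwrite happened on our side; if every guard is intact, that points back towards GDAL/PROJ.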

@PaulWessel
Member Author

@PaulWessel
Member Author

Hoping this is doable via brew, @seisman

@seisman
Member

seisman commented Jul 4, 2020

Building the GDAL source code via brew fails, so I can't test it.

@PaulWessel
Member Author

OK, it sounds like there is a 90% likelihood that the bug was fixed, so we will move forward.

@joa-quim
Member

joa-quim commented Jul 4, 2020

Ghrrr, building GDAL is only

	configure && make

Even asked for confirmation.

@PaulWessel
Member Author

If it were only that, I would not run configure and get

checking for PROJ >= 6 library... checking for proj_create_from_wkt in -lproj... no
checking for internal_proj_create_from_wkt in -lproj... no
checking for internal_proj_create_from_wkt in -linternalproj... no
configure: error: PROJ 6 symbols not found

But port info proj6 says
proj6 @6.3.2 (gis)

@PaulWessel
Member Author

But I also have PROJ 5, since that is how things are.

@PaulWessel
Member Author

OK, got past that with --with-proj=/opt/local/lib/proj6. Running make.

@PaulWessel
Member Author

So far so good; make will run for some hours, it seems.

@joa-quim
Member

joa-quim commented Jul 4, 2020

Not hours, I build it in 5-10 min.

@PaulWessel
Member Author

PaulWessel commented Jul 4, 2020 via email

@PaulWessel
Member Author

Finished building GDAL. Looking inside it:

pwessel@macnut:~/gdal/gdal-> otool -L ./.libs/libgdal.dylib
./.libs/libgdal.dylib:
	/usr/local/lib/libgdal.27.dylib (compatibility version 28.0.0, current version 28.0.0)
	/opt/local/lib/libgeos_c.1.dylib (compatibility version 15.0.0, current version 15.3.0)
	/opt/local/lib/libwebp.7.dylib (compatibility version 9.0.0, current version 9.0.0)
	/opt/local/lib/libopenjp2.7.dylib (compatibility version 7.0.0, current version 2.3.1)
	/opt/local/lib/libjasper.4.dylib (compatibility version 4.0.0, current version 5.0.0)
	/opt/local/lib/libnetcdf.18.dylib (compatibility version 18.0.0, current version 18.0.0)
	/opt/local/lib/proj6/lib/libproj.15.dylib (compatibility version 19.0.0, current version 19.2.0)
	/opt/local/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)
	/opt/local/lib/libcurl.4.dylib (compatibility version 11.0.0, current version 11.0.0)
	/opt/local/lib/libiconv.2.dylib (compatibility version 9.0.0, current version 9.1.0)
	/opt/local/lib/libxml2.2.dylib (compatibility version 12.0.0, current version 12.10.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 902.1.0)

I see it lists /usr/local/lib/libgdal. That is a bit odd, no?

@joa-quim
Member

joa-quim commented Jul 5, 2020

We know macOS is dependency-insane. Maybe /opt/local/lib/proj6/lib/libproj.15.dylib has some memory of the other libgdal?
But does it crash or not?

@PaulWessel
Member Author

Sorry, now I need to figure out how to use this libgdal instead of the MacPorts one. I was busy testing installers.

@PaulWessel
Member Author

Do @joa-quim or @seisman know how to make CMake find a GDAL library in an unusual place? Searching the cache turned up GDAL_CONFIG, so I temporarily placed the new one in the regular path and renamed the old one, but CMake still found the MacPorts GDAL... I need the equivalent of --with-gdal=/Users/pwessel/gdal/gdal/.libs/libgdal.27.dylib

I suspect I must disable the system GDAL and rebuild mine, though.

@joa-quim
Member

joa-quim commented Jul 5, 2020

I do all dependencies via ConfigUser.cmake. For GDAL I have (using a few variables)

	# set location of gdal (can be root directory, path to header file or path to gdal-config):
	set(GDAL_DIR "${DEPS_ROOT}/gdal_GIT/${COMP_SUBDIR}/${VC}_${BITAGE}")
	set(GDAL_LIBRARY "${DEPS_ROOT}/gdal_GIT/${COMP_SUBDIR}/${VC}_${BITAGE}/lib/gdal_i.lib")

@PaulWessel
Member Author

OK, I will try that. Deactivating the MacPorts GDAL led CMake to find
/Library/Frameworks/gdal.framework

WTF, a 2016 version, probably from QGIS or something. So giving a path is better.
BTW, there is no
/usr/local/lib/libgdal.27.dylib (compatibility version 28.0.0, current version 28.0.0)
on my system, so vapor-ware?

@PaulWessel
Member Author

Is GDAL_DIR the path to the library directory itself, or to the one above it, so that it has both include and lib?

@joa-quim
Member

joa-quim commented Jul 5, 2020

For me, GDAL_DIR is the one that contains both include and lib. I'm not sure it's really needed, or whether it was only there to trick CMake into thinking it had the output of FindGDAL. Or it may be needed after all to find the include dir.

@PaulWessel
Member Author

I think it has been fixed. The plot works, and after I installed my GDAL build in the default /usr/local location, it is picked up:

 otool -L /Users/pwessel/GMTdev/gmt-dev/dbuild/gmt6/lib/libgmt.dylib
/Users/pwessel/GMTdev/gmt-dev/dbuild/gmt6/lib/libgmt.dylib:
	/Users/pwessel/GMTdev/gmt-dev/dbuild/gmt6/lib/libgmt.6.dylib (compatibility version 6.0.0, current version 6.1.0)
	/opt/local/lib/libnetcdf.18.dylib (compatibility version 18.0.0, current version 18.0.0)
	/opt/local/lib/libcurl.4.dylib (compatibility version 11.0.0, current version 11.0.0)
	/usr/local/lib/libgdal.27.dylib (compatibility version 28.0.0, current version 28.0.0)
	/opt/local/lib/libgeos_c.1.dylib (compatibility version 15.0.0, current version 15.3.0)

So you can tell Even that things look good.

@liamtoney
Member

I'm getting the same error reported at the beginning of this thread with e.g. the call

gmt grdinfo @earth_day_01m

I installed 6.1.0 via Homebrew. System info:

  • macOS Catalina 10.15.5
  • GMT 6.1.0
  • GDAL 3.1.1

What I gather from above is that I need to downgrade GDAL? But the GMT Homebrew package installs 3.1.1.

@liamtoney
Member

> What I gather from above is that I need to downgrade GDAL? But the GMT Homebrew package installs 3.1.1.

Conda is working, though (GDAL 3.0.4).

@seisman
Member

seisman commented Jul 6, 2020

I believe GMT can do nothing here. Options are:

  1. Downgrade GDAL to 3.1.0
  2. Wait for the release of GDAL 3.1.2, which is scheduled to be released on Aug. 24, 2020 (https://github.com/OSGeo/gdal/milestone/21)
  3. Apply the upstream patch (OSGeo/gdal@d123b99) to the homebrew recipe
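Until a fixed release reaches every package manager, a program could detect the affected release at runtime and warn users to pick one of the options above. A sketch, assuming the release string comes from GDAL (for example `GDALVersionInfo("RELEASE_NAME")`, which returns a string such as "3.1.1"); only the string handling is shown so the example runs without GDAL, and `version_num` and `is_affected_gdal` are hypothetical helpers. The numeric weighting mirrors GDAL's `GDAL_COMPUTE_VERSION` macro.

```c
#include <stdio.h>

/* Hypothetical helper: turn "MAJOR.MINOR.REV" into one comparable number,
 * weighted like GDAL_COMPUTE_VERSION (major*1000000 + minor*10000 + rev*100).
 * Returns -1 if the string does not start with three dot-separated integers. */
long version_num(const char *release)
{
    int major, minor, rev;
    if (sscanf(release, "%d.%d.%d", &major, &minor, &rev) != 3) return -1;
    return major * 1000000L + minor * 10000L + rev * 100L;
}

/* Return 1 for the release that crashed on macOS in this thread (3.1.1). */
int is_affected_gdal(const char *release)
{
    return version_num(release) == version_num("3.1.1");
}
```

Comparing packed numbers rather than raw strings keeps the check correct for releases such as "3.10.0", where plain string comparison would order versions wrongly.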

@PaulWessel
Copy link
Member Author

Wait, I think Even at GDAL said they are fixing this with 3.1.2 on Tuesday. It is breaking things other than GMT, so it is serious for them.

@seisman
Member

seisman commented Jul 6, 2020

That's good news. We only need to wait a few more days to have a working GDAL.

@Dave-Allured

GDAL 3.1.2 is released. Please update to this version.
https://lists.osgeo.org/pipermail/gdal-dev/2020-July/052395.html

@seisman
Member

seisman commented Jul 8, 2020

FYI, homebrew has updated to GDAL 3.1.2

@liamtoney
Member

> FYI, homebrew has updated to GDAL 3.1.2

Excellent, I just tested and things work as expected. It also seems like brew install gdal sets up the Python bindings correctly now(?), which is cool.

@Dave-Allured

Now Macports is also updated to GDAL 3.1.2.
