Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dmap / testmap broken under linux (Ubuntu 19.04) #436

Closed
IceReaper opened this issue Oct 15, 2019 · 53 comments
Closed

dmap / testmap broken under linux (Ubuntu 19.04) #436

IceReaper opened this issue Oct 15, 2019 · 53 comments

Comments

@IceReaper
Copy link

When compiling a map under linux, some brush triangles are missing causing the map to leak. Compiling the same map under windows works flawless.

Ill try to provide a doom3 compatible testmap showcasing the problem later, as i am using a custom game using RBDOOM3-BFG as its engine.

@ericwomer
Copy link

ericwomer commented Jan 18, 2020

How are you coming along on that custom test map? It doesn't need to be fancy, just something to demonstrate the issue at hand. Also not to be discouraging, but is there a reason you're not using an engine like UE4, Unity, or Godot? They would be more adept at giving you what you would need and want in an engine for a custom game.

@BielBdeLuna
Copy link

I have the same errors, dmap skips some random triangles, and I've encountered some portaling errors too, I've made a video that shows it

@DanielGibson
Copy link

DanielGibson commented May 26, 2021

there's a similar bugreport for dhewm3: dhewm/dhewm3#147
I can't reproduce the problem though (with the test map linked in the first post), maybe it has been fixed accidentally.

As a lucky guess, try removing all instances of -ffast-math from CMakeLists.txt
If that doesn't help, replace -O3 with -O2. If that still doesn't help it's probably not caused by overzealous compileroptimizations and I have no idea what fixed it.

It would probably help if you shared your testmap and not only the video ;)

@BielBdeLuna
Copy link

BielBdeLuna commented May 28, 2021

Unfortunately none of those changes to CMakeLists.txt solved the issue. what do -O3 and -O2 do? are those levels of optimization?

@DanielGibson
Copy link

DanielGibson commented May 28, 2021

are those levels of optimization?

Yes, -O3 is a higher level of optimization and tends to do more "dangerous" optimizations that can break code that is not 100% standard-conformant (spoiler: no code is).

Could you share your testmap? Otherwise no one can debug this.. (and I'd like to test if dhewm3 is affected)

@BielBdeLuna
Copy link

Could you share your testmap?

https://github.com/BielBdeLuna/OTE_testmaps/tree/vilcabamba

@DanielGibson
Copy link

I can't really reproduce your problems there, but I noticed

  1. I need to use noclip a lot even if there were no obvious barriers (I haven't looked at the map in an editor at all, maybe there's clip brushes or something?)
  2. It doesn't work on dhewm3 at all, unfortunately (everything is black, flashlight doesn't help), even after converting the pngs to tga
  3. there were lots of dmap warnings like WARNING: brush has multiple area portal sides at 1088 1504 320
    No idea what those warnings mean exactly and if they're actually bad, but as you mentioned portal bugs it might be relevant

@BielBdeLuna
Copy link

BielBdeLuna commented May 28, 2021

1. I need to use noclip a lot even if there were no obvious barriers (I haven't looked at the map in an editor at all, maybe there's clip brushes or something?)
3. there were lots of dmap warnings like `WARNING: brush has multiple area portal sides at 1088 1504 320`
   No idea what those warnings mean exactly and if they're actually bad, but as you mentioned portal bugs it might be relevant

it seems that dmap fails at detecting the visportals, the visportals seem not be registered as such, and so the material is invisible but the movement through them seems to be impeded. edit the visportals work, those errors seem to be due the faces of the visportal not creating the visportal making such an error

2. It doesn't work on dhewm3 at all, unfortunately (everything is black, flashlight doesn't help), even after converting the pngs to tga

wait, didn't I put the needed TGA's? edit I checked all the necessary tgas are there

it seems that everything is failing, if you "reloadDecls" what sort of errors do you gets from the materials? did you put the folder "vilcabamba" as a mod folder? or you put "OTE_testmaps/vilcabamba" as a mod folder?

edit I downloaded the map in another computer and with RBDoom3BFG it loads correctly, the level compiles (with those errors ) but it loads correctly ( with all the textures working ) but still unfortunately with the errors displayed in the video.

@DanielGibson
Copy link

wait, didn't I put the needed TGA's? edit I checked all the necessary tgas are there

As everything was black (in dhewm3) I tried adding all the other OTE assets (which used PNGs which I converted) but that didn't help either

But in latest RBDoom3BFG git (master branch) I didn't get those glitches seen in your video.
No idea what's going on here, maybe uninitialized variables that for some reason have a less broken random value than on your system? I don't know.

@BielBdeLuna
Copy link

BielBdeLuna commented May 28, 2021

all the other OTE assets

ah, ok, but the map wasn't meant to be used with all the assets from OTE just as a single map test

I pushed a map without visportals, it displays the same triangular errors in the same places in my end when compiled, minus the visportal related compiling error warnings lines.

I wonder are you using Intel? both of my computers are Ryzen based, I wonder if there is something against AMD optimizations?
I can't test this on a Intel to test this possibility.

@DanielGibson
Copy link

DanielGibson commented May 28, 2021

I'm using a Ryzen 2700X (with a Geforce GTX 970), so AMD vs Intel can be ruled out.

I got your map to work with dhewm3 now, by adjusting the filenames (png => tga) in the materials as well.
I still can't reproduce the bug though (neither in dhewm3 nor in RBDoom3BFG) - can you test if it also happens with dhewm3 on your machine?

@BielBdeLuna
Copy link

BielBdeLuna commented May 28, 2021

ah I found the culprit for the black textures, the gray texture ( that the level used still makes a reference to a PNG texture ) I don't know why it worked in RBDoom3BFG as it doesn't use PNG for textures edit that's not true PNG IS supported. edit , in dhewm3 compiles without the errors in those newly created triangles in the brushes.

I pushed the changes now it should work

@DanielGibson
Copy link

DanielGibson commented May 28, 2021

So the same map works (meaning: looks correct) if compiled with dhewm3 but not if compiled with RBDoom3BFG?

I don't know why it worked in RBDoom3BFG as it doesn't use PNG for textures

RBDoom3BFG does support loading PNGs. They probably get converted to .bimage before being actually used, but PNG is supported as far as I can tell.

@BielBdeLuna
Copy link

BielBdeLuna commented May 29, 2021

yeah, something lays rotten within the source of dmap in RBDoom3BFG unlike Dhewm3.
And it has to be in the code that generates the optimized triangles, because I don't see triangles that follow the full brushes that I used for the creation of the map, but that meet where other brushes collide, so those are optimized triangles failing

RBDoom3BFG does support loading PNGs.

ah, it's true, now I see the readme... sorry :)

@BielBdeLuna
Copy link

BielBdeLuna commented May 29, 2021

oh, also OTE compiles that map fine, it's RBDoom3BFG that has some kind of problem.

I'm using the Master branch (last commit), and OpenGL of RBDoom3BFG. ...compiling Vulkan to see if there is any luck, but I don't think it might make any change to the issue.

@DanielGibson
Copy link

Damn, just realized that I can reproduce the bug after all, in both RBDoom3BFG and dhewm3 - if I enable ONATIVE in CMake.

So it seems like it's related to optimization after all

@BielBdeLuna
Copy link

certainly ONATIVE off compiles the map correctly! :) so following cmake:

	if(NOT CMAKE_CROSSCOMPILING AND ONATIVE)
		if(CMAKE_SYSTEM_PROCESSOR MATCHES "((powerpc|ppc)64le)|(mips64)")
			add_definitions(-mcpu=native)
		else()
			add_definitions(-march=native)
		endif()
	endif()

the culprit is:

add_definitions(-march=native)

isn't it?

@BielBdeLuna
Copy link

I wonder, can we command the compiler not to optimize some part of the code? some source file in specific?

@BielBdeLuna
Copy link

BielBdeLuna commented May 29, 2021

first I'll try to see if ONATIVE ON but -O0 clears the error, if so then I will be trying to add:

#pragma GCC push_options
#pragma GCC optimize("O0")

to the beginning of every file in dmap and:

#pragma GCC pop_options

In their end.

*** edit *** Effectively -O0 solves the issue for all the compiled code in the engine, now I'll try to add the pragma command to the dmap code files in order to optimize all with the normal settings minus the dmap code

@DanielGibson
Copy link

it's something in dmap/optimize.cpp, more specifically in idlib/math/Vector.h included there

haven't narrowed it down further yet

@BielBdeLuna
Copy link

in order to clarify and to narrow out my test benches:

  • a laptop computer with zen+ 3700U
  • a desktop computer with zen2 3700X

and using GCC 10.3.0 in Ubuntu 21.4

@DanielGibson
Copy link

DanielGibson commented May 30, 2021

I'm debuggin this in dhewm3. I found out the following:

It's somewhere in bool PointInTri( const idVec3 &p, const mapTri_t *tri, optIsland_t *island ) (from tools/compilers/dmap/optimize.cpp), in combination with #include idlib/math/Vector.h.

Click to show source

I moved that function into a separate source file (I called it optimize2.cpp) which looks like:

#pragma GCC push_options
#pragma GCC optimize("O0") // commenting this line out breaks the build

// moving this include above #pragma GCC optimize("O0") breaks the build
// (usually it's included implicitly by dmap.h, but including it first allows to use different optimization settings on it)
#include "idlib/math/Vector.h"

// restore normal optimization for the rest of the file => still works as long as Vector.h isn't optimized
// (while only optimizing Vector.h and not PointInTri() breaks it)
#pragma GCC pop_options

#include "tools/compilers/dmap/dmap.h"

bool PointInTri( const idVec3 &p, const mapTri_t *tri, optIsland_t *island ) {
	idVec3	d1, d2, normal;

	// the normal[2] == 0 case is not uncommon when a square is triangulated in
	// the opposite manner to the original

	d1 = tri->optVert[0]->pv - p;
	d2 = tri->optVert[1]->pv - p;
	normal = d1.Cross( d2 );
	if ( normal[2] < 0 ) {
		return false;
	}

	d1 = tri->optVert[1]->pv - p;
	d2 = tri->optVert[2]->pv - p;
	normal = d1.Cross( d2 );
	if ( normal[2] < 0 ) {
		return false;
	}

	d1 = tri->optVert[2]->pv - p;
	d2 = tri->optVert[0]->pv - p;
	normal = d1.Cross( d2 );
	if ( normal[2] < 0 ) {
		return false;
	}

	return true;
}

in the original optimize.cpp I just comment out that function and added a prototype for it, like extern bool PointInTri( const idVec3 &p, const mapTri_t *tri, optIsland_t *island ); so it can be found by the functions using it.

So it appears like something in Vector.h, quite likely idVec3's cross product (just a hunch, haven't checked yet!), gets optimized in a way that breaks the PointInTri() function which causes some faces to be left out

@DanielGibson
Copy link

DanielGibson commented May 30, 2021

Ok, turns out it's indeed the cross product, can be easily verified without even a second source file:
Right before static bool PointInTri( const idVec3 &p, const mapTri_t *tri, optIsland_t *island ) { ... add:

#pragma GCC push_options
#pragma GCC optimize("O0") // commenting this out triggers the bug
// copy of idVec3::Cross() as standalone function
static idVec3 myCross( const idVec3& v1, const idVec3 &v2 ) {
	return idVec3( v1.y * v2.z - v1.z * v2.y, v1.z * v2.x - v1.x * v2.z, v1.x * v2.y - v1.y * v2.x );
}
#pragma GCC pop_options // restore default optimizations

then in PointInTri() replace all three instances of
normal = d1.Cross( d2 ); with normal = myCross(d1, d2);

After recompiling and running dmap on the testmap, it should look fine.
If you comment out the #pragma GCC optimize("O0") line and run dmap again, it's broken again.

I guess I'll have to look at disassembly next to figure out what's really going wrong.. not that I'm any good at reading assembler

UPDATE:
I stripped this further down: The whole cross product isn't even needed, as only the .z component is used like this: if ( normal[2] < 0 ) { (and the optimized code doesn't calculate the rest).
So I wrote an even simpler function that does exactly this - and made it non-inline - luckily I could still reproduce the bug with it.
It looks like:

#pragma GCC push_options
#pragma GCC optimize("O1") // O0 and O1 work, O2 causes the glitches

static bool norm2lt0( const idVec3& v1, const idVec3 &v2 ) __attribute__((noinline));
static bool norm2lt0( const idVec3& v1, const idVec3 &v2 ) {
	float z = v1.x * v2.y - v1.y * v2.x;
	bool ret = z < 0.0f;
	return ret;
}
#pragma GCC pop_options // restore default optimizations
// followed by PointInTri() implementation

and in PointInTri() I replaced normal = myCross(d1, d2); if ( normal[2] < 0 ) {
with if( norm2lt0(d1, d2) ) {

I also found out that with -O1 optimization (like with -O0) the bug doesn't happen but with -O2 it does (at least on my system with GCC 7.5.0)

Here's the disassembly in case anyone can read that (I haven't looked very hard at it yet, I need to look up every single instruction..):

# broken (-O2)
0000000000000810 <_ZL8norm2lt0RK6idVec3S1_.isra.3>:
        float z = v1.x * v2.y - v1.y * v2.x;
     810:       c5 f2 59 d2             vmulss xmm2,xmm1,xmm2
        bool ret = z < 0;
     814:       c5 f0 57 c9             vxorps xmm1,xmm1,xmm1
static bool norm2lt0( const idVec3& v1, const idVec3 &v2 ) {
     818:       55                      push   rbp
     819:       48 89 e5                mov    rbp,rsp
}
     81c:       5d                      pop    rbp
        float z = v1.x * v2.y - v1.y * v2.x;
     81d:       c4 e2 69 9b c3          vfmsub132ss xmm0,xmm2,xmm3
        bool ret = z < 0;
     822:       c5 f8 2e c8             vucomiss xmm1,xmm0
     826:       0f 97 c0                seta   al
}
     829:       c3                      ret    
     82a:       66 0f 1f 44 00 00       nop    WORD PTR [rax+rax*1+0x0]

# working (-O1)

static bool norm2lt0( const idVec3& v1, const idVec3 &v2 ) __attribute__((noinline));
static bool norm2lt0( const idVec3& v1, const idVec3 &v2 ) {
     2b0:       55                      push   rbp
     2b1:       48 89 e5                mov    rbp,rsp
        float z = v1.x * v2.y - v1.y * v2.x;
     2b4:       c5 fa 10 07             vmovss xmm0,DWORD PTR [rdi]
     2b8:       c5 fa 59 46 04          vmulss xmm0,xmm0,DWORD PTR [rsi+0x4]
     2bd:       c5 fa 10 4f 04          vmovss xmm1,DWORD PTR [rdi+0x4]
     2c2:       c5 f2 59 0e             vmulss xmm1,xmm1,DWORD PTR [rsi]
     2c6:       c5 fa 5c c1             vsubss xmm0,xmm0,xmm1
        bool ret = z < 0;
     2ca:       c5 f0 57 c9             vxorps xmm1,xmm1,xmm1
     2ce:       c5 f8 2e c8             vucomiss xmm1,xmm0
     2d2:       0f 97 c0                seta   al
        return ret;
}
     2d5:       5d                      pop    rbp
     2d6:       c3                      ret    
     2d7:       66 0f 1f 84 00 00 00    nop    WORD PTR [rax+rax*1+0x0]
     2de:       00 00 

@coldtobi
Copy link

coldtobi commented May 30, 2021

does this happen for certain input values or generally? Do you have some test vectors and expected results?

FWIW: I've playing with the function at https://godbolt.org/z/4GnzrYsfW ; code generated by it however is different, more lke the working example; for all over O>=1 I've also tried different compiler versions… Also tried for clang in the hope the clang diagnostics has some hints…

@BielBdeLuna
Copy link

BielBdeLuna commented May 30, 2021

I can confirm the -O1 doesn't display the bug in RBDoom3BFG. -O2 does.

I've found that page that explains very basically some methods on how to stop the optimization: https://programfan.github.io/blog/2015/04/27/prevent-gcc-optimize-away-code/

maybe we could use this method from the page in order to do the cross operations:

template <typename T> inline doNotOptimizeAway(T&& datum) {
    asm volatile ("" : "+r" datum);
}

and then maybe:

static idVec3 myCross( const idVec3& v1, const idVec3 &v2 ) {

        //  v1.y * v2.z - v1.z * v2.y
        float 1x;
        doNotOptimizeAway( 1x = v1.y );
        doNotOptimizeAway( 1x *= v2.z );
        float 2x;
        doNotOptimizeAway( 2x = v1.z );
        doNotOptimizeAway( 2x *= v2.y );
        float final_x;
        doNotOptimizeAway( final_x = 1x );
        doNotOptimizeAway( final_x -= 2x );

        // v1.z * v2.x - v1.x * v2.z
        float 1y;
        doNotOptimizeAway( 1y = v1.z );
        doNotOptimizeAway( 1y *= v2.x );
        float 2y;
        doNotOptimizeAway( 2y = v1.x );
        doNotOptimizeAway( 2y *= v2.z );
        float final_y;
        doNotOptimizeAway( final_y = 1y );
        doNotOptimizeAway( final_y -= 2y );

        // v1.x * v2.y - v1.y * v2.x

        float 1z;
        doNotOptimizeAway( 1z = v1.x );
        doNotOptimizeAway( 1z *= v2.y );
        float 2z;
        doNotOptimizeAway( 2z = v1.y );
        doNotOptimizeAway( 2z *= v2.x );
        float final_z;
        doNotOptimizeAway( final_z = 1z );
        doNotOptimizeAway( final_z -= 2z );

        return idVec3( final_x, final_y, final_z );
	//return idVec3( v1.y * v2.z - v1.z * v2.y, v1.z * v2.x - v1.x * v2.z, v1.x * v2.y - v1.y * v2.x );
}

like explained in that page?

*** edit *** I see that my proposed solution resembles somewhat to the assembly operations in -O1 for every dimension of the result. only making them "volatile"

@BielBdeLuna
Copy link

so is the error everywhere where the cross product is used? or is it only in the dmap optimization usage of it?

@coldtobi
Copy link

coldtobi commented May 30, 2021

My experience says that whether optimization or not is likely a red herring… IMHO it is a subtle bug in the code or (unlikely) a compiler problem. So I'm skeptical about this trying to trick the compiler…

One idea: That could be the order of execution: With optimization the compiler is (beside other things) allowed to reorder instructions. I've seen situations where this might be problemtic, especially if the individual floats are using significant different exponents, so that for example an a-b is basically a, if b is so small that the difference cannot be represented… (lacking of words to express myself, sorry if that confuses more than helping)

one obersavtion: the (broken) assembly snippet from Daniel seems to have omitted the load instructions: It soley works on registers. That could mean that the function has been inlined and gcc ignoring the attribute. As noinline is only a hint for the compiler, it is not bound to obey, this is possible: See also https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html)
That could be a hint that the problem is somewhere in the calling code.

PS: https://godbolt.org/z/csfqW56Kh -- observe that test() is not calling norm2lt0() in the assembly code AFAICS… (On O3)
If you go O1 it does… So compiler thinks it can calculate the result without the help of the function…

@BielBdeLuna
Copy link

BielBdeLuna commented May 30, 2021

I've got it solved!
it seems that my previous solution doesn't work, Daniel's works perfectly! :)

Optimized to 3 as per normal, but adding Daniel function before PointInTri() with a slight change (and also the due replacements in PointInTri() ) :

static bool norm2lt0( const idVec3& v1, const idVec3 &v2 ) __attribute__((optimize("O0")));
static bool norm2lt0( const idVec3& v1, const idVec3 &v2 ) {

	float z = v1.x * v2.y - v1.y * v2.x;
	return z < 0.0f;
}

no need of the preprocessers if it's just that function

*** edit *** made a pull request with the changes: #573

Maybe we should put norm2lt0 in the general vectors file?
I propose rename norm2lt0 to NormalCrossIsNegative "only true if the resulting normal cross product is negative"

@BielBdeLuna
Copy link

BielBdeLuna commented May 30, 2021

I added a "CrossZisNegative" function to Vector.h inside the idVec3 under the Cross function:

    bool            CrossZisNegative( const idVec3& a ) const __attribute__((optimize("O0")));

afterwards I could inline it after the inlined cross function:

ID_INLINE bool idVec3::CrossZisNegative( const idVec3& a ) const {
	float cross_z = x * a.y - y * a.x;
	return cross_z < 0.0f;
}

and eventually I have reupdated the three cases of PointInTri to use the CrossZisNegative with:

        //normal = d1.Cross( d2 );
	if( d1.CrossZisNegative( d2 ) )
	{
		return false;
	}

also now in PointInTri normal is no longer needed

it works fine.

I haven't made this change in the pull request yet.
Do you think we might need this function in vector.h or it's too specific for such a problem to not needing to generalize it?
in fact I couldn't find another instance of normal[2] < 0 anywhere else in the whole /neo/ folder

@DanielGibson
Copy link

DanielGibson commented May 30, 2021

one obersavtion: the (broken) assembly snippet from Daniel seems to have omitted the load instructions: It soley works on registers. That could mean that the function has been inlined and gcc ignoring the attribute.

I think that the x86_64 calling convention (on Unices) pass suitable arguments in those xmmN registers, so it should be fine to operate directly on them instead of doing loads.
In my -O2 code that function indeed was called if the noinline attribute was set.

That could be the order of execution: With optimization the compiler is (beside other things) allowed to reorder instructions.

That's likely - note that the -O2 version uses a multiplication (vmulss) and the vfmsub132ss instruction (which multiplies and then subtracts), while the -O1 version does two multiplications and then a subtraction like one would expect.

One set of values that triggers the bug is:
d1: {x = -277.333313, y = -69.3333282, z = 0} or {0xc38aaaaa 0xc28aaaaa 0x0}
d2: {x = -341.333313, y = -85.3333282, z = 0} or {0xc3aaaaaa 0xc2aaaaaa 0x0}
the floating point values are not exact, I think, but those hex values are a bit-wise copies of the floats, i.e. exact, so they can be converted like this:
uint32_t x = 0xc38aaaaa; float res; memcpy(&res, &x, 4);

The cross product's .z (or [2]) value of d1 and d2 is very close to zero.

I'll go on investigating later today, I just got up..

It's possible that a better solution would be to replace if ( normal[2] < 0 ) with if ( normal[2] < -0.00001f ) (or some similar value), i.e. using a little bit of tolerance

@BielBdeLuna
Copy link

BielBdeLuna commented May 30, 2021

I guess -ffast-math speeds up math for the renderer, isn't it? I mean we can go without it, isn't it?

what if we keep this function out of the optimization and we search a way to stop the optimization for MSVC or a workaround for MSVC, like a function specific for Windows?

also:

float z = v1.x * v2.y - v1.y * v2.x.
If v1.x * v2.y and v1.y * v2.x are identical, z should obviously be 0.0.

maybe we can test those results separated and via code make z 0.0?

@DanielGibson
Copy link

I guess -ffast-math speeds up math for the renderer, isn't it? I mean we can go without it, isn't it?

It should be disabled, it's garbage.

But even without -ffast-math GCC miscompiles that function.

I don't think MSVC has that problem, if we used some GCC-specific pragma or function annotation as a workaround that must be guarded by #ifdef __GNUC__ or something like that

@BielBdeLuna
Copy link

BielBdeLuna commented May 30, 2021

But even without -ffast-math GCC miscompiles that function.

but at -O0 then doesn't display the error, it is still being miscompiled by then?

@DanielGibson
Copy link

Did you read my posts?
GCC does this if the FMA CPU extension is enabled and if -O2 or above are used. Not with -O0, not with -O1.

@BielBdeLuna
Copy link

BielBdeLuna commented May 30, 2021

Did you read my posts?

yes but it's not an easy issue to follow for me, that's why I'm asking.

by FMA CPU you mean Fused Multiply-Add extensions? isn't it? are those extension enabled with the -ffast-math or are those two things different? -march=znver1 is never called in the CMakeLists.txt, is it implicit with -ffast-math?

@DanielGibson
Copy link

DanielGibson commented May 30, 2021

I updated the post to make things a bit clearer without having to keep all the former posts (that investigated which compiler settings trigger it) in mind.

by FMA CPU you mean Fused Multiply-Add extensions?

yes

are those extension enabled with the -ffast-math

No, it's enabled with -march=native (if the host CPU supports it), or -march=znver1 or some other architecture that supports it.
or probably -mfma.
Probably using that extension isn't a problem in general, but using it to optimize this specific kinda of operation (basically a*b - c*d) is a bad idea.
Clang only uses it in this case if -ffast-math is set (which is ok, -ffast-math means "you're free to fuck up my math code in non-standard ways as long as we can pretend it gets faster"), GCC also does it without -ffast-math, if -O2 (or higher) is set (which is not ok, because just -O2 or -O3 is not supposed to violate the standard, and at least -O2 should avoid "dangerous" optimizations in general).
There might be a specific -f option that's implicitly enabled by -O2 which controls this with GCC, I don't know.
UPDATE: Apparently the flag that enables this (bad) optimization is -fexpensive-optimizations - which unfortunately is very unspecific and might include other optimizations that are generally desirable, so it's not good as a global option.
Luckily, #pragma GCC optimize("-fno-expensive-optimizations") instead of #pragma GCC optimize("O1") seems to work


I'm still not sure how to properly fix this issue.
At first I thought adding some tolerance for the comparison is a good idea, but OTOH then some points that even with non-broken code would be considered outside the triangle (=> small negative numbers) suddenly get treated differently.
As this is (IMO) a compiler bug, maybe using pragmas or function attributes to reduce optimization to -O1 (if compiling with GCC) for the cross products might be a better solution after all.
Either way, -ffast-math should be completely removed; see dhewm3's CMakeLists.txt for what related options should be safe.

@BielBdeLuna
Copy link

BielBdeLuna commented May 30, 2021

ah, ok znver1 are the extensions of Zen cpus (version 1, whatever that is)

So if you're attribute to -O1 the cross product function or -O0, so even if you're using GCC you're not optimizing it, isn't it? as you said if you're using clang you don't find the problem, so maybe we could just optimize -O0 just for GCC as you pointed already, and then we would only need to search a way to not optimize it for Windows too, I think it's with the /0 functions ( I don't know anything about MSVC, sorry )

maybe we could use -O2 as the standard isnted of -O3 and test how much more slow the engine would get without the -ffast-math.

*** edit ***
so establishing a -O2, without -ffast-math compile, and no other changes, still displays the error in dmap.
It's only when you go-O1 that the issue is not displayed.

*** edit 2 ***
changed the cross function in idVec3 in vector.h:

#ifdef __GNUC__
    idVec3			Cross( const idVec3& a ) const __attribute__((optimize("O1")));
#else
    idVec3			Cross( const idVec3& a ) const;
#endif

and compiled with -O2 and without -ffast-math without further changes.
it doesn't display the issue.

shouldn't this be it?

DanielGibson added a commit to dhewm/dhewm3 that referenced this issue May 31, 2021
Only happend if `ONATIVE` was enabled (or some other flag was set
that enables the FMA extension), the root cause was that the cross
product didn't return 0 when it should, but a small value < 0.
RobertBeckebans/RBDOOM-3-BFG#436 (comment)
has lots of explanation.

I think this is a compiler bug, this commit works around it.

fixes #147
@DanielGibson
Copy link

DanielGibson commented May 31, 2021

See dhewm/dhewm3@320c15f for a fix

DanielGibson added a commit to dhewm/dhewm3 that referenced this issue May 31, 2021
Only happend if `ONATIVE` was enabled (or some other flag was set
that enables the FMA extension), the root cause was that the cross
product didn't return 0 when it should, but a small value < 0.
Caused some faces to be missing in maps compiled with dmap.
RobertBeckebans/RBDOOM-3-BFG#436 (comment)
has lots of explanation.

I think this is a compiler bug, this commit works around it.

fixes #147
@DanielGibson
Copy link

GCC bugreport for this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100839

@BielBdeLuna
Copy link

BielBdeLuna commented May 31, 2021

so it's -ffp-contract=off, have you tested it in dhewm3?

*** edit ***
let me tell you we have a winner here!
-O3 and -ffp-contract=off and the issue is gone!
no other changes, just changes in the CMakeLists.txt!

how do you check the ASM commands that the compiler do?
how do you check for sure that all the vmulss and vfmsub132ss hocus pocus in the code are the right one calls for sure? :)

@coldtobi
Copy link

I've used Daniels Compiler Explorer to try something:
This seems to hint the compiler, at least compiler explorer says it works too with O2 and Daniels test vectors.

static float crossZ_O2( const idVec3& v1, const idVec3 &v2 ) {
float intermediate1 = v1.x * v2.y;
float intermediate2 = v1.x * v2.y;
float z = intermediate1 - intermediate2;
return z;
}`

(the compiler explorer has some extra lines to output things commented out that I used to see the intemediate values; I've omitted them here)

Just wanted to share this, I don't know if this is suitable.

@DanielGibson
Copy link

Nice idea (and I think I even read somewhere that this kind of optimization shouldn't be done then, at least in C?), but float intermediate2 = v1.x * v2.y; this should've been v1.y * v2.x - and with that change it doesn't seem to work - same ASM is generated as with the original code :-/

I'm annoyed that the GCC developers don't even seem to see any problem with this behavior. They must really hate their users..

@coldtobi
Copy link

coldtobi commented May 31, 2021

thanks for seeing my damn copy+paste f*up, Daniel.
Yes, gcc folks are difficult… /me wonders if this bug triggers on other math stuff as well, in the game.

I'll try another thing: using strict c++11 mode… (Daniel did so already, just saw that in the bug report for gcc…)
No need to repeat that.

@coldtobi
Copy link

Another idea, as in the gcc bugreport, they've said: "C is just different here from C++ :)." …
if I did not stupid things this time, it worked:
wrapping the function in extern "C" { … } :

https://godbolt.org/z/1brM46bjc

@BielBdeLuna
Copy link

BielBdeLuna commented May 31, 2021

but if there might be other functions corrupted by contract optimization wouldn't then make sense to just use -ffp-contract=off instead of going back instruction per instruction testing if the math is correct?

I think the idea is:
"you wanted optimization and that's it, you weren't forced to optimize your code, so it's expectable that optimization might break your code"
so something in the line of
"If you want speedy code, code better rather than just rely on compiler optimizing your code"
isn't it?

I think it's more a problem of the documentation of the compiler that fails to inform of such behaviour and how to combat it (with that single command) rather than gcc devs hating the gcc users.

so this also might mean that those same optimizations we're calling might break more stuff, and we might have to scan the documentation in order to find more commands to stop the code optimization troubles somewhere else.

@DanielGibson
Copy link

wrapping the function in extern "C" { … } :

oh my god. nice that it works but..

I still don't get that the GCC people think that it's completely normal and fine that you can't implement a simple cross product with their "compiler" without using obscure compilerflags, pragmas or hacks.
This is completely unacceptable - it's worse than -fstrict-aliasing, because this definitely is valid code and -O2 is not a flag expected to break code.
This will cause a lot of trouble (and countless wasted developer hours from debugging) once more people enable FMA support - I think the only reason this hasn't blown up spectacularly/publicly (yet) is that by default on x86_64 it's not used.

I'm thinking about ditching GCC and explicitly removing support for it from dhewm3 (with build errors or something) and tell people to use clang instead. It's time GCC loses the last relevance it still has.

@BielBdeLuna
Copy link

BielBdeLuna commented May 31, 2021

in our case we are not just asking the compile to do a simple cross product, we are also asking it to optimize it.

what do you think? Do I make a pull request with the simple changes in CMakeLists.txt?

@DanielGibson
Copy link

updated fix: dhewm/dhewm3@2521c3d
(setting -ffp-contract=off in CMakeLists.txt)

@BielBdeLuna
Copy link

BielBdeLuna commented Jun 1, 2021

made the following pull request #575

@BielBdeLuna
Copy link

solved! :)

@RobertBeckebans RobertBeckebans added this to To do in RBDOOM-3-BFG 1.3.0 via automation Jun 3, 2021
@RobertBeckebans RobertBeckebans moved this from To do to Feedback, Testing in RBDOOM-3-BFG 1.3.0 Jun 3, 2021
@RobertBeckebans RobertBeckebans moved this from Feedback, Testing to Done in RBDOOM-3-BFG 1.3.0 Aug 29, 2021
rorgoroth pushed a commit to rorgoroth/dhewm3 that referenced this issue Apr 8, 2023
Only happend if `ONATIVE` was enabled (or some other flag was set
that enables the FMA extension), the root cause was that the cross
product didn't return 0 when it should, but a small value < 0.
Caused some faces to be missing in maps compiled with dmap.
RobertBeckebans/RBDOOM-3-BFG#436 (comment)
has lots of explanation.

I think this is a compiler bug, this commit works around it.

fixes dhewm#147
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
Development

No branches or pull requests

6 participants