Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

long double floating point calculations give inexact results #830

Open
bewantbe opened this Issue Aug 9, 2016 · 43 comments

Comments

Projects
None yet
@bewantbe
Copy link

bewantbe commented Aug 9, 2016

This program

// g++ -std=c++11 -static -o m24_11 m24.cpp ; ./m24_11
int main()
{
  long double t = 1;
  printf("%.17Le\n", t / 100000);
}

outputs (Linux, expected result):
1.00000000000000000e-05

But in BashOnWindows, it outputs:
1.00000000000000008e-05

The long double data type in C/C++ under Linux using GCC for x86/x86-64 platform has an 80-bit format with 64 bits mantissa.
When a program using long double runs in BashOnWindows it behaves like having 53 bits mantissa number just like the usual "double" (but with almost the same exponential range as long double)

This harms functions like std::pow(long double, int) and hence boost::lexical_cast<double>(char*) (boost version 1.55) since internally it uses long double as intermediate result (but boost version 1.60 fixes this problem), therefore boost::program_options can reads double type options inaccurately also (even for exact input e.g. 0.03125 = 1/32).

Likewise this program:

// g++ -std=c++11 -static -o m25 m25.cpp ; ./m25
#include <stdio.h>
#include <limits>
int main()
{
  long double t0, t1 = 1.0;
  while (t1 > 0) {
    t0 = t1;
    t1 = t0 / 2;
  }
  long double lt = std::numeric_limits<long double>::denorm_min();
  if (t0 == lt) {
    printf("Good.\n");
  } else {
    printf("What? t0 = %.17Le != %.17Le\n", t0, lt);
  }
}

Should output:
Good.
While in BashOnWindows, it output:
What? t0 = 7.46536864129530799e-4948 != 3.64519953188247460e-4951

@benhillis

This comment has been minimized.

Copy link
Member

benhillis commented Aug 9, 2016

Are you sure you're using the same version of g++ on WSL versus your Linux machine?

@therealkenc

This comment has been minimized.

Copy link
Collaborator

therealkenc commented Aug 9, 2016

Confirmed it happens with consistent g++ version (6.1.1 20160510). Same ELF64 object generates byte-for-byte on native and WSL. It doesn't even call any math library functions. Looking at the assembly, everything that calculates t0 is inline, and it only escapes to call denorm_min() and printf()/puts(), let alone leaves userspace for math. strace is the same on both sides. No idea.... 🤷

@aseering

This comment has been minimized.

Copy link
Contributor

aseering commented Aug 9, 2016

It's been a while since I looked at this sort of thing (pre-x86_64), but IIRC Windows and Linux have slightly different calling conventions regarding saving things like the FPU control registers during/after syscalls? Maybe relevant?

@stehufntdev

This comment has been minimized.

Copy link
Collaborator

stehufntdev commented Aug 9, 2016

Thanks for reporting the issue. Just looked at this, and it is a difference between the default floating point control register (i.e. FPCSR\NPXCW) between Linux and NT. glibc assumes 0x37f but NT uses 0x27f. To confirm, I added the following lines to the sample above to set the control register and it passed:

#include <fpu_control.h>
...
   unsigned short Cw = 0x37f;
  _FPU_SETCW(Cw);

MxCsr is the same between Linux and NT, but I'll poke around to see if there are other control registers that are mismatched. I'll file a bug on our side to match the default configuration on Linux.

@hrlngrv

This comment has been minimized.

Copy link

hrlngrv commented Aug 10, 2016

FTHOI, I compiled the sample program in Linux, then copied the resulting executable to WSL. It produced different results under Linux and WSL, 1.00000000000000000e-05 under Linux but 1.00000000000000008e-05 under WSL. It's not g++ that's the problem.

Does WSL not support 80-bit floating point? Does WSL not use the FPU but only provide 64-bit software floating point math? FWLIW, awk under Linux produces the same output as Windows.

@benhillis

This comment has been minimized.

Copy link
Member

benhillis commented Aug 10, 2016

See the comment above yours. There is a difference between the floating point control register on Windows vs Linux. Steve is working on a fix.

@bewantbe

This comment has been minimized.

Copy link
Author

bewantbe commented Aug 10, 2016

@stehufntdev Thank you for the temporary solution (especially the magic number!!! :-)) and the on going work. With that fix, my (some other) computation executable now gives the same results under BashOnWindows as under Linux.

@therealkenc Thanks for the clarification, that's exactly what happend.

Just for completeness
Windows version:
10.0.14393
The test programs are produced by either
GCC 4.9, libc 2.19 (Debian 8.5 (jessie))
GCC 5.4, libc 2.23 (Debian testing (stretch))
with all libs statically linked.

@richfelker

This comment has been minimized.

Copy link

richfelker commented Apr 29, 2017

Bumping this because it's a serious ABI issue and completely breaks any musl libc-linked binaries that use floating point scanf/strtod, among other things. It probably also breaks printf rounding of floating point values. We use long double internally for various reasons, including issues related to excess precision (on archs that have FLT_EVAL_METHOD>0), and the cost is irrelevant since the bulk of the code actually works with integers, but of course it badly breaks when run with an ABI-non-conforming control word setting or an emulator that fails to emulate x87 correctly.

@benhillis benhillis added the backlog label May 5, 2017

@benhillis

This comment has been minimized.

Copy link
Member

benhillis commented May 5, 2017

@richfelker - Thanks for pinging this thread, this issue is still on your backlog. If possible could you describe a specific scenario that is not working because of this to help us prioritize this fix?

@richfelker

This comment has been minimized.

Copy link

richfelker commented May 5, 2017

OK, I'll work out a specific strtod call that returns wrong result with x87 control word set incorrectly.

@fweimer

This comment has been minimized.

Copy link

fweimer commented May 5, 2017

@benhillis, I think several tests in the glibc test suite (e.g., math/test-ldouble, stdlib/tst-strtod-round) fail due to this issue.

@richfelker

This comment has been minimized.

Copy link

richfelker commented May 5, 2017

With the control word set wrong, strtod("4398046511105",0) returns 4398046511104. This represents the exact boundary I expected to break: values with more than 53-(64-53) = 42 bits in the significand.

@therealkenc

This comment has been minimized.

Copy link
Collaborator

therealkenc commented Mar 22, 2018

I still think this is going to have a NodeJS repro if someone in the small intersection of coders that do numerical computation stuff and ECMAScript stuff (not me) wants to take a crack at it. The interpreter (JIT compiled or otherwise) is going to make hard assumptions about the floating point mode. Might be motivational.

@ctsa

This comment has been minimized.

Copy link

ctsa commented May 23, 2018

+1

We just tried to transfer one of our projects onto WSL (https://github.com/Illumina/strelka), and traced failing unit tests down to differences like this within boost::math::hypergeomtric_distribution<>:

#include <iostream>
#include <iomanip>

int main()
{
    long double n(7.59070505394721872907e+286);
    long double d(2.00440157654530257755e+291);

    // linux n = 3.78701810194652734355e-05
    // WSL  n = 3.78701810194652746001e-05
    n /= d;

    std::cout.precision(21);
    std::cout << "n = " << n << "\n";
}
@ghost

This comment has been minimized.

Copy link

ghost commented May 28, 2018

@benhillis, this is about to be a two years old issue. Can we have some updates? Something we can do to help you guys fixing it?

@ethanherbertson

This comment has been minimized.

Copy link

ethanherbertson commented Jun 15, 2018

I just ran into this (or a similar) issue while performance testing two different ways to do a calculation in Node.js.

On WSL the calculation started producing an incorrect result after a few thousand iterations, whereas on a RHEL virtual machine it was always correct (at least through 1,000,000 iterations). The specific iteration where the first error occurred was consistent as long as the script was unchanged, but it would vary somewhat if small changes were introduced into the script.

Here is one such script:

var sqrtfive = Math.sqrt(5);
var phi = (1 + sqrtfive) / 2;
var fibc = function (n) { return Math.round((Math.pow(phi, n)) / sqrtfive); };

var output = 0;
var correct_output = fibc(75);
var num_errors = 0;
for (var x = 0; x < 100000; x++) {
    output = fibc(75);
    if (output != correct_output) {
        num_errors++;
        console.log('error at x = ' + x);
        if (num_errors > 25) {
            break;
        }
    }
}
console.log(correct_output);
console.log(output); 

Running the above through node v8.11.3 prints, on my machine:

error at x = 11644
error at x = 15526
error at x = 15527
error at x = 15528
error at x = 15529
error at x = 15530
error at x = 15531
error at x = 15532
error at x = 15533
error at x = 15534
error at x = 15535
error at x = 15536
error at x = 15537
error at x = 15538
error at x = 15539
error at x = 15540
error at x = 15541
error at x = 15542
error at x = 15543
error at x = 15544
error at x = 15545
error at x = 15546
error at x = 15547
error at x = 15548
error at x = 15549
error at x = 15550
2111485077978050
2111485077978055

(For the record, 2111485077978050 is the correct answer for Fib(75).)

@richfelker

This comment has been minimized.

Copy link

richfelker commented Feb 10, 2019

Ping? Back in October I had indication out-of-band that a fix for this was coming, but I haven't seen anything about it since.

@luoqi-git

This comment has been minimized.

Copy link

luoqi-git commented Mar 13, 2019

It looks like a pretty old ticket, is there any plan to resolve it? I image the fix is very straightforward, just initialize the right word in the process control block/context of the first linux process.

@luoqi-git

This comment has been minimized.

Copy link

luoqi-git commented Mar 13, 2019

Ok, this problem is more complicated (and more serious) than I thought -- the x87 control word is NOT inherited by a child process across fork(), which means there is no easy user space workaround by writing a wrapper program to properly initialize the control word then fork and exec. I believe this inheritance is required for POSIX compliance.

% cat t.c
#include <fenv.h>
#include <math.h>
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>
#include <xmmintrin.h>

static unsigned short _mm_getfcw()
{
        unsigned short word;
        asm volatile("fstcw %0" : "=m" (*(&word)));
        return word;
}

static void _mm_setfcw(unsigned short word)
{
        asm volatile("fldcw %0" : : "m" (*(&word)));
}

__attribute__((constructor))
static void _mm_fpuinit()
{
        asm volatile("finit");
}

static void report(const char *name)
{
        printf("%s\tfcw %x csr %x\n", name, _mm_getfcw(), _mm_getcsr());
}

static jmp_buf state;

static void sigfpe(int sig, siginfo_t *info, void *uap)
{
        ucontext_t *uc = uap;
        fpregset_t fp = uc->uc_mcontext.fpregs;

        if (!fp) {
                printf("sigfpe\tno fpu context\n");
        } else {
                printf("sigfpe\tfcw %x csr %x\n", fp->cwd, fp->mxcsr);
        }
        longjmp(state, 1);
}

int main(int argc, char **argv)
{
        report("initial");

        fesetround(FE_TOWARDZERO);
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

        report("parent");

        {
                pthread_t thr;
                pthread_create(&thr, 0, (void *(*)(void*))report, "thread");
                pthread_join(thr, 0);
        }

        if (fork() == 0) {
                report("child");
                if (setjmp(state) == 0) {
                        long double x = 0;
                        struct sigaction sa;
                        memset(&sa, 0, sizeof sa);
                        sa.sa_sigaction = sigfpe;
                        sa.sa_flags = SA_SIGINFO;
                        sigaction(SIGFPE, &sa, 0);
                        feenableexcept(FE_DIVBYZERO);
                        x = 1 / x;
                        return (int)x;
                }
        } else {
                int status;
                wait(&status);
                if (WIFSIGNALED(status)) {
                        printf("child killed by signal %d\n",
                                WTERMSIG(status));
                } else if (WIFEXITED(status)) {
                        printf("child exited with status %d\n",
                                WEXITSTATUS(status));
                }
        }
        return 0;
}
% cc -o t t.c -lm -lpthread
native% ./t
initial	fcw 37f csr 1f80
parent	fcw f7f csr ff80
thread	fcw f7f csr ff80
child	fcw f7f csr ff80
sigfpe	fcw f7b csr fd80
child exited with status 0
wsl% ./t
initial	fcw 37f csr 1f80
parent	fcw f7f csr ff80
thread	fcw 27f csr 1f80
child	fcw 27f csr 1f80
child exited with status 142

It is also confirmed that the control word is NOT inherited by a child (POSIX) thread. And the problem is not limited to x87 control word, it equally affects the SSE control and status register.

Many manifestations of the same problem were reported earlier in this ticket, let me try to summarize here the full impact, based on what I have gathered so far:

  • long double, which gcc maps into 80-bit extended precision floating point on x86, won't work as expected
  • C99 fenv(3) settings won't be passed down to child processes or threads, changes to rounding mode or exception masks will have to be set in each individual thread an application creates
  • The same is true for the commonly used flush to zero mode to speed up the denormal handling in SIMD code

Update:
There also seems to be problem with SIGFPE generation, division by zero is expected to raise a SIGFPE signal, but instead the process is terminated with some arbitrary exit status. My original intention was to see what x87 control word value would be saved into a sigcontext, as a way to probe if/how the kernel tracks the register. This behavior is the same if SSE is used instead of x87 (just replace the long double with double in the test code).

@therealkenc

This comment has been minimized.

Copy link
Collaborator

therealkenc commented Mar 13, 2019

which means there is no easy user space workaround by writing a wrapper program to properly initialize the control word then fork and exec

LD_PRELOAD __libc_start_main(). Something like:

#include <dlfcn.h>
#include <fpu_control.h>

template<typename Tfn>
auto dlfn(const char *name)
{
    return reinterpret_cast<Tfn>(dlsym(RTLD_NEXT, name));
}

extern "C"
int __libc_start_main(int *(main) (int, char * *, char * *), int argc, char * * ubp_av, void (*init) (void), 
    void (*fini) (void), void (*rtld_fini) (void), void (* stack_end))
{
    auto start = dlfn<decltype(&::__libc_start_main)>("__libc_start_main");
    unsigned short cw = 0x37f;
    _FPU_SETCW(cw);
    return start(main, argc, ubp_av, init, fini, rtld_fini, stack_end);
}

Utterly untested, that is off the cuff.

I believe this inheritance is required for POSIX compliance.

I believe you made that up I believe I've been corrected. But it's still pretty disappointing the incompatibility hasn't been addressed yet.

@luoqi-git

This comment has been minimized.

Copy link

luoqi-git commented Mar 13, 2019

@therealkenc Yes, it is possible to initialize the control word in the main process through LD_PRELOAD, but it will not be passed down to any child process or more seriously any child thread, which renders it useless in most applications.

I didn't make it up about POSIX compliance, whereas there could be some latitude regarding execve(2), there is no ambiguity about fork(2), see the spec about fork, and I quote,

The fork() function shall create a new process. The new process (child process) shall be an exact copy of the calling process (parent process) except as detailed below:

Here "exact copy" is to be interpreted as all machine specific/dependent states -- I know this because I am an OS developer.

This means I also understand the fix of this problem is non-trivial. The symptom indicates the x87 control word is not maintained as part of the linux process context, it's only in the NT process context. What could further complicate this effort is that x87 coprocessor is usually implemented as lazy during context switch, that is, the FPU is initially marked as not present, any FPU instruction would trigger a fpu-unavailable trap at which time FPU context (including the control word) would be restored. It is very likely the current linux subsystem module won't even see this trap, and some changes to the NT microkernel proper may have to be made.

@therealkenc

This comment has been minimized.

Copy link
Collaborator

therealkenc commented Mar 13, 2019

I didn't make it up about POSIX compliance, whereas there could be some latitude regarding execve(2), there is no ambiguity about fork(2)

Okay, conceded. I yield.

This means I also understand the fix of this problem is non-trivial.

Probably, using the "or it would have been" rule of thumb.

@luoqi-git

This comment has been minimized.

Copy link

luoqi-git commented Mar 13, 2019

I would volunteer to fix it were the code open source...

@therealkenc

This comment has been minimized.

Copy link
Collaborator

therealkenc commented Mar 13, 2019

You got me thinking for a few minutes if it could be done if we ever get PTRACE_SYSEMU. I think it might... or at least, could -- as long as one is willing disallow perfect from being the enemy of good. I've got some unrelated code in the can that is shimming fork() for what it is worth. Looks like (pasting):

// this gets called eventually via a LD_PRELOAD of libc fork()
int WslShim::fork()
{
    //LOG(3) << "entering WslShim::fork";
    int pid;
    if (state_) {
        state_->rpc_->notify_fork(rpc::fork_prepare);
        pid = libc_.fork_();
        if (pid == 0) {
            //LOG(3) << "WslShim::fork() child";
            state_->rpc_->notify_fork(rpc::fork_child);
        } else {
            //LOG(3) << "WslShim::fork() parent";
            state_->rpc_->notify_fork(rpc::fork_parent);
        }
    } else {
        //LOG(3) << "WslShim::fork() called with no state";
        pid = libc_.fork_();
    }
    return pid;
}

One could stuff a _FPU_SETCW() in there. Same can be done for (say) pthread_create() ad nausium. That doesn't help you with a raw clone() syscall, mind. But if we were allowed to trap it. Maybe....

Or maybe not. It's hard to differentiate open issues that fall into the category of "haven't got to it" versus "can't be done (perfectly) without disturbing the core NT team people" versus "could do it but effort". I could cite examples of all three categories on the open issue pile.

If I had to guess, your "some changes to the NT microkernel proper may have to be made" assessment is probably correct. It's actually pretty rare for issues to get marked with a big yellow 'backlog' tag.

@luoqi-git

This comment has been minimized.

Copy link

luoqi-git commented Mar 13, 2019

@therealkenc I just raised the bar for you :) with the mxcsr register, and you will also have to hook pthread_create(), which I believe is more important given the prevalence of multithreaded applications.

It also got me wondering what other (less often used) process context are not properly maintained. For example, the debug registers, which are required for the proper functioning of gdb, or the even more obscure io permission map -- does NT even allow the linux module this level of access?

@richfelker

This comment has been minimized.

Copy link

richfelker commented Mar 13, 2019

"User-space workarounds" are a non-starter because WSL is supposed to be able to run arbitrary Linux binaries, including static-linked ones. You can't hook functions by name because the name might not even exist; pthread_create or fork could have been LTO'd into main for all you know. This really needs fixing on the WSL side: SYS_clone needs to correctly clone the full register file including floating point control and status registers, and SYS_execve needs to setup the initial register file in a manner than conforms with the ABI (x87 set to full precision).

@richfelker

This comment has been minimized.

Copy link

richfelker commented Mar 13, 2019

The latter (right state at initial entry/execve) is an utterly trivial one-line fix somewhere; it's ridiculous that this has been unfixed for nearly years, and makes WSL unsafe for lots of serious production use. In particular any binaries/distros built on musl libc will have dangerously wrong behavior from strtol, etc. - things like converting an exactly representable integer to the wrong value.

I'm not sure if the SYS_clone (fork/pthread_create) side is quite as easy - it might be 5 or 10 lines instead of 1. But I think that's well within the range of what could be done in 3 years...

@benhillis

This comment has been minimized.

Copy link
Member

benhillis commented Mar 13, 2019

@richfelker - We are a small team trying to catch up with hundreds of Linux kernel developers that have a 20 year head start. @therealkenc is correct that this change (I hesitate to call it a fix, because the issue is usermode libraries using assumed values) has greater implications than you assume for the underlying Windows OS around saving / restoring the control register so Windows apps are not broken.

I know vinegar is more available on the internet, but flys prefer honey.

@richfelker

This comment has been minimized.

Copy link

richfelker commented Mar 13, 2019

It absolutely is a fix, not "a change", because the ABI is that the x87 control word is extended precision at entry. The usermode apps/libraries in question are Linux apps/libraries, not Windows ones, which correctly assume the specified ABI, not the mismatched ABI which WSL has been providing. I don't see any plausible argument that code written for and tested on systems with the right ABI somehow expects the wrong ABI that WSL has been providing.

I understand that you're a small team and that shouting at FOSS maintainers about their bugs is not nice (being one myself and all), but this is not a community FOSS project, it's a major Windows feature Microsoft is promoting as giving you the ability to (safely, one would hope!) run Linux apps/distros/containers on Windows, and it doesn't do that, because of a bug that could be fixed with a trivial patch. I've been told multiple times that people were working on this and that we could expect a fix soon, and it's been neglected again and again.

@benhillis

This comment has been minimized.

Copy link
Member

benhillis commented Mar 13, 2019

Your assumption the fix is trivial is incorrect.

@richfelker

This comment has been minimized.

Copy link

richfelker commented Mar 13, 2019

For the initial-entry case (ignoring clone for now), can you clarify why just setting the x87 control word right before jumping to the ELF entry point address does not suffice?

@luoqi-git

This comment has been minimized.

Copy link

luoqi-git commented Mar 13, 2019

@benhillis As a kernel developer myself, I'd put the responsibility squarely on the kernel side of the court. This is much more serious than "usermode libraries using assumed values", the most damning part is the non-compliance of fork() and pthread_create() with respect to inheritance of certain execution context, maybe underestimating the seriousness of this bug is why this issue got less attention it deserves?

@richfelker The fix however could be more involved than you might have imagined. It depends on the internal interface between the NT microkernel and the POSIX subsystem, and also the interaction with the Windows subsystem. It could be as simple as POSIX reaching down and modifying the NT thread context, or it may have to make a kernel IPC call instead (if NT is anything like Mach) -- for all I know, the POSIX subsystem may not even be running in ring 0.

@benhillis

This comment has been minimized.

Copy link
Member

benhillis commented Mar 13, 2019

@luoqi-git - Correct, this requires some fairly substantial changes to the Windows kernel itself.

@richfelker

This comment has been minimized.

Copy link

richfelker commented Mar 13, 2019

Admittedly there's code I can't see, but conceptually I don't see how this could require any invasive changes. Assuming Windows already lets processes modify the x87 control word and preserves it across task switch (which afaik it does; I've seen this before in awful graphics lib stuff that broke things by setting it to single precision!), just setting it right before jumping to the entry point should solve the problem at entry. This is purely a userspace operation. The fork/clone part is harder for me to make predictions about without being able to see source.

@luoqi-git

This comment has been minimized.

Copy link

luoqi-git commented Mar 13, 2019

I say it is time for Microsoft to open up the NT source code, at least the microkernel part, just like what Apple has done (and Sun Microsystems). There will be more benefits to Microsoft (a share holder I am:-) than any potential downside, for one, I'd be very willing to help out on a fix to this problem.

Now it even owns the proper forum for the code release...

@luoqi-git

This comment has been minimized.

Copy link

luoqi-git commented Mar 13, 2019

@richfelker Let me give you an example why this is more complicated than you think. Saving and restoring FPU/SSE context is quite expensive, much more than that of the integer register file, that's why all OSes that I know of would perform the save/restore at context switch only if a process has actually used FPU. Always modifying the FPU control register from userspace would completely nullify the optimization, that's something no decent OS developer would ever do (but a desperate application developer might).

@richfelker

This comment has been minimized.

Copy link

richfelker commented Mar 13, 2019

I'm quite aware of that, but the "no decent OS developer" comment is rather exaggerated. These days many operating systems do not make this optimization for various reasons; it's nowhere near as valuable as it was a couple decades ago. In any case you could make the fix that way for now, and optimize later if it turns out the performance hit is a problem, rather than leaving it broken for the sake of performance.

FWIW the issue you cited is exactly why we're not going to workaround the WSL bug in musl; doing so would make all binaries generated with the workaround unconditionally poke the fpu when they shouldn't have to, and the cost would not go away when WSL gets fixed because it would be linked into the binaries rather than in OS-level code that will go away once it's fixed right.

@luoqi-git

This comment has been minimized.

Copy link

luoqi-git commented Mar 13, 2019

@richfelker Brother, I feel your pain. I am here not because I'm a kernel developer, but because an application my team developed was hit by this bug. I would very much like to see this problem fixed and am offering any help I can render. At this point I may be desperate enough to insert an finit instruction at the beginning of any thread our app creates.

@therealkenc

This comment has been minimized.

Copy link
Collaborator

therealkenc commented Mar 13, 2019

FWIW the issue you cited is exactly why we're not going to workaround the WSL bug in musl; doing so would make all binaries generated with the workaround unconditionally poke the fpu when they shouldn't have to

I suspect, but can't prove, a variation of this argument is being applied by WSL devs too. It could be fixed hacked worked-around in the WSL driver, but a "proper" fix is an upstream problem. So here were are.

Which is good. Everyone is in agreement that without a perfect fix all hope is lost, and hopefully this thread can go quiet again.

@richfelker

This comment has been minimized.

Copy link

richfelker commented Mar 13, 2019

@therealkenc, I don't see them as equivalent arguments at all. One is about a suboptimal fix at the level of a component that ships with OS updates and never gets linked into applications, so that, if it's costly, it can go away in a future OS update when a better fix is made. The other is about hard-coding a workaround into application binaries in a way that affects running them on systems not affected by the bug, including real Linux systems and the expected post-fix WSL systems at some point in the future.

@Po-wei

This comment has been minimized.

Copy link

Po-wei commented Mar 24, 2019

Although you want same output between native Linux and WSL, I think the double / long double do not have explicit size in C++ standard so it is just like a Linux distro run a difference hardware and therefore has different result.
Maybe you need a quick workaround and not a substantial change in kernel.
WSL has compatibility is better though.

@njsmith

This comment has been minimized.

Copy link

njsmith commented Mar 24, 2019

@Po-wei It's true that the C/C++ language standards don't specify the exact precision of long double. They leave that up to the platform ABI. And the standard ABI for Linux x86 and x86-64 says that long double has 80 bit precision, and that the x87 FPU control word has to be initialized the right way.

Other platforms are free to implement C/C++ with different ABIs, but WSL is trying to implement the Linux x86/x86-64 ABI, so that's the relevant standard. This is definitely a bug in WSL.

Note that several of the people posting here are well-known experts who are involved in making C/C++ work on Linux in the first place. They know what they're talking about :-)

@Po-wei

This comment has been minimized.

Copy link

Po-wei commented Mar 24, 2019

@njsmith Thanks for your explanation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.