Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix code writing to the database to work in any locale as well. #241

Closed
wants to merge 4 commits into from

Conversation

vadz
Copy link
Member

@vadz vadz commented Mar 30, 2014

This is similar to #238 but completes it by

  1. Doing the same thing in the other direction, i.e. when writing doubles to the database and not reading them.
  2. Doing the same changes in mysql and sqlite3 backends in addition to the postgresql one.

@pfedor
Copy link
Member

pfedor commented Mar 30, 2014

This thread provides some context for why we haven't made such change in
the past:
http://sourceforge.net/p/soci/mailman/soci-devel/thread/220179a90805262143g4e0be28ew2e319a4bf09477b5%40mail.gmail.com/

The current code is predictably and reliably broken when locale is set to
something which uses ',' for a decimal separator. After this fix, the code
will do the right thing most of the time, but may randomly break
multi-threaded applications on platforms where imbue() changes the global
locale (which it shouldn't but it used to be very common behavior).

For the right though inelegant way of addressing this problem look at the
function NoLocaleStrtod() in file
https://code.google.com/p/protobuf/source/browse/trunk/src/google/protobuf/stubs/strutil.cc

Thanks,

Aleksander

On Sun, Mar 30, 2014 at 1:08 PM, VZ notifications@github.com wrote:

This is similar to #238 #238 but
completes it by

  1. Doing the same thing in the other direction, i.e. when writing
    doubles to the database and not reading them.
  2. Doing the same changes in mysql and sqlite3 backends in addition to
    the postgresql one.

You can merge this Pull Request by running

git pull https://github.com/vadz/soci dtostr

Or view, comment on, or merge it at:

#241
Commit Summary

  • Use locale-independent function for converting doubles to strings.
  • Make sqlite3 and mysql backends work with any locale too.
  • Use the environment locale in the tests.

File Changes

Patch Links:

Reply to this email directly or view it on GitHubhttps://github.com//pull/241
.

@vadz
Copy link
Member Author

vadz commented Mar 30, 2014

Do you have any example of platforms where imbue() changes the global locale? I've absolutely never heard about this and a quick web search failed to find any mentions of it as well.

The Google hack is horrible and is simply due to the well-known aversion of that company to using C++ as anything other than C-with-classes-and-some-other-minor-additions. We could still use it, at the very least the existence of cstring_to_double() and double_to_cstring() makes it easy to replace their implementation at any time, but I'd like to have at least some proof that it's really needed.

@pfedor
Copy link
Member

pfedor commented Mar 31, 2014

On Sun, Mar 30, 2014 at 2:11 PM, VZ notifications@github.com wrote:

Do you have any example of platforms where imbue() changes the global
locale?

Not imbue() itself, to be precise, but the following call to
ostream::operator<<.

I've absolutely never heard about this and a quick web search failed to
find any mentions of it as well.

Fair question. For example on my OSX box the file c++locale.h appears a
number of times (XCode, Android NDK, /opt/local/include/gcc47/...) and in
every single one of them __convert_from_v() uses setlocale() before calling
vsnprintf() to format the number. __convert_from_v() is called from
num_put::_M_insert_float() which is called from num_put::do_put() which is
called from num_put::put(), which is used by ostream::operator<< for number
formatting. The files:

http://www.opensource.apple.com/source/libstdcxx/libstdcxx-60/include/c++/4.2.1/bits/c++locale.h
http://www.opensource.apple.com/source/libstdcxx/libstdcxx-60/include/c++/4.2.1/bits/locale_facets.tcc
http://www.opensource.apple.com/source/libstdcxx/libstdcxx-60/include/c++/4.2.1/bits/locale_facets.h

I don't have much experience reading c++ standard library source, so it's
possible that I'm completely misreading the code above, and if I am then
please correct me. But if I'm not, this looks pretty bad, doesn't it?

Thanks,

Aleksander

@vadz
Copy link
Member Author

vadz commented Mar 31, 2014

On Sun, 30 Mar 2014 20:49:54 -0700 Pawel Aleksander Fedorynski notifications@github.com wrote:

PAF> On Sun, Mar 30, 2014 at 2:11 PM, VZ notifications@github.com wrote:
PAF>
PAF> > Do you have any example of platforms where imbue() changes the global
PAF> > locale?
PAF> >
PAF> Not imbue() itself, to be precise, but the following call to
PAF> ostream::operator<<.
PAF>
PAF> > I've absolutely never heard about this and a quick web search failed to
PAF> > find any mentions of it as well.
PAF> >
PAF> Fair question. For example on my OSX box the file c++locale.h appears a
PAF> number of times (XCode, Android NDK, /opt/local/include/gcc47/...) and in
PAF> every single one of them __convert_from_v() uses setlocale() before calling
PAF> vsnprintf() to format the number.

Sorry, I stand corrected, libstdc++ under OS X does indeed have this
horrible bug :-( I guess the lesson is to never think that something can't
be done just because it's horribly wrong (as horribly as changing a global
variable to affect a local function call), sigh...

PAF> I don't have much experience reading c++ standard library source, so it's
PAF> possible that I'm completely misreading the code above, and if I am then
PAF> please correct me. But if I'm not, this looks pretty bad, doesn't it?

Yes, and here is a quick and dirty test https://gist.github.com/vadz/9890860
to prove that it is indeed buggy:

% g++ -v
Using built-in specs.
Target: i686-apple-darwin11
Configured with: /private/var/tmp/llvmgcc42/llvmgcc42-2336.11~67/src/configure --disable-checking --enable-werror --prefix=/Applications/Xcode.app/Contents/Developer/usr/llvm-gcc-4.2 --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-prefix=llvm- --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin11 --enable-llvm=/private/var/tmp/llvmgcc42/llvmgcc42-2336.11~67/dst-llvmCore/Developer/usr/local --program-prefix=i686-apple-darwin11- --host=x86_64-apple-darwin11 --target=i686-apple-darwin11 --with-gxx-include-dir=/usr/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
% g++ -Wall thrloc.cpp && ./a.out
Assertion failed: (check_printf()), function thread2, file thrloc.cpp, line 40.
[1]    67387 abort      ./a.out

It's somewhat reassuring that at least it is fixed in libc++:

% clang++ -v
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.5.0
% clang++ -stdlib=libc++ -Wall thrloc.cpp && ./a.out
[...nothing...]

but as long as we support using libstdc++ under OS X, we need to deal with
this broken case :-( I'll update the code to use the comma-replacing hack.

Thanks,
VZ

@vadz
Copy link
Member Author

vadz commented Mar 31, 2014

New version using explicit decimal comma hack.

@pfedor
Copy link
Member

pfedor commented Mar 31, 2014

OK good to see that the test confirms it.

I think github lost the email I typed last night so sorry for duplication if it reappears:

On a closer look, things seem even worse: __convert_from_v() calls setlocale() not to switch to whatever was imbued, but to always switch to "C". [And the decimal point '.' is replaced with ',' (if required) later in _M_insert_float().] So the problem is not with calling imbue(): it seems that any calls that pass a floating point number to ostream::operator<< are dangerous in multi-threaded programs when the global locale is not "C"!? And this is not just OSX with libstdc++, the code looks similar in Android NDK, OpenBSD, FreeBSD. Could be that on at least some of these platforms this is obsolete code (like libstdc++ is being obsoleted by libc++ on OSX). On Linux this has been fixed long ago by replacing setlocale() with uselocale() in __convert_from_v().

So we are probably not in a happy place because SOCI is using ostream::operator<< probably quite a lot. But in such cases it's good to fix things one place at a time and from that perspective your change is an excellent step.

As for the code itself, I looked at it and it looks good. This kind of code tends to be very tricky so it would be good to get a second pair of eyes to review cstring_to_double() and double_to_cstring(). My only complaint is that it's probably better to avoid strdup() which is a non-standard extension. Also, it would be good to have some tests for these functions.

Alternatively, if licenses permit, we could just copy the protobuf code I linked, which is probably thoroughly reviewed and tested.

Thanks,

Aleksander

@vadz
Copy link
Member Author

vadz commented Apr 1, 2014

The existing mysql test proved to be sufficient for them, it already tests all possible cases (and did discover a bug, with not failing to parse the numbers using decimal comma in my first version).

strdup(): good point, thanks, I'll change it, it's true that some platforms have troubles with it.

Reusing protobuf code: it has dependencies on other Google stuff, it's probably not going to be that easy to extract. And we don't need full strtod() (i.e. with the output end parameter), so our version can be simpler.

vadz added 4 commits April 1, 2014 15:07
Some implementations of C++ streams (notably libstdc++ under OS X) are buggy
and change the global locale to implement support for imbue(), which is not
thread-safe, and so could introduce subtle bugs in multi-thread programs.

Replace the old correct but not working in practice code with a new ugly
version only supporting decimal comma and period which is not totally correct
in theory but should work fine in practice (and shouldn't make things worse
even for the hypothetical locales using some other decimal separator anyhow).
Similarly to the issue SOCI#238 but in the other direction, we must use
locale-independent functions to ensure that we send the strings in correct
format to the database even if the current locale doesn't use period as
decimal separator.

This fixes errors like

	invalid input syntax for type double precision: "3,1415926500000002086"

when executing the tests in e.g. French locale.
Propagate the changes done to postgresql backend in sqlite3 and mysql
backends: use locale-independent functions for converting between the
(floating point) numbers and strings as the database itself always expects,
and provides, the strings in "C" locale.
This makes it easier to check that SOCI works correctly when used in a program
that changes the global locale as it can now be tested just by running tests
after setting LC_ALL to some non-default value.
@vadz
Copy link
Member Author

vadz commented Apr 1, 2014

New version: just avoid strdup().

@mloskot
Copy link
Contributor

mloskot commented Apr 5, 2014

I completely forgot about this potential of troubles in multi-threaded environment, and the old thread too. Thanks @pfedor for digging it out.

@vadz suggested to apply this (initially #238) fix to SOCI 3.2.3 as well. I agreed it's a good idea. So, this PR #241 sound and ready to be backported to 3.2.3?

@mloskot mloskot added this to the 4.0.0 milestone Apr 5, 2014
@pfedor
Copy link
Member

pfedor commented Apr 5, 2014

I think it's a very good change and big thanks to VZ for working on it. A
couple points/questions

  1. Shouldn't we be checking errno for ERANGE after every call to strtod()?
  2. Seems to me cstring_to_double() does not consider it an error if there's
    leading whitespace. Not saying this is wrong, just pointing this out to
    make sure this is intended.
  3. Is cstring_to_double() being tested for cases when input contains the
    literal strings like "INFINITY" or "NAN"? Is it doing the right thing in
    such cases?
  4. I'm not sure it's such a good idea to use the systems locale for the
    tests. This makes the tests nondeterministic, and also, this means that the
    scenario with non-standard locale won't be tested unless someone explicitly
    runs the test on a system with different locale. It would be better to just
    explicitly set the locale to something that uses comma for just one test
    and then revert it to "C" (ie run the number parsing test twice, once in
    "C" and once in some other locale)

Mateusz: can you review cstring_to_double() and double_to_cstring() just to
make sure VZ and I haven't missed some corner case?

Thanks

Aleksander

On Sat, Apr 5, 2014 at 12:23 PM, Mateusz Łoskot notifications@github.comwrote:

I completely forgot about this potential of troubles in multi-threaded
environment, and the old thread too. Thanks @pfedorhttps://github.com/pfedorfor digging it out.

@vadz https://github.com/vadz suggested to apply this (initially #238#238)
fix to SOCI 3.2.3 as well. I agreed it's a good idea. So, this PR #241https://github.com/SOCI/soci/pull/241sound and ready to be backported to 3.2.3?


Reply to this email directly or view it on GitHubhttps://github.com//pull/241#issuecomment-39648290
.

@vadz
Copy link
Member Author

vadz commented Apr 6, 2014

On Sat, 05 Apr 2014 12:58:56 -0700 Pawel Aleksander Fedorynski notifications@github.com wrote:

PAF> I think it's a very good change and big thanks to VZ for working on it. A
PAF> couple points/questions
PAF>
PAF> 1. Shouldn't we be checking errno for ERANGE after every call to strtod()?

You're right, I think we should. The explanation for why I didn't do it is
actually that I wanted to mechanically make the existing code work with
locales using different character decimal point, without any other changes.
And the old code didn't check for ERANGE. But, again, I think we should
indeed check for it and throw an exception if it happens.

PAF> 2. Seems to me cstring_to_double() does not consider it an error if there's
PAF> leading whitespace. Not saying this is wrong, just pointing this out to
PAF> make sure this is intended.

Again, this just preserves the old code behaviour. I don't think it's
really useful to skip leading white space but neither can I think of any
situation in which it would be a problem.

PAF> 3. Is cstring_to_double() being tested for cases when input contains the
PAF> literal strings like "INFINITY" or "NAN"? Is it doing the right thing in
PAF> such cases?

This is a can of worms, too. Some implementations (notably MSVC CRT) don't
handle this properly and working around this is quite non-trivial. I don't
know if handling INF/NAN here is really required and, in any case, I didn't
change anything here.

PAF> 4. I'm not sure it's such a good idea to use the systems locale for the
PAF> tests. This makes the tests nondeterministic, and also, this means that the
PAF> scenario with non-standard locale won't be tested unless someone explicitly
PAF> runs the test on a system with different locale. It would be better to just
PAF> explicitly set the locale to something that uses comma for just one test
PAF> and then revert it to "C" (ie run the number parsing test twice, once in
PAF> "C" and once in some other locale)

Unfortunately there is no portable way to set locale in C. We could try
using "fr_FR" under Unix and "French" under Windows, probably, but this is
not ideal as French locale might not be available while some other locale
using decimal comma (e.g. German) could be. Also, having this setlocale()
call is really convenient for testing because it allows you to easily run
the tests under any locale by just changing LC_ALL environment variable.

Regards,
VZ

#define SOCI_PRIVATE_SOCI_DTOCSTR_H_INCLUDED

#include "soci/error.h"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add #include "soci/soci-platform.h" to provide snprintf

@mloskot
Copy link
Contributor

mloskot commented Apr 9, 2014

On 5 April 2014 21:58, Pawel Aleksander Fedorynski <notifications@github.com wrote:

  1. Is cstring_to_double() being tested for cases when input contains the
    literal strings like "INFINITY" or "NAN"? Is it doing the right thing in
    such cases?

Shall we add killer test for those two functions, something based on data
like in http://code.google.com/p/v8/source/browse/trunk/test/cctest/test-strtod.cc

  1. I'm not sure it's such a good idea to use the systems locale for the
    tests. This makes the tests nondeterministic, and also, this means that
    the
    scenario with non-standard locale won't be tested unless someone
    explicitly
    runs the test on a system with different locale. It would be better to
    just
    explicitly set the locale to something that uses comma for just one test
    and then revert it to "C" (ie run the number parsing test twice, once in
    "C" and once in some other locale)

I agree. I have been testing these two functions myself with setlocale and this idea seems working well.

Mateusz: can you review cstring_to_double() and double_to_cstring() just
to make sure VZ and I haven't missed some corner case?

Both functions look solid, I'd just add ERANGE check (see my inline code
comment). Minor issue, for Visual C++ users strcpy_s and snprintf_s could be
preferred, or compile warning disabled.

Thanks for great work!

@mloskot
Copy link
Contributor

mloskot commented Apr 9, 2014

BTW, a side note, I noticed perhaps related issue in MySQL utility parse_num: is it expected to succeed on parsing with conversion:

int a;
parse_num("1.0", a); 

The istringstream succeeds to consume the integral par, but because the failbit is set too, the function throws. Not sure what is expected behaviour.

@vadz
Copy link
Member Author

vadz commented Apr 9, 2014

I'd say it's exactly right, "1.0" is a floating point number and should fail to parse as an int.

@vadz
Copy link
Member Author

vadz commented Apr 21, 2014

I'd like to return to this: is it ok to merge in the current state, possibly with the improvements addressing the issues mentioned (e.g. ERANGE checking) coming later?

IMHO this change at the very least doesn't make things worse, so I'd like to commit it, even if it's not ideal yet. Any objections?

@pfedor
Copy link
Member

pfedor commented Apr 21, 2014

I would prefer for the ERANGE to be present, but I'm fine with submitting as is and adding it later. Mateusz makes the final call.

BTW here's the reason why the MySQL backend checks explicitly for infinity and NaN: http://sourceforge.net/p/soci/mailman/soci-users/thread/CAE7Ac0ADBFO-qv6dshnO5RDv1syKT-qmrgFyf6Z-Z1ffCh5hmQ%40mail.gmail.com/

Thanks,

Aleksander

@mloskot
Copy link
Contributor

mloskot commented Apr 22, 2014

No objections. I agree with @pfedor the ERANGE check would be good to have though.

Once merged, I will see if I can incorporate it into the 3.x line.

p.s. Please, don't rely on me as the final caller and let's keep things democratic ;)

@vadz
Copy link
Member Author

vadz commented Apr 22, 2014

I've finally pushed it as is because I am not sure I'm going to have time to work further on this (I have more important, IMHO, changes to SOCI in the pipeline as well) and the rest is really vastly less critical than the fact that SOCI plain out doesn't work right now in the locales using decimal comma.

For the record, ideally the following would need to be done:

  1. Move MySQL-specific test4() test using test_num() in the common code, there is really no reason to use it for MySQL only.
  2. Move is_infinity_or_nan() helper from src/backends/mysql/common.h to include/private/soci-cstrtod.h and use it there.
  3. Possibly add the check for ERANGE or maybe replace the above helper with this check, not really sure here.
  4. Ensure that the number tests do pass for all backends.

@vadz vadz closed this Apr 22, 2014
@vadz vadz deleted the dtostr branch April 22, 2014 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants