Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

qt: Kill the locale dependency bug class by not allowing Qt to mess with LC_NUMERIC #18147

Conversation

practicalswift
Copy link
Contributor

Kill the locale dependency bug class by not allowing Qt to mess with LC_NUMERIC.

Context: #18124 (Recommended reading if you are unsure about how C and C++ locales work in C++)

Guaranteed locale settings before this PR:

Program C locale category LC_NUMERIC Global C++ locale (std::locale)
bitcoind C (classic) C (classic)
bitcoin-qt No guarantee - user-specified C (classic)

Guaranteed locale settings after this PR:

Program C locale category LC_NUMERIC Global C++ locale (std::locale)
bitcoind C (classic) C (classic)
bitcoin-qt C (classic) C (classic)

Background:

Perhaps surprisingly Qt runs setlocale(LC_ALL, "") on initialization. This installs the locale specified by the user's LC_ALL (or LC_*) environment variable as the new C locale.

In contrast bitcoind does not opt-in to localization -- no call to setlocale(LC_ALL, "") (or std::locale::global(std::locale(""))) is made and the environment variables LC_* are thus ignored.

This results in an unfortunate situation where the bitcoind is guaranteed to be running with the classic locale ("C") whereas the locale of bitcoin-qt will vary depending on the user's environment variables.

An example: Assuming the environment variable LC_ALL=de_DE then the call std::to_string(1.23) will return "1.230000" in bitcoind but "1,230000" in bitcoin-qt.

From the Qt documentation:

"On Unix/Linux Qt is configured to use the system locale settings by default. This can cause a conflict when using POSIX functions, for instance, when converting between data types such as floats and strings, since the notation may differ between locales. To get around this problem, call the POSIX function setlocale(LC_NUMERIC,"C") right after initializing QApplication, QGuiApplication or QCoreApplication to reset the locale that is used for number formatting to "C"-locale."


C and C++ locales 101:

#include <clocale>
#include <cstdlib>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>

int main(void)
{
    std::cout << "C locale at beginning of main: " << std::string{setlocale(LC_ALL, nullptr)} << "\n";
    std::cout << "C++ locale at beginning of main: " << std::locale{}.name() << "\n\n";
#if OPT_IN_TO_LOCALIZATION_USING_SET_LOCALE
    setlocale(LC_ALL, "");
#endif
#if OPT_IN_TO_LOCALIZATION_USING_STD_LOCALE_GLOBAL
    std::locale::global(std::locale(""));
#endif
    std::ostringstream oss;
    oss << 1000000;
    std::cout << "std::ostringstream oss; oss << 1000000; oss.str() equals " << oss.str() << "\n";
    std::cout << "std::to_string(1.23) equals " << std::to_string(1.23) << "\n\n";
    std::cout << "C locale at end of main: " << std::string{setlocale(LC_ALL, nullptr)} << "\n";
    std::cout << "C++ locale at end of main: " << std::locale{}.name() << "\n\n";
    std::cout << "setlocale(LC_COLLATE,  nullptr) (read-only operation) = " << std::string{setlocale(LC_COLLATE, nullptr)} << "\n";
    std::cout << "setlocale(LC_CTYPE,    nullptr) (read-only operation) = " << std::string{setlocale(LC_CTYPE, nullptr)} << "\n";
    std::cout << "setlocale(LC_MESSAGES, nullptr) (read-only operation) = " << std::string{setlocale(LC_MESSAGES, nullptr)} << "\n";
    std::cout << "setlocale(LC_MONETARY, nullptr) (read-only operation) = " << std::string{setlocale(LC_MONETARY, nullptr)} << "\n";
    std::cout << "setlocale(LC_NUMERIC,  nullptr) (read-only operation) = " << std::string{setlocale(LC_NUMERIC, nullptr)} << "\n";
    std::cout << "setlocale(LC_TIME,     nullptr) (read-only operation) = " << std::string{setlocale(LC_TIME, nullptr)} << "\n";
}

Let's try with LC_ALL=de_DE ./l8n without any opt-in to locale dependence:

$ clang++ -o l8n l8n.cpp
$ LC_ALL=de_DE ./l8n
C locale at beginning of main: C
C++ locale at beginning of main: C

std::ostringstream oss; oss << 1000000; oss.str() equals 1000000
std::to_string(1.23) equals 1.230000

C locale at end of main: C
C++ locale at end of main: C

setlocale(LC_COLLATE,  nullptr) (read-only operation) = C
setlocale(LC_CTYPE,    nullptr) (read-only operation) = C
setlocale(LC_MESSAGES, nullptr) (read-only operation) = C
setlocale(LC_MONETARY, nullptr) (read-only operation) = C
setlocale(LC_NUMERIC,  nullptr) (read-only operation) = C
setlocale(LC_TIME,     nullptr) (read-only operation) = C

Let's try with LC_NUMERIC=de_DE ./l8n without any opt-in to locale dependence:

$ clang++ -o l8n l8n.cpp
$ LC_NUMERIC=de_DE ./l8n
C locale at beginning of main: C
C++ locale at beginning of main: C

std::ostringstream oss; oss << 1000000; oss.str() equals 1000000
std::to_string(1.23) equals 1.230000

C locale at end of main: C
C++ locale at end of main: C

setlocale(LC_COLLATE,  nullptr) (read-only operation) = C
setlocale(LC_CTYPE,    nullptr) (read-only operation) = C
setlocale(LC_MESSAGES, nullptr) (read-only operation) = C
setlocale(LC_MONETARY, nullptr) (read-only operation) = C
setlocale(LC_NUMERIC,  nullptr) (read-only operation) = C
setlocale(LC_TIME,     nullptr) (read-only operation) = C

Let's try LC_ALL=de_DE ./l8n with opt-in to locale dependence using set_locale:

$ clang++ -DOPT_IN_TO_LOCALIZATION_USING_SET_LOCALE -o l8n l8n.cpp
$ LC_ALL=de_DE ./l8n
C locale at beginning of main: C
C++ locale at beginning of main: C

std::ostringstream oss; oss << 1000000; oss.str() equals 1000000
std::to_string(1.23) equals 1,230000

C locale at end of main: de_DE
C++ locale at end of main: C

setlocale(LC_COLLATE,  nullptr) (read-only operation) = de_DE
setlocale(LC_CTYPE,    nullptr) (read-only operation) = de_DE
setlocale(LC_MESSAGES, nullptr) (read-only operation) = de_DE
setlocale(LC_MONETARY, nullptr) (read-only operation) = de_DE
setlocale(LC_NUMERIC,  nullptr) (read-only operation) = de_DE
setlocale(LC_TIME,     nullptr) (read-only operation) = de_DE

Let's try LC_ALL=de_DE ./l8n with opt-in to locale dependence using std::locale::global:

$ clang++ -DOPT_IN_TO_LOCALIZATION_USING_STD_LOCALE_GLOBAL -o l8n l8n.cpp
$ LC_ALL=de_DE ./l8n
C locale at beginning of main: C
C++ locale at beginning of main: C

std::ostringstream oss; oss << 1000000; oss.str() equals 1.000.000
std::to_string(1.23) equals 1,230000

C locale at end of main: de_DE
C++ locale at end of main: de_DE

setlocale(LC_COLLATE,  nullptr) (read-only operation) = de_DE
setlocale(LC_CTYPE,    nullptr) (read-only operation) = de_DE
setlocale(LC_MESSAGES, nullptr) (read-only operation) = de_DE
setlocale(LC_MONETARY, nullptr) (read-only operation) = de_DE
setlocale(LC_NUMERIC,  nullptr) (read-only operation) = de_DE
setlocale(LC_TIME,     nullptr) (read-only operation) = de_DE

@luke-jr
Copy link
Member

luke-jr commented Feb 14, 2020

Concept NACK.

Locale-enabled functions should be using the user's locale. If we don't want that, we should use locale-independent equivalents.

@practicalswift
Copy link
Contributor Author

practicalswift commented Feb 15, 2020

@luke-jr

I'm unable to parse your message.

In bitcoin-qt we have no less than three(!) locales to consider:

  • The C locale (std::string{setlocale(LC_ALL, nullptr)})
  • The global C++ locale (std::locale{}.name())
  • The Qt locale (QLocale::system().name().toStdString())

Which of these are you suggesting should be set according to what the user has specified in his/hers LC_* environment variables (which I assume is what you mean by "should be using the user's locale")?

@luke-jr
Copy link
Member

luke-jr commented Feb 15, 2020

@practicalswift The user-specified locale should affect all locale code, so all 3 should be set correctly...

@practicalswift
Copy link
Contributor Author

@luke-jr Are you aware that it does not work like that today?

@practicalswift
Copy link
Contributor Author

@luke-jr Are you aware that we never set the global C++ locale to the user-specified locale in any of the programs we produce?

@practicalswift
Copy link
Contributor Author

@luke-jr Friendly ping. Could you clarify? :) I'm trying to make sense of the feedback.

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 9, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@laanwj
Copy link
Member

laanwj commented Mar 26, 2020

Locale-enabled functions should be using the user's locale. If we don't want that, we should use locale-independent equivalents.

I agree with Luke-Jr for a change.

We should move to locale-independent functions for locale-independent formatting and functionality (RPC and command line handling and logging and such), not add more and more layers of these kind of increasingly obscure global workarounds to coerce the functions into hopefully being locale-independent.

@practicalswift
Copy link
Contributor Author

@laanwj

I agree that we should use locale-independent functions, but I think you're misunderstanding what I am suggesting.

I think that the implicit setlocale(LC_NUMERIC, "") call we're doing in the current code via Qt is purely unintentional.

AFAICT we rely on QLocale everywhere we want locale dependence, so there is no reason to do setlocale(LC_NUMERIC, "").

From what I can tell this implicit setlocale(LC_NUMERIC, "") via Qt has given us years of locale problems without anyone really noticing that this call is the root cause.

Are you following my reasoning?

If not, can you think of a counter-example where we want the setlocale(LC_NUMERIC, "") call? I.e. where we are not relying on QLocale for the locale dependence we want.

My suggestion is thus that:

  • we should continue to remove calls to locale dependent functions (the whack-a-mole game), and
  • we should not opt-in to locale dependence we don't want (such as setlocale(LC_NUMERIC, "")).

I think we should do both - not either-or :)

What would the downsides be from such an approach? :)

@practicalswift practicalswift force-pushed the kill-locale-dependency-bug-class branch from fd04faa to 26be181 Compare March 26, 2020 21:28
@practicalswift
Copy link
Contributor Author

Rebased :)

@sipa
Copy link
Member

sipa commented Mar 26, 2020

@practicalswift My preference (and I suspect that's what @laanwj means as well) is that in the parts of the codebase that are supposed to be locale-independent, simply no locale-related operations are present (including locale-dependent formatting, or checks of, or setting of, any global locale). That would make those pieces of code maximally reusable.

@practicalswift
Copy link
Contributor Author

@sipa I understand that and I agree with that: I've worked extensively with getting rid of the use of locale-dependent functions in our code base :)

What I still don't understand is how we would be better off by opting in to locale dependence that we do not want.

I don't see how a.) getting rid of usage of locale independent functions and b.) not opting in to locale dependence we do not want would in any way be mutually exclusive :)

To make this more concrete:

Is there a single place in the code base where we want the locale dependency we're now opting in to by the implicit setlocale(LC_NUMERIC, "") call we are making accidentally and indirectly via Qt?

Are we intentionally opting in to locale dependency for the POSIX functions? To me it looks like this is purely by accident. Please correct me if I'm wrong :)

From the Qt documentation:

"On Unix/Linux Qt is configured to use the system locale settings by default. This can cause a conflict when using POSIX functions, for instance, when converting between data types such as floats and strings, since the notation may differ between locales. To get around this problem, call the POSIX function setlocale(LC_NUMERIC,"C") right after initializing QApplication, QGuiApplication or QCoreApplication to reset the locale that is used for number formatting to "C"-locale."

@practicalswift
Copy link
Contributor Author

practicalswift commented Mar 27, 2020

I guess this is like mitigations in general -- an example:

  • a. We want to avoid buffer overflows in our code. We do that by auditing code and fix it as we discover issues.
  • b. If we one day discovered that we were accidentally and globally opted in to a certain subset of buffer overflows via the logic in a third party dependency then we would like to correct that by restoring the default of not opting in to that.

a.) and b.) are clearly not mutually exclusive: we would do b.) immediately and over time do our best to achieve a.) as well.

I fail to see how this case is different :)

This case:

  • a. We want to avoid locale dependency bugs in our code. We do that by auditing code and fix it as we discover issues.
  • b. If we one day discovered that we were accidentally and globally opted in to a certain subset of locale dependency bugs via the logic in a third party dependency then we would like to correct that by restoring the default of not opting in to that.

Help me understand what I'm missing :)

@practicalswift
Copy link
Contributor Author

Closing this issue. Seems like I was unable to gather consensus support for permanently killing off the locale dependency bug class by fixing the root cause (unintended Qt interference with POSIX LC_NUMERIC during Qt initialisation) :)

I guess we'll have to live with locale issues also in the future, but now the root cause is documented at least and this PR can be referenced when we hit locale issues going forward :)

@practicalswift practicalswift deleted the kill-locale-dependency-bug-class branch April 10, 2021 19:40
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants