Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bugfix] Fix encoding issue for Windows #13426

Closed
wants to merge 13 commits into
base: master
from

Conversation

Projects
None yet
10 participants
@ken2812221
Copy link
Member

ken2812221 commented Jun 10, 2018

Fix #13103
Fix #13754

From #13107 (comment), @laanwj suggested that we should keep our internal strings to be always utf-8 encoded.

This PR adds two C++17 compatible macros: u8string and u8path. The reason that I use macros is that developers might not want to pass second parameter utf8 everytime when they call string or create path objects.

I tried to find all string methods and convert them to u8string except for external api calls. Also tried to convert explicit or implicit path to u8path. Tell me if you find something that I missed.

Required: bitcoin-core/leveldb#18

See #13787 for travis run result

Caution: The user must change their config file to be utf-8 encoded if the file contains any non-ascii characters in it. Before 0.18, they are read as ansi-encoded characters on Windows.

@ken2812221

This comment has been minimized.

Copy link
Member Author

ken2812221 commented Jun 11, 2018

Can we use path.imbued.locale instead for this?

We still need to use ansi string when we call bdb and leveldb api, so I think that adding a new function is needed. Also, it needs to add boost::locale dependency in order to use boost locale generator.

@ken2812221 ken2812221 force-pushed the ken2812221:u8path_u8string branch 3 times, most recently Jun 11, 2018

@laanwj laanwj added the Bug label Jun 11, 2018

@ken2812221 ken2812221 force-pushed the ken2812221:u8path_u8string branch 4 times, most recently Jun 12, 2018

@ken2812221 ken2812221 changed the title [WIP, bugfix] Add u8path and u8string to boost to fix #13103 [bugfix] Add u8path and u8string to boost to fix #13103 Jun 12, 2018

@fanquake fanquake requested a review from theuni Jun 12, 2018

@ken2812221 ken2812221 force-pushed the ken2812221:u8path_u8string branch Jun 12, 2018

@theuni

This comment has been minimized.

Copy link
Member

theuni commented Jun 12, 2018

I'm not really comfortable patching boost to make this work. Firstly because it would mean that only depends-builds can build for Windows, but also because I'm just not familiar with it.

Maybe start by upstreaming it and see where the discussion goes?

@ken2812221

This comment has been minimized.

Copy link
Member Author

ken2812221 commented Jun 12, 2018

Firstly because it would mean that only depends-builds can build for Windows

Is there any way to build Bitcoin Core for Windows without depends-builds?
Also, I created a PR boostorg/filesystem#75

@achow101

This comment has been minimized.

Copy link
Member

achow101 commented Jun 12, 2018

Is there any way to build Bitcoin Core for Windows without depends-builds?

Supposedly there is a way to build with MSVC and Visual Studio which doesn't use the depends system.

@ken2812221 ken2812221 force-pushed the ken2812221:u8path_u8string branch 3 times, most recently Jun 13, 2018

@ken2812221

This comment has been minimized.

Copy link
Member Author

ken2812221 commented Jun 13, 2018

I found a way not to patch boost library, it should work with msvc now.

@ken2812221 ken2812221 force-pushed the ken2812221:u8path_u8string branch 3 times, most recently Jun 13, 2018

@ken2812221 ken2812221 changed the title [bugfix] Add u8path and u8string to boost to fix #13103 [bugfix] Add u8path and u8string to fix #13103 Jun 13, 2018

@ken2812221 ken2812221 force-pushed the ken2812221:u8path_u8string branch Jun 13, 2018

@DrahtBot

This comment has been minimized.

Copy link
Contributor

DrahtBot commented Jun 14, 2018

Note to reviewers: This pull request conflicts with the following ones:
  • #13878 (utils: Add fstream wrapper to allow to pass unicode filename on Windows by ken2812221)
  • #13877 (utils: Make fs::path::string() always return utf-8 string on Windows by ken2812221)
  • #13866 (utils: Use _wfopen and _wreopen on Windows by ken2812221)
  • #13864 (validation: Document where we are intentionally ignoring bool return values from validation related functions by practicalswift)
  • #13862 (utils: drop boost::interprocess::file_lock by ken2812221)
  • #13846 (Move src/tinyformat.h to src/tinyformat/tinyformat.h by Empact)
  • #13845 (Include tinyformat as a subtree by Empact)
  • #13815 (util: Add [[nodiscard]] to all {Decode,Parse}... functions returning bool by practicalswift)
  • #13761 (Docs: Create default bitcoin.conf file on startup by leishman)
  • #13756 (wallet: -avoidreuse feature for improved privacy by kallewoof)
  • #13751 (Utils and libraries: Drops the boost/algorithm/string/split.hpp dependency by 251Labs)
  • #13746 (-masterdatadir for datadir bootstrapping by kallewoof)
  • #13734 (gui: Drop boost::scoped_array and use wchar_t API explicitly on Windows by ken2812221)
  • #13723 (PSBT key path cleanups by sipa)
  • #13716 (bitcoin-cli: -stdinwalletpassphrase and non-echo stdin passwords by kallewoof)
  • #13671 (Remove the boost/algorithm/string/case_conv.hpp dependency by 251Labs)
  • #13667 (wallet: Fix backupwallet for multiwallets by domob1812)
  • #13639 ([refactor] Fix the chainparamsbase -> util circular dependency by Empact)
  • #13621 (Check for datadir after the config files were read by Flowdalic)
  • #13389 (Utils and libraries: Fix #13371 - move umask operation earlier in AppInit() by n2yen)
  • #13249 (Make objects in range declarations immutable by default. Avoid unnecessary copying of objects in range declarations. by practicalswift)
  • #13159 (Don't close old debug log file handle prematurely when trying to re-open (on SIGHUP) by practicalswift)
  • #13100 (gui: Add dynamic wallets support by promag)
  • #13088 (Log early messages with -printtoconsole by ajtowns)
  • #12783 (macOS: Disable AppNap by krab)
  • #11911 (Free CDBEnv instances when not in use by ryanofsky)
  • #11625 (WIP: Add BitcoinApplication & RPCConsole tests by ryanofsky)
  • #10973 (Refactor: separate wallet from node by ryanofsky)
  • #10102 ([experimental] Multiprocess bitcoin by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@@ -85,6 +85,8 @@ const int64_t nStartupTime = GetTime();
const char * const BITCOIN_CONF_FILENAME = "bitcoin.conf";
const char * const BITCOIN_PID_FILENAME = "bitcoind.pid";

fs::detail::utf8_codecvt_facet g_utf8;

This comment has been minimized.

@Empact

Empact Jun 15, 2018

Member

static?

This comment has been minimized.

@ken2812221

ken2812221 Jun 15, 2018

Author Member

g_utf8 should be accessible globally.

This comment has been minimized.

@Empact

Empact Jun 16, 2018

Member

Looks like the only use is in util.h, if you implement those as methods in util.cpp, declared in util.h, you can limit variable access to here util.cpp. I may be wrong though! Reviewing on a cellphone. :P

extern fs::detail::utf8_codecvt_facet g_utf8;

#define u8string() string(g_utf8)
#define u8path(str) path(str, g_utf8)

This comment has been minimized.

@Empact

Empact Jun 15, 2018

Member

Is it better to do these as methods in the namespace? Preprocessor use should be minimized IMO.

This comment has been minimized.

@ken2812221

ken2812221 Jun 15, 2018

Author Member

I just thought that pass the second parameter at every .string() and path() is pretty annoying and easy to forget. If most developers don't want to use macro function, I could do a scripted-diff.

This comment has been minimized.

@Empact

Empact Jun 16, 2018

Member

I mean define these macros as functions in the fs namespace that call through as defined here.

This comment has been minimized.

@ken2812221

ken2812221 Jun 16, 2018

Author Member

We could include u8path into fs namespace, however for u8string we couldn't since it's a member function.

This comment has been minimized.

@Empact

Empact Jun 16, 2018

Member

I'll defer to others as to whether this is worthwhile, but absent other options, I would make a function ala u8path for string as well, and hide g_utf8

@Empact

This comment has been minimized.

Copy link
Member

Empact commented Aug 3, 2018

If you call CommandLineToArgvW within gArgs.ParseParameters instead, you can avoid touching src/bitcoind.cpp and src/bench/bench_bitcoin.cpp, at least. There may be other opportunities for minimization.

@ken2812221

This comment has been minimized.

Copy link
Member Author

ken2812221 commented Aug 3, 2018

If you call CommandLineToArgvW within gArgs.ParseParameters

It would break the ParseParameters in tests if we do that.

@ken2812221

This comment has been minimized.

Copy link
Member Author

ken2812221 commented Aug 5, 2018

I'll seperate these into many PRs.

@ken2812221 ken2812221 closed this Aug 5, 2018

@ken2812221 ken2812221 deleted the ken2812221:u8path_u8string branch Aug 6, 2018

laanwj added a commit that referenced this pull request Aug 29, 2018

Merge #13862: utils: drop boost::interprocess::file_lock
1661a47 add unicode compatible file_lock for Windows (Chun Kuan Lee)

Pull request description:

  boost::interprocess::file_lock cannot open the files that contain characters which cannot be parsed by the user's code page on Windows.

  This PR is seperated from #13426 for easier review.

Tree-SHA512: e240479cda65958bf6e1319840b83928b2b50da81d99f4f002fb3b62621370bcd4bcfacd2b8c0678c443a650d6ba53d9d12618b591e5bfd67ac14388a18fd822
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.