New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add full UTF-8 support to RPC #7892

Merged
merged 5 commits into from Jun 16, 2016

Conversation

Projects
None yet
6 participants
@laanwj
Member

laanwj commented Apr 16, 2016

(the univalue patch should be taken upstream - the RPC test and doc change should end up here. Posting it here for visibility and because this fixes an ancient issue: #2127. I think this is important to many non-English users)

This adds full UTF-8 support for both on input and output through JSON.

bitcoin-cli usage

Before:

$ src/bitcoin-cli -datadir=/store/tmp/testbtc getnewaddress "рыба"
1HQCE3H87fZ1er1ExLiF5V4a1Kxf46r3J2
$ src/bitcoin-cli -datadir=/store/tmp/testbtc listaccounts
{
...  "\u00c3\u0083\u00c2\u0091\u00c3\u0082\u00c2\u0080\u00c3\u0083\u00c2\u0091\u00c3\u0082\u00c2\u008b\u00c3\u0083\u00c2\u0090\u00c3\u0082\u00c2\u00b1\u00c3\u0083\u00c2\u0090\u00c3\u0082\u00c2\u00b0": 0.00000000
}
$ src/bitcoin-cli -datadir=/store/tmp/testbtc getaccount 1HQCE3H87fZ1er1ExLiF5V4a1Kxf46r3J2
�����±�°

After:

$ src/bitcoin-cli -regtest getaccountaddress "рыба"
mrVZ8GaURJmL3LEUu6ygU2CrUrZ8979iK2
$ src/bitcoin-cli -regtest listaccounts
{
...
  "рыба": 0.00000000
}
$ src/bitcoin-cli -regtest getaccount mrVZ8GaURJmL3LEUu6ygU2CrUrZ8979iK2                                       
рыба

GUI

This also affects the GUI debug console.

Before:

i18n_before

When strings are passed directly (such as for getaccount's return argument) it works fine, but when they go through JSON formatting/parsing, it fails.

After:

i18n_after

RPC test

A test for this new functionality has been added to the wallet.py test.

See the commit message of the first commit for details on what exactly had to be changed in univalue.

@laanwj laanwj added the RPC/REST/ZMQ label Apr 16, 2016

@jonasschnelli

This comment has been minimized.

Show comment
Hide comment
@jonasschnelli

jonasschnelli Apr 16, 2016

Member

Nice!
Concept ACK.

Not sure how we should treat the UniValue changes. Should we try to keep the "upstream" repository bitcoin/univalue in sync?

Member

jonasschnelli commented Apr 16, 2016

Nice!
Concept ACK.

Not sure how we should treat the UniValue changes. Should we try to keep the "upstream" repository bitcoin/univalue in sync?

@MarcoFalke

This comment has been minimized.

Show comment
Hide comment
@MarcoFalke

MarcoFalke Apr 16, 2016

Member

Concept ACK. Tests look good.

"upstream" repository bitcoin/univalue in sync

The patch goes to jgarzik/univalue as bitcoin/univalue is only for bitcoin related patches. (Still in the end we need to merge jgarzik/univalue into bitcoin/univalue because we use this as subtree.)

Member

MarcoFalke commented Apr 16, 2016

Concept ACK. Tests look good.

"upstream" repository bitcoin/univalue in sync

The patch goes to jgarzik/univalue as bitcoin/univalue is only for bitcoin related patches. (Still in the end we need to merge jgarzik/univalue into bitcoin/univalue because we use this as subtree.)

@laanwj laanwj referenced this pull request Apr 18, 2016

Merged

Handle UTF-8 #22

@gmaxwell

This comment has been minimized.

Show comment
Hide comment
@gmaxwell

gmaxwell May 17, 2016

Member

Neat. What happens with characters like unicode direction modifiers? Can you end up with your terminal in a weird state?

Member

gmaxwell commented May 17, 2016

Neat. What happens with characters like unicode direction modifiers? Can you end up with your terminal in a weird state?

@MarcoFalke

View changes

Show outdated Hide outdated qa/rpc-tests/wallet.py
@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj May 18, 2016

Member

Neat. What happens with characters like unicode direction modifiers? Can you end up with your terminal in a weird state?

I don't think that's possible with just unicode?

ANSI escape sequences are a whole different story. In JSON objects they'll be escaped, but when printing strings (say, for help, when there is a single string result) everything is let through. Not sure whether this should be changed, but this would be a change local to bitcoin-cli, not the unicode framework. I'd suggest doing so in a separate pull.

Nothing new here though: It's always been possible to pass any kind of crap characters from the server (just look at my OP) to bitcoin-cli, this just makes the round-trip sane and conform to UTF-8.

Member

laanwj commented May 18, 2016

Neat. What happens with characters like unicode direction modifiers? Can you end up with your terminal in a weird state?

I don't think that's possible with just unicode?

ANSI escape sequences are a whole different story. In JSON objects they'll be escaped, but when printing strings (say, for help, when there is a single string result) everything is let through. Not sure whether this should be changed, but this would be a change local to bitcoin-cli, not the unicode framework. I'd suggest doing so in a separate pull.

Nothing new here though: It's always been possible to pass any kind of crap characters from the server (just look at my OP) to bitcoin-cli, this just makes the round-trip sane and conform to UTF-8.

@dcousens

This comment has been minimized.

Show comment
Hide comment
@dcousens

dcousens May 19, 2016

Contributor

concept ACK, once-over utACK (did not cross-verify UTF-8 impl.).

Could probably use more tests for that JSONUTF8Writer...

Contributor

dcousens commented May 19, 2016

concept ACK, once-over utACK (did not cross-verify UTF-8 impl.).

Could probably use more tests for that JSONUTF8Writer...

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj May 20, 2016

Member

Could probably use more tests for that JSONUTF8Writer...
ing

There are few direct tests for it, but all the other tests that parses JSON with strings in it is testing it.
What would be nice is doing some fuzzing using https://github.com/laanwj/univalue/tree/2015_11_unifuzz and see that everything is accepted is valid utf-8.

Member

laanwj commented May 20, 2016

Could probably use more tests for that JSONUTF8Writer...
ing

There are few direct tests for it, but all the other tests that parses JSON with strings in it is testing it.
What would be nice is doing some fuzzing using https://github.com/laanwj/univalue/tree/2015_11_unifuzz and see that everything is accepted is valid utf-8.

@gmaxwell

This comment has been minimized.

Show comment
Hide comment
@gmaxwell

gmaxwell May 25, 2016

Member

@pstratem Any interest in fuzzing this?

Member

gmaxwell commented May 25, 2016

@pstratem Any interest in fuzzing this?

@sipa

This comment has been minimized.

Show comment
Hide comment
@sipa

sipa May 25, 2016

Member

Needs rebase.

Member

sipa commented May 25, 2016

Needs rebase.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jun 9, 2016

Member

Rebased, done and removed the Python3 TODOs

Member

laanwj commented Jun 9, 2016

Rebased, done and removed the Python3 TODOs

laanwj added some commits Jun 10, 2016

Squashed 'src/univalue/' changes from 2740c4f..f32df99
f32df99 Merge branch '2016_04_unicode' into bitcoin
280b191 Merge remote-tracking branch 'jgarzik/master' into bitcoin
c9a716c Handle UTF-8
bed8dd9 Version 1.0.2.
5e7985a Merge pull request #14 from laanwj/2015_11_escape_plan

git-subtree-dir: src/univalue
git-subtree-split: f32df99
test: add ensure_ascii setting to AuthServiceProxy
Add a setting ensure_ascii to AuthServiceProxy. This setting,
defaulting to True (backwards compatible),
is passed through to json.dumps. If set to False, non-ASCII characters
>0x80 are not escaped. This is useful for testing server
input processing, as well as slightly more bandwidth friendly in case of
heavy unicode usage.
@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jun 10, 2016

Member

Ok:

This should be ready now.

Member

laanwj commented Jun 10, 2016

Ok:

This should be ready now.

@gmaxwell

This comment has been minimized.

Show comment
Hide comment
@gmaxwell

gmaxwell Jun 15, 2016

Member

ACK

Member

gmaxwell commented Jun 15, 2016

ACK

@laanwj laanwj merged commit 7982fce into bitcoin:master Jun 16, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Jun 16, 2016

Merge #7892: Add full UTF-8 support to RPC
7982fce doc: Mention full UTF-8 support in release notes (Wladimir J. van der Laan)
6bbb4ef test: test utf-8 for labels in wallet (Wladimir J. van der Laan)
a406fcb test: add ensure_ascii setting to AuthServiceProxy (Wladimir J. van der Laan)
60ab9b2 Squashed 'src/univalue/' changes from 2740c4f..f32df99 (Wladimir J. van der Laan)
@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jun 16, 2016

Member

It looks like the build system doesn't detect changes to univalue source files and will not automatically rebuild the library: if you get RPC test failures concerning unicode, please build from a clean tree

Member

laanwj commented Jun 16, 2016

It looks like the build system doesn't detect changes to univalue source files and will not automatically rebuild the library: if you get RPC test failures concerning unicode, please build from a clean tree

@MarcoFalke

This comment has been minimized.

Show comment
Hide comment
@MarcoFalke

MarcoFalke Jun 16, 2016

Member
make clean
make distclean

should take care of this, I assume.

Member

MarcoFalke commented Jun 16, 2016

make clean
make distclean

should take care of this, I assume.

@laanwj

This comment has been minimized.

Show comment
Hide comment
@laanwj

laanwj Jun 16, 2016

Member

Yes, make clean should be enough.

Member

laanwj commented Jun 16, 2016

Yes, make clean should be enough.

kyuupichan referenced this pull request in kyuupichan/BitcoinUnlimited Apr 30, 2017

Merge #7892: Add full UTF-8 support to RPC
7982fce doc: Mention full UTF-8 support in release notes (Wladimir J. van der Laan)
6bbb4ef test: test utf-8 for labels in wallet (Wladimir J. van der Laan)
a406fcb test: add ensure_ascii setting to AuthServiceProxy (Wladimir J. van der Laan)
60ab9b2 Squashed 'src/univalue/' changes from 2740c4f..f32df99 (Wladimir J. van der Laan)

codablock added a commit to codablock/dash that referenced this pull request Sep 16, 2017

Bump univalue version
This was probably missed while backporting Bitcoin PR #7892

codablock added a commit to codablock/dash that referenced this pull request Sep 19, 2017

Bump univalue version
This was probably missed while backporting Bitcoin PR #7892

codablock added a commit to codablock/dash that referenced this pull request Dec 27, 2017

Bump univalue version
This was probably missed while backporting Bitcoin PR #7892

codablock added a commit to codablock/dash that referenced this pull request Dec 28, 2017

Bump univalue version
This was probably missed while backporting Bitcoin PR #7892
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment