Skip to content
This repository has been archived by the owner on Apr 3, 2019. It is now read-only.

Add NFKD normalization + Fix for Japanese #27

Merged
merged 5 commits into from Jul 13, 2015
Merged

Conversation

dabura667
Copy link
Contributor

This library was not using normalization at all, which for Spanish and Japanese would have produced invalid seeds. (So anyone using this library to generate wallets from Spanish or Japanese phrases must use the old non-normalizing version to recover their funds. I doubt anyone is there... but a warning might be necessary? or maybe create a new function for generating non-normalized seed?)

Chinese and English were ok, as their wordlists were pre-normalized (so unorm.nfkd(words) == words) so they should be fine as is.


Also, I added in one change for Japanese, mentioned on the BIP39:
Japanese must be shown to the user being separated by an ideographic space. This is crucial to ensure users don't accidentally view 2 words as 1 word.

Ex.

"a part" // this is a normal ASCII space
"a part" // this is an ideographic space

It doesn't seem that necessary when letters are so small, but looking at Japanese.

"あさ ごはん" // this is a normal ASCII space
"あさ ごはん" // this is an ideographic space

It makes a huge difference, and the latter must be shown to the user.

Also notice that ideographic space will be replaced by ASCII space when NFKD normalized, so while the words themselves are NFKD in the wordlist, because Japanese requires non-NFKD ideographic spaces for the "phrase" string, I have placed a catch-all NFKD in the call to pbkdf2 around this.phrase

Users in Japan will also likely input the phrase using ideographic spaces to input it, so I NFKD the mnemonic input to outward facing functions.

@dabura667
Copy link
Contributor Author

My main motivation for this fix is because it seems like Copay might use BIP39 in backups some way, and I was checking to see the ideographic space usage and luckily I did, as Spanish would have been generating bad wallets (non-BIP39-standard).

Also note that Japanese phrases are unique in that ALL Japanese characters are "breakable" and thus textwrap will break a word in the middle on a line break. Care must be taken to ensure a word is not broken on the line break and shown to the user.

Reference: (breadwallet is still trying to get it right... it is difficult)
voisine/breadwallet-ios#231
voisine/breadwallet-ios@dd1bfae

@dabura667
Copy link
Contributor Author

Wait a sec, I will add Japanese test vectors.

@dabura667 dabura667 closed this Jul 12, 2015
@dabura667
Copy link
Contributor Author

I tried to alter the test vectors to also use a Japanese passphrase with pbkdf2 instead of just "TREZOR" so that it could test normalizing of passphrase as well.

I realized how long it will take, and I don't have enough time, so I will just submit this PR as is.

Here are the Japanese test vectors I have prepared:
non-normalized strings: https://raw.githubusercontent.com/bip32JP/bip32JP.github.io/6f6090b49bb718711904468bce99a73770e09071/test_JP_BIP39.json
normalized (except for spaces) strings: https://raw.githubusercontent.com/bip32JP/bip32JP.github.io/377f72c5087533c34c79ba02335d1fbc5509dfa5/test_JP_BIP39.json

@dabura667 dabura667 reopened this Jul 12, 2015
@matiu
Copy link
Contributor

matiu commented Jul 13, 2015

LGTM, great work.

@pnagurny
Copy link

LGTM

matiu added a commit that referenced this pull request Jul 13, 2015
Add NFKD normalization + Fix for Japanese
@matiu matiu merged commit b7ad160 into bitpay:master Jul 13, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants