Fix two errors in the BIP 39 French wordlist #622

nym-zone · 2018-01-01T05:45:57Z

The BIP 39 French wordlist contains two significant technical errors:

Byte Order Marker (BOM) U+FEFF at the beginning of the first line, preceding the word “abaisser”.
No newline '\n' char terminating the last line, after “zoologie”.

The former may cause user loss of funds. An implementation which generates a mnemonic phrase and also turns it into a BIP 39 seed value may feed the string "<U+FEFF>abaisser" to the KDF, while displaying the word “abaisser” to the user. Of course, it cannot be expected that the user would enter "<U+FEFF>abaisser" upon attempt to restore a wallet. In the face of a buggy wordlist, whitespace handling and normalization cannot be absolutely relied on to remove a notoriously mischievous character. Those who provide technical support may be well advised to ask French users with unrestorable wallets, “Did your mnemonic phrase contain the word ‘abaisser’?”

The latter broke the shell script I use to massage wordlists into C sources when building easyseed.

I know of only one commonplace platform where software regularly prepends UTF-8 files with a spurious U+FEFF, and oftentimes omits a line terminator on the last line even when asked to create a Unix ('\n') text file. It is RECOMMENDED that new wordlists be examined for correctness using standard shell tools on a sane platform.

The BIP 39 wordlist contained two significant technical errors: - Byte Order Marker (BOM) U+FEFF at the beginning of the first line, preceding the word "abaisser". - No newline '\n' char terminating the last line, after "zoologie". The former may cause user loss of funds. An implementation which generates a mnemonic phrase and also turns it into a BIP 39 seed value may feed the string "<U+FEFF>abaisser" to the KDF, while displaying the word "abaisser" to the user. Of course, it cannot be expected that the user would enter "<U+FEFF>abaisser" upon attempt to restore a wallet. In the face of a buggy wordlist, whitespace handling and normalization cannot be absolutely relied on to remove a notoriously mischievous character. Those who provide technical support may be well advised to ask French users with unrestorable wallets, "Did your mnemonic phrase contain the word 'abaisser'?" The latter broke the shell script I use to massage wordlists into C sources when building https://github.com/nym-zone/easyseed . I know of only one commonplace platform where software regularly prepends UTF-8 files with a spurious U+FEFF, and oftentimes omits a line terminator on the last line even when asked to create a Unix ('\n') text file. It is RECOMMENDED that new wordlists be examined for correctness using standard shell tools on a sane platform.

dabura667 · 2018-01-01T06:30:20Z

ACK 50c4f12

I checked all other current wordlists and they all:

Did not have the BOM
Did have a trailing LF line break

I pulled @nym-zone 's branch locally and inspected the file using a hex editor to double check the new french.txt is fixed.

As for @nym-zone's concerns:

The only wallet I know of that shows users French phrases is Copay, and they don't use the files as-is from the BIP.
https://github.com/bitpay/bitcore-mnemonic/blob/master/lib/words/french.js

If there is a wallet that uses the exact file from the BIP that shows french users phrases in French. That warning applies... but I don't think any wallet does so.

nym-zone · 2018-01-01T07:29:50Z

Thanks, @dabura667. I should have clarified, I discovered this when writing a mnemonic phrase generator tool which embeds all eight wordlists currently in the BIP repository. I reported French, because that’s what was broken. Insofar as I could when dealing with multiple languages I do not know, I exercised reasonable care to assure the integrity of all the wordlists.

(Then I wrote an automated battery of runtime tests against the compile-time SHA-256 hashes of the files, in case somebody out there may have buggy tools which mangle UTF-8 when building...)

I just hexdumped Copay’s french.js to double-check; and its string representation is fine as for the ironically self-abasing “abaisser”.

nym-zone · 2018-01-05T21:11:39Z

Checking additional implementations in the wild, I have not (yet?) found any which carry the spurious U+FEFF. But it is not only a matter of Copay.

From a popular implementation, widely used because it runs in a web browser: iancoleman/bip39@3a8dbe9, checking wordlist_french.js:

00000030  20 3a 20 57 4f 52 44 4c  49 53 54 53 3b 0a 57 4f  | : WORDLISTS;.WO|
00000040  52 44 4c 49 53 54 53 5b  22 66 72 65 6e 63 68 22  |RDLISTS["french"|
00000050  5d 20 3d 20 5b 0a 22 61  62 61 69 73 73 65 72 22  |] = [."abaisser"|

From NovaCrypto/BIP39@5ecf568 (see also), checking French.java:

00000520  20 6e 65 77 20 53 74 72  69 6e 67 5b 5d 7b 0a 20  | new String[]{. |
00000530  20 20 20 20 20 20 20 20  20 20 20 22 61 62 61 69  |           "abai|
00000540  73 73 65 72 22 2c 0a 20  20 20 20 20 20 20 20 20  |sser",.         |

I will keep my eye out for other French-supporting BIP 39 implementations.

Notice: This is hidden behind the -W flag; see 8aaa6f3. This is not exactly the wordlist proposed in the pull request. It is the russian.txt from farazdagi/bips@a59cc3e, as modified by approximately the following command: uconv -f utf-8 -t utf-8 -x '::nfkd;' | sort -s The *result* has been confirmed to not have any leading BOM, and to have a final line terminated with '\n' (bitcoin/bips#622). I did not yet examine the source for these issues. The *result* russian.txt SHA-256 hash: a8d7b9d8bdd3816eddd2aeb98718ad586d8e7dd8c364a944c072cdf3cd6bcb05

Notice: This is hidden behind the -W flag; see 8aaa6f3. This is not exactly the wordlist proposed in the pull request. It is the czech.txt from zizelevak/bips@b7f682f, as modified by approximately the following command: uconv -f utf-8 -t utf-8 -x '::nfkd;' < czech.txt | \ LC_ALL=C LANG=C sort -s > normalized/czech.txt The *result* has been confirmed to not have any leading BOM, and to have a final line terminated with '\n' (bitcoin/bips#622). I did not yet examine the source for these issues. I did examine the source to confirm that no lines had any trailing whitespace (see 08a05b4). SHA-256 hash for the *resulting* czech.txt: 195136b3ba0f3099a9df625e0963f4efb56625b91c3a76bc5b4a9466a26880f7

nym-zone · 2018-01-09T05:13:10Z

ping @Kirvx @NicolasDorier @vosine @luke-jr

Bugfix on #152, fbe7196.

This is a simple technical fix which changes no functionality in correct implementations, but can help prevent implementation errors.

Kirvx · 2018-01-09T07:46:27Z

ACK.
Thanks @nym-zone and @dabura667.

@voisine Are breadwallet french users affected?

…t masterpubkey

nym-zone · 2018-01-09T08:25:13Z

Thanks, @Kirvx.

I didn’t mean to imply anything about Breadwallet; I was simply trying to get the attention of persons with significant involvement in #152 and/or maintenance of BIP 39 wordlists and/or BIP maintenance. But, good idea! Now, I checked the only Breadwallet French file I could find on a brief search, BreadWallet/fr.lproj/BIP39Words.plist from breadwallet/breadwallet-legacy@fd14d42; and it is fine:

$ hd -s 173 -n 16 BIP39Words.plist
000000ad  3c 73 74 72 69 6e 67 3e  61 62 61 69 73 73 65 72  |<string>abaisser|
000000bd

(That file is delimited by XML, not newlines; so as long as “zoologie” is there, that’s fine also.)

By the way, thank you for your work creating this list. I strongly urge that BIP 39 should have broad language support; and French is an important language. I can see in #152 how much work was put in to make and refine the list. Too bad, it seems abaisser wanted to abase itself.

Kirvx · 2018-01-09T12:17:33Z

😃

Thank you for checking breadwallet @nym-zone 👍

luke-jr · 2018-01-10T06:05:29Z

@slush0 @prusnak @voisine @ebfull

NicolasDorier · 2018-01-11T02:16:17Z

would be nice to add a test vector. NBitcoin also has french hard coded... I could not find a proper tool on windows (sublime, and vscode don't work for some reason) showing me hidden characters though.

evoskuil · 2018-01-11T02:32:56Z

Try notepad++

nym-zone · 2018-01-11T04:23:31Z

@NicolasDorier:

would be nice to add a test vector.

From the nym-zone/easyseed@c7d698a set of a dozen languages’ test vectors, I give you French test vectors in a convenient JSON format. Please run them with your implementation.

As stated in the preceding commit log from two days ago (nym-zone/easyseed@5f35cd0, q.v.), these vectors are specifically designed to flunk implementations which do not perform proper Unicode NFKD normalization—even with words not containing diacritics (including the English wordlist, too).

(I may rename or modify things; but the versioned link will obviously remain stable.)

nym-zone · 2018-01-11T05:05:13Z

Pertinent to the below, but also as a general point: U+FEFF is not removed or otherwise affected by Unicode NFKD normalization. To shortcut discussion of Unicode character properties, here is an empirical object demonstration with an ICU utility:

$ echo -n $'a\ufeffb' | uconv -x '::nfkd;' | hd
00000000  61 ef bb bf 62                                    |a...b|
00000005

Of course, due to its double-personality as “ZERO WIDTH NO-BREAK SPACE” (ZWNBSP), it’s also zero-width—and therefore invisible on any display which supports Unicode:

$ echo $'a\ufeffb'
ab

(Check, there is an extra character in between there.)

Thus if it gets fed to PBKDF2 for BIP 39 seed generation, and the user is shown the corresponding mnemonic, then the user will write down a phrase which will not restore the wallet. That’s why I panicked when I saw this:

$ head -c16 french.txt | hd
00000000  ef bb bf 61 62 61 69 73  73 65 72 0a 61 62 61 6e  |...abaisser.aban|
00000010

@NicolasDorier, IWordlistSource.cs from MetacoSA/NBitcoin@45a0ad9 does not contain this bug:

$ hd -s 80771 -n 19 IWordlistSource.cs
00013b83  22 66 72 65 6e 63 68 22  2c 20 22 61 62 61 69 73  |"french", "abais|
00013b93  73 65 72                                          |ser|
00013b96

If you used a Windows text editor to somehow copypaste that, I would not expect it to have picked up the U+FEFF. AFAIK, the Windows tools which (unnecessarily) produce a Byte Order Marker for octet-stream UTF-8 tend to not consider it part of the text. But then, I also would not rely on this.

My greater concern is automatically processed text. I know for a fact that the U+FEFF can be slurped up from french.txt, because that’s what happened to me. In my Makefile, I use Unix shell tools to turn the wordlist .txt files into C const char * arrays. The resulting French C array did contain the U+FEFF before I fixed it.

Had I foisted that on users, somebody would have lost funds sooner or later.

(Apologies for the double post. My bad.)

NicolasDorier · 2018-01-13T13:46:53Z

@nym-zone I dodged a bullet.

In .NET, the default encoding is UTF8 BOM: when reading a file it checks BOM, and if present, remove it from the string.

When writing to the file, the default UTF8 BOM bytes is added.

However, UTF8Encoding.GetBytes() does not emit it, even if the UTF8Encoding objects has BOM setting set, for that a call to UTF8Encoding.GetPreamble() is needed.

TL;DR: My implementation is conform... by chance. (Tested against your vector)
Hopefully the test vectors of BIP39 (which I added since beginning) should have caught the bug though.

dabura667 · 2018-01-13T15:32:27Z

@NicolasDorier the test vector generator included with Trezor-mnemonic library has all 0x00 as a vector generator for edge cases which results in all first words except the last.

cornfeedhobo · 2018-07-04T22:38:17Z

Can we get a resolution here? My understanding of these wordlist definitions negates the need for the BOM, and it is not recommended for use.

NicolasDorier · 2018-07-05T11:53:37Z

why this has not been merged?

voisine · 2018-07-05T17:29:26Z

ACK

cornfeedhobo · 2018-08-01T00:24:29Z

@Kirvx Could we get your ACK on this?

Kirvx · 2018-08-01T04:59:22Z

ACK

cornfeedhobo · 2018-08-01T22:17:29Z

@luke-jr Anything else needed to get this merged?

nym-zone mentioned this pull request Jan 5, 2018

French word list and test vectors NovaCrypto/BIP39#4

Merged

nym-zone referenced this pull request in iancoleman/bip39 Jan 5, 2018

jsBip39 and wordlists updated with new languages

3a8dbe9

This was referenced Jan 8, 2018

BIP39: Adds Russian word list #432

Closed

Czech wordlist for BIP0039 #493

Merged

nym-zone referenced this pull request in breadwallet/breadwallet-legacy Jan 9, 2018

localized recovery phrases, user supplied phrases now verified agains…

fd14d42

…t masterpubkey

luke-jr added the Proposed BIP modification label Jan 10, 2018

nym-zone referenced this pull request in MetacoSA/NBitcoin Jan 11, 2018

BIP39 French language

45a0ad9

NicolasDorier added a commit to MetacoSA/NBitcoin that referenced this pull request Jan 13, 2018

Add tests vector of nym-zone (bitcoin/bips#622)

d91fa4a

iancoleman mentioned this pull request Jul 4, 2018

Remove erroneous prefix bits from BIP-0039 french wordlist #697

Closed

luke-jr merged commit 29a8781 into bitcoin:master Aug 9, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix two errors in the BIP 39 French wordlist #622

Fix two errors in the BIP 39 French wordlist #622

nym-zone commented Jan 1, 2018

dabura667 commented Jan 1, 2018

nym-zone commented Jan 1, 2018

nym-zone commented Jan 5, 2018

nym-zone commented Jan 9, 2018

Kirvx commented Jan 9, 2018

nym-zone commented Jan 9, 2018

Kirvx commented Jan 9, 2018

luke-jr commented Jan 10, 2018

NicolasDorier commented Jan 11, 2018

evoskuil commented Jan 11, 2018

nym-zone commented Jan 11, 2018

nym-zone commented Jan 11, 2018

NicolasDorier commented Jan 13, 2018

dabura667 commented Jan 13, 2018

cornfeedhobo commented Jul 4, 2018 •

edited

Loading

NicolasDorier commented Jul 5, 2018

voisine commented Jul 5, 2018

cornfeedhobo commented Aug 1, 2018

Kirvx commented Aug 1, 2018

cornfeedhobo commented Aug 1, 2018

Fix two errors in the BIP 39 French wordlist #622

Fix two errors in the BIP 39 French wordlist #622

Conversation

nym-zone commented Jan 1, 2018

dabura667 commented Jan 1, 2018

nym-zone commented Jan 1, 2018

nym-zone commented Jan 5, 2018

nym-zone commented Jan 9, 2018

Kirvx commented Jan 9, 2018

nym-zone commented Jan 9, 2018

Kirvx commented Jan 9, 2018

luke-jr commented Jan 10, 2018

NicolasDorier commented Jan 11, 2018

evoskuil commented Jan 11, 2018

nym-zone commented Jan 11, 2018

nym-zone commented Jan 11, 2018

NicolasDorier commented Jan 13, 2018

dabura667 commented Jan 13, 2018

cornfeedhobo commented Jul 4, 2018 • edited Loading

NicolasDorier commented Jul 5, 2018

voisine commented Jul 5, 2018

cornfeedhobo commented Aug 1, 2018

Kirvx commented Aug 1, 2018

cornfeedhobo commented Aug 1, 2018

cornfeedhobo commented Jul 4, 2018 •

edited

Loading