BIP39: Added ukrainian wordlist #442

Bohdat · 2016-09-05T12:09:29Z

No description provided.

luke-jr · 2016-09-05T20:10:52Z

voisine · 2016-09-13T00:53:11Z

This needs to be NFKD-normalized, which you can do with the following Perl script:

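#!/usr/bin/perl

use strict;
use warnings;
use Unicode::Normalize;    # provides NFKD()
use open qw(:std :utf8);   # treat STDIN, STDOUT and opened files as UTF-8

# Print every input line in NFKD form.
while (<>) {
    print NFKD($_);
}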
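For readers without Perl at hand, the same normalization can be sketched in Python with the standard library's unicodedata module. This is only an illustration and not part of the pull request; the names nfkd and normalize_file are made up for this example.

```python
import unicodedata


def nfkd(text):
    """Return text in Unicode NFKD (compatibility decomposition) form."""
    return unicodedata.normalize("NFKD", text)


def normalize_file(src_path, dst_path):
    """Copy a UTF-8 wordlist line by line, NFKD-normalizing each line.

    Hypothetical helper: both paths are supplied by the caller.
    """
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(nfkd(line))
```

For Ukrainian this matters because letters such as й (U+0439) and ї (U+0457) decompose into a base letter plus a combining mark, so the normalized file differs byte for byte from a precomposed one.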

voisine · 2016-09-13T17:46:48Z

Looks good to me, but I'd like a second Ukrainian speaker to go over the list and verify that it meets the word list criteria before ACKing.

greenaddress · 2016-09-13T19:45:04Z

We reviewed the words (Ukrainian speaker) and they look OK. However, the list doesn't seem to be sorted (run sort on it; export LANG=C first if you don't have it set).

A sorted list allows faster processing (binary search), so we think it is worth doing.
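To illustrate why the sort order matters for lookups, here is a minimal Python sketch of a byte-order sort followed by a binary search. The function names and the sample words are illustrative only and are not taken from the proposed list.

```python
import bisect


def c_locale_sorted(words):
    """Sort words the way `LC_ALL=C sort` orders UTF-8 lines.

    Python compares str by code point, and UTF-8 byte order matches
    code point order, so a plain sorted() gives the same result.
    """
    return sorted(words)


def word_index(sorted_words, word):
    """Return the index of word in a sorted list via binary search, or -1."""
    i = bisect.bisect_left(sorted_words, word)
    if i < len(sorted_words) and sorted_words[i] == word:
        return i
    return -1
```

Note that byte order is not Ukrainian alphabetical order: ї (U+0457) sorts after я (U+044F), for instance. The binary search only works if the list was sorted with the same comparison the lookup uses, which is why forcing the C locale is the safe choice.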

zerko · 2016-09-23T16:50:37Z

Doesn't look like words are identifiable by first four letters.

slush0 · 2016-09-23T21:38:11Z

Script for validating all BIP39 defined rules (like uniqueness of first four letters) is here:
https://github.com/trezor/python-mnemonic/blob/master/test_mnemonic.py

It may need fixes for UTF-8 (or possibly a slight rewrite for Python 3, which handles Unicode much better), but passing these tests is required before a list can be added to the BIP.
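As a rough illustration of the four-letter rule, the following Python sketch groups words by their first four letters and reports any prefix shared by more than one word. It is not the check from test_mnemonic.py; the function name is hypothetical, and counting letters after NFC composition is an assumption made here so that a base letter plus combining mark counts as one letter.

```python
import unicodedata
from collections import defaultdict


def prefix_collisions(words, length=4):
    """Map each shared prefix to the list of words that start with it.

    Assumption: letters are counted after NFC composition, so a
    decomposed letter and its combining mark count as one letter.
    """
    groups = defaultdict(list)
    for word in words:
        composed = unicodedata.normalize("NFC", word)
        groups[composed[:length]].append(word)
    # Keep only prefixes that more than one word shares.
    return {prefix: ws for prefix, ws in groups.items() if len(ws) > 1}
```

An empty result means every word in the input is identifiable by its first four letters under that assumption.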

slush0 · 2016-09-23T22:01:44Z

Okay, I ran test_mnemonic.py (with Python 3, with no problems) and it gave me this list of duplicates: http://pastebin.com/ztBqDT9q

There were some other minor errors as well, but this needs some work.

Access to proposed wordlists not (yet) accepted into the BIP repository is now controlled by the -W option. This option is and will remain undocumented in the manpage and other user documentation. The purpose of this undocumented flag is to permit testing proposed wordlists and generating test vectors, without inducing users to rely on wordlists which may be changed or removed at any time without notice.

Users MUST NOT generate actual wallets based on proposed wordlists; so doing could result in unrecoverable wallets and permanent funds loss.

I am now ready to import three more proposed wordlists I had not hereto seen amongst BIP pull requests:

- Russian, bitcoin/bips#432
- Ukrainian, bitcoin/bips#442
- Czech, bitcoin/bips#493

Wordlist already imported for testing, now hidden behind -W option:

- Indonesian, bitcoin/bips#621 (b2f66ba, special-cased d03ddae)

Further changes hereby made, with due apologies for a non-atomic commit:

- Integrating the -W option necessitated a general cleanup and overhaul of the options-handling code.
- Whilst overhauling the options, I noticed that the documented -P option functionality was broken/nonexistent. Fixed.

Notice: This is hidden behind the -W flag; see 8aaa6f3.

This is not exactly the wordlist proposed in the pull request. The file ukrainian.txt from Bohdat/bips@152fc59 has a bug, in addition to the usual normalization and sorting concerns: A trailing space (0x20) and tab (0x09, '\t') after the word at original index 1393, 1-based line number 1394, and before the newline '\n'.

The problem was first identified by failure of easyseed's extensive internal self-tests, followed by examination of the problem with cmp(1) and hex dumps to diagnose the difference between the wordlist in my source tree, and the wordlist printed on stdout by `easyseed -W -P -l uk`.

The following is edited for line length limits in the git log, but it adequately shows the problem:

$ grep -E '[[:space:]]$' ukrainian.txt | hd
00000000 d0 bf d1 96 d1 81 d0 bd d1 8f 20 09 0a
$ grep -En '[[:space:]]$' ukrainian.txt
1394:пісня <*end of line is here*>

It is fixed with the following command:

$ sed -E -e 's/[[:space:]]+$//' < ukrainian.txt > ukfix1/uk_fixed0.txt

After verification that this command made no other changes, it is normalized and sorted:

$ ls -l ukrainian.txt ukfix1/uk_fixed0.txt
-rw-r--r-- 1 user user 24550 Jan 7 21:26 ukfix1/uk_fixed0.txt
-rw-r--r-- 1 user user 24552 Jan 7 20:31 ukrainian.txt
$ diff -u3 ukrainian.txt ukfix1/uk_fixed0.txt
[...showing only the desired line changed...]
$ uconv -f utf-8 -t utf-8 -x '::nfkd;' < uk_fixed0.txt | \
	LC_ALL=C LANG=C sort -s > uk_fixed1.txt
$ mv -i uk_fixed1.txt ../../easyseed/wordlist/ukrainian.txt
mv: overwrite '../../easyseed/wordlist/ukrainian.txt'? y

(Note with ref to 234c66c: When normalizing and sorting the russian.txt list, I forgot to force the locale for `sort(1)`. I verified that this makes no difference, and the 234c66c russian.txt is correct. It *does* make a very large difference for the Ukrainian wordlist.)

SHA-256 hash for the resulting ukrainian.txt:

612ee29e1fa13dc38c9e1b31c7ef980db8f3c8dd30f1c9377170d1b10e895dc9

nym-zone · 2018-01-08T09:10:56Z

At nym-zone/easyseed@08a05b4, I have created a bugfixed ukrainian.txt which is NFKD-normalized and binary-sorted, and fixes one technical bug.

The ukrainian.txt from Bohdat/bips@152fc59 contains a trailing space (0x20) then tab (0x09, '\t') after the word at original index 1393 (1-based line number 1394), before the newline '\n'. The problem was first identified by failure of easyseed’s extensive internal self-tests, followed by examination with cmp(1) and hex dumps to diagnose the difference between the wordlist in my source tree, and the wordlist printed on stdout by easyseed -W -P -l uk.

The following commands pinpoint the problem:

$ grep -E '[[:space:]]$' ukrainian.txt | hd
00000000  d0 bf d1 96 d1 81 d0 bd  d1 8f 20 09 0a           |.......... ..|
0000000d
$ echo "\"`grep -En '[[:space:]]$' ukrainian.txt`\""
"1394:пісня 	"

(@dabura667, perhaps you may want to add that to your punch-list of technical checks.)
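A check along these lines could look like the following Python sketch, which reports every line that ends in whitespace. It is a hedged illustration rather than an existing tool; the function name is made up, and it works on the raw bytes of the file so that no decoding step can hide the stray characters.

```python
def trailing_whitespace_lines(data):
    """Return (line_number, line_bytes) for each line ending in whitespace.

    data is the raw file content as bytes; line numbers are 1-based.
    bytes.rstrip() strips ASCII whitespace (space, tab, CR, VT, FF),
    roughly what [[:space:]] matches in the C locale.
    """
    hits = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        if line != line.rstrip():
            hits.append((number, line))
    return hits
```

Applied to the original ukrainian.txt, a check like this would be expected to flag the same line that grep found above.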

It is fixed with the following command:

$ sed -E -e 's/[[:space:]]+$//' < ukrainian.txt > ukfix1/uk_fixed0.txt

After verification that this command made no other changes, the list is normalized and sorted:

$ ls -l ukrainian.txt ukfix1/uk_fixed0.txt
-rw-r--r-- 1 user user 24550 Jan  7 21:26 ukfix1/uk_fixed0.txt
-rw-r--r-- 1 user user 24552 Jan  7 20:31 ukrainian.txt
$ diff -u3 ukrainian.txt ukfix1/uk_fixed0.txt
[...showing only the desired line changed...]
$ uconv -f utf-8 -t utf-8 -x '::nfkd;' < uk_fixed0.txt | \
	LC_ALL=C LANG=C sort -s > uk_fixed1.txt
$ mv -i uk_fixed1.txt ../../easyseed/wordlist/ukrainian.txt
mv: overwrite '../../easyseed/wordlist/ukrainian.txt'? y

SHA-256 hash for the resulting ukrainian.txt:

612ee29e1fa13dc38c9e1b31c7ef980db8f3c8dd30f1c9377170d1b10e895dc9
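Anyone who wants to compare a local copy against this hash without sha256sum could use a short Python sketch such as the one below. The function name is illustrative, and the path is whatever file the caller wants to check.

```python
import hashlib


def sha256_hex(path):
    """Return the SHA-256 digest of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can be compared directly with the value quoted above. Any difference in normalization, ordering, or line endings will change the digest.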

These are generated with easyseed and the bip39_vectorgen.sh script from 5f35cd0.

There are vectors in twelve languages: The eight in the BIP repository, and four more with pending proposals. Three of the vectors for proposed languages are for wordlists which I have modified:

- Russian (234c66c, bitcoin/bips#432)
- Ukrainian (08a05b4, bitcoin/bips#442)
- Czech (ba25dfa, bitcoin/bips#493)

The wordlist for Indonesian (bitcoin/bips#621) is unmodified from the proposal.

Ironically, easyseed does not yet self-test itself with these. That will be added in a future release, to verify consistency between builds. For now, I publish these to aid in interoperability testing between implementations.

kittyandrew · 2021-06-14T22:01:51Z

Hello, I had almost started working on my own list for this when I found this PR. Can someone tell me what exactly needs to be fixed, updated, or reviewed here?

Follow-up question: are there any preferences among nouns, verbs, and adjectives? There are at least a few special words that translate to "and", "or", "there", etc.

Edit: In addition, there are many closely related words and different forms of the same word; I am working on those.

Edit 2: I will probably have to create a new PR in the end, because the current contributor is inactive.

luke-jr · 2021-07-02T21:29:08Z

For now, the author(s) of BIP 39 have decided not to accept any further word lists into BIP 39 itself, and encourage adding new ones to the WLIPs repo here: https://github.com/p2w34/wlips

Added ukrainian wordlist

b705eda

luke-jr added the Proposed BIP modification label Sep 5, 2016

Normalized under NFKD

b991ce5

Sorted with sort command

fb81332

Bohdat added 2 commits October 3, 2016 13:04

Fixed to pass verification script

563de78

Replaced some words; reduced number of verbs

152fc59

nym-zone mentioned this pull request Jan 8, 2018

Czech wordlist for BIP0039 #493

Merged

ldz1 mentioned this pull request Nov 14, 2018

Remove uncompleted bip39 wordlists. libbitcoin/libbitcoin-system#1074

Closed

DonaldTsang mentioned this pull request Dec 24, 2018

Binary Lists grempe/diceware#44

Closed

22 tasks

luke-jr closed this Jul 2, 2021
