BIP39: Added Ukrainian wordlist #442
This needs to be NFKD-normalized, which you can do with the following Perl script:
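The Perl script referred to here is not reproduced on this page. As a hypothetical stand-in, the Python sketch below uses the standard library's `unicodedata.normalize("NFKD", ...)` to show what NFKD does to Ukrainian letters: precomposed "й" and "ї" are split into a base letter plus a combining mark, so a normalized file differs byte-for-byte from an unnormalized one. The function name and sample words are illustrative assumptions, not part of any BIP39 tooling.

```python
import unicodedata


def nfkd_normalize_words(words):
    """Return every word in NFKD form (hypothetical stand-in for the Perl script)."""
    return [unicodedata.normalize("NFKD", w) for w in words]


# Hypothetical sample: "гай" and "їжак", written with precomposed letters.
words = ["\u0433\u0430\u0439", "\u0457\u0436\u0430\u043a"]
for before, after in zip(words, nfkd_normalize_words(words)):
    # "й" becomes "и" + U+0306 and "ї" becomes "і" + U+0308,
    # so each of these words gains one code point.
    print(before, len(before), "->", len(after), [f"U+{ord(c):04X}" for c in after])
```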
Looks good to me, but I'd like a second Ukrainian speaker to go over the list and verify that it meets the wordlist criteria before ACKing.
We reviewed the words (as Ukrainian speakers) and they look OK. However, the list doesn't seem to be sorted (run `sort` on it; `export LANG=C` first if you don't have it set). A sorted list allows faster processing (binary search), and we think it is worth doing.
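A minimal sketch of the sorting point, assuming Python 3: `LC_ALL=C sort` compares raw bytes, and for UTF-8 text byte order coincides with Unicode code-point order, so sorting on the encoded bytes (or plain `sorted()`) reproduces it. Note that this puts "і" (U+0456) after "я" (U+044F), unlike Ukrainian alphabetical order, which is why forcing the C locale matters for this list. Once sorted this way, lookups can use binary search via `bisect`. The helper names and sample words are hypothetical.

```python
import bisect


def c_locale_sort(words):
    # LC_ALL=C sort compares raw bytes; for UTF-8, byte order equals
    # code-point order, so this matches Python's default str ordering too.
    return sorted(words, key=lambda w: w.encode("utf-8"))


def index_of(sorted_words, word):
    # Binary search, O(log n). Valid because str comparison (code points)
    # agrees with the byte order used for sorting.
    i = bisect.bisect_left(sorted_words, word)
    if i < len(sorted_words) and sorted_words[i] == word:
        return i
    return -1


# Hypothetical sample: "ігра", "яблуко", "абетка".
sample = ["\u0456\u0433\u0440\u0430", "\u044f\u0431\u043b\u0443\u043a\u043e",
          "\u0430\u0431\u0435\u0442\u043a\u0430"]
ordered = c_locale_sort(sample)
print(ordered, index_of(ordered, "\u044f\u0431\u043b\u0443\u043a\u043e"))
```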
It doesn't look like the words are uniquely identifiable by their first four letters.
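A sketch of the four-letter check, assuming Python 3. One assumption is labeled in the code: letters are counted on the composed (NFC) form, because in an NFKD-normalized file a letter such as "ї" occupies two code points and a naive `w[:4]` would compare fewer than four letters. Whether the official validation script counts letters this way is not stated in this thread. The function name and sample words are hypothetical.

```python
import unicodedata
from collections import defaultdict


def prefix_collisions(words, n=4):
    """Group words sharing the same first n letters; return only groups of 2+."""
    groups = defaultdict(list)
    for w in words:
        # Assumption: count letters on the composed (NFC) form, so "й" or "ї"
        # is one letter even when the list itself is stored in NFKD.
        key = unicodedata.normalize("NFC", w)[:n]
        groups[key].append(w)
    return {prefix: ws for prefix, ws in groups.items() if len(ws) > 1}


# Hypothetical sample: "книга" and "книгар" share the prefix "книг"; "кит" does not.
sample = ["\u043a\u043d\u0438\u0433\u0430", "\u043a\u043d\u0438\u0433\u0430\u0440",
          "\u043a\u0438\u0442"]
print(prefix_collisions(sample))
```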
A script for validating all rules defined by BIP39 (such as uniqueness of the first four letters) is here: Maybe it will need fixes for UTF-8 handling (possibly a slight rewrite for Python 3, which handles UTF-8 much better), but passing these tests is required for inclusion in the BIP.
Okay, I ran test_mnemonic.py (under Python 3, without any problems) and it gave me this list of duplicates: http://pastebin.com/ztBqDT9q There were some other minor errors too, so this needs some work.
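For readers without the validation script at hand, here is a hypothetical Python sketch of a few of the checks raised in this thread (word count, stray whitespace, NFKD normalization, duplicates, C-locale ordering). It is not test_mnemonic.py and does not cover every BIP39 rule; the 2048 default reflects BIP39's fixed wordlist size.

```python
import unicodedata


def check_wordlist(words, expected_len=2048):
    """Return human-readable problems; an empty list means these checks passed.

    Hypothetical subset of the checks discussed in the thread.
    """
    problems = []
    if len(words) != expected_len:
        problems.append(f"expected {expected_len} words, got {len(words)}")
    for lineno, w in enumerate(words, start=1):
        if w != w.strip():
            problems.append(f"line {lineno}: leading/trailing whitespace")
        if unicodedata.normalize("NFKD", w) != w:
            problems.append(f"line {lineno}: not NFKD-normalized")
    if len(set(words)) != len(words):
        problems.append("duplicate words")
    # C-locale (byte) order, as produced by `LC_ALL=C sort`.
    if words != sorted(words, key=lambda w: w.encode("utf-8")):
        problems.append("not sorted in byte (C locale) order")
    return problems


print(check_wordlist(["b", "a \t"], expected_len=2))
```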
Access to proposed wordlists not (yet) accepted into the BIP repository is now controlled by the -W option.

This option is and will remain undocumented in the manpage and other user documentation. The purpose of this undocumented flag is to permit testing proposed wordlists and generating test vectors, without inducing users to rely on wordlists which may be changed or removed at any time without notice. Users MUST NOT generate actual wallets based on proposed wordlists; doing so could result in unrecoverable wallets and permanent loss of funds.

I am now ready to import three more proposed wordlists I had not hitherto seen amongst BIP pull requests:

- Russian, bitcoin/bips#432
- Ukrainian, bitcoin/bips#442
- Czech, bitcoin/bips#493

Wordlist already imported for testing, now hidden behind the -W option:

- Indonesian, bitcoin/bips#621 (b2f66ba, special-cased d03ddae)

Further changes hereby made, with due apologies for a non-atomic commit:

- Integrating the -W option necessitated a general cleanup and overhaul of the options-handling code.
- Whilst overhauling the options, I noticed that the documented -P option functionality was broken/nonexistent. Fixed.
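The gating described in this commit message can be illustrated with a small, hypothetical Python sketch. easyseed's own option-handling code is not shown on this page, so nothing here reflects its implementation: the language codes, set contents, and function names are assumptions (only `-l uk` and `-W` appear in the thread). The one real library detail relied on is that argparse omits an option from `--help` output when it is registered with `help=argparse.SUPPRESS`.

```python
import argparse

# Hypothetical sketch only; easyseed's real option handling is not shown in the thread.
# Language codes and set contents below are illustrative assumptions.
ACCEPTED = {"en", "ja", "es", "zh-Hans", "zh-Hant", "fr", "it", "ko"}  # in the BIP repo
PROPOSED = {"ru", "uk", "cs", "id"}  # pending proposals, testing only


def build_parser():
    parser = argparse.ArgumentParser(prog="wordlist-tool")
    parser.add_argument("-l", dest="lang", default="en", help="wordlist language")
    # help=argparse.SUPPRESS keeps -W out of the --help text (undocumented flag).
    parser.add_argument("-W", dest="allow_proposed", action="store_true",
                        help=argparse.SUPPRESS)
    return parser


def resolve_language(argv):
    """Return the selected language, refusing proposed wordlists unless -W is given."""
    args = build_parser().parse_args(argv)
    if args.lang in ACCEPTED or (args.lang in PROPOSED and args.allow_proposed):
        return args.lang
    raise SystemExit(f"wordlist not available: {args.lang}")


print(resolve_language(["-W", "-l", "uk"]))
```

Keeping the flag out of the help text mirrors the intent stated above: testers can reach proposed lists, but ordinary users are not pointed at them.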
Notice: This is hidden behind the -W flag; see 8aaa6f3.

This is not exactly the wordlist proposed in the pull request. The file ukrainian.txt from Bohdat/bips@152fc59 has a bug, in addition to the usual normalization and sorting concerns: a trailing space (0x20) and tab (0x09, '\t') after the word at original index 1393 (1-based line number 1394), before the newline '\n'.

The problem was first identified by failure of easyseed's extensive internal self-tests, followed by examination with cmp(1) and hex dumps to diagnose the difference between the wordlist in my source tree and the wordlist printed on stdout by `easyseed -W -P -l uk`.

The following is edited for line length limits in the git log, but it adequately shows the problem:

    $ grep -E '[[:space:]]$' ukrainian.txt | hd
    00000000  d0 bf d1 96 d1 81 d0 bd  d1 8f 20 09 0a
    $ grep -En '[[:space:]]$' ukrainian.txt
    1394:пісня <*end of line is here*>

It is fixed with the following command:

    $ sed -E -e 's/[[:space:]]+$//' < ukrainian.txt > ukfix1/uk_fixed0.txt

After verification that this command made no other changes, it is normalized and sorted:

    $ ls -l ukrainian.txt ukfix1/uk_fixed0.txt
    -rw-r--r-- 1 user user 24550 Jan 7 21:26 ukfix1/uk_fixed0.txt
    -rw-r--r-- 1 user user 24552 Jan 7 20:31 ukrainian.txt
    $ diff -u3 ukrainian.txt ukfix1/uk_fixed0.txt
    [...showing only the desired line changed...]
    $ uconv -f utf-8 -t utf-8 -x '::nfkd;' < uk_fixed0.txt | \
        LC_ALL=C LANG=C sort -s > uk_fixed1.txt
    $ mv -i uk_fixed1.txt ../../easyseed/wordlist/ukrainian.txt
    mv: overwrite '../../easyseed/wordlist/ukrainian.txt'? y

(Note with reference to 234c66c: when normalizing and sorting the russian.txt list, I forgot to force the locale for `sort(1)`. I verified that this makes no difference, and the 234c66c russian.txt is correct. It *does* make a very large difference for the Ukrainian wordlist.)

SHA-256 hash for the resulting ukrainian.txt:
612ee29e1fa13dc38c9e1b31c7ef980db8f3c8dd30f1c9377170d1b10e895dc9
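The shell pipeline in this commit message can be mirrored in Python by anyone reproducing the fix without `uconv`. This is a sketch under stated assumptions: `str.rstrip()` removes all Unicode whitespace rather than only the ASCII `[[:space:]]` class, and blank lines are dropped, which the sed/sort pipeline would not do. The helper names and sample text are hypothetical; the sample includes the same word пісня followed by a space and a tab, matching the hex dump above.

```python
import hashlib
import unicodedata


def find_trailing_whitespace(lines):
    """Like `grep -En '[[:space:]]$'`: 1-based numbers of lines ending in whitespace."""
    return [n for n, line in enumerate(lines, start=1) if line != line.rstrip()]


def fix_wordlist(text):
    """Strip trailing whitespace, NFKD-normalize, byte-sort; return the new file text.

    Assumption: blank lines are dropped (the sed/sort pipeline would keep them).
    """
    words = [line.rstrip() for line in text.split("\n") if line.strip()]
    words = [unicodedata.normalize("NFKD", w) for w in words]
    words.sort(key=lambda w: w.encode("utf-8"))  # same order as LC_ALL=C sort
    return "\n".join(words) + "\n"


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical sample: "абетка", "яблуко", then "пісня" + space + tab.
sample = ("\u0430\u0431\u0435\u0442\u043a\u0430\n"
          "\u044f\u0431\u043b\u0443\u043a\u043e\n"
          "\u043f\u0456\u0441\u043d\u044f \t\n")
print(find_trailing_whitespace(sample.split("\n")))
print(sha256_hex(fix_wordlist(sample)))
```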
At nym-zone/easyseed@08a05b4, I have created a bugfixed Ukrainian wordlist. The following commands pinpoint the problem:
(@dabura667, you may want to add this to your punch-list of technical checks.) It is fixed with the following command:
After verification that this command made no other changes, the list is normalized and sorted:
SHA-256 hash for the resulting
These are generated with easyseed and the bip39_vectorgen.sh script from 5f35cd0. There are vectors in twelve languages: the eight in the BIP repository, and four more with pending proposals.

Three of the vectors for proposed languages are for wordlists which I have modified:

- Russian (234c66c, bitcoin/bips#432)
- Ukrainian (08a05b4, bitcoin/bips#442)
- Czech (ba25dfa, bitcoin/bips#493)

The wordlist for Indonesian (bitcoin/bips#621) is unmodified from the proposal.

Ironically, easyseed does not yet self-test with these. That will be added in a future release, to verify consistency between builds. For now, I publish these to aid interoperability testing between implementations.
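Test vectors in a new language exercise BIP39's mnemonic-to-seed step, where NFKD normalization directly affects the result. Below is a minimal Python sketch of that step as specified in BIP39: the NFKD-normalized mnemonic is the PBKDF2-HMAC-SHA512 password, the salt is "mnemonic" followed by the NFKD-normalized passphrase, with 2048 iterations and a 64-byte output. It does not validate the checksum or the wordlist, and the sample phrase is hypothetical, not a valid mnemonic.

```python
import hashlib
import unicodedata


def bip39_seed(mnemonic, passphrase=""):
    """BIP39 mnemonic-to-seed: PBKDF2-HMAC-SHA512, 2048 iterations, 64-byte seed.

    Checksum and wordlist membership are NOT checked here.
    """
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = ("mnemonic" + unicodedata.normalize("NFKD", passphrase)).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048)  # dklen defaults to 64


# Hypothetical phrase "гай їжак" typed with precomposed (NFC) letters.
phrase = "\u0433\u0430\u0439 \u0457\u0436\u0430\u043a"
print(bip39_seed(phrase).hex())
```

Because normalization happens inside the derivation, an implementation that skips it would produce a different seed for any phrase containing "й" or "ї", which is exactly what cross-implementation vectors are meant to catch.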
Hello, I had almost started working on my own list for this when I found this PR. Can someone tell me what exactly needs to be fixed, updated, or reviewed here? Follow-up question: are there any preferences among nouns, verbs, and adjectives? There are (at least) a few special words that translate to "and", "or", "there", etc.
Edit: In addition, there are many closely related words and different forms of the same word; I'm working on those.
Edit 2: I will probably have to create a new PR in the end, because the current contributor is inactive.