BIP39: Adds Russian word list #432

Closed

farazdagi

I've tried to follow the guidelines defined for the other languages.

бушлат
бывать
быль
быть
Contributor

"бывать" and "быть" are too similar imo and I would exclude both of them tbh

@UdjinM6
Contributor

UdjinM6 commented Aug 12, 2016

Also

итак
когда
кроме
кстати
куда
либо
ловко
между
наверх
назад
налево
нигде
никак
нынче
однажды
около
откуда
отнюдь
отсюда
оттого
оттуда
плохо
полтора
помимо
поперек
почему
против
путем
пятеро
пяток
пять
ранее
сбоку
сверху
сегодня
сейчас
сзади
слегка
смело
снизу
снова
совсем
сорок
сразу
также
твой
теперь
тогда
тоже
точно
триста
туго
туда
уйма
целиком
четыре
явно
якобы
ярко
ясно

None of the words above fit the noun/verb/adjective criteria; they should be removed, or the criteria should mention them, imo. There are also some "numeric-like" words like "первый", "тысяча" etc. which I'm not sure about either, but they are probably ok.

@farazdagi
Author

@UdjinM6 Thanks for the comments. I will go through them today and push an updated list to this PR.

@farazdagi
Author

farazdagi commented Aug 14, 2016

I've spent a considerable amount of time manually going through the word list and:

  • applying all suggestions made above (thanks again @UdjinM6)
  • making sure that only nouns/verbs/adjectives are used (mostly nouns)
  • making sure that words are distinct enough from each other (improved Levenshtein distance)

Please review and let me know if there are any issues left.

@UdjinM6
Contributor

UdjinM6 commented Aug 14, 2016

Very nice! IMO the list looks much better now 👍

PS. And btw, thanks for submitting this PR!

@luke-jr
Member

luke-jr commented Aug 14, 2016

@Bohdat

Bohdat commented Sep 5, 2016

Here are some very similar words I have found:
арка арфа
банк танк
бард барс
батон бутон
бинт бунт
бочка точка
брак брат
букет буфет
вахта шахта
весть честь
взвод вывод
взор узор
влияние слияние
волк воля
волк толк
вход уход
глава слава
гном гром
губа гуща
губа шуба
дата хата
день тень
диск риск
дума душа
душа суша
жара фара
задор затор
замок зарок
игла игра
имение умение
кабель кафель
кабель табель
капля цапля
катер шатер
козел котел
койка кошка
конверт концерт
корнет корсет
кубок кусок
куча туча
лента рента
лечение течение
магия мафия
метр мэтр
модель модуль
мост рост
народ наряд
нация рация
нейлон нейрон
нива ниша
нить шить
нога нота
норма форма
нота рота
олень осень
оплата уплата
ответ отчет
паек парк
пакт факт
пальто сальто
певец перец
пена цена
петь путь
петь сеть
пила пища
пила сила
план плац
плита элита
повар товар
пруд труд
пугать ругать
путь суть
река рука
сбруя струя
сеть суть
слон стон
смена стена
сосед сосуд
удав удар
хобот хохот
цинк цирк
чадо чудо
челнок чеснок
штаб штат
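
Every pair listed above has equal length and differs in exactly one letter, so pairs like these can be found mechanically. The following is a minimal sketch of such a check, not the method used to build the list above; the function names and the sample words are chosen for illustration only. It tests for a single substitution (Hamming distance 1), which is a special case of Levenshtein distance 1, so it would miss pairs that differ by an inserted or deleted letter.

```python
from itertools import combinations


def differs_by_one_letter(a: str, b: str) -> bool:
    """True when a and b have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def similar_pairs(words):
    """Return pairs of distinct words that are one substitution apart."""
    pairs = []
    # Sorting makes the output order deterministic; a 2048-word list gives
    # about two million comparisons, which is still quick.
    for a, b in combinations(sorted(set(words)), 2):
        if differs_by_one_letter(a, b):
            pairs.append((a, b))
    return pairs


# A few words taken from the pairs reported above, plus one unrelated word.
sample = ["арка", "арфа", "банк", "танк", "волк", "воля", "толк", "агат"]
for a, b in similar_pairs(sample):
    print(a, b)
```

Running the same function over the full russian.txt (one word per line) should list every remaining one-letter collision, which could then be reviewed by hand.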

@voisine
Contributor

voisine commented Sep 13, 2016

this needs to be NFKD normalized, which you can do with the following perl script:

#!/usr/bin/perl

use Unicode::Normalize;
use strict;
use warnings;
use open qw(:std :utf8);

while (<>) {
    print NFKD("$_");
}

@greenaddress
Contributor

reviewed the words - looked OK. The list of words is also sorted so that's great.

@dabura667

NFKD normalization needed.

Be sure to re-sort after normalization.

The Japanese list forgot to do so :-( (oops!)

@jonathancross
Contributor

Ping @farazdagi – Seems this still needs to be normalized?

@Sjors
Member

Sjors commented Jun 30, 2017

A general observation about adding more languages to BIP 39 is that English now has broad wallet support. If a new language is only supported by a small number of wallets, this could lead to (unintended) vendor lock-in.

If someone writes down their mnemonic and puts it in a vault, they should be able to take it out 50 years later and have a reasonable chance of finding software that can still import it.

Perhaps getting BIP 39 (or something similar) recognized as an ISO standard would be a good step towards durability, before adding more languages.

агат
агент
агрегат
адажио

адажио is quite a rare word; is it okay to use it here?

nym-zone added a commit to nym-zone/easyseed that referenced this pull request Jan 7, 2018
Access to proposed wordlists not (yet) accepted into the BIP repository
is now controlled by the -W option.  This option is and will remain
undocumented in the manpage and other user documentation.

The purpose of this undocumented flag is to permit testing proposed
wordlists and generating test vectors, without inducing users to rely on
wordlists which may be changed or removed at any time without notice.
Users MUST NOT generate actual wallets based on proposed wordlists; so
doing could result in unrecoverable wallets and permanent funds loss.

I am now ready to import three more proposed wordlists I had not hereto
seen amongst BIP pull requests:

 - Russian, bitcoin/bips#432
 - Ukrainian, bitcoin/bips#442
 - Czech, bitcoin/bips#493

Wordlist already imported for testing, now hidden behind -W option:

 - Indonesian, bitcoin/bips#621 (b2f66ba, special-cased d03ddae)

Further changes hereby made, with due apologies for a non-atomic commit:

 - Integrating the -W option necessitated a general cleanup and overhaul
   of the options-handling code.

 - Whilst overhauling the options, I noticed that the documented -P
   option functionality was broken/nonexistent.  Fixed.

nym-zone added a commit to nym-zone/easyseed that referenced this pull request Jan 7, 2018
Notice:  This is hidden behind the -W flag; see 8aaa6f3.

This is not exactly the wordlist proposed in the pull request.  It is
the russian.txt from farazdagi/bips@a59cc3e, as modified by
approximately the following command:

	uconv -f utf-8 -t utf-8 -x '::nfkd;' | sort -s

The *result* has been confirmed to not have any leading BOM, and to have
a final line terminated with '\n' (bitcoin/bips#622).  I did not yet
examine the source for these issues.

The *result* russian.txt SHA-256 hash:
a8d7b9d8bdd3816eddd2aeb98718ad586d8e7dd8c364a944c072cdf3cd6bcb05

@dabura667

dabura667 commented Jan 8, 2018

@Sjors BIP39 states

The conversion of the mnemonic sentence to a binary seed is completely independent from generating the sentence. This results in rather simple code; there are no constraints on sentence structure and clients are free to implement their own wordlists

And

software must compute a checksum for the mnemonic sentence using a wordlist and issue a warning if it is invalid.

Which means: "If you can't verify the checksum (or don't know the wordlist), show a warning, but ALLOW THE SEED TO BE GENERATED."

But almost every single wallet used its "developer common sense", which states: "if a checksum exists, always check it, and always fail loudly and stop everything"... which makes sense.

The fault lies with BIP39, which was written in a way that contradicts developer common sense.

But to be honest, Electrum supports all BIP39 wordlists, because it actually follows the BIP: if it doesn't recognize the wordlist, it shows a warning but generates the wallet anyway. I have recovered many wallets using Electrum.

Ironically, Electrum's developer pointed out this contradiction, the authors ignored it, Thomas asked to have his name removed because of this and other problems, and now Electrum is the only wallet that implements BIP39 correctly in this aspect.
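
To make the "warn but still generate" reading concrete, here is a rough sketch in Python. The seed derivation follows BIP39 as published (PBKDF2-HMAC-SHA512, 2048 iterations, salt "mnemonic" plus the passphrase, everything NFKD-normalized) and never consults a wordlist. The warn-only policy in `recover_seed` illustrates the interpretation described above; it is not Electrum's code. All function names and the `placeholder` wordlist are hypothetical.

```python
import hashlib
import unicodedata
from typing import Optional, Sequence, Tuple


def _nfkd(s: str) -> str:
    return unicodedata.normalize("NFKD", s)


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """BIP39 seed derivation; no wordlist is needed, any sentence gives 64 bytes."""
    return hashlib.pbkdf2_hmac(
        "sha512",
        _nfkd(mnemonic).encode("utf-8"),
        ("mnemonic" + _nfkd(passphrase)).encode("utf-8"),
        2048,
    )


def checksum_status(mnemonic: str, wordlist: Sequence[str]) -> Optional[bool]:
    """True/False when the checksum can be checked, None when it cannot."""
    index = {_nfkd(w): i for i, w in enumerate(wordlist)}
    words = _nfkd(mnemonic).split()
    if not words or len(words) % 3 != 0 or any(w not in index for w in words):
        return None  # unknown word or unexpected length: nothing to verify
    bits = "".join(format(index[w], "011b") for w in words)  # 11 bits per word
    cs_len = len(bits) // 33  # one checksum bit per 32 entropy bits
    entropy_bits, checksum_bits = bits[:-cs_len], bits[-cs_len:]
    entropy = int(entropy_bits, 2).to_bytes(len(entropy_bits) // 8, "big")
    digest = hashlib.sha256(entropy).digest()
    digest_bits = format(int.from_bytes(digest, "big"), "0256b")
    return digest_bits[:cs_len] == checksum_bits


def recover_seed(
    mnemonic: str, wordlist: Sequence[str], passphrase: str = ""
) -> Tuple[bytes, Optional[str]]:
    """Always return a seed; report checksum trouble as a warning string."""
    status = checksum_status(mnemonic, wordlist)
    warning = None
    if status is None:
        warning = "unknown wordlist or word count; checksum not verified"
    elif status is False:
        warning = "checksum mismatch"
    return mnemonic_to_seed(mnemonic, passphrase), warning


# Placeholder 2048-entry list standing in for a real wordlist (assumption).
placeholder = [f"w{i:04d}" for i in range(2048)]
seed, warning = recover_seed("бушлат быль агат", placeholder)
print(len(seed), warning)
```

A wallet that prefers to fail loudly would raise an error wherever `warning` is set; the seed derivation itself is identical either way.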

@nym-zone
Contributor

nym-zone commented Jan 8, 2018

At nym-zone/easyseed@234c66c, I have created a Unicode NFKD-normalized and binary-sorted russian.txt from farazdagi/bips@a59cc3e as modified by approximately the following command:

uconv -f utf-8 -t utf-8 -x '::nfkd;' < russian.txt | \
	sort -s > normalized/russian.txt

(I originally forgot to force the "C" locale for sort(1); but I later checked, and found it did not make a difference for this list in my environment. It did make a difference for the proposed Ukrainian and Czech lists.)

The result has been confirmed to not have a leading BOM, and to have a final line terminated with '\n' (#622). I did not yet examine the source for these issues.

SHA-256 hash for the resulting russian.txt:
a8d7b9d8bdd3816eddd2aeb98718ad586d8e7dd8c364a944c072cdf3cd6bcb05
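
For anyone who wants to repeat these checks without uconv, the steps can be sketched in Python roughly as follows. This is an illustration under stated assumptions: it is not the tool used above, the function names are made up, and it assumes one word per line in UTF-8. It is not claimed to reproduce the hash above unless the input file matches exactly.

```python
import hashlib
import unicodedata


def normalize_wordlist(text: str) -> bytes:
    """NFKD-normalize each word, sort by code point, return UTF-8 bytes."""
    # Drop a leading BOM if present; it is not whitespace, so split() keeps it.
    words = [unicodedata.normalize("NFKD", w) for w in text.lstrip("\ufeff").split()]
    # Python compares str by code point, which matches UTF-8 byte order,
    # i.e. what sort(1) does in the "C" locale.
    words.sort()
    return ("\n".join(words) + "\n").encode("utf-8")


def check_wordlist(data: bytes, expected_count: int = 2048) -> list:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("leading BOM")
    if not data.endswith(b"\n"):
        problems.append("missing final newline")
    words = data.decode("utf-8").split()
    if len(words) != expected_count:
        problems.append(f"expected {expected_count} words, found {len(words)}")
    if len(set(words)) != len(words):
        problems.append("duplicate words")
    if words != sorted(words):
        problems.append("not sorted by code point")
    if any(w != unicodedata.normalize("NFKD", w) for w in words):
        problems.append("not NFKD-normalized")
    return problems


# Tiny made-up sample; a real run would read russian.txt instead.
sample = "\ufeffяма\nйод\nагат\n"
data = normalize_wordlist(sample)
print(check_wordlist(data, expected_count=3))
print(hashlib.sha256(data).hexdigest())
```

Normalization changes the stored form of words containing й or ё (each becomes a base letter plus a combining mark), which is why the list has to be sorted again afterwards.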

@nym-zone
Contributor

nym-zone commented Jan 8, 2018

@Sjors:

A general observation about adding more languages to BIP 39 is that English now has broad wallet support. If a new language is only supported by a small number of wallets, this could lead to (unintended) vendor lock-in.

If someone writes down their mnemonic and puts it in a vault, they should be able to take it out 50 years later and have a reasonable chance of finding software that can still import it.

The answer to vendor lock-in is independent implementations. BIP 39’s simplicity facilitates that. In ten days of occasional side-work, I have written a BIP 39 implementation with extensive self-tests which generates mnemonics in any language for which a wordlist is available in the BIP repository. It can output a BIP 32 xprv extended master private key for wallet restoration (although this feature is not yet documented in the manpage). Restoration to xprv from a user-input mnemonic in any language will be added in the near future. This is written in standard C/mostly standard POSIX. Anybody with technical competence who urgently needed to restore a wallet could whip up a barebones/no-tests/no-checksum-check/no-manpage mnemonic-to-xprv tool as a little afternoon project.

I have C code on my disk with copyright dates from almost 40 years ago—actually, if memory serves, the oldest date I have seen in my platform’s source tree is exactly 1978. Likewise, I expect that my freely available C11 code will compile with minimal changes for decades to come.

When such tools are available and easy to produce ab initio, where is the vendor lock-in? Wallets don’t need multi-language support to restore from an xprv.

I am glad to see new languages being proposed and added. The important part is to get the wordlist right before it’s carved into the standard, baked into implementations, and used for wallets containing actual people’s actual money. That is important.
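
As a rough illustration of how small the seed-to-xprv step is, here is a sketch in Python (easyseed itself is written in C, and this is not its code). It follows the BIP 32 master key generation and serialization rules as published: HMAC-SHA512 keyed with "Bitcoin seed", then Base58Check over the version bytes 0x0488ADE4, depth, parent fingerprint, child number, chain code and key. The function names are hypothetical, and the example input is an arbitrary 16-byte value rather than a real BIP 39 seed.

```python
import hashlib
import hmac

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
# Order of the secp256k1 group; a valid private key k satisfies 0 < k < N.
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def base58check(payload: bytes) -> str:
    """Append a 4-byte double-SHA256 checksum and encode in Base58."""
    data = payload + hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = B58[r] + out
    # Each leading zero byte is encoded as a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def seed_to_master_xprv(seed: bytes) -> str:
    """Derive the BIP 32 master extended private key (mainnet) from a seed."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    key, chain_code = digest[:32], digest[32:]
    k = int.from_bytes(key, "big")
    if k == 0 or k >= SECP256K1_N:
        raise ValueError("invalid master key; BIP 32 says to use another seed")
    payload = (
        bytes.fromhex("0488ade4")  # version: mainnet private
        + b"\x00"                  # depth 0 (master)
        + b"\x00" * 4              # parent fingerprint (none)
        + b"\x00" * 4              # child number 0
        + chain_code
        + b"\x00" + key            # private key with 0x00 prefix
    )
    return base58check(payload)


# Arbitrary example input; a real run would pass the 64-byte BIP 39 seed.
example_seed = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
print(seed_to_master_xprv(example_seed))
```

Together with the PBKDF2 step from BIP 39, this is essentially the whole of a barebones mnemonic-to-xprv tool, minus input handling and tests.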

nym-zone added a commit to nym-zone/easyseed that referenced this pull request Jan 11, 2018
These are generated with easyseed and the bip39_vectorgen.sh script from
5f35cd0.  There are vectors in twelve languages:  The eight in the BIP
repository, and four more with pending proposals.

Three of the vectors for proposed languages are for wordlists which I
have modified:

 - Russian (234c66c, bitcoin/bips#432)
 - Ukrainian (08a05b4, bitcoin/bips#442)
 - Czech (ba25dfa, bitcoin/bips#493)

The wordlist for Indonesian (bitcoin/bips#621) is unmodified from the
proposal.

Ironically, easyseed does not yet self-test itself with these.  That
will be added in a future release, to verify consistency between builds.
For now, I publish these to aid in interoperability testing between
implementations.

@ZilvinasKucinskas

So is it ok to implement this Russian wordlist in a wallet?

What are the community's rules for accepting a language into BIP39?

@dabura667

You can implement any wordlist you want, and Electrum will properly recover it. (Though it will not detect checksum errors)

Other wallets are poorly implemented.

@DonaldTsang

#789

@luke-jr luke-jr closed this Jul 2, 2021