Add Portuguese wordlist to BIP39 #998

Merged: 1 commit into bitcoin:master on Dec 20, 2020

Conversation

sabotag3x
Contributor

The Portuguese wordlist was carefully checked by hand by Portuguese and Brazilian speakers in order to achieve a high level of quality. All the words are commonly used in both countries.

In addition, the Portuguese wordlist was checked with Python scripts for three rules: the Levenshtein distance between words, words already used in other mnemonic sets, and uniqueness of the first 4 characters.

More details on the word selection process can be found in Bitcointalk's Portuguese section.

Portuguese wordlist rules:

  1. Words can be uniquely determined by typing the first 4 characters.
  2. No accents or special characters.
  3. No complex verb forms.
  4. No plural words, unless there's no singular form.
  5. No words with double spelling.
  6. No words that sound exactly like another word with a different spelling.
  7. No offensive words.
  8. No words already used in other languages' mnemonic sets.
  9. No words that are spelled differently in Brazil and in Portugal.
  10. No words that evoke negative, sad, or bad things.
  11. No very similar words that differ by only 1 letter.
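
Rule 2 is mechanical enough to check in code as well. The thread does not say this rule was scripted, so the following is only a minimal sketch of how it could be done. The function names are hypothetical, and treating uppercase letters as violations is an assumption based on BIP39 wordlists being lowercase.

```python
def has_only_plain_letters(word: str) -> bool:
    """Return True if the word is made only of lowercase ASCII letters a-z."""
    # isascii() rejects accented letters, isalpha() rejects hyphens, digits
    # and the empty string, islower() rejects capitals (assumed unwanted).
    return word.isascii() and word.isalpha() and word.islower()


def find_rule2_violations(words):
    """Return the words that break rule 2, in their original order."""
    return [w for w in words if not has_only_plain_letters(w)]


if __name__ == "__main__":
    # Toy input, not the real wordlist.
    print(find_rule2_violations(["abacate", "ação", "guarda-chuva", "Brasil"]))
```

An empty result on the full list would mean every entry is plain a-z.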

@bitmover-studio
Contributor

The idea to create this wordlist began on the bitcointalk.org forum. This is the thread where all the details were discussed during its creation:
https://bitcointalk.org/index.php?topic=5272106.

We used Python scripts to help us check these rules:

  1. Words can be uniquely determined by typing the first 4 characters.
  2. No words already used in other languages' mnemonic sets.
  3. No very similar words that differ by only 1 letter (Levenshtein distance > 1).
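
The actual scripts are not included in this thread. As an illustration only, the cross-language check (rule 2 in the list above) could look roughly like this. `find_overlap` is a hypothetical helper, and the word lists in the demo are tiny toy excerpts rather than the real BIP39 files.

```python
def find_overlap(candidate, other_lists):
    """Map each other-language list name to the words it shares with candidate.

    candidate   -- iterable of words in the new list
    other_lists -- dict of {language name: iterable of words}
    Languages with no shared words are left out of the result.
    """
    candidate_set = set(candidate)
    overlap = {}
    for language, words in other_lists.items():
        shared = sorted(candidate_set.intersection(words))
        if shared:
            overlap[language] = shared
    return overlap


if __name__ == "__main__":
    # Toy data for illustration; a real run would load each wordlist file.
    candidate = ["abacate", "bonsai", "cadeira"]
    other_lists = {
        "spanish": ["abeja", "bonsai"],
        "english": ["abandon", "ability"],
    }
    print(find_overlap(candidate, other_lists))
```

An empty dict means the candidate list shares no word with any of the other lists.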

@promag
Member

promag commented Sep 19, 2020

See #720.

@bitmover-studio
Contributor

bitmover-studio commented Sep 20, 2020

> See #720.

That list has many problems:

1 - It has been inactive for 2 years.

2 - Duplicated words:
58 - ampola
60 - ampola

3 - A word repeated from the Spanish wordlist:
bonsai

4 - Words that cannot be uniquely determined by typing the first 4 characters:
[53, 569, 570, 630, 721, 765, 1060, 1120, 1690, 1894]
esquadro - esquerda - esqui
ferrolho - ferrugem
garrafa - garrote
gracejo - gracioso
magnata - magno
sentado - sentido
trilho - trilogia

5 - Word pairs with a Levenshtein distance of 1 (the rule requires a distance > 1):
['acidez - avidez',
'adiante - diante',
'aflito - afoito',
'afoito - aflito',
'afta - anta',
'agito - apito',
'agulha - fagulha',
'alho - olho',
'alvo - alho',
'anexo - nexo',
'anta - afta',
'apito - apto',
'apto - apito',
'areia - aveia',
'argila - argola',
'argola - argila',
'assado - passado',
'ator - fator',
'aveia - veia',
'avidez - acidez',
'bafo - safo',
'bagulho - barulho',
'bainha - rainha',
'balada - salada',
'balsa - valsa',
'banho - ganho',
'barata - batata',
'barulho - bagulho',
'basta - besta',
'batata - barata',
'beato - boato',
'beco - bico',
'beira - feira',
'belo - selo',
'bento - tento',
'besta - festa',
'bico - beco',
'bloco - floco',
'boato - beato',
'boldo - bolso',
'bolha - rolha',
'bolso - boldo',
'bossa - fossa',
'botina - rotina',
'brado - irado',
'brando - brado',
'briga - brita',
'brilho - trilho',
'brita - briga',
'bromo - broto',
'broto - bromo',
'bula - lula',
'bule - bula',
'busto - custo',
'butano - tutano',
'cabelo - camelo',
'cabo - nabo',
'cacho - tacho',
'caixa - faixa',
'camelo - cabelo',
'caro - raro',
'casca - lasca',
'ceia - veia',
'cera - hera',
'cereja - cerveja',
'cerrado - errado',
'cerveja - cereja',
'cidade - idade',
'cisco - risco',
'coceira - coleira',
'coelho - joelho',
'coice - foice',
'colar - molar',
'coleira - moleira',
'copeiro - coveiro',
'corja - coruja',
'corno - morno',
'coruja - corja',
'corvo - corno',
'couro - touro',
'coveiro - copeiro',
'cuia - guia',
'cunhado - punhado',
'custo - busto',
'demente - semente',
'dente - rente',
'diante - adiante',
'dica - doca',
'diodo - iodo',
'doador - voador',
'dobrado - dourado',
'doca - dica',
'doceiro - roceiro',
'dois - pois',
'domador - doador',
'domo - gomo',
'dotado - lotado',
'dourado - dobrado',
'dublado - nublado',
'dueto - gueto',
'efeito - refeito',
'efusivo - elusivo',
'eira - feira',
'eixo - seixo',
'elusivo - efusivo',
'embolado - empolado',
'empolado - embolado',
'enxame - exame',
'errado - cerrado',
'escola - esmola',
'esmola - escola',
'exame - vexame',
'facada - sacada',
'fagulha - agulha',
'faixa - caixa',
'falta - malta',
'fasor - fator',
'fator - fasor',
'favela - fivela',
'febre - lebre',
'feio - seio',
'feira - fera',
'feixe - peixe',
'feno - seno',
'fera - hera',
'festa - fresta',
'feto - reto',
'figa - viga',
'fita - figa',
'fivela - favela',
'fixo - lixo',
'floco - foco',
'fluxo - luxo',
'focinho - mocinho',
'foco - toco',
'fogo - logo',
'foice - coice',
'folia - polia',
'fonte - monte',
'forno - morno',
'forte - morte',
'fosco - foco',
'fossa - bossa',
'freio - frevo',
'frente - rente',
'fresta - festa',
'frevo - trevo',
'friagem - triagem',
'fronte - frente',
'frota - rota',
'funil - fuzil',
'fuzil - funil',
'galho - ganho',
'ganho - galho',
'garoto - maroto',
'gaveta - gazeta',
'gazeta - gaveta',
'geada - gemada',
'gelo - selo',
'gemada - geada',
'gemido - temido',
'genro - tenro',
'giga - viga',
'goela - moela',
'goleiro - poleiro',
'gomo - domo',
'gongo - longo',
'gorro - jorro',
'gosto - rosto',
'gralha - tralha',
'grato - prato',
'gruta - truta',
'gueto - dueto',
'guia - gula',
'gula - lula',
'hera - fera',
'hiena - viena',
'horto - torto',
'idade - cidade',
'ilustre - lustre',
'impune - imune',
'imune - impune',
'inapto - inepto',
'incolor - indolor',
'inculto - insulto',
'indolor - incolor',
'inepto - inapto',
'inferno - inverno',
'insulto - inculto',
'inverno - inferno',
'iodo - diodo',
'irado - virado',
'isolado - solado',
'janela - panela',
'jarro - jorro',
'jato - tato',
'jeito - peito',
'joelho - coelho',
'jogo - logo',
'joio - jogo',
'jorro - jarro',
'jota - rota',
'juba - tuba',
'julho - junho',
'junho - julho',
'juro - ouro',
'ladeira - madeira',
'lama - lhama',
'lareira - ladeira',
'lasca - casca',
'lastro - mastro',
'latente - patente',
'lavado - levado',
'lavrado - lavado',
'lebre - febre',
'legado - negado',
'leigo - meigo',
'leito - peito',
'lenda - tenda',
'lenha - lenda',
'lesado - pesado',
'lesma - resma',
'levado - nevado',
'lhama - lama',
'ligado - legado',
'ligeiro - lixeiro',
'limbo - lombo',
'limpo - olimpo',
'lividez - vividez',
'lixa - rixa',
'lixeiro - ligeiro',
'lixo - luxo',
'locador - tocador',
'logo - longo',
'loja - soja',
'lombo - tombo',
'lona - tona',
'longo - logo',
'lotado - dotado',
'lula - gula',
'lustre - ilustre',
'luxo - lixo',
'machado - rachado',
'macio - macro',
'macro - micro',
'madeira - ladeira',
'magno - mogno',
'malhado - malvado',
'malta - falta',
'malvado - malhado',
'mangue - sangue',
'maroto - garoto',
'mastro - lastro',
'mato - tato',
'meia - veia',
'meigo - leigo',
'melado - velado',
'mesa - meia',
'miado - mimado',
'micro - macro',
'mimado - rimado',
'mocinho - moinho',
'moedor - roedor',
'moela - goela',
'mogno - morno',
'moinho - mocinho',
'molar - colar',
'moleira - coleira',
'molho - olho',
'monge - monte',
'monte - morte',
'morno - mogno',
'morse - morte',
'morte - morse',
'moto - mato',
'mudez - nudez',
'mugido - rugido',
'munido - zunido',
'murro - urro',
'nabo - nato',
'nato - tato',
'navio - pavio',
'negado - nevado',
'nevado - negado',
'nexo - anexo',
'nobreza - pobreza',
'noivo - novo',
'nojo - novo',
'nono - sono',
'nora - tora',
'nosso - vosso',
'novo - nono',
'nublado - dublado',
'nudez - mudez',
'oceano - octano',
'ocioso - odioso',
'octano - oceano',
'odioso - ocioso',
'olho - molho',
'olimpo - limpo',
'orelha - ovelha',
'osso - vosso',
'ouro - touro',
'ousado - usado',
'ovelha - orelha',
'pagem - vagem',
'pampa - tampa',
'panela - janela',
'parado - tarado',
'parto - perto',
'passado - assado',
'patente - potente',
'pavio - navio',
'peito - perto',
'peixe - feixe',
'peludo - veludo',
'penhor - senhor',
'pensado - pesado',
'pente - rente',
'pequisa - pesquisa',
'perito - perto',
'perto - perito',
'pesado - pescado',
'pescado - pesado',
'pesquisa - pequisa',
'peste - pente',
'picado - pirado',
'pirado - virado',
'pobreza - nobreza',
'poeira - zoeira',
'poente - potente',
'pois - dois',
'poleiro - goleiro',
'polia - polpa',
'polpa - polia',
'pombo - tombo',
'pontal - postal',
'porco - pouco',
'porque - torque',
'posse - tosse',
'postal - pontal',
'potente - poente',
'pouco - rouco',
'pouso - pouco',
'praga - praia',
'praia - praga',
'pranto - prato',
'prato - preto',
'prazo - prato',
'pregado - prezado',
'preto - reto',
'prezado - pregado',
'profeta - proveta',
'proveta - profeta',
'prumo - rumo',
'punhado - cunhado',
'punido - zunido',
'rabada - rajada',
'rachado - machado',
'rainha - bainha',
'raio - raso',
'raiz - raio',
'rajada - rabada',
'ralo - talo',
'raro - raso',
'raso - raro',
'reator - reitor',
'recente - repente',
'redator - redutor',
'redutor - sedutor',
'refeito - efeito',
'regente - repente',
'reitor - reator',
'rente - pente',
'repente - regente',
'resma - lesma',
'reto - preto',
'rifado - rimado',
'rimado - rifado',
'ripa - rixa',
'risada - visada',
'risco - cisco',
'rixa - ripa',
'roceiro - roteiro',
'rodado - rogado',
'roedor - moedor',
'rogado - rodado',
'rolante - volante',
'rolha - bolha',
'rolo - tolo',
'rombo - tombo',
'rosto - gosto',
'rota - jota',
'roteiro - roceiro',
'rotina - botina',
'roubo - rouco',
'rouco - roubo',
'roxo - rolo',
'rugido - mugido',
'ruivo - uivo',
'rumo - prumo',
'sacada - salada',
'sadio - vadio',
'safira - safra',
'safo - bafo',
'safra - safira',
'salada - sacada',
'sangue - mangue',
'sarda - sarna',
'sarna - sarda',
'sebo - seno',
'secto - septo',
'seda - seja',
'sedutor - redutor',
'seio - seno',
'seita - seiva',
'seiva - seita',
'seixo - seio',
'seja - soja',
'selado - velado',
'selo - silo',
'semente - somente',
'senhor - penhor',
'seno - sono',
'sentado - sentido',
'sentido - sentado',
'septo - secto',
'setor - vetor',
'silo - siso',
'silvo - silo',
'siso - silo',
'sitiado - situado',
'situado - sitiado',
'socado - sovado',
'sogro - soro',
'soja - soma',
'solado - sovado',
'soma - soja',
'somente - semente',
'sono - soro',
'sonso - sono',
'soro - sono',
'sovado - solado',
'suado - sugado',
'suco - sulco',
'sueco - sulco',
'sugado - suado',
'sujo - suco',
'sulco - sueco',
'tacho - cacho',
'taipa - tampa',
'tala - vala',
'talo - tolo',
'tampa - taipa',
'tanto - tento',
'tapado - tarado',
'tarado - tarjado',
'tarjado - tarado',
'tato - tatu',
'tatu - tato',
'tecido - temido',
'teia - veia',
'temido - tecido',
'tenda - lenda',
'tenor - tensor',
'tenro - tento',
'tensor - tenor',
'tento - tenro',
'testado - tostado',
'tigela - tijela',
'tijela - tigela',
'tintura - tontura',
'toalha - tralha',
'tocador - locador',
'toco - troco',
'tolo - topo',
'tomada - topada',
'tombo - rombo',
'tona - tosa',
'tontura - tintura',
'topada - tomada',
'topo - tolo',
'tora - tosa',
'torque - porque',
'torto - horto',
'tosa - tora',
'tosse - posse',
'tostado - testado',
'touro - ouro',
'tralha - toalha',
'trama - trava',
'trava - trova',
'treco - troco',
'treta - truta',
'trevo - treco',
'triagem - friagem',
'trilho - brilho',
'troco - treco',
'trova - trava',
'trufo - trunfo',
'trunfo - trufo',
'truta - treta',
'tuba - juba',
'tucano - tutano',
'turbo - turvo',
'turco - turvo',
'turvo - turco',
'tutano - tucano',
'uivo - ruivo',
'umidade - unidade',
'unidade - umidade',
'urina - usina',
'urro - urso',
'urso - urro',
'usado - ousado',
'usina - urina',
'vadio - vazio',
'vaga - zaga',
'vagem - viagem',
'vaia - veia',
'vaidade - validade',
'vala - valsa',
'validade - vaidade',
'valsa - vala',
'vasto - visto',
'vazio - vadio',
'veado - velado',
'vedado - velado',
'veia - vaia',
'velado - veludo',
'veludo - velado',
'vetor - setor',
'vexame - exame',
'viagem - virgem',
'vibrado - virado',
'videira - viseira',
'viela - vitela',
'viena - viela',
'viga - vigia',
'vigia - viga',
'virado - vibrado',
'virgem - viagem',
'visada - risada',
'viseira - videira',
'visto - xisto',
'vitela - viela',
'vividez - lividez',
'voador - doador',
'volante - votante',
'vosso - osso',
'votante - volante',
'xisto - visto',
'zaga - vaga',
'zoeira - poeira',
'zunido - punido']

Additionally, among the remaining words there are many that are negative or offensive, such as defunto.
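
For anyone who wants to reproduce points 4 and 5 above, here is a minimal sketch of both checks. It is not the script that was actually used. The function names are hypothetical, and the sample words are taken from the examples in this comment plus one filler word ("zebra").

```python
from collections import defaultdict
from itertools import combinations


def prefix_collisions(words, n=4):
    """Group words by their first n characters; keep groups with 2+ words."""
    groups = defaultdict(list)
    for word in words:
        groups[word[:n]].append(word)
    return {prefix: ws for prefix, ws in groups.items() if len(ws) > 1}


def levenshtein_is_one(a, b):
    """True if a and b are exactly one insertion, deletion or substitution apart."""
    if a == b:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) word
    if len(b) - len(a) > 1:
        return False
    # Find the first position where the two words differ.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]          # one extra letter in b at position i


def distance_one_pairs(words):
    """Return every unordered pair of distinct words at Levenshtein distance 1."""
    unique = sorted(set(words))
    return [(a, b) for a, b in combinations(unique, 2) if levenshtein_is_one(a, b)]


if __name__ == "__main__":
    sample = ["esquadro", "esquerda", "esqui", "acidez", "avidez",
              "anexo", "nexo", "zebra"]
    print(prefix_collisions(sample))
    print(distance_one_pairs(sample))
```

The pairwise comparison is quadratic, which is still fast for a 2048-word list. Each pair is reported once, unlike the list above, which shows most pairs in both directions.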

@ninjastic
Contributor

ninjastic commented Sep 27, 2020

Just squashed all 151 commits into a single one. Also added @brenorb as a co-author.

@sabotag3x
Contributor Author

@luke-jr

Co-authored-by: Breno Rodrigues Brito <brenorb@gmail.com>
Co-authored-by: ninjastic <ninjasticdev@protonmail.com>
Co-authored-by: sabotag3x <sabotage.sta@gmail.com>
Co-authored-by: bitmover <67111541+bitmover-studio@users.noreply.github.com>
Co-authored-by: alegotardo <40860228+alegotardo@users.noreply.github.com>
Co-authored-by: kuthullu <kuthullu@gmail.com>
Co-authored-by: Trimegistus <trimegisto@rocketmail.com>
@sabotag3x
Contributor Author

@slush0 @prusnak @voisine @ebfull

So, I know that you may only care about the English list and that's why no new wordlists have been accepted in recent years.

However, BIP-0039 was created to help users restore their wallets, as it's easier to write down 12 words than 64 random characters. (Well, you know that better than I do, since you are the authors.)

I'll use your own words: "a group of easy to remember words"

English words aren't easy to remember for non-English speakers, just as Portuguese words may not be easy for you. In addition, a foreign language is more likely to cause typos and, at worst, make people lose their BTC.

More than 250 million people speak Portuguese. It's one of the most widely spoken languages in the world and the native language in Brazil, Portugal, Angola, Mozambique and other smaller countries. Moreover, few of them speak English.

My point is that the BIP-0039 method should be easier for non-English speakers as well.

PS: Let me know if you need more Portuguese speakers to review the wordlist before accepting it.

@p2w34

p2w34 commented Oct 23, 2020

My comment may sound harsh to both the list creators and the maintainers of BIP0039, but I am going to make it nevertheless.
By the list creators, I mean not only the creators of the Portuguese list but of all recently created lists. This applies to me too, as I made this mistake as well.

Before beginning your work, did you ask any of the BIP0039 maintainers whether there was a chance that your work would be merged in? Especially given the many unmerged word list proposals for other languages? By looking at the closed PRs one can see exactly when the last PR with a word list was merged in, and that should be discouraging. However, the word list creators seem to ignore this fact and then try to somehow push their lists through.

A massive amount of time is wasted on all those PRs that are left hanging forever. I would be really glad to see a clear direction set here. Anything would be better than the current situation.

If new word lists are never going to be accepted, then I would expect the maintainers to state clearly that new word lists are not accepted. The ones already merged in would be the official BIP0039 word lists, or this could be limited to just the English list. Does that solve the problem? No, because the word lists are certainly needed. I do not know how to proceed from that point (most likely a new BIP, discussion on the mailing list, etc.), but at least people would not waste their time!

@fortesp

fortesp commented Nov 4, 2020

Why is this not merged yet?

@mateusnds

Hey, will the list be merged?

@fortesp

fortesp commented Dec 17, 2020

Sorry to say, but I am not sure at this point who or what exactly the Bitcoin "community" is when we have pull requests such as this one waiting months to be merged. Some pull requests have even been waiting for years. This is a case where one has to ask who the "boss" of this project is, because a community does not seem to exist here.

@sipa
Member

sipa commented Dec 17, 2020

@fortesp BIPs are a mechanism for publishing ideas/proposals. Accepting changes to those proposals is the BIP authors' responsibility. If they don't like a particular change, you're always welcome to publish your own competing proposal.

@brenorb
Contributor

brenorb commented Dec 17, 2020

I'm not sure we have a large community of Portuguese-speaking people who also speak English and use GitHub. But isn't that one more strong argument in favor of having a Portuguese BIP39 wordlist?
I'm not really sure what's missing for this BIP to be merged.

@sipa
Member

sipa commented Dec 17, 2020

@brenorb Agreement from the BIP authors is the only thing that matters.

@fortesp

fortesp commented Dec 18, 2020

@sipa Agreed. But there is no feedback from any of the authors that I can see. I am not sure what is missing or perhaps non-compliant; to be honest, I did not check it myself.

@brenorb
Contributor

brenorb commented Dec 20, 2020

@sipa Ok, I'm one of the BIP authors and I'm pretty sure we all agree on it. What is the next step we need to do in practice? Is there a specific button to click? Can you show us the step-by-step process?

@prusnak
Contributor

prusnak commented Dec 20, 2020

> Ok, I'm one of the BIP authors and I'm pretty sure we all agree on it.

You are not a BIP39 author. I am (one of them).

Let's get this merged in.

Edit: ACK

@brenorb
Contributor

brenorb commented Dec 20, 2020

@prusnak Sorry for the mistake. I'm one of the authors of this proposal.

@luke-jr
Member

luke-jr commented Dec 20, 2020

> Let's get this merged in.

@prusnak Interpreting that as an ACK; let me know if I should revert

@luke-jr luke-jr merged commit cf0b529 into bitcoin:master Dec 20, 2020
@brenorb brenorb mentioned this pull request Jun 11, 2021
crypto-punk added a commit to crypto-punk/bips that referenced this pull request Sep 20, 2022
Merge pull request bitcoin#998 from sabotag3x/master