Aaron Toponce edited this page May 21, 2023 · 19 revisions

Password and Passphrase Generator Motivations

I set out to create the most extensive password generator out there, covering every possible situation while providing a clean UX without a lot of options, knobs, and buttons that confuse the user and make the passwords less secure.

Diceware

This generator supports all the official lists linked on diceware.com. Every word list has 7,776 entries and is meant to be used with 5 fair six-sided dice. Each word provides about 12.9248-bits of entropy. As of July 20, 2022, the supported lists are as follows:

  • BG: Bulgarian (Assen Vassilev's list)
  • CA: Catalan
  • CN: Chinese (Pinyin input)
  • CZ: Czech
  • DA: Danish
  • DE: German
  • EL: Greek
  • EN: English
    • 8k word list (default)
    • Alan Beale alternate list
    • Natural Language Passphrases list
  • EO: Esperanto
  • ES: Spanish
  • ET: Estonian
  • EU: Basque (Euskara)
  • FI: Finnish
  • HU: Hungarian
  • IT: Italian
  • IW: Hebrew
  • JP: Japanese (Romaji)
  • LA: Latin
  • MI: Maori
  • NL: Dutch
  • NO: Norwegian
  • PL: Polish
  • PT: Portuguese
  • RO: Romanian
  • RU: Russian
  • SK: Slovak
  • SL: Slovenian
  • SV: Swedish
  • TR: Turkish
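The dice-to-word lookup described for these lists can be sketched as a base-6 conversion. This is only an illustration, not the generator's actual code: the function names are hypothetical, and the dice are simulated with Python's `secrets` module rather than rolled by hand.

```python
import secrets


def roll_dice(count: int = 5) -> list[int]:
    """Simulate `count` fair six-sided dice with a CSPRNG."""
    return [secrets.randbelow(6) + 1 for _ in range(count)]


def rolls_to_index(rolls: list[int]) -> int:
    """Read the rolls as base-6 digits, mapping faces 1-6 to digits 0-5."""
    index = 0
    for roll in rolls:
        if not 1 <= roll <= 6:
            raise ValueError("each roll must be between 1 and 6")
        index = index * 6 + (roll - 1)
    return index


def pick_word(word_list: list[str], dice: int = 5) -> str:
    """Pick one word; the list must hold exactly 6**dice entries."""
    if len(word_list) != 6 ** dice:
        raise ValueError("word list size must equal 6 ** dice")
    return word_list[rolls_to_index(roll_dice(dice))]
```

Five dice address 6^5 = 7,776 entries, which is why every list above has exactly that many words.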

EFF

In the middle of 2016, the Electronic Frontier Foundation created their own Diceware lists that use more frequently used English words that are easier to remember and type than the original Diceware word list.

They created three word lists:

  • The long list- Mimics the Diceware word list with 7,776 entries to be used with 5 fair six-sided dice. The words average a length of around 7 characters per word. Each word provides about 12.9248-bits of entropy.
  • The short list (Default)- A shorter word list with 1,296 entries to be used with 4 fair six-sided dice. Each word has a maximum length of 5 characters. Each word provides about 10.3399-bits of entropy.
  • The distant list- Another short word list of 1,296 entries, but each word has a minimum edit distance of three from every other word, and a unique 3-character prefix. Each word provides about 10.3399-bits of entropy.

Regarding the FANDOM word lists

In 2018, the EFF created 4 additional themed word lists (Game of Thrones, Harry Potter, Star Trek, and Star Wars) with 4,000 unique words each to be used with a d20 die. Unfortunately, the word lists have encoding bugs, which I fixed, but those fixes have not been patched upstream. Further, the lists are lightly themed, so they don't really stand out as themed passphrases, but more as generic English passphrases. They are under the "Unofficial" option group in the drop-down menu.

Cryptocurrency

Bitcoin (English)

The Bitcoin word list is designed to be a mnemonic code or sentence for generating deterministic Bitcoin wallets. The list provides 2,048 words, which provide exactly 11-bits of entropy per word. Every word in the English list is uniquely identified by its first four characters, making it possible for automation systems to autocomplete a word after those four characters have been typed. The words are also verbally unambiguous, making the list desirable for passwords spoken in noisy environments, such as server rooms. Every generated passphrase has a check word calculated via SHA-256 per the BIP-0039 proposal.
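As a rough sketch of how the BIP-0039 check word comes about: the leading bits of the SHA-256 digest of the random entropy are appended to that entropy, and the result is cut into 11-bit word indices. The function name below is hypothetical, and the code returns indices into a 2,048-word list rather than the words themselves; it is not taken from this project's source.

```python
import hashlib


def bip39_word_indices(entropy: bytes) -> list[int]:
    """Return the 11-bit word indices for BIP-0039 entropy (16 to 32 bytes)."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("entropy must be 128, 160, 192, 224, or 256 bits")

    # One checksum bit for every 32 bits of entropy.
    checksum_bits = ent_bits // 32
    digest = hashlib.sha256(entropy).digest()
    checksum = digest[0] >> (8 - checksum_bits)

    # Append the checksum, then split into 11-bit groups, most significant first.
    combined = (int.from_bytes(entropy, "big") << checksum_bits) | checksum
    word_count = (ent_bits + checksum_bits) // 11
    return [
        (combined >> (11 * (word_count - 1 - i))) & 0x7FF
        for i in range(word_count)
    ]
```

With 128 bits of entropy this yields 12 indices; the final word carries 7 bits of entropy plus the 4 checksum bits.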

As of July 20, 2022, the supported lists are as follows:

  • CN: Chinese (Simplified and Traditional)
  • CZ: Czech
  • EN: English (default)
  • ES: Spanish
  • FR: French
  • IT: Italian
  • JP: Japanese (Hiragana)
  • KO: Korean (Hangul)
  • PT: Portuguese

Monero

The Monero word lists are designed to be a mnemonic code or sentence for generating deterministic Monero wallets. Each list provides 1,626 words, which provide approximately 10.667 bits of entropy per word. Every generated passphrase has a check word calculated via CRC32 per the Monero repository.
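A minimal sketch of a CRC32 check word, assuming the scheme used by the Monero reference wallet as I understand it: the first few characters of every word are concatenated, hashed with CRC32, and the result modulo the word count selects which existing word is repeated at the end. The function name and the prefix length of 3 (the value I believe applies to the English list) are assumptions; check the Monero repository before relying on them.

```python
import zlib

PREFIX_LENGTH = 3  # assumption: unique-prefix length of the English list


def append_check_word(words: list[str], prefix_length: int = PREFIX_LENGTH) -> list[str]:
    """Return the words with a CRC32-selected check word appended."""
    if not words:
        raise ValueError("at least one word is required")
    # Concatenate the leading characters of every word.
    trimmed = "".join(word[:prefix_length] for word in words)
    checksum = zlib.crc32(trimmed.encode("utf-8"))
    # The check word is one of the existing words, chosen by the checksum.
    return words + [words[checksum % len(words)]]
```

Because the check word repeats one of the seed words, it adds no entropy; it only lets software detect a mistyped phrase.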

As of July 20, 2022, the supported lists are as follows:

  • CN: Chinese
  • DE: German
  • EN: English
  • EO: Esperanto
  • ES: Spanish
  • FR: French
  • IT: Italian
  • JBO: Lojban
  • JP: Japanese (Hiragana)
  • NL: Dutch
  • PT: Portuguese
  • RU: Russian

Alternate

These are a number of miscellaneous word lists of varying sizes and purposes. The following are other language lists that have not been officially adopted by Diceware, so they're placed here in the meantime:

  • Afrikaans: 6,567 words
  • Belarusian: 5,676 words
  • Elvish: 7,776 words
  • Klingon: 2,604 words
  • Mongolian: 4,124 words
  • Serbian: 8,670 words
  • Ukrainian: 7,000 words

In terms of English word lists, they are:

  • Acronyms: 43,785 words (currently using all available English word lists)
  • Colors: 1,029 words
  • Common Words Only: 18,632 words (currently using a selected subset of available English word lists)
  • Deseret Alphabet: 7,776 words
  • Every Word List: 43,785 words (currently using all available English word lists)
  • Lord of the Rings: 7,776 words
  • PGP: 512 words
  • Pokerware: 5,304 words
  • RockYou: 7,776 words
  • S/KEY: 2,048 words
  • Shavian Alphabet: 7,776 words
  • Simpsons: 5,000 words
  • Trump: 8,192 words
  • Verb, Adjective, Noun
    • Verbs: 432 words
    • Adjectives: 373 words
    • Nouns: 402 words
  • Wordle: 5,790 words

Acronyms

An effort to help make passphrases more memorable, or at least more interesting. Every English word list is combined into one large list, then a random candidate word is selected. For each character of the candidate word, a random word starting with that character is then chosen. For example, if the candidate acronym word is "fireball", the passphrase might be "FAY-importer-retrain-easts-bikers-apers-livers-Lebron".
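The acronym idea can be sketched as follows. This is an illustration under assumptions, not the generator's code: the function name is hypothetical, and candidates are limited to words whose every letter has at least one matching word in the list.

```python
import secrets
from collections import defaultdict


def acronym_passphrase(words: list[str]) -> tuple[str, list[str]]:
    """Pick a candidate word, then one random word per letter of it."""
    by_initial = defaultdict(list)
    for word in words:
        by_initial[word[0].lower()].append(word)

    # Only keep candidates whose letters can all be expanded into words.
    candidates = [
        w for w in words if all(ch in by_initial for ch in w.lower())
    ]
    if not candidates:
        raise ValueError("no candidate word can be expanded with this list")

    candidate = secrets.choice(candidates)
    chosen = [secrets.choice(by_initial[ch]) for ch in candidate.lower()]
    return candidate, chosen


demo_list = ["fay", "importer", "retrain", "easts", "bikers",
             "apers", "livers", "lebron", "fireball"]
candidate, phrase = acronym_passphrase(demo_list)
print(candidate, "->", "-".join(phrase))
```

Only the expanded words are random; the candidate word itself is just a memory aid.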

Colors

The "Colors" generator uses color names for the passphrase. Each color name is styled with the foreground color for that name. If the color is too light (based on a luminosity threshold), the word is outlined to make it easier to read. It's still ASCII text, so it can be copied to your clipboard. The color names come from Wikipedia, W3C, X11, and "Name That Generator" ntg.js.

People struggle to recall passwords and passphrases, so showing each color name in the color it names may help in recalling the passphrase. If you can remember the color itself, maybe that will help in remembering the name of that color, and by extension, the whole passphrase. Those with synesthesia may especially benefit.

There are 1,029 unique colors in the word list, providing about 10-bits of entropy per word.

Common Words Only

This project ships a lot of English word lists, so those built specifically with common English words in mind are combined to create one large list filled with unique lowercase words. This increases the entropy per word and as a result, lowers the number of random words per passphrase.

Deseret Alphabet

This is an alternate English alphabet invented by the Mormons in the 19th century after they settled in Utah. While it never gained widespread adoption, it flourished for a time and saw a number of Mormon publications typeset in the Deseret Alphabet, such as the Book of Mormon. The Deseret Alphabet is a pure phonetic alphabet. It is part of the Unicode standard.

Every Word List

Using every ASCII English word list at my disposal, a massive word list of unique words is available to maximize the entropy per word. Because many lists are being utilized, many obscure words will be available for random selection.

Lord of the Rings

While the EFF created the Harry Potter, Game of Thrones, Star Trek, and Star Wars word lists to be tossed with a d20, they missed possibly one of the greatest fantasy works of all time. This word list comes straight from the Eyeware project and uses the 8k word list.

Elvish

This generator is for entertainment value only. The word list consists of 7,776 words, making it suitable for Diceware, and provides about 12.9248-bits of entropy per word. However, because the generator is strictly electronic, and I haven't assigned dice roll values to each word, I may bump this up to 8,192 words providing exactly 13-bits of entropy per word. The word list was built from the Eldamo lexicon.

Klingon

This is another generator that is strictly for entertainment value only. As I say that, I personally know two people who speak (fluent?) Klingon, so maybe this generator will be of value to them. This word list comes from the Klingon Pocket Dictionary, and my word list provides exactly 2,604 unique words from the 3,028 words in the Klingon language. Thus, each word provides about 11.3465-bits of entropy.

PGP

The PGP word list was created to make reading hexadecimal strings easier and phonetically unambiguous. It comprises two sets of exactly 256 words each (512 words in total), with every word encoding one byte and so providing exactly 8-bits of entropy per word. This generator works well in noisy environments, such as server rooms, where passwords need to be spoken from one person to another to enter into a physical terminal.

Pokerware

Many people know about Diceware and the fact that you can use a standard set of 6-sided dice to create passphrases. However, Christopher Wellons created a passphrase system using a standard 52-card deck of playing cards. To be fair, the system is a little more cumbersome than tossing dice, but it's still a unique way to generate passphrases, so it's included here. Other playing card methods include Cardware by Sam Schlinkert, and Deckware, which turns a shuffled deck into a unique hexadecimal string that you could use with Niceware (all the -wares).

RockYou

In 2009, the RockYou company experienced a data breach where over 32 million user accounts and passwords were leaked to the Internet. I took the top 7,776 most commonly used RockYou passwords from that data breach to compile a word list for passphrases. This list is used solely as an educational tool to show that even though the list is made up of exposed passwords, secure passphrases can still be created from it. Each word provides about 12.9248-bits of entropy.

S/KEY

This is technically a one-time password system defined in RFC 1760. At the end of the RFC, it includes a list of 2,048 unique words of 1-4 characters which are used here. Every word is in uppercase.

Shavian Alphabet

Like the Deseret Alphabet, this is an alternative English spelling alphabet. However, this was created as part of a contest as dictated by playwright Bernard Shaw in his will. Shaw was an advocate of English spelling reform, so a trust was set aside to create an alternative phonetic alphabet with a 1:1 correspondence to English phonemes. Ronald Kingsley Read won the competition and this alphabet is the result.

Simpsons

This is a list of 5,000 words, providing about 12.2877-bits of entropy per word. The goal of this generator is also educational: to show that any source of words can be used for a password generator, including the episodes of a television series. However, because this list contains the 5,000 most commonly spoken words from the Simpsons episodes, a good balance of verbs, nouns, adjectives, etc. is supplied. As such, the generated passphrases seem to be easier to read, and less noun-heavy than those from the Diceware or EFF word lists. These passphrases may just be the easiest to recall.

Trump

This generator was initially built for entertainment purposes, but ended up having the advantage of providing a well-balanced passphrase of nouns, verbs, adjectives, etc., much like the Simpsons generator. As such, these passphrases may be easier to recall, because they are more likely to read as valid sentences than those from the Diceware or EFF generators. This list is pulled from Donald J. Trump's Twitter account. The list is always growing, currently at 5,186 words providing about 12.3404-bits of entropy per word.

Verb, Adjective, Noun

Inspired by Ryan Castellucci with Storybits, I created 3 very small alphabets of verbs, adjectives, and nouns using the Oxford 5,000 list. The three lists are free of prefix words, free of suffix words, fully decodable, and consist of very short words. Even though the lists themselves are short, a verb/adjective/noun triplet contains ~25.94 bits of entropy. The verbs were adjusted to find the shortest verb form (past tense, plural, etc.) and the nouns were all made plural. As a result, the triplet should be an almost-perfect English phrase to aid in memorability.
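The entropy of a triplet is the sum of the per-list entropies, since one word is drawn independently from each list. A small sketch using the list sizes given earlier on this page (432 verbs, 373 adjectives, 402 nouns); the function name is hypothetical.

```python
import math

# List sizes as stated on this page.
LIST_SIZES = {"verbs": 432, "adjectives": 373, "nouns": 402}


def combined_entropy_bits(sizes) -> float:
    """Entropy of one independent, uniform pick from each list."""
    return sum(math.log2(size) for size in sizes)


triplet_bits = combined_entropy_bits(LIST_SIZES.values())
print(f"{triplet_bits:.2f} bits per verb/adjective/noun triplet")
```

This is the same as taking log2 of the number of possible triplets, 432 × 373 × 402 = 64,776,672, which lands just under 26 bits.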

Wordle

Wordle took the Internet by storm in 2022, so much so that the New York Times acquired it outright (likely not wanting to make the same mistake they did with sudoku). Every Wordle word is 5 characters in length. Not wanting to step on copyrights, additional words have been added to the official list to place the use of the word list under the fair use clause. Because I'm not selling the word list or otherwise making money off it, I should be clear of any copyright claims. However, if the New York Times send their lawyers after me, the list will be pulled. Enjoy it while it lasts?

Pseudowords

The pseudowords generator is a cross between unreadable/unpronounceable random strings and memorable passphrases. They are pronounceable, even if the words themselves are gibberish. They are generally shorter in practice than passphrases, and longer than pure random strings. The generators are here to show what you can do with random pronounceable strings.

Apple, Inc.

Via macOS and iOS, Safari can suggest passwords in HTML form fields for users when registering for services. The passwords are pronounceable pseudowords using syllables with the structure of "consonant-vowel-consonant". Three hyphenated words of two syllables each are generated, where one character is randomly capitalized, and another character is replaced with a digit on a word boundary. This ensures that every password has 16 lowercase alphabetic characters, 1 uppercase alphabetic character, 1 digit, and 2 hyphens. Thus, this password structure should meet the bare minimum requirements for most services with strict password requirements.

See my Twitter thread breaking down the password structure and analyzing its security. Each "word" provides roughly 24-bits of entropy (it changes as more words are appended), so a two pseudoword password only provides about 48-bits of entropy, which is too low for this generator, as the minimum is 55-bits. Three words provide about 72-bits of entropy, and four words provide about 95-bits of entropy, which is why you'll only see 3-word and 4-word passwords with this generator here.

Bubble Babble

Bubble Babble is a hexadecimal encoder with built-in checksumming, initially created by Antti Huima and implemented in the original proprietary SSH tool (not the one by the OpenSSH developers). Part of the specification is that every encoded string begins and ends with "x". However, rather than encoding data from the RNG, this generator randomly generates 5-character words in the syntax of "<consonant><vowel><consonant><vowel><consonant>". As such, each 5-character word, except for the end points, provides 21*5*21*5*21=231,525 unique combinations, or about 17.8208-bits of entropy. The end points are in the syntax of "x<vowel><consonant><vowel><consonant>" or "<consonant><vowel><consonant><vowel>x", each of which provides 21*5*21*5=11,025 unique combinations, or about 13.4285-bits of entropy.
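A hedged sketch of a generator with the shape described above, using the counts from the text (21 consonants, 5 vowels). The exact alphabets, the hyphen separator, and the function names are assumptions made for illustration; the real Bubble Babble encoding uses its own tables and a checksum, which this sketch does not implement.

```python
import math
import secrets

VOWELS = "aeiou"                      # 5 vowels, matching the counts in the text
CONSONANTS = "bcdfghjklmnpqrstvwxyz"  # assumption: the remaining 21 letters


def pseudoword(pattern: str) -> str:
    """Expand 'c'/'v' into random consonants/vowels; other characters are literal."""
    pools = {"c": CONSONANTS, "v": VOWELS}
    return "".join(
        secrets.choice(pools[p]) if p in pools else p for p in pattern
    )


def bubble_babble_like(words: int) -> str:
    """Generate `words` 5-character pseudowords that begin and end with 'x'."""
    if words < 2:
        raise ValueError("need at least the two end-point words")
    first = pseudoword("xvcvc")
    last = pseudoword("cvcvx")
    middle = [pseudoword("cvcvc") for _ in range(words - 2)]
    return "-".join([first, *middle, last])


def entropy_bits(words: int) -> float:
    """Two end points at log2(11,025) bits plus log2(231,525) per middle word."""
    endpoint = math.log2(5 * 21 * 5 * 21)
    middle = math.log2(21 * 5 * 21 * 5 * 21)
    return 2 * endpoint + (words - 2) * middle
```

Because the two end points carry less entropy than a middle word, the word count needed for a given security level is slightly higher than the per-word figure alone suggests.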

Daefen

Daefen is an encoding system for encoding binary into pronounceable words. It encodes the binary data into base 3456, and replaces each digit with a syllable. The syllables are then joined together to create a phrase. The goal isn't necessarily to memorize the phrase, but instead to jot it down on a piece of paper or speak it to another person. The intended use is to build mnemonics for private keys. In this project, the pseudowords are hyphenated.

Koremutake

Koremutake is another encoding system designed to create pronounceable words from very large numbers. It was initially designed to create a pronounceable URL for URL shorteners. Unlike Daefen and Bubble Babble however, Koremutake was specifically designed with memorization in mind. The syllables are short and verbally distinct. In this project, the syllables are all concatenated together to create one long string.

Lepron

Like many of the generators here, Lepron is designed to encode binary strings as pronounceable pseudowords (it is described in a four-part series). Each pseudoword is 3-4 syllables in length. Pseudowords are hyphenated.

Letterblock Diceware

Like Bubble Babble, this pseudoword generator includes an integrated checksum to ensure you typed the phrase correctly. Letterblock Diceware was inspired by Diceware proper, but instead of full words, relies on English bigrams in an attempt to build a pronounceable pseudoword. Unfortunately in practice, most of the generated strings are not pronounceable. Still, it's unique in its attempt to create pronounceable strings with English bigrams, so it's worth inclusion here if for nothing else than the discussions that may come about as a result.

Munemo

Munemo is another binary encoding system, like many others in this category, but also supports signed integers (negative and positive numbers). An encoded negative number with Munemo will start with "xa". Because half of the numbers are negative, this implementation will first randomly determine whether to generate a negative number, then generate the rest of the string. As such, on average, half of the generated passwords should start with "xa".

Proquints

Another binary-to-text encoding system meant for pronounceable phrases. Proquints are five-character words that alternate consonant-vowel-consonant-vowel-consonant and encode 16 bits each. What's unique about this encoding scheme is the recognition that it can be used as a mnemonic password. It suggests generating 32, 48, or 64 random bits, converting them to proquints, then telling the user their password.
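The encoding can be sketched in a few lines. The consonant and vowel tables below are the ones from the Proquint proposal (16 consonants at 4 bits each, 4 vowels at 2 bits each); the function names are hypothetical and this is not this project's code.

```python
import secrets

CONSONANTS = "bdfghjklmnprstvz"  # 16 consonants, 4 bits each
VOWELS = "aiou"                  # 4 vowels, 2 bits each


def quint(value: int) -> str:
    """Encode one 16-bit value as consonant-vowel-consonant-vowel-consonant."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")
    return (
        CONSONANTS[(value >> 12) & 0xF]
        + VOWELS[(value >> 10) & 0x3]
        + CONSONANTS[(value >> 6) & 0xF]
        + VOWELS[(value >> 4) & 0x3]
        + CONSONANTS[value & 0xF]
    )


def to_proquints(value: int, bits: int) -> str:
    """Encode a `bits`-wide integer, most significant 16 bits first."""
    if bits <= 0 or bits % 16:
        raise ValueError("bits must be a positive multiple of 16")
    shifts = range(bits - 16, -1, -16)
    return "-".join(quint((value >> shift) & 0xFFFF) for shift in shifts)


def random_proquint_password(bits: int = 64) -> str:
    """Generate random bits and present them as proquints."""
    return to_proquints(secrets.randbits(bits), bits)


print(to_proquints(0x7F000001, 32))  # lusab-babad
```

The 32-bit value 0x7F000001 (the address 127.0.0.1) is the example used in the Proquint proposal itself.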

Urbit

Urbit is a decentralized personal server platform with the aim of deconstructing the client-server model in favor of a federated network of personal servers in a peer-to-peer network with a consistent digital identity. The network is built up with a 128-bit Urbit ID space that consists of "galaxies", "stars", "planets", "moons", and "comets". This ID space uses a mnemonic for identifying components in the network. The ID always starts with the tilde "~" character.

Random

These are random strings provided as a last resort for sites or accounting software that have very restrictive password requirements. These passwords will be some of the shortest generated while meeting the same minimum entropy requirement. Because these passwords are not memorable, they should absolutely be stored in a password manager (you should be using one anyway).

There are two groupings of passwords here: 7-bit graphical ASCII characters and non-ASCII Unicode.

ASCII

  • Base-94- Uses all graphical U.S. ASCII characters (does not include horizontal space). Each character provides about 6.5546-bits of entropy. This password will contain ambiguous characters.
  • Base-85- This is also defined as Ascii85. Each character provides about 6.4094-bits of entropy. This password will contain ambiguous characters.
  • Base-64 (-_)- Uses all digits, lowercase, and uppercase Latin characters, but uses "-_" instead of "+/". Each character provides exactly 6-bits of entropy. This is URL-safe and filename-safe, meaning that any character in this generator can be safely used in URLs as well as filenames.
  • Base-62- Uses all digits, lowercase, and uppercase Latin characters. Each character provides about 5.9542-bits of entropy. This password will contain ambiguous characters. This password is also probably the "safest" complex password in terms of input validation on web forms.
  • Base-58- Uses the same encoding as Bitcoin addresses. Uses base-64 encoding, but excludes the IOl0+/ characters. Each character provides about 5.8580-bits of entropy. This password might contain ambiguous characters depending on the user.
  • Base-52- Uses only uppercase and lowercase Latin characters. Each character provides about 5.7004-bits of entropy. This password will contain ambiguous characters.
  • Base-45- Defined by RFC 9285. However, the underscore is used in place of the space. Uses only digits, uppercase Latin characters, and a select set of punctuation characters. Each character provides about 5.4918-bits of entropy. This password will contain ambiguous characters.
  • Base-36- Uses only digits and lowercase Latin characters. Each character provides about 5.1699-bits of entropy. This password will contain ambiguous characters.
  • Base-32- Uses the characters defined in RFC 4648, which strives to use an unambiguous character set. Each character provides exactly 5-bits of entropy.
  • Base-26- Uses lowercase Latin characters only. Each character provides about 4.7004-bits of entropy. This password might contain ambiguous characters if you struggle with "i" "l" and "j" or "u" and "v".
  • Base-16- Uses all digits and lowercase characters "a" through "f". Each character provides exactly 4-bits of entropy. This password will contain fully unambiguous characters.
  • Base-10- Uses strictly the digits "0" through "9". This is mostly useful for PINs or other applications where only digits are required. Each digit provides about 3.3219-bits of entropy. This password will contain fully unambiguous characters.
  • Base-8- Uses digits from 0 through 7. Each digit provides exactly 3-bits of entropy. This password will contain fully unambiguous characters.
  • Base-2- Uses only digits 0 and 1. Useful for random binary strings in examples or documentation. Each digit provides exactly 1-bit of entropy. This password will contain fully unambiguous characters.
  • Base-4- A type of base-4 using the characters "A", "C", "G", and "T". Each character provides exactly 2-bits of entropy. This password will contain fully unambiguous characters.
  • Coin Flips- An alternate base-2 system with "T" for tails and "H" for heads.
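The per-character figures above are log2 of the alphabet size, and the password length follows from dividing the entropy target by that figure and rounding up. A minimal sketch, assuming the 55-bit minimum mentioned for the Apple generator applies here too; the function names are hypothetical.

```python
import math
import secrets
import string


def bits_per_symbol(alphabet_size: int) -> float:
    """Entropy of one uniformly chosen symbol."""
    return math.log2(alphabet_size)


def length_for(target_bits: float, alphabet_size: int) -> int:
    """Smallest length whose total entropy reaches the target."""
    return math.ceil(target_bits / bits_per_symbol(alphabet_size))


def random_password(alphabet: str, target_bits: float) -> str:
    """Draw each character independently with a CSPRNG."""
    if len(set(alphabet)) != len(alphabet):
        raise ValueError("alphabet must not contain duplicates")
    length = length_for(target_bits, len(alphabet))
    return "".join(secrets.choice(alphabet) for _ in range(length))


BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase

for size in (94, 62, 32, 16, 10, 2):
    print(f"base-{size}: {bits_per_symbol(size):.4f} bits/char, "
          f"{length_for(55, size)} chars for 55 bits")
print(random_password(BASE62, 55))
```

Smaller alphabets trade length for readability: the same 55 bits take 9 characters in base-94 but 14 in base-16.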

Unicode

BEWARE USING UNICODE FOR PASSWORDS. The point of using Unicode for passwords is almost entirely for research and testing. Because the Internet has connected all of us globally, service providers and developers can use these passwords to test Unicode support in their platforms. As an end user, you really should not use them.

  • Emoji- A fun, but possibly dangerous generator, using Unicode characters defined in the Emoji standard. This password may or may not contain ambiguous characters, depending on how reliably you can tell similar but distinct emoji from each other.

  • ISO 8859-1- This is the full 8-bit ASCII table, with control characters and whitespace removed. As such, the original 94 graphical 7-bit ASCII characters are included, as well as the 94 additional graphical characters defined by ISO 8859-1, which differs from Windows-1252. This can be used as a base point for language testing, including Spanish, Italian, and Portuguese, among at least 30 others. With 188 unique characters, this provides about 7.5546-bits per character.

  • Latin Extended- This is the Latin Extended Additional character table with a full set of 256 graphical characters, providing exactly 8 bits of entropy per character. This table differs from Latin Extended A and Latin Extended B. Like the other Unicode generators in this area, this can be useful for password input testing.

  • Mac OS Roman- A character encoding created by Apple Computer, Inc. for use by Macintosh computers. Mac OS Roman includes the standard 7-bit ASCII table and defines another 128 characters common in several Western languages. As with ISO 8859-1, control and non-graphical characters are removed. The Apple logo U+F8FF in the Corporate Private Use Area has also been removed.

See the further explanation below:

Emoji

With the rise of Unicode and the UTF-8 standard, and the near ubiquitous popularity of smartphones and mobile devices, having access to non-Latin character sets is becoming easier and easier. As such, password forms are more likely to support UTF-8 on input to allow Cyrillic, Coptic, Arabic, and East Asian ideographs. So, if Unicode is fast becoming the norm, why not take advantage of it while having a little fun?

This generator uses the font provided by twemoji-colr from Mozilla. There are 3,624 emoji glyphs provided by that font, yielding about 11.8233-bits per glyph. One side effect is that even though there is a character count in the generator box, each glyph may occupy more than 1 byte, so some input forms may count that glyph as more than 1 character. Regardless, the minimum entropy is met, so the emoji password is still secure.
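To see why different forms may disagree about the length of an emoji password, compare the three most common ways of counting. This sketch only illustrates the counting mismatch; the function name is hypothetical, and which count a given site applies is something you would have to test.

```python
def length_report(password: str) -> dict[str, int]:
    """Count a string the three ways input forms commonly do."""
    return {
        "code_points": len(password),                         # Unicode code points
        "utf8_bytes": len(password.encode("utf-8")),          # bytes on the wire
        "utf16_units": len(password.encode("utf-16-le")) // 2,  # JavaScript-style length
    }


single = "\U0001F600"  # one grinning-face emoji
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # three emoji joined by ZWJ

print(length_report(single))
print(length_report(family))
```

A single emoji outside the Basic Multilingual Plane is one code point, four UTF-8 bytes, and two UTF-16 code units, and a joined sequence that renders as one glyph counts higher still, so a length limit can be hit sooner than the visible glyph count suggests.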