Each entry in the match sequence needs to add some inherent entropy #48

pde · 2014-08-02T00:23:56Z

zxcvbn decomposes each password into a match sequence, and then for each match says, "aha, I can find this part in an English dictionary (7 bits)", "this next piece is a name (4 bits)", "this is brute force (9 bits)".

There is an inherent entropy to changing models each time. It's probably not much (2-6 bits per entry in the match sequence, I'm guessing) but at the moment zxcvbn is underestimating passwords that jump between a number of these.
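
To make the size of the gap concrete, here is a minimal sketch using the three example matches above. It assumes a flat per-entry cost, which is only an illustration; the function name and the flat-cost model are hypothetical and not part of zxcvbn, and the 2 and 6 bit values are simply the ends of the guessed range.

```python
# Per-match entropies from the example above:
# dictionary word (7 bits), name (4 bits), brute force (9 bits).
match_bits = [7.0, 4.0, 9.0]


def sequence_entropy(bits_per_match, structural_bits=0.0):
    """Sum per-match entropies plus a flat cost per entry (hypothetical model)."""
    return sum(bits_per_match) + structural_bits * len(bits_per_match)


current = sequence_entropy(match_bits)    # no structural term
low = sequence_entropy(match_bits, 2.0)   # low end of the guessed range
high = sequence_entropy(match_bits, 6.0)  # high end of the guessed range

print(current, low, high)
```

Under these assumptions a three-entry sequence gains between 6 and 18 bits over the plain sum.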

pyramids · 2014-08-12T19:25:18Z

The original author referred to this entropy as structural entropy, and made a documented decision to ignore it ("It’s difficult to formulate a sound model for structural entropy; statistically, I don’t happen to know what structures people choose most, so I’d rather do the safe thing and underestimate", https://tech.dropbox.com/2012/04/zxcvbn-realistic-password-strength-estimation/).

pde · 2014-08-14T02:22:14Z

A very simple but decent model would be to observe the frequency of all structures across a large password dataset. If Dropbox doesn't want to do this, there are a few blog posts by people with very large password databases who might help.
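
A minimal sketch of that frequency model, assuming each password in the dataset has already been reduced to a tuple of match types. The toy corpus and the type labels are made up for illustration; a real corpus would be far larger and would need smoothing for unseen structures.

```python
import math
from collections import Counter


def structure_entropies(structures):
    """Map each observed structure to -log2 of its relative frequency."""
    counts = Counter(structures)
    total = sum(counts.values())
    return {s: -math.log2(c / total) for s, c in counts.items()}


# Hypothetical toy corpus: one tuple of match types per password.
corpus = [
    ("dictionary",),
    ("dictionary",),
    ("dictionary", "digits"),
    ("name", "digits"),
]

bits = structure_entropies(corpus)
for structure, cost in bits.items():
    print(structure, cost)
```

Common structures cost few bits and rare ones cost more, which is the structural term the estimate currently leaves out.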

A slightly better model would be to make the structure probabilities conditional on the preceding structure in a given password.
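
The conditional version can be sketched as a first-order Markov chain over match types. Everything here (the `START`/`END` markers, the function names, the toy corpus) is a hypothetical illustration, and the sketch does no smoothing, so an unseen transition simply gets infinite cost.

```python
import math
from collections import Counter, defaultdict

START = "<start>"  # hypothetical marker for the beginning of a password
END = "<end>"      # hypothetical marker for the end of a password


def train_transitions(structures):
    """Count how often each match type follows the previous one."""
    counts = defaultdict(Counter)
    for seq in structures:
        prev = START
        for kind in list(seq) + [END]:
            counts[prev][kind] += 1
            prev = kind
    return counts


def structure_bits(seq, counts):
    """Sum -log2 P(kind | previous kind) along the sequence."""
    bits = 0.0
    prev = START
    for kind in list(seq) + [END]:
        seen = counts[prev][kind]
        if seen == 0:
            return math.inf  # transition never observed; no smoothing here
        bits += -math.log2(seen / sum(counts[prev].values()))
        prev = kind
    return bits


corpus = [
    ("dictionary",),
    ("dictionary",),
    ("dictionary", "digits"),
    ("name", "digits"),
]
model = train_transitions(corpus)
print(structure_bits(("dictionary", "digits"), model))
```

Compared with counting whole structures, this shares statistics between passwords of different lengths, at the price of ignoring anything earlier than the immediately preceding match.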

Or you could go overboard and use PPM (prediction by partial matching).

Sounds like a fun project for an undergraduate thesis or an intern somewhere :)

Peter Eckersley pde@eff.org

lowe · 2015-09-24T06:22:26Z

Agreed. @pde, thanks for reporting, I know it's been a while :) Extra entropy for each entry in the match sequence is coming soon. I have a simpler scheme in mind than what you propose, and will update this thread with more soon.

lowe · 2015-10-24T06:02:24Z

After experimenting with different models over the last two weeks, a reasonable length penalty is now implemented in 4.0.1. Try it out, and check the docs in scoring.coffee to see how it works. Feedback appreciated!
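
For readers who want the shape of such a penalty without opening the source, here is a minimal sketch. It assumes the objective described in the scoring.coffee comments of the 4.x line is roughly `l! * product(guesses) + D^(l - 1)` for a sequence of `l` matches, working in guesses rather than bits. The constant value and the function name below are assumptions and should be checked against the actual file.

```python
import math

# Assumed value of the constant D; verify against scoring.coffee.
D = 10000


def sequence_guesses(match_guesses):
    """Penalized guess count for a non-empty match sequence (sketch only)."""
    length = len(match_guesses)
    if length == 0:
        raise ValueError("expected at least one match")
    # length! counts the orderings of the patterns; D ** (length - 1) adds a
    # further cost for each additional entry in the sequence.
    return math.factorial(length) * math.prod(match_guesses) + D ** (length - 1)


one_match = sequence_guesses([5000])
two_matches = sequence_guesses([100, 50])
print(one_match, two_matches)
```

In this sketch, two matches whose guess counts multiply to 5000 are scored well above a single match worth 5000 guesses, so splitting a password into more pieces is no longer free.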

lowe mentioned this issue Sep 24, 2015

Singe characters match dictionaries #61

Closed

lowe closed this as completed Oct 24, 2015
