bruteforce should score by frequency/rank #35

Closed
tewalds opened this issue Oct 29, 2013 · 2 comments

tewalds commented Oct 29, 2013

A very common password style is to take the first letter of each word in a sentence or phrase, possibly with some substitutions. This produces a fairly random-looking password that is easy to remember but hard to brute-force. The letters are not uniformly distributed, however: they follow the frequencies with which letters begin words. Far more words start with s, c, or p than with x, z, y, or q, or with a digit. So instead of treating every lowercase letter as one of 26 equally likely choices, score each character individually based on its frequency rank among the 95 printable characters.
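To make the idea concrete, here is a rough sketch comparing the two estimates. It is not any library's actual scoring code. It assumes an attacker tries letters in frequency order, so a letter at rank r costs roughly r guesses instead of 26. The ordering below is a made-up placeholder that only reflects the s/c/p versus x/z/y/q claim above, and `uniform_bits` and `rank_bits` are hypothetical names. Only lowercase letters are handled.

```python
import math
import string

# Placeholder ordering, NOT measured data: s, c, p first and x, z, y, q last,
# with the remaining letters in alphabetical order in between.
COMMON, RARE = "scp", "xzyq"
MIDDLE = "".join(ch for ch in string.ascii_lowercase if ch not in COMMON + RARE)
ORDER = COMMON + MIDDLE + RARE
RANK = {ch: i + 1 for i, ch in enumerate(ORDER)}  # 1 = tried first


def uniform_bits(password: str) -> float:
    """Bits when every lowercase letter counts as 1 of 26 equal choices."""
    return len(password) * math.log2(26)


def rank_bits(password: str) -> float:
    """Bits when a letter at rank r is assumed to cost about r guesses."""
    return sum(math.log2(RANK[ch]) for ch in password)


for pw in ("scps", "xzyq"):
    print(pw, round(uniform_bits(pw), 1), round(rank_bits(pw), 1))
```

Under this model a rank-1 letter contributes zero bits, which is one reason to read it as an illustration of the direction of the effect rather than a usable estimate.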

You could get this rank with something like:
$ cut -c 1 /usr/share/dict/american-english | uniq -c | sort -n
or, to fold upper and lower case together first:
$ cut -c 1 /usr/share/dict/american-english | tr A-Z a-z | sort | uniq -c | sort -n
Alternatively, you could derive the rank from the character frequencies in the password lists, which would also give frequencies for digits and special characters.
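For anyone who would rather do this outside the shell, the second pipeline could be sketched in Python roughly as follows. `first_char_ranks` is a hypothetical helper, the sample list is made up, and unlike `sort -n` (which lists the most frequent letter last) it assigns rank 1 to the most frequent first character.

```python
from collections import Counter
from typing import Iterable


def first_char_ranks(words: Iterable[str]) -> dict[str, int]:
    """Map each first character (lowercased) to its rank, 1 = most frequent."""
    counts = Counter(w[0].lower() for w in (line.strip() for line in words) if w)
    # Sort by descending count, breaking ties alphabetically for stable output.
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: i + 1 for i, ch in enumerate(ordered)}


# Tiny made-up sample; a real run could pass an open word list or password file.
sample = ["sun", "Sea", "cat", "pear", "sand", "cup"]
print(first_char_ranks(sample))  # {'s': 1, 'c': 2, 'p': 3}
```

Because it accepts any iterable of lines, the same function would work on a password list, which is where digits and special characters would show up.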

Collaborator

lowe commented Sep 24, 2015

Hi @tewalds, I'm going to close this as a wontfix, but thanks for suggesting this approach. My issue is that it builds in too big an assumption about how people choose passwords. I haven't seen data showing that this is a common strategy, and even if 10% of all passwords used this scheme, it would still be incorrect to apply it to the other 90%.

lowe closed this as completed Sep 24, 2015

Author

tewalds commented Sep 24, 2015

Instead of simply closing this as wontfix, why not check the character frequencies in passwords that don't follow simple patterns? You've got a large password database, so this should be pretty easy to check. If real passwords show that this is a valid way of reducing the entropy, then real password crackers probably use it effectively, and it should be used here. If it doesn't reduce entropy much, then wontfix seems perfectly reasonable. I've read articles about password crackers downloading giant corpora (lyrics, movie quotes, Wikipedia, etc.) and using sentences from them fairly effectively, though I'm having trouble finding a reference for that. What I did find is a similar idea, using Markov chains, which should have similar effects: https://www.trustwave.com/Resources/SpiderLabs-Blog/Hashcat-Per-Position-Markov-Chains/
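A first pass at that check might look like the sketch below: pool the characters of the passwords, compute the Shannon entropy per character, and compare it with log2(95), the value implied by treating all 95 printable characters as equally likely. `bits_per_char` is a hypothetical name, the sample is made up, and filtering down to passwords that don't follow simple patterns is assumed to have happened beforehand. It also ignores position and context, which a per-position Markov model would capture.

```python
import math
from collections import Counter
from typing import Iterable


def bits_per_char(passwords: Iterable[str]) -> float:
    """Shannon entropy (bits) of the pooled character distribution."""
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())


# Made-up sample, only to show the call; a real check needs a real corpus.
sample = ["aab", "abb"]
print(bits_per_char(sample), math.log2(95))
```

A measured value well below log2(95) would support the suggestion; a value close to it would support wontfix.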
