Deprecating Sketch based password oracle.
Python
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
INSTALL
NEWS
README
THANKS
deprecating_sketch.py
deprecating_sketch_test.py
language_model.py
language_model_test.py
password.lst
password_oracle.py
password_oracle_test.py
utils.py
utils_test.py
version

README

Apassword_oracle

Short explanation:

An "oracle" is an external service you can ask questions of.  A
"password oracle" is one that answers questions about password
quality.

This password oracle tracks whether or not a password as been assigned
"recently" where "recent" is tunable and by default "not within the
last 65,536 password allocations." 

It uses a novel data-structure call a "deprecating sketch".  This is a
type of counting bloom-filter and somewhat similar but different than
the "counting-sketch" that Schechter, Herley and Mitzenmacher propose
in http://research.microsoft.com/pubs/132859/popularityISeverything.pdf

This algorithum differed from Schechter et al in how the counters are
used.  Their sketch tracks total membership and occationally increases
a global threshold.  Mine maintains a ring buffer that tracks old
assignments and decrements counters when the password expires from the
ring buffer.  Schechter et al allows "credit" to be built up so a
password can be used many times in succession after a long period of
disuse.  My approach does not allow this.

What makes these class of "password popularity" checkers so neat is if
an adversary downloads a copy of the server's database they will not
be able to extract any useful passwords from it.

Instructions on how to use this server can be found in the docstring
of password_oracle.py

Long explanation: 

Way back in the olden days of computing password security meant
/etc/shadow files, salts, and picking passwords that would be hard for
a determined cracker to guess given a small cluster of faster
computers.

Times have changed.  No longer is your adversary a student picking away
at the internal security of their school's timeshare Unix machine.
The new adversary is in the cloud.

Modern online "cloud" services have a number of tools at their
disposal to prevent automated password guessing attacks on their
networks.  Captcha, per IP address throttle limits and per account
throttle limits.

Some sites (like Google) use captcha to block automated password login
attempts.  In the long run this won't work, computer vision is
improving at a scary fast pace, human vision slogs along to the whims
of evolutionary pressure over thousands and millions of years.  Pretty
soon we'll be scratching our heads and asking our personal AI's to
tell us what the captcha says.  Scratch that.

IP based throttling doesn't work - there is simply no go way to tell
if a particular IP is a household on a DSL, a large companies office
or an entire nation.  IP address exhaustion doesn't help this matter.
And bad-guys have no qualms using bot-nets to spread their nefarious work
across lots and lots of IP addresses.

The only thing that works is "user-name" based throttling.  For the
most part 1990's style password brute forcing can be prevented by
throttling the rate at which the doorknob to any users account can be
turned.

Put yourself in the shoes of a cracker.  You have unlimited IP
addresses.  The only real limit you have is how many passwords you can
try for one account.  How do you maximize your haul.

Pick common passwords.  In the Rock You password corpus over 1% of all
users selected "123456" as their password - the most efficient way to
steal accounts is to just try "123456" on each account and move on if
it doesn't work ... if facebook is the same that should net you a cool
half a million or so accounts.  

So, how do you defend against this?  

You could restrict how often passwords are reused.  Twitter disallows
their 390 most common passwords.  But there are few problems with
that.  First, it requires you have your passwords stored someplace in
the clear so you can figure out what your most common passwords are.
Companies make headlines doing that.

The second problem is users are really predictable in how they modify
their passwords to address password restrictions.  Guess what number
50% of all users that put a number in the middle of their passwords
choose - 4.  And if you look at the text that comes before and after
the "4" its obvious "4" is a stand-in for "for."  Tell a user "you
can't use 123456" and they are likely to use "1234567" instead.  Or
write in "l33tsp3ak."  Simple password restrictions simply shift the
peak around.  You also don't want ti make password selection too
difficult; if the user has to think too hard they are likely to go
away and use another service.  Ideally you want to dynamically measure
the number of times a password occurs and deny the user when the
password they are selecting is "too common", but 

So how do we choose between these two evils; living without the
ability to know which passwords are common and having clear text
passwords?

Turns out we can have our cake and eat it to with a "bloom-filter." 

A bloom-filter is a probabilistic data structure that tracks
approximate set membership; it trades off accuracy and the
possibility it will accidentally claim an item is in a set that it
isn't in exchange for a dramatically reduced storage space.
Bloom-filters have a nice sweet spot where they throw away enough
information that if a bad guy got hold of the filter they would not be
able to reconstruct the set membership (i.e. the clear-text passwords)
but at the same time we have a low enough false positive rate that the
filter is still useful.

A traditional bloom-filter is a write-once structure; once you add
something you can't take it out.  Its okay if more than one user uses
the same password, we just don't want any one password to be used too
often.  This algorithm changes that somewhat, instead of using bits,
we store integers and maintain a deque that tracks when it is time to
remove items from the bloom-filter again.

This won't be the first algorithm to approach this problem.  There is
a fair amount of prior work dating back to the 90's about bloom-filters
and passwords.  Microsoft research recently published a paper
proposing a different approach called a counting sketch. 

The difference between the counting sketch and the deprecating sketch
is how never before used passwords are treated on a long running
system.  Suppose you have a sketch configured to not allow any
password to be used for more than one in every thousand accounts, and
your service as grown to the point where it has a million users.
Suppose a new user created 10 accounts in a short span of time and
tried to assign them all the same albeit never seen before password.
A counting sketch would allow this.  A deprecating sketch would not.
 LocalWords:  docstring