
Algorithmic Complexity Attack on Perl 5.6.1, 5.8.0 #6544

Closed
p5pRT opened this issue May 29, 2003 · 10 comments
@p5pRT p5pRT commented May 29, 2003

Migrated from rt.perl.org#22371 (status was 'resolved')

Searchable as RT22371$

@p5pRT p5pRT commented May 29, 2003

From scrosby@cs.rice.edu

Hello. We have analyzed this software to determine its vulnerability
to a new class of DoS attacks described in our recent paper, ''Denial
of Service via Algorithmic Complexity Attacks.''

This paper discusses a new class of denial of service attacks that
work by exploiting the difference between average-case and worst-case
performance. In an adversarial environment, the data structures used
by an application may be forced into their worst-case behavior. For
instance, hash table operations are usually thought of as constant
time, but under large numbers of collisions a table degrades to a
linked list, which may mean a 100-10,000x performance degradation.
Because of the widespread use of hash tables, the potential for attack
is extremely widespread. Fortunately, in many cases other constraints
on the system limit the impact of these attacks.
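The degradation is easy to see with a toy predictable hash in the classic multiply-by-33 style (an illustration of the principle only; Perl's actual hash function differs): colliding pairs can be found by hand, and concatenating colliding pairs breeds arbitrarily many longer collisions that all land in one bucket.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy predictable string hash in the multiply-by-33 style.
# Illustration only -- Perl's real hash function differs.
sub hash33 {
    my ($s) = @_;
    my $h = 0;
    $h = ($h * 33 + ord $_) % 2**32 for split //, $s;
    return $h;
}

# "Aa" and "B@" collide: 65*33 + 97 == 66*33 + 64 == 2242.
# Concatenations of colliding pairs collide too ("AaAa", "AaB@",
# "B@Aa", "B@B@", ...), so an attacker can cheaply generate 2**n
# keys that all fall into the same bucket, turning lookups into a
# linked-list scan.
printf "%d %d %d\n", hash33("Aa"), hash33("B\@"), hash33("AaB\@");
```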

To be attackable, an application must have a deterministic or
predictable hash function and accept untrusted input. In general, for
the attack to be significant, the application must be willing and able
to accept hundreds to tens of thousands of 'attack inputs'. Because of
that requirement, it is difficult to judge the impact of these attacks
without knowing the source code extremely well, and knowing all the
ways in which a program is used.

As part of this project, I have examined perl versions 5.6.1 and
5.8.0, and have attacked the hash function. Thus any script that
hashes untrusted input is vulnerable to our attack. Furthermore, the
structure of the hash functions allows our fast collision generation
algorithm to work. This means that any script written in perl that
hashes a large number of keys from an untrusted source is potentially
subject to a severe performance degradation. Our paper shows a 2000x
degradation when hashing 90,000 strings.

Depending on the application or script, this could be a critical DoS.

The solution for these attacks on hash tables is to make the hash
function unpredictable via a technique known as universal
hashing. Universal hashing is a keyed hash function where, based on
the key, one of a large set of hash functions is chosen. When
benchmarking, we observe that for short or medium length inputs, it is
comparable in performance to simple predictable hash functions such as
the ones in Python or Perl. Our paper has graphs and charts of our
benchmarked performance.
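A minimal sketch of the idea (a toy almost-universal construction for illustration, not the UHASH library from the paper): each byte position is multiplied by an independent secret random coefficient modulo a prime, so for any fixed pair of distinct inputs the collision probability is about 1/p regardless of what the attacker sends.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy almost-universal hash: a secret vector of random coefficients,
# drawn once per process, is the "key" that selects one function from
# the family. Illustration only -- not the UHASH library.
my $p    = 2**31 - 1;                               # prime modulus
my @coef = map { 1 + int rand($p - 1) } 1 .. 256;   # secret key

sub uhash {
    my ($s) = @_;
    my ($h, $i) = (0, 0);
    for my $c (split //, $s) {
        # inner product of the input bytes with the secret coefficients
        $h = ($h + $coef[$i++ % 256] * ord $c) % $p;
    }
    return $h;
}

# The same process always hashes a string to the same bucket ...
print uhash("foo") == uhash("foo") ? "stable\n" : "broken\n";
# ... but an attacker who cannot learn @coef cannot precompute
# collisions, and each fresh run draws a fresh key.
```

(Reusing the 256 coefficients cyclically weakens the guarantee for long inputs; a real implementation such as UHASH handles arbitrary lengths properly.)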

I highly advise using a universal hashing library, either our own or
someone else's. As history shows, it is very easy to make silly
mistakes when attempting to implement your own 'secure' algorithm.

The abstract, paper, and a library implementing universal hashing are
available at http://www.cs.rice.edu/~scrosby/hash/.

Scott

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jun 17, 2003

From @jhi

Thanks for your extensive research on this. We have actually tinkered away
at better hash functions for quite a while inside Perl, but so far we have mostly
been worried about (1) speed (2) good spreading (3) speed (4) did I mention speed?
DoS has not been our worry so far.

I am by no means a cryptographer or a (discrete) mathematician, but before
making a (yet another) change of the hash function, I would definitely like to
learn a bit more of the art and our options and trade-offs. With my tongue
firmly in my cheek, why should we trust *your* universal hashing function? :-)

A quick googling (for "fast universal hashing") for example shows Krovetz
and Rogaway's "Fast universal hashing with small keys and no preprocessing:
the PolyR construction" from 2000. Since the speed of one of Perl's most
basic data structures, ta-dah-- hashes, depends on the speed of the hash
function, I really would like the hash function to be as fast as possible.
If we are "secure" against evil hash keys, but slow as a sloth, I don't
know that we did much good.

As you point out, there are other ways to "DoS the language", either by
constructing degenerate data for qsort (the latest Perl does not use qsort
anymore, however, but instead mergesort, which should be rock solid against
bad data), or especially by using exponential regular expressions. (Though
of course that is usually more likely to be caused by a silly programmer
than by malicious data.)

I am currently (fervently) hoping to get Perl 5.8.1 out soon(ish), and I don't know
that I want to start changing the hash function at this late a stage. We currently
use Bob Jenkins' algorithm, so in theory we could go for keying that (pseudo)randomly
at Perl startup, but you advise against that kind of ad-hoc tweaking, understandably.

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jun 18, 2003

From scrosby@cs.rice.edu

On 17 Jun 2003 19:46:38 -0000, Jarkko Hietaniemi (via RT) <perlbug-followup@perl.org> writes:

> Thanks for your extensive research on this. We have actually
> tinkered away at better hash functions for quite a while inside
> Perl, but so far we have mostly been worried about (1) speed (2)
> good spreading (3) speed (4) did I mention speed? DoS has not been
> our worry so far.
>
> I am by no means a cryptographer or a (discrete) mathematician, but
> before making a (yet another) change of the hash function, I would
> definitely like to learn a bit more of the art and our options and
> trade-offs. With my tongue firmly in my cheek, why should we trust
> *your* universal hashing function? :-)

No need at all. A full survey of universal hashing was beyond the
scope of the paper. My advice is to wait a few months, after I present
at USENIX. With luck, I'll either learn about new fast functions at
that conference, which I'll reference on the web page, or it may focus
the academic community into either better universal hash functions, or
sufficiently strong functions for pragmatic programs.

> A quick googling (for "fast universal hashing") for example shows
> Krovetz and Rogaway's "Fast universal hashing with small keys and
> no preprocessing: the PolyR construction" from 2000. Since the
> speed of one of Perl's most basic data structures, ta-dah-- hashes,
> depends on the speed of the hash function, I really would like the
> hash function to be as fast as possible. If we are "secure" against
> evil hash keys, but slow as a sloth, I don't know that we did much
> good.

Actually, I did a quick retrofit of perl to use the UHASH library, and
got about a 10% speedup on inserting one of the test files. (I think
that was my P4 laptop; I think my P2-450 got a 15% slowdown.) I advise
benchmarking it on real software; I've found microbenchmarks can be
very misleading. Also, the paper shows that the difference between the
hash functions we tested is almost completely dominated by the cost of
an L2 cache miss. On the microbenchmark, XOR drops from being 6x as
fast to only a bit faster.

> As you point out, there are other ways to "DoS the language",
> either by constructing degenerate data for qsort (the latest Perl
> does not use qsort anymore, however, but instead mergesort, which
> should be rock solid against bad data),

Mergesort should be good.

> and especially by using exponential regular expressions. (Though of
> course that is usually more likely to be caused by a silly
> programmer than by malicious data.)

I have yet to find or successfully design an exponential regular
expression, but identifying regular expressions that may, on
'carefully chosen' input, exhibit exponential, quartic, cubic, or
quadratic behavior is a topic I am actively researching and currently
seeking related work on. I would like to identify, for any regular
expression, the worst-case input and the cost of that input.
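As a concrete instance of what such a search is after (a textbook example, not one from this thread): nested or overlapping quantified alternatives can force a backtracking engine such as perl's to explore exponentially many ways of partitioning the input before a match can fail.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classic bad case for backtracking engines: on a run of a's with a
# trailing "b", /^(a|aa)+$/ must try every way of splitting the run
# into one-a and two-a chunks -- on the order of Fibonacci(n)
# attempts -- before it can report failure.
my $n = 20;                   # keep small; the cost grows exponentially
my $s = ("a" x $n) . "b";
my $r = $s =~ /^(a|aa)+$/ ? "matched" : "no match";
print "$r\n";
```

Each extra 'a' multiplies the work by roughly the golden ratio, so moderately longer inputs take seconds and then hours.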

Would you happen to have any good references on perl's regexp engine
design, or information related to bad-case regular expressions?

> I am currently (fervently) hoping to get Perl 5.8.1 out soon(ish),
> and I don't know that I want to start changing the hash function at
> this late a stage.

One thing to keep in mind is that someone else showed on perlmonks[1]
how a carefully chosen CGI POST request would consume about three
minutes of CPU time with 10k inputs; 250kb of data. Were they to send
40k inputs, it would take almost an hour of CPU for the module to
process that megabyte. This is rather severe.

> We currently use Bob Jenkins' algorithm, so in theory we could go
> for keying that (pseudo)randomly at Perl startup, but you advise
> against that kind of ad-hoc tweaking, understandably.

I do, but Jenkins's hash with a random secret key is what the Linux
kernel uses. I know of no guaranteed security properties of the hash
function. I suspect it may have statistical weaknesses; however, they
probably don't matter in this case. It is probably 'good enough'.

If you think the above issue with CGI.pm is severe enough to warrant
5.8.1 being less vulnerable, may I suggest using a keyed variant of
Jenkins's or ---if you want provable security---a universal hash as a
short-term fix, with the possibility or expectation of replacing it
later with something different after considering all of the
alternatives? One possibility for later is to change the
infrastructure to allow a script to declare whether or not it wants to
be resource-bounded on regexp matching and to use a secure hash
function.

Scott

[1] http://www.perlmonks.org/index.pl?node_id=262468

@p5pRT p5pRT commented Jun 18, 2003

From @jhi

Attached is a patch that patches a current development version of Perl
(or 5.8.1-to-be) to pseudorandomly key the "Bob Jenkins" algorithm
currently used. It seems to foil the hash "exploit code" (blowout.pl),
unsurprisingly. That's the good news. The bad news is that
Data::Dumper (a standard Perl module) now gives different results for
different runs of Perl, because the output depends on the ordering of
the hash keys. I managed to munge the regression suite of Data::Dumper
to work with this (amusingly, I used the same trick as we used for
EBCDIC... the curious will check the patch to see how-- the downside
is that the test is not quite as exact anymore).

The neat effect is that now this:

./perl -le '@a="a".."z";@a{@a}=();print keys %a'

gives a different result for each run. That _really_ should teach people
not to expect any particular ordering of hash keys...

If one needs to emulate the old behaviour, one can set the environment
variable PERL_HASH_INIT.
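For scripts that were (wrongly) relying on a stable key order, the portable remedy is to impose an explicit order, rather than pinning the seed with PERL_HASH_INIT:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Key order out of a hash is unspecified, and with a randomized seed
# it changes between runs; sorting restores a deterministic order.
my %a;
@a{'a' .. 'z'} = ();
print join("", sort keys %a), "\n";   # same output on every run
```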

One can get a 5.8.1-to-be snapshot at least for a while as described here:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-06/msg00475.html

@p5pRT p5pRT commented Jun 19, 2003

From scrosby@cs.rice.edu

On 18 Jun 2003 21:11:49 -0000, Jarkko Hietaniemi (via RT) <perlbug-followup@perl.org> writes:

> Attached is a patch that patches a current development version of Perl
> (or 5.8.1-to-be) to pseudorandomly key the "Bob Jenkins" algorithm
> currently used.

There's a typo on this line:

+hash keys would cause Perl to consume large amounts of time because
+internal structure of hashes would badly degenerate. In Perl 5.8. the
  ^^^^^^^
+hash function is randomly perturbed by a pseudorandom seed which makes

Other than that, the patch looks good.

Scott

@p5pRT p5pRT commented Jun 19, 2003

From dwallach@cs.rice.edu

Adding length restrictions to the CGI's input might cause problems for
people with large forms or those uploading files. If you do it, you
probably want to make it a configuration parameter that somebody can
turn on if/when they decide they absolutely need it.

Validating that all the form elements are actually desired by the
application before inserting them in a hash table (your %goodones
table) absolutely will solve the problem for you. The only question is
whether there are CGI scripts out there that expect form elements to
be there but which are never explicitly named (i.e., can you make sure
%goodones isn't missing anything important). You could potentially
break some existing scripts this way.
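The whitelist idea might look like the following sketch (the parameter names here are hypothetical; %goodones is the table Dan refers to):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical whitelist of the form fields this application expects.
my %goodones = map { $_ => 1 } qw(name email message);

# Copy only whitelisted parameters into the application's own hash,
# so at most a handful of attacker-chosen strings ever become keys.
sub filter_params {
    my ($raw) = @_;    # hashref of decoded CGI parameters
    return { map  { $_ => $raw->{$_} }
             grep { $goodones{$_} } keys %$raw };
}

my $clean = filter_params({ name => "x", "evil-key" => "y" });
print join(",", sort keys %$clean), "\n";   # only "name" survives
```

Note the caveat above: this protects the application's own tables, but not whatever hash the raw parameters were first parsed into.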

The correct answer, of course, is to fix the underlying hash function.
As stop-gap measures, limiting the size of the input and limiting the
input values to desired ones will certainly help address the problem.

Thanks,

Dan

@p5pRT p5pRT commented Jun 20, 2003

From scrosby@cs.rice.edu

On Thu, 19 Jun 2003 17:05:19 +0200 (CEST), Tels <perl_dummy@bloodgate.com> writes:

> Somebody should probably also forward this to the group which is
> re-writing the Matt Wright (sp?) CGI script collection. I forgot
> their URL and name... getting old.

The purpose of research is to generate and then disseminate
results. If you can identify anyone else who would be interested in
the paper, go ahead and inform them.

> * length restriction: when taking parameters from the user/browser/client,
> restrict the length of the data. For instance, a simple formmail script
> doesn't need to accept 40 Meg of data, when all it does is relay a short
> text message from a website user.

The catch is that a total input as low as 250kb is enough to DoS a
server for several minutes. A 1MB threshold is enough for almost an
hour. And the server cannot know what a correct limit should be for
all scripts. A limit of 250kb is too small, for instance, to upload
many PDF files.

> Drawbacks: the length reduction should be done *outside* the cgi script.
> Easy if you wrote your own server using Net::Server, hard if it is a
> script simply running under Apache. The reason is that once you have
> constructed a Perl scalar, you might already be running out of memory.
> OTOH, restricting the length to something like 10 Kbyte thwarts this
> attack effectively, right?

Perhaps. 10 KByte means at most a thousand or so attack inputs. Cost
to the victim would be greatly reduced; at most only a few seconds.
Cost to the attacker to generate them is moderate, only a week or so
of CPU time. Still a relatively low-bandwidth attack, 10KB of data to
DoS a server for a couple of seconds, but much less interesting.

> * strict parameter checking: While I already implemented this (in a way
> that would be too complicated to explain on this margin here), the problem
> is that first all parameters are put into a hash, and then checked. The
> attack however works when the data goes into the hash, so *boom*

This also works, but requires changing the API.

> That would avoid the attack, right?

Probably yes, at least within the CGI script. However, if the
parameters were inserted into a different hash table (say, in the
webserver), then all bets are off.

Scott

@p5pRT p5pRT commented Sep 4, 2003

From @jhi

Since the hash randomization is in for Perl 5.8.1, I'm marking the problem ticket as
resolved.

@p5pRT p5pRT commented Sep 4, 2003

@jhi - Status changed from 'new' to 'resolved'

@p5pRT p5pRT closed this Sep 4, 2003