Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibly DoS #198

Closed
himura opened this issue Apr 28, 2014 · 6 comments · Fixed by #202
Closed

Possibly DoS #198

himura opened this issue Apr 28, 2014 · 6 comments · Fixed by #202

Comments

@himura
Copy link

himura commented Apr 28, 2014

When parsing untrusted document, FromJSON instances of integer types, is not safe.

It is not surprising that the FromJSON Integer is not safe, but it produce the same result even in Int, Int32 and Int64.

The below example eats a lot of memory, and killed by OOM Killer.

import Data.Aeson
import Data.Int

main = do
  let Just [n] = decode "[1e1000000000]" :: Maybe [Int32]
  print $ n > 0
@yuga
Copy link

yuga commented Apr 28, 2014

instance Read Integer had a similar issue with exponential notation. https://ghc.haskell.org/trac/ghc/ticket/5688

@bos
Copy link
Collaborator

bos commented Apr 28, 2014

I've reported this to @basvandijk as basvandijk/scientific#10.

@basvandijk
Copy link
Member

I'm on it.

This is caused by the aeson parser using rational which converts a Scientific value to the desired fractional value (in this case to a Scientific again) using realToFrac. This causes a big Integer to be allocated which will fill all memory.

To solve it, besides providing rational:

rational :: Fractional a => Parser a
rational = scientifically realToFrac

attoparsec should provide:

scientific :: Parser Scientific
scientific = scientifically id

which avoids using realToFrac so no big Integer is allocated.

Next aeson should use this parser instead of rational and in the FromJSON instances for bounded integrals (Int, Int8, Int16, Int32, Int64, Word, Word8, Word16, Word32 and Word64) it should detect when the exponent is too large and fail the parse in that case:

instance FromJSON Int where
    parseJSON = parseBoundedIntegral
    {-# INLINE parseJSON #-}

parseBoundedIntegral :: Integral a => Value -> Parser a
parseBoundedIntegral = withScientific "Integral" $ \s ->
    if Scientific.coefficient s == 0
      then pure 0
      else
        if Scientific.base10Exponent s > limit
          then fail "to big"
          else pure $ floor s
  where
    limit = 20 -- (10^20 :: Integer) > fromIntegral (maxBound::Word64)
{-# INLINE parseBoundedIntegral #-}

I will push some pull requests first thing tomorrow.

basvandijk added a commit to basvandijk/attoparsec that referenced this issue Apr 29, 2014
@basvandijk
Copy link
Member

@bos could you release attoparsec with the new scientific parser? Then we can patch aeson to use it instead of rational. This is the first step of fixing this bug.

basvandijk added a commit to basvandijk/aeson that referenced this issue Apr 30, 2014
Integral types can now be parsed safely (bounded space usage).

Parsing fractional types (Float and Double) still allows an attacker to
fill up the memory of the target system by supplying a number with a
large exponent. Fixing this is on my TODO list.
@basvandijk
Copy link
Member

I have fixed this for parsing Integral types in my fix-DoS branch.

Note that parsing Fractional types (Float and Double) still allows an attacker to fill up the memory of the target system.

Some interesting facts discovered during the process of fixing this:

  • This bug has existed since aeson-0.2.0.0. Basically since the release that introduced the Number type. So it's not caused by the move to Scientific.

  • I found a related bug in GHC(i): When you type in a Rational literal with a big exponent like:

    > let e = 1e1000000 :: Rational
    ...

    GHCi will quickly fill up all memory! This is caused by the code generator generating code for creating a literal Integer with the value 1*10^1000000. You can see the bytecode generated by invoking GHCi with -ddump-bcos. A similar problem occurs when compiling a program with GHC.

So the last thing remaining is fixing the parser for fractional values.

@basvandijk
Copy link
Member

This should be fixed by #202.

mamash pushed a commit to TritonDataCenter/pkgsrc-wip that referenced this issue Sep 6, 2014
changelog:
0.12.1.2

* Fixed the incorrect tracking of capacity if the initial buffer was
  empty (haskell/attoparsec#75)

0.12.1.1

* Fixed a data corruption bug that occurred under some circumstances
  if a buffer grew after prompting for more input
  (haskell/attoparsec#74)

0.12.1.0

* Now compatible with GHC 7.9

* Reintroduced the Chunk class, used by the parsers package

0.12.0.0

* A new internal representation makes almost all real-world parsers
  faster, sometimes by big margins.  For example, parsing JSON data
  with aeson is now up to 70% faster.  These performance improvements
  also come with reduced memory consumption and some new capabilities.

* The new match combinator gives both the result of a parse and the
  input that it matched.

* The test suite has doubled in size.  This made it possible to switch
  to the new internal representation with a decent degree of
  confidence that everything was more or less working.

* The benchmark suite now contains a small family of benchmarks taken
  from real-world uses of attoparsec.

* A few types that ought to have been private now are.

* A few obsolete modules and functions have been marked as deprecated.
  They will be removed from the next major release.

0.11.3.0

* New function scientific is compatible with rational, but parses
  integers more efficiently (haskell/aeson#198)

0.11.2.0

* The new Chunk typeclass allows for some code sharing with Ed
  Kmett's parsers package: http://hackage.haskell.org/package/parsers

* New function runScanner generalises scan to return the final state
  of the scanner as well as the input consumed.
basvandijk added a commit that referenced this issue May 11, 2015
The old rational parser applied realToFrac (fromRational . toRational)
to the Scientific number to construct a new Scientific number. This had
the disadvantage that scientific numbers with big exponents like
1e1000000000 are converted from (Scientific 1 1000000000) to
(Scientific (1*10^1000000000) 0). If the Integer coefficient of the
latter is later evaluated it will allocate all memory.

This is the first step needed to fix #198.
basvandijk added a commit that referenced this issue May 11, 2015
Integral types can now be parsed safely (bounded space usage).

Parsing fractional types (Float and Double) still allows an attacker to
fill up the memory of the target system by supplying a number with a
large exponent. Fixing this is on my TODO list.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants