Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Denial of service when parsing JSON object with keys that have the same hash code #314

Open
plokhotnyuk opened this issue Oct 22, 2018 · 3 comments

Comments

@plokhotnyuk
Copy link

commented Oct 22, 2018

Sub-quadratic decreasing of throughput when number of JSON object fields (with keys that have the same hash code) is increasing

On contemporary CPUs parsing of such JSON object (with a sequence of 100000 fields like below that is ~1.6Mb) can took more than 100 seconds:

{
"!!sjyehe":null,
"!!sjyeiF":null,
"!!sjyej'":null,
"!!sjyfIe":null,
"!!sjyfJF":null,
...
}

Below are results of the benchmark where size is a number of such fields:

[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                         (size)  (value)   Mode  Cnt        Score       Error  Units
[info] ExtractFieldsBenchmark.readArgonaut       1     null  thrpt    5  969947.719 ± 47977.521  ops/s
[info] ExtractFieldsBenchmark.readArgonaut      10     null  thrpt    5  220650.585 ± 27891.347  ops/s
[info] ExtractFieldsBenchmark.readArgonaut     100     null  thrpt    5    7580.773 ±  1298.528  ops/s
[info] ExtractFieldsBenchmark.readArgonaut    1000     null  thrpt    5      71.919 ±     7.111  ops/s
[info] ExtractFieldsBenchmark.readArgonaut   10000     null  thrpt    5       0.444 ±     0.089  ops/s
[info] ExtractFieldsBenchmark.readArgonaut  100000     null  thrpt    5       0.010 ±     0.002  ops/s

Reproducible Test Case

To run that benchmarks on your JDK:

  1. Install latest version of sbt and/or ensure that it already installed properly:
sbt about
  1. Clone jsoniter-scala repo:
git clone https://github.com/plokhotnyuk/jsoniter-scala.git
  1. Enter to the cloned directory and checkout for the specific branch:
cd jsoniter-scala
git checkout argonaut-DoS-by-hashmap-collisions
  1. Run benchmarks using a path parameter to your JDK:
sbt -no-colors 'jsoniter-scala-benchmark/jmh:run -jvm /usr/lib/jvm/jdk-11/bin/java -wi 3 -i 5 .*ExtractFieldsBench.*Argonaut.*'
@kevinmeredith

This comment has been minimized.

Copy link
Contributor

commented Oct 24, 2018

@plokhotnyuk - I wasn't able to reproduce it in:

$amm
Loading...
Welcome to the Ammonite Repl 1.0.3
(Scala 2.12.4 Java 1.8.0_121)
If you like Ammonite, please support our development at www.patreon.com/lihaoyi

@ import $ivy.`io.argonaut::argonaut:6.2.2` 
import $ivy.$                            

@ import argonaut._, Argonaut._ 
import argonaut._, Argonaut._

@ case class Foo(num: BigDecimal) 
defined class Foo

@ val str = """{ "num" : 1e1000000000}""" 
str: String = "{ \"num\" : 1e1000000000}"

@ val json = str.parse 
json: Either[String, Json] = Right(JObject(JsonObjectInstance(Map("num" -> JNumber(JsonDecimal("1e1000000000"))), Vector("num"))))

@ val jsonUnsafe = json.right.get 
jsonUnsafe: Json = JObject(JsonObjectInstance(Map("num" -> JNumber(JsonDecimal("1e1000000000"))), Vector("num")))

@ DecodeJson.derive[Foo].decodeJson(jsonUnsafe) 
res14: DecodeResult[Foo] = DecodeResult(Right(Foo(1E+1000000000)))
@kevinmeredith

This comment has been minimized.

Copy link
Contributor

commented Oct 24, 2018

Could you please tell me how to, perhaps, modify my example to reproduce the error that you raised in this issue?

@plokhotnyuk

This comment has been minimized.

Copy link
Author

commented Oct 24, 2018

@kevinmeredith It is a case about collisions in a hash map that is used for internal representation of JSON object like here:
playframework/play-json#186
spray/spray-json#277
json4s/json4s#553

Please start from reproducible steps that are in description of the issue, then look in to the benchmark code to see how that messages were build or just add a printing statement to see how them look like.

But, the case that was mentioned by you is also unsafe for users of the parsed message... Just try to pass the parsed value of that big decimal number to the following function and see what will happen:

scala> val f = (x: BigDecimal) => x + 1
f: BigDecimal => scala.math.BigDecimal = $$Lambda$1210/93981118@790ac3e0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.