Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scala 3 parser can be DoS'ed by JSON object with keys that have the same hash code #416

Closed
plokhotnyuk opened this issue Nov 21, 2022 · 4 comments

Comments

@plokhotnyuk
Copy link
Contributor

Sub-quadratic decreasing of Scala 3 parser throughput when number of JSON object fields (with keys that have the same hash code) is increasing

On contemporary CPUs parsing of such JSON object (with a sequence of 100000 fields like below that is ~1.6Mb) can took more than 5 minutes for Scala 3 parser:

{
"!!sjyehe":null,
"!!sjyeiF":null,
"!!sjyej'":null,
"!!sjyfIe":null,
"!!sjyfJF":null,
...
}

Below are results of the benchmark where size is a number of such fields:

[info] Benchmark                           (size)   Mode  Cnt        Score   Error  Units
[info] ExtractFieldsReading.uPickle             1  thrpt        690803.518          ops/s
[info] ExtractFieldsReading.uPickle            10  thrpt        119127.307          ops/s
[info] ExtractFieldsReading.uPickle           100  thrpt          8654.470          ops/s
[info] ExtractFieldsReading.uPickle          1000  thrpt           118.763          ops/s
[info] ExtractFieldsReading.uPickle         10000  thrpt             0.885          ops/s
[info] ExtractFieldsReading.uPickle        100000  thrpt             0.003          ops/s

uPickle version

2.0.0

Scala version

3.2.1

JDK version

11+

Steps to reproduce

To run that benchmarks on your JDK:

  1. Install latest version of sbt and/or ensure that it already installed properly:
sbt about
  1. Clone jsoniter-scala repo:
git clone --depth 100 https://github.com/plokhotnyuk/jsoniter-scala.git
  1. Enter to the cloned directory:
cd jsoniter-scala
  1. Run benchmarks:
sbt 'jsoniter-scala-benchmarkJVM/jmh:run -wi 1 -i 1 ExtractFieldsReading.uPickle'
@lolgab
Copy link
Member

lolgab commented Dec 12, 2022

Here a self-contained reproduction copy pasted from the jsoniter-scala benchmark:

import upickle.default._

object Main {
  def zeroHashCodeStrings: Iterator[String] = {
    def charAndHash(h: Int): Iterator[(Char, Int)] = ('!' to '~').iterator.map(ch => (ch, (h + ch) * 31))

    for {
      (ch0, h0) <- charAndHash(0)
      (ch1, h1) <- charAndHash(h0)
      (ch2, h2) <- charAndHash(h1) if ((h2 + 32) * 923521 ^ (h2 + 127) * 923521) < 0
      (ch3, h3) <- charAndHash(h2) if ((h3 + 32) * 29791 ^ (h3 + 127) * 29791) < 0
      (ch4, h4) <- charAndHash(h3) if ((h4 + 32) * 961 ^ (h4 + 127) * 961) < 0
      (ch5, h5) <- charAndHash(h4) if ((h5 + 32) * 31 ^ (h5 + 127) * 31) < 0
      (ch6, h6) <- charAndHash(h5) if (h6 + 32 ^ h6 + 127) < 0
      (ch7, _) <- charAndHash(h6) if h6 + ch7 == 0
    } yield new String(Array(ch0, ch1, ch2, ch3, ch4, ch5, ch6, ch7))
  }

  val jsonString =
    zeroHashCodeStrings
      .map(s => ujson.write(s))
      .take(1000000)
      .mkString("{", s":null,", ":null}")

  case class Foo()
  implicit val rw: ReadWriter[Foo] = macroRW[Foo]

  def main(args: Array[String]): Unit = {
    // in any Scala version
    ujson.read(jsonString)
    // Scala 3 only
    upickle.default.read[Foo](jsonString)
  }
}

lihaoyi added a commit that referenced this issue Feb 27, 2023
It appears the pathological behavior is gone, and this test can now pass
successfully on both Scala 2 and Scala 3.

It takes ~2x as long on Scala 3 as Scala 2 (~16s vs ~8s), not obvious
why, but at least the "hangs forever" behavior seems to be gone
@lihaoyi lihaoyi closed this as completed Feb 27, 2023
@lolgab
Copy link
Member

lolgab commented Feb 27, 2023

It would be nice if we could fix the problem for ujson as well. We need to use a Map implementation which is not vulnerable to this attack. This is a great time, since it might be a binary incompatible change. Should I open another issue for ujson?

@lihaoyi
Copy link
Member

lihaoyi commented Feb 27, 2023

sure open an issue. Not sure how feasible it is to change though, we would lose the ordering guarantees that LinkedHashMap gives us which I'm not convinced is a good tradeoff for this edge case. But open an issue and we can discuss there

@plokhotnyuk
Copy link
Contributor Author

plokhotnyuk commented Feb 27, 2023

@lihaoyi @lolgab How about an option of using Java's LinkedHashMap with a Scala wrapper for that?

Circe and dijon JSON parsers use this approach, which also brings a bit greater performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants