CWE-1333: Inefficient Regular Expression Complexity #19513

@vivek807

Description

The product uses a regular expression with a worst-case computational complexity that is inefficient and possibly exponential.

Summary

A user with DATASOURCE WRITE permission can hang Overlord worker threads indefinitely with a single sampler request, degrading or denying service on the cluster control plane.

Root cause

Line numbers pinned to druid-31.0.2@230605ec33db326c37154a03bcc4edfccc40203b.

processing/src/main/java/org/apache/druid/data/input/impl/RegexInputFormat.java:50-60:

public RegexInputFormat(
    @JsonProperty("pattern") String pattern,
    @JsonProperty("listDelimiter") @Nullable String listDelimiter,
    @JsonProperty("columns") @Nullable List<String> columns
)
{
  this.pattern = pattern;
  this.listDelimiter = listDelimiter;
  this.columns = columns;
  this.compiledPatternSupplier = Suppliers.memoize(() -> Pattern.compile(pattern));
}

RegexInputFormat compiles the user-supplied pattern (the @JsonProperty "pattern" value) with no limit on complexity or length and applies Matcher.matches() to every input line. The sampler runs in the Overlord JVM (CliOverlord.java:460). TimedShutoffInputSourceReader checks its volatile closed flag only at iterator boundaries (:89-101), so it cannot interrupt a Matcher.matches() call that is already in progress. The attacker also controls timeoutMs and can set it to 0.
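To illustrate why a flag that is read only between items cannot stop work already under way, here is a minimal sketch. It is not Druid code: the class BoundaryCheckSketch, the AtomicBoolean flag, and the event log are hypothetical stand-ins for the closed flag and the per-line parse described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; names do not correspond to Druid classes.
final class BoundaryCheckSketch {

  static List<String> run() {
    AtomicBoolean closed = new AtomicBoolean(false); // stand-in for the shutoff flag
    List<String> events = new ArrayList<>();
    Iterator<String> lines = Arrays.asList("line-1", "line-2").iterator();

    Iterator<String> parsed = new Iterator<String>() {
      @Override
      public boolean hasNext() {
        // The flag is consulted only here, at the iterator boundary.
        events.add("boundary check, closed=" + closed.get());
        return !closed.get() && lines.hasNext();
      }

      @Override
      public String next() {
        String line = lines.next();
        events.add("parse start " + line);
        // Simulate the timeout firing while the per-line parse is running.
        closed.set(true);
        // Nothing interrupts the parse; it runs until it returns on its own.
        // If this step were a regex match that never finishes, the next
        // boundary check would never be reached.
        events.add("parse end " + line);
        return line;
      }
    };

    while (parsed.hasNext()) {
      parsed.next();
    }
    return events;
  }

  public static void main(String[] args) {
    for (String event : run()) {
      System.out.println(event);
    }
  }
}
```

In this sketch the flag flips in the middle of the first parse, yet the parse still completes, and the flag takes effect only at the following hasNext() call.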

Exploit scenario (static hypothesis, unverified):
A DATASOURCE WRITE user POSTs a sampler spec with InlineInputSource.data='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaX', RegexInputFormat.pattern='^(.*a){20}$', and timeoutMs=0. The Overlord thread handling the request enters catastrophic backtracking and never returns to the iterator boundary. A few concurrent requests exhaust the Jetty thread pool.
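The backtracking cost can be observed in isolation, outside Druid, with a small probe. The sketch below is an assumption-laden illustration rather than a reproduction of the scenario: BacktrackingProbe and CountingSequence are hypothetical names, and the repetition count is lowered from 20 to 5 so the match terminates quickly. It counts the charAt calls java.util.regex makes while failing to match, which avoids relying on wall-clock timing.

```java
import java.util.regex.Pattern;

// Hypothetical probe; not part of Druid.
final class BacktrackingProbe {

  /** CharSequence wrapper that counts how often the regex engine reads a character. */
  static final class CountingSequence implements CharSequence {
    private final CharSequence delegate;
    long reads = 0;

    CountingSequence(CharSequence delegate) {
      this.delegate = delegate;
    }

    @Override
    public char charAt(int index) {
      reads++;
      return delegate.charAt(index);
    }

    @Override
    public int length() {
      return delegate.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
      return new CountingSequence(delegate.subSequence(start, end));
    }

    @Override
    public String toString() {
      return delegate.toString();
    }
  }

  /** Matches '^(.*a){repetitions}$' against inputLength 'a' characters followed by 'X'. */
  static long countReads(int repetitions, int inputLength) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < inputLength; i++) {
      sb.append('a');
    }
    sb.append('X'); // trailing character that forces the overall match to fail

    CountingSequence input = new CountingSequence(sb.toString());
    Pattern pattern = Pattern.compile("^(.*a){" + repetitions + "}$");
    if (pattern.matcher(input).matches()) {
      throw new IllegalStateException("expected the match to fail");
    }
    return input.reads;
  }

  public static void main(String[] args) {
    // Small sizes only: larger repetition counts or inputs may take very long.
    for (int n = 8; n <= 24; n += 4) {
      System.out.println(n + " x 'a' + 'X': " + countReads(5, n) + " charAt calls");
    }
  }
}
```

Running main with these reduced parameters should show the read count growing much faster than the input length. The same wrapper idea, with charAt checking a budget or a deadline, is one commonly used way to make a match abortable, though whether that suits Druid is outside what this report establishes.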

