Java stack overflow while matching cssUrlPattern #12

@sebastian-nagel

The ExtractingParseObserver causes a stack overflow while matching cssUrlPattern against a URL embedded in CSS as url(...) when the URL is quoted with 8192 single quotes before and after it. See wat_wet_stack_overflow_test.warc.gz for the problematic WARC record.

java.lang.StackOverflowError
        at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4182)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4785)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
        at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
        at java.util.regex.Pattern$Ques.match(Pattern.java:4182)
... (16000 lines stripped)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4604)
        at java.util.regex.Pattern$Start.match(Pattern.java:3461)
        at java.util.regex.Matcher.search(Matcher.java:1248)
        at java.util.regex.Matcher.find(Matcher.java:637)
        at java.util.regex.Matcher.replaceAll(Matcher.java:951)
        at org.archive.resource.html.ExtractingParseObserver.patternCSSExtract(ExtractingParseObserver.java:485)
        at org.archive.resource.html.ExtractingParseObserver.handleStyleNode(ExtractingParseObserver.java:233)
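
The repeating frames in the trace (BmpCharProperty, Ques, GroupHead, Loop, GroupTail) suggest that the pattern contains a repeated group around an optional character, which java.util.regex evaluates recursively, so every extra quote adds stack frames. One possible mitigation, sketched below, is to consume the run of quotes with a possessive character-class quantifier, which is matched in a loop rather than by recursion. The class name, method name, and the pattern itself are hypothetical illustrations and are not the actual cssUrlPattern; the sketch also assumes the URL contains no whitespace, quotes, or backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the code in ExtractingParseObserver.
class CssUrlSketch {
    // Illustrative pattern (an assumption, NOT the real cssUrlPattern).
    // "[\\"']*+" is a possessive quantifier on a single character class:
    // it consumes the whole run of quotes/backslashes iteratively and
    // never backtracks into it, so no stack frame is added per quote.
    static final Pattern URL = Pattern.compile(
        "url\\s*+\\(\\s*+[\\\\\"']*+([^\\\\\"')\\s]*+)[\\\\\"']*+\\s*+\\)",
        Pattern.CASE_INSENSITIVE);

    // Returns the first url(...) target with surrounding quotes removed,
    // or null if the CSS text contains no url(...) token.
    static String firstUrl(String css) {
        try {
            Matcher m = URL.matcher(css);
            return m.find() ? m.group(1) : null;
        } catch (StackOverflowError e) {
            // Defensive fallback: treat a pathological input as "no URL"
            // instead of letting the error abort the whole parse.
            return null;
        }
    }

    public static void main(String[] args) {
        // Build an input shaped like the one described in this issue.
        StringBuilder quotes = new StringBuilder();
        for (int i = 0; i < 8192; i++) {
            quotes.append('\'');
        }
        String css = "url(" + quotes + "a.png" + quotes + ")";
        System.out.println(firstUrl(css));
    }
}
```

Catching StackOverflowError is only a safety net here; the possessive quantifiers are what keep the recursion depth independent of the number of quotes.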
