Description
HtmlUnitNekoDOMBuilder causes an odd error in a highly concurrent environment, such as:
Cannot invoke "Object.hashCode()" because "k" is null
java.lang.NullPointerException: Cannot invoke "Object.hashCode()" because "k" is null
at org.htmlunit.cyberneko.util.FastHashMap.get(FastHashMap.java:92)
at org.htmlunit.cyberneko.HTMLElements.getElement(HTMLElements.java:644)
at org.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2134)
at org.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:914)
at org.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:336)
at org.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:294)
at org.htmlunit.cyberneko.xerces.parsers.AbstractXMLDocumentParser.parse(AbstractXMLDocumentParser.java:79)
at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.parse(HtmlUnitNekoDOMBuilder.java:757)
at org.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parse(HtmlUnitNekoHtmlParser.java:196)
at org.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:300)
at org.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:219)
at org.htmlunit.WebClient.loadWebResponseInto(WebClient.java:682)
at org.htmlunit.WebClient.loadWebResponseInto(WebClient.java:576)
at com.xceptance.xlt.engine.XltWebClient.loadWebResponseInto(XltWebClient.java:1007)
at org.htmlunit.WebClient.getPage(WebClient.java:494)
at org.htmlunit.WebClient.getPage(WebClient.java:403)
A review of the code shows that this cannot happen unless something is going on concurrently. HTMLElements is not read-only once created; it has state that changes (see getElement(final String ename, final Element element)). It updates its local caches to reduce lookup cost, but that state is not synchronized because synchronization would void the speed-up. So we should not share instances of HTMLElements across threads, but HtmlUnitNekoDOMBuilder does exactly that by declaring:
private static final HTMLElements HTMLELEMENTS;
private static final HTMLElements HTMLELEMENTS_WITH_CMD;
The static fields are also there for efficiency reasons, but in return they cause thread-safety issues.
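To make the pattern concrete, here is a minimal sketch. It is not HtmlUnit code: ElementLookup is a hypothetical stand-in for a lookup class that fills an unsynchronized cache on a miss, and PerThreadLookup is a hypothetical holder that shows what "not sharing across threads" could look like with a ThreadLocal. Whether one instance per thread is affordable for the real parser is an open question.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a lookup class with a self-updating, unsynchronized cache.
final class ElementLookup {
    private final Map<String, String> known = new HashMap<>();
    // Filled lazily, keyed by the tag name exactly as it appeared in the markup.
    private final Map<String, String> byOriginalCase = new HashMap<>();

    ElementLookup() {
        known.put("div", "DIV");
        known.put("span", "SPAN");
    }

    // Not thread-safe: a cache miss writes to byOriginalCase without synchronization,
    // so sharing one instance across threads can corrupt the map.
    String get(final String name) {
        String hit = byOriginalCase.get(name);
        if (hit == null) {
            hit = known.getOrDefault(name.toLowerCase(Locale.ROOT), "UNKNOWN");
            byOriginalCase.put(name, hit);
        }
        return hit;
    }
}

// One instance per thread: the cache stays unsynchronized but is never shared.
final class PerThreadLookup {
    private static final ThreadLocal<ElementLookup> LOOKUP =
            ThreadLocal.withInitial(ElementLookup::new);

    private PerThreadLookup() { }

    static ElementLookup current() {
        return LOOKUP.get();
    }
}
```

The trade-off in this sketch is memory and warm-up cost: every thread builds and fills its own cache instead of benefiting from a shared one.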
I don't have a solution yet, and a test case is hard (though possible) to write, but the code review alone was enough to find this. The more dynamic the HTML is, with elements whose tag names differ in casing, the more often this might happen.
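As a starting point for such a test case, a small stress harness could look like the sketch below. LookupStress and countFailures are hypothetical names, and the harness takes any lookup function rather than calling HtmlUnit directly. A real reproduction would have to pass in the shared parser lookup, would probably need many more distinct tag-name casings, and would still not be guaranteed to trigger the race on every run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical harness: hammers a lookup from several threads with mixed-case names.
final class LookupStress {
    private LookupStress() { }

    // Returns how many calls threw a RuntimeException (for example a NullPointerException).
    static int countFailures(final Function<String, ?> lookup,
                             final int threads,
                             final int callsPerThread) throws InterruptedException {
        final String[] names = {"div", "DIV", "Div", "dIv", "span", "SPAN", "Span", "sPaN"};
        final AtomicInteger failures = new AtomicInteger();
        final CountDownLatch start = new CountDownLatch(1);
        final ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int t = 0; t < threads; t++) {
                final int offset = t;
                pool.execute(() -> {
                    try {
                        start.await(); // release all workers together to maximize contention
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int i = 0; i < callsPerThread; i++) {
                        try {
                            lookup.apply(names[(i + offset) % names.length]);
                        } catch (RuntimeException e) {
                            failures.incrementAndGet();
                        }
                    }
                });
            }
            start.countDown();
        } finally {
            pool.shutdown();
        }
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return failures.get();
    }
}
```

With a thread-safe lookup the count should stay at zero; a non-zero count for a shared, unsynchronized lookup would confirm the problem, while a zero count would not prove its absence.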