I am seeing the following exception:

```
java.lang.ArrayIndexOutOfBoundsException: -1
	at net.sourceforge.htmlunit.cyberneko.HTMLScanner$CurrentEntity.read(HTMLScanner.java:1901)
	at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanAttribute(HTMLScanner.java:3075)
	at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanAttribute(HTMLScanner.java:2900)
	at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2747)
	at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2127)
	at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:937)
	at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:443)
	at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:394)
	at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.parse(HtmlUnitNekoDOMBuilder.java:758)
	at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parse(HtmlUnitNekoHtmlParser.java:236)
	at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parseHtml(HtmlUnitNekoHtmlParser.java:179)
	at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:280)
	at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:163)
```
It happens when parsing the following file: https://gist.github.com/jzheaux/18f32257c66a02f95c6f0f9243a913ae

Alternatively, this test reproduces it:
```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import net.sourceforge.htmlunit.cyberneko.HTMLConfiguration;
import org.apache.xerces.xni.parser.XMLInputSource;
import org.junit.Test; // JUnit 4 assumed

@Test
public void test() throws Exception {
    HTMLConfiguration htmlConfiguration = new HTMLConfiguration();
    // A long attribute value followed by "&fin", which is not a valid entity
    // reference, so the scanner reads ahead looking for an entity and then rewinds.
    String content = "<html blah=\"" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfunfun" +
        "funfunfun&fin\"></html>";
    InputStream byteStream = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    XMLInputSource inputSource = new XMLInputSource("", "", "", byteStream, "UTF-8");
    htmlConfiguration.parse(inputSource); // throws ArrayIndexOutOfBoundsException: -1
}
```
I believe the problem was introduced by this commit, which rewinds the scanner after it has read ahead looking for an entity.

If the rewind happens shortly after `fCurrentEntity` has refilled its buffer, the characters being rewound over are no longer in the buffer, so the rewind can go past the beginning and set `offset` to a negative value:
```java
if (match == null) {
    // we can't rewind if at EOF
    if (nextChar != -1) {
        final String consumed = str.toString();
        fCurrentEntity.rewind(consumed.length() - 1); // <-- here
        str.clear();
        str.append('&');
    }
}
```
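To make the failure mode concrete, here is a heavily simplified model. `RewindDemo`, `MiniEntity`, the 8-character buffer and the input string are all hypothetical, chosen for illustration and not taken from `HTMLScanner`: a reader whose buffer is overwritten on every refill, plus a rewind that subtracts without a bounds check. Reading ahead across a refill and then rewinding by the full read-ahead count leaves `offset` at -1, and the next `read()` throws `ArrayIndexOutOfBoundsException`. It is a sketch under those assumptions; it does not reproduce the exact `consumed.length() - 1` arithmetic of the real scanner.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical, heavily simplified model of a scanner entity with a fixed-size
// buffer. It is NOT the real HTMLScanner.CurrentEntity; it only mimics the
// combination "refill overwrites old contents" + "rewind without bounds check".
public class RewindDemo {

    static final class MiniEntity {
        private final Reader reader;
        private final char[] buffer;
        private int offset; // index of the next char to return
        private int length; // number of valid chars currently in the buffer

        MiniEntity(Reader reader, int bufferSize) {
            this.reader = reader;
            this.buffer = new char[bufferSize];
        }

        // Returns the next char; when the buffer is exhausted it is refilled
        // from index 0, overwriting the characters of the previous fill.
        int read() {
            if (offset == length) {
                int n;
                try {
                    n = reader.read(buffer, 0, buffer.length);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                offset = 0;
                length = Math.max(n, 0);
                if (n == -1) {
                    return -1;
                }
            }
            return buffer[offset++];
        }

        // Same unchecked subtraction as the reporter's rewind(int).
        void rewind(int i) {
            offset -= i;
        }

        int currentOffset() {
            return offset;
        }
    }

    // Consumes "funfun&" from the first 8-char fill, reads ahead 'f', 'i', 'n'
    // and the terminating '"' (the read of 'i' triggers a refill), then rewinds
    // by all 4 read-ahead chars although only 3 of them are still buffered.
    static MiniEntity readAheadAndRewind() {
        MiniEntity entity = new MiniEntity(new StringReader("funfun&fin\"></html>"), 8);
        for (int k = 0; k < 7; k++) {
            entity.read(); // "funfun&"
        }
        int readAhead = 0;
        int c;
        do {
            c = entity.read();
            readAhead++;
        } while (c != -1 && Character.isLetter(c));
        entity.rewind(readAhead);
        return entity;
    }

    public static void main(String[] args) {
        MiniEntity entity = readAheadAndRewind();
        System.out.println("offset after rewind: " + entity.currentOffset()); // prints -1
        try {
            entity.read();
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("read() failed: " + e);
        }
    }
}
```

The key point the model shows is that an index-based rewind is only safe if every character being rewound over is still in the current buffer, which a refill in the middle of the read-ahead breaks.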
This logic:

```java
private void rewind(int i) {
    offset -= i;
    characterOffset_ -= i;
    columnNumber_ -= i;
}
```

may be problematic, since nothing prevents `offset` from going negative; the next `read()` would then index the buffer at `-1`, which is consistent with the exception above.
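One general way to make rewinding independent of where buffer refills happen is to push the read-ahead characters back instead of moving an index, for example with `java.io.PushbackReader`, whose pushback buffer is separate from the underlying reader's. The sketch below only illustrates that technique; it is not how HtmlUnit-Neko does (or should) fix this. `PushbackSketch`, the four-entry entity table and the 32-character name limit are hypothetical, and `Map.of` requires Java 9+.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;

// Hypothetical sketch: decode a few entities, pushing unmatched read-ahead back
// into a PushbackReader so it is re-read as plain text. Not HtmlUnit-Neko code.
public class PushbackSketch {
    // Hypothetical stand-in for a real entity table.
    private static final Map<String, String> ENTITIES =
            Map.of("amp", "&", "lt", "<", "gt", ">", "quot", "\"");
    // Hypothetical cap on how far we read ahead after '&'.
    private static final int MAX_NAME = 32;

    static String decode(String input) {
        // Pushback capacity must cover the longest possible read-ahead
        // (MAX_NAME + 1 letters when the name is too long).
        PushbackReader in = new PushbackReader(new StringReader(input), MAX_NAME + 1);
        StringBuilder out = new StringBuilder();
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (c != '&') {
                    out.append((char) c);
                    continue;
                }
                // Read ahead: letters of a candidate name plus one terminating char.
                StringBuilder ahead = new StringBuilder();
                int next;
                while ((next = in.read()) != -1) {
                    ahead.append((char) next);
                    if (!Character.isLetter(next) || ahead.length() > MAX_NAME) {
                        break;
                    }
                }
                String s = ahead.toString();
                String name = s.endsWith(";") ? s.substring(0, s.length() - 1) : null;
                if (name != null && ENTITIES.containsKey(name)) {
                    out.append(ENTITIES.get(name));
                } else {
                    // No match: un-read everything after '&'. The next read() returns
                    // s.charAt(0) from the pushback buffer, independent of how the
                    // underlying reader is buffered.
                    in.unread(s.toCharArray());
                    out.append('&');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("a &amp; b"));            // prints: a & b
        System.out.println(decode("funfun&fin\"></html>")); // prints the input unchanged
    }
}
```

For input shaped like the reproducer, `&fin` does not match, so it is pushed back and re-read as text, and `decode` returns the input unchanged rather than failing.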