HtmlPage.save() does not create the top-level html when one of the resources comes from a blocked dns entry #57

Closed
lestephane opened this issue Jun 30, 2019 · 2 comments

Comments

@lestephane

I use a DNS blacklist to block various tracking services. It's the standard dnscrypt-proxy blacklist.

When I call save() on an HtmlPage that makes use of a resource hosted at one of the blacklisted domains, the save() call causes an UnknownHostException to be thrown. There is a directory for the page with resources in it, but the top-level html is not created.

I don't care about those trackers or any crap that would have been fetched from them; whatever is in the HtmlPage at that moment is good enough for me to save.

Is there a way to ignore such errors, and just save whatever the html page contains?

Exception in thread "main" java.net.UnknownHostException: googleads.g.doubleclick.net: Name or service not known
	at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)
	at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1515)
	at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)
	at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298)
	at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:394)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
	at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:193)
	at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1402)
	at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1321)
	at com.gargoylesoftware.htmlunit.html.HtmlImage.downloadImageIfNeeded(HtmlImage.java:534)
	at com.gargoylesoftware.htmlunit.html.HtmlImage.getWebResponse(HtmlImage.java:506)
	at com.gargoylesoftware.htmlunit.html.XmlSerializer.getAttributesFor(XmlSerializer.java:273)
	at com.gargoylesoftware.htmlunit.html.XmlSerializer.readAttributes(XmlSerializer.java:184)
	at com.gargoylesoftware.htmlunit.html.XmlSerializer.printOpeningTag(XmlSerializer.java:170)
	at com.gargoylesoftware.htmlunit.html.XmlSerializer.printXml(XmlSerializer.java:109)
	at com.gargoylesoftware.htmlunit.html.XmlSerializer.printXml(XmlSerializer.java:119)
	at com.gargoylesoftware.htmlunit.html.XmlSerializer.printXml(XmlSerializer.java:119)
	at com.gargoylesoftware.htmlunit.html.XmlSerializer.save(XmlSerializer.java:78)
	at com.gargoylesoftware.htmlunit.html.HtmlPage.save(HtmlPage.java:2246)
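
The behaviour asked for above (keep saving when one sub-resource cannot be fetched, and always write the top-level HTML) can be sketched in plain Java. This is only an illustration of the pattern, not HtmlUnit's code and not the fix that was later committed: `BestEffortSave`, `ResourceFetcher`, the `_files` directory suffix, and the `resourceN` file names are all hypothetical, and the fetcher in `main` simulates the blocked host instead of touching the network.

```java
import java.io.IOException;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BestEffortSave {

    /** Hypothetical stand-in for whatever downloads a sub-resource. */
    public interface ResourceFetcher {
        byte[] fetch(String url) throws IOException;
    }

    /**
     * Saves each resource that can be fetched, skips the ones that fail,
     * and writes the top-level HTML regardless. Returns the skipped URLs.
     */
    public static List<String> save(Path htmlFile, String html,
                                    List<String> resourceUrls,
                                    ResourceFetcher fetcher) throws IOException {
        List<String> skipped = new ArrayList<>();
        Path resourceDir = htmlFile.resolveSibling(htmlFile.getFileName() + "_files");
        Files.createDirectories(resourceDir);

        int index = 0;
        for (String url : resourceUrls) {
            try {
                byte[] body = fetcher.fetch(url);
                Files.write(resourceDir.resolve("resource" + index), body);
                index++;
            } catch (IOException e) {
                // UnknownHostException is an IOException: a blocked DNS entry
                // ends up here and only this one resource is skipped.
                skipped.add(url);
            }
        }

        // A failed resource never prevents the top-level file from being written.
        Files.write(htmlFile, html.getBytes(StandardCharsets.UTF_8));
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("save-demo");
        Path htmlFile = dir.resolve("page.html");

        // Simulated fetcher: the tracker host "fails to resolve", everything else succeeds.
        ResourceFetcher fetcher = url -> {
            if (url.contains("googleads.g.doubleclick.net")) {
                throw new UnknownHostException("googleads.g.doubleclick.net");
            }
            return new byte[] {1, 2, 3};
        };

        List<String> skipped = save(htmlFile, "<html><body>hi</body></html>",
                Arrays.asList("https://example.com/logo.png",
                              "https://googleads.g.doubleclick.net/ad.gif"),
                fetcher);

        System.out.println("HTML written: " + Files.exists(htmlFile));
        System.out.println("Skipped: " + skipped);
    }
}
```

Returning the skipped URLs rather than logging them leaves it to the caller to decide whether a missing tracker image matters.
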
@rbri
Member

rbri commented Jun 30, 2019

I think this is more or less a bug. Will work on this.

@rbri
Member

rbri commented Jul 2, 2019

This is fixed now. Many thanks for the report.
Will inform via https://twitter.com/HtmlUnit if a new snapshot is available.

@rbri rbri closed this as completed Jul 2, 2019