Closed
Description
Sample page from a site I am trying to scrape:
https://en.tutiempo.net/records/lemg/1-may-2023.html
Much of the HTML is loaded dynamically with JavaScript, but HtmlUnit should be able to handle that, right?
Inspecting the page in Chrome, I see hundreds of <td> elements.
But the code below prints an empty list.
Can anyone see what the issue is?
I am using htmlunit-2.70.0.
import java.net.URL;

import com.gargoylesoftware.htmlunit.HttpMethod;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Test {
    public static void main(String[] args) throws Exception {
        String url = "https://en.tutiempo.net/records/lemg/1-may-2023.html";
        URL u = new URL(url);
        HttpMethod m = HttpMethod.GET;
        WebRequest request = new WebRequest(u, m);

        WebClient webClient = new WebClient();
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setUseInsecureSSL(true);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setRedirectEnabled(true);

        HtmlPage page = webClient.getPage(request);
        System.out.println("page: " + page.getElementsByTagName("td"));
        webClient.close();
    }
}
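Chrome's inspector shows the DOM after scripts have run, so it does not tell whether the <td> cells are already in the server's response or only added later by JavaScript. A minimal sketch for checking that, using only the JDK's own HTTP client (Java 11 or newer) and no HtmlUnit at all, is below. The class name RawTdCount and the helper countOpeningTags are made up for this illustration, and the regex count is only a rough diagnostic, not a real HTML parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RawTdCount {

    // Hypothetical helper: counts opening tags such as <td> or <td class="x">
    // in raw markup. Closing tags and longer tag names (e.g. <tdx>) are not counted.
    static int countOpeningTags(String html, String tag) {
        Pattern p = Pattern.compile(
                "<" + Pattern.quote(tag) + "(\\s[^>]*)?>",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String url = "https://en.tutiempo.net/records/lemg/1-may-2023.html";

        // Plain HTTP fetch: no JavaScript is executed here.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("status: " + response.statusCode());
        System.out.println("<td> tags in raw response: "
                + countOpeningTags(response.body(), "td"));
    }
}
```

If the raw response contains no <td> tags, the cells are presumably created by scripts, and the empty list would point at the JavaScript side (timing or script errors) rather than at getElementsByTagName. If the raw response does contain them, the problem more likely lies in what the server returns to HtmlUnit or in how the page is parsed.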