ERROR: Couldn't resolve host name #13

gordchan · 2016-03-07T07:38:05Z

I had just installed this package, and tried to read a local html file:

require(htmltab)

file <- file.path("Test", "01.html")
test <- htmltab(file)

However the following error was returned:
'Error in curl::curl_fetch_memory(url, handle = handle) : Couldn't resolve host name'

I'm not sure why curl is involved. Did I miss something?

Thanks!

crubba · 2016-03-07T08:19:05Z

It certainly shouldn't use curl for that operation.
Does that error appear when reading other html files from the local hard drive? What's the value of file? Have you tried parsing the file first (XML::htmlParse(file)) and then passing it to htmltab?
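The checks asked about above can be sketched roughly as follows. This is only a minimal illustration, not a diagnosis: the path is the one from the original report, `which = 1` is an assumed table index, and nothing here confirms that a bad path is what triggers the curl error.

```r
# Path taken from the report above; adjust to your own directory layout.
file <- file.path("Test", "01.html")

print(file)               # the relative path as R sees it
print(getwd())            # the directory that relative path is resolved against
print(file.exists(file))  # FALSE means the path does not point at a file

if (file.exists(file)) {
  # Parse first, then hand the parsed document to htmltab.
  doc <- XML::htmlParse(file)
  tab <- htmltab::htmltab(doc, which = 1)  # which = 1 is an assumption
}
```

If `file.exists()` returns `FALSE`, the working directory of the R session (for example on a server) may differ from the one expected.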

gordchan · 2016-03-07T09:23:30Z

I found out that I can only read directly from a URL.

Even if I download the html from https://en.wikipedia.org/wiki/Demography_of_the_United_Kingdom and read from the local copy I would get the same error message.

I should mention that I am running R on RStudio Server, but I have never experienced this error before, not even with the XML package.

NB: I have never tried htmlParse before. I'll see if it works.

gordchan · 2016-03-08T09:05:14Z

Thanks for the tip Christian!

I parsed the HTML with htmlParse() before passing it to htmltab().
Now I can read the file without the curl error. However, I keep getting a strange dataset after reading in the HTML table.

If I run this HTML file (html.txt) through the following code:

        html2 <- file.path("Test", "html.html")
        parse2 <- htmlParse(html2)
        test2.xls <- htmltab(parse2, which = 4, header = 1:2)

All of the data columns are read in as column names.

crubba · 2016-03-09T00:23:37Z

So, the problem with this table is that it is not very well constructed. A row tag (tr) that opens in the beginning only closes at the very end, and this makes the job very hard for htmltab. I will be looking into ways to detect and correct such malformedness in the future. For the moment, you can suppress the construction of a header by setting header = 0; that should help a bit.

htmltab(parse2, which = 4, header = 0)

It still shreds the last column though. Maybe rvest's html_table function does a better job here.
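For comparison, a minimal sketch of the rvest route might look like this. The inline HTML and its values are made up so the snippet runs on its own, and the `datatable` id is only assumed to match the real file; whether rvest copes better with the malformed table is untested here.

```r
library(rvest)

# Small inline table standing in for the real file (made-up data).
html <- "<table id='datatable'>
  <tr><th>Year</th><th>Population</th></tr>
  <tr><td>2001</td><td>100</td></tr>
  <tr><td>2011</td><td>110</td></tr>
</table>"

# For a local file, use read_html(file.path("Test", "html.html")) instead.
page <- read_html(html)

# Select the table by XPath, then convert it to a data frame.
node <- html_node(page, xpath = "//table[@id='datatable']")
tab <- html_table(node, header = TRUE)
print(tab)
```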

gordchan · 2016-03-09T01:49:37Z

I see the problem. Thanks.

After some trial and error, I got the best result out of the HTML with this:

test2.xls <- htmltab(parse2, which = "//table[@id='datatable']", header = 1:2, complementary = FALSE)
names(test2.xls) <- c(1:5)

This is workable with some processing:

Thanks again :)
