error downloading multiple pages #16

Closed
hassannasir opened this issue May 25, 2019 · 1 comment

@hassannasir commented May 25, 2019

I am trying to scrape some pages with the help of the polite package and map(), but I am getting the following error:

[[1]]
{xml_document}
Error in nchar(desc) : invalid multibyte string, element 2

Also, instead of scraping all pages in the given range, it only scrapes the first page over and over for the entire loop.

library(polite)
library(rvest)
library(purrr)

dawnsession <- bow("https://www.dawn.com")

dawnsession

dates <- seq(as.Date("2019-04-01"), as.Date("2019-04-30"), by = "days")

fulllinks <- map(dates, ~scrape(dawnsession, params = paste0("archive/", .x)))

links <- map(fulllinks, ~html_nodes(.x, ".mb-4") %>%
      html_nodes(".story__link") %>%
      html_attr("href"))

@dmi3kno (Owner) commented May 25, 2019

Let me just say that https://www.dawn.com/archive/ is not a scrapeable path. You would have discovered it if you nod()ed to every new path you intend to scrape. The params argument in scrape is not able to reset the path. It was never intended to navigate to a new path, only to re-scrape the page with new query parameters.

Having said that, there seems to be some room to improve the output of these functions so that they catch situations like this and correct them by internally routing to nod(), which would re-check whether the path is allowed.
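
As a rough illustration of nodding to every new path before scraping it, a minimal sketch might look like the following. The helper names `archive_paths()` and `scrape_paths()` are hypothetical and not part of polite. The sketch also assumes that `scrape()` returns `NULL` (with a warning) when robots.txt disallows the path, which is worth confirming against the installed version.

```r
# Build one archive path per date; pure string work, no network access.
archive_paths <- function(dates) {
  paste0("archive/", format(dates, "%Y-%m-%d"))
}

# Hypothetical helper: nod() to each path so that permissions are re-checked
# for that path, then scrape() the updated session.
# Assumption: scrape() yields NULL for a path that may not be scraped.
scrape_paths <- function(session, paths) {
  lapply(paths, function(p) {
    polite::scrape(polite::nod(session, path = p))
  })
}

dates <- seq(as.Date("2019-04-01"), as.Date("2019-04-30"), by = "days")
paths <- archive_paths(dates)

# Not run here (needs network access and the polite package):
# session <- polite::bow("https://www.dawn.com")
# pages   <- scrape_paths(session, paths)
# allowed <- Filter(Negate(is.null), pages)  # keep only permitted pages
```

Since the archive path is reported above as not scrapeable, this pattern would be expected to surface the refusal for each date rather than silently returning the front page again.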

@dmi3kno closed this in 0578bd9 Jun 30, 2019