tidyRSS fails to parse feeds: "xmlXPathEval: evaluation failed" #31

alastairrushworth · 2020-01-11T15:09:57Z

Thanks for the amazing tidyRSS package, I find it very useful indeed! Thought I'd get in touch to file a quick issue as I've noticed that quite a number of feeds don't parse correctly.

For example:

# tested with v1.2.11
library(tidyRSS)
tidyfeed("http://abigailsee.com/feed.xml")

Returns the error:

Error in xpath_search(x$node, x$doc, xpath = xpath, nsMap = ns, num_results = 1) : 
  xmlXPathEval: evaluation failed

I think the feed itself is fine, and tidyfeed seems to fetch it without trouble, but something goes awry during parsing. I've noticed the same error with several other feeds, which I've copied below:

feed_vec <- 
  c("http://abigailsee.com/feed.xml",
    "https://adamgoodkind.com/feed.xml",
    "http://adomingues.github.io/feed.xml",
    "http://aebou.rbind.io/index.xml",
    "http://agrarianresearch.org/blog/?feed=rss2",
    "http://akosm.netlify.com/index.xml",
    "http://alburez.me/feed.xml",
    "http://alexmorley.me/feed.xml",
    "https://alexwhan.com/index.xml",
    "http://allthingsr.blogspot.com/feeds/posts/default?alt=rss",
    "http://allthiswasfield.blogspot.com/feeds/posts/default?alt=rss",
    "http://almostrandom.netlify.com/index.xml",
    "http://altran-data-analytics.netlify.com/index.xml",
    "https://www.amitkohli.com/index.xml",
    "http://analisisydecision.es/feed/",
    "http://andysouth.github.io/feed.xml",
    "http://annakrystalli.me/index.xml",
    "http://annarborrusergroup.github.io/feed.xml",
    "http://anotherblogaboutr.blogspot.com/feeds/posts/default?alt=rss",
    "http://anpefi.eu/index.xml",
    "https://fishandwhistle.net/index.xml",
    "https://www.ardata.fr/index.xml",
    "http://arnab.org/blog/atom.xml",
    "http://arunatma.blogspot.com/feeds/posts/default?alt=rss",
    "http://asbcllc.com/feed.xml",
    "http://ashiklom.github.io/feed.xml",
    "http://aurielfournier.github.io/feed.xml",
    "http://austinwehrwein.com/index.xml")

I'm working on a side project at the moment that involves about 3K RSS feeds. I'm happy to share the list once I've tidied it up a bit; it might help with identifying other edge cases, as I know how finicky RSS feeds can be! I'm also happy to help with this issue if you can point me in the right direction.

Thanks,

Alastair

RobertMyles · 2020-01-11T16:33:05Z

Hi Alastair,

Yeah, RSS feeds can be a pain, and I've seen this error a few times. I'm not sure off the top of my head where exactly it pops up. I'll have a look as soon as I can, but if you're interested in contributing, it's probably happening in one of the *_parse functions. I'm trying to clean up a lot of little things in the package for a 1.3 version, so your list will help a lot. In the meantime, I'll leave this open until I can figure out the source of the error.

Rob

RobertMyles · 2020-01-13T17:21:19Z

I had a quick chance to look at this today and with the dev version I'm getting:

> tidyfeed("http://abigailsee.com/feed.xml")
# A tibble: 5 x 5
  feed_link    item_title          item_date_published item_description                 item_link             
  <chr>        <chr>               <dttm>              <chr>                            <chr>                 
1 http://abig… What makes a good … 2019-08-13 00:00:00 "<!--excerpt.start-->\n<p><em>T… http://abigailsee.com…
2 http://abig… Deep Learning, Str… 2018-02-21 00:00:00 "<!--excerpt.start-->\n<html>\n… http://abigailsee.com…
3 http://abig… Four deep learning… 2017-08-30 00:00:00 "<head>\n<script src=\"https://… http://abigailsee.com…
4 http://abig… Four deep learning… 2017-08-30 00:00:00 "<head>\n<script src=\"https://… http://abigailsee.com…
5 http://abig… Taming Recurrent N… 2017-04-16 00:00:00 "<!--excerpt.start-->\n<p><em>T… http://abigailsee.com…

With your vector of feeds, I get:

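# Wrap tidyfeed() once so errors are captured instead of aborting the loop
safe_tidyfeed <- purrr::safely(tidyfeed)

# walk() runs the function for its side effect (printing) and returns invisibly
purrr::walk(feed_vec, ~ {
  ret <- safe_tidyfeed(.x)
  if (is.null(ret$error)) {
    print("Feed OK")
  } else {
    print("Feed unavailable")
  }
})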

[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed unavailable"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed unavailable"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed unavailable"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"
[1] "Feed OK"

I'll try to get 1.3 finished as soon as possible. As you can see, it fixes most of these problems (25 of the 28 feeds above now parse). In the meantime, if you'd like to try the dev version, it should help.
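The loop above only prints a status per feed, so it can be hard to see which URLs failed and why. A minimal sketch of one way to keep that information, using base R's `tryCatch()`: the helper name `check_feeds` and its `parser` argument are hypothetical and not part of tidyRSS; the parser defaults to `tidyRSS::tidyfeed` but can be swapped for any function that takes a URL.

```r
# Hypothetical helper (not part of tidyRSS): try each feed and record the outcome.
# `parser` is assumed to be a function that takes a single URL and either
# returns a result or throws an error, as tidyfeed() does in this thread.
check_feeds <- function(urls, parser = tidyRSS::tidyfeed) {
  results <- lapply(urls, function(url) {
    tryCatch(
      {
        parser(url)  # the result is discarded; only success/failure is kept
        data.frame(url = url, ok = TRUE, error = NA_character_,
                   stringsAsFactors = FALSE)
      },
      error = function(e) {
        # Keep the error message so failures can be grouped by cause later
        data.frame(url = url, ok = FALSE, error = conditionMessage(e),
                   stringsAsFactors = FALSE)
      }
    )
  })
  do.call(rbind, results)
}

# Possible usage (needs network access and tidyRSS installed):
# status <- check_feeds(feed_vec)
# status[!status$ok, c("url", "error")]
```

Returning a data frame rather than printing makes it straightforward to filter the failing feeds or tabulate the distinct error messages across a larger list.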

alastairrushworth · 2020-01-14T18:52:06Z

Hi Rob - that's perfect, I think that fixes it completely for me. Thanks a lot for that, I'll stick to dev until 1.3.

I'll drop you a note when I've got the long list of feeds tidied up, in case it can help.

Cheers!

RobertMyles · 2020-01-15T09:55:17Z

That would be a help, thanks Alastair.

RobertMyles closed this as completed Jan 13, 2020
