NIH Reporter RSS feed error #16

Closed
ThatPhageGuy opened this issue Apr 12, 2018 · 2 comments

Comments

@ThatPhageGuy

Hi DataWookie,

I'm new to R, so I apologize if I'm repeating an issue that has already been addressed. I'm trying to parse info from the NIH Reporter RSS feed, which is in an XML format. Here's the code I'm trying to use:
library(feedeR)
library(XML)
library(tidyverse)
Test <- feed.extract("https://projectreporter.nih.gov")

I've tried without loading in XML and tidyverse (they're for other functions I'm hoping to do later), and I'm still getting the same error messages. It's a rather extensive list, but here's a short subset:

attributes construct error
Couldn't find end of Start Tag a line 2103
Opening and ending tag mismatch: u line 2103 and a
Opening and ending tag mismatch: b line 2103 and u
Opening and ending tag mismatch: li line 2103 and b
Opening and ending tag mismatch: head line 8 and li
AttValue: " or ' expected
attributes construct error
Couldn't find end of Start Tag a line 2103
Opening and ending tag mismatch: u line 2103 and a
Opening and ending tag mismatch: b line 2103 and u
Opening and ending tag mismatch: li line 2103 and b
Opening and ending tag mismatch: html line 7 and li
Extra content at the end of the document
Error: 1: Opening and ending tag mismatch: meta line 19 and head
2: Opening and ending tag mismatch: img line 73 and div
3: Entity 'nbsp' not defined
4: xmlParseEntityRef: no name
5: Entity 'nbsp' not defined
6: Opening and ending tag mismatch: input line 127 and legend
7: Opening and ending tag mismatch: legend line 124 and fieldset
8: EntityRef: expecting ';'
9: Opening and ending tag mismatch: input line 137 and form
10: Opening and ending tag mismatch: fieldset line 123 and li
11: Opening and ending tag mismatch: form line 122 and ul
12: Opening and ending tag mismatch: img line 267 and a
13: Opening and ending tag mismatch: img line 268 and a
14: Opening and ending tag mismatch: img line 269 and a
15: Opening and ending tag mismatch: a line 269 and div
16: Opening and ending tag mismatch: a line 268 and div
17: Opening and ending tag mismatch: a line 267 and li
18: Opening and ending tag mismatch: div line 263 and ul
19: Opening and ending tag mismatch: div line 252 and li
20: En

Any chance you can help sort this out? I'm not really sure where to begin. All I can tell is that the different columns being pulled don't seem to be matching up properly.

@datawookie
Owner

Hi,

Thanks for the message. I have just tried to access https://projectreporter.nih.gov in my browser and found that it's a normal web page rather than an RSS feed. Unless I am missing something... There is, however, a link to an RSS feed on that page. Is that what you are looking for?

Please shout if you have further issues (for example, the linked RSS page not parsing due to weird date formats, which is pretty common!).
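
For what it's worth, one rough way to tell an HTML page from a feed before handing a URL to feed.extract() is to look at how the document starts. The sketch below is only a heuristic in base R: looks_like_feed() is a hypothetical helper (not part of feedeR), and it assumes the feed begins with an optional XML declaration followed directly by an rss, feed, or rdf:RDF root element.

```r
# Heuristic check: does this text start like an RSS/Atom/RDF feed?
# looks_like_feed() is a hypothetical helper, not a feedeR function.
looks_like_feed <- function(text) {
  # Collapse multi-line input (e.g. from readLines) into one string.
  text <- paste(text, collapse = "\n")
  # Optional XML declaration, then one of the usual feed root elements.
  pattern <- "^\\s*(<\\?xml[^>]*\\?>)?\\s*<(rss|feed|rdf:RDF)[\\s>]"
  grepl(pattern, text, perl = TRUE)
}

# A minimal RSS document is recognised as a feed.
looks_like_feed('<?xml version="1.0"?><rss version="2.0"><channel/></rss>')

# An ordinary HTML page is not.
looks_like_feed('<!DOCTYPE html><html><head></head><body></body></html>')
```

For a real URL, the first few lines from readLines(url, n = 20, warn = FALSE) could be passed in. A feed that starts with comments or a DOCTYPE would be missed, so treat a FALSE as a hint rather than proof.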

Thanks,
Andrew.

@datawookie
Owner

Actually, I just checked and that feed uses yet another date format. I have updated the package. Please reinstall (using the version on GitHub!) and try again.

library(feedeR)

feed <- feed.extract("https://projectreporter.nih.gov/_readers/new_RePORTER_projects.xml")

That works for me!
