relative links get wiped out #58

Closed
elliotf opened this Issue Apr 16, 2013 · 1 comment

elliotf commented Apr 16, 2013

I'm working through it now, but it appears that relative links in entries are getting null()ed out. The feed I'm having issues with is:

http://substack.net/blog.xml

The offending entry is like

<link rel="self" href="/weaning_yourself_off_jquery" />

I'm going to try to write a failing spec, then poke around main.js near line 444: https://github.com/danmactough/node-feedparser/blob/master/main.js#L444

If you have any suggestions, please let me know.

Owner

danmactough commented Apr 17, 2013

Thanks for the report. There are several errors with that feed and some omissions that I'm not sure how to deal with (or if I should try).

Nothing is actually getting nulled out -- the feed itself fails to provide the appropriate links:

  • The feed section contains no link element whatsoever. It should contain a <link rel="self"... pointing to the feed's canonical URL and a <link rel="alternate"... pointing to the website URL.
  • The entry sections contain <link rel="self"... (which means the XML representation of the post) instead of <link rel="alternate"... (which means the HTML representation of the post).
  • Relative URLs are used, but no xml:base is declared to let the XML parser resolve them to canonical URLs. feedparser can handle this if you pass the feed's URL in the options, e.g.:
    request.get('http://substack.net/blog.xml').pipe(feedparser({ feedurl: 'http://substack.net/blog.xml' })).on( ...
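To illustrate why the feed URL matters in the last point, here is a minimal sketch of resolving a relative href against a base URL. It uses Node's built-in URL class; `resolveHref` is a hypothetical helper written for this example and is not how feedparser necessarily does it internally.

```javascript
// Hypothetical helper (not part of feedparser): resolve an href against a base URL.
// Returns null when the href is missing or cannot be resolved.
function resolveHref(href, baseUrl) {
  if (!href) return null;
  try {
    // The WHATWG URL constructor resolves relative references against the base.
    return new URL(href, baseUrl).href;
  } catch (err) {
    // A relative href with no usable base (or a malformed href) lands here.
    return null;
  }
}

console.log(resolveHref('/weaning_yourself_off_jquery', 'http://substack.net/blog.xml'));
// → http://substack.net/weaning_yourself_off_jquery

console.log(resolveHref('/weaning_yourself_off_jquery', undefined));
// → null
```

Without a base (no xml:base in the feed and no feed URL passed in), there is nothing to resolve the relative path against, which is the situation this feed creates.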

I try to follow Postel's Law, so I'll see if some accommodation makes sense. For example, if at the end I see that I have no link but I have an atom:link @rel="self", I could probably safely assume that the author intended that to be a link to the blog post. I could do the same with id if it looks like a url, I guess. Maybe it's not a bad idea to take a pass through once the parsing is "done" and try to intuit which mistakes are meant to mean what. I dunno.
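A rough sketch of what such a post-parse pass might look like, assuming a simplified entry shape. The `link`, `id`, and `atomLinks` fields and the `guessEntryLink` name are made up for illustration and are not feedparser's actual data model:

```javascript
// Hypothetical post-parse fallback; the entry shape is an assumption for this sketch.
function guessEntryLink(entry) {
  // 1. A proper link was found during parsing: nothing to guess.
  if (entry.link) return entry.link;

  // 2. No link, but an atom:link with rel="self": assume the author meant the post URL.
  const self = (entry.atomLinks || []).find((l) => l.rel === 'self' && l.href);
  if (self) return self.href;

  // 3. Fall back to the id if it looks like an http(s) URL.
  if (typeof entry.id === 'string' && /^https?:\/\//i.test(entry.id)) return entry.id;

  return null;
}

console.log(guessEntryLink({
  atomLinks: [{ rel: 'self', href: 'http://substack.net/weaning_yourself_off_jquery' }]
}));
// → http://substack.net/weaning_yourself_off_jquery
```

The order matters here: a real link always wins, and the guesses only apply when nothing better was provided.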

You could contact the author (I'm not going to ping him here -- I don't want to be rude).

For frame of reference, here's what that feed looks like to Firefox:
[image: substack]
