Date.parse() falls down a lot #53

Closed
russellbeattie opened this Issue · 5 comments

Russell Beattie

Hey Dan,

Thanks for this project! I'm using it for http://magnet.io and it's saving me a lot of effort.

Maybe the project should include moment.js or another date parsing library for the pubdate parsing. There are a lot of feeds where the various date entries (dc:date, pubdate, etc.) just aren't parsed correctly and thus come back as null. I ran into this example last night:

http://www.spinner.ca/rsscanada.xml

The date is formatted like:

"2013-03-17T12:44:00 00:00"

Which is almost but not quite what JavaScript likes: the space before the trailing "00:00" causes Date.parse() to fail, where a "+" or "-" would be fine. There are lots and lots of other badly formatted dates, but this shows how touchy Date.parse() can be. Moment seems to be able to parse it well enough, though really something more like SimplePie's date parsing script might be called for:

https://github.com/simplepie/simplepie/blob/master/library/SimplePie/Parse/Date.php

If I get around to converting that to JavaScript, I'll send you a pull request. ;-)
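
To make the spinner.ca case concrete, here is a minimal sketch of repairing that one malformed shape before handing it to Date.parse(). It assumes the stray space stands in for a lost "+" sign, which is a guess about how the feed was generated (for a 00:00 offset the sign makes no difference). The function name `repairOffset` is made up for illustration and is not part of feedparser or moment.js.

```javascript
// Hypothetical helper, not part of any library.
// Assumption: a space before a trailing hh:mm offset was originally a "+".
function repairOffset(dateString) {
  return dateString.trim().replace(
    // date, "T", time (optional fractional seconds), one space, hh:mm offset
    /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?) (\d{2}:\d{2})$/,
    '$1+$2'
  );
}

const raw = '2013-03-17T12:44:00 00:00';
const repaired = repairOffset(raw);
const timestamp = Date.parse(repaired);

console.log(repaired); // 2013-03-17T12:44:00+00:00
console.log(
  Number.isNaN(timestamp)
    ? 'still unparseable'
    : new Date(timestamp).toISOString()
);
```

Strings that already carry a "+" or "-" offset pass through untouched, so this only covers the one pattern seen in that feed, not badly formatted dates in general.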

As a workaround now, when I see a null, I dive into the meta object and see if moment.js can parse the date and it seems to be working well enough, but it should probably be in your library directly.

Thanks again!

-Russ

Dan MacTough
Owner

Thanks, Russ. I'm on the fence about this. I have this personal reaction, "If the publisher can't be bothered to publish valid data, then wtf do I care?!" But, feedparser is a library, not my application, so I can see where being more liberal in what we receive is the way to go here. If I can simply plug in moment.js and have "slightly invalid" dates work (such as in your example feed), then I'm cool with that. SimplePie's script is ridiculous, though.

rdbcci

At the least, in these edge cases, it seems best to not throw away original data in case somebody wants it.

Russell Beattie

Just to get some real numbers, I went through the 14,178 feeds I have right now and only 149 of them returned null in the pubdate. Rechecking those feeds gave me only a handful that were badly formatted; in the rest, the date is simply nonexistent.

So sorry - it doesn't seem to be as big of a deal as I initially thought. I ran into the spinner.ca thing early and assumed it would be more common, but it doesn't look that way. I don't think it's an issue any more.

Thanks again!

Dan MacTough
Owner

@russellbeattie Russ, thanks for following up. I still wouldn't object if moment could make the date parsing more reliable.

Dan MacTough
Owner

@rdbcci The original data is always in the un-normalized property. For example, with an Atom 1.0 feed, each post's "pubdate" property would have the normalized Date object (unless it cannot be parsed), and the "atom:published" property would retain whatever string was in the feed.
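
As a rough sketch of how a caller could use that split between the normalized and raw properties: prefer "pubdate" when it is a usable Date, otherwise retry the raw string with a more forgiving parser. The helper name `resolvePubdate`, its parameters, and the plain-object post below are assumptions for illustration, not feedparser API. The raw property is treated as a plain string, so adjust if your data nests it differently. The inline fallback is only a stand-in for moment.js or any other parser.

```javascript
// Hypothetical helper, not part of feedparser.
// rawKey:        name of the un-normalized property, e.g. 'atom:published'
// fallbackParse: any function taking a string and returning a Date
function resolvePubdate(post, rawKey, fallbackParse) {
  // Use the normalized value when it is a valid Date.
  if (post.pubdate instanceof Date && !Number.isNaN(post.pubdate.getTime())) {
    return post.pubdate;
  }
  // Otherwise retry with the original string, if there is one.
  const raw = post[rawKey];
  if (typeof raw !== 'string') return null;
  const parsed = fallbackParse(raw);
  return parsed instanceof Date && !Number.isNaN(parsed.getTime())
    ? parsed
    : null;
}

// Stand-in fallback: swap the first space for "+" (fits the spinner.ca shape only).
const fallback = (s) => new Date(s.replace(' ', '+'));

// Made-up post object mirroring the example feed's date string.
const post = { pubdate: null, 'atom:published': '2013-03-17T12:44:00 00:00' };
const date = resolvePubdate(post, 'atom:published', fallback);

console.log(date ? date.toISOString() : 'no usable date');
// 2013-03-17T12:44:00.000Z
```

Returning null when neither path yields a valid Date keeps the "nonexistent date" case distinct from a parsed one, so the caller can still decide what to do with feeds that carry no date at all.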
