RSS feeds do not validate #789

Closed
snej opened this issue Jan 12, 2015 · 19 comments

Comments

@snej

snej commented Jan 12, 2015

The RSS 2.0 feeds generated by Hugo do not validate. For example, look at feedvalidator.org's report on the spf13.com blog.

Sorry
This feed does not validate.

line 9, column 4: Undefined channel element: author [help]

    <author>Steve Francia</author>
    ^
line 18, column 27: Invalid email address: Steve Francia (15 occurrences) [help]

      <author>Steve Francia</author>
                           ^
line 165, column 44: pubDate must be an RFC-822 date-time: Tue, 01 Jul 2014 00:00:00 UTC (13 occurrences) [help]

      <pubDate>Tue, 01 Jul 2014 00:00:00 UTC</pubDate>
                                            ^
In addition, interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

Your feed appears to be encoded as "utf-8", but your server is reporting "US-ASCII" [help]

line 110, column 0: description should not contain relative URL references: /templates/list (14 occurrences) [help]

</description>
line 1619, column 0: Non-html tag: figcaption (4 occurrences) [help]

        &lt;img src=&#34;/media/pingdom-old.png&#34; alt=&#34;Pingdom of for ...

I spent several years working on a feed reader at Apple, and malformed feeds were the bane of my existence. It's not hard to check that you're generating a valid feed; please do so, for the sake of the feed readers!

@anthonyfok anthonyfok added the Bug label Jan 12, 2015
@anthonyfok anthonyfok self-assigned this Jan 12, 2015
@anthonyfok anthonyfok added this to the v0.13 milestone Jan 12, 2015
@anthonyfok anthonyfok removed their assignment Jan 12, 2015
@anthonyfok anthonyfok modified the milestones: future, v0.13 Jan 12, 2015
anthonyfok added a commit that referenced this issue Jan 13, 2015
RSS 2.0 requires the email be listed in `<author>`,
and `UTC` as a timezone is not accepted, but `UT` or `GMT` are.

See #789 for more information.  Thanks to @snej for the report!
@anthonyfok
Member

Thank you for reporting this issue, @snej.

For those who are interested, here is what I have learned thus far. According to the RSS 2.0 Specification:

  1. <channel> does not have an <author> element, but <managingEditor> and <webMaster> are available.
  2. <managingEditor>, <webMaster> and <author> all expect an email address followed by an optional full name in parentheses, e.g. geo@herald.com (George Matesky)
  3. <pubDate> etc. must strictly adhere to RFC-822 Date and Time Specification, which accepts UT, GMT or Z as valid and equivalent timezones, but not UTC.
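
As an illustration of point 2, here is a minimal Go sketch of the "email (Full Name)" form. The helper name rssPerson is hypothetical and is not part of Hugo; it only shows the shape of the value the specification expects.

```go
package main

import (
	"fmt"
	"strings"
)

// rssPerson formats a value for <author>, <managingEditor> or <webMaster>
// as "email (Full Name)". The name is hypothetical, not a Hugo API.
func rssPerson(email, name string) string {
	email = strings.TrimSpace(email)
	name = strings.TrimSpace(name)
	if email == "" {
		// Without an email address there is nothing valid to emit.
		return ""
	}
	if name == "" {
		// The full name is optional.
		return email
	}
	return email + " (" + name + ")"
}

func main() {
	fmt.Println(rssPerson("geo@herald.com", "George Matesky"))
}
```

Returning an empty string when no email is known lets a template skip the element entirely instead of emitting a bare name, which is what the validator rejected.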

{{ .Date.Format "MST" }} generates UTC when the date in the front matter has no time component, e.g. date = "2015-01-12". Nowadays, however, hugo new post/test.md automatically inserts the current date and time with a timezone offset, e.g. date = "2015-01-12T14:33:38-07:00", so most users won't see this bug. With TZ=GMT hugo new post/test.md, though, the resulting timestamp is date = "2015-01-12T14:33:38Z", which becomes UTC in the feed XML file.

I have modified the rss.xml template in hugo/tpl/template_embedded.go accordingly, committed as 700c2b8. The W3C Feed Validation Service now comes out clean for the RSS feed on my simple personal website.

However, the other errors aren't as trivial:

  • Your feed appears to be encoded as "utf-8", but your server is reporting "US-ASCII" [help]

  • line 31, column 0: description should not contain iframe tag (6 occurrences) [help]

  • line 207, column 0: description should not contain relative URL references: /templates/list (4 occurrences) [help]

    </description>
  • line 2407, column 0: description should not contain script tag (3 occurrences) [help]

It would seem that to get feeds of pages with extras like <iframe> and <script> to validate, tags like these will need to be filtered out, and all relative URL references will need to be converted to full URLs, a kind of sanitization run, so to speak.
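
To make the idea concrete, here is a rough Go sketch of such a sanitization pass. It is not Hugo code: the function name sanitizeDescription, the list of blocked tags, and the base URL are all assumptions for illustration, and regular expressions are only a crude stand-in for a real HTML parser. It would run on the rendered HTML before that HTML is escaped into <description>.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// blockedPatterns match the elements the validator warned about in this thread.
var blockedPatterns = compileBlocked("script", "iframe", "embed")

// relativeRef matches href="/..." or src="/..." but not protocol-relative "//...".
var relativeRef = regexp.MustCompile(`(href|src)="/([^/"][^"]*)?"`)

// compileBlocked builds two patterns per tag: the paired form including its
// content, then any leftover opening or closing tag on its own.
func compileBlocked(tags ...string) []*regexp.Regexp {
	var res []*regexp.Regexp
	for _, tag := range tags {
		res = append(res,
			regexp.MustCompile(`(?is)<`+tag+`\b[^>]*>.*?</`+tag+`\s*>`),
			regexp.MustCompile(`(?i)</?`+tag+`\b[^>]*>`),
		)
	}
	return res
}

// sanitizeDescription (hypothetical helper) strips the blocked elements and
// rewrites root-relative references against baseURL.
func sanitizeDescription(html, baseURL string) string {
	for _, re := range blockedPatterns {
		html = re.ReplaceAllString(html, "")
	}
	base := strings.TrimRight(baseURL, "/")
	return relativeRef.ReplaceAllString(html, `${1}="`+base+`/${2}"`)
}

func main() {
	in := `<p><a href="/templates/list">list</a></p><script>alert(1)</script>`
	fmt.Println(sanitizeDescription(in, "https://example.org/"))
}
```

The sketch assumes double-quoted attributes and a base URL without "$" characters; anything more robust would need a proper tokenizer.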

I am too new to Go and to the Hugo team to tackle that. Should something like this (probably an enhancement rather than a bugfix) be added to Hugo? Please discuss. :-)

@snej
Author

snej commented Jan 13, 2015

The UTF-8 vs ASCII issue is a well-known thorny one. It's an HTTP-level problem, so it's really outside the purview of a static site generator. The deploying site has to do something to set the correct Content-Type header when serving the file. It can be done with a .htaccess file if the server is Apache, but there's no general fix.

The limits on the description contents … those aren't really part of the RSS "spec" (such as it is, cough) either. It's not good for security reasons to put those in a data format that's going to be rendered by a feed reader, but in reality, any feed reader has to sanitize the article content itself because it can't trust the feed generator not to put in something dangerous. So it really won't make a difference whether you put those elements in or not; they'll get stripped by any well-written reader.

@snej
Author

snej commented Jan 13, 2015

FYI, the feed generated for my website by Hugo v0.12 has different validation errors that aren't listed above:

line 5, column 6: Unexpected uri attribute on generator element [help]

      <generator uri="https://hugo.spf13.com">Hugo</generator>
      ^
line 10, column 4: Undefined channel element: updated [help]

    <updated>Sun, 11 Jan 2015 21:25:58 PST</updated>
    ^
line 758, column 2: Missing channel element: description [help]

  </channel>

@anthonyfok
Member

FYI, the feed generated for my website by Hugo v0.12 has different validation errors that aren't listed above

Yes, indeed, though there is currently no mechanism to fix bugs in old stable releases, i.e. no v0.12.1 will be made to fix bugs, so we will have to look forward to v0.13, which will hopefully be released soon.

@anthonyfok
Member

line 110, column 0: description should not contain relative URL references: /templates/list (14 occurrences)

It turns out other users have reported this specific problem in #725 and #728 too! It was already fixed by the reporter @popthestack in commit 7dacc99 22 days ago! (Thank you @popthestack!) So one more step towards perfect RSS feed output! :-)

@snej
Author

snej commented Jan 17, 2015

😀 👍

bep added a commit that referenced this issue Jan 19, 2015
@bep
Member

bep commented Jan 19, 2015

There is some work involved getting these all valid.

On my site, it complains about script and iframe tags ...

@snej
Author

snej commented Jan 19, 2015

@bjornerik: As I mentioned above, the complaints about the [escaped] tags used in the HTML descriptions are pretty minor. Technically a feed's descriptions shouldn't contain "unsafe" HTML, but in the real world it happens and the feed readers can protect themselves against it in one way or another.

@anthonyfok anthonyfok modified the milestones: future, v0.13 Jan 19, 2015
@anthonyfok
Member

@snej: Thanks for the note!

Still, there is always room for improvement as we strive for perfection, so I think we will likely keep this issue open but move it to a milestone for some future release beyond v0.13, say, v0.50? :-)

@k4rtik
Contributor

k4rtik commented Jan 25, 2015

I'm unsure what remains to be done for RSS validation; most of the validation errors seem to have been fixed by the various commits referenced in this issue.

Among my 4 primary feeds at techglider.in, I see a warning with just one as follows, but all 4 do validate on feedvalidator.org:

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

line 178, column 0: description should not contain embed tag [help]

&lt;p&gt;&lt;embed type=&#34;application/x-shockwave-flash&#34; src=&#34;htt ...

As @snej mentions, these warnings should be ignorable from the perspective of hugo development.

@bst27

bst27 commented Apr 4, 2015

I also had to replace the {{.URL}} with {{.Url}} in the feed

@bep
Member

bep commented Apr 4, 2015

@bst27 what do you mean by having to replace URL with Url? Url is deprecated in 0.14, but that shouldn't be relevant for 0.13... Unless you are using the 0.14 docs with 0.13?

@bst27

bst27 commented Apr 4, 2015

I copied the given code from the docs and saved it in an rss.xml file (to customize it in a theme).

Hugo gives me:

ERROR: 2015/04/04 template: theme/rss.xml:11: function "safeHTML" not defined
ERROR: 2015/04/04 Error while rendering section articles rss: html/template: "theme/rss.xml" is an incomplete template
ERROR: 2015/04/04 Error while rendering homepage rss: html/template: "theme/rss.xml" is an incomplete template

After removing safeHTML Hugo gives me:

ERROR: 2015/04/04 Error while rendering section articles rss: template: theme/rss.xml:12:23: executing "theme/rss.xml" at <.URL>: URL is not a field of struct type *hugolib.Node
ERROR: 2015/04/04 Error while rendering homepage rss: template: theme/rss.xml:12:23: executing "theme/rss.xml" at <.URL>: URL is not a field of struct type *hugolib.Node

After changing {{.URL}} to {{.Url}} Hugo builds without error.

@bep
Member

bep commented Apr 4, 2015

Whoops.

Then some Hugo 0.14 syntax has slipped into the 0.13 docs, sorry for that.

bep added a commit that referenced this issue Apr 4, 2015
@bep
Member

bep commented Jun 11, 2016

Is this still the case?

@bep
Member

bep commented Jul 21, 2016

What is the status on this?

@bep bep modified the milestones: v0.17, future Jul 21, 2016
@bst27

bst27 commented Jul 23, 2016

I do not use Hugo at the moment, so I cannot give you any feedback.

@dimo414
Contributor

dimo414 commented Jul 24, 2016

According to both FEED Validator and the W3C validator, Hugo now generates valid RSS feeds.

Both services point out an encoding issue (UTF-8 vs. ASCII) but it sounds like the servers are misconfigured, not Hugo.

@bep bep closed this as completed Jul 24, 2016
tychoish pushed a commit to tychoish/hugo that referenced this issue Aug 13, 2017
RSS 2.0 requires the email be listed in `<author>`,
and `UTC` as a timezone is not accepted, but `UT` or `GMT` are.

See gohugoio#789 for more information.  Thanks to @snej for the report!
tychoish pushed a commit to tychoish/hugo that referenced this issue Aug 13, 2017
@github-actions

github-actions bot commented Apr 4, 2022

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 4, 2022