Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Facebook's RSSes aren't displayed correctly #754

Closed
ghost opened this issue Jan 17, 2015 · 17 comments
Closed

Facebook's RSSes aren't displayed correctly #754

ghost opened this issue Jan 17, 2015 · 17 comments

Comments

@ghost
Copy link

ghost commented Jan 17, 2015

One picture is worth a thousand words :

frss

Here is the RSS that causes problems :

https://www.facebook.com/feeds/page.php?format=rss20&id=166081103477984

@aledeg
Copy link
Member

aledeg commented Jan 17, 2015

It looks like an encoding problem. When I open the feed with Firefox, I have the same problem. I think it is not specific to FRSS.
I checked also the feed on the W3C validator and it is not valid. It is enough valid to be parsed though.

@marienfressinaud
Copy link
Member

It is enough valid to be parsed though.

Yes and SimplePie demo and Liferea work great: that's why I ask @AugierLe42e to open a ticket. I didn't inspect code deeper but it could be a problem from FRSS. By the way, this problem reminds me another but I cannot find it in ticket archive.

@Alwaysin
Copy link
Contributor

Maybe this one : #661 ? It is still opened.

@Alkarex
Copy link
Member

Alkarex commented Jan 17, 2015

This RSS2 feed uses a CDATA container together with HTML-entities in the <title> section which is supposed to be text-only (see comment below), not HTML. As far as I can see, the result in FrehsRSS is as per the specifications (XML, RSS2).
Note however that I cannot immediately find an official reference stating explicitly that <title> is text-only and not HTML (except the note from the W3C validator, which is already quite a good reference). This is what I lack to be sure.
If <title> is indeed text-only, then I do not believe it is possible to support this feed without breaking other valid feeds and without doing an exception for Facebook.

There is also the Atom 1.0 version of the same feed: https://www.facebook.com/feeds/page.php?format=atom10&id=166081103477984
In this case it is clear, and the Facebook Atom feed is wrong: http://tools.ietf.org/html/rfc4287#section-3.1.1

If the "type" attribute is not provided, Atom Processors MUST behave as though it were present with a value of "text"

@ghost
Copy link
Author

ghost commented Jan 17, 2015

Here is another one causing problem :

http://www.atterres.org/rss.xml

@Alkarex
Copy link
Member

Alkarex commented Jan 19, 2015

Similar problem with http://www.atterres.org/rss.xml : among the many errors, this feed is using HTML in <title>. http://validator.w3.org/feed/check.cgi?url=http://www.atterres.org/rss.xml

@ghost
Copy link
Author

ghost commented Jan 19, 2015

Would this be difficult to detect and sanitize the title ? E.g by using regexp ?

@Alkarex
Copy link
Member

Alkarex commented Jan 19, 2015

@AugierLe42e The problem would be to avoid breaking other valid feeds (it is a valid use-case to want to write e.g. &amp; in a title, and this would not be possible anymore if we blindly decoded everything). Currently, what I am lacking is an authoritative reference of whether RSS2's <title> is plain-text or HTML, in addition to what is told by the validator.

Alkarex added a commit that referenced this issue Jan 20, 2015
Caused searches such as "intitle:&amp;" to fail after paging, and
possible XSS vulnerabilities.
Discovered during #754
@ghost
Copy link
Author

ghost commented Mar 16, 2015

Any news from this ? This is really annoying.

@Alkarex
Copy link
Member

Alkarex commented Mar 16, 2015

I will look at it. For reference, it relates to SIMPLEPIE_CONSTRUCT_MAYBE_HTML in SimplePie https://github.com/FreshRSS/FreshRSS/blob/beta/lib/SimplePie/SimplePie/Sanitize.php#L250

@Alkarex Alkarex self-assigned this Mar 16, 2015
Alkarex added a commit to Alkarex/FreshRSS that referenced this issue Mar 16, 2015
FreshRSS#754
Needs to check with many feeds to see if this does not introduce
incompatibilities with some valid feeds.
@Alkarex
Copy link
Member

Alkarex commented Mar 16, 2015

Here is a patch: #813
Please give it a try.
We will have to check with many feeds to ensure that this change has not broken any previously-supported valid feed.

@ghost
Copy link
Author

ghost commented Mar 16, 2015

I just made the change. I'll let you know when I have feeds update if it worked. Thanks.

Edit : Change was successful.

github3

Alkarex added a commit to Alkarex/FreshRSS that referenced this issue Mar 21, 2015
@Alkarex
Copy link
Member

Alkarex commented Mar 21, 2015

@AugierLe42e I have made a new version, which now always decode XML-escaped content of type SIMPLEPIE_CONSTRUCT_MAYBE_HTML. It should have a more stable and predictable behaviour. Please test if you can #813 (comment)

@Alkarex
Copy link
Member

Alkarex commented Apr 5, 2015

BTW, there is a bug report in SimplePie simplepie/simplepie#350 so we could consider sending this patch upstream if it works well.

@Alkarex
Copy link
Member

Alkarex commented Apr 5, 2015

So, it seems to work fine, but feel free to re-open in case of problem.

@Alkarex Alkarex closed this as completed Apr 5, 2015
Alkarex added a commit to Alkarex/simplepie that referenced this issue Apr 5, 2015
Patch for simplepie#350

It is quite common for feeds (e.g. Facebook as reported above) to have
some MAYBE_HTML section HTML-encoded, which is currently not handled by
SimplePie
```xml
<title><![CDATA[ L&simplepie#39;alpha 11 est arriv&#xe9;e...]]></title>
```

This is the approach currently used (with success) in FreshRSS:
FreshRSS/FreshRSS#754
FreshRSS/FreshRSS#813
@Alkarex
Copy link
Member

Alkarex commented Apr 6, 2015

simplepie/simplepie#400

@ghost
Copy link
Author

ghost commented Apr 6, 2015

Yeah, work fine now. Thanks !

@Alkarex Alkarex added this to the 1.1.1 milestone May 16, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants